				1	= 4.3.2 (20131002) =
				2
				3	* Fixed a bug in which short Unicode input was improperly encoded to
				4	ASCII when checking whether or not it was the name of a file on
				5	disk. [bug=1227016]
				6
				7	* Fixed a crash when a short input contains data not valid in
				8	filenames. [bug=1232604]
				9
				10	* Fixed a bug that caused Unicode data put into UnicodeDammit to
				11	return None instead of the original data. [bug=1214983]
				12
				13	* Combined two tests to stop a spurious test failure when tests are
				14	run by nosetests. [bug=1212445]
				15
				16	= 4.3.1 (20130815) =
				17
				18	* Fixed yet another problem with the html5lib tree builder, caused by
				19	html5lib's tendency to rearrange the tree during
				20	parsing. [bug=1189267]
				21
				22	* Fixed a bug that caused the optimized version of find_all() to
				23	return nothing. [bug=1212655]
				24
				25	= 4.3.0 (20130812) =
				26
				27	* Instead of converting incoming data to Unicode and feeding it to the
				28	lxml tree builder in chunks, Beautiful Soup now makes successive
				29	guesses at the encoding of the incoming data, and tells lxml to
				30	parse the data as that encoding. Giving lxml more control over the
				31	parsing process improves performance and avoids a number of bugs and
				32	issues with the lxml parser which had previously required elaborate
				33	workarounds:
				34
				35	- An issue in which lxml refuses to parse Unicode strings on some
				36	systems. [bug=1180527]
				37
				38	- A returning bug that truncated documents longer than a (very
				39	small) size. [bug=963880]
				40
				41	- A returning bug in which extra spaces were added to a document if
				42	the document defined a charset other than UTF-8. [bug=972466]
				43
				44	This required a major overhaul of the tree builder architecture. If
				45	you wrote your own tree builder and didn't tell me, you'll need to
				46	modify your prepare_markup() method.
				47
				48	* The UnicodeDammit code that makes guesses at encodings has been
				49	split into its own class, EncodingDetector. A lot of apparently
				50	redundant code has been removed from Unicode, Dammit, and some
				51	undocumented features have also been removed.
				52
				53	* Beautiful Soup will issue a warning if instead of markup you pass it
				54	a URL or the name of a file on disk (a common beginner's mistake).
				55
				56	* A number of optimizations improve the performance of the lxml tree
				57	builder by about 33%, the html.parser tree builder by about 20%, and
				58	the html5lib tree builder by about 15%.
				59
				60	* All find_all calls should now return a ResultSet object. Patch by
				61	Aaron DeVore. [bug=1194034]
				62
				63	= 4.2.1 (20130531) =
				64
				65	* The default XML formatter will now replace ampersands even if they
				66	appear to be part of entities. That is, "&lt;" will become
				67	"&amp;lt;". The old code was left over from Beautiful Soup 3, which
				68	didn't always turn entities into Unicode characters.
				69
				70	If you really want the old behavior (maybe because you add new
				71	strings to the tree, those strings include entities, and you want
				72	the formatter to leave them alone on output), it can be found in
				73	EntitySubstitution.substitute_xml_containing_entities(). [bug=1182183]
				74
				75	* Gave new_string() the ability to create subclasses of
				76	NavigableString. [bug=1181986]
				77
				78	* Fixed another bug by which the html5lib tree builder could create a
				79	disconnected tree. [bug=1182089]
				80
				81	* The .previous_element of a BeautifulSoup object is now always None,
				82	not the last element to be parsed. [bug=1182089]
				83
				84	* Fixed test failures when lxml is not installed. [bug=1181589]
				85
				86	* html5lib now supports Python 3. Fixed some Python 2-specific
				87	code in the html5lib test suite. [bug=1181624]
				88
				89	* The html.parser treebuilder can now handle numeric attributes in
				90	text when the hexadecimal name of the attribute starts with a
				91	capital X. Patch by Tim Shirley. [bug=1186242]
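
The formatter change at the top of this release can be illustrated with a short sketch. It assumes EntitySubstitution is imported from bs4.dammit (the entry names the class but not its module), and the sample string is made up for illustration.

```python
from bs4.dammit import EntitySubstitution

# Illustrative input: one bare ampersand and one existing entity.
text = "AT&T &lt;"

# Default XML behavior: every ampersand is escaped, even one that
# appears to start an entity.
escaped = EntitySubstitution.substitute_xml(text)

# Old behavior: things that look like entities are left alone and
# only bare ampersands and angle brackets are escaped.
preserved = EntitySubstitution.substitute_xml_containing_entities(text)

print(escaped)
print(preserved)
```

The second call is the one to reach for if the strings you add to the tree already contain entities that should survive output unchanged.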
				92
				93	= 4.2.0 (20130514) =
				94
				95	* The Tag.select() method now supports a much wider variety of CSS
				96	selectors.
				97
				98	- Added support for the adjacent sibling combinator (+) and the
				99	general sibling combinator (~). Tests by "liquider". [bug=1082144]
				100
				101	- The combinators (>, +, and ~) can now combine with any supported
				102	selector, not just one that selects based on tag name.
				103
				104	- Added limited support for the "nth-of-type" pseudo-class. Code
				105	by Sven Slootweg. [bug=1109952]
				106
				107	* The BeautifulSoup class is now aliased to "_s" and "_soup", making
				108	it quicker to type the import statement in an interactive session:
				109
				110	from bs4 import _s
				111	or
				112	from bs4 import _soup
				113
				114	The alias may change in the future, so don't use this in code you're
				115	going to run more than once.
				116
				117	* Added the 'diagnose' submodule, which includes several useful
				118	functions for reporting problems and doing tech support.
				119
				120	- diagnose(data) tries the given markup on every installed parser,
				121	reporting exceptions and displaying successes. If a parser is not
				122	installed, diagnose() mentions this fact.
				123
				124	- lxml_trace(data, html=True) runs the given markup through lxml's
				125	XML parser or HTML parser, and prints out the parser events as
				126	they happen. This helps you quickly determine whether a given
				127	problem occurs in lxml code or Beautiful Soup code.
				128
				129	- htmlparser_trace(data) is the same thing, but for Python's
				130	built-in HTMLParser class.
				131
				132	* In an HTML document, the contents of a <script> or <style> tag will
				133	no longer undergo entity substitution by default. XML documents work
				134	the same way they did before. [bug=1085953]
				135
				136	* Methods like get_text() and properties like .strings now only give
				137	you strings that are visible in the document--no comments or
				138	processing commands. [bug=1050164]
				139
				140	* The prettify() method now leaves the contents of <pre> tags
				141	alone. [bug=1095654]
				142
				143	* Fix a bug in the html5lib treebuilder which sometimes created
				144	disconnected trees. [bug=1039527]
				145
				146	* Fix a bug in the lxml treebuilder which crashed when a tag included
				147	an attribute from the predefined "xml:" namespace. [bug=1065617]
				148
				149	* Fix a bug by which keyword arguments to find_parent() were not
				150	being passed on. [bug=1126734]
				151
				152	* Stop a crash when unwisely messing with a tag that's been
				153	decomposed. [bug=1097699]
				154
				155	* Now that lxml's segfault on invalid doctype has been fixed, fixed a
				156	corresponding problem on the Beautiful Soup end that was previously
				157	invisible. [bug=984936]
				158
				159	* Fixed an exception when an overspecified CSS selector didn't match
				160	anything. Code by Stefaan Lippens. [bug=1168167]
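
A rough sketch of the selector support added to Tag.select() in this release, run against a made-up fragment with Python's built-in parser. The markup and the id values are illustrative only.

```python
from bs4 import BeautifulSoup

html = (
    '<div><h1>Title</h1>'
    '<p id="a">A</p><p id="b">B</p>'
    '<span>S</span><p id="c">C</p></div>'
)
soup = BeautifulSoup(html, "html.parser")

# Adjacent sibling combinator: the <p> immediately after the <h1>.
adjacent = [p["id"] for p in soup.select("h1 + p")]

# General sibling combinator: every <p> that follows the <h1>.
siblings = [p["id"] for p in soup.select("h1 ~ p")]

# nth-of-type pseudo-class: the second <p> among its siblings.
second = [p["id"] for p in soup.select("p:nth-of-type(2)")]

# A combinator paired with a non-tag selector (here, an id).
child_by_id = [p["id"] for p in soup.select("div > #c")]

print(adjacent, siblings, second, child_by_id)
```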
				161
				162	= 4.1.3 (20120820) =
				163
				164	* Skipped a test under Python 2.6 and Python 3.1 to avoid a spurious
				165	test failure caused by the lousy HTMLParser in those
				166	versions. [bug=1038503]
				167
				168	* Raise a more specific error (FeatureNotFound) when a requested
				169	parser or parser feature is not installed. Raise NotImplementedError
				170	instead of ValueError when the user calls insert_before() or
				171	insert_after() on the BeautifulSoup object itself. Patch by Aaron
				172	DeVore. [bug=1038301]
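
A small sketch of the two error cases described in this release. The parser name "no-such-parser" is a deliberately invalid, hypothetical value, and importing FeatureNotFound from the top-level bs4 package is an assumption about where the exception is exposed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Asking for a parser that is not installed raises FeatureNotFound.
try:
    BeautifulSoup("<a></a>", "no-such-parser")
    missing_parser_raised = False
except FeatureNotFound:
    missing_parser_raised = True

# Calling insert_before() on the BeautifulSoup object itself makes no
# sense, so it raises NotImplementedError.
soup = BeautifulSoup("<a></a>", "html.parser")
try:
    soup.insert_before(soup.new_tag("b"))
    insert_raised = False
except NotImplementedError:
    insert_raised = True

print(missing_parser_raised, insert_raised)
```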
				173
				174	= 4.1.2 (20120817) =
				175
				176	* As per PEP-8, allow searching by CSS class using the 'class_'
				177	keyword argument. [bug=1037624]
				178
				179	* Display namespace prefixes for namespaced attribute names, instead of
				180	the fully-qualified names given by the lxml parser. [bug=1037597]
				181
				182	* Fixed a crash on encoding when an attribute name contained
				183	non-ASCII characters.
				184
				185	* When sniffing encodings, if the cchardet library is installed,
				186	Beautiful Soup uses it instead of chardet. cchardet is much
				187	faster. [bug=1020748]
				188
				189	* Use logging.warning() instead of warnings.warn() to notify the user
				190	that characters were replaced with REPLACEMENT
				191	CHARACTER. [bug=1013862]
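
The 'class_' keyword from the first entry of this release can be sketched as follows. The markup is invented for the example.

```python
from bs4 import BeautifulSoup

html = '<p class="intro lead">One</p><p class="body">Two</p>'
soup = BeautifulSoup(html, "html.parser")

# 'class' is a reserved word in Python, so the keyword argument is
# spelled 'class_'. A tag matches if any of its CSS classes matches.
matches = soup.find_all("p", class_="lead")
texts = [p.get_text() for p in matches]

print(texts)
```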
				192
				193	= 4.1.1 (20120703) =
				194
				195	* Fixed an html5lib tree builder crash which happened when html5lib
				196	moved a tag with a multivalued attribute from one part of the tree
				197	to another. [bug=1019603]
				198
				199	* Correctly display closing tags with an XML namespace declared. Patch
				200	by Andreas Kostyrka. [bug=1019635]
				201
				202	* Fixed a typo that made parsing significantly slower than it should
				203	have been, and also waited too long to close tags with XML
				204	namespaces. [bug=1020268]
				205
				206	* get_text() now returns an empty Unicode string if there is no text,
				207	rather than an empty bytestring. [bug=1020387]
				208
				209	= 4.1.0 (20120529) =
				210
				211	* Added experimental support for fixing Windows-1252 characters
				212	embedded in UTF-8 documents. (UnicodeDammit.detwingle())
				213
				214	* Fixed the handling of &quot; with the built-in parser. [bug=993871]
				215
				216	* Comments, processing instructions, document type declarations, and
				217	markup declarations are now treated as preformatted strings, the way
				218	CData blocks are. [bug=1001025]
				219
				220	* Fixed a bug with the lxml treebuilder that prevented the user from
				221	adding attributes to a tag that didn't originally have
				222	attributes. [bug=1002378] Thanks to Oliver Beattie for the patch.
				223
				224	* Fixed some edge-case bugs having to do with inserting an element
				225	into a tag it's already inside, and replacing one of a tag's
				226	children with another. [bug=997529]
				227
				228	* Added the ability to search for attribute values specified in UTF-8. [bug=1003974]
				229
				230	This caused a major refactoring of the search code. All the tests
				231	pass, but it's possible that some searches will behave differently.
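
A minimal sketch of UnicodeDammit.detwingle(), the experimental feature listed at the top of this release. The input is a byte string built for illustration that mixes UTF-8 text with Windows-1252 smart quotes.

```python
from bs4 import UnicodeDammit

snowmen = "\N{SNOWMAN}" * 3
quote = (
    "\N{LEFT DOUBLE QUOTATION MARK}I like snowmen!"
    "\N{RIGHT DOUBLE QUOTATION MARK}"
)

# One document, two encodings: UTF-8 followed by Windows-1252.
doc = snowmen.encode("utf8") + quote.encode("windows_1252")

# detwingle() rewrites the Windows-1252 bytes as UTF-8, so the whole
# byte string can then be decoded as UTF-8.
fixed = UnicodeDammit.detwingle(doc)
text = fixed.decode("utf8")

print(text)
```

Decoding the original doc as UTF-8 without this step would fail on the Windows-1252 bytes.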
				232
				233	= 4.0.5 (20120427) =
				234
				235	* Added a new method, wrap(), which wraps an element in a tag.
				236
				237	* Renamed replace_with_children() to unwrap(), which is easier to
				238	understand and also the jQuery name of the function.
				239
				240	* Made encoding substitution in <meta> tags completely transparent (no
				241	more %SOUP-ENCODING%).
				242
				243	* Fixed a bug in decoding data that contained a byte-order mark, such
				244	as data encoded in UTF-16LE. [bug=988980]
				245
				246	* Fixed a bug that made the HTMLParser treebuilder generate XML
				247	definitions ending with two question marks instead of
				248	one. [bug=984258]
				249
				250	* Upon document generation, CData objects are no longer run through
				251	the formatter. [bug=988905]
				252
				253	* The test suite now passes when lxml is not installed, whether or not
				254	html5lib is installed. [bug=987004]
				255
				256	* Print a warning on HTMLParseErrors to let people know they should
				257	install a better parser library.
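
The wrap() and unwrap() methods introduced in this release are inverses of each other, which a short sketch can show. The markup is made up for the example.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>I wish I was bold.</p>", "html.parser")

# wrap() puts the string inside a newly created <b> tag.
soup.p.string.wrap(soup.new_tag("b"))
wrapped = str(soup)

# unwrap() removes the <b> tag again but keeps its contents.
soup.b.unwrap()
unwrapped = str(soup)

print(wrapped)
print(unwrapped)
```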
				258
				259	= 4.0.4 (20120416) =
				260
				261	* Fixed a bug that sometimes created disconnected trees.
				262
				263	* Fixed a bug with the string setter that moved a string around the
				264	tree instead of copying it. [bug=983050]
				265
				266	* Attribute values are now run through the provided output formatter.
				267	Previously they were always run through the 'minimal' formatter. In
				268	the future I may make it possible to specify different formatters
				269	for attribute values and strings, but for now, consistent behavior
				270	is better than inconsistent behavior. [bug=980237]
				271
				272	* Added the missing renderContents method from Beautiful Soup 3. Also
				273	added an encode_contents() method to go along with decode_contents().
				274
				275	* Give a more useful error when the user tries to run the Python 2
				276	version of BS under Python 3.
				277
				278	* UnicodeDammit can now convert Microsoft smart quotes to ASCII with
				279	UnicodeDammit(markup, smart_quotes_to="ascii").
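
A sketch of the smart_quotes_to="ascii" option. The sample bytes are illustrative, and passing ["windows-1252"] as an encoding hint is an assumption made so the example does not depend on encoding detection.

```python
from bs4 import UnicodeDammit

# \x93, \x94 and \x92 are Windows-1252 smart quotes.
markup = b"<p>I just \x93love\x94 Microsoft Word\x92s smart quotes</p>"

dammit = UnicodeDammit(markup, ["windows-1252"], smart_quotes_to="ascii")
converted = dammit.unicode_markup

print(converted)
```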
				280
				281	= 4.0.3 (20120403) =
				282
				283	* Fixed a typo that caused some versions of Python 3 to convert the
				284	Beautiful Soup codebase incorrectly.
				285
				286	* Got rid of the 4.0.2 workaround for HTML documents--it was
				287	unnecessary and the workaround was triggering a (possibly different,
				288	but related) bug in lxml. [bug=972466]
				289
				290	= 4.0.2 (20120326) =
				291
				292	* Worked around a possible bug in lxml that prevents non-tiny XML
				293	documents from being parsed. [bug=963880, bug=963936]
				294
				295	* Fixed a bug where specifying `text` while also searching for a tag
				296	only worked if `text` wanted an exact string match. [bug=955942]
				297
				298	= 4.0.1 (20120314) =
				299
				300	* This is the first official release of Beautiful Soup 4. There is no
				301	4.0.0 release, to eliminate any possibility that packaging software
				302	might treat "4.0.0" as being an earlier version than "4.0.0b10".
				303
				304	* Brought BS up to date with the latest release of soupselect, adding
				305	CSS selector support for direct descendant matches and multiple CSS
				306	class matches.
				307
				308	= 4.0.0b10 (20120302) =
				309
				310	* Added support for simple CSS selectors, taken from the soupselect project.
				311
				312	* Fixed a crash when using html5lib. [bug=943246]
				313
				314	* In HTML5-style <meta charset="foo"> tags, the value of the "charset"
				315	attribute is now replaced with the appropriate encoding on
				316	output. [bug=942714]
				317
				318	* Fixed a bug that caused calling a tag to sometimes call find_all()
				319	with the wrong arguments. [bug=944426]
				320
				321	* For backwards compatibility, brought back the BeautifulStoneSoup
				322	class as a deprecated wrapper around BeautifulSoup.
				323
				324	= 4.0.0b9 (20120228) =
				325
				326	* Fixed the string representation of DOCTYPEs that have both a public
				327	ID and a system ID.
				328
				329	* Fixed the generated XML declaration.
				330
				331	* Renamed Tag.nsprefix to Tag.prefix, for consistency with
				332	NamespacedAttribute.
				333
				334	* Fixed a test failure that occurred on Python 3.x when chardet was
				335	installed.
				336
				337	* Made prettify() return Unicode by default, so it will look nice on
				338	Python 3 when passed into print().
				339
				340	= 4.0.0b8 (20120224) =
				341
				342	* All tree builders now preserve namespace information in the
				343	documents they parse. If you use the html5lib parser or lxml's XML
				344	parser, you can access the namespace URL for a tag as tag.namespace.
				345
				346	However, there is no special support for namespace-oriented
				347	searching or tree manipulation. When you search the tree, you need
				348	to use namespace prefixes exactly as they're used in the original
				349	document.
				350
				351	* The string representation of a DOCTYPE always ends in a newline.
				352
				353	* Issue a warning if the user tries to use a SoupStrainer in
				354	conjunction with the html5lib tree builder, which doesn't support
				355	them.
				356
				357	= 4.0.0b7 (20120223) =
				358
				359	* Upon decoding to string, any characters that can't be represented in
				360	your chosen encoding will be converted into numeric XML entity
				361	references.
				362
				363	* Issue a warning if characters were replaced with REPLACEMENT
				364	CHARACTER during Unicode conversion.
				365
				366	* Restored compatibility with Python 2.6.
				367
				368	* The install process no longer installs docs or auxiliary text files.
				369
				370	* It's now possible to deepcopy a BeautifulSoup object created with
				371	Python's built-in HTML parser.
				372
				373	* About 100 unit tests that "test" the behavior of various parsers on
				374	invalid markup have been removed. Legitimate changes to those
				375	parsers caused these tests to fail, indicating that perhaps
				376	Beautiful Soup should not test the behavior of foreign
				377	libraries.
				378
				379	The problematic unit tests have been reformulated as informational
				380	comparisons generated by the script
				381	scripts/demonstrate_parser_differences.py.
				382
				383	This makes Beautiful Soup compatible with html5lib version 0.95 and
				384	future versions of HTMLParser.
				385
				386	= 4.0.0b6 (20120216) =
				387
				388	* Multi-valued attributes like "class" always have a list of values,
				389	even if there's only one value in the list.
				390
				391	* Added a number of multi-valued attributes defined in HTML5.
				392
				393	* Stopped generating a space before the slash that closes an
				394	empty-element tag. This may come back if I add a special XHTML mode
				395	(http://www.w3.org/TR/xhtml1/#C_2), but right now it's pretty
				396	useless.
				397
				398	* Passing text along with tag-specific arguments to a find* method:
				399
				400	find("a", text="Click here")
				401
				402	will find tags that contain the given text as their
				403	.string. Previously, the tag-specific arguments were ignored and
				404	only strings were searched.
				405
				406	* Fixed a bug that caused the html5lib tree builder to build a
				407	partially disconnected tree. Generally cleaned up the html5lib tree
				408	builder.
				409
				410	* If you restrict a multi-valued attribute like "class" to a string
				411	that contains spaces, Beautiful Soup will only consider it a match
				412	if the values correspond to that specific string.
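
Two of the behaviors described in this release, sketched against invented markup: multi-valued attributes are always lists, and a text restriction combined with a tag name finds tags rather than bare strings. The example uses the text argument as spelled in this changelog; later releases may prefer a different name for it.

```python
from bs4 import BeautifulSoup

html = (
    '<a class="button" href="/go">Click here</a>'
    '<a class="button primary" href="/other">Other</a>'
    '<b>Click here</b>'
)
soup = BeautifulSoup(html, "html.parser")

# "class" is multi-valued, so a single class still comes back as a list.
classes = soup.a["class"]

# Tag name plus text: only an <a> whose .string matches is returned,
# so the <b> with the same text is not considered.
link = soup.find("a", text="Click here")

print(classes, link["href"])
```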
				413
				414	= 4.0.0b5 (20120209) =
				415
				416	* Rationalized Beautiful Soup's treatment of CSS class. A tag
				417	belonging to multiple CSS classes is treated as having a list of
				418	values for the 'class' attribute. Searching for a CSS class will
				419	match any of the CSS classes.
				420
				421	This actually affects all attributes that the HTML standard defines
				422	as taking multiple values (class, rel, rev, archive, accept-charset,
				423	and headers), but 'class' is by far the most common. [bug=41034]
				424
				425	* If you pass anything other than a dictionary as the second argument
				426	to one of the find* methods, it'll assume you want to use that
				427	object to search against a tag's CSS classes. Previously this only
				428	worked if you passed in a string.
				429
				430	* Fixed a bug that caused a crash when you passed a dictionary as an
				431	attribute value (possibly because you mistyped "attrs"). [bug=842419]
				432
				433	* Unicode, Dammit now detects the encoding in HTML 5-style <meta> tags
				434	like <meta charset="utf-8" />. [bug=837268]
				435
				436	* If Unicode, Dammit can't figure out a consistent encoding for a
				437	page, it will try each of its guesses again, with errors="replace"
				438	instead of errors="strict". This may mean that some data gets
				439	replaced with REPLACEMENT CHARACTER, but at least most of it will
				440	get turned into Unicode. [bug=754903]
				441
				442	* Patched over a bug in html5lib (?) that was crashing Beautiful Soup
				443	on certain kinds of markup. [bug=838800]
				444
				445	* Fixed a bug that wrecked the tree if you replaced an element with an
				446	empty string. [bug=728697]
				447
				448	* Improved Unicode, Dammit's behavior when you give it Unicode to
				449	begin with.
				450
				451	= 4.0.0b4 (20120208) =
				452
				453	* Added BeautifulSoup.new_string() to go along with BeautifulSoup.new_tag()
				454
				455	* BeautifulSoup.new_tag() will follow the rules of whatever
				456	tree-builder was used to create the original BeautifulSoup object. A
				457	new <p> tag will look like "<p />" if the soup object was created to
				458	parse XML, but it will look like "<p></p>" if the soup object was
				459	created to parse HTML.
				460
				461	* We pass in strict=False to html.parser on Python 3, greatly
				462	improving html.parser's ability to handle bad HTML.
				463
				464	* We also monkeypatch a serious bug in html.parser that made
				465	strict=False disastrous on Python 3.2.2.
				466
				467	* Replaced the "substitute_html_entities" argument with the
				468	more general "formatter" argument.
				469
				470	* Bare ampersands and angle brackets are always converted to XML
				471	entities unless the user prevents it.
				472
				473	* Added PageElement.insert_before() and PageElement.insert_after(),
				474	which let you put an element into the parse tree with respect to
				475	some other element.
				476
				477	* Raise an exception when the user tries to do something nonsensical
				478	like insert a tag into itself.
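
The new_tag(), new_string(), insert_before() and insert_after() methods from this release work together, as a short sketch with made-up markup shows.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<b>stop</b>", "html.parser")

# Build a new <i> tag and place it before the existing string.
tag = soup.new_tag("i")
tag.string = "Don't"
soup.b.string.insert_before(tag)
after_insert_before = str(soup.b)

# Build a new string and place it right after the <i> tag.
soup.b.i.insert_after(soup.new_string(" ever "))
after_insert_after = str(soup.b)

print(after_insert_before)
print(after_insert_after)
```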
				479
				480
				481	= 4.0.0b3 (20120203) =
				482
				483	Beautiful Soup 4 is a nearly-complete rewrite that removes Beautiful
				484	Soup's custom HTML parser in favor of a system that lets you write a
				485	little glue code and plug in any HTML or XML parser you want.
				486
				487	Beautiful Soup 4.0 comes with glue code for four parsers:
				488
				489	* Python's standard HTMLParser (html.parser in Python 3)
				490	* lxml's HTML and XML parsers
				491	* html5lib's HTML parser
				492
				493	HTMLParser is the default, but I recommend you install lxml if you
				494	can.
				495
				496	For complete documentation, see the Sphinx documentation in
				497	bs4/doc/source/. What follows is a summary of the changes from
				498	Beautiful Soup 3.
				499
				500	=== The module name has changed ===
				501
				502	Previously you imported the BeautifulSoup class from a module also
				503	called BeautifulSoup. To save keystrokes and make it clear which
				504	version of the API is in use, the module is now called 'bs4':
				505
				506	>>> from bs4 import BeautifulSoup
				507
				508	=== It works with Python 3 ===
				509
				510	Beautiful Soup 3.1.0 worked with Python 3, but the parser it used was
				511	so bad that it barely worked at all. Beautiful Soup 4 works with
				512	Python 3, and since its parser is pluggable, you don't sacrifice
				513	quality.
				514
				515	Special thanks to Thomas Kluyver and Ezio Melotti for getting Python 3
				516	support to the finish line. Ezio Melotti is also to thank for greatly
				517	improving the HTML parser that comes with Python 3.2.
				518
				519	=== CDATA sections are normal text, if they're understood at all. ===
				520
				521	Currently, the lxml and html5lib HTML parsers ignore CDATA sections in
				522	markup:
				523
				524	<p><![CDATA[foo]]></p> => <p></p>
				525
				526	A future version of html5lib will turn CDATA sections into text nodes,
				527	but only within tags like <svg> and <math>:
				528
				529	<svg><![CDATA[foo]]></svg> => <svg>foo</svg>
				530
				531	The default XML parser (which uses lxml behind the scenes) turns CDATA
				532	sections into ordinary text elements:
				533
				534	<p><![CDATA[foo]]></p> => <p>foo</p>
				535
				536	In theory it's possible to preserve the CDATA sections when using the
				537	XML parser, but I don't see how to get it to work in practice.
				538
				539	=== Miscellaneous other stuff ===
				540
				541	If the BeautifulSoup instance has .is_xml set to True, an appropriate
				542	XML declaration will be emitted when the tree is transformed into a
				543	string:
				544
				545	<?xml version="1.0" encoding="utf-8"?>
				546	<markup>
				547	...
				548	</markup>
				549
				550	The ['lxml', 'xml'] tree builder sets .is_xml to True; the other tree
				551	builders set it to False. If you want to parse XHTML with an HTML
				552	parser, you can set it manually.
				553
				554
				555	= 3.2.0 =
				556
				557	The 3.1 series wasn't very useful, so I renamed the 3.0 series to 3.2
				558	to make it obvious which one you should use.
				559
				560	= 3.1.0 =
				561
				562	A hybrid version that supports Python 2.4 and can be automatically converted
				563	to run under Python 3.0. There are three backwards-incompatible
				564	changes you should be aware of, but no new features or deliberate
				565	behavior changes.
				566
				567	1. str() may no longer do what you want. This is because the meaning
				568	of str() inverts between Python 2 and 3; in Python 2 it gives you a
				569	byte string, in Python 3 it gives you a Unicode string.
				570
				571	The effect of this is that you can't pass an encoding to .__str__
				572	anymore. Use encode() to get a string and decode() to get Unicode, and
				573	you'll be ready (well, readier) for Python 3.
				574
				575	2. Beautiful Soup is now based on HTMLParser rather than SGMLParser,
				576	which is gone in Python 3. There's some bad HTML that SGMLParser
				577	handled but HTMLParser doesn't, usually to do with attribute values
				578	that aren't closed or have brackets inside them:
				579
				580	<a href="foo</a>, </a><a href="bar">baz</a>
				581	<a b="<a>">', '<a b="<a>"></a><a>"></a>
				582
				583	A later version of Beautiful Soup will allow you to plug in different
				584	parsers to make tradeoffs between speed and the ability to handle bad
				585	HTML.
				586
				587	3. In Python 3 (but not Python 2), HTMLParser converts entities within
				588	attributes to the corresponding Unicode characters. In Python 2 it's
				589	possible to parse this string and leave the &eacute; intact.
				590
				591	<a href="http://crummy.com?sacr&eacute;&bleu">
				592
				593	In Python 3, the &eacute; is always converted to \xe9 during
				594	parsing.
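
The third point can be checked directly against Python 3's standard library. This is a minimal sketch: HrefCollector is a hypothetical helper written for the example, and the URL is the one from this entry with its &eacute; entity spelled out.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Records href attribute values exactly as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href":
                self.hrefs.append(value)


collector = HrefCollector()
collector.feed('<a href="http://crummy.com?sacr&eacute;&bleu">')
href = collector.hrefs[0]

# The entity has already been replaced by the time the handler runs.
print(ascii(href))
```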
				595
				596
				597	= 3.0.7a =
				598
				599	Added an import that makes BS work in Python 2.3.
				600
				601
				602	= 3.0.7 =
				603
				604	Fixed a UnicodeDecodeError when unpickling documents that contain
				605	non-ASCII characters.
				606
				607	Fixed a TypeError that occurred in some circumstances when a tag
				608	contained no text.
				609
				610	Jump through hoops to avoid the use of chardet, which can be extremely
				611	slow in some circumstances. UTF-8 documents should never trigger the
				612	use of chardet.
				613
				614	Whitespace is preserved inside <pre> and <textarea> tags that contain
				615	nothing but whitespace.
				616
				617	Beautiful Soup can now parse a doctype that's scoped to an XML namespace.
				618
				619
				620	= 3.0.6 =
				621
				622	Got rid of a very old debug line that prevented chardet from working.
				623
				624	Added a Tag.decompose() method that completely disconnects a tree or a
				625	subset of a tree, breaking it up into bite-sized pieces that are
				626	easy for the garbage collector to collect.
				627
				628	Tag.extract() now returns the tag that was extracted.
				629
				630	Tag.findNext() now does something with the keyword arguments you pass
				631	it instead of dropping them on the floor.
				632
				633	Fixed a Unicode conversion bug.
				634
				635	Fixed a bug that garbled some <meta> tags when rewriting them.
				636
				637
				638	= 3.0.5 =
				639
				640	Soup objects can now be pickled, and copied with copy.deepcopy.
				641
				642	Tag.append now works properly on existing BS objects. (It wasn't
				643	originally intended for outside use, but it can be now.) (Giles
				644	Radford)
				645
				646	Passing in a nonexistent encoding will no longer crash the parser on
				647	Python 2.4 (John Nagle).
				648
				649	Fixed an underlying bug in SGMLParser that thinks ASCII has 255
				650	characters instead of 127 (John Nagle).
				651
				652	Entities are converted more consistently to Unicode characters.
				653
				654	Entity references in attribute values are now converted to Unicode
				655	characters when appropriate. Numeric entities are always converted,
				656	because SGMLParser always converts them outside of attribute values.
				657
				658	ALL_ENTITIES happens to just be the XHTML entities, so I renamed it to
				659	XHTML_ENTITIES.
				660
				661	The regular expression for bare ampersands was too loose. In some
				662	cases ampersands were not being escaped. (Sam Ruby?)
				663
				664	Non-breaking spaces and other special Unicode space characters are no
				665	longer folded to ASCII spaces. (Robert Leftwich)
				666
				667	Information inside a TEXTAREA tag is now parsed literally, not as HTML
				668	tags. TEXTAREA now works exactly the same way as SCRIPT. (Zephyr Fang)
				669
				670	= 3.0.4 =
				671
				672	Fixed a bug that crashed Unicode conversion in some cases.
				673
				674	Fixed a bug that prevented UnicodeDammit from being used as a
				675	general-purpose data scrubber.
				676
				677	Fixed some unit test failures when running against Python 2.5.
				678
				679	When considering whether to convert smart quotes, UnicodeDammit now
				680	looks at the original encoding in a case-insensitive way.
				681
				682	= 3.0.3 (20060606) =
				683
				684	Beautiful Soup is now usable as a way to clean up invalid XML/HTML (be
				685	sure to pass in an appropriate value for convertEntities, or XML/HTML
				686	entities might stick around that aren't valid in HTML/XML). The result
				687	may not validate, but it should be good enough to not choke a
				688	real-world XML parser. Specifically, the output of a properly
				689	constructed soup object should always be valid as part of an XML
				690	document, but parts may be missing if they were missing in the
				691	original. As always, if the input is valid XML, the output will also
				692	be valid.
				693
				694	= 3.0.2 (20060602) =
				695
				696	Previously, Beautiful Soup correctly handled attribute values that
				697	contained embedded quotes (sometimes by escaping), but not other kinds
				698	of XML character. Now, it correctly handles or escapes all special XML
				699	characters in attribute values.
				700
				701	I aliased methods to the 2.x names (fetch, find, findText, etc.) for
				702	backwards compatibility purposes. Those names are deprecated and if I
				703	ever do a 4.0 I will remove them. I will, I tell you!
				704
				705	Fixed a bug where the findAll method wasn't passing along any keyword
				706	arguments.
				707
				708	When run from the command line, Beautiful Soup now acts as an HTML
				709	pretty-printer, not an XML pretty-printer.
				710
				711	= 3.0.1 (20060530) =
				712
				713	Reintroduced the "fetch by CSS class" shortcut. I thought keyword
				714	arguments would replace it, but they don't. You can't call soup('a',
				715	class='foo') because class is a Python keyword.
				716
				717	If Beautiful Soup encounters a meta tag that declares the encoding,
				718	but a SoupStrainer tells it not to parse that tag, Beautiful Soup will
				719	no longer try to rewrite the meta tag to mention the new
				720	encoding. Basically, this makes SoupStrainers work in real-world
				721	applications instead of crashing the parser.
				722
				723	= 3.0.0 "Who would not give all else for two p" (20060528) =
				724
				725	This release is not backward-compatible with previous releases. If
				726	you've got code written with a previous version of the library, go
				727	ahead and keep using it, unless one of the features mentioned here
				728	really makes your life easier. Since the library is self-contained,
				729	you can include an old copy of the library in your old applications,
				730	and use the new version for everything else.
				731
				732	The documentation has been rewritten and greatly expanded with many
				733	more examples.
				734
				735	Beautiful Soup autodetects the encoding of a document (or uses the one
				736	you specify), and converts it from its native encoding to
				737	Unicode. Internally, it only deals with Unicode strings. When you
				738	print out the document, it converts to UTF-8 (or another encoding you
				739	specify). [Doc reference]
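
The decode-on-input, encode-on-output flow can be sketched with plain Python strings. Real encoding detection is far more involved than this; the function name to_unicode and its fallback order are assumptions made for the example.

```python
def to_unicode(raw, declared_encoding="utf-8"):
    """Decode bytes to str, trying the declared encoding first.

    Hypothetical helper. Latin-1 is the last resort because it can
    decode any byte sequence.
    """
    for encoding in (declared_encoding, "utf-8", "latin-1"):
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue


raw = "caf\u00e9".encode("latin-1")      # input in its native encoding
text = to_unicode(raw, "latin-1")        # Unicode internally
print(text.encode("utf-8"))              # UTF-8 on the way out
```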
				740
				741	It's now easy to make large-scale changes to the parse tree without
				742	screwing up the navigation members. The methods are extract,
				743	replaceWith, and insert. [Doc reference. See also Improving Memory
				744	Usage with extract]
				745
				746	Passing True in as an attribute value gives you tags that have any
				747	value for that attribute. You don't have to create a regular
				748	expression. Passing None for an attribute value gives you tags that
				749	don't have that attribute at all.
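
The True and None rules can be sketched against plain attribute dictionaries. The helper attribute_matches is hypothetical and only mirrors the behaviour described above.

```python
def attribute_matches(attrs, name, wanted):
    """Check one attribute of a tag, given as a dict, against a wanted value.

    Hypothetical helper: wanted=True means "any value", wanted=None
    means "attribute absent", anything else means an exact match.
    """
    if wanted is True:
        return name in attrs
    if wanted is None:
        return name not in attrs
    return attrs.get(name) == wanted


tags = [{"href": "/a", "id": "1"}, {"id": "2"}]
with_href = [t for t in tags if attribute_matches(t, "href", True)]
without_href = [t for t in tags if attribute_matches(t, "href", None)]
print(with_href)
print(without_href)
```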
				750
				751	Tag objects now know whether or not they're self-closing. This avoids
				752	the problem where Beautiful Soup thought that tags like <BR /> were
				753	self-closing even in XML documents. You can customize the self-closing
				754	tags for a parser object by passing them in as a list of
				755	selfClosingTags: you don't have to subclass anymore.
				756
				757	There's a new built-in parser, MinimalSoup, which has most of
				758	BeautifulSoup's HTML-specific rules, but no tag nesting rules. [Doc
				759	reference]
				760
				761	You can use a SoupStrainer to tell Beautiful Soup to parse only part
				762	of a document. This saves time and memory, often making Beautiful Soup
				763	about as fast as a custom-built SGMLParser subclass. [Doc reference,
				764	SoupStrainer reference]
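
To give a feel for the idea of only paying attention to part of a document, here is a sketch that uses the standard library's html.parser rather than SoupStrainer itself. The class and function names are made up for the example, and it builds a flat list instead of a parse tree.

```python
from html.parser import HTMLParser


class LinkOnlyParser(HTMLParser):
    """Collect href values from <a> tags and ignore everything else."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return  # nothing is stored for tags we do not care about
        href = dict(attrs).get("href")
        if href is not None:
            self.links.append(href)


def extract_links(markup):
    parser = LinkOnlyParser()
    parser.feed(markup)
    parser.close()
    return parser.links


print(extract_links('<p><a href="/x">x</a> <b>bold</b> <a href="/y">y</a></p>'))
```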
				765
				766	You can (usually) use keyword arguments instead of passing a
				767	dictionary of attributes to a search method. That is, you can replace
				768	soup(args={"id" : "5"}) with soup(id="5"). You can still use args if
				769	(for instance) you need to find an attribute whose name clashes with
				770	the name of an argument to findAll. [Doc reference: **kwargs attrs]
				771
				772	The method names have changed to the better method names used in
				773	Rubyful Soup. Instead of find methods and fetch methods, there are
				774	only find methods. Instead of a scheme where you can't remember which
				775	method finds one element and which one finds them all, we have find
				776	and findAll. In general, if the method name mentions All or a plural
				777	noun (eg. findNextSiblings), then it finds many
				778	elements. Otherwise, it only finds one element. [Doc reference]
				779
				780	Some of the argument names have been renamed for clarity. For instance
				781	avoidParserProblems is now parserMassage.
				782
				783	Beautiful Soup no longer implements a feed method. You need to pass a
				784	string or a filehandle into the soup constructor, rather than calling
				785	feed after the soup has been created. There is still a feed method,
				786	but it's the feed method implemented by SGMLParser, and calling it
				787	will bypass Beautiful Soup and cause problems.
				788
				789	The NavigableText class has been renamed to NavigableString. There is
				790	no NavigableUnicodeString anymore, because every string inside a
				791	Beautiful Soup parse tree is a Unicode string.
				792
				793	findText and fetchText are gone. Just pass a text argument into find
				794	or findAll.
				795
				796	Null was more trouble than it was worth, so I got rid of it. Anything
				797	that used to return Null now returns None.
				798
				799	Special XML constructs like comments and CDATA now have their own
				800	NavigableString subclasses, instead of being treated as oddly-formed
				801	data. If you parse a document that contains CDATA and write it back
				802	out, the CDATA will still be there.
				803
				804	When you're parsing a document, you can get Beautiful Soup to convert
				805	XML or HTML entities into the corresponding Unicode characters. [Doc
				806	reference]
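
The conversion itself can be illustrated with the standard library's html.unescape. This shows only the entity-to-character step, not how Beautiful Soup is told to perform it; convert_entities is a hypothetical wrapper.

```python
import html


def convert_entities(text):
    """Replace named and numeric entities with Unicode characters."""
    return html.unescape(text)


print(convert_entities("caf&eacute; &amp; cr&#232;me"))
```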
				807
				808	= 2.1.1 (20050918) =
				809
				810	Fixed a serious performance bug in BeautifulStoneSoup which was
				811	causing parsing to be incredibly slow.
				812
				813	Corrected several entities that were previously being incorrectly
				814	translated from Microsoft smart-quote-like characters.
				815
				816	Fixed a bug that was breaking text fetch.
				817
				818	Fixed a bug that crashed the parser when text chunks that look like
				819	HTML tag names showed up within a SCRIPT tag.
				820
				821	THEAD, TBODY, and TFOOT tags are now nestable within TABLE
				822	tags. Nested tables should parse more sensibly now.
				823
				824	BASE is now considered a self-closing tag.
				825
				826	= 2.1.0 "Game, or any other dish?" (20050504) =
				827
				828	Added a wide variety of new search methods which, given a starting
				829	point inside the tree, follow a particular navigation member (like
				830	nextSibling) over and over again, looking for Tag and NavigableText
				831	objects that match certain criteria. The new methods are findNext,
				832	fetchNext, findPrevious, fetchPrevious, findNextSibling,
				833	fetchNextSiblings, findPreviousSibling, fetchPreviousSiblings,
				834	findParent, and fetchParents. All of these use the same basic code
				835	used by first and fetch, so you can pass your weird ways of matching
				836	things into these methods.
				837
				838	The fetch method and its derivatives now accept a limit argument.
				839
				840	You can now pass keyword arguments when calling a Tag object as though
				841	it were a method.
				842
				843	Fixed a bug that caused all hand-created tags to share a single set of
				844	attributes.
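
The entry above does not say what caused the shared attributes. One common way this kind of bug arises in Python is a mutable default argument, sketched here with two hypothetical classes.

```python
class BuggyTag:
    def __init__(self, name, attrs={}):  # one dict shared by every call
        self.name = name
        self.attrs = attrs


class FixedTag:
    def __init__(self, name, attrs=None):
        self.name = name
        # Build a fresh dictionary for each instance.
        self.attrs = {} if attrs is None else dict(attrs)


a, b = BuggyTag("a"), BuggyTag("b")
a.attrs["href"] = "/x"
print(b.attrs)  # the change made through a leaks into b

c, d = FixedTag("a"), FixedTag("b")
c.attrs["href"] = "/x"
print(d.attrs)  # d keeps its own empty dictionary
```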
				845
				846	= 2.0.3 (20050501) =
				847
				848	Fixed Python 2.2 support for iterators.
				849
				850	Fixed a bug that gave the wrong representation to tags within quote
				851	tags like <script>.
				852
				853	Took some code from Mark Pilgrim that treats CDATA declarations as
				854	data instead of ignoring them.
				855
				856	Beautiful Soup's setup.py will now do an install even if the unit
				857	tests fail. It won't build a source distribution if the unit tests
				858	fail, so I can't release a new version unless they pass.
				859
				860	= 2.0.2 (20050416) =
				861
				862	Added the unit tests in a separate module, and packaged it with
				863	distutils.
				864
				865	Fixed a bug that sometimes caused renderContents() to return a Unicode
				866	string even if there was no Unicode in the original string.
				867
				868	Added the done() method, which closes all of the parser's open
				869	tags. It gets called automatically when you pass in some text to the
				870	constructor of a parser class; otherwise you must call it yourself.
				871
				872	Reinstated some backwards compatibility with 1.x versions: referencing
				873	the string member of a NavigableText object returns the NavigableText
				874	object instead of throwing an error.
				875
				876	= 2.0.1 (20050412) =
				877
				878	Fixed a bug that caused bad results when you tried to reference a tag
				879	name shorter than 3 characters as a member of a Tag, eg. tag.table.td.
				880
				881	Made sure all Tags have the 'hidden' attribute so that an attempt to
				882	access tag.hidden doesn't spawn an attempt to find a tag named
				883	'hidden'.
				884
				885	Fixed a bug in the comparison operator.
				886
				887	= 2.0.0 "Who cares for fish?" (20050410) =
				888
				889	Beautiful Soup version 1 was very useful but also pretty stupid. I
				890	originally wrote it without noticing any of the problems inherent in
				891	trying to build a parse tree out of ambiguous HTML tags. This version
				892	solves all of those problems to my satisfaction. It also adds many new
				893	clever things to make up for the removal of the stupid things.
				894
				895	== Parsing ==
				896
				897	The parser logic has been greatly improved, and the BeautifulSoup
				898	class should much more reliably yield a parse tree that looks like
				899	what the page author intended. For a particular class of odd edge
				900	cases that now causes problems, there is a new class,
				901	ICantBelieveItsBeautifulSoup.
				902
				903	By default, Beautiful Soup now performs some cleanup operations on
				904	text before parsing it. This is to avoid common problems with bad
				905	definitions and self-closing tags that crash SGMLParser. You can
				906	provide your own set of cleanup operations, or turn it off
				907	altogether. The cleanup operations include fixing self-closing tags
				908	that don't close, and replacing Microsoft smart quotes and similar
				909	characters with their HTML entity equivalents.
				910
				911	You can now get a pretty-print version of parsed HTML to get a visual
				912	picture of how Beautiful Soup parses it, with the Tag.prettify()
				913	method.
				914
				915	== Strings and Unicode ==
				916
				917	There are separate NavigableText subclasses for ASCII and Unicode
				918	strings. These classes directly subclass the corresponding base data
				919	types. This means you can treat NavigableText objects as strings
				920	instead of having to call methods on them to get the strings.
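
Subclassing the string type directly is what makes this work. Below is a minimal sketch in modern Python, which has a single str type, so the ASCII and Unicode split is not shown. The parent attribute is an assumption added to give the class something tree-related to carry.

```python
class NavigableText(str):
    """Hypothetical sketch: a string that also remembers its parent."""

    def __new__(cls, value, parent=None):
        obj = super().__new__(cls, value)
        obj.parent = parent
        return obj


text = NavigableText("Hello", parent="p")
# Ordinary string operations work without any conversion call.
print(text.upper())
print(text + ", world")
print(text.parent)
```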
				921
				922	str() on a Tag always returns a string, and unicode() always returns
				923	Unicode. Previously it was inconsistent.
				924
				925	== Tree traversal ==
				926
				927	In a first() or fetch() call, the tag name or the desired value of an
				928	attribute can now be any of the following:
				929
				930	* A string (matches that specific tag or that specific attribute value)
				931	* A list of strings (matches any tag or attribute value in the list)
				932	* A compiled regular expression object (matches any tag or attribute
				933	value that matches the regular expression)
				934	* A callable object that takes the Tag object or attribute value as a
				935	string. It returns None/false/empty string if the given string
				936	doesn't match, and any other value if it does.
				937
				938	This is much easier to use than SQL-style wildcards (see, regular
				939	expressions are good for something). Because of this, I took out
				940	SQL-style wildcards. I'll put them back if someone complains, but
				941	their removal simplifies the code a lot.
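
The four kinds of criteria listed above can be sketched as a single matching function. This is an illustration of the rules, not the library's implementation; the name matches and the use of search() rather than match() for regular expressions are assumptions.

```python
import re


def matches(candidate, criterion):
    """Hypothetical matcher for one tag name or attribute value."""
    if isinstance(criterion, str):
        return candidate == criterion
    if isinstance(criterion, (list, tuple)):
        return candidate in criterion
    if hasattr(criterion, "search"):  # compiled regular expression
        return criterion.search(candidate) is not None
    if callable(criterion):
        # None, False and the empty string all count as "no match".
        return bool(criterion(candidate))
    return False


print(matches("table", "table"))
print(matches("td", ["td", "th"]))
print(matches("h2", re.compile("^h[1-6]$")))
print(matches("div", lambda name: len(name) == 3))
```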
				942
				943	You can use fetch() and first() to search for text in the parse tree,
				944	not just tags. There are new alias methods fetchText() and firstText()
				945	designed for this purpose. As with searching for tags, you can pass in
				946	a string, a regular expression object, or a method to match your text.
				947
				948	If you pass in something besides a map to the attrs argument of
				949	fetch() or first(), Beautiful Soup will assume you want to match that
				950	thing against the "class" attribute. When you're scraping
				951	well-structured HTML, this makes your code a lot cleaner.
				952
				953	1.x and 2.x both let you call a Tag object as a shorthand for
				954	fetch(). For instance, foo("bar") is a shorthand for
				955	foo.fetch("bar"). In 2.x, you can also access a specially-named member
				956	of a Tag object as a shorthand for first(). For instance, foo.barTag
				957	is a shorthand for foo.first("bar"). By chaining these shortcuts you
				958	traverse a tree in very little code: for header in
				959	soup.bodyTag.pTag.tableTag('th'):
				960
				961	If an element relationship (like parent or next) doesn't apply to a
				962	tag, it'll now show up as Null instead of None. first() will also return
				963	Null if you ask it for a nonexistent tag. Null is an object that's
				964	just like None, except you can do whatever you want to it and it'll
				965	give you Null instead of throwing an error.
				966
				967	This lets you do tree traversals like soup.htmlTag.headTag.titleTag
				968	without having to worry if the intermediate stages are actually
				969	there. Previously, if there was no 'head' tag in the document, headTag
				970	in that instance would have been None, and accessing its 'titleTag'
				971	member would have thrown an AttributeError. Now, you can get what you
				972	want when it exists, and get Null when it doesn't, without having to
				973	do a lot of conditionals checking to see if every stage is None.
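
A null object of this kind, together with the fooTag shortcut, can be sketched in a few lines. Both classes are hypothetical, and the shortcut here only looks at direct children, which is simpler than a real first() search.

```python
class _Null:
    """Falsy object that hands back itself for any access or call."""

    def __getattr__(self, name):
        return self

    def __call__(self, *args, **kwargs):
        return self

    def __bool__(self):
        return False

    def __repr__(self):
        return "Null"


Null = _Null()


class Tag:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def __getattr__(self, name):
        # Only reached for names that are not real attributes.
        if name.endswith("Tag"):
            wanted = name[:-3]
            for child in self.children:
                if child.name == wanted:
                    return child
            return Null
        raise AttributeError(name)


soup = Tag("html", [Tag("body", [Tag("p")])])
print(soup.bodyTag.pTag.name)      # the chain exists
print(soup.headTag.titleTag)       # no head tag, but no exception either
```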
				974
				975	There are two new relations between page elements: previousSibling and
				976	nextSibling. They reference the previous and next element at the same
				977	level of the parse tree. For instance, if you have HTML like this:
				978
				979	<p><ul><li>Foo<br /><li>Bar</ul>
				980
				981	The first 'li' tag has a previousSibling of Null and its nextSibling
				982	is the second 'li' tag. The second 'li' tag has a nextSibling of Null
				983	and its previousSibling is the first 'li' tag. The previousSibling of
				984	the 'ul' tag is the first 'p' tag. The nextSibling of 'Foo' is the
				985	'br' tag.
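
The relationships in that example can be checked with a toy tree. This sketch builds the tree by hand instead of parsing the HTML, uses a hypothetical Node class, and returns None where the text above says Null.

```python
class Node:
    def __init__(self, name, contents=()):
        self.name = name
        self.parent = None
        self.contents = list(contents)
        for child in self.contents:
            child.parent = self

    def _sibling(self, offset):
        if self.parent is None:
            return None
        siblings = self.parent.contents
        index = next(i for i, s in enumerate(siblings) if s is self) + offset
        if 0 <= index < len(siblings):
            return siblings[index]
        return None

    @property
    def previousSibling(self):
        return self._sibling(-1)

    @property
    def nextSibling(self):
        return self._sibling(+1)


# <p><ul><li>Foo<br /><li>Bar</ul>, with the p tag closed before the ul.
foo, br = Node("Foo"), Node("br")
li1 = Node("li", [foo, br])
li2 = Node("li", [Node("Bar")])
ul = Node("ul", [li1, li2])
p = Node("p")
root = Node("root", [p, ul])

print(li1.nextSibling.name, ul.previousSibling.name, foo.nextSibling.name)
```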
				986
				987	I took out the ability to use fetch() to find tags that have a
				988	specific list of contents. See, I can't even explain it well. It was
				989	really difficult to use, I never used it, and I don't think anyone
				990	else ever used it. To the extent anyone did, they can probably use
				991	fetchText() instead. If it turns out someone needs it I'll think of
				992	another solution.
				993
				994	== Tree manipulation ==
				995
				996	You can add new attributes to a tag, and delete attributes from a
				997	tag. In 1.x you could only change a tag's existing attributes.
				998
				999	== Porting Considerations ==
				1000
				1001	There are three changes in 2.0 that break old code:
				1002
				1003	In the post-1.2 release you could pass in a function into fetch(). The
				1004	function took a string, the tag name. In 2.0, the function takes the
				1005	actual Tag object.
				1006
				1007	It's no longer possible to pass in SQL-style wildcards to fetch(). Use a
				1008	regular expression instead.
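
When porting, a wildcard pattern can be turned into an equivalent regular expression mechanically. This sketch assumes that % is only meaningful at the start or end of a pattern and stands for any run of characters; the helper name wildcard_to_regex is made up.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a %-style wildcard pattern into a compiled regex.

    Hypothetical porting helper: a leading % drops the start anchor,
    a trailing % drops the end anchor, and the rest is matched literally.
    """
    leading = pattern.startswith("%")
    trailing = len(pattern) > 1 and pattern.endswith("%")
    core = pattern[(1 if leading else 0):(-1 if trailing else None)]
    prefix = "" if leading else r"\A"
    suffix = "" if trailing else r"\Z"
    return re.compile(prefix + re.escape(core) + suffix)


print(wildcard_to_regex("%foo").pattern)
print(bool(wildcard_to_regex("foo%").search("foobar")))
```

The compiled object can then be passed wherever a regular expression is accepted as a search criterion.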
				1009
				1010	The different parsing algorithm means the parse tree may not be shaped
				1011	like you expect. This will only actually affect you if your code uses
				1012	one of the affected parts. I haven't run into this problem yet while
				1013	porting my code.
				1014
				1015	= Between 1.2 and 2.0 =
				1016
				1017	This is the release to get if you want Python 1.5 compatibility.
				1018
				1019	The desired value of an attribute can now be any of the following:
				1020
				1021	* A string
				1022	* A string with SQL-style wildcards
				1023	* A compiled RE object
				1024	* A callable that returns None/false/empty string if the given value
				1025	doesn't match, and any other value otherwise.
				1026
				1027	This is much easier to use than SQL-style wildcards (see, regular
				1028	expressions are good for something). Because of this, I no longer
				1029	recommend you use SQL-style wildcards. They may go away in a future
				1030	release to clean up the code.
				1031
				1032	Made Beautiful Soup handle processing instructions as text instead of
				1033	ignoring them.
				1034
				1035	Applied patch from Richie Hindle (richie at entrian dot com) that
				1036	makes tag.string a shorthand for tag.contents[0].string when the tag
				1037	has only one string-owning child.
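
The shorthand can be sketched with two small hypothetical classes. A text node is its own string, and a tag forwards to its only child when it has exactly one.

```python
class Text(str):
    @property
    def string(self):
        return self


class Tag:
    def __init__(self, name, contents=()):
        self.name = name
        self.contents = list(contents)

    @property
    def string(self):
        # Forward to the child only when there is exactly one of them.
        if len(self.contents) == 1:
            return self.contents[0].string
        return None


bold = Tag("b", [Text("hi")])
print(bold.string)
print(Tag("p", [bold]).string)                 # forwarded through the b tag
print(Tag("p", [Text("a"), Text("b")]).string)  # two children, so None
```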
				1038
				1039	Added still more nestable tags. The nestable tags thing won't work in
				1040	a lot of cases and needs to be rethought.
				1041
				1042	Fixed an edge case where searching for "%foo" would match any string
				1043	shorter than "foo".
				1044
				1045	= 1.2 "Who for such dainties would not stoop?" (20040708) =
				1046
				1047	Applied patch from Ben Last (ben at benlast dot com) that made
				1048	Tag.renderContents() correctly handle Unicode.
				1049
				1050	Made BeautifulStoneSoup even dumber by making it not implicitly close
				1051	a tag when another tag of the same type is encountered; only when an
				1052	actual closing tag is encountered. This change courtesy of Fuzzy (mike
				1053	at pcblokes dot com). BeautifulSoup still works as before.
				1054
				1055	= 1.1 "Swimming in a hot tureen" =
				1056
				1057	Added more 'nestable' tags. Changed popping semantics so that when a
				1058	nestable tag is encountered, tags are popped up to the previously
				1059	encountered nestable tag (of whatever kind). I will revert this if
				1060	enough people complain, but it should make more people's lives easier
				1061	than harder. This enhancement was suggested by Anthony Baxter (anthony
				1062	at interlink dot com dot au).
				1063
				1064	= 1.0 "So rich and green" (20040420) =
				1065
				1066	Initial release.