18 Commits

Author SHA1 Message Date
Kovid Goyal
0f7b73d7c4 Accept transport_encoding in html5lib.parse 2017-03-09 21:33:33 +05:30
Kovid Goyal
ec8780bdff Micro-optimization 2016-09-27 06:46:51 +05:30
Kovid Goyal
a82ff8b749 Work around for error when parsing malformed documents containing annotation-xml tags 2016-01-09 08:21:00 +05:30
Kovid Goyal
c68b9b7d64 Update Amazon metadata download plugin to handle amazon.com website change that was preventing any metadata from being downloaded. Fixes a bug in the default html5lib lxml treebuilder that caused it to fail on pages that have comments with -- or trailing hyphens. 2015-11-21 13:32:21 +05:30
Kovid Goyal
dbb4092b35 Fix a couple of things I forgot to merge from upstream html5lib 2015-11-04 20:24:57 +05:30
Kovid Goyal
892571a180 Merge in changes from upstream html5lib 2015-11-03 10:41:47 +05:30
Kovid Goyal
5454fca03e Remove the dependency on six.py from html5lib 2015-01-09 09:59:17 +05:30
Kovid Goyal
dd676227b8 Preserve attribute order when parsing 2013-10-28 10:57:42 +05:30
Kovid Goyal
12a581786b oops 2013-10-28 10:57:42 +05:30
Kovid Goyal
6310c2feac Handle attributes from multiple body tags 2013-10-28 10:57:42 +05:30
Kovid Goyal
3e986bccf3 Speed up unnecessarily slow and obtuse dict comparison 2013-10-28 10:57:42 +05:30
Kovid Goyal
9503652a4b Add support for line numbers to the HTML 5 parser 2013-10-28 10:57:42 +05:30
Kovid Goyal
0d1c917281 Speed up parsing some more by using a faster stream class 2013-10-28 10:57:42 +05:30
Kovid Goyal
ea7930ee83 Speed up parsing by not using element __bool__ 2013-10-28 10:57:42 +05:30
Kovid Goyal
62d042d9d4 Basic parsing with the new html5lib lxml tree builder works 2013-10-28 10:57:41 +05:30
Kovid Goyal
b9421065f9 Update HTML 5 parser used in calibre (html5lib-python) 2013-10-23 11:04:05 +05:30
Kovid Goyal
c0f549625a Replace CRLF line endings 2013-05-28 11:42:53 +05:30
Kovid Goyal
66934ec8fb Conversion engine: When parsing invalid XHTML use the HTML 5 algorithm, for greater robustness. Fixes #901466 (<h*> tag bug) 2011-12-15 13:50:23 +05:30