sync with Kovid's branch

2025-07-09 03:04:10 -04:00 · 2013-05-23 23:44:42 +02:00 · 5df48e1b4d
commit 5df48e1b4d
parent a3bbcf2bfa da865fa068
201 changed files with 94855 additions and 76144 deletions
--- a/Changelog.yaml
+++ b/Changelog.yaml
@@ -20,6 +20,96 @@
 #   new recipes:
 #     - title: 

+
+- version: 0.9.31
+  date: 2013-05-17
+
+  new features:
+    - title: "Book list: Highlight the current cell in the book list, particularly convenient for usage with the keyboard."
+
+    - title: "Allow creation of advanced rules for column icons."
+
+    - title: "Driver for the limited edition SONY PRS-T2N"
+
+    - title: "MOBI Input: Add support for MOBI/KF8 files generated with the to be released kindlegen 2.9."
+      tickets: [1179144]
+ 
+  bug fixes:
+    - title: "ToC Editor: Fix incorrect playOrders in the generated toc.ncx when editing the toc in an epub file. This apparently affects FBReader."
+
+    - title: "PDF Input: Fix crashes on some malformed files, by updating the PDF library calibre uses (poppler 0.22.4)"
+
+    - title: "PDF Output: Ignore invalid links instead of erroring out on them."
+      tickets: [1179314]
+
+    - title: "MOBI Output: Fix space erroneously being removed when the input document contains a tag with leading space and sub-tags."
+      tickets: [1179216]
+
+    - title: "Search and replace wizard: Fix generated html being slightly different from the actual html in the conversion pipeline for some input formats (mainly HTML, CHM, LIT)."
+
+
+  improved recipes:
+    - Weblogs SL
+    - .net magazine
+
+  new recipes:
+    - title: nrc-next 
+      author: Niels Giesen
+
+- version: 0.9.30
+  date: 2013-05-10
+
+  new features:
+    - title: "Kobo driver: Add support for showing 'Archived' books on the device. Also up the supported firmware version to 2.5.3."
+      tickets: [1177677]
+
+    - title: "Driver for Blackberry 9790"
+      tickets: [1176607]
+
+    - title: "Add a tweak to turn off the highlighting of the book count when using a virtual library (Preferences->Tweaks)"
+
+    - title: "Add a button to clear the viewer search history in the viewer Preferences, under Miscellaneous"
+
+    - title: "Add keyboard shortcuts to clear the virtual Library and the additional restriction (Ctrl+Esc and Alt+Esc). Also use Shift+Esc to bring keyboard focus back to the book list. Can be changed under Preferences->Keyboard"
+
+    - title: "Docx metadata: Read the language of the file, if present"
+ 
+  bug fixes:
+    - title: "Kobo driver: Fix unable to read SD card on OS X/Linux"
+      tickets: [1174815]
+
+    - title: "Content server: Fix unable to download ORIGINAL_* formats"
+      tickets: [1177158]
+
+    - title: "Fix regression that broke searching for terms containing a quote mark"
+      tickets: [1177114]
+
+    - title: "Fix regression that broke conversion of txt files when no input encoding is specified"
+      tickets: [1176622]
+
+    - title: "When changing to a virtual library, refresh the Book Details panel."
+      tickets: [1176296]
+
+    - title: "Fix regression that caused searching for user categories to break."
+      tickets: [1176187]
+
+    - title: "Fix error when downloading only covers and reviewing downloaded metadata."
+      tickets: [1176253]
+
+    - title: "MOBI metadata: Strip XML unsafe unicode codepoints when reading metadata from MOBI files."
+      tickets: [1175965]
+
+    - title: "Txt Input: Use the gbk encoding for txt files with detected encoding of gb2312."
+      tickets: [1175974]
+
+    - title: "When pressing Ctrl+Home/End preserve the horizontal scroll position in the book list"
+
+  improved recipes:
+    - NSFW
+    - Go Comics
+    - Various Polish news sources
+    - The Sun
+
 - version: 0.9.29
  date: 2013-05-03

--- a/manual/gui.rst
+++ b/manual/gui.rst
@@ -504,6 +504,31 @@ There is a search bar at the top of the Tag Browser that allows you to easily fi

 You can control how items are sorted in the Tag browser via the box at the bottom of the Tag Browser. You can choose to sort by name, average rating or popularity (popularity is the number of books with an item in your library; for example, the popularity of Isaac Asimov is the number of books in your library by Isaac Asimov).

+Quickview
+----------
+
+Sometimes you want to select a book and quickly get a list of books with the same value in some category (authors, tags, publisher, series, etc) as the currently selected book, but without changing the current view of the library. You can do this with Quickview. Quickview opens a second window showing the list of books matching the value of interest.
+
+For example, assume you want to see a list of all the books with the same author as the currently selected book. Click in the author cell you are interested in and press the 'Q' key. A window will open with all the authors for that book on the left, and all the books by the selected author on the right.
+
+Some example Quickview usages: quickly seeing what other books:
+	- have some tag that is applied to the currently selected book,
+	- are in the same series as the current book
+	- have the same values in a custom column as the current book
+	- are written by one of the same authors of the current book
+
+without changing the contents of the library view.
+
+The Quickview window opens on top of the |app| window and will stay open until you explicitly close it. You can use Quickview and the |app| library view at the same time. For example, if in the |app| library view you click on a category column (tags, series, publisher, authors, etc) for a book, the Quickview window contents will change to show you in the left-hand side pane the items in that category for the selected book (e.g., the tags for that book). The first item in that list will be selected, and Quickview will show you on the right-hand side pane all the books in your library that reference that item. Click on a different item in the left-hand pane to see the books with that different item.
+
+Double-click on a book in the Quickview window to select that book in the library view. This will also change the items displayed in the Quickview window (the left-hand pane) to show the items in the newly-selected book.
+
+Shift- (or Ctrl-) double-click on a book in the Quickview window to open the edit metadata dialog on that book in the |app| window.
+
+You can see if a column can be Quickview'ed by hovering your mouse over the column heading and looking at the tooltip for that heading. You can also tell by right-clicking on the column heading to see if the "Quickview" option is shown in the menu, in which case choosing that Quickview option is equivalent to pressing 'Q' in the current cell.
+
+Quickview respects the virtual library setting, showing only books in the current virtual library.
+
 Jobs
 -----
 .. image:: images/jobs.png
--- a/manual/virtual_libraries.rst
+++ b/manual/virtual_libraries.rst
@@ -57,6 +57,26 @@ library. The virtual library will then be created based on the search
 you just typed in. Searches are very powerful, for examples of the kinds 
 of things you can do with them, see :ref:`search_interface`. 

+Examples of useful Virtual Libraries
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+  * Books added to |app| in the last day::
+        date:>1daysago
+  * Books added to |app| in the last month::
+        date:>30daysago
+  * Books with a rating of 5 stars::
+        rating:5
+  * Books with a rating of at least 4 stars::
+        rating:>=4
+  * Books with no rating::
+        rating:false
+  * Periodicals downloaded by the Fetch News function in |app|::
+        tags:=News and author:=calibre
+  * Books with no tags::
+        tags:false
+  * Books with no covers::
+        cover:false
+
 Working with Virtual Libraries
 -------------------------------------

--- a/recipes/comics_com.recipe
+++ b/recipes/comics_com.recipe
@@ -1,224 +0,0 @@
-from calibre.web.feeds.news import BasicNewsRecipe
-
-class Comics(BasicNewsRecipe):
-    title               = 'Comics.com'
-    __author__          = 'Starson17'
-    description         = 'Comics from comics.com. You should customize this recipe to fetch only the comics you are interested in'
-    language            = 'en'
-    use_embedded_content= False
-    no_stylesheets      = True
-    oldest_article      = 24
-    remove_javascript   = True
-    cover_url           = 'http://www.bsb.lib.tx.us/images/comics.com.gif'
-    recursions          = 0
-    max_articles_per_feed = 10
-    num_comics_to_get = 7
-    simultaneous_downloads = 1
-    # delay = 3
-
-    keep_only_tags     = [dict(name='a', attrs={'class':'STR_StripImage'}),
-                          dict(name='div', attrs={'class':'STR_Date'})
-                          ]
-
-    def parse_index(self):
-        feeds = []
-        for title, url in [
-                            ("9 Chickweed Lane", "http://comics.com/9_chickweed_lane"),
-                            ("Agnes", "http://comics.com/agnes"),
-                            ("Alley Oop", "http://comics.com/alley_oop"),
-                            ("Andy Capp", "http://comics.com/andy_capp"),
-                            ("Arlo & Janis", "http://comics.com/arlo&janis"),
-                            ("B.C.", "http://comics.com/bc"),
-                            ("Ballard Street", "http://comics.com/ballard_street"),
-                            # ("Ben", "http://comics.com/ben"),
-                            # ("Betty", "http://comics.com/betty"),
-                            # ("Big Nate", "http://comics.com/big_nate"),
-                            # ("Brevity", "http://comics.com/brevity"),
-                            # ("Candorville", "http://comics.com/candorville"),
-                            # ("Cheap Thrills", "http://comics.com/cheap_thrills"),
-                            # ("Committed", "http://comics.com/committed"),
-                            # ("Cow & Boy", "http://comics.com/cow&boy"),
-                            # ("Daddy's Home", "http://comics.com/daddys_home"),
-                            # ("Dog eat Doug", "http://comics.com/dog_eat_doug"),
-                            # ("Drabble", "http://comics.com/drabble"),
-                            # ("F Minus", "http://comics.com/f_minus"),
-                            # ("Family Tree", "http://comics.com/family_tree"),
-                            # ("Farcus", "http://comics.com/farcus"),
-                            # ("Fat Cats Classics", "http://comics.com/fat_cats_classics"),
-                            # ("Ferd'nand", "http://comics.com/ferdnand"),
-                            # ("Flight Deck", "http://comics.com/flight_deck"),
-                            # ("Flo & Friends", "http://comics.com/flo&friends"),
-                            # ("Fort Knox", "http://comics.com/fort_knox"),
-                            # ("Frank & Ernest", "http://comics.com/frank&ernest"),
-                            # ("Frazz", "http://comics.com/frazz"),
-                            # ("Free Range", "http://comics.com/free_range"),
-                            # ("Geech Classics", "http://comics.com/geech_classics"),
-                            # ("Get Fuzzy", "http://comics.com/get_fuzzy"),
-                            # ("Girls & Sports", "http://comics.com/girls&sports"),
-                            # ("Graffiti", "http://comics.com/graffiti"),
-                            # ("Grand Avenue", "http://comics.com/grand_avenue"),
-                            # ("Heathcliff", "http://comics.com/heathcliff"),
-                            # "Heathcliff, a street-smart and mischievous cat with many adventures."
-                            # ("Herb and Jamaal", "http://comics.com/herb_and_jamaal"),
-                            # ("Herman", "http://comics.com/herman"),
-                            # ("Home and Away", "http://comics.com/home_and_away"),
-                            # ("It's All About You", "http://comics.com/its_all_about_you"),
-                            # ("Jane's World", "http://comics.com/janes_world"),
-                            # ("Jump Start", "http://comics.com/jump_start"),
-                            # ("Kit 'N' Carlyle", "http://comics.com/kit_n_carlyle"),
-                            # ("Li'l Abner Classics", "http://comics.com/lil_abner_classics"),
-                            # ("Liberty Meadows", "http://comics.com/liberty_meadows"),
-                            # ("Little Dog Lost", "http://comics.com/little_dog_lost"),
-                            # ("Lola", "http://comics.com/lola"),
-                            # ("Luann", "http://comics.com/luann"),
-                            # ("Marmaduke", "http://comics.com/marmaduke"),
-                            # ("Meg! Classics", "http://comics.com/meg_classics"),
-                            # ("Minimum Security", "http://comics.com/minimum_security"),
-                            # ("Moderately Confused", "http://comics.com/moderately_confused"),
-                            # ("Momma", "http://comics.com/momma"),
-                            # ("Monty", "http://comics.com/monty"),
-                            # ("Motley Classics", "http://comics.com/motley_classics"),
-                            # ("Nancy", "http://comics.com/nancy"),
-                            # ("Natural Selection", "http://comics.com/natural_selection"),
-                            # ("Nest Heads", "http://comics.com/nest_heads"),
-                            # ("Off The Mark", "http://comics.com/off_the_mark"),
-                            # ("On a Claire Day", "http://comics.com/on_a_claire_day"),
-                            # ("One Big Happy Classics", "http://comics.com/one_big_happy_classics"),
-                            # ("Over the Hedge", "http://comics.com/over_the_hedge"),
-                            # ("PC and Pixel", "http://comics.com/pc_and_pixel"),
-                            # ("Peanuts", "http://comics.com/peanuts"),
-                            # ("Pearls Before Swine", "http://comics.com/pearls_before_swine"),
-                            # ("Pickles", "http://comics.com/pickles"),
-                            # ("Prickly City", "http://comics.com/prickly_city"),
-                            # ("Raising Duncan Classics", "http://comics.com/raising_duncan_classics"),
-                            # ("Reality Check", "http://comics.com/reality_check"),
-                            # ("Red & Rover", "http://comics.com/red&rover"),
-                            # ("Rip Haywire", "http://comics.com/rip_haywire"),
-                            # ("Ripley's Believe It or Not!", "http://comics.com/ripleys_believe_it_or_not"),
-                            # ("Rose Is Rose", "http://comics.com/rose_is_rose"),
-                            # ("Rubes", "http://comics.com/rubes"),
-                            # ("Rudy Park", "http://comics.com/rudy_park"),
-                            # ("Scary Gary", "http://comics.com/scary_gary"),
-                            # ("Shirley and Son Classics", "http://comics.com/shirley_and_son_classics"),
-                            # ("Soup To Nutz", "http://comics.com/soup_to_nutz"),
-                            # ("Speed Bump", "http://comics.com/speed_bump"),
-                            # ("Spot The Frog", "http://comics.com/spot_the_frog"),
-                            # ("State of the Union", "http://comics.com/state_of_the_union"),
-                            # ("Strange Brew", "http://comics.com/strange_brew"),
-                            # ("Tarzan Classics", "http://comics.com/tarzan_classics"),
-                            # ("That's Life", "http://comics.com/thats_life"),
-                            # ("The Barn", "http://comics.com/the_barn"),
-                            # ("The Born Loser", "http://comics.com/the_born_loser"),
-                            # ("The Buckets", "http://comics.com/the_buckets"),
-                            # ("The Dinette Set", "http://comics.com/the_dinette_set"),
-                            # ("The Grizzwells", "http://comics.com/the_grizzwells"),
-                            # ("The Humble Stumble", "http://comics.com/the_humble_stumble"),
-                            # ("The Knight Life", "http://comics.com/the_knight_life"),
-                            # ("The Meaning of Lila", "http://comics.com/the_meaning_of_lila"),
-                            # ("The Other Coast", "http://comics.com/the_other_coast"),
-                            # ("The Sunshine Club", "http://comics.com/the_sunshine_club"),
-                            # ("Unstrange Phenomena", "http://comics.com/unstrange_phenomena"),
-                            # ("Watch Your Head", "http://comics.com/watch_your_head"),
-                            # ("Wizard of Id", "http://comics.com/wizard_of_id"),
-                            # ("Working Daze", "http://comics.com/working_daze"),
-                            # ("Working It Out", "http://comics.com/working_it_out"),
-                            # ("Zack Hill", "http://comics.com/zack_hill"),
-                            # ("(Th)ink", "http://comics.com/think"),
-                            # "Tackling the political and social issues impacting communities of color."
-                            # ("Adam Zyglis", "http://comics.com/adam_zyglis"),
-                            # "Known for his excellent caricatures, as well as independent and incisive imagery. "
-                            # ("Andy Singer", "http://comics.com/andy_singer"),
-                            # ("Bill Day", "http://comics.com/bill_day"),
-                            # "Powerful images on sensitive issues."
-                            # ("Bill Schorr", "http://comics.com/bill_schorr"),
-                            # ("Bob Englehart", "http://comics.com/bob_englehart"),
-                            # ("Brian Fairrington", "http://comics.com/brian_fairrington"),
-                            # ("Bruce Beattie", "http://comics.com/bruce_beattie"),
-                            # ("Cam Cardow", "http://comics.com/cam_cardow"),
-                            # ("Chip Bok", "http://comics.com/chip_bok"),
-                            # ("Chris Britt", "http://comics.com/chris_britt"),
-                            # ("Chuck Asay", "http://comics.com/chuck_asay"),
-                            # ("Clay Bennett", "http://comics.com/clay_bennett"),
-                            # ("Daryl Cagle", "http://comics.com/daryl_cagle"),
-                            # ("David Fitzsimmons", "http://comics.com/david_fitzsimmons"),
-                            # "David Fitzsimmons is a new editorial cartoons on comics.com.  He is also a staff writer and editorial cartoonist for the Arizona Daily Star. "
-                            # ("Drew Litton", "http://comics.com/drew_litton"),
-                            # "Drew Litton is an artist who is probably best known for his sports cartoons. He received the National Cartoonist Society Sports Cartoon Award for 1993. "
-                            # ("Ed Stein", "http://comics.com/ed_stein"),
-                            # "Winner of the Fischetti Award in 2006 and the Scripps Howard National Journalism Award, 1999, Ed Stein has been the editorial cartoonist for the Rocky Mountain News since 1978. "
-                            # ("Eric Allie", "http://comics.com/eric_allie"),
-                            # "Eric Allie is an editorial cartoonist with the Pioneer Press and CNS News. "
-                            # ("Gary Markstein", "http://comics.com/gary_markstein"),
-                            # ("Gary McCoy", "http://comics.com/gary_mccoy"),
-                            # "Gary McCoy is known for his editorial cartoons, humor and inane ramblings. He is a 2 time nominee for  Best  Magazine Cartoonist of the Year by the National Cartoonists Society. He resides in Belleville, IL. "
-                            # ("Gary Varvel", "http://comics.com/gary_varvel"),
-                            # ("Henry Payne", "http://comics.com/henry_payne"),
-                            # ("JD Crowe", "http://comics.com/jd_crowe"),
-                            # ("Jeff Parker", "http://comics.com/jeff_parker"),
-                            # ("Jeff Stahler", "http://comics.com/jeff_stahler"),
-                            # ("Jerry Holbert", "http://comics.com/jerry_holbert"),
-                            # ("John Cole", "http://comics.com/john_cole"),
-                            # ("John Darkow", "http://comics.com/john_darkow"),
-                            # "John Darkow is a contributing editorial cartoonist for the Humor Times as well as editoiral cartoonist for  the Columbia Daily Tribune, Missouri"
-                            # ("John Sherffius", "http://comics.com/john_sherffius"),
-                            # ("Larry Wright", "http://comics.com/larry_wright"),
-                            # ("Lisa Benson", "http://comics.com/lisa_benson"),
-                            # ("Marshall Ramsey", "http://comics.com/marshall_ramsey"),
-                            # ("Matt Bors", "http://comics.com/matt_bors"),
-                            # ("Michael Ramirez", "http://comics.com/michael_ramirez"),
-                            # ("Mike Keefe", "http://comics.com/mike_keefe"),
-                            # ("Mike Luckovich", "http://comics.com/mike_luckovich"),
-                            # ("MIke Thompson", "http://comics.com/mike_thompson"),
-                            # ("Monte Wolverton", "http://comics.com/monte_wolverton"),
-                            # "Unique mix of perspectives"
-                            # ("Mr. Fish", "http://comics.com/mr_fish"),
-                            # "Side effects may include swelling"
-                            # ("Nate Beeler", "http://comics.com/nate_beeler"),
-                            # "Middle America meets the Beltway."
-                            # ("Nick Anderson", "http://comics.com/nick_anderson"),
-                            # ("Pat Bagley", "http://comics.com/pat_bagley"),
-                            # "Unfair and Totally Unbalanced."
-                            # ("Paul Szep", "http://comics.com/paul_szep"),
-                            # ("RJ Matson", "http://comics.com/rj_matson"),
-                            # "Power cartoons from NYC and Capitol Hill"
-                            # ("Rob Rogers", "http://comics.com/rob_rogers"),
-                            # "Humorous slant on current events"
-                            # ("Robert Ariail", "http://comics.com/robert_ariail"),
-                            # "Clever and unpredictable"
-                            # ("Scott Stantis", "http://comics.com/scott_stantis"),
-                            # ("Signe Wilkinson", "http://comics.com/signe_wilkinson"),
-                            # ("Steve Benson", "http://comics.com/steve_benson"),
-                            # ("Steve Breen", "http://comics.com/steve_breen"),
-                            # ("Steve Kelley", "http://comics.com/steve_kelley"),
-                            # ("Steve Sack", "http://comics.com/steve_sack"),
-                            ]:
-            articles = self.make_links(url)
-            if articles:
-                feeds.append((title, articles))
-        return feeds
-
-    def make_links(self, url):
-        soup = self.index_to_soup(url)
-        # print 'soup: ', soup
-        title = ''
-        current_articles = []
-        pages = range(1, self.num_comics_to_get+1)
-        for page in pages:
-            page_url = url + '/?Page=' + str(page)
-            soup = self.index_to_soup(page_url)
-            if soup:
-                strip_tag = soup.find('a', attrs={'class': 'STR_StripImage'})
-                if strip_tag:
-                  print 'strip_tag: ', strip_tag
-                  title = strip_tag['title']
-                  print 'title: ', title
-            current_articles.append({'title': title, 'url': page_url, 'description':'', 'date':''})
-        current_articles.reverse()
-        return current_articles
-
-    extra_css = '''
-                    h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
-                    h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
-                    p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
-                    body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
-		'''
--- a/recipes/di.recipe
+++ b/recipes/di.recipe
@@ -1,4 +1,5 @@
 #!/usr/bin/env  python
+# vim:fileencoding=UTF-8

 __license__     = 'GPL v3'
 __author__ = 'Mori'
--- a/recipes/dot_net.recipe
+++ b/recipes/dot_net.recipe
@@ -1,32 +1,37 @@
-# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
 from calibre.web.feeds.news import BasicNewsRecipe
 import re

-class NetMagazineRecipe (BasicNewsRecipe):
-   __author__ = u'Marc Busqué <marc@lamarciana.com>'
-   __url__ = 'http://www.lamarciana.com'
+class dotnetMagazine (BasicNewsRecipe):
+    __author__ = u'Bonni Salles'
    __version__ = '1.0'
    __license__   = 'GPL v3'
-   __copyright__ = u'2012, Marc Busqué <marc@lamarciana.com>'
-   title = u'.net magazine'
-   description = u'net is the world’s best-selling magazine for web designers and developers, featuring tutorials from leading agencies, interviews with the web’s biggest names, and agenda-setting features on the hottest issues affecting the internet today.'
-   language = 'en'
-   tags = 'web development, software'
+    __copyright__ = u'2013, Bonni Salles'
+    title                 = '.net magazine'
    oldest_article        = 7
-   remove_empty_feeds = True
    no_stylesheets        = True
+    encoding              = 'utf8'
+    use_embedded_content  = False
+    language              = 'en'
+    remove_empty_feeds    = True
+    extra_css             = ' body{font-family: Arial,Helvetica,sans-serif } img{margin-bottom: 0.4em} '
    cover_url = u'http://media.netmagazine.futurecdn.net/sites/all/themes/netmag/logo.png'
-   keep_only_tags = [
-         dict(name='article', attrs={'class': re.compile('^node.*$', re.IGNORECASE)})
-         ]
+
+    remove_tags_after = dict(name='footer', id=lambda x:not x)
+    remove_tags_before = dict(name='header', id=lambda x:not x)
+
    remove_tags = [
-         dict(name='span', attrs={'class': 'comment-count'}),
-         dict(name='div', attrs={'class': 'item-list share-links'}),
-         dict(name='footer'),
+         dict(name='div', attrs={'class': 'item-list'}),
+         dict(name='h4', attrs={'class': 'std-hdr'}),
+         dict(name='div', attrs={'class': 'item-list share-links'}), #removes share links
+         dict(name=['script', 'noscript']),
+         dict(name='div', attrs={'id': 'comments-form'}), #comment these out if you want the comments to show
+         dict(name='div', attrs={'id': re.compile('advertorial_block_($|| )')}),
+         dict(name='div', attrs={'id': 'right-col'}),
+         dict(name='div', attrs={'id': 'comments'}), #comment these out if you want the comments to show
+         dict(name='div', attrs={'class': 'item-list related-content'}),
+
         ]
-   remove_attributes = ['border', 'cellspacing', 'align', 'cellpadding', 'colspan', 'valign', 'vspace', 'hspace', 'alt', 'width', 'height', 'style']
-   extra_css = 'img {max-width: 100%; display: block; margin: auto;} .captioned-image div {text-align: center; font-style: italic;}'

    feeds = [
-         (u'.net', u'http://feeds.feedburner.com/net/topstories'),
+               (u'net', u'http://feeds.feedburner.com/net/topstories')
            ]
--- a/recipes/go_comics.recipe
+++ b/recipes/go_comics.recipe
@@ -1,229 +1,443 @@
+__license__   = 'GPL v3'
+__copyright__ = 'Copyright 2010 Starson17'
+'''
+www.gocomics.com
+'''
 from calibre.web.feeds.news import BasicNewsRecipe
+import re

-
-class Comics(BasicNewsRecipe):
-    title               = 'Comics.com'
+class GoComics(BasicNewsRecipe):
+    title               = 'Go Comics'
    __author__          = 'Starson17'
-    description         = 'Comics from comics.com. You should customize this recipe to fetch only the comics you are interested in'
+    __version__         = '1.06'
+    __date__            = '07 June 2011'
+    description         = u'200+ Comics - Customize for more days/comics: Defaults to 7 days, 25 comics - 20 general, 5 editorial.'
+    category            = 'news, comics'
    language            = 'en'
    use_embedded_content= False
    no_stylesheets      = True
-    oldest_article      = 24
    remove_javascript   = True
-    cover_url           = 'http://www.bsb.lib.tx.us/images/comics.com.gif'
-    recursions          = 0
-    max_articles_per_feed = 10
-    num_comics_to_get = 7
-    simultaneous_downloads = 1
-    # delay = 3
+    remove_attributes = ['style']

-    keep_only_tags     = [dict(name='h1'),
-                          dict(name='p', attrs={'class':'feature_item'})
+    ####### USER PREFERENCES - COMICS, IMAGE SIZE AND NUMBER OF COMICS TO RETRIEVE ########
+    # num_comics_to_get - I've tried up to 99 on Calvin&Hobbes
+    num_comics_to_get = 1
+    # comic_size 300 is small, 600 is medium, 900 is large, 1500 is extra-large
+    comic_size = 900
+    # CHOOSE COMIC STRIPS BELOW - REMOVE COMMENT '# ' FROM IN FRONT OF DESIRED STRIPS
+    # Please do not overload their servers by selecting all comics and 1000 strips from each!
+
+    conversion_options = {'linearize_tables'  : True
+                        , 'comment'           : description
+                        , 'tags'              : category
+                        , 'language'          : language
+                        }
+
+    keep_only_tags     = [dict(name='div', attrs={'class':['feature','banner']}),
                          ]

+    remove_tags = [dict(name='a', attrs={'class':['beginning','prev','cal','next','newest']}),
+                   dict(name='div', attrs={'class':['tag-wrapper']}),
+                   dict(name='a', attrs={'href':re.compile(r'.*mutable_[0-9]+', re.IGNORECASE)}),
+                   dict(name='img', attrs={'src':re.compile(r'.*mutable_[0-9]+', re.IGNORECASE)}),
+                   dict(name='ul', attrs={'class':['share-nav','feature-nav']}),
+                   ]
+
+    def get_browser(self):
+        br = BasicNewsRecipe.get_browser(self)
+        br.addheaders = [('Referer','http://www.gocomics.com/')]
+        return br
+
    def parse_index(self):
        feeds = []
        for title, url in [
-                            ("9 Chickweed Lane", "http://gocomics.com/9_chickweed_lane"),
-                            ("Agnes", "http://gocomics.com/agnes"),
-                            ("Alley Oop", "http://gocomics.com/alley_oop"),
-                            ("Andy Capp", "http://gocomics.com/andy_capp"),
-                            ("Arlo & Janis", "http://gocomics.com/arlo&janis"),
-                            ("B.C.", "http://gocomics.com/bc"),
-                            ("Ballard Street", "http://gocomics.com/ballard_street"),
-                            # ("Ben", "http://comics.com/ben"),
-                            # ("Betty", "http://comics.com/betty"),
-                            # ("Big Nate", "http://comics.com/big_nate"),
-                            # ("Brevity", "http://comics.com/brevity"),
-                            # ("Candorville", "http://comics.com/candorville"),
-                            # ("Cheap Thrills", "http://comics.com/cheap_thrills"),
-                            # ("Committed", "http://comics.com/committed"),
-                            # ("Cow & Boy", "http://comics.com/cow&boy"),
-                            # ("Daddy's Home", "http://comics.com/daddys_home"),
-                            # ("Dog eat Doug", "http://comics.com/dog_eat_doug"),
-                            # ("Drabble", "http://comics.com/drabble"),
-                            # ("F Minus", "http://comics.com/f_minus"),
-                            # ("Family Tree", "http://comics.com/family_tree"),
-                            # ("Farcus", "http://comics.com/farcus"),
-                            # ("Fat Cats Classics", "http://comics.com/fat_cats_classics"),
-                            # ("Ferd'nand", "http://comics.com/ferdnand"),
-                            # ("Flight Deck", "http://comics.com/flight_deck"),
-                            # ("Flo & Friends", "http://comics.com/flo&friends"),
-                            # ("Fort Knox", "http://comics.com/fort_knox"),
-                            # ("Frank & Ernest", "http://comics.com/frank&ernest"),
-                            # ("Frazz", "http://comics.com/frazz"),
-                            # ("Free Range", "http://comics.com/free_range"),
-                            # ("Geech Classics", "http://comics.com/geech_classics"),
-                            # ("Get Fuzzy", "http://comics.com/get_fuzzy"),
-                            # ("Girls & Sports", "http://comics.com/girls&sports"),
-                            # ("Graffiti", "http://comics.com/graffiti"),
-                            # ("Grand Avenue", "http://comics.com/grand_avenue"),
-                            # ("Heathcliff", "http://comics.com/heathcliff"),
-                            # "Heathcliff, a street-smart and mischievous cat with many adventures."
-                            # ("Herb and Jamaal", "http://comics.com/herb_and_jamaal"),
-                            # ("Herman", "http://comics.com/herman"),
-                            # ("Home and Away", "http://comics.com/home_and_away"),
-                            # ("It's All About You", "http://comics.com/its_all_about_you"),
-                            # ("Jane's World", "http://comics.com/janes_world"),
-                            # ("Jump Start", "http://comics.com/jump_start"),
-                            # ("Kit 'N' Carlyle", "http://comics.com/kit_n_carlyle"),
-                            # ("Li'l Abner Classics", "http://comics.com/lil_abner_classics"),
-                            # ("Liberty Meadows", "http://comics.com/liberty_meadows"),
-                            # ("Little Dog Lost", "http://comics.com/little_dog_lost"),
-                            # ("Lola", "http://comics.com/lola"),
-                            # ("Luann", "http://comics.com/luann"),
-                            # ("Marmaduke", "http://comics.com/marmaduke"),
-                            # ("Meg! Classics", "http://comics.com/meg_classics"),
-                            # ("Minimum Security", "http://comics.com/minimum_security"),
-                            # ("Moderately Confused", "http://comics.com/moderately_confused"),
-                            # ("Momma", "http://comics.com/momma"),
-                            # ("Monty", "http://comics.com/monty"),
-                            # ("Motley Classics", "http://comics.com/motley_classics"),
-                            # ("Nancy", "http://comics.com/nancy"),
-                            # ("Natural Selection", "http://comics.com/natural_selection"),
-                            # ("Nest Heads", "http://comics.com/nest_heads"),
-                            # ("Off The Mark", "http://comics.com/off_the_mark"),
-                            # ("On a Claire Day", "http://comics.com/on_a_claire_day"),
-                            # ("One Big Happy Classics", "http://comics.com/one_big_happy_classics"),
-                            # ("Over the Hedge", "http://comics.com/over_the_hedge"),
-                            # ("PC and Pixel", "http://comics.com/pc_and_pixel"),
-                            # ("Peanuts", "http://comics.com/peanuts"),
-                            # ("Pearls Before Swine", "http://comics.com/pearls_before_swine"),
-                            # ("Pickles", "http://comics.com/pickles"),
-                            # ("Prickly City", "http://comics.com/prickly_city"),
-                            # ("Raising Duncan Classics", "http://comics.com/raising_duncan_classics"),
-                            # ("Reality Check", "http://comics.com/reality_check"),
-                            # ("Red & Rover", "http://comics.com/red&rover"),
-                            # ("Rip Haywire", "http://comics.com/rip_haywire"),
-                            # ("Ripley's Believe It or Not!", "http://comics.com/ripleys_believe_it_or_not"),
-                            # ("Rose Is Rose", "http://comics.com/rose_is_rose"),
-                            # ("Rubes", "http://comics.com/rubes"),
-                            # ("Rudy Park", "http://comics.com/rudy_park"),
-                            # ("Scary Gary", "http://comics.com/scary_gary"),
-                            # ("Shirley and Son Classics", "http://comics.com/shirley_and_son_classics"),
-                            # ("Soup To Nutz", "http://comics.com/soup_to_nutz"),
-                            # ("Speed Bump", "http://comics.com/speed_bump"),
-                            # ("Spot The Frog", "http://comics.com/spot_the_frog"),
-                            # ("State of the Union", "http://comics.com/state_of_the_union"),
-                            # ("Strange Brew", "http://comics.com/strange_brew"),
-                            # ("Tarzan Classics", "http://comics.com/tarzan_classics"),
-                            # ("That's Life", "http://comics.com/thats_life"),
-                            # ("The Barn", "http://comics.com/the_barn"),
-                            # ("The Born Loser", "http://comics.com/the_born_loser"),
-                            # ("The Buckets", "http://comics.com/the_buckets"),
-                            # ("The Dinette Set", "http://comics.com/the_dinette_set"),
-                            # ("The Grizzwells", "http://comics.com/the_grizzwells"),
-                            # ("The Humble Stumble", "http://comics.com/the_humble_stumble"),
-                            # ("The Knight Life", "http://comics.com/the_knight_life"),
-                            # ("The Meaning of Lila", "http://comics.com/the_meaning_of_lila"),
-                            # ("The Other Coast", "http://comics.com/the_other_coast"),
-                            # ("The Sunshine Club", "http://comics.com/the_sunshine_club"),
-                            # ("Unstrange Phenomena", "http://comics.com/unstrange_phenomena"),
-                            # ("Watch Your Head", "http://comics.com/watch_your_head"),
-                            # ("Wizard of Id", "http://comics.com/wizard_of_id"),
-                            # ("Working Daze", "http://comics.com/working_daze"),
-                            # ("Working It Out", "http://comics.com/working_it_out"),
-                            # ("Zack Hill", "http://comics.com/zack_hill"),
-                            # ("(Th)ink", "http://comics.com/think"),
-                            # "Tackling the political and social issues impacting communities of color."
-                            # ("Adam Zyglis", "http://comics.com/adam_zyglis"),
-                            # "Known for his excellent caricatures, as well as independent and incisive imagery. "
-                            # ("Andy Singer", "http://comics.com/andy_singer"),
-                            # ("Bill Day", "http://comics.com/bill_day"),
-                            # "Powerful images on sensitive issues."
-                            # ("Bill Schorr", "http://comics.com/bill_schorr"),
-                            # ("Bob Englehart", "http://comics.com/bob_englehart"),
-                            # ("Brian Fairrington", "http://comics.com/brian_fairrington"),
-                            # ("Bruce Beattie", "http://comics.com/bruce_beattie"),
-                            # ("Cam Cardow", "http://comics.com/cam_cardow"),
-                            # ("Chip Bok", "http://comics.com/chip_bok"),
-                            # ("Chris Britt", "http://comics.com/chris_britt"),
-                            # ("Chuck Asay", "http://comics.com/chuck_asay"),
-                            # ("Clay Bennett", "http://comics.com/clay_bennett"),
-                            # ("Daryl Cagle", "http://comics.com/daryl_cagle"),
-                            # ("David Fitzsimmons", "http://comics.com/david_fitzsimmons"),
-                            # "David Fitzsimmons is a new editorial cartoons on comics.com.  He is also a staff writer and editorial cartoonist for the Arizona Daily Star. "
-                            # ("Drew Litton", "http://comics.com/drew_litton"),
-                            # "Drew Litton is an artist who is probably best known for his sports cartoons. He received the National Cartoonist Society Sports Cartoon Award for 1993. "
-                            # ("Ed Stein", "http://comics.com/ed_stein"),
-                            # "Winner of the Fischetti Award in 2006 and the Scripps Howard National Journalism Award, 1999, Ed Stein has been the editorial cartoonist for the Rocky Mountain News since 1978. "
-                            # ("Eric Allie", "http://comics.com/eric_allie"),
-                            # "Eric Allie is an editorial cartoonist with the Pioneer Press and CNS News. "
-                            # ("Gary Markstein", "http://comics.com/gary_markstein"),
-                            # ("Gary McCoy", "http://comics.com/gary_mccoy"),
-                            # "Gary McCoy is known for his editorial cartoons, humor and inane ramblings. He is a 2 time nominee for  Best  Magazine Cartoonist of the Year by the National Cartoonists Society. He resides in Belleville, IL. "
-                            # ("Gary Varvel", "http://comics.com/gary_varvel"),
-                            # ("Henry Payne", "http://comics.com/henry_payne"),
-                            # ("JD Crowe", "http://comics.com/jd_crowe"),
-                            # ("Jeff Parker", "http://comics.com/jeff_parker"),
-                            # ("Jeff Stahler", "http://comics.com/jeff_stahler"),
-                            # ("Jerry Holbert", "http://comics.com/jerry_holbert"),
-                            # ("John Cole", "http://comics.com/john_cole"),
-                            # ("John Darkow", "http://comics.com/john_darkow"),
-                            # "John Darkow is a contributing editorial cartoonist for the Humor Times as well as editoiral cartoonist for  the Columbia Daily Tribune, Missouri"
-                            # ("John Sherffius", "http://comics.com/john_sherffius"),
-                            # ("Larry Wright", "http://comics.com/larry_wright"),
-                            # ("Lisa Benson", "http://comics.com/lisa_benson"),
-                            # ("Marshall Ramsey", "http://comics.com/marshall_ramsey"),
-                            # ("Matt Bors", "http://comics.com/matt_bors"),
-                            # ("Michael Ramirez", "http://comics.com/michael_ramirez"),
-                            # ("Mike Keefe", "http://comics.com/mike_keefe"),
-                            # ("Mike Luckovich", "http://comics.com/mike_luckovich"),
-                            # ("MIke Thompson", "http://comics.com/mike_thompson"),
-                            # ("Monte Wolverton", "http://comics.com/monte_wolverton"),
-                            # "Unique mix of perspectives"
-                            # ("Mr. Fish", "http://comics.com/mr_fish"),
-                            # "Side effects may include swelling"
-                            # ("Nate Beeler", "http://comics.com/nate_beeler"),
-                            # "Middle America meets the Beltway."
-                            # ("Nick Anderson", "http://comics.com/nick_anderson"),
-                            # ("Pat Bagley", "http://comics.com/pat_bagley"),
-                            # "Unfair and Totally Unbalanced."
-                            # ("Paul Szep", "http://comics.com/paul_szep"),
-                            # ("RJ Matson", "http://comics.com/rj_matson"),
-                            # "Power cartoons from NYC and Capitol Hill"
-                            # ("Rob Rogers", "http://comics.com/rob_rogers"),
-                            # "Humorous slant on current events"
-                            # ("Robert Ariail", "http://comics.com/robert_ariail"),
-                            # "Clever and unpredictable"
-                            # ("Scott Stantis", "http://comics.com/scott_stantis"),
-                            # ("Signe Wilkinson", "http://comics.com/signe_wilkinson"),
-                            # ("Steve Benson", "http://comics.com/steve_benson"),
-                            # ("Steve Breen", "http://comics.com/steve_breen"),
-                            # ("Steve Kelley", "http://comics.com/steve_kelley"),
-                            # ("Steve Sack", "http://comics.com/steve_sack"),
+                       #(u"2 Cows and a Chicken", u"http://www.gocomics.com/2cowsandachicken"),
+                       #(u"9 Chickweed Lane", u"http://www.gocomics.com/9chickweedlane"),
+                       #(u"Adam At Home", u"http://www.gocomics.com/adamathome"),
+                       #(u"Agnes", u"http://www.gocomics.com/agnes"),
+                       #(u"Alley Oop", u"http://www.gocomics.com/alleyoop"),
+                       #(u"Andy Capp", u"http://www.gocomics.com/andycapp"),
+                       (u"Animal Crackers", u"http://www.gocomics.com/animalcrackers"),
+                       #(u"Annie", u"http://www.gocomics.com/annie"),
+                       #(u"Arlo & Janis", u"http://www.gocomics.com/arloandjanis"),
+                       #(u"Ask Shagg", u"http://www.gocomics.com/askshagg"),
+                       (u"B.C.", u"http://www.gocomics.com/bc"),
+                       #(u"Back in the Day", u"http://www.gocomics.com/backintheday"),
+                       #(u"Bad Reporter", u"http://www.gocomics.com/badreporter"),
+                       (u"Baldo", u"http://www.gocomics.com/baldo"),
+                       #(u"Ballard Street", u"http://www.gocomics.com/ballardstreet"),
+                       #(u"Barkeater Lake", u"http://www.gocomics.com/barkeaterlake"),
+                       #(u"Basic Instructions", u"http://www.gocomics.com/basicinstructions"),
+                       #(u"Ben", u"http://www.gocomics.com/ben"),
+                       #(u"Betty", u"http://www.gocomics.com/betty"),
+                       #(u"Bewley", u"http://www.gocomics.com/bewley"),
+                       #(u"Big Nate", u"http://www.gocomics.com/bignate"),
+                       #(u"Big Top", u"http://www.gocomics.com/bigtop"),
+                       #(u"Biographic", u"http://www.gocomics.com/biographic"),
+                       #(u"Birdbrains", u"http://www.gocomics.com/birdbrains"),
+                       #(u"Bleeker: The Rechargeable Dog", u"http://www.gocomics.com/bleeker"),
+                       #(u"Bliss", u"http://www.gocomics.com/bliss"),
+                       #(u"Bloom County", u"http://www.gocomics.com/bloomcounty"),
+                       #(u"Bo Nanas", u"http://www.gocomics.com/bonanas"),
+                       #(u"Bob the Squirrel", u"http://www.gocomics.com/bobthesquirrel"),
+                       #(u"Boomerangs", u"http://www.gocomics.com/boomerangs"),
+                       #(u"Bottomliners", u"http://www.gocomics.com/bottomliners"),
+                       (u"Bound and Gagged", u"http://www.gocomics.com/boundandgagged"),
+                       #(u"Brainwaves", u"http://www.gocomics.com/brainwaves"),
+                       #(u"Brenda Starr", u"http://www.gocomics.com/brendastarr"),
+                       #(u"Brevity", u"http://www.gocomics.com/brevity"),
+                       #(u"Brewster Rockit", u"http://www.gocomics.com/brewsterrockit"),
+                       (u"Broom Hilda", u"http://www.gocomics.com/broomhilda"),
+                       (u"Calvin and Hobbes", u"http://www.gocomics.com/calvinandhobbes"),
+                       #(u"Candorville", u"http://www.gocomics.com/candorville"),
+                       #(u"Cathy", u"http://www.gocomics.com/cathy"),
+                       #(u"C'est la Vie", u"http://www.gocomics.com/cestlavie"),
+                       #(u"Cheap Thrills", u"http://www.gocomics.com/cheapthrills"),
+                       #(u"Chuckle Bros", u"http://www.gocomics.com/chucklebros"),
+                       #(u"Citizen Dog", u"http://www.gocomics.com/citizendog"),
+                       #(u"Cleats", u"http://www.gocomics.com/cleats"),
+                       #(u"Close to Home", u"http://www.gocomics.com/closetohome"),
+                       #(u"Committed", u"http://www.gocomics.com/committed"),
+                       #(u"Compu-toon", u"http://www.gocomics.com/compu-toon"),
+                       #(u"Cornered", u"http://www.gocomics.com/cornered"),
+                       #(u"Cow & Boy", u"http://www.gocomics.com/cow&boy"),
+                       #(u"Cul de Sac", u"http://www.gocomics.com/culdesac"),
+                       #(u"Daddy's Home", u"http://www.gocomics.com/daddyshome"),
+                       #(u"Deep Cover", u"http://www.gocomics.com/deepcover"),
+                       #(u"Dick Tracy", u"http://www.gocomics.com/dicktracy"),
+                       #(u"Dog Eat Doug", u"http://www.gocomics.com/dogeatdoug"),
+                       #(u"Domestic Abuse", u"http://www.gocomics.com/domesticabuse"),
+                       #(u"Doodles", u"http://www.gocomics.com/doodles"),
+                       #(u"Doonesbury", u"http://www.gocomics.com/doonesbury"),
+                       #(u"Drabble", u"http://www.gocomics.com/drabble"),
+                       #(u"Eek!", u"http://www.gocomics.com/eek"),
+                       #(u"F Minus", u"http://www.gocomics.com/fminus"),
+                       #(u"Family Tree", u"http://www.gocomics.com/familytree"),
+                       #(u"Farcus", u"http://www.gocomics.com/farcus"),
+                       #(u"Fat Cats Classics", u"http://www.gocomics.com/fatcatsclassics"),
+                       #(u"Ferd'nand", u"http://www.gocomics.com/ferdnand"),
+                       #(u"Flight Deck", u"http://www.gocomics.com/flightdeck"),
+                       #(u"Flo and Friends", u"http://www.gocomics.com/floandfriends"),
+                       (u"For Better or For Worse", u"http://www.gocomics.com/forbetterorforworse"),
+                       #(u"For Heaven's Sake", u"http://www.gocomics.com/forheavenssake"),
+                       #(u"Fort Knox", u"http://www.gocomics.com/fortknox"),
+                       #(u"FoxTrot Classics", u"http://www.gocomics.com/foxtrotclassics"),
+                       #(u"FoxTrot", u"http://www.gocomics.com/foxtrot"),
+                       (u"Frank & Ernest", u"http://www.gocomics.com/frankandernest"),
+                       #(u"Frazz", u"http://www.gocomics.com/frazz"),
+                       #(u"Fred Basset", u"http://www.gocomics.com/fredbasset"),
+                       #(u"Free Range", u"http://www.gocomics.com/freerange"),
+                       #(u"Frog Applause", u"http://www.gocomics.com/frogapplause"),
+                       #(u"Garfield Minus Garfield", u"http://www.gocomics.com/garfieldminusgarfield"),
+                       (u"Garfield", u"http://www.gocomics.com/garfield"),
+                       #(u"Gasoline Alley", u"http://www.gocomics.com/gasolinealley"),
+                       #(u"Geech Classics", u"http://www.gocomics.com/geechclassics"),
+                       (u"Get Fuzzy", u"http://www.gocomics.com/getfuzzy"),
+                       #(u"Gil Thorp", u"http://www.gocomics.com/gilthorp"),
+                       #(u"Ginger Meggs", u"http://www.gocomics.com/gingermeggs"),
+                       #(u"Girls & Sports", u"http://www.gocomics.com/girlsandsports"),
+                       #(u"Graffiti", u"http://www.gocomics.com/graffiti"),
+                       #(u"Grand Avenue", u"http://www.gocomics.com/grandavenue"),
+                       #(u"Haiku Ewe", u"http://www.gocomics.com/haikuewe"),
+                       #(u"Heart of the City", u"http://www.gocomics.com/heartofthecity"),
+                       #(u"Herb and Jamaal", u"http://www.gocomics.com/herbandjamaal"),
+                       #(u"Home and Away", u"http://www.gocomics.com/homeandaway"),
+                       #(u"Housebroken", u"http://www.gocomics.com/housebroken"),
+                       #(u"Hubert and Abby", u"http://www.gocomics.com/hubertandabby"),
+                       #(u"Imagine This", u"http://www.gocomics.com/imaginethis"),
+                       #(u"In the Bleachers", u"http://www.gocomics.com/inthebleachers"),
+                       #(u"In the Sticks", u"http://www.gocomics.com/inthesticks"),
+                       #(u"Ink Pen", u"http://www.gocomics.com/inkpen"),
+                       #(u"It's All About You", u"http://www.gocomics.com/itsallaboutyou"),
+                       #(u"Jane's World", u"http://www.gocomics.com/janesworld"),
+                       #(u"Joe Vanilla", u"http://www.gocomics.com/joevanilla"),
+                       #(u"Jump Start", u"http://www.gocomics.com/jumpstart"),
+                       #(u"Kit 'N' Carlyle", u"http://www.gocomics.com/kitandcarlyle"),
+                       #(u"La Cucaracha", u"http://www.gocomics.com/lacucaracha"),
+                       #(u"Last Kiss", u"http://www.gocomics.com/lastkiss"),
+                       #(u"Legend of Bill", u"http://www.gocomics.com/legendofbill"),
+                       #(u"Liberty Meadows", u"http://www.gocomics.com/libertymeadows"),
+                       #(u"Li'l Abner Classics", u"http://www.gocomics.com/lilabnerclassics"),
+                       #(u"Lio", u"http://www.gocomics.com/lio"),
+                       #(u"Little Dog Lost", u"http://www.gocomics.com/littledoglost"),
+                       #(u"Little Otto", u"http://www.gocomics.com/littleotto"),
+                       #(u"Lola", u"http://www.gocomics.com/lola"),
+                       #(u"Love Is...", u"http://www.gocomics.com/loveis"),
+                       (u"Luann", u"http://www.gocomics.com/luann"),
+                       #(u"Maintaining", u"http://www.gocomics.com/maintaining"),
+                       #(u"Meg! Classics", u"http://www.gocomics.com/megclassics"),
+                       #(u"Middle-Aged White Guy", u"http://www.gocomics.com/middleagedwhiteguy"),
+                       #(u"Minimum Security", u"http://www.gocomics.com/minimumsecurity"),
+                       #(u"Moderately Confused", u"http://www.gocomics.com/moderatelyconfused"),
+                       (u"Momma", u"http://www.gocomics.com/momma"),
+                       #(u"Monty", u"http://www.gocomics.com/monty"),
+                       #(u"Motley Classics", u"http://www.gocomics.com/motleyclassics"),
+                       #(u"Mutt & Jeff", u"http://www.gocomics.com/muttandjeff"),
+                       #(u"Mythtickle", u"http://www.gocomics.com/mythtickle"),
+                       #(u"Nancy", u"http://www.gocomics.com/nancy"),
+                       #(u"Natural Selection", u"http://www.gocomics.com/naturalselection"),
+                       #(u"Nest Heads", u"http://www.gocomics.com/nestheads"),
+                       #(u"NEUROTICA", u"http://www.gocomics.com/neurotica"),
+                       #(u"New Adventures of Queen Victoria", u"http://www.gocomics.com/thenewadventuresofqueenvictoria"),
+                       (u"Non Sequitur", u"http://www.gocomics.com/nonsequitur"),
+                       #(u"Off The Mark", u"http://www.gocomics.com/offthemark"),
+                       #(u"On A Claire Day", u"http://www.gocomics.com/onaclaireday"),
+                       #(u"One Big Happy Classics", u"http://www.gocomics.com/onebighappyclassics"),
+                       #(u"One Big Happy", u"http://www.gocomics.com/onebighappy"),
+                       #(u"Out of the Gene Pool Re-Runs", u"http://www.gocomics.com/outofthegenepool"),
+                       #(u"Over the Hedge", u"http://www.gocomics.com/overthehedge"),
+                       #(u"Overboard", u"http://www.gocomics.com/overboard"),
+                       #(u"PC and Pixel", u"http://www.gocomics.com/pcandpixel"),
+                       (u"Peanuts", u"http://www.gocomics.com/peanuts"),
+                       (u"Pearls Before Swine", u"http://www.gocomics.com/pearlsbeforeswine"),
+                       #(u"Pibgorn Sketches", u"http://www.gocomics.com/pibgornsketches"),
+                       #(u"Pibgorn", u"http://www.gocomics.com/pibgorn"),
+                       #(u"Pickles", u"http://www.gocomics.com/pickles"),
+                       #(u"Pinkerton", u"http://www.gocomics.com/pinkerton"),
+                       #(u"Pluggers", u"http://www.gocomics.com/pluggers"),
+                       (u"Pooch Cafe", u"http://www.gocomics.com/poochcafe"),
+                       #(u"PreTeena", u"http://www.gocomics.com/preteena"),
+                       #(u"Prickly City", u"http://www.gocomics.com/pricklycity"),
+                       #(u"Rabbits Against Magic", u"http://www.gocomics.com/rabbitsagainstmagic"),
+                       #(u"Raising Duncan Classics", u"http://www.gocomics.com/raisingduncanclassics"),
+                       #(u"Real Life Adventures", u"http://www.gocomics.com/reallifeadventures"),
+                       #(u"Reality Check", u"http://www.gocomics.com/realitycheck"),
+                       #(u"Red and Rover", u"http://www.gocomics.com/redandrover"),
+                       #(u"Red Meat", u"http://www.gocomics.com/redmeat"),
+                       #(u"Reynolds Unwrapped", u"http://www.gocomics.com/reynoldsunwrapped"),
+                       #(u"Rip Haywire", u"http://www.gocomics.com/riphaywire"),
+                       #(u"Ronaldinho Gaucho", u"http://www.gocomics.com/ronaldinhogaucho"),
+                       (u"Rose Is Rose", u"http://www.gocomics.com/roseisrose"),
+                       #(u"Rudy Park", u"http://www.gocomics.com/rudypark"),
+                       #(u"Scary Gary", u"http://www.gocomics.com/scarygary"),
+                       #(u"Shirley and Son Classics", u"http://www.gocomics.com/shirleyandsonclassics"),
+                       (u"Shoe", u"http://www.gocomics.com/shoe"),
+                       #(u"Shoecabbage", u"http://www.gocomics.com/shoecabbage"),
+                       #(u"Skin Horse", u"http://www.gocomics.com/skinhorse"),
+                       #(u"Slowpoke", u"http://www.gocomics.com/slowpoke"),
+                       #(u"Soup To Nutz", u"http://www.gocomics.com/souptonutz"),
+                       #(u"Spot The Frog", u"http://www.gocomics.com/spotthefrog"),
+                       #(u"State of the Union", u"http://www.gocomics.com/stateoftheunion"),
+                       #(u"Stone Soup", u"http://www.gocomics.com/stonesoup"),
+                       #(u"Sylvia", u"http://www.gocomics.com/sylvia"),
+                       #(u"Tank McNamara", u"http://www.gocomics.com/tankmcnamara"),
+                       #(u"Tarzan Classics", u"http://www.gocomics.com/tarzanclassics"),
+                       #(u"That's Life", u"http://www.gocomics.com/thatslife"),
+                       #(u"The Academia Waltz", u"http://www.gocomics.com/academiawaltz"),
+                       #(u"The Barn", u"http://www.gocomics.com/thebarn"),
+                       #(u"The Boiling Point", u"http://www.gocomics.com/theboilingpoint"),
+                       #(u"The Boondocks", u"http://www.gocomics.com/boondocks"),
+                       (u"The Born Loser", u"http://www.gocomics.com/thebornloser"),
+                       #(u"The Buckets", u"http://www.gocomics.com/thebuckets"),
+                       #(u"The City", u"http://www.gocomics.com/thecity"),
+                       #(u"The Dinette Set", u"http://www.gocomics.com/dinetteset"),
+                       #(u"The Doozies", u"http://www.gocomics.com/thedoozies"),
+                       #(u"The Duplex", u"http://www.gocomics.com/duplex"),
+                       #(u"The Elderberries", u"http://www.gocomics.com/theelderberries"),
+                       #(u"The Flying McCoys", u"http://www.gocomics.com/theflyingmccoys"),
+                       #(u"The Fusco Brothers", u"http://www.gocomics.com/thefuscobrothers"),
+                       #(u"The Grizzwells", u"http://www.gocomics.com/thegrizzwells"),
+                       #(u"The Humble Stumble", u"http://www.gocomics.com/thehumblestumble"),
+                       #(u"The Knight Life", u"http://www.gocomics.com/theknightlife"),
+                       #(u"The Meaning of Lila", u"http://www.gocomics.com/meaningoflila"),
+                       (u"The Middletons", u"http://www.gocomics.com/themiddletons"),
+                       #(u"The Norm", u"http://www.gocomics.com/thenorm"),
+                       #(u"The Other Coast", u"http://www.gocomics.com/theothercoast"),
+                       #(u"The Quigmans", u"http://www.gocomics.com/thequigmans"),
+                       #(u"The Sunshine Club", u"http://www.gocomics.com/thesunshineclub"),
+                       #(u"Tiny Sepuk", u"http://www.gocomics.com/tinysepuk"),
+                       #(u"TOBY", u"http://www.gocomics.com/toby"),
+                       #(u"Tom the Dancing Bug", u"http://www.gocomics.com/tomthedancingbug"),
+                       #(u"Too Much Coffee Man", u"http://www.gocomics.com/toomuchcoffeeman"),
+                       #(u"Unstrange Phenomena", u"http://www.gocomics.com/unstrangephenomena"),
+                       #(u"W.T. Duck", u"http://www.gocomics.com/wtduck"),
+                       #(u"Watch Your Head", u"http://www.gocomics.com/watchyourhead"),
+                       #(u"Wee Pals", u"http://www.gocomics.com/weepals"),
+                       #(u"Winnie the Pooh", u"http://www.gocomics.com/winniethepooh"),
+                       (u"Wizard of Id", u"http://www.gocomics.com/wizardofid"),
+                       #(u"Working Daze", u"http://www.gocomics.com/workingdaze"),
+                       #(u"Working It Out", u"http://www.gocomics.com/workingitout"),
+                       #(u"Yenny", u"http://www.gocomics.com/yenny"),
+                       #(u"Zack Hill", u"http://www.gocomics.com/zackhill"),
+                       #(u"Ziggy", u"http://www.gocomics.com/ziggy"),
+                       (u"9 to 5", u"http://www.gocomics.com/9to5"),
+                       (u"Heathcliff", u"http://www.gocomics.com/heathcliff"),
+                       (u"Herman", u"http://www.gocomics.com/herman"),
+                       (u"Loose Parts", u"http://www.gocomics.com/looseparts"),
+                       (u"Marmaduke", u"http://www.gocomics.com/marmaduke"),
+                       (u"Ripley's Believe It or Not!", u"http://www.gocomics.com/ripleysbelieveitornot"),
+                       (u"Rubes", u"http://www.gocomics.com/rubes"),
+                       (u"Speed Bump", u"http://www.gocomics.com/speedbump"),
+                       (u"Strange Brew", u"http://www.gocomics.com/strangebrew"),
+                       (u"The Argyle Sweater", u"http://www.gocomics.com/theargylesweater"),
+                       #
+                       ######## EDITORIAL CARTOONS #####################
+                       #(u"Adam Zyglis", u"http://www.gocomics.com/adamzyglis"),
+                       #(u"Andy Singer", u"http://www.gocomics.com/andysinger"),
+                       #(u"Ben Sargent",u"http://www.gocomics.com/bensargent"),
+                       #(u"Bill Day", u"http://www.gocomics.com/billday"),
+                       #(u"Bill Schorr", u"http://www.gocomics.com/billschorr"),
+                       #(u"Bob Englehart", u"http://www.gocomics.com/bobenglehart"),
+                       #(u"Bob Gorrell",u"http://www.gocomics.com/bobgorrell"),
+                       #(u"Brian Fairrington", u"http://www.gocomics.com/brianfairrington"),
+                       #(u"Bruce Beattie", u"http://www.gocomics.com/brucebeattie"),
+                       #(u"Cam Cardow", u"http://www.gocomics.com/camcardow"),
+                       #(u"Chan Lowe",u"http://www.gocomics.com/chanlowe"),
+                       #(u"Chip Bok",u"http://www.gocomics.com/chipbok"),
+                       #(u"Chris Britt",u"http://www.gocomics.com/chrisbritt"),
+                       #(u"Chuck Asay",u"http://www.gocomics.com/chuckasay"),
+                       #(u"Clay Bennett",u"http://www.gocomics.com/claybennett"),
+                       #(u"Clay Jones",u"http://www.gocomics.com/clayjones"),
+                       #(u"Dan Wasserman",u"http://www.gocomics.com/danwasserman"),
+                       #(u"Dana Summers",u"http://www.gocomics.com/danasummers"),
+                       #(u"Daryl Cagle", u"http://www.gocomics.com/darylcagle"),
+                       #(u"David Fitzsimmons", u"http://www.gocomics.com/davidfitzsimmons"),
+                       #(u"Dick Locher",u"http://www.gocomics.com/dicklocher"),
+                       #(u"Don Wright",u"http://www.gocomics.com/donwright"),
+                       #(u"Donna Barstow",u"http://www.gocomics.com/donnabarstow"),
+                       #(u"Drew Litton", u"http://www.gocomics.com/drewlitton"),
+                       #(u"Drew Sheneman",u"http://www.gocomics.com/drewsheneman"),
+                       #(u"Ed Stein", u"http://www.gocomics.com/edstein"),
+                       #(u"Eric Allie", u"http://www.gocomics.com/ericallie"),
+                       #(u"Gary Markstein", u"http://www.gocomics.com/garymarkstein"),
+                       #(u"Gary McCoy", u"http://www.gocomics.com/garymccoy"),
+                       #(u"Gary Varvel", u"http://www.gocomics.com/garyvarvel"),
+                       #(u"Glenn McCoy",u"http://www.gocomics.com/glennmccoy"),
+                       #(u"Henry Payne", u"http://www.gocomics.com/henrypayne"),
+                       #(u"Jack Ohman",u"http://www.gocomics.com/jackohman"),
+                       #(u"JD Crowe", u"http://www.gocomics.com/jdcrowe"),
+                       #(u"Jeff Danziger",u"http://www.gocomics.com/jeffdanziger"),
+                       #(u"Jeff Parker", u"http://www.gocomics.com/jeffparker"),
+                       #(u"Jeff Stahler", u"http://www.gocomics.com/jeffstahler"),
+                       #(u"Jerry Holbert", u"http://www.gocomics.com/jerryholbert"),
+                       #(u"Jim Morin",u"http://www.gocomics.com/jimmorin"),
+                       #(u"Joel Pett",u"http://www.gocomics.com/joelpett"),
+                       #(u"John Cole", u"http://www.gocomics.com/johncole"),
+                       #(u"John Darkow", u"http://www.gocomics.com/johndarkow"),
+                       #(u"John Deering",u"http://www.gocomics.com/johndeering"),
+                       #(u"John Sherffius", u"http://www.gocomics.com/johnsherffius"),
+                       #(u"Ken Catalino",u"http://www.gocomics.com/kencatalino"),
+                       #(u"Kerry Waghorn",u"http://www.gocomics.com/facesinthenews"),
+                       #(u"Kevin Kallaugher",u"http://www.gocomics.com/kevinkallaugher"),
+                       #(u"Lalo Alcaraz",u"http://www.gocomics.com/laloalcaraz"),
+                       #(u"Larry Wright", u"http://www.gocomics.com/larrywright"),
+                       #(u"Lisa Benson", u"http://www.gocomics.com/lisabenson"),
+                       #(u"Marshall Ramsey", u"http://www.gocomics.com/marshallramsey"),
+                       #(u"Matt Bors", u"http://www.gocomics.com/mattbors"),
+                       #(u"Matt Davies",u"http://www.gocomics.com/mattdavies"),
+                       #(u"Michael Ramirez", u"http://www.gocomics.com/michaelramirez"),
+                       #(u"Mike Keefe", u"http://www.gocomics.com/mikekeefe"),
+                       #(u"Mike Luckovich", u"http://www.gocomics.com/mikeluckovich"),
+                       #(u"Mike Thompson", u"http://www.gocomics.com/mikethompson"),
+                       #(u"Monte Wolverton", u"http://www.gocomics.com/montewolverton"),
+                       #(u"Mr. Fish", u"http://www.gocomics.com/mrfish"),
+                       #(u"Nate Beeler", u"http://www.gocomics.com/natebeeler"),
+                       #(u"Nick Anderson", u"http://www.gocomics.com/nickanderson"),
+                       #(u"Pat Bagley", u"http://www.gocomics.com/patbagley"),
+                       #(u"Pat Oliphant",u"http://www.gocomics.com/patoliphant"),
+                       #(u"Paul Conrad",u"http://www.gocomics.com/paulconrad"),
+                       #(u"Paul Szep", u"http://www.gocomics.com/paulszep"),
+                       #(u"RJ Matson", u"http://www.gocomics.com/rjmatson"),
+                       #(u"Rob Rogers", u"http://www.gocomics.com/robrogers"),
+                       #(u"Robert Ariail", u"http://www.gocomics.com/robertariail"),
+                       #(u"Scott Stantis", u"http://www.gocomics.com/scottstantis"),
+                       #(u"Signe Wilkinson", u"http://www.gocomics.com/signewilkinson"),
+                       #(u"Small World",u"http://www.gocomics.com/smallworld"),
+                       #(u"Steve Benson", u"http://www.gocomics.com/stevebenson"),
+                       #(u"Steve Breen", u"http://www.gocomics.com/stevebreen"),
+                       #(u"Steve Kelley", u"http://www.gocomics.com/stevekelley"),
+                       #(u"Steve Sack", u"http://www.gocomics.com/stevesack"),
+                       #(u"Stuart Carlson",u"http://www.gocomics.com/stuartcarlson"),
+                       #(u"Ted Rall",u"http://www.gocomics.com/tedrall"),
+                       #(u"(Th)ink", u"http://www.gocomics.com/think"),
+                       #(u"Tom Toles",u"http://www.gocomics.com/tomtoles"),
+                       #(u"Tony Auth",u"http://www.gocomics.com/tonyauth"),
+                       #(u"Views of the World",u"http://www.gocomics.com/viewsoftheworld"),
+                       #(u"ViewsAfrica",u"http://www.gocomics.com/viewsafrica"),
+                       #(u"ViewsAmerica",u"http://www.gocomics.com/viewsamerica"),
+                       #(u"ViewsAsia",u"http://www.gocomics.com/viewsasia"),
+                       #(u"ViewsBusiness",u"http://www.gocomics.com/viewsbusiness"),
+                       #(u"ViewsEurope",u"http://www.gocomics.com/viewseurope"),
+                       #(u"ViewsLatinAmerica",u"http://www.gocomics.com/viewslatinamerica"),
+                       #(u"ViewsMidEast",u"http://www.gocomics.com/viewsmideast"),
+                       #(u"Walt Handelsman",u"http://www.gocomics.com/walthandelsman"),
+                       #(u"Wayne Stayskal",u"http://www.gocomics.com/waynestayskal"),
+                       #(u"Wit of the World",u"http://www.gocomics.com/witoftheworld"),
                             ]:
+            print 'Working on: ', title
            articles = self.make_links(url)
            if articles:
                feeds.append((title, articles))
        return feeds

    def make_links(self, url):
-        soup = self.index_to_soup(url)
-        # print 'soup: ', soup
-        title = ''
+        title = 'Temp'
        current_articles = []
-        from datetime import datetime, timedelta
-        now = datetime.now()
-        dates = [(now-timedelta(days=d)).strftime('%Y/%m/%d') for d in range(self.num_comics_to_get)]
-
-        for page in dates:
-            page_url = url + '/' + str(page)
-            print(page_url)
-            soup = self.index_to_soup(page_url)
-            if soup:
-                strip_tag = self.tag_to_string(soup.find('a'))
-                if strip_tag:
-                  print 'strip_tag: ', strip_tag
-                  title = strip_tag
-                  print 'title: ', title
+        pages = range(1, self.num_comics_to_get+1)
+        for page in pages:
+            page_soup = self.index_to_soup(url)
+            if page_soup:
+                try:
+                    strip_title = page_soup.find(name='div', attrs={'class':'top'}).h1.a.string
+                except:
+                    strip_title = 'Error - no Title found'
+                try:
+                    date_title = page_soup.find('ul', attrs={'class': 'feature-nav'}).li.string
+                except AttributeError:
+                    date_title = None
+                if not date_title:
+                    # .string is None when the <li> holds nested markup; use a placeholder
+                    date_title = 'Error - no Date found'
+                title = strip_title + ' - ' + date_title
+                for i in range(2):
+                    try:
+                        strip_url_date = page_soup.find(name='div', attrs={'class':'top'}).h1.a['href']
+                        break  # success - this is normal exit
+                    except:
+                        strip_url_date = None
+                        continue  # try to get strip_url_date again
+                for i in range(2):
+                    try:
+                        prev_strip_url_date = page_soup.find('a', attrs={'class': 'prev'})['href']
+                        break  # success - this is normal exit
+                    except:
+                        prev_strip_url_date = None
+                        continue  # try to get prev_strip_url_date again
+                if strip_url_date:
+                    page_url = 'http://www.gocomics.com' + strip_url_date
+                else:
+                    continue
+                if prev_strip_url_date:
+                    prev_page_url = 'http://www.gocomics.com' + prev_strip_url_date
+                else:
+                    continue
            current_articles.append({'title': title, 'url': page_url, 'description':'', 'date':''})
+            url = prev_page_url
        current_articles.reverse()
        return current_articles

+    def preprocess_html(self, soup):
+        comic_date = ''  # default, so the h1 rewrite below works even without a <title>
+        if soup.title:
+            title_string = soup.title.string.strip()
+            _cd = title_string.split(',',1)[1]
+            comic_date = ' '.join(_cd.split(' ', 4)[0:-1])
+        if soup.h1.span:
+            artist = soup.h1.span.string
+            soup.h1.span.string.replaceWith(comic_date + artist)
+        feature_item = soup.find('p',attrs={'class':'feature_item'})
+        if feature_item is not None and feature_item.a:
+            a_tag = feature_item.a
+            a_href = a_tag["href"]
+            img_tag = a_tag.img
+            img_tag["src"] = a_href
+            img_tag["width"] = self.comic_size
+            img_tag["height"] = None
+        return self.adeify_images(soup)
+
    extra_css = '''
                    h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
                    h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
+                    img {max-width:100%; min-width:100%;}
                    p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
                    body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
    '''
+
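The reworked `make_links` above no longer builds dated URLs. It starts at the strip's landing page and follows each page's `prev` link. A minimal sketch of that walk, assuming a hypothetical `fetch` callable that stands in for `index_to_soup` plus the tag lookups and returns `(title, page_url, prev_url)` or `None`. Unlike the recipe, this sketch stops at the first page it cannot read instead of continuing:

```python
def collect_strips(start_url, fetch, count):
    """Walk backwards through `count` strips, oldest first in the result.

    `fetch` is a hypothetical stand-in for the recipe's soup lookups: it
    returns (title, page_url, prev_url) for a URL, or None on failure.
    """
    articles = []
    url = start_url
    for _ in range(count):
        page = fetch(url)
        if page is None:
            break  # nothing usable on this page, stop walking
        title, page_url, prev_url = page
        articles.append({'title': title, 'url': page_url,
                         'description': '', 'date': ''})
        if not prev_url:
            break  # reached the first strip
        url = prev_url
    articles.reverse()  # the newest strip was collected first
    return articles


# Made-up pages keyed by URL, only for illustration.
FAKE_PAGES = {
    '/strip': ('Strip - day 3', '/strip/3', '/strip/2'),
    '/strip/2': ('Strip - day 2', '/strip/2', '/strip/1'),
    '/strip/1': ('Strip - day 1', '/strip/1', None),
}

strips = collect_strips('/strip', FAKE_PAGES.get, 2)
```

Passing the lookup in as a callable keeps the walk itself free of network access, which makes it easy to try against canned pages.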
--- a/recipes/handelsblatt.recipe
+++ b/recipes/handelsblatt.recipe
@ -1,16 +1,61 @@
+import re
 from calibre.web.feeds.news import BasicNewsRecipe

 class Handelsblatt(BasicNewsRecipe):
    title          = u'Handelsblatt'
-    __author__ = 'malfi'
+    __author__ = 'malfi'  # modified by Hegi, last change 2013-05-20
+    description           = u'Handelsblatt - basierend auf den RSS-Feeds von Handelsblatt.de'
+    tags 	                = 'Nachrichten, Blog, Wirtschaft'
+    publisher             = 'Verlagsgruppe Handelsblatt GmbH'
+    category              = 'business, economy, news, Germany'
+    publication_type      = 'daily newspaper'
+    language              = 'de_DE'
    oldest_article        = 7
    max_articles_per_feed = 100
-    no_stylesheets = True
-#    cover_url = 'http://www.handelsblatt.com/images/logo/logo_handelsblatt.com.png'
-    language = 'de'
+    simultaneous_downloads= 20

-    remove_tags_before =  dict(attrs={'class':'hcf-overline'})
-    remove_tags_after  =  dict(attrs={'class':'hcf-footer'})
+    auto_cleanup          = False
+    no_stylesheets        = True
+    remove_javascript     = True
+    remove_empty_feeds    = True
+
+    # don't repeat articles from "Schlagzeilen" / "Exklusiv" in the other sections
+    ignore_duplicate_articles = {'title', 'url'}
+
+    # if you want to reduce size for a b/w or E-ink device, uncomment this:
+    # compress_news_images  = True
+    # compress_news_images_auto_size = 16
+    # scale_news_images     = (400,300)
+
+    timefmt               = ' [%a, %d %b %Y]'
+
+    conversion_options    = {'smarten_punctuation' : True,
+                        'authors'		  : publisher,
+                        'publisher'  	  : publisher}
+    language              = 'de_DE'
+    encoding              = 'UTF-8'
+
+    cover_source          = 'http://www.handelsblatt-shop.com/epaper/482/'
+    # masthead_url          = 'http://www.handelsblatt.com/images/hb_logo/6543086/1-format3.jpg'
+    masthead_url          = 'http://www.handelsblatt-chemie.de/wp-content/uploads/2012/01/hb-logo.gif'
+
+    def get_cover_url(self):
+        cover_source_soup = self.index_to_soup(self.cover_source)
+        preview_image_div = cover_source_soup.find(attrs={'class':'vorschau'})
+        return 'http://www.handelsblatt-shop.com'+preview_image_div.a.img['src']
+
+    # remove_tags_before =  dict(attrs={'class':'hcf-overline'})
+    # remove_tags_after  =  dict(attrs={'class':'hcf-footer'})
+    # Alternatively use this:
+
+    keep_only_tags    = [
+                          dict(name='div', attrs={'class':['hcf-column hcf-column1 hcf-teasercontainer hcf-maincol']}),
+                          dict(name='div', attrs={'id':['contentMain']})
+                        ]
+
+    remove_tags = [
+                    dict(name='div', attrs={'class':['hcf-link-block hcf-faq-open', 'hcf-article-related']})
+                  ]

    feeds          = [
                        (u'Handelsblatt Exklusiv',u'http://www.handelsblatt.com/rss/exklusiv'),
@ -25,15 +70,19 @@ class Handelsblatt(BasicNewsRecipe):
                        (u'Handelsblatt Weblogs',u'http://www.handelsblatt.com/rss/blogs')
                     ]

-    extra_css = '''
-        h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
-        h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
-        p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
-        body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
-        '''
+    # Insert ". " after "Place" in <span class="hcf-location-mark">Place</span>
+    # If you use .epub format you could also do this as extra_css '.hcf-location-mark:after {content: ". "}'
+    preprocess_regexps    = [(re.compile(r'(<span class="hcf-location-mark">[^<]*)(</span>)',
+                              re.DOTALL|re.IGNORECASE), lambda match: match.group(1) + '. ' + match.group(2))]
+
+    extra_css      =  'h1 {font-size: 1.6em; text-align: left} \
+                       h2 {font-size: 1em; font-style: italic; font-weight: normal} \
+                       h3 {font-size: 1.3em;text-align: left} \
+                       h4, h5, h6, a {font-size: 1em;text-align: left} \
+                       .hcf-caption {font-size: 1em;text-align: left; font-style: italic} \
+                       .hcf-location-mark {font-style: italic}'

    def print_version(self, url):
-        url = url.split('/')
-        url[-1] = 'v_detail_tab_print,'+url[-1]
-        url = '/'.join(url)
-        return url
+        # split at the last slash; avoid shadowing the built-in id()
+        main, sep, article_id = url.rpartition('/')
+        return main + '/v_detail_tab_print/' + article_id
+
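The `print_version` change in the Handelsblatt recipe swaps the comma-joined form for a separate path segment. A small sketch that puts both variants side by side, written as plain functions rather than recipe methods. The article URL is a made-up example, not a real Handelsblatt address:

```python
def old_print_version(url):
    # Removed variant: prefix the last path segment with "v_detail_tab_print,".
    parts = url.split('/')
    parts[-1] = 'v_detail_tab_print,' + parts[-1]
    return '/'.join(parts)


def new_print_version(url):
    # Added variant: insert "v_detail_tab_print" as its own path segment.
    main, sep, article_id = url.rpartition('/')
    return main + '/v_detail_tab_print/' + article_id


# Hypothetical article URL, for illustration only.
url = 'http://www.handelsblatt.com/politik/example/1234.html'
old_url = old_print_version(url)
new_url = new_print_version(url)
```

For this input the old form yields `.../example/v_detail_tab_print,1234.html` and the new one `.../example/v_detail_tab_print/1234.html`.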
--- a/recipes/las_vegas_review.recipe
+++ b/recipes/las_vegas_review.recipe
@ -9,11 +9,14 @@ class AdvancedUserRecipe1274742400(BasicNewsRecipe):
    oldest_article = 7

    max_articles_per_feed = 100
-    keep_only_tags = [dict(id='content-main')]
-    remove_tags = [dict(id=['right-col-content', 'trending-topics']),
-            {'class':['ppy-outer']}
-            ]
+    #keep_only_tags = [dict(id='content-main')]
+    #remove_tags = [dict(id=['right-col-content', 'trending-topics']),
+            #{'class':['ppy-outer']}
+            #]
    no_stylesheets = True
+    use_embedded_content = False
+    auto_cleanup = True
+

    feeds = [
    (u'News', u'http://www.lvrj.com/news.rss'),
@ -21,9 +24,9 @@ class AdvancedUserRecipe1274742400(BasicNewsRecipe):
    (u'Living', u'http://www.lvrj.com/living.rss'),
    (u'Opinion', u'http://www.lvrj.com/opinion.rss'),
    (u'Neon', u'http://www.lvrj.com/neon.rss'),
-    (u'Image', u'http://www.lvrj.com/image.rss'),
-    (u'Home & Garden', u'http://www.lvrj.com/home_and_garden.rss'),
-    (u'Furniture & Design', u'http://www.lvrj.com/furniture_and_design.rss'),
-    (u'Drive', u'http://www.lvrj.com/drive.rss'),
-    (u'Real Estate', u'http://www.lvrj.com/real_estate.rss'),
+    #(u'Image', u'http://www.lvrj.com/image.rss'),
+    #(u'Home & Garden', u'http://www.lvrj.com/home_and_garden.rss'),
+    #(u'Furniture & Design', u'http://www.lvrj.com/furniture_and_design.rss'),
+    #(u'Drive', u'http://www.lvrj.com/drive.rss'),
+    #(u'Real Estate', u'http://www.lvrj.com/real_estate.rss'),
    (u'Sports', u'http://www.lvrj.com/sports.rss')]
--- a/recipes/nme.recipe
+++ b/recipes/nme.recipe
@ -4,7 +4,7 @@ class AdvancedUserRecipe1306061239(BasicNewsRecipe):
    title          = u'New Musical Express Magazine'
    description = 'Author D.Asbury. UK Rock & Pop Mag. '
    __author__ = 'Dave Asbury'
-    # last updated 7/10/12
+    # last updated 17/5/13 News feed url altered
    remove_empty_feeds = True
    remove_javascript     = True
    no_stylesheets = True
@ -13,15 +13,14 @@ class AdvancedUserRecipe1306061239(BasicNewsRecipe):
    #auto_cleanup = True
    language = 'en_GB'
    compress_news_images = True
-
    def get_cover_url(self):
        soup = self.index_to_soup('http://www.nme.com/component/subscribe')
        cov = soup.find(attrs={'id' : 'magazine_cover'})
+
        cov2 = str(cov['src'])
        # print '**** Cov url =*', cover_url,'***'
        #print '**** Cov url =*','http://www.magazinesdirect.com/article_images/articledir_3138/1569221/1_largelisting.jpg','***'

-
        br = browser()
        br.set_handle_redirect(False)
        try:
@ -30,7 +29,6 @@ class AdvancedUserRecipe1306061239(BasicNewsRecipe):
        except:
                cover_url = 'http://tawanda3000.files.wordpress.com/2011/02/nme-logo.jpg'
        return cover_url
-
    masthead_url   = 'http://tawanda3000.files.wordpress.com/2011/02/nme-logo.jpg'

    remove_tags = [
@ -56,11 +54,8 @@ class AdvancedUserRecipe1306061239(BasicNewsRecipe):

    ]

-
-
-
    feeds          = [
-    (u'NME News', u'http://feeds.feedburner.com/nmecom/rss/newsxml?format=xml'),
+        (u'NME News', u'http://www.nme.com/news?alt=rss' ), #http://feeds.feedburner.com/nmecom/rss/newsxml?format=xml'),
        #(u'Reviews', u'http://feeds2.feedburner.com/nme/SdML'),
        (u'Reviews',u'http://feed43.com/1817687144061333.xml'),
                        (u'Bloggs',u'http://feed43.com/3326754333186048.xml'),
--- a/recipes/nrc_next.recipe
+++ b/recipes/nrc_next.recipe
@ -0,0 +1,75 @@
+#!/usr/bin/env  python2
+# -*- coding: utf-8 -*-
+# Based on veezh's original recipe, Kovid Goyal's New York Times recipe and Snaabs nrc Handelsblad recipe
+
+__license__   = 'GPL v3'
+__copyright__ = '2013, Niels Giesen'
+
+'''
+www.nrc.nl
+'''
+import os, zipfile
+import time
+from calibre.web.feeds.news import BasicNewsRecipe
+from calibre.ptempfile import PersistentTemporaryFile
+
+
+class NRCNext(BasicNewsRecipe):
+
+    title = u'nrc•next'
+    description = u'De ePaper-versie van nrc•next'
+    language = 'nl'
+    lang = 'nl-NL'
+    needs_subscription = True
+
+    __author__ = 'Niels Giesen'
+
+    conversion_options = {
+        'no_default_epub_cover' : True
+    }
+
+    def get_browser(self):
+        br = BasicNewsRecipe.get_browser(self)
+        if self.username is not None and self.password is not None:
+            br.open('http://login.nrc.nl/login')
+            br.select_form(nr=0)
+            br['username'] = self.username
+            br['password'] = self.password
+            br.submit()
+        return br
+
+    def build_index(self):
+
+        today = time.strftime("%Y%m%d")
+
+        domain = "http://digitaleeditie.nrc.nl"
+
+        url = domain + "/digitaleeditie/helekrant/epub/nn_" + today + ".epub"
+        #print url
+
+        try:
+            br = self.get_browser()
+            f = br.open(url)
+        except:
+            self.report_progress(0,_('Cannot log in to download the edition'))
+            raise ValueError("Today's paper is not yet available")
+
+        tmp = PersistentTemporaryFile(suffix='.epub')
+        self.report_progress(0,_('downloading epub'))
+        tmp.write(f.read())
+        f.close()
+        br.close()
+        if zipfile.is_zipfile(tmp):
+            try:
+                zfile = zipfile.ZipFile(tmp.name, 'r')
+                zfile.extractall(self.output_dir)
+                self.report_progress(0,_('extracting epub'))
+            except zipfile.BadZipfile:
+                self.report_progress(0,_('BadZip error, continuing'))
+
+        tmp.close()
+        index = os.path.join(self.output_dir, 'metadata.opf')
+
+        self.report_progress(1,_('epub downloaded and extracted'))
+
+        return index
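The new nrc•next recipe downloads a ready-made EPUB, unpacks it into the output directory and hands back the path to `metadata.opf`. A rough standalone sketch of the unpack step, assuming the standard library's `tempfile.NamedTemporaryFile` in place of calibre's `PersistentTemporaryFile` and, like the recipe, assuming the archive carries `metadata.opf` at its top level. The helper name `extract_epub` is hypothetical, and unlike the recipe it raises on a bad archive instead of continuing:

```python
import io
import os
import tempfile
import zipfile


def extract_epub(data, output_dir):
    """Write downloaded bytes to a temp file, unzip them, return the OPF path."""
    tmp = tempfile.NamedTemporaryFile(suffix='.epub', delete=False)
    try:
        tmp.write(data)
        tmp.close()  # make sure everything is on disk before reopening by name
        if not zipfile.is_zipfile(tmp.name):
            raise ValueError('downloaded file is not a zip archive')
        with zipfile.ZipFile(tmp.name, 'r') as zf:
            zf.extractall(output_dir)
    finally:
        os.remove(tmp.name)
    return os.path.join(output_dir, 'metadata.opf')


# Build a tiny in-memory archive to stand in for the real download.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('metadata.opf', '<package/>')

out_dir = tempfile.mkdtemp()
index = extract_epub(buf.getvalue(), out_dir)
```

Closing the temporary file before reopening it by name avoids reading a partially flushed archive.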
--- a/recipes/nsfw_corp.recipe
+++ b/recipes/nsfw_corp.recipe
@ -1,11 +1,9 @@
-
 __license__   = 'GPL v3'
-__copyright__ = '2012, Darko Miletic <darko.miletic at gmail.com>'
+__copyright__ = '2012-2013, Darko Miletic <darko.miletic at gmail.com>'
 '''
 www.nsfwcorp.com
 '''

-import urllib
 from calibre.web.feeds.news import BasicNewsRecipe

 class NotSafeForWork(BasicNewsRecipe):
@ -20,8 +18,8 @@ class NotSafeForWork(BasicNewsRecipe):
    needs_subscription     = True
    auto_cleanup           = False
    INDEX                  = 'https://www.nsfwcorp.com'
-    LOGIN                  = INDEX + '/login/target/'
-    SETTINGS               = INDEX + '/settings/'
+    LOGIN                  = INDEX + '/account/login/?next=%2F'
+    SETTINGS               = INDEX + '/account/settings/'
    use_embedded_content   = True
    language               = 'en'
    publication_type       = 'magazine'
@ -48,19 +46,20 @@ class NotSafeForWork(BasicNewsRecipe):

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
-        br.open(self.LOGIN)
+        br.open(self.INDEX)
        if self.username is not None and self.password is not None:
-            data = urllib.urlencode({ 'email':self.username
-                                     ,'password':self.password
-                                   })
-            br.open(self.LOGIN, data)
+            br.open(self.LOGIN)
+            br.select_form(nr=0)
+            br['email'   ] = self.username
+            br['password'] = self.password
+            br.submit()
        return br

    def get_feeds(self):
        self.feeds = []
        soup = self.index_to_soup(self.SETTINGS)
        for item in soup.findAll('input', attrs={'type':'text'}):
-            if item.has_key('value') and item['value'].startswith('http://www.nsfwcorp.com/feed/'):
+            if item.has_key('value') and item['value'].startswith('https://www.nsfwcorp.com/feed/'):
               self.feeds.append(item['value'])
               return self.feeds
        return self.feeds
--- a/recipes/the_oz.recipe
+++ b/recipes/the_oz.recipe
@ -45,7 +45,8 @@ class DailyTelegraph(BasicNewsRecipe):
                    .caption{font-family:Trebuchet MS,Trebuchet,Helvetica,sans-serif; font-size: xx-small;}
                '''

-    feeds = [       (u'News', u'http://feeds.news.com.au/public/rss/2.0/aus_news_807.xml'),
+    feeds = [
+        (u'News', u'http://feeds.news.com.au/public/rss/2.0/aus_news_807.xml'),
        (u'Opinion', u'http://feeds.news.com.au/public/rss/2.0/aus_opinion_58.xml'),
        (u'The Nation', u'http://feeds.news.com.au/public/rss/2.0/aus_the_nation_62.xml'),
        (u'World News', u'http://feeds.news.com.au/public/rss/2.0/aus_world_808.xml'),
@ -68,7 +69,7 @@ class DailyTelegraph(BasicNewsRecipe):
        br = BasicNewsRecipe.get_browser(self)
        if self.username and self.password:
            br.open('http://www.theaustralian.com.au')
-            br.select_form(nr=0)
+            br.select_form(nr=1)
            br['username'] = self.username
            br['password'] = self.password
            raw = br.submit().read()
@ -87,3 +88,4 @@ class DailyTelegraph(BasicNewsRecipe):
        # return br.geturl()


+
--- a/recipes/weblogs_sl.recipe
+++ b/recipes/weblogs_sl.recipe
@ -3,7 +3,7 @@ __license__     = 'GPL v3'
 __copyright__   = '4 February 2011, desUBIKado'
 __author__      = 'desUBIKado'
 __version__     = 'v0.09'
-__date__        = '02, December 2012'
+__date__        = '14, May 2013'
 '''
 http://www.weblogssl.com/
 '''
@ -56,15 +56,16 @@ class weblogssl(BasicNewsRecipe):
                          ,(u'Zona FandoM', u'http://feeds.weblogssl.com/zonafandom')
                          ,(u'Fandemia', u'http://feeds.weblogssl.com/fandemia')
                          ,(u'Tendencias', u'http://feeds.weblogssl.com/trendencias')
-                          ,(u'Beb\xe9s y m\xe1s', u'http://feeds.weblogssl.com/bebesymas')
+                          ,(u'Tendencias Belleza', u'http://feeds.weblogssl.com/trendenciasbelleza')
+                          ,(u'Tendencias Hombre', u'http://feeds.weblogssl.com/trendenciashombre')
+                          ,(u'Tendencias Shopping', u'http://feeds.weblogssl.com/trendenciasshopping')
                          ,(u'Directo al paladar', u'http://feeds.weblogssl.com/directoalpaladar')
                          ,(u'Compradicci\xf3n', u'http://feeds.weblogssl.com/compradiccion')
                          ,(u'Decoesfera', u'http://feeds.weblogssl.com/decoesfera')
                          ,(u'Embelezzia', u'http://feeds.weblogssl.com/embelezzia')
                          ,(u'Vit\xf3nica', u'http://feeds.weblogssl.com/vitonica')
                          ,(u'Ambiente G', u'http://feeds.weblogssl.com/ambienteg')
-                          ,(u'Tendencias Belleza', u'http://feeds.weblogssl.com/trendenciasbelleza')
-                          ,(u'Tendencias Hombre', u'http://feeds.weblogssl.com/trendenciashombre')
+                          ,(u'Beb\xe9s y m\xe1s', u'http://feeds.weblogssl.com/bebesymas')
                          ,(u'Peques y m\xe1s', u'http://feeds.weblogssl.com/pequesymas')
                          ,(u'Motorpasi\xf3n', u'http://feeds.weblogssl.com/motorpasion')
                          ,(u'Motorpasi\xf3n F1', u'http://feeds.weblogssl.com/motorpasionf1')
@ -119,23 +120,6 @@ class weblogssl(BasicNewsRecipe):

        return soup

-    # To obtain the article's original URL from the "feedsportal" one
-    # The following code is thanks to the user "bosplans" of www.mobileread.com
-    # http://www.mobileread.com/forums/showthread.php?t=130297

    def get_article_url(self, article):
-       link = article.get('link', None)
-       if link is None:
-           return article
-       # if link.split('/')[-4]=="xataka2":
-       #     return article.get('feedburner_origlink', article.get('link', article.get('guid')))
-       if link.split('/')[-4]=="xataka2":
           return article.get('guid', None)
-       if link.split('/')[-1]=="story01.htm":
-           link=link.split('/')[-2]
-           a=['0B','0C','0D','0E','0F','0G','0N'  ,'0L0S','0A']
-           b=['.' ,'/' ,'?' ,'-' ,'=' ,'&' ,'.com','www.','0']
-           for i in range(0,len(a)):
-              link=link.replace(a[i],b[i])
-           link="http://"+link
-       return link
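For reference, the block removed from `get_article_url` above undid the escaping that "feedsportal" links apply to the original article URL. A sketch of that decoding as a standalone function, using the same code table as the removed lines. The function name and the sample link are made up for illustration:

```python
# Escape codes and their plain-text meanings, in the order the removed code applied them.
CODES = ['0B', '0C', '0D', '0E', '0F', '0G', '0N', '0L0S', '0A']
PLAIN = ['.', '/', '?', '-', '=', '&', '.com', 'www.', '0']


def decode_feedsportal(link):
    parts = link.split('/')
    if parts[-1] != 'story01.htm':
        return link  # not a feedsportal-style link, leave it untouched
    encoded = parts[-2]
    for code, plain in zip(CODES, PLAIN):
        encoded = encoded.replace(code, plain)
    return 'http://' + encoded


# Hypothetical escaped link, for illustration only.
decoded = decode_feedsportal(
    'http://feeds.example.invalid/c/0L0Sexample0N0Cfoo0Ebar/story01.htm')
```

After this commit the recipe returns the feed entry's `guid` instead, so none of this decoding runs any more.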
--- a/recipes/wirtscafts_woche.recipe
+++ b/recipes/wirtscafts_woche.recipe
@ -0,0 +1,86 @@
+__license__   = 'GPL v3'
+__copyright__ = '2013, Armin Geller'
+
+'''
+Fetch WirtschaftsWoche Online
+'''
+import re
+# import time
+from calibre.web.feeds.news import BasicNewsRecipe
+class WirtschaftsWocheOnline(BasicNewsRecipe):
+    title                 = u'WirtschaftsWoche Online'
+    __author__            = 'Hegi'  # Update AGE 2013-01-05; Modified by Hegi 2013-04-28
+    description           = u'Wirtschaftswoche Online - basierend auf den RSS-Feeds von Wiwo.de'
+    tags 	                = 'Nachrichten, Blog, Wirtschaft'
+    publisher             = 'Verlagsgruppe Handelsblatt GmbH / Redaktion WirtschaftsWoche Online'
+    category              = 'business, economy, news, Germany'
+    publication_type      = 'weekly magazine'
+    language              = 'de'
+    oldest_article        = 7
+    max_articles_per_feed = 100
+    simultaneous_downloads= 20
+
+    auto_cleanup          = False
+    no_stylesheets        = True
+    remove_javascript     = True
+    remove_empty_feeds    = True
+
+    # don't repeat articles from "Schlagzeilen" / "Exklusiv" in the other sections
+    ignore_duplicate_articles = {'title', 'url'}
+
+    # if you want to reduce size for a b/w or E-ink device, uncomment this:
+    # compress_news_images  = True
+    # compress_news_images_auto_size = 16
+    # scale_news_images     = (400,300)
+
+    timefmt               = ' [%a, %d %b %Y]'
+
+    conversion_options    = {'smarten_punctuation' : True,
+                        'authors'		  : publisher,
+                        'publisher'  	  : publisher}
+    language              = 'de_DE'
+    encoding              = 'UTF-8'
+    cover_source          = 'http://www.wiwo-shop.de/wirtschaftswoche/wirtschaftswoche-emagazin-p1952.html'
+    masthead_url          = 'http://www.wiwo.de/images/wiwo_logo/5748610/1-formatOriginal.png'
+
+    def get_cover_url(self):
+        cover_source_soup = self.index_to_soup(self.cover_source)
+        preview_image_div = cover_source_soup.find(attrs={'class':'container vorschau'})
+        return 'http://www.wiwo-shop.de'+preview_image_div.a.img['src']
+
+    # Insert ". " after "Place" in <span class="hcf-location-mark">Place</span>
+    # If you use .epub format you could also do this as extra_css '.hcf-location-mark:after {content: ". "}'
+    preprocess_regexps    = [(re.compile(r'(<span class="hcf-location-mark">[^<]*)(</span>)',
+                              re.DOTALL|re.IGNORECASE), lambda match: match.group(1) + '. ' + match.group(2))]
+
+    extra_css      =  'h1 {font-size: 1.6em; text-align: left} \
+                       h2 {font-size: 1em; font-style: italic; font-weight: normal} \
+                       h3 {font-size: 1.3em;text-align: left} \
+                       h4, h5, h6, a {font-size: 1em;text-align: left} \
+                       .hcf-caption {font-size: 1em;text-align: left; font-style: italic} \
+                       .hcf-location-mark {font-style: italic}'
+
+    keep_only_tags    = [
+                          dict(name='div', attrs={'class':['hcf-column hcf-column1 hcf-teasercontainer hcf-maincol']}),
+                          dict(name='div', attrs={'id':['contentMain']})
+                        ]
+
+    remove_tags = [
+                    dict(name='div', attrs={'class':['hcf-link-block hcf-faq-open', 'hcf-article-related']})
+                  ]
+
+    feeds = [
+              (u'Schlagzeilen', u'http://www.wiwo.de/contentexport/feed/rss/schlagzeilen'),
+              (u'Exklusiv', u'http://www.wiwo.de/contentexport/feed/rss/exklusiv'),
+              # (u'Themen', u'http://www.wiwo.de/contentexport/feed/rss/themen'), # AGE no print version
+              (u'Unternehmen', u'http://www.wiwo.de/contentexport/feed/rss/unternehmen'),
+              (u'Finanzen', u'http://www.wiwo.de/contentexport/feed/rss/finanzen'),
+              (u'Politik', u'http://www.wiwo.de/contentexport/feed/rss/politik'),
+              (u'Erfolg', u'http://www.wiwo.de/contentexport/feed/rss/erfolg'),
+              (u'Technologie', u'http://www.wiwo.de/contentexport/feed/rss/technologie'),
+              # (u'Green-WiWo', u'http://green.wiwo.de/feed/rss/') # AGE no print version
+            ]
+    def print_version(self, url):
+        main, sep, id = url.rpartition('/')
+        return main + '/v_detail_tab_print/' + id
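
Reviewer note: the two text transforms in this recipe (the location-mark regex and `print_version`) can be tried outside calibre. The sketch below is a minimal standalone version; the sample HTML and the article URL are made up for illustration, and the real recipe runs these inside `BasicNewsRecipe`.

```python
import re

# Same pattern as the recipe: capture the opening span plus its text,
# then the closing tag, so ". " can be inserted between them.
LOCATION_MARK = re.compile(
    r'(<span class="hcf-location-mark">[^<]*)(</span>)',
    re.DOTALL | re.IGNORECASE)


def add_location_separator(html):
    # Append ". " to the text inside every location-mark span.
    return LOCATION_MARK.sub(
        lambda match: match.group(1) + '. ' + match.group(2), html)


def print_version(url):
    # Split on the last '/' so the final path segment is the article id.
    main, sep, article_id = url.rpartition('/')
    return main + '/v_detail_tab_print/' + article_id


# Hypothetical inputs, not taken from the site.
sample_html = '<span class="hcf-location-mark">Berlin</span> Text'
sample_url = 'http://www.wiwo.de/politik/some-article/1234567.html'

print(add_location_separator(sample_html))
print(print_version(sample_url))
```

Note that `rpartition` returns an empty `main` when the URL contains no `/`, so the sketch assumes absolute article URLs as delivered by the feeds.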
+
--- a/recipes/wsj.recipe
+++ b/recipes/wsj.recipe
@ -9,8 +9,9 @@ import copy
 # http://online.wsj.com/page/us_in_todays_paper.html

 def filter_classes(x):
-    if not x: return False
-    bad_classes = {'sTools', 'printSummary', 'mostPopular', 'relatedCollection'}
+    if not x:
+        return False
+    bad_classes = {'articleInsetPoll', 'trendingNow', 'sTools', 'printSummary', 'mostPopular', 'relatedCollection'}
    classes = frozenset(x.split())
    return len(bad_classes.intersection(classes)) > 0

@ -42,14 +43,15 @@ class WallStreetJournal(BasicNewsRecipe):
    remove_tags_before = dict(name='h1')
    remove_tags = [
                    dict(id=["articleTabs_tab_article",
-                        "articleTabs_tab_comments",
-                        'articleTabs_panel_comments', 'footer',
+                             "articleTabs_tab_comments", 'msnLinkback', 'yahooLinkback',
+                        'articleTabs_panel_comments', 'footer', 'emailThisScrim', 'emailConfScrim', 'emailErrorScrim',
                        "articleTabs_tab_interactive", "articleTabs_tab_video",
                        "articleTabs_tab_map", "articleTabs_tab_slideshow",
                        "articleTabs_tab_quotes", "articleTabs_tab_document",
                        "printModeAd", "aFbLikeAuth", "videoModule",
                        "mostRecommendations", "topDiscussions"]),
-                    {'class':['footer_columns','network','insetCol3wide','interactive','video','slideshow','map','insettip','insetClose','more_in', "insetContent", 'articleTools_bottom', 'aTools', "tooltip", "adSummary", "nav-inline"]},
+                    {'class':['footer_columns','hidden', 'network','insetCol3wide','interactive','video','slideshow','map','insettip',
+                        'insetClose','more_in', "insetContent", 'articleTools_bottom', 'aTools', "tooltip", "adSummary", "nav-inline"]},
                    dict(rel='shortcut icon'),
                    {'class':filter_classes},
                    ]
@ -74,7 +76,10 @@ class WallStreetJournal(BasicNewsRecipe):
        for tag in soup.findAll(name=['table', 'tr', 'td']):
            tag.name = 'div'

-        for tag in soup.findAll('div', dict(id=["articleThumbnail_1", "articleThumbnail_2", "articleThumbnail_3", "articleThumbnail_4", "articleThumbnail_5", "articleThumbnail_6", "articleThumbnail_7"])):
+        for tag in soup.findAll('div', dict(id=[
+            "articleThumbnail_1", "articleThumbnail_2", "articleThumbnail_3",
+            "articleThumbnail_4", "articleThumbnail_5", "articleThumbnail_6",
+            "articleThumbnail_7"])):
            tag.extract()

        return soup
@ -199,7 +204,6 @@ class WallStreetJournal(BasicNewsRecipe):

        return articles

-
    def cleanup(self):
        self.browser.open('http://online.wsj.com/logout?url=http://online.wsj.com')
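
Reviewer note: `filter_classes`, as changed earlier in this file, matches whole class tokens rather than substrings. The sketch below copies the function from the diff and exercises it on a few made-up class attribute values; only the sample strings are assumptions.

```python
def filter_classes(x):
    # x is the raw value of a tag's class attribute, or None if absent.
    if not x:
        return False
    bad_classes = {'articleInsetPoll', 'trendingNow', 'sTools',
                   'printSummary', 'mostPopular', 'relatedCollection'}
    # Split on whitespace so each class name is compared as a whole token.
    classes = frozenset(x.split())
    return len(bad_classes.intersection(classes)) > 0


# Hypothetical class attribute values.
for value in ('sTools extra', 'article body', 'trendingNowish', None):
    print(repr(value), filter_classes(value))
```

Because the comparison is set-based, a class such as `trendingNowish` is kept even though it starts with a filtered name.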

--- a/resources/default_tweaks.py
+++ b/resources/default_tweaks.py
@ -32,7 +32,7 @@ defaults.
 # Set the use_series_auto_increment_tweak_when_importing tweak to True to
 # use the above values when importing/adding books. If this tweak is set to
 # False (the default) then the series number will be set to 1 if it is not
-# explicitly set to during the import. If set to True, then the
+# explicitly set during the import. If set to True, then the
 # series index will be set according to the series_index_auto_increment setting.
 # Note that the use_series_auto_increment_tweak_when_importing tweak is used
 # only when a value is not provided during import. If the importing regular
@ -536,3 +536,4 @@ many_libraries = 10
 # yellow when using a Virtual Library. By setting this to False, you can turn
 # that off.
 highlight_virtual_library_book_count = True
+
--- a/setup/installer/linux/freeze2.py
+++ b/setup/installer/linux/freeze2.py
@ -38,7 +38,7 @@ binary_includes = [
                '/lib/libz.so.1',
                '/usr/lib/libtiff.so.5',
                '/lib/libbz2.so.1',
-                '/usr/lib/libpoppler.so.28',
+                '/usr/lib/libpoppler.so.37',
                '/usr/lib/libxml2.so.2',
                '/usr/lib/libopenjpeg.so.2',
                '/usr/lib/libxslt.so.1',
--- a/setup/installer/osx/app/main.py
+++ b/setup/installer/osx/app/main.py
@ -378,7 +378,7 @@ class Py2App(object):
    @flush
    def add_poppler(self):
        info('\nAdding poppler')
-        for x in ('libpoppler.28.dylib',):
+        for x in ('libpoppler.37.dylib',):
            self.install_dylib(os.path.join(SW, 'lib', x))
        for x in ('pdftohtml', 'pdftoppm', 'pdfinfo'):
            self.install_dylib(os.path.join(SW, 'bin', x), False)
--- a/setup/installer/windows/notes.rst
+++ b/setup/installer/windows/notes.rst
@ -116,7 +116,9 @@ tarball. Edit setup.py and set zip_safe=False. Then run::

 Run the following command to install python dependencies::

-    easy_install --always-unzip -U mechanize pyreadline python-dateutil dnspython cssutils clientform pycrypto cssselect
+    easy_install --always-unzip -U mechanize python-dateutil dnspython cssutils clientform pycrypto cssselect
+
+Install pyreadline from https://pypi.python.org/pypi/pyreadline/2.0

 Install pywin32 and edit win32com\__init__.py setting _frozen = True and
 __gen_path__ to a temp dir (otherwise it tries to set it to a dir in the
--- a/setup/iso_639/ca.po
+++ b/setup/iso_639/ca.po
@ -12,14 +12,14 @@ msgstr ""
 "Report-Msgid-Bugs-To: Debian iso-codes team <pkg-isocodes-"
 "devel@lists.alioth.debian.org>\n"
 "POT-Creation-Date: 2011-11-25 14:01+0000\n"
-"PO-Revision-Date: 2013-04-21 08:00+0000\n"
+"PO-Revision-Date: 2013-05-06 09:36+0000\n"
 "Last-Translator: Ferran Rius <frius64@hotmail.com>\n"
 "Language-Team: Catalan <linux@softcatala.org>\n"
 "MIME-Version: 1.0\n"
 "Content-Type: text/plain; charset=UTF-8\n"
 "Content-Transfer-Encoding: 8bit\n"
-"X-Launchpad-Export-Date: 2013-04-22 05:23+0000\n"
-"X-Generator: Launchpad (build 16567)\n"
+"X-Launchpad-Export-Date: 2013-05-07 05:28+0000\n"
+"X-Generator: Launchpad (build 16598)\n"
 "Language: ca\n"

 #. name for aaa
@ -2024,7 +2024,7 @@ msgstr "Àzeri meridional"

 #. name for aze
 msgid "Azerbaijani"
-msgstr "Serbi"
+msgstr ""

 #. name for azg
 msgid "Amuzgo; San Pedro Amuzgos"
@ -7288,7 +7288,7 @@ msgstr "Epie"

 #. name for epo
 msgid "Esperanto"
-msgstr "Alemany"
+msgstr "Esperanto"

 #. name for era
 msgid "Eravallan"
@ -21816,7 +21816,7 @@ msgstr "Ramoaaina"

 #. name for raj
 msgid "Rajasthani"
-msgstr "Marwari"
+msgstr ""

 #. name for rak
 msgid "Tulu-Bohuai"
--- a/setup/iso_639/cs.po
+++ b/setup/iso_639/cs.po
@ -13762,7 +13762,7 @@ msgstr ""

 #. name for lav
 msgid "Latvian"
-msgstr "litevština"
+msgstr ""

 #. name for law
 msgid "Lauje"
--- a/setup/iso_639/da.po
+++ b/setup/iso_639/da.po
@ -1429,7 +1429,7 @@ msgstr ""

 #. name for arg
 msgid "Aragonese"
-msgstr "Færøsk"
+msgstr ""

 #. name for arh
 msgid "Arhuaco"
--- a/setup/iso_639/de.po
+++ b/setup/iso_639/de.po
@ -18,14 +18,14 @@ msgstr ""
 "Report-Msgid-Bugs-To: Debian iso-codes team <pkg-isocodes-"
 "devel@lists.alioth.debian.org>\n"
 "POT-Creation-Date: 2011-11-25 14:01+0000\n"
-"PO-Revision-Date: 2013-04-11 13:29+0000\n"
+"PO-Revision-Date: 2013-05-06 09:41+0000\n"
 "Last-Translator: Simon Schütte <simonschuette@arcor.de>\n"
 "Language-Team: Ubuntu German Translators\n"
 "MIME-Version: 1.0\n"
 "Content-Type: text/plain; charset=UTF-8\n"
 "Content-Transfer-Encoding: 8bit\n"
-"X-Launchpad-Export-Date: 2013-04-12 05:20+0000\n"
-"X-Generator: Launchpad (build 16564)\n"
+"X-Launchpad-Export-Date: 2013-05-07 05:29+0000\n"
+"X-Generator: Launchpad (build 16598)\n"
 "Language: de\n"

 #. name for aaa
@ -319,7 +319,7 @@ msgstr "Adangme"

 #. name for adb
 msgid "Adabe"
-msgstr "Adangme"
+msgstr "Adabe"

 #. name for add
 msgid "Dzodinka"
@ -367,7 +367,7 @@ msgstr "Adap"

 #. name for adq
 msgid "Adangbe"
-msgstr "Adangme"
+msgstr "Adangbe"

 #. name for adr
 msgid "Adonara"
--- a/setup/iso_639/eu.po
+++ b/setup/iso_639/eu.po
@ -2022,7 +2022,7 @@ msgstr ""

 #. name for aze
 msgid "Azerbaijani"
-msgstr "Turkiera"
+msgstr ""

 #. name for azg
 msgid "Amuzgo; San Pedro Amuzgos"
@ -13126,7 +13126,7 @@ msgstr ""

 #. name for kur
 msgid "Kurdish"
-msgstr "Turkiera"
+msgstr ""

 #. name for kus
 msgid "Kusaal"
@ -16190,7 +16190,7 @@ msgstr ""

 #. name for mlt
 msgid "Maltese"
-msgstr "Koreera"
+msgstr ""

 #. name for mlu
 msgid "To'abaita"
--- a/setup/iso_639/gl.po
+++ b/setup/iso_639/gl.po
@ -13764,7 +13764,7 @@ msgstr "Laba"

 #. name for lav
 msgid "Latvian"
-msgstr "Lituano"
+msgstr ""

 #. name for law
 msgid "Lauje"
@ -22212,7 +22212,7 @@ msgstr "Roglai do norte"

 #. name for roh
 msgid "Romansh"
-msgstr "Romanés"
+msgstr ""

 #. name for rol
 msgid "Romblomanon"
--- a/setup/iso_639/hu.po
+++ b/setup/iso_639/hu.po
@ -20538,7 +20538,7 @@ msgstr ""

 #. name for peo
 msgid "Persian; Old (ca. 600-400 B.C.)"
-msgstr "perzsa"
+msgstr ""

 #. name for pep
 msgid "Kunja"
--- a/setup/iso_639/is.po
+++ b/setup/iso_639/is.po
@ -15049,7 +15049,7 @@ msgstr "Magahi"

 #. name for mah
 msgid "Marshallese"
-msgstr "Maltneska"
+msgstr ""

 #. name for mai
 msgid "Maithili"
--- a/setup/iso_639/ko.po
+++ b/setup/iso_639/ko.po
@ -3742,7 +3742,7 @@ msgstr ""

 #. name for bre
 msgid "Breton"
-msgstr "프랑스어"
+msgstr ""

 #. name for brf
 msgid "Bera"
--- a/setup/iso_639/mr.po
+++ b/setup/iso_639/mr.po
@ -6804,7 +6804,7 @@ msgstr "डोगोन; तेबुल उरे"

 #. name for dua
 msgid "Duala"
-msgstr "ड्युला"
+msgstr ""

 #. name for dub
 msgid "Dubli"
--- a/setup/iso_639/nb.po
+++ b/setup/iso_639/nb.po
@ -27790,7 +27790,7 @@ msgstr ""

 #. name for wln
 msgid "Walloon"
-msgstr "Vietnamesisk"
+msgstr ""

 #. name for wlo
 msgid "Wolio"
--- a/setup/iso_639/oc.po
+++ b/setup/iso_639/oc.po
@ -9862,7 +9862,7 @@ msgstr "Hya"

 #. name for hye
 msgid "Armenian"
-msgstr "Albanés"
+msgstr ""

 #. name for iai
 msgid "Iaai"
@ -13762,7 +13762,7 @@ msgstr "Laba"

 #. name for lav
 msgid "Latvian"
-msgstr "Lituanian"
+msgstr ""

 #. name for law
 msgid "Lauje"
--- a/setup/iso_639/ru.po
+++ b/setup/iso_639/ru.po
@ -13,14 +13,14 @@ msgstr ""
 "Report-Msgid-Bugs-To: Debian iso-codes team <pkg-isocodes-"
 "devel@lists.alioth.debian.org>\n"
 "POT-Creation-Date: 2011-11-25 14:01+0000\n"
-"PO-Revision-Date: 2013-03-23 10:17+0000\n"
+"PO-Revision-Date: 2013-05-21 06:13+0000\n"
 "Last-Translator: Глория Хрусталёва <gloriya@hushmail.com>\n"
 "Language-Team: Russian <debian-l10n-russian@lists.debian.org>\n"
 "MIME-Version: 1.0\n"
 "Content-Type: text/plain; charset=UTF-8\n"
 "Content-Transfer-Encoding: 8bit\n"
-"X-Launchpad-Export-Date: 2013-03-24 04:45+0000\n"
-"X-Generator: Launchpad (build 16540)\n"
+"X-Launchpad-Export-Date: 2013-05-22 04:38+0000\n"
+"X-Generator: Launchpad (build 16626)\n"
 "Language: ru\n"

 #. name for aaa
@ -2089,7 +2089,7 @@ msgstr "Башкирский"

 #. name for bal
 msgid "Baluchi"
-msgstr "Балийский"
+msgstr ""

 #. name for bam
 msgid "Bambara"
@ -5361,7 +5361,7 @@ msgstr ""

 #. name for coa
 msgid "Malay; Cocos Islands"
-msgstr ""
+msgstr "Малайский; Кокосовые острова"

 #. name for cob
 msgid "Chicomuceltec"
--- a/setup/iso_639/sk.po
+++ b/setup/iso_639/sk.po
@ -13763,7 +13763,7 @@ msgstr ""

 #. name for lav
 msgid "Latvian"
-msgstr "Lotyšský"
+msgstr ""

 #. name for law
 msgid "Lauje"
--- a/setup/iso_639/sv.po
+++ b/setup/iso_639/sv.po
--- a/setup/iso_639/zh_CN.po
+++ b/setup/iso_639/zh_CN.po
@ -1016,7 +1016,7 @@ msgstr ""

 #. name for amh
 msgid "Amharic"
-msgstr "阿拉伯语"
+msgstr ""

 #. name for ami
 msgid "Amis"
--- a/setup/translations.py
+++ b/setup/translations.py
@ -63,7 +63,6 @@ class POT(Command): # {{{

        return '\n'.join(ans)

-
    def run(self, opts):
        pot_header = textwrap.dedent('''\
        # Translation template file..
@ -117,7 +116,6 @@ class POT(Command): # {{{
                f.write(src)
            self.info('Translations template:', os.path.abspath(pot))

-
        return pot
 # }}}

@ -134,6 +132,7 @@ class Translations(POT): # {{{
        return locale, os.path.join(self.DEST, locale, 'messages.mo')

    def run(self, opts):
+        self.iso639_errors = []
        for f in self.po_files():
            locale, dest = self.mo_file(f)
            base = os.path.dirname(dest)
@ -146,18 +145,46 @@ class Translations(POT): # {{{
                    '%s.po'%iscpo)

            if os.path.exists(iso639):
+                self.check_iso639(iso639)
                dest = self.j(self.d(dest), 'iso639.mo')
                if self.newer(dest, iso639):
-                    self.info('\tCopying ISO 639 translations')
+                    self.info('\tCopying ISO 639 translations for %s' % iscpo)
                    subprocess.check_call(['msgfmt', '-o', dest, iso639])
            elif locale not in ('en_GB', 'en_CA', 'en_AU', 'si', 'ur', 'sc',
                    'ltg', 'nds', 'te', 'yi', 'fo', 'sq', 'ast', 'ml', 'ku',
                    'fr_CA', 'him', 'jv', 'ka', 'fur', 'ber'):
                self.warn('No ISO 639 translations for locale:', locale)

+        if self.iso639_errors:
+            for err in self.iso639_errors:
+                print (err)
+            raise SystemExit(1)
+
        self.write_stats()
        self.freeze_locales()

+    def check_iso639(self, path):
+        from calibre.utils.localization import langnames_to_langcodes
+        with open(path, 'rb') as f:
+            raw = f.read()
+        rmap = {}
+        msgid = None
+        for match in re.finditer(r'^(msgid|msgstr)\s+"(.*?)"', raw, re.M):
+            if match.group(1) == 'msgid':
+                msgid = match.group(2)
+            else:
+                msgstr = match.group(2)
+                if not msgstr:
+                    continue
+                omsgid = rmap.get(msgstr, None)
+                if omsgid is not None:
+                    cm = langnames_to_langcodes([omsgid, msgid])
+                    if cm[msgid] and cm[omsgid] and cm[msgid] != cm[omsgid]:
+                        self.iso639_errors.append('In file %s the name %s is used as translation for both %s and %s' % (
+                            os.path.basename(path), msgstr, msgid, rmap[msgstr]))
+                    # raise SystemExit(1)
+                rmap[msgstr] = msgid
+
    def freeze_locales(self):
        zf = self.DEST + '.zip'
        from calibre import CurrentDir
@ -191,7 +218,6 @@ class Translations(POT): # {{{
            locale = self.mo_file(f)[0]
            stats[locale] = min(1.0, float(trans)/total)

-
        import cPickle
        cPickle.dump(stats, open(dest, 'wb'), -1)
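
Reviewer note: the core of the new `check_iso639` method is a reverse map from translated name to source name. The sketch below is a simplified standalone version under stated assumptions: it omits the `langnames_to_langcodes` comparison that the real method uses to ignore names mapping to the same language code, the function name `find_duplicate_translations` is hypothetical, and the sample PO text (including the "Lithuanian" entry) is invented to mirror the kind of error this commit removes.

```python
import re


def find_duplicate_translations(raw):
    # Map each non-empty msgstr back to the msgid that last used it.
    rmap = {}
    errors = []
    msgid = None
    for match in re.finditer(r'^(msgid|msgstr)\s+"(.*?)"', raw, re.M):
        kind, text = match.group(1), match.group(2)
        if kind == 'msgid':
            msgid = text
            continue
        if not text:
            # Untranslated entries cannot collide.
            continue
        previous = rmap.get(text)
        if previous is not None and previous != msgid:
            errors.append((text, previous, msgid))
        rmap[text] = msgid
    return errors


# Invented sample in the style of the iso_639 .po files.
sample = '''\
#. name for lav
msgid "Latvian"
msgstr "Lituano"

#. name for lit
msgid "Lithuanian"
msgstr "Lituano"

#. name for epo
msgid "Esperanto"
msgstr ""
'''

print(find_duplicate_translations(sample))
```

Like the method in the diff, this only reads the first quoted string on each `msgid`/`msgstr` line, so multi-line PO strings are not handled.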

--- a/src/calibre/constants.py
+++ b/src/calibre/constants.py
@ -4,7 +4,7 @@ __license__   = 'GPL v3'
 __copyright__ = '2008, Kovid Goyal kovid@kovidgoyal.net'
 __docformat__ = 'restructuredtext en'
 __appname__   = u'calibre'
-numeric_version = (0, 9, 29)
+numeric_version = (0, 9, 31)
 __version__   = u'.'.join(map(unicode, numeric_version))
 __author__    = u"Kovid Goyal <kovid@kovidgoyal.net>"

@ -66,10 +66,8 @@ else:
            filesystem_encoding = 'utf-8'
            # On linux, unicode arguments to os file functions are coerced to an ascii
            # bytestring if sys.getfilesystemencoding() == 'ascii', which is
-            # just plain dumb. So issue a warning.
-            print ('WARNING: You do not have the LANG environment variable set correctly. '
-                    'This will cause problems with non-ascii filenames. '
-                    'Set it to something like en_US.UTF-8.\n')
+            # just plain dumb. This is fixed by the icu.py module which, when
+            # imported, changes ascii to utf-8.
    except:
        filesystem_encoding = 'utf-8'

--- a/src/calibre/customize/builtins.py
+++ b/src/calibre/customize/builtins.py
@ -1476,6 +1476,7 @@ class StoreKoobeStore(StoreBase):
    drm_free_only = True
    headquarters = 'PL'
    formats = ['EPUB', 'MOBI', 'PDF']
+    affiliate = True

 class StoreLegimiStore(StoreBase):
    name = 'Legimi'
@ -1548,12 +1549,13 @@ class StoreNextoStore(StoreBase):

 class StoreNookUKStore(StoreBase):
    name = 'Nook UK'
-    author = 'John Schember'
-    description = u'Barnes & Noble S.à r.l, a subsidiary of Barnes & Noble, Inc., a leading retailer of content, digital media and educational products, is proud to bring the award-winning NOOK® reading experience and a leading digital bookstore to the UK.'  # noqa
+    author = 'Charles Haley'
+    description = u'Barnes & Noble S.A.R.L, a subsidiary of Barnes & Noble, Inc., a leading retailer of content, digital media and educational products, is proud to bring the award-winning NOOK reading experience and a leading digital bookstore to the UK.'  # noqa
    actual_plugin = 'calibre.gui2.store.stores.nook_uk_plugin:NookUKStore'

    headquarters = 'UK'
    formats = ['NOOK']
+    affiliate = True

 class StoreOpenBooksStore(StoreBase):
    name = 'Open Books'
@ -1659,6 +1661,7 @@ class StoreWoblinkStore(StoreBase):

    headquarters = 'PL'
    formats = ['EPUB', 'MOBI', 'PDF', 'WOBLINK']
+    affiliate = True

 class XinXiiStore(StoreBase):
    name = 'XinXii'
--- a/src/calibre/devices/android/driver.py
+++ b/src/calibre/devices/android/driver.py
@ -219,7 +219,7 @@ class ANDROID(USBMS):
            'POCKET', 'ONDA_MID', 'ZENITHIN', 'INGENIC', 'PMID701C', 'PD',
            'PMP5097C', 'MASS', 'NOVO7', 'ZEKI', 'COBY', 'SXZ', 'USB_2.0',
            'COBY_MID', 'VS', 'AINOL', 'TOPWISE', 'PAD703', 'NEXT8D12',
-            'MEDIATEK', 'KEENHI', 'TECLAST', 'SURFTAB']
+            'MEDIATEK', 'KEENHI', 'TECLAST', 'SURFTAB', 'XENTA',]
    WINDOWS_MAIN_MEM = ['ANDROID_PHONE', 'A855', 'A853', 'A953', 'INC.NEXUS_ONE',
            '__UMS_COMPOSITE', '_MB200', 'MASS_STORAGE', '_-_CARD', 'SGH-I897',
            'GT-I9000', 'FILE-STOR_GADGET', 'SGH-T959_CARD', 'SGH-T959', 'SAMSUNG_ANDROID',
@ -240,7 +240,9 @@ class ANDROID(USBMS):
            'ADVANCED', 'SGH-I727', 'USB_FLASH_DRIVER', 'ANDROID',
            'S5830I_CARD', 'MID7042', 'LINK-CREATE', '7035', 'VIEWPAD_7E',
            'NOVO7', 'MB526', '_USB#WYK7MSF8KE', 'TABLET_PC', 'F', 'MT65XX_MS',
-            'ICS', 'E400', '__FILE-STOR_GADG', 'ST80208-1', 'GT-S5660M_CARD', 'XT894']
+            'ICS', 'E400', '__FILE-STOR_GADG', 'ST80208-1', 'GT-S5660M_CARD', 'XT894', '_USB',
+            'PROD_TAB13-201',
+    ]
    WINDOWS_CARD_A_MEM = ['ANDROID_PHONE', 'GT-I9000_CARD', 'SGH-I897',
            'FILE-STOR_GADGET', 'SGH-T959_CARD', 'SGH-T959', 'SAMSUNG_ANDROID', 'GT-P1000_CARD',
            'A70S', 'A101IT', '7', 'INCREDIBLE', 'A7EB', 'SGH-T849_CARD',
@ -251,7 +253,9 @@ class ANDROID(USBMS):
            'FILE-CD_GADGET', 'GT-I9001_CARD', 'USB_2.0', 'XT875',
            'UMS_COMPOSITE', 'PRO', '.KOBO_VOX', 'SGH-T989_CARD', 'SGH-I727',
            'USB_FLASH_DRIVER', 'ANDROID', 'MID7042', '7035', 'VIEWPAD_7E',
-            'NOVO7', 'ADVANCED', 'TABLET_PC', 'F', 'E400_SD_CARD', 'ST80208-1', 'XT894']
+            'NOVO7', 'ADVANCED', 'TABLET_PC', 'F', 'E400_SD_CARD', 'ST80208-1', 'XT894',
+            '_USB', 'PROD_TAB13-201',
+    ]

    OSX_MAIN_MEM = 'Android Device Main Memory'

@ -366,7 +370,6 @@ class WEBOS(USBMS):
        except ImportError:
            import Image, ImageDraw

-
        coverdata = getattr(metadata, 'thumbnail', None)
        if coverdata and coverdata[2]:
            cover = Image.open(cStringIO.StringIO(coverdata[2]))
@ -415,3 +418,4 @@ class WEBOS(USBMS):
            coverfile.write(coverdata)


+
--- a/src/calibre/devices/apple/driver.py
+++ b/src/calibre/devices/apple/driver.py
--- a/src/calibre/devices/blackberry/driver.py
+++ b/src/calibre/devices/blackberry/driver.py
@ -19,10 +19,10 @@ class BLACKBERRY(USBMS):

    VENDOR_ID   = [0x0fca]
    PRODUCT_ID  = [0x8004, 0x0004]
-    BCD         = [0x0200, 0x0107, 0x0210, 0x0201, 0x0211, 0x0220]
+    BCD         = [0x0200, 0x0107, 0x0210, 0x0201, 0x0211, 0x0220, 0x232]

    VENDOR_NAME = 'RIM'
-    WINDOWS_MAIN_MEM = 'BLACKBERRY_SD'
+    WINDOWS_MAIN_MEM = WINDOWS_CARD_A_MEM = ['BLACKBERRY_SD', 'BLACKBERRY']

    MAIN_MEMORY_VOLUME_LABEL  = 'Blackberry SD Card'

--- a/src/calibre/devices/eb600/driver.py
+++ b/src/calibre/devices/eb600/driver.py
@ -279,11 +279,11 @@ class POCKETBOOK602(USBMS):
 class POCKETBOOK622(POCKETBOOK602):

    name = 'PocketBook 622 Device Interface'
-    description    = _('Communicate with the PocketBook 622 reader.')
+    description    = _('Communicate with the PocketBook 622 and 623 readers.')
    EBOOK_DIR_MAIN = ''

    VENDOR_ID   = [0x0489]
-    PRODUCT_ID  = [0xe107]
+    PRODUCT_ID  = [0xe107, 0xcff1]
    BCD         = [0x0326]

    VENDOR_NAME = 'LINUX'
--- a/src/calibre/devices/idevice/init.py
+++ b/src/calibre/devices/idevice/init.py
@ -0,0 +1,2 @@
+__license__   = 'GPL v3'
+__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'
--- a/src/calibre/devices/idevice/libimobiledevice.py
+++ b/src/calibre/devices/idevice/libimobiledevice.py
--- a/src/calibre/devices/idevice/parse_xml.py
+++ b/src/calibre/devices/idevice/parse_xml.py
@ -0,0 +1,300 @@
+#!/usr/bin/env python
+from __future__ import (unicode_literals, division, absolute_import,
+                        print_function)
+
+"""
+https://github.com/ishikawa/python-plist-parser/blob/master/plist_parser.py
+
+`Property Lists`_ are a data representation used in Apple's Mac OS X as
+a convenient way to store standard object types, such as strings, numbers,
+booleans, and container objects.
+
+This file contains a class ``XmlPropertyListParser`` that parses
+a property list file and returns a native Python data structure.
+
+    :copyright: 2008 by Takanori Ishikawa <takanori.ishikawa@gmail.com>
+    :license: MIT (See LICENSE file for more details)
+
+.. _Property Lists: http://developer.apple.com/documentation/Cocoa/Conceptual/PropertyLists/
+"""
+
+
+class PropertyListParseError(Exception):
+    """Raised when parsing a property list fails."""
+    pass
+
+
+class XmlPropertyListParser(object):
+    """
+    The ``XmlPropertyListParser`` class provides methods that
+    convert `Property Lists`_ objects from xml format.
+    Property list objects include ``string``, ``unicode``,
+    ``list``, ``dict``, ``datetime``, and ``int`` or ``float``.
+
+        :copyright: 2008 by Takanori Ishikawa <takanori.ishikawa@gmail.com>
+        :license: MIT License
+
+    .. _Property Lists: http://developer.apple.com/documentation/Cocoa/Conceptual/PropertyLists/
+    """
+
+    def _assert(self, test, message):
+        if not test:
+            raise PropertyListParseError(message)
+
+    # ------------------------------------------------
+    # SAX2: ContentHandler
+    # ------------------------------------------------
+    def setDocumentLocator(self, locator):
+        pass
+
+    def startPrefixMapping(self, prefix, uri):
+        pass
+
+    def endPrefixMapping(self, prefix):
+        pass
+
+    def startElementNS(self, name, qname, attrs):
+        pass
+
+    def endElementNS(self, name, qname):
+        pass
+
+    def ignorableWhitespace(self, whitespace):
+        pass
+
+    def processingInstruction(self, target, data):
+        pass
+
+    def skippedEntity(self, name):
+        pass
+
+    def startDocument(self):
+        self.__stack = []
+        self.__plist = self.__key = self.__characters = None
+        # For reducing runtime type checking,
+        # the parser caches top level object type.
+        self.__in_dict = False
+
+    def endDocument(self):
+        self._assert(self.__plist is not None, "A top level element must be <plist>.")
+        self._assert(
+            len(self.__stack) == 0,
+            "multiple objects at top level.")
+
+    def startElement(self, name, attributes):
+        if name in XmlPropertyListParser.START_CALLBACKS:
+            XmlPropertyListParser.START_CALLBACKS[name](self, name, attributes)
+        if name in XmlPropertyListParser.PARSE_CALLBACKS:
+            self.__characters = []
+
+    def endElement(self, name):
+        if name in XmlPropertyListParser.END_CALLBACKS:
+            XmlPropertyListParser.END_CALLBACKS[name](self, name)
+        if name in XmlPropertyListParser.PARSE_CALLBACKS:
+            # Creates character string from buffered characters.
+            content = ''.join(self.__characters)
+            # For compatibility with ``xml.etree`` and ``plistlib``,
+            # convert text string to ascii, if possible
+            try:
+                content = content.encode('ascii')
+            except (UnicodeError, AttributeError):
+                pass
+            XmlPropertyListParser.PARSE_CALLBACKS[name](self, name, content)
+            self.__characters = None
+
+    def characters(self, content):
+        if self.__characters is not None:
+            self.__characters.append(content)
+
+    # ------------------------------------------------
+    # XmlPropertyListParser private
+    # ------------------------------------------------
+    def _push_value(self, value):
+        if not self.__stack:
+            self._assert(self.__plist is None, "Multiple objects at top level")
+            self.__plist = value
+        else:
+            top = self.__stack[-1]
+            #assert isinstance(top, (dict, list))
+            if self.__in_dict:
+                k = self.__key
+                if k is None:
+                    raise PropertyListParseError("Missing key for dictionary.")
+                top[k] = value
+                self.__key = None
+            else:
+                top.append(value)
+
+    def _push_stack(self, value):
+        self.__stack.append(value)
+        self.__in_dict = isinstance(value, dict)
+
+    def _pop_stack(self):
+        self.__stack.pop()
+        self.__in_dict = self.__stack and isinstance(self.__stack[-1], dict)
+
+    def _start_plist(self, name, attrs):
+        self._assert(not self.__stack and self.__plist is None, "<plist> more than once.")
+        self._assert(attrs.get('version', '1.0') == '1.0',
+                     "version 1.0 is only supported, but was '%s'." % attrs.get('version'))
+
+    def _start_array(self, name, attrs):
+        v = list()
+        self._push_value(v)
+        self._push_stack(v)
+
+    def _start_dict(self, name, attrs):
+        v = dict()
+        self._push_value(v)
+        self._push_stack(v)
+
+    def _end_array(self, name):
+        self._pop_stack()
+
+    def _end_dict(self, name):
+        if self.__key is not None:
+            raise PropertyListParseError("Missing value for key '%s'" % self.__key)
+        self._pop_stack()
+
+    def _start_true(self, name, attrs):
+        self._push_value(True)
+
+    def _start_false(self, name, attrs):
+        self._push_value(False)
+
+    def _parse_key(self, name, content):
+        if not self.__in_dict:
+            print("XmlPropertyListParser() WARNING: ignoring <key>%s</key> (<key> elements must be contained in <dict> element)" % content)
+            #raise PropertyListParseError("<key> element '%s' must be in <dict> element." % content)
+        else:
+            self.__key = content
+
+    def _parse_string(self, name, content):
+        self._push_value(content)
+
+    def _parse_data(self, name, content):
+        import base64
+        self._push_value(base64.b64decode(content))
+
+    # http://www.apple.com/DTDs/PropertyList-1.0.dtd says:
+    #
+    # Contents should conform to a subset of ISO 8601
+    # (in particular, YYYY '-' MM '-' DD 'T' HH ':' MM ':' SS 'Z'.
+    # Smaller units may be omitted with a loss of precision)
+    import re
+    DATETIME_PATTERN = re.compile(r"(?P<year>\d\d\d\d)(?:-(?P<month>\d\d)(?:-(?P<day>\d\d)(?:T(?P<hour>\d\d)(?::(?P<minute>\d\d)(?::(?P<second>\d\d))?)?)?)?)?Z$")
+
+    def _parse_date(self, name, content):
+        import datetime
+
+        units = ('year', 'month', 'day', 'hour', 'minute', 'second', )
+        pattern = XmlPropertyListParser.DATETIME_PATTERN
+        match = pattern.match(content)
+        if not match:
+            raise PropertyListParseError("Failed to parse datetime '%s'" % content)
+
+        groups, components = match.groupdict(), []
+        for key in units:
+            value = groups[key]
+            if value is None:
+                break
+            components.append(int(value))
+        while len(components) < 3:
+            components.append(1)
+
+        d = datetime.datetime(*components)
+        self._push_value(d)
+
+    def _parse_real(self, name, content):
+        self._push_value(float(content))
+
+    def _parse_integer(self, name, content):
+        self._push_value(int(content))
+
+    START_CALLBACKS = {
+        'plist': _start_plist,
+        'array': _start_array,
+        'dict': _start_dict,
+        'true': _start_true,
+        'false': _start_false,
+    }
+
+    END_CALLBACKS = {
+        'array': _end_array,
+        'dict': _end_dict,
+    }
+
+    PARSE_CALLBACKS = {
+        'key': _parse_key,
+        'string': _parse_string,
+        'data': _parse_data,
+        'date': _parse_date,
+        'real': _parse_real,
+        'integer': _parse_integer,
+    }
+
+    # ------------------------------------------------
+    # XmlPropertyListParser
+    # ------------------------------------------------
+    def _to_stream(self, io_or_string):
+        if isinstance(io_or_string, basestring):
+            # Creates a string stream for in-memory contents.
+            from cStringIO import StringIO
+            return StringIO(io_or_string)
+        elif hasattr(io_or_string, 'read') and callable(getattr(io_or_string, 'read')):
+            return io_or_string
+        else:
+            raise TypeError('Can\'t convert %s to file-like-object' % type(io_or_string))
+
+    def _parse_using_etree(self, xml_input):
+        from xml.etree.cElementTree import iterparse
+
+        parser = iterparse(self._to_stream(xml_input), events=(b'start', b'end'))
+        self.startDocument()
+        try:
+            for action, element in parser:
+                name = element.tag
+                if action == 'start':
+                    if name in XmlPropertyListParser.START_CALLBACKS:
+                        XmlPropertyListParser.START_CALLBACKS[name](self, element.tag, element.attrib)
+                elif action == 'end':
+                    if name in XmlPropertyListParser.END_CALLBACKS:
+                        XmlPropertyListParser.END_CALLBACKS[name](self, name)
+                    if name in XmlPropertyListParser.PARSE_CALLBACKS:
+                        XmlPropertyListParser.PARSE_CALLBACKS[name](self, name, element.text or "")
+                    element.clear()
+        except SyntaxError, e:
+            raise PropertyListParseError(e)
+
+        self.endDocument()
+        return self.__plist
+
+    def _parse_using_sax_parser(self, xml_input):
+        from xml.sax import make_parser, xmlreader, SAXParseException
+        source = xmlreader.InputSource()
+        source.setByteStream(self._to_stream(xml_input))
+        reader = make_parser()
+        reader.setContentHandler(self)
+        try:
+            reader.parse(source)
+        except SAXParseException, e:
+            raise PropertyListParseError(e)
+
+        return self.__plist
+
+    def parse(self, xml_input):
+        """
+        Parse the property list (`.plist`, `.xml`, for example) ``xml_input``,
+        which can be either a string or a file-like object.
+
+        >>> parser = XmlPropertyListParser()
+        >>> parser.parse(r'<plist version="1.0">'
+        ...              r'<dict><key>Python</key><string>.py</string></dict>'
+        ...              r'</plist>')
+        {'Python': '.py'}
+        """
+        try:
+            return self._parse_using_etree(xml_input)
+        except ImportError:
+            # No xml.etree.cElementTree found.
+            return self._parse_using_sax_parser(xml_input)
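
Reviewer note: the date handling in `_parse_date` accepts truncated ISO 8601 values and pads missing month and day with 1. The sketch below lifts the pattern and the padding loop into a standalone function so the behavior can be checked in isolation; the name `parse_plist_date` and the use of `ValueError` in place of `PropertyListParseError` are assumptions made for this example.

```python
import datetime
import re

# Same pattern as XmlPropertyListParser.DATETIME_PATTERN: each smaller
# unit is optional, but the trailing 'Z' is required.
DATETIME_PATTERN = re.compile(
    r"(?P<year>\d\d\d\d)(?:-(?P<month>\d\d)(?:-(?P<day>\d\d)"
    r"(?:T(?P<hour>\d\d)(?::(?P<minute>\d\d)(?::(?P<second>\d\d))?)?)?)?)?Z$")


def parse_plist_date(content):
    match = DATETIME_PATTERN.match(content)
    if not match:
        raise ValueError("Failed to parse datetime '%s'" % content)
    groups = match.groupdict()
    components = []
    for key in ('year', 'month', 'day', 'hour', 'minute', 'second'):
        value = groups[key]
        if value is None:
            # Stop at the first omitted unit; later ones are absent too.
            break
        components.append(int(value))
    # datetime() needs at least year, month, day; default the latter to 1.
    while len(components) < 3:
        components.append(1)
    return datetime.datetime(*components)


print(parse_plist_date('2013-05-17T10:30:00Z'))
print(parse_plist_date('2013-05Z'))
```

A value without the trailing `Z`, such as `2013-05-17`, is rejected by this pattern rather than parsed as local time.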
--- a/src/calibre/devices/interface.py
+++ b/src/calibre/devices/interface.py
@ -107,6 +107,12 @@ class DevicePlugin(Plugin):
    #: :meth:`set_user_blacklisted_devices`
    ASK_TO_ALLOW_CONNECT = False

+    #: Set this to a dictionary of the form {'title':title, 'msg':msg, 'det_msg':detailed_msg} to have calibre pop up
+    #: a message to the user after some callbacks are run (currently only upload_books).
+    #: Be careful to not spam the user with too many messages. This variable is checked after *every* callback,
+    #: so only set it when you really need to.
+    user_feedback_after_callback = None
+
    @classmethod
    def get_gui_name(cls):
        if hasattr(cls, 'gui_name'):
@ -165,8 +171,7 @@ class DevicePlugin(Plugin):
                                            'rev_')[-1].replace(':', 'a'), 16)
                            except:
                                bcd = None
-                           return True, (vendor_id, product_id, bcd, None,
-                                   None, None)
+                            return True, (vendor_id, product_id, bcd, None, None, None)
        return False, None

    def test_bcd(self, bcdDevice, bcd):
@ -638,7 +643,6 @@ class DevicePlugin(Plugin):
        '''
        device_prefs.set_overrides()

-
    # Dynamic control interface.
    # The following methods are probably called on the GUI thread. Any driver
    # that implements these methods must take pains to be thread safe, because
--- a/src/calibre/devices/kobo/driver.py
+++ b/src/calibre/devices/kobo/driver.py
@ -35,7 +35,7 @@ class KOBO(USBMS):
    gui_name = 'Kobo Reader'
    description = _('Communicate with the Kobo Reader')
    author = 'Timothy Legge and David Forrester'
-    version = (2, 0, 9)
+    version = (2, 0, 10)

    dbversion = 0
    fwversion = 0
@ -45,6 +45,7 @@ class KOBO(USBMS):
    supported_platforms = ['windows', 'osx', 'linux']

    booklist_class = CollectionsBookList
+    book_class = Book

    # Ordered list of supported formats
    FORMATS     = ['epub', 'pdf', 'txt', 'cbz', 'cbr']
@ -115,7 +116,6 @@ class KOBO(USBMS):

    def initialize(self):
        USBMS.initialize(self)
-        self.book_class = Book
        self.dbversion = 7

    def books(self, oncard=None, end_session=True):
@ -1213,7 +1213,7 @@ class KOBOTOUCH(KOBO):
    min_dbversion_archive           = 71
    min_dbversion_images_on_sdcard  = 77

-    max_supported_fwversion         = (2,5,1)
+    max_supported_fwversion         = (2,5,3)
    min_fwversion_images_on_sdcard  = (2,4,1)

    has_kepubs = True
@ -1237,11 +1237,9 @@ class KOBOTOUCH(KOBO):
            _('Keep cover aspect ratio') +
            ':::'+_('When uploading covers, do not change the aspect ratio when resizing for the device.'
                    ' This is for firmware versions 2.3.1 and later.'),
-            _('Show expired books') +
-            ':::'+_('A bug in an earlier version left non kepubs book records'
-                ' in the database.  With this option Calibre will show the '
-                'expired records and allow you to delete them with '
-                'the new delete logic.'),
+            _('Show archived books') +
+            ':::'+_('Archived books are listed on the device but need to be downloaded to read.'
+                    ' Use this option to show these books and match them with books in the calibre library.'),
            _('Show Previews') +
            ':::'+_('Kobo previews are included on the Touch and some other versions'
                ' by default they are no longer displayed as there is no good reason to '
@ -1289,7 +1287,7 @@ class KOBOTOUCH(KOBO):
    OPT_UPLOAD_COVERS               = 3
    OPT_UPLOAD_GRAYSCALE_COVERS     = 4
    OPT_KEEP_COVER_ASPECT_RATIO     = 5
-    OPT_SHOW_EXPIRED_BOOK_RECORDS   = 6
+    OPT_SHOW_ARCHIVED_BOOK_RECORDS  = 6
    OPT_SHOW_PREVIEWS               = 7
    OPT_SHOW_RECOMMENDATIONS        = 8
    OPT_UPDATE_SERIES_DETAILS       = 9
@ -1347,6 +1345,10 @@ class KOBOTOUCH(KOBO):
        self.set_device_name()
        return super(KOBOTOUCH, self).get_device_information(end_session)

+
+    def device_database_path(self):
+        return self.normalize_path(self._main_prefix + '.kobo/KoboReader.sqlite')
+
    def books(self, oncard=None, end_session=True):
        debug_print("KoboTouch:books - oncard='%s'"%oncard)
        from calibre.ebooks.metadata.meta import path_to_ext
@ -1599,9 +1601,7 @@ class KOBOTOUCH(KOBO):

        self.debug_index = 0
        import sqlite3 as sqlite
-        with closing(sqlite.connect(
-            self.normalize_path(self._main_prefix +
-                '.kobo/KoboReader.sqlite'))) as connection:
+        with closing(sqlite.connect(self.device_database_path())) as connection:
            debug_print("KoboTouch:books - reading device database")

            # return bytestrings if the content cannot the decoded as unicode
@ -1618,7 +1618,21 @@ class KOBOTOUCH(KOBO):
            debug_print("KoboTouch:books - shelf list:", self.bookshelvelist)

            opts = self.settings()
-            if self.supports_series():
+            if self.supports_kobo_archive():
+                query= ("select Title, Attribution, DateCreated, ContentID, MimeType, ContentType, " \
+                    "ImageID, ReadStatus, ___ExpirationStatus, FavouritesIndex, Accessibility, " \
+                    "IsDownloaded, Series, SeriesNumber, ___UserID " \
+                    " from content " \
+                    " where BookID is Null " \
+                    " and ((Accessibility = -1 and IsDownloaded in ('true', 1 )) or (Accessibility in (1,2) %(expiry)s) " \
+                    "    %(previews)s %(recomendations)s )" \
+                    " and not ((___ExpirationStatus=3 or ___ExpirationStatus is Null) and ContentType = 6)") % \
+                        dict(\
+                             expiry="" if opts.extra_customization[self.OPT_SHOW_ARCHIVED_BOOK_RECORDS] else "and IsDownloaded in ('true', 1)", \
+                             previews=" or (Accessibility in (6) and ___UserID <> '')" if opts.extra_customization[self.OPT_SHOW_PREVIEWS] else "", \
+                             recomendations=" or (Accessibility in (-1, 4, 6) and ___UserId = '')" if opts.extra_customization[self.OPT_SHOW_RECOMMENDATIONS] else "" \
+                             )
+            elif self.supports_series():
                query= ("select Title, Attribution, DateCreated, ContentID, MimeType, ContentType, " \
                    "ImageID, ReadStatus, ___ExpirationStatus, FavouritesIndex, Accessibility, " \
                    "IsDownloaded, Series, SeriesNumber, ___UserID " \
@ -1627,7 +1641,7 @@ class KOBOTOUCH(KOBO):
                    " and ((Accessibility = -1 and IsDownloaded in ('true', 1)) or (Accessibility in (1,2)) %(previews)s %(recomendations)s )" \
                    " and not ((___ExpirationStatus=3 or ___ExpirationStatus is Null) %(expiry)s") % \
                        dict(\
-                             expiry=" and ContentType = 6)" if opts.extra_customization[self.OPT_SHOW_EXPIRED_BOOK_RECORDS] else ")", \
+                             expiry=" and ContentType = 6)" if opts.extra_customization[self.OPT_SHOW_ARCHIVED_BOOK_RECORDS] else ")", \
                             previews=" or (Accessibility in (6) and ___UserID <> '')" if opts.extra_customization[self.OPT_SHOW_PREVIEWS] else "", \
                             recomendations=" or (Accessibility in (-1, 4, 6) and ___UserId = '')" if opts.extra_customization[self.OPT_SHOW_RECOMMENDATIONS] else "" \
                             )
@ -1638,7 +1652,7 @@ class KOBOTOUCH(KOBO):
                    ' from content ' \
                    ' where BookID is Null %(previews)s %(recomendations)s and not ((___ExpirationStatus=3 or ___ExpirationStatus is Null) %(expiry)s') % \
                        dict(\
-                             expiry=' and ContentType = 6)' if opts.extra_customization[self.OPT_SHOW_EXPIRED_BOOK_RECORDS] else ')', \
+                             expiry=' and ContentType = 6)' if opts.extra_customization[self.OPT_SHOW_ARCHIVED_BOOK_RECORDS] else ')', \
                             previews=' and Accessibility <> 6' if opts.extra_customization[self.OPT_SHOW_PREVIEWS] == False else '', \
                             recomendations=' and IsDownloaded in (\'true\', 1)' if opts.extra_customization[self.OPT_SHOW_RECOMMENDATIONS] == False else ''\
                             )
@ -1648,7 +1662,7 @@ class KOBOTOUCH(KOBO):
                    '"1" as IsDownloaded, null as Series, null as SeriesNumber, ___UserID' \
                    ' from content where ' \
                    'BookID is Null and not ((___ExpirationStatus=3 or ___ExpirationStatus is Null) %(expiry)s') % dict(expiry=' and ContentType = 6)' \
-                    if opts.extra_customization[self.OPT_SHOW_EXPIRED_BOOK_RECORDS] else ')')
+                    if opts.extra_customization[self.OPT_SHOW_ARCHIVED_BOOK_RECORDS] else ')')
            else:
                query= 'select Title, Attribution, DateCreated, ContentID, MimeType, ContentType, ' \
                    'ImageID, ReadStatus, "-1" as ___ExpirationStatus, "-1" as FavouritesIndex, "-1" as Accessibility, ' \
@ -2586,7 +2600,7 @@ class KOBOTOUCH(KOBO):
    def modify_database_check(self, function):
        # Checks to see whether the database version is supported
        # and whether the user has chosen to support the firmware version
-#        debug_print("KoboTouch:modify_database_check - self.fwversion <= self.max_supported_fwversion=", self.fwversion > self.max_supported_fwversion)
+#        debug_print("KoboTouch:modify_database_check - self.fwversion > self.max_supported_fwversion=", self.fwversion > self.max_supported_fwversion)
        if self.dbversion > self.supported_dbversion or self.fwversion > self.max_supported_fwversion:
            # Unsupported database
            opts = self.settings()
--- a/src/calibre/devices/nook/driver.py
+++ b/src/calibre/devices/nook/driver.py
@ -53,7 +53,6 @@ class NOOK(USBMS):
        except ImportError:
            import Image, ImageDraw

-
        coverdata = getattr(metadata, 'thumbnail', None)
        if coverdata and coverdata[2]:
            cover = Image.open(cStringIO.StringIO(coverdata[2]))
@ -93,6 +92,7 @@ class NOOK_COLOR(NOOK):
    WINDOWS_MAIN_MEM = WINDOWS_CARD_A_MEM = ['EBOOK_DISK', 'NOOK_TABLET',
            'NOOK_SIMPLETOUCH']
    EBOOK_DIR_MAIN = 'My Files'
+    SCAN_FROM_ROOT = True
    NEWS_IN_FOLDER = False

    def upload_cover(self, path, filename, metadata, filepath):
--- a/src/calibre/devices/prst1/driver.py
+++ b/src/calibre/devices/prst1/driver.py
@ -50,10 +50,10 @@ class PRST1(USBMS):

    VENDOR_NAME        = 'SONY'
    WINDOWS_MAIN_MEM   = re.compile(
-            r'(PRS-T(1|2)&)'
+            r'(PRS-T(1|2|2N)&)'
            )
    WINDOWS_CARD_A_MEM = re.compile(
-            r'(PRS-T(1|2)__SD&)'
+            r'(PRS-T(1|2|2N)__SD&)'
            )
    MAIN_MEMORY_VOLUME_LABEL = 'SONY Reader Main Memory'
    STORAGE_CARD_VOLUME_LABEL = 'SONY Reader Storage Card'
@ -66,7 +66,7 @@ class PRST1(USBMS):

    EXTRA_CUSTOMIZATION_MESSAGE = [
        _('Comma separated list of metadata fields '
-            'to turn into collections on the device. Possibilities include: ')+\
+            'to turn into collections on the device. Possibilities include: ')+
                    'series, tags, authors',
        _('Upload separate cover thumbnails for books') +
             ':::'+_('Normally, the SONY readers get the cover image from the'
@ -194,11 +194,11 @@ class PRST1(USBMS):
                time_offsets = {}
                for i, row in enumerate(cursor):
                    try:
-                        comp_date = int(os.path.getmtime(self.normalize_path(prefix + row[0])) * 1000);
+                        comp_date = int(os.path.getmtime(self.normalize_path(prefix + row[0])) * 1000)
                    except (OSError, IOError, TypeError):
                        # In case the db has incorrect path info
                        continue
-                    device_date = int(row[1]);
+                    device_date = int(row[1])
                    offset = device_date - comp_date
                    time_offsets.setdefault(offset, 0)
                    time_offsets[offset] = time_offsets[offset] + 1
@ -345,7 +345,7 @@ class PRST1(USBMS):
        # Insert the sequence Id if it doesn't
        query = ('INSERT INTO sqlite_sequence (name, seq) '
                'SELECT ?, ? '
-                'WHERE NOT EXISTS (SELECT 1 FROM sqlite_sequence WHERE name = ?)');
+                'WHERE NOT EXISTS (SELECT 1 FROM sqlite_sequence WHERE name = ?)')
        cursor.execute(query, (table, sequence_id, table,))

        cursor.close()
--- a/src/calibre/devices/smart_device_app/driver.py
+++ b/src/calibre/devices/smart_device_app/driver.py
@ -875,6 +875,9 @@ class SMART_DEVICE_APP(DeviceConfig, DevicePlugin):
            self.client_device_kind = result.get('deviceKind', '')
            self._debug('Client device kind', self.client_device_kind)

+            self.client_device_name = result.get('deviceName', self.client_device_kind)
+            self._debug('Client device name', self.client_device_name)
+
            self.max_book_packet_len = result.get('maxBookContentPacketLen',
                                                  self.BASE_PACKET_LEN)
            self._debug('max_book_packet_len', self.max_book_packet_len)
@ -946,6 +949,8 @@ class SMART_DEVICE_APP(DeviceConfig, DevicePlugin):
        return False

    def get_gui_name(self):
+        if getattr(self, 'client_device_name', None):
+            return self.gui_name_template%(self.gui_name, self.client_device_name)
        if getattr(self, 'client_device_kind', None):
            return self.gui_name_template%(self.gui_name, self.client_device_kind)
        return self.gui_name
--- a/src/calibre/ebooks/conversion/plugins/txt_input.py
+++ b/src/calibre/ebooks/conversion/plugins/txt_input.py
@ -91,14 +91,15 @@ class TXTInput(InputFormatPlugin):
            log.debug('Using user specified input encoding of %s' % ienc)
        else:
            det_encoding = detect(txt)
+            det_encoding, confidence = det_encoding['encoding'], det_encoding['confidence']
            if det_encoding and det_encoding.lower().replace('_', '-').strip() in (
                    'gb2312', 'chinese', 'csiso58gb231280', 'euc-cn', 'euccn',
                    'eucgb2312-cn', 'gb2312-1980', 'gb2312-80', 'iso-ir-58'):
                # Microsoft Word exports to HTML with encoding incorrectly set to
                # gb2312 instead of gbk. gbk is a superset of gb2312, anyway.
                det_encoding = 'gbk'
-            ienc = det_encoding['encoding']
-            log.debug('Detected input encoding as %s with a confidence of %s%%' % (ienc, det_encoding['confidence'] * 100))
+            ienc = det_encoding
+            log.debug('Detected input encoding as %s with a confidence of %s%%' % (ienc, confidence * 100))
        if not ienc:
            ienc = 'utf-8'
            log.debug('No input encoding specified and could not auto detect using %s' % ienc)
--- a/src/calibre/ebooks/conversion/plumber.py
+++ b/src/calibre/ebooks/conversion/plumber.py
@ -77,7 +77,7 @@ class Plumber(object):

    def __init__(self, input, output, log, report_progress=DummyReporter(),
            dummy=False, merge_plugin_recs=True, abort_after_input_dump=False,
-            override_input_metadata=False):
+            override_input_metadata=False, for_regex_wizard=False):
        '''
        :param input: Path to input file.
        :param output: Path to output file/directory
@ -87,6 +87,7 @@ class Plumber(object):
        if isbytestring(output):
            output = output.decode(filesystem_encoding)
        self.original_input_arg = input
+        self.for_regex_wizard = for_regex_wizard
        self.input = os.path.abspath(input)
        self.output = os.path.abspath(output)
        self.log = log
@ -123,7 +124,7 @@ OptionRecommendation(name='input_profile',
                   'conversion system information on how to interpret '
                   'various information in the input document. For '
                   'example resolution dependent lengths (i.e. lengths in '
-                   'pixels). Choices are:')+\
+                   'pixels). Choices are:')+
                        ', '.join([x.short_name for x in input_profiles()])
        ),

@ -135,7 +136,7 @@ OptionRecommendation(name='output_profile',
                   'created document for the specified device. In some cases, '
                   'an output profile is required to produce documents that '
                   'will work on a device. For example EPUB on the SONY reader. '
-                   'Choices are:') + \
+                   'Choices are:') +
                           ', '.join([x.short_name for x in output_profiles()])
        ),

@ -490,7 +491,7 @@ OptionRecommendation(name='asciiize',
            'cases where there are multiple representations of a character '
            '(characters shared by Chinese and Japanese for instance) the '
            'representation based on the current calibre interface language will be '
-            'used.')%\
+            'used.')%
            u'\u041c\u0438\u0445\u0430\u0438\u043b '
            u'\u0413\u043e\u0440\u0431\u0430\u0447\u0451\u0432'
 )
@ -711,7 +712,6 @@ OptionRecommendation(name='search_replace',
        self.input_fmt = input_fmt
        self.output_fmt = output_fmt

-
        self.all_format_options = set()
        self.input_options = set()
        self.output_options = set()
@ -783,8 +783,6 @@ OptionRecommendation(name='search_replace',
                    return f, os.path.splitext(f)[1].lower()[1:]
        return html_files[-1], os.path.splitext(html_files[-1])[1].lower()[1:]

-
-
    def get_option_by_name(self, name):
        for group in (self.input_options, self.pipeline_options,
                      self.output_options, self.all_format_options):
@ -956,7 +954,6 @@ OptionRecommendation(name='search_replace',

        self.log.info('Input debug saved to:', out_dir)

-
    def run(self):
        '''
        Run the conversion pipeline
@ -965,6 +962,8 @@ OptionRecommendation(name='search_replace',
        self.setup_options()
        if self.opts.verbose:
            self.log.filter_level = self.log.DEBUG
+        if self.for_regex_wizard and hasattr(self.opts, 'no_process'):
+            self.opts.no_process = True
        self.flush()
        import cssutils, logging
        cssutils.log.setLevel(logging.WARN)
@ -1003,6 +1002,8 @@ OptionRecommendation(name='search_replace',
        self.ui_reporter(0.01, _('Converting input to HTML...'))
        ir = CompositeProgressReporter(0.01, 0.34, self.ui_reporter)
        self.input_plugin.report_progress = ir
+        if self.for_regex_wizard:
+            self.input_plugin.for_viewer = True
        with self.input_plugin:
            self.oeb = self.input_plugin(stream, self.opts,
                                        self.input_fmt, self.log,
@ -1014,8 +1015,12 @@ OptionRecommendation(name='search_replace',
            if self.input_fmt in ('recipe', 'downloaded_recipe'):
                self.opts_to_mi(self.user_metadata)
            if not hasattr(self.oeb, 'manifest'):
-                self.oeb = create_oebbook(self.log, self.oeb, self.opts,
-                        encoding=self.input_plugin.output_encoding)
+                self.oeb = create_oebbook(
+                    self.log, self.oeb, self.opts,
+                    encoding=self.input_plugin.output_encoding,
+                    for_regex_wizard=self.for_regex_wizard)
+            if self.for_regex_wizard:
+                return
            self.input_plugin.postprocess_book(self.oeb, self.opts, self.log)
            self.opts.is_image_collection = self.input_plugin.is_image_collection
            pr = CompositeProgressReporter(0.34, 0.67, self.ui_reporter)
@ -1081,7 +1086,6 @@ OptionRecommendation(name='search_replace',
            self.dump_oeb(self.oeb, out_dir)
            self.log('Structured HTML written to:', out_dir)

-
        if self.opts.extra_css and os.path.exists(self.opts.extra_css):
            self.opts.extra_css = open(self.opts.extra_css, 'rb').read()

@ -1161,13 +1165,20 @@ OptionRecommendation(name='search_replace',
        self.log(self.output_fmt.upper(), 'output written to', self.output)
        self.flush()

+# This has to be global as create_oebbook can be called from other locations
+# (for example in the html input plugin)
+regex_wizard_callback = None
+def set_regex_wizard_callback(f):
+    global regex_wizard_callback
+    regex_wizard_callback = f
+
 def create_oebbook(log, path_or_stream, opts, reader=None,
-        encoding='utf-8', populate=True):
+        encoding='utf-8', populate=True, for_regex_wizard=False):
    '''
    Create an OEBBook.
    '''
    from calibre.ebooks.oeb.base import OEBBook
-    html_preprocessor = HTMLPreProcessor(log, opts)
+    html_preprocessor = HTMLPreProcessor(log, opts, regex_wizard_callback=regex_wizard_callback)
    if not encoding:
        encoding = None
    oeb = OEBBook(log, html_preprocessor,
@ -1182,3 +1193,4 @@ def create_oebbook(log, path_or_stream, opts, reader=None,

    reader()(oeb, path_or_stream)
    return oeb
+
--- a/src/calibre/ebooks/conversion/preprocess.py
+++ b/src/calibre/ebooks/conversion/preprocess.py
@ -200,7 +200,7 @@ class Dehyphenator(object):
        # Add common suffixes to the regex below to increase the likelihood of a match -
        # don't add suffixes which are also complete words, such as 'able' or 'sex'
        # only remove if it's not already the point of hyphenation
-        self.suffix_string = "((ed)?ly|'?e?s||a?(t|s)?ion(s|al(ly)?)?|ings?|er|(i)?ous|(i|a)ty|(it)?ies|ive|gence|istic(ally)?|(e|a)nce|m?ents?|ism|ated|(e|u)ct(ed)?|ed|(i|ed)?ness|(e|a)ncy|ble|ier|al|ex|ian)$"
+        self.suffix_string = "((ed)?ly|'?e?s||a?(t|s)?ion(s|al(ly)?)?|ings?|er|(i)?ous|(i|a)ty|(it)?ies|ive|gence|istic(ally)?|(e|a)nce|m?ents?|ism|ated|(e|u)ct(ed)?|ed|(i|ed)?ness|(e|a)ncy|ble|ier|al|ex|ian)$"  # noqa
        self.suffixes = re.compile(r"^%s" % self.suffix_string, re.IGNORECASE)
        self.removesuffixes = re.compile(r"%s" % self.suffix_string, re.IGNORECASE)
        # remove prefixes if the prefix was not already the point of hyphenation
@ -265,19 +265,18 @@ class Dehyphenator(object):
        self.html = html
        self.format = format
        if format == 'html':
-            intextmatch = re.compile(u'(?<=.{%i})(?P<firstpart>[^\W\-]+)(-|‐)\s*(?=<)(?P<wraptags>(</span>)?\s*(</[iubp]>\s*){1,2}(?P<up2threeblanks><(p|div)[^>]*>\s*(<p[^>]*>\s*</p>\s*)?</(p|div)>\s+){0,3}\s*(<[iubp][^>]*>\s*){1,2}(<span[^>]*>)?)\s*(?P<secondpart>[\w\d]+)' % length)
+            intextmatch = re.compile(u'(?<=.{%i})(?P<firstpart>[^\W\-]+)(-|‐)\s*(?=<)(?P<wraptags>(</span>)?\s*(</[iubp]>\s*){1,2}(?P<up2threeblanks><(p|div)[^>]*>\s*(<p[^>]*>\s*</p>\s*)?</(p|div)>\s+){0,3}\s*(<[iubp][^>]*>\s*){1,2}(<span[^>]*>)?)\s*(?P<secondpart>[\w\d]+)' % length)  # noqa
        elif format == 'pdf':
            intextmatch = re.compile(u'(?<=.{%i})(?P<firstpart>[^\W\-]+)(-|‐)\s*(?P<wraptags><p>|</[iub]>\s*<p>\s*<[iub]>)\s*(?P<secondpart>[\w\d]+)'% length)
        elif format == 'txt':
-            intextmatch = re.compile(u'(?<=.{%i})(?P<firstpart>[^\W\-]+)(-|‐)(\u0020|\u0009)*(?P<wraptags>(\n(\u0020|\u0009)*)+)(?P<secondpart>[\w\d]+)'% length)
+            intextmatch = re.compile(u'(?<=.{%i})(?P<firstpart>[^\W\-]+)(-|‐)(\u0020|\u0009)*(?P<wraptags>(\n(\u0020|\u0009)*)+)(?P<secondpart>[\w\d]+)'% length)  # noqa
        elif format == 'individual_words':
            intextmatch = re.compile(u'(?!<)(?P<firstpart>[^\W\-]+)(-|‐)\s*(?P<secondpart>\w+)(?![^<]*?>)')
        elif format == 'html_cleanup':
-            intextmatch = re.compile(u'(?P<firstpart>[^\W\-]+)(-|‐)\s*(?=<)(?P<wraptags></span>\s*(</[iubp]>\s*<[iubp][^>]*>\s*)?<span[^>]*>|</[iubp]>\s*<[iubp][^>]*>)?\s*(?P<secondpart>[\w\d]+)')
+            intextmatch = re.compile(u'(?P<firstpart>[^\W\-]+)(-|‐)\s*(?=<)(?P<wraptags></span>\s*(</[iubp]>\s*<[iubp][^>]*>\s*)?<span[^>]*>|</[iubp]>\s*<[iubp][^>]*>)?\s*(?P<secondpart>[\w\d]+)')  # noqa
        elif format == 'txt_cleanup':
            intextmatch = re.compile(u'(?P<firstpart>[^\W\-]+)(-|‐)(?P<wraptags>\s+)(?P<secondpart>[\w\d]+)')

-
        html = intextmatch.sub(self.dehyphenate, html)
        return html

@ -498,9 +497,11 @@ class HTMLPreProcessor(object):
                     (re.compile('<span[^><]*?id=subtitle[^><]*?>(.*?)</span>', re.IGNORECASE|re.DOTALL),
                      lambda match : '<h3 class="subtitle">%s</h3>'%(match.group(1),)),
                     ]
-    def __init__(self, log=None, extra_opts=None):
+    def __init__(self, log=None, extra_opts=None, regex_wizard_callback=None):
        self.log = log
        self.extra_opts = extra_opts
+        self.regex_wizard_callback = regex_wizard_callback
+        self.current_href = None

    def is_baen(self, src):
        return re.compile(r'<meta\s+name="Publisher"\s+content=".*?Baen.*?"',
@ -581,12 +582,15 @@ class HTMLPreProcessor(object):
                end_rules.append((re.compile(u'(?<=.{%i}[–—])\s*<p>\s*(?=[[a-z\d])' % length), lambda match: ''))
                end_rules.append(
                    # Un wrap using punctuation
-                    (re.compile(u'(?<=.{%i}([a-zäëïöüàèìòùáćéíĺóŕńśúýâêîôûçąężıãõñæøþðßěľščťžňďřů,:)\IA\u00DF]|(?<!\&\w{4});))\s*(?P<ital></(i|b|u)>)?\s*(</p>\s*<p>\s*)+\s*(?=(<(i|b|u)>)?\s*[\w\d$(])' % length, re.UNICODE), wrap_lines),
+                    (re.compile(u'(?<=.{%i}([a-zäëïöüàèìòùáćéíĺóŕńśúýâêîôûçąężıãõñæøþðßěľščťžňďřů,:)\IA\u00DF]|(?<!\&\w{4});))\s*(?P<ital></(i|b|u)>)?\s*(</p>\s*<p>\s*)+\s*(?=(<(i|b|u)>)?\s*[\w\d$(])' % length, re.UNICODE), wrap_lines),  # noqa
                )

        for rule in self.PREPROCESS + start_rules:
            html = rule[0].sub(rule[1], html)

+        if self.regex_wizard_callback is not None:
+            self.regex_wizard_callback(self.current_href, html)
+
        if get_preprocess_html:
            return html

--- a/src/calibre/ebooks/docx/block_styles.py
+++ b/src/calibre/ebooks/docx/block_styles.py
@ -0,0 +1,371 @@
+#!/usr/bin/env python
+# vim:fileencoding=utf-8
+from __future__ import (unicode_literals, division, absolute_import,
+                        print_function)
+
+__license__ = 'GPL v3'
+__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'
+
+from collections import OrderedDict
+from calibre.ebooks.docx.names import XPath, get
+
+class Inherit:
+    pass
+inherit = Inherit()
+
+def binary_property(parent, name):
+    vals = XPath('./w:%s' % name)(parent)
+    if not vals:
+        return inherit
+    val = get(vals[0], 'w:val', 'on')
+    return True if val in {'on', '1', 'true'} else False
+
+def simple_color(col, auto='black'):
+    if not col or col == 'auto' or len(col) != 6:
+        return auto
+    return '#'+col
+
+def simple_float(val, mult=1.0):
+    try:
+        return float(val) * mult
+    except (ValueError, TypeError, AttributeError, KeyError):
+        return None
+
+
+LINE_STYLES = {  # {{{
+    'basicBlackDashes': 'dashed',
+    'basicBlackDots': 'dotted',
+    'basicBlackSquares': 'dashed',
+    'basicThinLines': 'solid',
+    'dashDotStroked': 'groove',
+    'dashed': 'dashed',
+    'dashSmallGap': 'dashed',
+    'dotDash': 'dashed',
+    'dotDotDash': 'dashed',
+    'dotted': 'dotted',
+    'double': 'double',
+    'inset': 'inset',
+    'nil': 'none',
+    'none': 'none',
+    'outset': 'outset',
+    'single': 'solid',
+    'thick': 'solid',
+    'thickThinLargeGap': 'double',
+    'thickThinMediumGap': 'double',
+    'thickThinSmallGap' : 'double',
+    'thinThickLargeGap': 'double',
+    'thinThickMediumGap': 'double',
+    'thinThickSmallGap': 'double',
+    'thinThickThinLargeGap': 'double',
+    'thinThickThinMediumGap': 'double',
+    'thinThickThinSmallGap': 'double',
+    'threeDEmboss': 'ridge',
+    'threeDEngrave': 'groove',
+    'triple': 'double',
+}  # }}}
+
+# Read from XML {{{
+def read_border(parent, dest):
+    tvals = {'padding_%s':inherit, 'border_%s_width':inherit,
+            'border_%s_style':inherit, 'border_%s_color':inherit}
+    vals = {}
+    for edge in ('left', 'top', 'right', 'bottom'):
+        vals.update({k % edge:v for k, v in tvals.iteritems()})
+
+    for border in XPath('./w:pBdr')(parent):
+        for edge in ('left', 'top', 'right', 'bottom'):
+            for elem in XPath('./w:%s' % edge)(border):
+                color = get(elem, 'w:color')
+                if color is not None:
+                    vals['border_%s_color' % edge] = simple_color(color)
+                style = get(elem, 'w:val')
+                if style is not None:
+                    vals['border_%s_style' % edge] = LINE_STYLES.get(style, 'solid')
+                space = get(elem, 'w:space')
+                if space is not None:
+                    try:
+                        vals['padding_%s' % edge] = float(space)
+                    except (ValueError, TypeError):
+                        pass
+                sz = get(elem, 'w:sz')
+                if sz is not None:
+                    # we don't care about art borders (they are only used for page borders)
+                    try:
+                        vals['border_%s_width' % edge] = min(96, max(2, float(sz))) / 8
+                    except (ValueError, TypeError):
+                        pass
+
+    for key, val in vals.iteritems():
+        setattr(dest, key, val)
+
+def read_indent(parent, dest):
+    padding_left = padding_right = text_indent = inherit
+    for indent in XPath('./w:ind')(parent):
+        l, lc = get(indent, 'w:left'), get(indent, 'w:leftChars')
+        pl = simple_float(lc, 0.01) if lc is not None else simple_float(l, 0.05) if l is not None else None
+        if pl is not None:
+            padding_left = '%.3g%s' % (pl, 'em' if lc is not None else 'pt')
+
+        r, rc = get(indent, 'w:right'), get(indent, 'w:rightChars')
+        pr = simple_float(rc, 0.01) if rc is not None else simple_float(r, 0.05) if r is not None else None
+        if pr is not None:
+            padding_right = '%.3g%s' % (pr, 'em' if rc is not None else 'pt')
+
+        h, hc = get(indent, 'w:hanging'), get(indent, 'w:hangingChars')
+        fl, flc = get(indent, 'w:firstLine'), get(indent, 'w:firstLineChars')
+        h = h if h is None else '-'+h
+        hc = hc if hc is None else '-'+hc
+        ti = (simple_float(hc, 0.01) if hc is not None else simple_float(h, 0.05) if h is not None else
+              simple_float(flc, 0.01) if flc is not None else simple_float(fl, 0.05) if fl is not None else None)
+        if ti is not None:
+            text_indent = '%.3g%s' % (ti, 'em' if hc is not None or (h is None and flc is not None) else 'pt')
+
+    setattr(dest, 'margin_left', padding_left)
+    setattr(dest, 'margin_right', padding_right)
+    setattr(dest, 'text_indent', text_indent)
+
+def read_justification(parent, dest):
+    ans = inherit
+    for jc in XPath('./w:jc[@w:val]')(parent):
+        val = get(jc, 'w:val')
+        if not val:
+            continue
+        if val in {'both', 'distribute'} or 'thai' in val or 'kashida' in val:
+            ans = 'justify'
+        if val in {'left', 'center', 'right',}:
+            ans = val
+    setattr(dest, 'text_align', ans)
+
+def read_spacing(parent, dest):
+    padding_top = padding_bottom = line_height = inherit
+    for s in XPath('./w:spacing')(parent):
+        a, al, aa = get(s, 'w:after'), get(s, 'w:afterLines'), get(s, 'w:afterAutospacing')
+        pb = None if aa in {'on', '1', 'true'} else simple_float(al, 0.02) if al is not None else simple_float(a, 0.05) if a is not None else None
+        if pb is not None:
+            padding_bottom = '%.3g%s' % (pb, 'ex' if al is not None else 'pt')
+
+        b, bl, bb = get(s, 'w:before'), get(s, 'w:beforeLines'), get(s, 'w:beforeAutospacing')
+        pt = None if bb in {'on', '1', 'true'} else simple_float(bl, 0.02) if bl is not None else simple_float(b, 0.05) if b is not None else None
+        if pt is not None:
+            padding_top = '%.3g%s' % (pt, 'ex' if bl is not None else 'pt')
+
+        l, lr = get(s, 'w:line'), get(s, 'w:lineRule', 'auto')
+        if l is not None:
+            lh = simple_float(l, 0.05) if lr in {'exact', 'atLeast'} else simple_float(l, 1/240.0)
+            line_height = '%.3g%s' % (lh, 'pt' if lr in {'exact', 'atLeast'} else '')
+
+    setattr(dest, 'margin_top', padding_top)
+    setattr(dest, 'margin_bottom', padding_bottom)
+    setattr(dest, 'line_height', line_height)
+
+def read_direction(parent, dest):
+    ans = inherit
+    for jc in XPath('./w:textFlow[@w:val]')(parent):
+        val = get(jc, 'w:val')
+        if not val:
+            continue
+        if 'rl' in val.lower():
+            ans = 'rtl'
+    setattr(dest, 'direction', ans)
+
+def read_shd(parent, dest):
+    ans = inherit
+    for shd in XPath('./w:shd[@w:fill]')(parent):
+        val = get(shd, 'w:fill')
+        if val:
+            ans = simple_color(val, auto='transparent')
+    setattr(dest, 'background_color', ans)
+
+def read_numbering(parent, dest):
+    lvl = num_id = None
+    for np in XPath('./w:numPr')(parent):
+        for ilvl in XPath('./w:ilvl[@w:val]')(np):
+            try:
+                lvl = int(get(ilvl, 'w:val'))
+            except (ValueError, TypeError):
+                pass
+        for num in XPath('./w:numId[@w:val]')(np):
+            num_id = get(num, 'w:val')
+    val = (num_id, lvl) if num_id is not None or lvl is not None else inherit
+    setattr(dest, 'numbering', val)
+
+class Frame(object):
+
+    all_attributes = ('drop_cap', 'h', 'w', 'h_anchor', 'h_rule', 'v_anchor', 'wrap',
+                      'h_space', 'v_space', 'lines', 'x_align', 'y_align', 'x', 'y')
+
+    def __init__(self, fp):
+        self.drop_cap = get(fp, 'w:dropCap', 'none')
+        try:
+            self.h = int(get(fp, 'w:h'))/20
+        except (ValueError, TypeError):
+            self.h = 0
+        try:
+            self.w = int(get(fp, 'w:w'))/20
+        except (ValueError, TypeError):
+            self.w = None
+        try:
+            self.x = int(get(fp, 'w:x'))/20
+        except (ValueError, TypeError):
+            self.x = 0
+        try:
+            self.y = int(get(fp, 'w:y'))/20
+        except (ValueError, TypeError):
+            self.y = 0
+
+        self.h_anchor = get(fp, 'w:hAnchor', 'page')
+        self.h_rule = get(fp, 'w:hRule', 'auto')
+        self.v_anchor = get(fp, 'w:vAnchor', 'page')
+        self.wrap = get(fp, 'w:wrap', 'around')
+        self.x_align = get(fp, 'w:xAlign')
+        self.y_align = get(fp, 'w:yAlign')
+
+        try:
+            self.h_space = int(get(fp, 'w:hSpace'))/20
+        except (ValueError, TypeError):
+            self.h_space = 0
+        try:
+            self.v_space = int(get(fp, 'w:vSpace'))/20
+        except (ValueError, TypeError):
+            self.v_space = 0
+        try:
+            self.lines = int(get(fp, 'w:lines'))
+        except (ValueError, TypeError):
+            self.lines = 1
+
+    def css(self, page):
+        is_dropcap = self.drop_cap in {'drop', 'margin'}
+        ans = {'overflow': 'hidden'}
+
+        if is_dropcap:
+            ans['float'] = 'left'
+            ans['margin'] = '0'
+            ans['padding-right'] = '0.2em'
+        else:
+            if self.h_rule != 'auto':
+                t = 'min-height' if self.h_rule == 'atLeast' else 'height'
+                ans[t] = '%.3gpt' % self.h
+            if self.w is not None:
+                ans['width'] = '%.3gpt' % self.w
+            ans['padding-top'] = ans['padding-bottom'] = '%.3gpt' % self.v_space
+            if self.wrap not in {None, 'none'}:
+                ans['padding-left'] = ans['padding-right'] = '%.3gpt' % self.h_space
+                if self.x_align is None:
+                    fl = 'left' if self.x/page.width < 0.5 else 'right'
+                else:
+                    fl = 'right' if self.x_align == 'right' else 'left'
+                ans['float'] = fl
+        return ans
+
+    def __eq__(self, other):
+        for x in self.all_attributes:
+            if getattr(other, x, inherit) != getattr(self, x):
+                return False
+        return True
+
+    def __ne__(self, other):
+        return not self.__eq__(other)
+
+def read_frame(parent, dest):
+    ans = inherit
+    for fp in XPath('./w:framePr')(parent):
+        ans = Frame(fp)
+    setattr(dest, 'frame', ans)
+
+# }}}
+
+class ParagraphStyle(object):
+
+    all_properties = (
+        'adjustRightInd', 'autoSpaceDE', 'autoSpaceDN', 'bidi',
+        'contextualSpacing', 'keepLines', 'keepNext', 'mirrorIndents',
+        'pageBreakBefore', 'snapToGrid', 'suppressLineNumbers',
+        'suppressOverlap', 'topLinePunct', 'widowControl', 'wordWrap',
+
+        # Border margins padding
+        'border_left_width', 'border_left_style', 'border_left_color', 'padding_left',
+        'border_top_width', 'border_top_style', 'border_top_color', 'padding_top',
+        'border_right_width', 'border_right_style', 'border_right_color', 'padding_right',
+        'border_bottom_width', 'border_bottom_style', 'border_bottom_color', 'padding_bottom',
+        'margin_left', 'margin_top', 'margin_right', 'margin_bottom',
+
+        # Misc.
+        'text_indent', 'text_align', 'line_height', 'direction', 'background_color',
+        'numbering', 'font_family', 'font_size', 'frame',
+    )
+
+    def __init__(self, pPr=None):
+        self.linked_style = None
+        if pPr is None:
+            for p in self.all_properties:
+                setattr(self, p, inherit)
+        else:
+            for p in (
+                'adjustRightInd', 'autoSpaceDE', 'autoSpaceDN', 'bidi',
+                'contextualSpacing', 'keepLines', 'keepNext', 'mirrorIndents',
+                'pageBreakBefore', 'snapToGrid', 'suppressLineNumbers',
+                'suppressOverlap', 'topLinePunct', 'widowControl', 'wordWrap',
+            ):
+                setattr(self, p, binary_property(pPr, p))
+
+            for x in ('border', 'indent', 'justification', 'spacing', 'direction', 'shd', 'numbering', 'frame'):
+                f = globals()['read_%s' % x]
+                f(pPr, self)
+
+            for s in XPath('./w:pStyle[@w:val]')(pPr):
+                self.linked_style = get(s, 'w:val')
+
+            self.font_family = self.font_size = inherit
+
+        self._css = None
+
+    def update(self, other):
+        for prop in self.all_properties:
+            nval = getattr(other, prop)
+            if nval is not inherit:
+                setattr(self, prop, nval)
+        if other.linked_style is not None:
+            self.linked_style = other.linked_style
+
+    def resolve_based_on(self, parent):
+        for p in self.all_properties:
+            val = getattr(self, p)
+            if val is inherit:
+                setattr(self, p, getattr(parent, p))
+
+    @property
+    def css(self):
+        if self._css is None:
+            self._css = c = OrderedDict()
+            if self.keepLines is True:
+                c['page-break-inside'] = 'avoid'
+            if self.pageBreakBefore is True:
+                c['page-break-before'] = 'always'
+            for edge in ('left', 'top', 'right', 'bottom'):
+                val = getattr(self, 'border_%s_width' % edge)
+                if val is not inherit:
+                    c['border-%s-width' % edge] = '%.3gpt' % val
+                for x in ('style', 'color'):
+                    val = getattr(self, 'border_%s_%s' % (edge, x))
+                    if val is not inherit:
+                        c['border-%s-%s' % (edge, x)] = val
+                val = getattr(self, 'padding_%s' % edge)
+                if val is not inherit:
+                    c['padding-%s' % edge] = '%.3gpt' % val
+                val = getattr(self, 'margin_%s' % edge)
+                if val is not inherit:
+                    c['margin-%s' % edge] = val
+
+            if self.line_height not in {inherit, '1'}:
+                c['line-height'] = self.line_height
+
+            for x in ('text_indent', 'text_align', 'background_color', 'font_family', 'font_size'):
+                val = getattr(self, x)
+                if val is not inherit:
+                    if x == 'font_size':
+                        val = '%.3gpt' % val
+                    c[x.replace('_', '-')] = val
+
+        return self._css
+
+        # TODO: keepNext must be done at markup level
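Review note on block_styles.py: `update()` and `resolve_based_on()` both rely on `inherit` acting as a "not set" sentinel. The sketch below shows that cascade in isolation. `INHERIT` and the toy `Style` class are hypothetical stand-ins written for Python 3, not calibre code; the real sentinel is defined earlier in block_styles.py, outside this excerpt.

```python
# Toy model of the sentinel-based cascade. INHERIT and Style are
# illustrative stand-ins, not the classes added by this commit.
INHERIT = object()


class Style:
    all_properties = ('text_align', 'line_height')

    def __init__(self, **values):
        for prop in self.all_properties:
            setattr(self, prop, values.get(prop, INHERIT))

    def update(self, other):
        # Explicit values in `other` win; INHERIT leaves our value alone.
        for prop in self.all_properties:
            value = getattr(other, prop)
            if value is not INHERIT:
                setattr(self, prop, value)

    def resolve_based_on(self, parent):
        # Only the gaps (properties still set to INHERIT) come from the parent.
        for prop in self.all_properties:
            if getattr(self, prop) is INHERIT:
                setattr(self, prop, getattr(parent, prop))


base = Style(text_align='left', line_height='1.5')
child = Style(text_align='justify')
child.resolve_based_on(base)
print(child.text_align, child.line_height)
```

Comparing with `is` rather than `==` keeps a legitimate falsy value such as `False` or `0` distinct from "not set".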
--- a/src/calibre/ebooks/docx/char_styles.py
+++ b/src/calibre/ebooks/docx/char_styles.py
@ -0,0 +1,249 @@
+#!/usr/bin/env python
+# vim:fileencoding=utf-8
+from __future__ import (unicode_literals, division, absolute_import,
+                        print_function)
+
+__license__ = 'GPL v3'
+__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'
+
+from collections import OrderedDict
+from calibre.ebooks.docx.block_styles import (  # noqa
+    inherit, simple_color, LINE_STYLES, simple_float, binary_property, read_shd)
+from calibre.ebooks.docx.names import XPath, get
+
+# Read from XML {{{
+def read_text_border(parent, dest):
+    border_color = border_style = border_width = padding = inherit
+    elems = XPath('./w:bdr')(parent)
+    if elems:
+        border_color = simple_color('auto')
+        border_style = 'solid'
+        border_width = 1
+    for elem in elems:
+        color = get(elem, 'w:color')
+        if color is not None:
+            border_color = simple_color(color)
+        style = get(elem, 'w:val')
+        if style is not None:
+            border_style = LINE_STYLES.get(style, 'solid')
+        space = get(elem, 'w:space')
+        if space is not None:
+            try:
+                padding = float(space)
+            except (ValueError, TypeError):
+                pass
+        sz = get(elem, 'w:sz')
+        if sz is not None:
+            # we don't care about art borders (they are only used for page borders)
+            try:
+                border_width = min(96, max(2, float(sz))) / 8
+            except (ValueError, TypeError):
+                pass
+
+    setattr(dest, 'border_color', border_color)
+    setattr(dest, 'border_style', border_style)
+    setattr(dest, 'border_width', border_width)
+    setattr(dest, 'padding', padding)
+
+def read_color(parent, dest):
+    ans = inherit
+    for col in XPath('./w:color[@w:val]')(parent):
+        val = get(col, 'w:val')
+        if not val:
+            continue
+        ans = simple_color(val)
+    setattr(dest, 'color', ans)
+
+def read_highlight(parent, dest):
+    ans = inherit
+    for col in XPath('./w:highlight[@w:val]')(parent):
+        val = get(col, 'w:val')
+        if not val:
+            continue
+        if val == 'none':
+            val = 'transparent'
+        ans = val
+    setattr(dest, 'highlight', ans)
+
+def read_lang(parent, dest):
+    ans = inherit
+    for col in XPath('./w:lang[@w:val]')(parent):
+        val = get(col, 'w:val')
+        if not val:
+            continue
+        try:
+            code = int(val, 16)
+        except (ValueError, TypeError):
+            ans = val
+        else:
+            from calibre.ebooks.docx.lcid import lcid
+            val = lcid.get(code, None)
+            if val:
+                ans = val
+    setattr(dest, 'lang', ans)
+
+def read_letter_spacing(parent, dest):
+    ans = inherit
+    for col in XPath('./w:spacing[@w:val]')(parent):
+        val = simple_float(get(col, 'w:val'), 0.05)
+        if val is not None:
+            ans = val
+    setattr(dest, 'letter_spacing', ans)
+
+def read_sz(parent, dest):
+    ans = inherit
+    for col in XPath('./w:sz[@w:val]')(parent):
+        val = simple_float(get(col, 'w:val'), 0.5)
+        if val is not None:
+            ans = val
+    setattr(dest, 'font_size', ans)
+
+def read_underline(parent, dest):
+    ans = inherit
+    for col in XPath('./w:u[@w:val]')(parent):
+        val = get(col, 'w:val')
+        if val:
+            ans = 'underline'
+    setattr(dest, 'text_decoration', ans)
+
+def read_vert_align(parent, dest):
+    ans = inherit
+    for col in XPath('./w:vertAlign[@w:val]')(parent):
+        val = get(col, 'w:val')
+        if val and val in {'baseline', 'subscript', 'superscript'}:
+            ans = val
+    setattr(dest, 'vert_align', ans)
+
+def read_font_family(parent, dest):
+    ans = inherit
+    for col in XPath('./w:rFonts[@w:ascii]')(parent):
+        val = get(col, 'w:ascii')
+        if val:
+            ans = val
+    setattr(dest, 'font_family', ans)
+# }}}
+
+class RunStyle(object):
+
+    all_properties = {
+        'b', 'bCs', 'caps', 'cs', 'dstrike', 'emboss', 'i', 'iCs', 'imprint',
+        'rtl', 'shadow', 'smallCaps', 'strike', 'vanish',
+
+        'border_color', 'border_style', 'border_width', 'padding', 'color', 'highlight', 'background_color',
+        'letter_spacing', 'font_size', 'text_decoration', 'vert_align', 'lang', 'font_family'
+    }
+
+    toggle_properties = {
+        'b', 'bCs', 'caps', 'emboss', 'i', 'iCs', 'imprint', 'shadow', 'smallCaps', 'strike', 'dstrike', 'vanish',
+    }
+
+    def __init__(self, rPr=None):
+        self.linked_style = None
+        if rPr is None:
+            for p in self.all_properties:
+                setattr(self, p, inherit)
+        else:
+            for p in (
+                'b', 'bCs', 'caps', 'cs', 'dstrike', 'emboss', 'i', 'iCs', 'imprint', 'rtl', 'shadow',
+                'smallCaps', 'strike', 'vanish',
+            ):
+                setattr(self, p, binary_property(rPr, p))
+
+            for x in ('text_border', 'color', 'highlight', 'shd', 'letter_spacing', 'sz', 'underline', 'vert_align', 'lang', 'font_family'):
+                f = globals()['read_%s' % x]
+                f(rPr, self)
+
+            for s in XPath('./w:rStyle[@w:val]')(rPr):
+                self.linked_style = get(s, 'w:val')
+
+        self._css = None
+
+    def update(self, other):
+        for prop in self.all_properties:
+            nval = getattr(other, prop)
+            if nval is not inherit:
+                setattr(self, prop, nval)
+        if other.linked_style is not None:
+            self.linked_style = other.linked_style
+
+    def resolve_based_on(self, parent):
+        for p in self.all_properties:
+            val = getattr(self, p)
+            if val is inherit:
+                setattr(self, p, getattr(parent, p))
+
+    def get_border_css(self, ans):
+        for x in ('color', 'style', 'width'):
+            val = getattr(self, 'border_'+x)
+            if x == 'width' and val is not inherit:
+                val = '%.3gpt' % val
+            if val is not inherit:
+                ans['border-%s' % x] = val
+
+    def clear_border_css(self):
+        for x in ('color', 'style', 'width'):
+            setattr(self, 'border_'+x, inherit)
+
+    @property
+    def css(self):
+        if self._css is None:
+            c = self._css = OrderedDict()
+            td = set()
+            if self.text_decoration is not inherit:
+                td.add(self.text_decoration)
+            if self.strike is True:
+                td.add('line-through')
+            if self.dstrike is True:
+                td.add('line-through')
+            if td:
+                c['text-decoration'] = ' '.join(td)
+            if self.caps is True:
+                c['text-transform'] = 'uppercase'
+            if self.i is True:
+                c['font-style'] = 'italic'
+            if self.shadow is True:
+                c['text-shadow'] = '2px 2px'
+            if self.smallCaps is True:
+                c['font-variant'] = 'small-caps'
+            if self.vanish is True:
+                c['display'] = 'none'
+
+            self.get_border_css(c)
+            if self.padding is not inherit:
+                c['padding'] = '%.3gpt' % self.padding
+
+            for x in ('color', 'background_color'):
+                val = getattr(self, x)
+                if val is not inherit:
+                    c[x.replace('_', '-')] = val
+
+            for x in ('letter_spacing', 'font_size'):
+                val = getattr(self, x)
+                if val is not inherit:
+                    c[x.replace('_', '-')] = '%.3gpt' % val
+
+            if self.highlight is not inherit and self.highlight != 'transparent':
+                c['background-color'] = self.highlight
+
+            if self.b is True:
+                c['font-weight'] = 'bold'
+
+            if self.font_family is not inherit:
+                c['font-family'] = self.font_family
+
+        return self._css
+
+    def same_border(self, other):
+        for x in (self, other):
+            has_border = False
+            for y in ('color', 'style', 'width'):
+                if ('border-%s' % y) in x.css:
+                    has_border = True
+                    break
+            if not has_border:
+                return False
+
+        s = tuple(self.css.get('border-%s' % y, None) for y in ('color', 'style', 'width'))
+        o = tuple(other.css.get('border-%s' % y, None) for y in ('color', 'style', 'width'))
+        return s == o
+
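Review note on char_styles.py: the readers above scale raw attribute values by different factors. `w:sz` on a run is multiplied by 0.5 (half-points), `w:spacing` by 0.05 (twentieths of a point), and the border `w:sz` is clamped to 2..96 and divided by 8. A minimal Python 3 sketch of those conversions follows. The helper names `scaled` and `border_width_pt` are hypothetical, and `scaled` only approximates what `simple_float` is assumed to do, since its definition is not part of this excerpt.

```python
def scaled(raw, factor):
    # Hypothetical stand-in for simple_float: parse a number and scale it,
    # returning None when the attribute is missing or malformed.
    try:
        return float(raw) * factor
    except (ValueError, TypeError):
        return None


def border_width_pt(raw):
    # Same clamp as read_text_border: limit to 2..96, then divide by 8.
    try:
        return min(96, max(2, float(raw))) / 8
    except (ValueError, TypeError):
        return None


print('%.3gpt' % scaled('24', 0.5))    # font size from w:sz
print('%.3gpt' % scaled('20', 0.05))   # letter spacing from w:spacing
print('%.3gpt' % border_width_pt('1')) # border width, clamped from below
```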
--- a/src/calibre/ebooks/docx/container.py
+++ b/src/calibre/ebooks/docx/container.py
@ -167,7 +167,9 @@ class DOCX(object):

    @property
    def document_relationships(self):
-        name = self.document_name
+        return self.get_relationships(self.document_name)
+
+    def get_relationships(self, name):
        base = '/'.join(name.split('/')[:-1])
        by_id, by_type = {}, {}
        parts = name.split('/')
@ -179,7 +181,9 @@ class DOCX(object):
        else:
            root = fromstring(raw)
            for item in root.xpath('//*[local-name()="Relationships"]/*[local-name()="Relationship" and @Type and @Target]'):
-                target = '/'.join((base, item.get('Target').lstrip('/')))
+                target = item.get('Target')
+                if item.get('TargetMode', None) != 'External':
+                    target = '/'.join((base, target.lstrip('/')))
                typ = item.get('Type')
                Id = item.get('Id')
                by_id[Id] = by_type[typ] = target
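Review note on container.py: the second hunk stops prefixing the part's directory onto relationship targets whose `TargetMode` is `External`, so hyperlinks keep their URL. A standalone Python 3 sketch of the patched rule is below. `resolve_target` is a hypothetical helper name, and the part name and URL are made-up inputs.

```python
def resolve_target(part_name, target, target_mode=None):
    # part_name is the part that owns the .rels file, e.g. 'word/document.xml'.
    base = '/'.join(part_name.split('/')[:-1])
    if target_mode != 'External':
        # Internal targets are joined onto the owning part's directory.
        target = '/'.join((base, target.lstrip('/')))
    return target


print(resolve_target('word/document.xml', 'media/image1.png'))
print(resolve_target('word/document.xml', 'http://example.com/', 'External'))
```

Before this change the second call would have produced `word/http://example.com/`.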
--- a/src/calibre/ebooks/docx/dump.py
+++ b/src/calibre/ebooks/docx/dump.py
@ -0,0 +1,37 @@
+#!/usr/bin/env python
+# vim:fileencoding=utf-8
+from __future__ import (unicode_literals, division, absolute_import,
+                        print_function)
+
+__license__ = 'GPL v3'
+__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'
+
+import sys, os, shutil
+
+from lxml import etree
+
+from calibre import walk
+from calibre.utils.zipfile import ZipFile
+
+def dump(path):
+    dest = os.path.splitext(os.path.basename(path))[0]
+    dest += '_extracted'
+    if os.path.exists(dest):
+        shutil.rmtree(dest)
+    with ZipFile(path) as zf:
+        zf.extractall(dest)
+
+    for f in walk(dest):
+        if f.endswith('.xml') or f.endswith('.rels'):
+            with open(f, 'r+b') as stream:
+                raw = stream.read()
+                root = etree.fromstring(raw)
+                stream.seek(0)
+                stream.truncate()
+                stream.write(etree.tostring(root, pretty_print=True, encoding='utf-8', xml_declaration=True))
+
+    print(path, 'dumped to', dest)
+
+if __name__ == '__main__':
+    dump(sys.argv[-1])
+
--- a/src/calibre/ebooks/docx/fonts.py
+++ b/src/calibre/ebooks/docx/fonts.py
@ -0,0 +1,132 @@
+#!/usr/bin/env python
+# vim:fileencoding=utf-8
+from __future__ import (unicode_literals, division, absolute_import,
+                        print_function)
+
+__license__ = 'GPL v3'
+__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'
+
+import os, re
+from collections import namedtuple
+
+from calibre.ebooks.docx.block_styles import binary_property, inherit
+from calibre.ebooks.docx.names import XPath, get
+from calibre.utils.filenames import ascii_filename
+from calibre.utils.fonts.scanner import font_scanner, NoFonts
+from calibre.utils.fonts.utils import panose_to_css_generic_family, is_truetype_font
+
+Embed = namedtuple('Embed', 'name key subsetted')
+
+def has_system_fonts(name):
+    try:
+        return bool(font_scanner.fonts_for_family(name))
+    except NoFonts:
+        return False
+
+def get_variant(bold=False, italic=False):
+    return {(False, False):'Regular', (False, True):'Italic',
+            (True, False):'Bold', (True, True):'BoldItalic'}[(bold, italic)]
+
+class Family(object):
+
+    def __init__(self, elem, embed_relationships):
+        self.name = self.family_name = get(elem, 'w:name')
+        self.alt_names = tuple(get(x, 'w:val') for x in XPath('./w:altName')(elem))
+        if self.alt_names and not has_system_fonts(self.name):
+            for x in self.alt_names:
+                if has_system_fonts(x):
+                    self.family_name = x
+                    break
+
+        self.embedded = {}
+        for x in ('Regular', 'Bold', 'Italic', 'BoldItalic'):
+            for y in XPath('./w:embed%s[@r:id]' % x)(elem):
+                rid = get(y, 'r:id')
+                key = get(y, 'w:fontKey')
+                subsetted = get(y, 'w:subsetted') in {'1', 'true', 'on'}
+                if rid in embed_relationships:
+                    self.embedded[x] = Embed(embed_relationships[rid], key, subsetted)
+
+        self.generic_family = 'auto'
+        for x in XPath('./w:family[@w:val]')(elem):
+            self.generic_family = get(x, 'w:val', 'auto')
+
+        ntt = binary_property(elem, 'notTrueType')
+        self.is_ttf = ntt is inherit or not ntt
+
+        self.panose1 = None
+        self.panose_name = None
+        for x in XPath('./w:panose1[@w:val]')(elem):
+            try:
+                v = get(x, 'w:val')
+                v = tuple(int(v[i:i+2], 16) for i in xrange(0, len(v), 2))
+            except (TypeError, ValueError, IndexError):
+                pass
+            else:
+                self.panose1 = v
+                self.panose_name = panose_to_css_generic_family(v)
+
+        self.css_generic_family = {'roman':'serif', 'swiss':'sans-serif', 'modern':'monospace',
+                                   'decorative':'fantasy', 'script':'cursive'}.get(self.generic_family, None)
+        self.css_generic_family = self.css_generic_family or self.panose_name or 'serif'
+
+
+class Fonts(object):
+
+    def __init__(self):
+        self.fonts = {}
+        self.used = set()
+
+    def __call__(self, root, embed_relationships, docx, dest_dir):
+        for elem in XPath('//w:font[@w:name]')(root):
+            self.fonts[get(elem, 'w:name')] = Family(elem, embed_relationships)
+
+    def family_for(self, name, bold=False, italic=False):
+        f = self.fonts.get(name, None)
+        if f is None:
+            return 'serif'
+        variant = get_variant(bold, italic)
+        self.used.add((name, variant))
+        name = f.name if variant in f.embedded else f.family_name
+        return '"%s", %s' % (name.replace('"', ''), f.css_generic_family)
+
+    def embed_fonts(self, dest_dir, docx):
+        defs = []
+        dest_dir = os.path.join(dest_dir, 'fonts')
+        for name, variant in self.used:
+            f = self.fonts[name]
+            if variant in f.embedded:
+                if not os.path.exists(dest_dir):
+                    os.mkdir(dest_dir)
+                fname = self.write(name, dest_dir, docx, variant)
+                if fname is not None:
+                    d = {'font-family':'"%s"' % name.replace('"', ''), 'src': 'url("fonts/%s")' % fname}
+                    if 'Bold' in variant:
+                        d['font-weight'] = 'bold'
+                    if 'Italic' in variant:
+                        d['font-style'] = 'italic'
+                    d = ['%s: %s' % (k, v) for k, v in d.iteritems()]
+                    d = ';\n\t'.join(d)
+                    defs.append('@font-face {\n\t%s\n}\n' % d)
+        return '\n'.join(defs)
+
+    def write(self, name, dest_dir, docx, variant):
+        f = self.fonts[name]
+        ef = f.embedded[variant]
+        raw = docx.read(ef.name)
+        prefix = raw[:32]
+        if ef.key:
+            key = re.sub(r'[^A-Fa-f0-9]', '', ef.key)
+            key = bytearray(reversed(tuple(int(key[i:i+2], 16) for i in xrange(0, len(key), 2))))
+            prefix = bytearray(prefix)
+            prefix = bytes(bytearray(prefix[i]^key[i % len(key)] for i in xrange(len(prefix))))
+        if not is_truetype_font(prefix):
+            return None
+        ext = 'otf' if prefix.startswith(b'OTTO') else 'ttf'
+        fname = ascii_filename('%s - %s.%s' % (name, variant, ext))
+        with open(os.path.join(dest_dir, fname), 'wb') as dest:
+            dest.write(prefix)
+            dest.write(raw[32:])
+
+        return fname
+
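Review note on fonts.py: `Fonts.write()` XORs the first 32 bytes of an embedded font with the reversed bytes of its `w:fontKey` and leaves the rest untouched. The sketch below isolates that step in Python 3. `xor_prefix` is a hypothetical name, the GUID and font bytes are made-up test data, and the function assumes a non-empty key (the diff guards this with `if ef.key`).

```python
import re


def xor_prefix(data, font_key, length=32):
    # Keep only the hex digits of the GUID, then reverse the byte order.
    digits = re.sub(r'[^A-Fa-f0-9]', '', font_key)
    key = bytes(reversed(bytes.fromhex(digits)))
    # XOR the leading bytes with the key, cycling through the key as needed.
    head = bytes(b ^ key[i % len(key)] for i, b in enumerate(data[:length]))
    return head + data[length:]


guid = '{1DF903D2-8A4B-4C1E-9C0F-3B7A5E2D6F10}'  # made-up key
font = b'OTTO' + bytes(range(60))                # made-up font bytes
obfuscated = xor_prefix(font, guid)
print(xor_prefix(obfuscated, guid) == font)
```

Because XOR is its own inverse, the same function obfuscates and restores the data, which is why the diff can check for a valid font header straight after the XOR.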
--- a/src/calibre/ebooks/docx/footnotes.py
+++ b/src/calibre/ebooks/docx/footnotes.py
@ -0,0 +1,62 @@
+#!/usr/bin/env python
+# vim:fileencoding=utf-8
+from __future__ import (unicode_literals, division, absolute_import,
+                        print_function)
+
+__license__ = 'GPL v3'
+__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'
+
+from collections import OrderedDict
+
+from calibre.ebooks.docx.names import get, XPath, descendants
+
+class Note(object):
+
+    def __init__(self, parent):
+        self.type = get(parent, 'w:type', 'normal')
+        self.parent = parent
+
+    def __iter__(self):
+        for p in descendants(self.parent, 'w:p'):
+            yield p
+
+class Footnotes(object):
+
+    def __init__(self):
+        self.footnotes = {}
+        self.endnotes = {}
+        self.counter = 0
+        self.notes = OrderedDict()
+
+    def __call__(self, footnotes, endnotes):
+        if footnotes is not None:
+            for footnote in XPath('./w:footnote[@w:id]')(footnotes):
+                fid = get(footnote, 'w:id')
+                if fid:
+                    self.footnotes[fid] = Note(footnote)
+
+        if endnotes is not None:
+            for endnote in XPath('./w:endnote[@w:id]')(endnotes):
+                fid = get(endnote, 'w:id')
+                if fid:
+                    self.endnotes[fid] = Note(endnote)
+
+    def get_ref(self, ref):
+        fid = get(ref, 'w:id')
+        notes = self.footnotes if ref.tag.endswith('}footnoteReference') else self.endnotes
+        note = notes.get(fid, None)
+        if note is not None and note.type == 'normal':
+            self.counter += 1
+            anchor = 'note_%d' % self.counter
+            self.notes[anchor] = (type('')(self.counter), note)
+            return anchor, type('')(self.counter)
+        return None, None
+
+    def __iter__(self):
+        for anchor, (counter, note) in self.notes.iteritems():
+            yield anchor, counter, note
+
+    @property
+    def has_notes(self):
+        return bool(self.notes)
+
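Review note on footnotes.py: `get_ref()` numbers notes in the order their references are encountered and records them in an `OrderedDict` keyed by anchor. A reduced Python 3 sketch of that bookkeeping is below. `NoteCounter` is a hypothetical class that tracks plain note ids instead of `Note` objects and skips the `type == 'normal'` check.

```python
from collections import OrderedDict


class NoteCounter:
    def __init__(self, known_ids):
        self.known = set(known_ids)
        self.counter = 0
        self.notes = OrderedDict()

    def get_ref(self, note_id):
        # Unknown ids get no anchor, as in Footnotes.get_ref.
        if note_id not in self.known:
            return None, None
        self.counter += 1
        anchor = 'note_%d' % self.counter
        self.notes[anchor] = (str(self.counter), note_id)
        return anchor, str(self.counter)


refs = NoteCounter(['2', '3'])
print(refs.get_ref('3'))
print(refs.get_ref('2'))
print(refs.get_ref('99'))
```

As in the diff, a note referenced twice would receive two separate numbers, since the counter advances on every successful lookup.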
--- a/src/calibre/ebooks/docx/images.py
+++ b/src/calibre/ebooks/docx/images.py
@ -0,0 +1,205 @@
+#!/usr/bin/env python
+# vim:fileencoding=utf-8
+from __future__ import (unicode_literals, division, absolute_import,
+                        print_function)
+
+__license__ = 'GPL v3'
+__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'
+
+import os
+
+from lxml.html.builder import IMG
+
+from calibre.ebooks.docx.names import XPath, get, barename
+from calibre.utils.filenames import ascii_filename
+from calibre.utils.imghdr import what
+
+def emu_to_pt(x):
+    return x / 12700
+
+def get_image_properties(parent):
+    width = height = None
+    for extent in XPath('./wp:extent')(parent):
+        try:
+            width = emu_to_pt(int(extent.get('cx')))
+        except (TypeError, ValueError):
+            pass
+        try:
+            height = emu_to_pt(int(extent.get('cy')))
+        except (TypeError, ValueError):
+            pass
+    ans = {}
+    if width is not None:
+        ans['width'] = '%.3gpt' % width
+    if height is not None:
+        ans['height'] = '%.3gpt' % height
+
+    alt = None
+    for docPr in XPath('./wp:docPr')(parent):
+        x = docPr.get('descr', None)
+        if x:
+            alt = x
+        if docPr.get('hidden', None) in {'true', 'on', '1'}:
+            ans['display'] = 'none'
+
+    return ans, alt
+
+
+def get_image_margins(elem):
+    ans = {}
+    for w, css in {'L':'left', 'T':'top', 'R':'right', 'B':'bottom'}.iteritems():
+        val = elem.get('dist%s' % w, None)
+        if val is not None:
+            try:
+                val = emu_to_pt(int(val))
+            except (TypeError, ValueError):
+                continue
+            ans['padding-%s' % css] = '%.3gpt' % val
+    return ans
+
+def get_hpos(anchor, page_width):
+    for ph in XPath('./wp:positionH')(anchor):
+        rp = ph.get('relativeFrom', None)
+        if rp == 'leftMargin':
+            return 0
+        if rp == 'rightMargin':
+            return 1
+        for align in XPath('./wp:align')(ph):
+            al = align.text
+            if al == 'left':
+                return 0
+            if al == 'center':
+                return 0.5
+            if al == 'right':
+                return 1
+        for po in XPath('./wp:posOffset')(ph):
+            try:
+                pos = emu_to_pt(int(po.text))
+            except (TypeError, ValueError):
+                continue
+            return pos/page_width
+
+    for sp in XPath('./wp:simplePos')(anchor):
+        try:
+            x = emu_to_pt(int(sp.get('x', None)))
+        except (TypeError, ValueError):
+            continue
+        return x/page_width
+
+    return 0
+
+
+class Images(object):
+
+    def __init__(self):
+        self.rid_map = {}
+        self.used = {}
+        self.names = set()
+        self.all_images = set()
+
+    def __call__(self, relationships_by_id):
+        self.rid_map = relationships_by_id
+
+    def generate_filename(self, rid, base=None):
+        if rid in self.used:
+            return self.used[rid]
+        raw = self.docx.read(self.rid_map[rid])
+        base = base or ascii_filename(self.rid_map[rid].rpartition('/')[-1]).replace(' ', '_')
+        ext = what(None, raw) or base.rpartition('.')[-1] or 'jpeg'
+        base = base.rpartition('.')[0] + '.' + ext
+        exists = frozenset(self.used.itervalues())
+        c = 1
+        while base in exists:
+            n, e = base.rpartition('.')[0::2]
+            base = '%s-%d.%s' % (n, c, e)
+            c += 1
+        self.used[rid] = base
+        with open(os.path.join(self.dest_dir, base), 'wb') as f:
+            f.write(raw)
+        self.all_images.add('images/' + base)
+        return base
+
+    def pic_to_img(self, pic, alt=None):
+        name = None
+        for pr in XPath('descendant::pic:cNvPr')(pic):
+            name = pr.get('name', None)
+            if name:
+                name = ascii_filename(name).replace(' ', '_')
+            alt = pr.get('descr', None) or alt
+            for a in XPath('descendant::a:blip[@r:embed]')(pic):
+                rid = get(a, 'r:embed')
+                if rid in self.rid_map:
+                    src = self.generate_filename(rid, name)
+                    img = IMG(src='images/%s' % src)
+                    if alt:
+                        img.set('alt', alt)
+                    return img
+
+    def drawing_to_html(self, drawing, page):
+        # First process the inline pictures
+        for inline in XPath('./wp:inline')(drawing):
+            style, alt = get_image_properties(inline)
+            for pic in XPath('descendant::pic:pic')(inline):
+                ans = self.pic_to_img(pic, alt)
+                if ans is not None:
+                    if style:
+                        ans.set('style', '; '.join('%s: %s' % (k, v) for k, v in style.iteritems()))
+                    yield ans
+
+        # Now process the floats
+        for anchor in XPath('./wp:anchor')(drawing):
+            style, alt = get_image_properties(anchor)
+            self.get_float_properties(anchor, style, page)
+            for pic in XPath('descendant::pic:pic')(anchor):
+                ans = self.pic_to_img(pic, alt)
+                if ans is not None:
+                    if style:
+                        ans.set('style', '; '.join('%s: %s' % (k, v) for k, v in style.iteritems()))
+                    yield ans
+
+    def get_float_properties(self, anchor, style, page):
+        if 'display' not in style:
+            style['display'] = 'block'
+        padding = get_image_margins(anchor)
+        width = float(style.get('width', '100pt')[:-2])
+
+        page_width = page.width - page.margin_left - page.margin_right
+
+        hpos = get_hpos(anchor, page_width) + width/(2*page_width)
+
+        wrap_elem = None
+        dofloat = False
+
+        for child in reversed(anchor):
+            bt = barename(child.tag)
+            if bt in {'wrapNone', 'wrapSquare', 'wrapThrough', 'wrapTight', 'wrapTopAndBottom'}:
+                wrap_elem = child
+                dofloat = bt not in {'wrapNone', 'wrapTopAndBottom'}
+                break
+
+        if wrap_elem is not None:
+            padding.update(get_image_margins(wrap_elem))
+            wt = wrap_elem.get('wrapText', None)
+            hpos = 0 if wt == 'right' else 1 if wt == 'left' else hpos
+            if dofloat:
+                style['float'] = 'left' if hpos < 0.65 else 'right'
+            else:
+                ml, mr = (None, None) if hpos < 0.34 else ('auto', None) if hpos > 0.65 else ('auto', 'auto')
+                if ml is not None:
+                    style['margin-left'] = ml
+                if mr is not None:
+                    style['margin-right'] = mr
+
+        style.update(padding)
+
+    def to_html(self, elem, page, docx, dest_dir):
+        dest = os.path.join(dest_dir, 'images')
+        if not os.path.exists(dest):
+            os.mkdir(dest)
+        self.dest_dir, self.docx = dest, docx
+        if elem.tag.endswith('}drawing'):
+            for tag in self.drawing_to_html(elem, page):
+                yield tag
+        # TODO: Handle w:pict
+
+
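Review note on `get_float_properties` above: the placement decision can be restated as a small standalone helper, which makes the thresholds easier to see. This is only a sketch. The name `float_css` and its parameters are hypothetical and not part of the commit, and `hpos` is assumed to be the horizontal centre of the image as a fraction of the text width, which is how the diff appears to use it. Padding and the `display: block` default are left out.

```python
def float_css(hpos, wrap_type=None, wrap_text=None):
    """Return the float/margin CSS for an anchored image (hypothetical helper).

    hpos      -- assumed: image centre as a fraction (0..1) of the text width
    wrap_type -- bare name of the wrap element, e.g. 'wrapSquare', or None
    wrap_text -- value of the wrapText attribute, or None
    """
    style = {}
    if wrap_type is None:
        # No wrap element found: the diff leaves float and margins untouched
        return style
    # Only wrapSquare / wrapThrough / wrapTight become CSS floats
    dofloat = wrap_type not in ('wrapNone', 'wrapTopAndBottom')
    # Text wrapping on one side only forces the image to the opposite side
    if wrap_text == 'right':
        hpos = 0
    elif wrap_text == 'left':
        hpos = 1
    if dofloat:
        style['float'] = 'left' if hpos < 0.65 else 'right'
    elif hpos > 0.65:
        # Non-floated image on the right: push it over with an auto left margin
        style['margin-left'] = 'auto'
    elif hpos >= 0.34:
        # Non-floated image in the middle: centre it
        style['margin-left'] = style['margin-right'] = 'auto'
    return style


centered = float_css(0.5, 'wrapTopAndBottom')
floated = float_css(0.9, 'wrapTight')
```

Note the asymmetry in the thresholds: anything left of 0.65 floats left, so only images clearly in the right third float right.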
--- a/src/calibre/ebooks/docx/lcid.py
+++ b/src/calibre/ebooks/docx/lcid.py
@ -0,0 +1,233 @@
+#!/usr/bin/env python
+# vim:fileencoding=utf-8
+from __future__ import (unicode_literals, division, absolute_import,
+                        print_function)
+
+__license__ = 'GPL v3'
+__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'
+
+lcid = {
+    1078: 'af',  # Afrikaans - South Africa
+    1052: 'sq',  # Albanian - Albania
+    1118: 'am',  # Amharic - Ethiopia
+    1025: 'ar',  # Arabic - Saudi Arabia
+    5121: 'ar',  # Arabic - Algeria
+    15361: 'ar',  # Arabic - Bahrain
+    3073: 'ar',  # Arabic - Egypt
+    2049: 'ar',  # Arabic - Iraq
+    11265: 'ar',  # Arabic - Jordan
+    13313: 'ar',  # Arabic - Kuwait
+    12289: 'ar',  # Arabic - Lebanon
+    4097: 'ar',  # Arabic - Libya
+    6145: 'ar',  # Arabic - Morocco
+    8193: 'ar',  # Arabic - Oman
+    16385: 'ar',  # Arabic - Qatar
+    10241: 'ar',  # Arabic - Syria
+    7169: 'ar',  # Arabic - Tunisia
+    14337: 'ar',  # Arabic - U.A.E.
+    9217: 'ar',  # Arabic - Yemen
+    1067: 'hy',  # Armenian - Armenia
+    1101: 'as',  # Assamese
+    2092: 'az',  # Azeri (Cyrillic)
+    1068: 'az',  # Azeri (Latin)
+    1069: 'eu',  # Basque
+    1059: 'be',  # Belarusian
+    1093: 'bn',  # Bengali (India)
+    2117: 'bn',  # Bengali (Bangladesh)
+    5146: 'bs',  # Bosnian (Bosnia/Herzegovina)
+    1026: 'bg',  # Bulgarian
+    1109: 'my',  # Burmese
+    1027: 'ca',  # Catalan
+    1116: 'chr',  # Cherokee - United States
+    2052: 'zh',  # Chinese - People's Republic of China
+    4100: 'zh',  # Chinese - Singapore
+    1028: 'zh',  # Chinese - Taiwan
+    3076: 'zh',  # Chinese - Hong Kong SAR
+    5124: 'zh',  # Chinese - Macao SAR
+    1050: 'hr',  # Croatian
+    4122: 'hr',  # Croatian (Bosnia/Herzegovina)
+    1029: 'cs',  # Czech
+    1030: 'da',  # Danish
+    1125: 'dv',  # Divehi
+    1043: 'nl',  # Dutch - Netherlands
+    2067: 'nl',  # Dutch - Belgium
+    1126: 'bin',  # Edo
+    1033: 'en',  # English - United States
+    2057: 'en',  # English - United Kingdom
+    3081: 'en',  # English - Australia
+    10249: 'en',  # English - Belize
+    4105: 'en',  # English - Canada
+    9225: 'en',  # English - Caribbean
+    15369: 'en',  # English - Hong Kong SAR
+    16393: 'en',  # English - India
+    14345: 'en',  # English - Indonesia
+    6153: 'en',  # English - Ireland
+    8201: 'en',  # English - Jamaica
+    17417: 'en',  # English - Malaysia
+    5129: 'en',  # English - New Zealand
+    13321: 'en',  # English - Philippines
+    18441: 'en',  # English - Singapore
+    7177: 'en',  # English - South Africa
+    11273: 'en',  # English - Trinidad
+    12297: 'en',  # English - Zimbabwe
+    1061: 'et',  # Estonian
+    1080: 'fo',  # Faroese
+    1065: None,  # TODO: Farsi
+    1124: 'fil',  # Filipino
+    1035: 'fi',  # Finnish
+    1036: 'fr',  # French - France
+    2060: 'fr',  # French - Belgium
+    11276: 'fr',  # French - Cameroon
+    3084: 'fr',  # French - Canada
+    9228: 'fr',  # French - Democratic Rep. of Congo
+    12300: 'fr',  # French - Cote d'Ivoire
+    15372: 'fr',  # French - Haiti
+    5132: 'fr',  # French - Luxembourg
+    13324: 'fr',  # French - Mali
+    6156: 'fr',  # French - Monaco
+    14348: 'fr',  # French - Morocco
+    58380: 'fr',  # French - North Africa
+    8204: 'fr',  # French - Reunion
+    10252: 'fr',  # French - Senegal
+    4108: 'fr',  # French - Switzerland
+    7180: 'fr',  # French - West Indies
+    1122: 'fy',  # Frisian - Netherlands
+    1127: None,  # TODO: Fulfulde - Nigeria
+    1071: 'mk',  # FYRO Macedonian
+    2108: 'ga',  # Gaelic (Ireland)
+    1084: 'gd',  # Gaelic (Scotland)
+    1110: 'gl',  # Galician
+    1079: 'ka',  # Georgian
+    1031: 'de',  # German - Germany
+    3079: 'de',  # German - Austria
+    5127: 'de',  # German - Liechtenstein
+    4103: 'de',  # German - Luxembourg
+    2055: 'de',  # German - Switzerland
+    1032: 'el',  # Greek
+    1140: 'gn',  # Guarani - Paraguay
+    1095: 'gu',  # Gujarati
+    1128: 'ha',  # Hausa - Nigeria
+    1141: 'haw',  # Hawaiian - United States
+    1037: 'he',  # Hebrew
+    1081: 'hi',  # Hindi
+    1038: 'hu',  # Hungarian
+    1129: None,  # TODO: Ibibio - Nigeria
+    1039: 'is',  # Icelandic
+    1136: 'ig',  # Igbo - Nigeria
+    1057: 'id',  # Indonesian
+    1117: 'iu',  # Inuktitut
+    1040: 'it',  # Italian - Italy
+    2064: 'it',  # Italian - Switzerland
+    1041: 'ja',  # Japanese
+    1099: 'kn',  # Kannada
+    1137: 'kr',  # Kanuri - Nigeria
+    2144: 'ks',  # Kashmiri
+    1120: 'ks',  # Kashmiri (Arabic)
+    1087: 'kk',  # Kazakh
+    1107: 'km',  # Khmer
+    1111: 'kok',  # Konkani
+    1042: 'ko',  # Korean
+    1088: 'ky',  # Kyrgyz (Cyrillic)
+    1108: 'lo',  # Lao
+    1142: 'la',  # Latin
+    1062: 'lv',  # Latvian
+    1063: 'lt',  # Lithuanian
+    1086: 'ms',  # Malay - Malaysia
+    2110: 'ms',  # Malay - Brunei Darussalam
+    1100: 'ml',  # Malayalam
+    1082: 'mt',  # Maltese
+    1112: 'mni',  # Manipuri
+    1153: 'mi',  # Maori - New Zealand
+    1102: 'mr',  # Marathi
+    1104: 'mn',  # Mongolian (Cyrillic)
+    2128: 'mn',  # Mongolian (Mongolian)
+    1121: 'ne',  # Nepali
+    2145: 'ne',  # Nepali - India
+    1044: 'no',  # Norwegian (Bokmål)
+    2068: 'no',  # Norwegian (Nynorsk)
+    1096: 'or',  # Oriya
+    1138: 'om',  # Oromo
+    1145: 'pap',  # Papiamentu
+    1123: 'ps',  # Pashto
+    1045: 'pl',  # Polish
+    1046: 'pt',  # Portuguese - Brazil
+    2070: 'pt',  # Portuguese - Portugal
+    1094: 'pa',  # Punjabi
+    2118: 'pa',  # Punjabi (Pakistan)
+    1131: 'qu',  # Quechua - Bolivia
+    2155: 'qu',  # Quechua - Ecuador
+    3179: 'qu',  # Quechua - Peru
+    1047: 'rm',  # Rhaeto-Romanic
+    1048: 'ro',  # Romanian
+    2072: 'ro',  # Romanian - Moldova
+    1049: 'ru',  # Russian
+    2073: 'ru',  # Russian - Moldova
+    1083: 'se',  # Sami (Lappish)
+    1103: 'sa',  # Sanskrit
+    1132: 'nso',  # Sepedi
+    3098: 'sr',  # Serbian (Cyrillic)
+    2074: 'sr',  # Serbian (Latin)
+    1113: 'sd',  # Sindhi - India
+    2137: 'sd',  # Sindhi - Pakistan
+    1115: 'si',  # Sinhalese - Sri Lanka
+    1051: 'sk',  # Slovak
+    1060: 'sl',  # Slovenian
+    1143: 'so',  # Somali
+    1070: 'wen',  # Sorbian
+    3082: 'es',  # Spanish - Spain (Modern Sort)
+    1034: 'es',  # Spanish - Spain (Traditional Sort)
+    11274: 'es',  # Spanish - Argentina
+    16394: 'es',  # Spanish - Bolivia
+    13322: 'es',  # Spanish - Chile
+    9226: 'es',  # Spanish - Colombia
+    5130: 'es',  # Spanish - Costa Rica
+    7178: 'es',  # Spanish - Dominican Republic
+    12298: 'es',  # Spanish - Ecuador
+    17418: 'es',  # Spanish - El Salvador
+    4106: 'es',  # Spanish - Guatemala
+    18442: 'es',  # Spanish - Honduras
+    58378: 'es',  # Spanish - Latin America
+    2058: 'es',  # Spanish - Mexico
+    19466: 'es',  # Spanish - Nicaragua
+    6154: 'es',  # Spanish - Panama
+    15370: 'es',  # Spanish - Paraguay
+    10250: 'es',  # Spanish - Peru
+    20490: 'es',  # Spanish - Puerto Rico
+    21514: 'es',  # Spanish - United States
+    14346: 'es',  # Spanish - Uruguay
+    8202: 'es',  # Spanish - Venezuela
+    1072: None,  # TODO: Sutu
+    1089: 'sw',  # Swahili
+    1053: 'sv',  # Swedish
+    2077: 'sv',  # Swedish - Finland
+    1114: 'syr',  # Syriac
+    1064: 'tg',  # Tajik
+    1119: None,  # TODO: Tamazight (Arabic)
+    2143: None,  # TODO: Tamazight (Latin)
+    1097: 'ta',  # Tamil
+    1092: 'tt',  # Tatar
+    1098: 'te',  # Telugu
+    1054: 'th',  # Thai
+    2129: 'bo',  # Tibetan - Bhutan
+    1105: 'bo',  # Tibetan - People's Republic of China
+    2163: 'ti',  # Tigrigna - Eritrea
+    1139: 'ti',  # Tigrigna - Ethiopia
+    1073: 'ts',  # Tsonga
+    1074: 'tn',  # Tswana
+    1055: 'tr',  # Turkish
+    1090: 'tk',  # Turkmen
+    1152: 'ug',  # Uighur - China
+    1058: 'uk',  # Ukrainian
+    1056: 'ur',  # Urdu
+    2080: 'ur',  # Urdu - India
+    2115: 'uz',  # Uzbek (Cyrillic)
+    1091: 'uz',  # Uzbek (Latin)
+    1075: 've',  # Venda
+    1066: 'vi',  # Vietnamese
+    1106: 'cy',  # Welsh
+    1076: 'xh',  # Xhosa
+    1144: 'ii',  # Yi
+    1085: 'yi',  # Yiddish
+    1130: 'yo',  # Yoruba
+    1077: 'zu'  # Zulu
+}
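Review note on the `lcid` table above: several entries map to `None` (the TODO rows), so callers need to treat both a missing key and a `None` value as "unknown". The sketch below shows one way a lookup could do that, using a five-entry excerpt of the table. The helper name `lang_for_lcid`, the `'und'` default and the fallback on the primary language ID (the low 10 bits of a Windows LCID) are assumptions for illustration and are not part of the commit.

```python
# Excerpt of the table from the diff; 1065 (Farsi) is one of the TODO entries
LCID_TO_LANG = {
    1033: 'en',  # English - United States
    2057: 'en',  # English - United Kingdom
    1036: 'fr',  # French - France
    3084: 'fr',  # French - Canada
    1065: None,  # TODO: Farsi
}


def lang_for_lcid(code, table=LCID_TO_LANG, default='und'):
    """Map a Windows LCID to a language code (hypothetical helper)."""
    lang = table.get(code)
    if lang is not None:
        return lang
    # Assumed fallback: match on the primary language ID (low 10 bits), so an
    # unlisted regional variant still resolves to its base language.
    primary = code & 0x3ff
    for known, known_lang in table.items():
        if known_lang is not None and known & 0x3ff == primary:
            return known_lang
    return default


us_english = lang_for_lcid(1033)
canadian_english = lang_for_lcid(4105)  # not in the excerpt, shares primary ID 9
farsi = lang_for_lcid(1065)
```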
--- a/src/calibre/ebooks/docx/names.py
+++ b/src/calibre/ebooks/docx/names.py
@ -6,12 +6,23 @@ from __future__ import (unicode_literals, division, absolute_import,
 __license__ = 'GPL v3'
 __copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'

+import re
+from future_builtins import map
+
 from lxml.etree import XPath as X

+from calibre.utils.filenames import ascii_text
+
 DOCUMENT  = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument'
 DOCPROPS  = 'http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties'
 APPPROPS  = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties'
 STYLES    = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles'
+NUMBERING = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/numbering'
+FONTS     = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/fontTable'
+IMAGES    = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/image'
+LINKS     = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink'
+FOOTNOTES = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/footnotes'
+ENDNOTES  = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/endnotes'

 namespaces = {
    'mo': 'http://schemas.microsoft.com/office/mac/office/2008/main',
@ -44,8 +55,13 @@ namespaces = {
    'dcterms': 'http://purl.org/dc/terms/'
 }

+xpath_cache = {}
+
 def XPath(expr):
-    return X(expr, namespaces=namespaces)
+    ans = xpath_cache.get(expr, None)
+    if ans is None:
+        xpath_cache[expr] = ans = X(expr, namespaces=namespaces)
+    return ans

 def is_tag(x, q):
    tag = getattr(x, 'tag', x)
@ -58,7 +74,32 @@ def barename(x):
 def XML(x):
    return '{%s}%s' % (namespaces['xml'], x)

-def get(x, attr, default=None):
-    ns, name = attr.partition(':')[0::2]
-    return x.attrib.get('{%s}%s' % (namespaces[ns], name), default)
+def expand(name):
+    ns, tag = name.partition(':')[0::2]
+    if ns and tag:
+        tag = '{%s}%s' % (namespaces[ns], tag)
+    return tag or ns

+def get(x, attr, default=None):
+    return x.attrib.get(expand(attr), default)
+
+def ancestor(elem, name):
+    tag = expand(name)
+    while elem is not None:
+        elem = elem.getparent()
+        if getattr(elem, 'tag', None) == tag:
+            return elem
+
+def generate_anchor(name, existing):
+    x = y = 'id_' + re.sub(r'[^0-9a-zA-Z_]', '', ascii_text(name)).lstrip('_')
+    c = 1
+    while y in existing:
+        y = '%s_%d' % (x, c)
+        c += 1
+    return y
+
+def children(elem, *args):
+    return elem.iterchildren(*map(expand, args))
+
+def descendants(elem, *args):
+    return elem.iterdescendants(*map(expand, args))
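Review note on the helpers added to `names.py` above: the sketch below restates prefix expansion and anchor generation without lxml or calibre, so the behaviour can be checked in isolation. `NAMESPACES` holds only the WordprocessingML `w` prefix, `ascii_fold` is a stand-in for calibre's `ascii_text` (an assumption, its exact output may differ), and returning unprefixed names unchanged is a choice made for this sketch.

```python
import re
import unicodedata

NAMESPACES = {
    'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main',
}


def expand(name):
    """Turn 'prefix:local' into lxml's '{uri}local' (Clark) notation."""
    prefix, sep, local = name.partition(':')
    if not sep:
        return name  # sketch assumption: unprefixed names pass through
    return '{%s}%s' % (NAMESPACES[prefix], local)


def ascii_fold(text):
    """Stand-in for calibre's ascii_text: drop accents and non-ASCII chars."""
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


def generate_anchor(name, existing):
    """Build an id-safe anchor, adding _1, _2, ... until it is unused."""
    base = candidate = 'id_' + re.sub(
        r'[^0-9a-zA-Z_]', '', ascii_fold(name)).lstrip('_')
    c = 1
    while candidate in existing:
        candidate = '%s_%d' % (base, c)
        c += 1
    return candidate


seen = set()
first = generate_anchor('Chapter 1: Début', seen)
seen.add(first)
second = generate_anchor('Chapter 1: Début', seen)
```

As in the diff, the caller is responsible for adding the returned anchor to `existing`; the function itself does not mutate the set.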
--- a/src/calibre/ebooks/docx/numbering.py
+++ b/src/calibre/ebooks/docx/numbering.py
@ -0,0 +1,300 @@
+#!/usr/bin/env python
+# vim:fileencoding=utf-8
+from __future__ import (unicode_literals, division, absolute_import,
+                        print_function)
+
+__license__ = 'GPL v3'
+__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'
+
+import re
+from collections import Counter
+
+from lxml.html.builder import OL, UL, SPAN
+
+from calibre.ebooks.docx.block_styles import ParagraphStyle
+from calibre.ebooks.docx.char_styles import RunStyle
+from calibre.ebooks.docx.names import XPath, get
+
+STYLE_MAP = {
+    'aiueo': 'hiragana',
+    'aiueoFullWidth': 'hiragana',
+    'hebrew1': 'hebrew',
+    'iroha': 'katakana-iroha',
+    'irohaFullWidth': 'katakana-iroha',
+    'lowerLetter': 'lower-alpha',
+    'lowerRoman': 'lower-roman',
+    'none': 'none',
+    'upperLetter': 'upper-alpha',
+    'upperRoman': 'upper-roman',
+    'chineseCounting': 'cjk-ideographic',
+    'decimalZero': 'decimal-leading-zero',
+}
+
+class Level(object):
+
+    def __init__(self, lvl=None):
+        self.restart = None
+        self.start = 0
+        self.fmt = 'decimal'
+        self.para_link = None
+        self.paragraph_style = self.character_style = None
+        self.is_numbered = False
+        self.num_template = None
+
+        if lvl is not None:
+            self.read_from_xml(lvl)
+
+    def copy(self):
+        ans = Level()
+        for x in ('restart', 'start', 'fmt', 'para_link', 'paragraph_style', 'character_style', 'is_numbered', 'num_template'):
+            setattr(ans, x, getattr(self, x))
+        return ans
+
+    def format_template(self, counter, ilvl):
+        def sub(m):
+            x = int(m.group(1)) - 1
+            if x > ilvl or x not in counter:
+                return ''
+            return '%d' % (counter[x] - (0 if x == ilvl else 1))
+        return re.sub(r'%(\d+)', sub, self.num_template).rstrip() + '\xa0'
+
+    def read_from_xml(self, lvl, override=False):
+        for lr in XPath('./w:lvlRestart[@w:val]')(lvl):
+            try:
+                self.restart = int(get(lr, 'w:val'))
+            except (TypeError, ValueError):
+                pass
+
+        for lr in XPath('./w:start[@w:val]')(lvl):
+            try:
+                self.start = int(get(lr, 'w:val'))
+            except (TypeError, ValueError):
+                pass
+
+        lt = None
+        for lr in XPath('./w:lvlText[@w:val]')(lvl):
+            lt = get(lr, 'w:val')
+
+        for lr in XPath('./w:numFmt[@w:val]')(lvl):
+            val = get(lr, 'w:val')
+            if val == 'bullet':
+                self.is_numbered = False
+                self.fmt = {'\uf0a7':'square', 'o':'circle'}.get(lt, 'disc')
+            else:
+                self.is_numbered = True
+                self.fmt = STYLE_MAP.get(val, 'decimal')
+                if lt and re.match(r'%\d+\.$', lt) is None:
+                    self.num_template = lt
+
+        for lr in XPath('./w:pStyle[@w:val]')(lvl):
+            self.para_link = get(lr, 'w:val')
+
+        for pPr in XPath('./w:pPr')(lvl):
+            ps = ParagraphStyle(pPr)
+            if self.paragraph_style is None:
+                self.paragraph_style = ps
+            else:
+                self.paragraph_style.update(ps)
+
+        for rPr in XPath('./w:rPr')(lvl):
+            ps = RunStyle(rPr)
+            if self.character_style is None:
+                self.character_style = ps
+            else:
+                self.character_style.update(ps)
+
+class NumberingDefinition(object):
+
+    def __init__(self, parent=None):
+        self.levels = {}
+        if parent is not None:
+            for lvl in XPath('./w:lvl')(parent):
+                try:
+                    ilvl = int(get(lvl, 'w:ilvl', 0))
+                except (TypeError, ValueError):
+                    ilvl = 0
+                self.levels[ilvl] = Level(lvl)
+
+    def copy(self):
+        ans = NumberingDefinition()
+        for l, lvl in self.levels.iteritems():
+            ans.levels[l] = lvl.copy()
+        return ans
+
+class Numbering(object):
+
+    def __init__(self):
+        self.definitions = {}
+        self.instances = {}
+        self.counters = {}
+
+    def __call__(self, root, styles):
+        ' Read all numbering style definitions '
+        lazy_load = {}
+        for an in XPath('./w:abstractNum[@w:abstractNumId]')(root):
+            an_id = get(an, 'w:abstractNumId')
+            nsl = XPath('./w:numStyleLink[@w:val]')(an)
+            if nsl:
+                lazy_load[an_id] = get(nsl[0], 'w:val')
+            else:
+                nd = NumberingDefinition(an)
+                self.definitions[an_id] = nd
+
+        def create_instance(n, definition):
+            nd = definition.copy()
+            for lo in XPath('./w:lvlOverride')(n):
+                ilvl = get(lo, 'w:ilvl')
+                for lvl in XPath('./w:lvl')(lo)[:1]:
+                    nilvl = get(lvl, 'w:ilvl')
+                    ilvl = nilvl if ilvl is None else ilvl
+                    alvl = nd.levels.get(ilvl, None)
+                    if alvl is None:
+                        alvl = Level()
+                    alvl.read_from_xml(lvl, override=True)
+            return nd
+
+        next_pass = {}
+        for n in XPath('./w:num[@w:numId]')(root):
+            an_id = None
+            num_id = get(n, 'w:numId')
+            for an in XPath('./w:abstractNumId[@w:val]')(n):
+                an_id = get(an, 'w:val')
+            d = self.definitions.get(an_id, None)
+            if d is None:
+                next_pass[num_id] = (an_id, n)
+                continue
+            self.instances[num_id] = create_instance(n, d)
+
+        numbering_links = styles.numbering_style_links
+        for an_id, style_link in lazy_load.iteritems():
+            num_id = numbering_links[style_link]
+            self.definitions[an_id] = self.instances[num_id].copy()
+
+        for num_id, (an_id, n) in next_pass.iteritems():
+            d = self.definitions.get(an_id, None)
+            if d is not None:
+                self.instances[num_id] = create_instance(n, d)
+
+        for num_id, d in self.instances.iteritems():
+            self.counters[num_id] = Counter({lvl:d.levels[lvl].start for lvl in d.levels})
+
+    def get_pstyle(self, num_id, style_id):
+        d = self.instances.get(num_id, None)
+        if d is not None:
+            for ilvl, lvl in d.levels.iteritems():
+                if lvl.para_link == style_id:
+                    return ilvl
+
+    def get_para_style(self, num_id, lvl):
+        d = self.instances.get(num_id, None)
+        if d is not None:
+            lvl = d.levels.get(lvl, None)
+            return getattr(lvl, 'paragraph_style', None)
+
+    def update_counter(self, counter, levelnum, levels):
+        counter[levelnum] += 1
+        for ilvl, lvl in levels.iteritems():
+            restart = lvl.restart
+            if (restart is None and ilvl == levelnum + 1) or restart == levelnum + 1:
+                counter[ilvl] = lvl.start
+
+    def apply_markup(self, items, body, styles, object_map):
+        for p, num_id, ilvl in items:
+            d = self.instances.get(num_id, None)
+            if d is not None:
+                lvl = d.levels.get(ilvl, None)
+                if lvl is not None:
+                    counter = self.counters[num_id]
+                    p.tag = 'li'
+                    p.set('value', '%s' % counter[ilvl])
+                    p.set('list-lvl', str(ilvl))
+                    p.set('list-id', num_id)
+                    if lvl.num_template is not None:
+                        val = lvl.format_template(counter, ilvl)
+                        p.set('list-template', val)
+                    self.update_counter(counter, ilvl, d.levels)
+
+        templates = {}
+
+        def commit(current_run):
+            if not current_run:
+                return
+            start = current_run[0]
+            parent = start.getparent()
+            idx = parent.index(start)
+
+            d = self.instances[start.get('list-id')]
+            ilvl = int(start.get('list-lvl'))
+            lvl = d.levels[ilvl]
+            lvlid = start.get('list-id') + start.get('list-lvl')
+            wrap = (OL if lvl.is_numbered else UL)('\n\t')
+            has_template = 'list-template' in start.attrib
+            if has_template:
+                wrap.set('lvlid', lvlid)
+            else:
+                wrap.set('class', styles.register({'list-style-type': lvl.fmt}, 'list'))
+            parent.insert(idx, wrap)
+            last_val = None
+            for child in current_run:
+                wrap.append(child)
+                child.tail = '\n\t'
+                if has_template:
+                    span = SPAN()
+                    span.text = child.text
+                    child.text = None
+                    for gc in child:
+                        span.append(gc)
+                    child.append(span)
+                    span = SPAN(child.get('list-template'))
+                    last = templates.get(lvlid, '')
+                    if span.text and len(span.text) > len(last):
+                        templates[lvlid] = span.text
+                    child.insert(0, span)
+                for attr in ('list-lvl', 'list-id', 'list-template'):
+                    child.attrib.pop(attr, None)
+                val = int(child.get('value'))
+                if last_val == val - 1 or wrap.tag == 'ul':
+                    child.attrib.pop('value')
+                last_val = val
+            current_run[-1].tail = '\n'
+            del current_run[:]
+
+        parents = set()
+        for child in body.iterdescendants('li'):
+            parents.add(child.getparent())
+
+        for parent in parents:
+            current_run = []
+            for child in parent:
+                if child.tag == 'li':
+                    if current_run:
+                        last = current_run[-1]
+                        if (last.get('list-id'), last.get('list-lvl')) != (child.get('list-id'), child.get('list-lvl')):
+                            commit(current_run)
+                    current_run.append(child)
+                else:
+                    commit(current_run)
+            commit(current_run)
+
+        for wrap in body.xpath('//ol[@lvlid]'):
+            lvlid = wrap.attrib.pop('lvlid')
+            wrap.tag = 'div'
+            text = ''
+            maxtext = templates.get(lvlid, '').replace('.', '')[:-1]
+            for li in wrap.iterchildren('li'):
+                t = li[0].text
+                if t and len(t) > len(text):
+                    text = t
+            for i, li in enumerate(wrap.iterchildren('li')):
+                li.tag = 'div'
+                li.attrib.pop('value', None)
+                li.set('style', 'display:table-row')
+                obj = object_map[li]
+                bs = styles.para_cache[obj]
+                if i == 0:
+                    m = len(maxtext)  # Move the table left to simulate the behavior of a list (number is to the left of text margin)
+                    wrap.set('style', 'display:table; margin-left: -%dem; padding-left: %s' % (m, bs.css.get('margin-left', 0)))
+                bs.css.pop('margin-left', None)
+                for child in li:
+                    child.set('style', 'display:table-cell')
+
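Review note on `Level.format_template` and `Numbering.update_counter` above: the interaction between the two is easier to follow with a worked run. The sketch below copies their logic into plain functions and numbers a short two-level list. `number_items`, `LEVELS` and `TEMPLATES` are hypothetical names, levels are reduced to `(start, restart)` tuples, and a plain dict replaces the `Counter` used in the diff.

```python
import re


def format_template(template, counter, ilvl):
    """Expand %1, %2, ... placeholders using the current counters."""
    def sub(m):
        x = int(m.group(1)) - 1
        if x > ilvl or x not in counter:
            return ''
        # Parent counters were already incremented after their item was
        # emitted, so step them back by one; the current level is used as is.
        return '%d' % (counter[x] - (0 if x == ilvl else 1))
    return re.sub(r'%(\d+)', sub, template).rstrip() + '\xa0'


def update_counter(counter, levelnum, levels):
    """Advance one level and reset the levels that restart after it."""
    counter[levelnum] += 1
    for ilvl, (start, restart) in levels.items():
        if (restart is None and ilvl == levelnum + 1) or restart == levelnum + 1:
            counter[ilvl] = start


def number_items(item_levels, levels, templates):
    """Return the label for each list item, given the level of every item."""
    counter = {ilvl: start for ilvl, (start, restart) in levels.items()}
    labels = []
    for ilvl in item_levels:
        labels.append(format_template(templates[ilvl], counter, ilvl))
        update_counter(counter, ilvl, levels)
    return labels


LEVELS = {0: (1, None), 1: (1, None)}    # level -> (start, restart)
TEMPLATES = {0: '%1.', 1: '%1.%2.'}
labels = number_items([0, 1, 1, 0, 1], LEVELS, TEMPLATES)
```

Each label ends in a non-breaking space, as in the diff, so the number does not collapse against the item text when rendered.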
--- a/src/calibre/ebooks/docx/styles.py
+++ b/src/calibre/ebooks/docx/styles.py
@ -6,205 +6,56 @@ from __future__ import (unicode_literals, division, absolute_import,
 __license__ = 'GPL v3'
 __copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'

-from collections import OrderedDict
+import textwrap
+from collections import OrderedDict, Counter

+from calibre.ebooks.docx.block_styles import ParagraphStyle, inherit
+from calibre.ebooks.docx.char_styles import RunStyle
 from calibre.ebooks.docx.names import XPath, get

-class Inherit:
-    pass
-inherit = Inherit()
+class PageProperties(object):

-def binary_property(parent, name):
-    vals = XPath('./w:%s')
-    if not vals:
-        return inherit
-    val = get(vals[0], 'w:val', 'on')
-    return True if val in {'on', '1', 'true'} else False
+    '''
+    Class representing page level properties (page size/margins) read from
+    sectPr elements.
+    '''

-def simple_color(col):
-    if not col or col == 'auto' or len(col) != 6:
-        return 'black'
-    return '#'+col
-
-def simple_float(val, mult=1.0):
+    def __init__(self, elems=()):
+        self.width, self.height = 595.28, 841.89  # pts, A4
+        self.margin_left = self.margin_right = 72  # pts
+        for sectPr in elems:
+            for pgSz in XPath('./w:pgSz')(sectPr):
+                w, h = get(pgSz, 'w:w'), get(pgSz, 'w:h')
                try:
-        return float(val) * mult
-    except (ValueError, TypeError, AttributeError, KeyError):
-        return None
-
-# Block styles {{{
-
-LINE_STYLES = {  # {{{
-    'basicBlackDashes': 'dashed',
-    'basicBlackDots': 'dotted',
-    'basicBlackSquares': 'dashed',
-    'basicThinLines': 'solid',
-    'dashDotStroked': 'groove',
-    'dashed': 'dashed',
-    'dashSmallGap': 'dashed',
-    'dotDash': 'dashed',
-    'dotDotDash': 'dashed',
-    'dotted': 'dotted',
-    'double': 'double',
-    'inset': 'inset',
-    'nil': 'none',
-    'none': 'none',
-    'outset': 'outset',
-    'single': 'solid',
-    'thick': 'solid',
-    'thickThinLargeGap': 'double',
-    'thickThinMediumGap': 'double',
-    'thickThinSmallGap' : 'double',
-    'thinThickLargeGap': 'double',
-    'thinThickMediumGap': 'double',
-    'thinThickSmallGap': 'double',
-    'thinThickThinLargeGap': 'double',
-    'thinThickThinMediumGap': 'double',
-    'thinThickThinSmallGap': 'double',
-    'threeDEmboss': 'ridge',
-    'threeDEngrave': 'groove',
-    'triple': 'double',
-}  # }}}
-
-def read_border(border, dest):
-    all_attrs = set()
-    for edge in ('left', 'top', 'right', 'bottom'):
-        vals = {'padding_%s':inherit, 'border_%s_width':inherit,
-                'border_%s_style':inherit, 'border_%s_color':inherit}
-        all_attrs |= {key % edge for key in vals}
-        for elem in XPath('./w:%s' % edge):
-            color = get(elem, 'w:color')
-            if color is not None:
-                vals['border_%s_color'] = simple_color(color)
-            style = get(elem, 'w:val')
-            if style is not None:
-                vals['border_%s_style'] = LINE_STYLES.get(style, 'solid')
-            space = get(elem, 'w:space')
-            if space is not None:
-                try:
-                    vals['padding_%s'] = float(space)
+                    self.width = int(w)/20
                except (ValueError, TypeError):
                    pass
-            sz = get(elem, 'w:space')
-            if sz is not None:
-                # we dont care about art borders (they are only used for page borders)
                try:
-                    vals['border_%s_width'] = min(96, max(2, float(sz))) * 8
+                    self.height = int(h)/20
+                except (ValueError, TypeError):
+                    pass
+            for pgMar in XPath('./w:pgMar')(sectPr):
+                l, r = get(pgMar, 'w:left'), get(pgMar, 'w:right')
+                try:
+                    self.margin_left = int(l)/20
+                except (ValueError, TypeError):
+                    pass
+                try:
+                    self.margin_right = int(r)/20
                except (ValueError, TypeError):
                    pass

-        for key, val in vals.iteritems():
-            setattr(dest, key % edge, val)
-
-    return all_attrs
-
-def read_indent(parent, dest):
-    padding_left = padding_right = text_indent = inherit
-    for indent in XPath('./w:ind')(parent):
-        l, lc = get(indent, 'w:left'), get(indent, 'w:leftChars')
-        pl = simple_float(lc, 0.01) if lc is not None else simple_float(l, 0.05) if l is not None else None
-        if pl is not None:
-            padding_left = '%.3f%s' % (pl, 'em' if lc is not None else 'pt')
-
-        r, rc = get(indent, 'w:right'), get(indent, 'w:rightChars')
-        pr = simple_float(rc, 0.01) if rc is not None else simple_float(r, 0.05) if r is not None else None
-        if pr is not None:
-            padding_right = '%.3f%s' % (pr, 'em' if rc is not None else 'pt')
-
-        h, hc = get(indent, 'w:hanging'), get(indent, 'w:hangingChars')
-        fl, flc = get(indent, 'w:firstLine'), get(indent, 'w:firstLineChars')
-        ti = (simple_float(hc, 0.01) if hc is not None else simple_float(h, 0.05) if h is not None else
-              simple_float(flc, 0.01) if flc is not None else simple_float(fl, 0.05) if fl is not None else None)
-        if ti is not None:
-            text_indent = '%.3f' % (ti, 'em' if hc is not None or (h is None and flc is not None) else 'pt')
-
-    setattr(dest, 'padding_left', padding_left)
-    setattr(dest, 'padding_right', padding_right)
-    setattr(dest, 'text_indent', text_indent)
-    return {'padding_left', 'padding_right', 'text_indent'}
-
-def read_justification(parent, dest):
-    ans = inherit
-    for jc in XPath('./w:jc[@w:val]')(parent):
-        val = get(jc, 'w:val')
-        if not val:
-            continue
-        if val in {'both', 'distribute'} or 'thai' in val or 'kashida' in val:
-            ans = 'justify'
-        if val in {'left', 'center', 'right',}:
-            ans = val
-    setattr(dest, 'text_align', ans)
-    return {'text_align'}
-
-def read_spacing(parent, dest):
-    padding_top = padding_bottom = line_height = inherit
-    for s in XPath('./w:spacing')(parent):
-        a, al, aa = get(s, 'w:after'), get(s, 'w:afterLines'), get(s, 'w:afterAutospacing')
-        pb = None if aa in {'on', '1', 'true'} else simple_float(al, 0.02) if al is not None else simple_float(a, 0.05) if a is not None else None
-        if pb is not None:
-            padding_bottom = '%.3f%s' % (pb, 'ex' if al is not None else 'pt')
-
-        b, bl, bb = get(s, 'w:before'), get(s, 'w:beforeLines'), get(s, 'w:beforeAutospacing')
-        pt = None if bb in {'on', '1', 'true'} else simple_float(bl, 0.02) if bl is not None else simple_float(b, 0.05) if b is not None else None
-        if pt is not None:
-            padding_top = '%.3f%s' % (pt, 'ex' if bl is not None else 'pt')
-
-        l, lr = get(s, 'w:line'), get(s, 'w:lineRule', 'auto')
-        if l is not None:
-            lh = simple_float(l, 0.05) if lr in {'exactly', 'atLeast'} else simple_float(l, 1/240.0)
-            line_height = '%.3f%s' % (lh, 'pt' if lr in {'exactly', 'atLeast'} else '')
-
-    setattr(dest, 'padding_top', padding_top)
-    setattr(dest, 'padding_bottom', padding_bottom)
-    setattr(dest, 'line_height', line_height)
-    return {'padding_top', 'padding_bottom', 'line_height'}
-
-def read_direction(parent, dest):
-    ans = inherit
-    for jc in XPath('./w:textFlow[@w:val]')(parent):
-        val = get(jc, 'w:val')
-        if not val:
-            continue
-        if 'rl' in val.lower():
-            ans = 'rtl'
-    setattr(dest, 'direction', ans)
-    return {'direction'}
-
-
-class ParagraphStyle(object):
-
-    border_path = XPath('./w:pBdr')
-
-    def __init__(self, pPr):
-        self.all_properties = set()
-        for p in (
-            'adjustRightInd', 'autoSpaceDE', 'autoSpaceDN',
-            'bidi', 'contextualSpacing', 'keepLines', 'keepNext',
-            'mirrorIndents', 'pageBreakBefore', 'snapToGrid',
-            'suppressLineNumbers', 'suppressOverlap', 'topLinePunct',
-            'widowControl', 'wordWrap',
-        ):
-            self.all_properties.add(p)
-            setattr(p, binary_property(pPr, p))
-
-        for border in self.border_path(pPr):
-            self.all_properties |= read_border(border, self)
-
-        self.all_properties |= read_indent(pPr, self)
-        self.all_properties |= read_justification(pPr, self)
-        self.all_properties |= read_spacing(pPr, self)
-        self.all_properties |= read_direction(pPr, self)
-
-        # TODO: numPr and outlineLvl
-# }}}

 class Style(object):
+    '''
+    Class representing a <w:style> element. Can contain block, character, etc. styles.
+    '''

    name_path = XPath('./w:name[@w:val]')
    based_on_path = XPath('./w:basedOn[@w:val]')
-    link_path = XPath('./w:link[@w:val]')

    def __init__(self, elem):
+        self.resolved = False
        self.style_id = get(elem, 'w:styleId')
        self.style_type = get(elem, 'w:type')
        names = self.name_path(elem)
@ -213,16 +64,57 @@ class Style(object):
        self.based_on = get(based_on[0], 'w:val') if based_on else None
        if self.style_type == 'numbering':
            self.based_on = None
-        link = self.link_path(elem)
-        self.link = get(link[0], 'w:val') if link else None
-        if self.style_type not in {'paragraph', 'character'}:
-            self.link = None
+        self.is_default = get(elem, 'w:default') in {'1', 'on', 'true'}
+
+        self.paragraph_style = self.character_style = None
+
+        if self.style_type in {'paragraph', 'character'}:
+            if self.style_type == 'paragraph':
+                for pPr in XPath('./w:pPr')(elem):
+                    ps = ParagraphStyle(pPr)
+                    if self.paragraph_style is None:
+                        self.paragraph_style = ps
+                    else:
+                        self.paragraph_style.update(ps)
+
+            for rPr in XPath('./w:rPr')(elem):
+                rs = RunStyle(rPr)
+                if self.character_style is None:
+                    self.character_style = rs
+                else:
+                    self.character_style.update(rs)
+
+        if self.style_type == 'numbering':
+            self.numbering_style_link = None
+            for x in XPath('./w:pPr/w:numPr/w:numId[@w:val]')(elem):
+                self.numbering_style_link = get(x, 'w:val')
+
+    def resolve_based_on(self, parent):
+        if parent.paragraph_style is not None:
+            if self.paragraph_style is None:
+                self.paragraph_style = ParagraphStyle()
+            self.paragraph_style.resolve_based_on(parent.paragraph_style)
+        if parent.character_style is not None:
+            if self.character_style is None:
+                self.character_style = RunStyle()
+            self.character_style.resolve_based_on(parent.character_style)


 class Styles(object):

+    '''
+    Collection of all styles defined in the document. Used to get the final styles applicable to elements in the document markup.
+    '''
+
    def __init__(self):
        self.id_map = OrderedDict()
+        self.para_cache = {}
+        self.para_char_cache = {}
+        self.run_cache = {}
+        self.classes = {}
+        self.counter = Counter()
+        self.default_styles = {}
+        self.numbering_style_links = {}

    def __iter__(self):
        for s in self.id_map.itervalues():
@ -237,27 +129,279 @@ class Styles(object):
    def get(self, key, default=None):
        return self.id_map.get(key, default)

-    def __call__(self, root):
+    def __call__(self, root, fonts):
+        self.fonts = fonts
        for s in XPath('//w:style')(root):
            s = Style(s)
            if s.style_id:
                self.id_map[s.style_id] = s
+            if s.is_default:
+                self.default_styles[s.style_type] = s
+            if s.style_type == 'numbering' and s.numbering_style_link:
+                self.numbering_style_links[s.style_id] = s.numbering_style_link
+
+        self.default_paragraph_style = self.default_character_style = None
+
+        for dd in XPath('./w:docDefaults')(root):
+            for pd in XPath('./w:pPrDefault')(dd):
+                for pPr in XPath('./w:pPr')(pd):
+                    ps = ParagraphStyle(pPr)
+                    if self.default_paragraph_style is None:
+                        self.default_paragraph_style = ps
+                    else:
+                        self.default_paragraph_style.update(ps)
+            for pd in XPath('./w:rPrDefault')(dd):
+                for rPr in XPath('./w:rPr')(pd):
+                    rs = RunStyle(rPr)
+                    if self.default_character_style is None:
+                        self.default_character_style = rs
+                    else:
+                        self.default_character_style.update(rs)
+
+        def resolve(s, p):
+            if p is not None:
+                if not p.resolved:
+                    resolve(p, self.get(p.based_on))
+                s.resolve_based_on(p)
+            s.resolved = True

-        # Nuke based_on, link attributes that refer to non-existing/incompatible
-        # parents
        for s in self:
-            bo = s.based_on
-            if bo is not None:
-                p = self.get(bo)
-                if p is None or p.style_type != s.style_type:
-                    s.based_on = None
-            link = s.link
-            if link is not None:
-                p = self.get(link)
-                if p is None or (s.style_type, p.style_type) not in {('paragraph', 'character'), ('character', 'paragraph')}:
-                    s.link = None
+            if not s.resolved:
+                resolve(s, self.get(s.based_on))

-        # TODO: Document defaults (docDefaults)
+    def para_val(self, parent_styles, direct_formatting, attr):
+        val = getattr(direct_formatting, attr)
+        if val is inherit:
+            for ps in reversed(parent_styles):
+                pval = getattr(ps, attr)
+                if pval is not inherit:
+                    val = pval
+                    break
+        return val

+    def run_val(self, parent_styles, direct_formatting, attr):
+        val = getattr(direct_formatting, attr)
+        if val is not inherit:
+            return val
+        if attr in direct_formatting.toggle_properties:
+            val = False
+            for rs in parent_styles:
+                pval = getattr(rs, attr)
+                if pval is True:
+                    val ^= True
+            return val
+        for rs in reversed(parent_styles):
+            rval = getattr(rs, attr)
+            if rval is not inherit:
+                return rval
+        return val

+    def resolve_paragraph(self, p):
+        ans = self.para_cache.get(p, None)
+        if ans is None:
+            ans = self.para_cache[p] = ParagraphStyle()
+            ans.style_name = None
+            direct_formatting = None
+            for pPr in XPath('./w:pPr')(p):
+                ps = ParagraphStyle(pPr)
+                if direct_formatting is None:
+                    direct_formatting = ps
+                else:
+                    direct_formatting.update(ps)
+
+            if direct_formatting is None:
+                direct_formatting = ParagraphStyle()
+            parent_styles = []
+            if self.default_paragraph_style is not None:
+                parent_styles.append(self.default_paragraph_style)
+
+            default_para = self.default_styles.get('paragraph', None)
+            if direct_formatting.linked_style is not None:
+                ls = self.get(direct_formatting.linked_style)
+                if ls is not None:
+                    ans.style_name = ls.name
+                    ps = ls.paragraph_style
+                    if ps is not None:
+                        parent_styles.append(ps)
+                    if ls.character_style is not None:
+                        self.para_char_cache[p] = ls.character_style
+            elif default_para is not None:
+                if default_para.paragraph_style is not None:
+                    parent_styles.append(default_para.paragraph_style)
+                if default_para.character_style is not None:
+                    self.para_char_cache[p] = default_para.character_style
+
+            is_numbering = direct_formatting.numbering is not inherit
+            if is_numbering:
+                num_id, lvl = direct_formatting.numbering
+                if num_id is not None:
+                    p.set('calibre_num_id', '%s:%s' % (lvl, num_id))
+                if num_id is not None and lvl is not None:
+                    ps = self.numbering.get_para_style(num_id, lvl)
+                    if ps is not None:
+                        parent_styles.append(ps)
+
+            for attr in ans.all_properties:
+                if not (is_numbering and attr == 'text_indent'):  # skip text-indent for lists
+                    setattr(ans, attr, self.para_val(parent_styles, direct_formatting, attr))
+        return ans
+
+    def resolve_run(self, r):
+        ans = self.run_cache.get(r, None)
+        if ans is None:
+            p = r.getparent()
+            ans = self.run_cache[r] = RunStyle()
+            direct_formatting = None
+            for rPr in XPath('./w:rPr')(r):
+                rs = RunStyle(rPr)
+                if direct_formatting is None:
+                    direct_formatting = rs
+                else:
+                    direct_formatting.update(rs)
+
+            if direct_formatting is None:
+                direct_formatting = RunStyle()
+
+            parent_styles = []
+            default_char = self.default_styles.get('character', None)
+            if self.default_character_style is not None:
+                parent_styles.append(self.default_character_style)
+            pstyle = self.para_char_cache.get(p, None)
+            if pstyle is not None:
+                parent_styles.append(pstyle)
+            if direct_formatting.linked_style is not None:
+                ls = getattr(self.get(direct_formatting.linked_style), 'character_style', None)
+                if ls is not None:
+                    parent_styles.append(ls)
+            elif default_char is not None and default_char.character_style is not None:
+                parent_styles.append(default_char.character_style)
+
+            for attr in ans.all_properties:
+                setattr(ans, attr, self.run_val(parent_styles, direct_formatting, attr))
+
+            if ans.font_family is not inherit:
+                ans.font_family = self.fonts.family_for(ans.font_family, ans.b, ans.i)
+
+        return ans
+
+    def resolve(self, obj):
+        if obj.tag.endswith('}p'):
+            return self.resolve_paragraph(obj)
+        if obj.tag.endswith('}r'):
+            return self.resolve_run(obj)
+
+    def cascade(self, layers):
+        self.body_font_family = 'serif'
+        self.body_font_size = '10pt'
+
+        for p, runs in layers.iteritems():
+            char_styles = [self.resolve_run(r) for r in runs]
+            block_style = self.resolve_paragraph(p)
+            c = Counter()
+            for s in char_styles:
+                if s.font_family is not inherit:
+                    c[s.font_family] += 1
+            if c:
+                family = c.most_common(1)[0][0]
+                block_style.font_family = family
+                for s in char_styles:
+                    if s.font_family == family:
+                        s.font_family = inherit
+
+            sizes = [s.font_size for s in char_styles if s.font_size is not inherit]
+            if sizes:
+                sz = block_style.font_size = sizes[0]
+                for s in char_styles:
+                    if s.font_size == sz:
+                        s.font_size = inherit
+
+        block_styles = [self.resolve_paragraph(p) for p in layers]
+        c = Counter()
+        for s in block_styles:
+            if s.font_family is not inherit:
+                c[s.font_family] += 1
+
+        if c:
+            self.body_font_family = family = c.most_common(1)[0][0]
+            for s in block_styles:
+                if s.font_family == family:
+                    s.font_family = inherit
+
+        c = Counter()
+        for s in block_styles:
+            if s.font_size is not inherit:
+                c[s.font_size] += 1
+
+        if c:
+            sz = c.most_common(1)[0][0]
+            for s in block_styles:
+                if s.font_size == sz:
+                    s.font_size = inherit
+            self.body_font_size = '%.3gpt' % sz
+
+    def resolve_numbering(self, numbering):
+        # When a numPr element appears inside a paragraph style, the lvl info
+        # must be discarded and pStyle used instead.
+        self.numbering = numbering
+        for style in self:
+            ps = style.paragraph_style
+            if ps is not None and ps.numbering is not inherit:
+                lvl = numbering.get_pstyle(ps.numbering[0], style.style_id)
+                if lvl is None:
+                    ps.numbering = inherit
+                else:
+                    ps.numbering = (ps.numbering[0], lvl)
+
+    def register(self, css, prefix):
+        h = hash(frozenset(css.iteritems()))
+        ans, _ = self.classes.get(h, (None, None))
+        if ans is None:
+            self.counter[prefix] += 1
+            ans = '%s_%d' % (prefix, self.counter[prefix])
+            self.classes[h] = (ans, css)
+        return ans
+
+    def generate_classes(self):
+        for bs in self.para_cache.itervalues():
+            css = bs.css
+            if css:
+                self.register(css, 'block')
+        for bs in self.run_cache.itervalues():
+            css = bs.css
+            if css:
+                self.register(css, 'text')
+
+    def class_name(self, css):
+        h = hash(frozenset(css.iteritems()))
+        return self.classes.get(h, (None, None))[0]
+
+    def generate_css(self, dest_dir, docx):
+        ef = self.fonts.embed_fonts(dest_dir, docx)
+        prefix = textwrap.dedent(
+            '''\
+            body { font-family: %s; font-size: %s }
+
+            p { text-indent: 1.5em }
+
+            ul, ol, p { margin: 0; padding: 0 }
+
+            sup.noteref a { text-decoration: none }
+
+            h1.notes-header { page-break-before: always }
+
+            dl.notes dt { font-size: large }
+
+            dl.notes dt a { text-decoration: none }
+
+            dl.notes dd { page-break-after: always }
+            ''') % (self.body_font_family, self.body_font_size)
+        if ef:
+            prefix = ef + '\n' + prefix
+
+        ans = []
+        for (cls, css) in sorted(self.classes.itervalues(), key=lambda x:x[0]):
+            b = ('\t%s: %s;' % (k, v) for k, v in css.iteritems())
+            b = '\n'.join(b)
+            ans.append('.%s {\n%s\n}\n' % (cls, b.rstrip(';')))
+        return prefix + '\n' + '\n'.join(ans)
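
Note on `run_val` above: toggle properties (bold, italic and similar) are combined across the style hierarchy with XOR, while every other property takes the nearest non-inherited value. The following is a minimal Python 3 sketch of that rule under stated assumptions: `INHERIT`, `resolve_toggle` and `resolve_plain` are hypothetical names used only for illustration and are not part of calibre.

```python
# Hypothetical sentinel standing in for the `inherit` marker used in styles.py.
INHERIT = object()


def resolve_toggle(parent_values, direct_value):
    """Resolve a toggle property (e.g. bold) through a list of parent styles.

    Direct formatting wins outright. Otherwise each parent that sets the
    property to True flips the running value, so two enabling styles
    cancel each other out.
    """
    if direct_value is not INHERIT:
        return direct_value
    val = False
    for pval in parent_values:
        if pval is True:
            val ^= True
    return val


def resolve_plain(parent_values, direct_value):
    """Resolve an ordinary property: the nearest non-inherited value wins."""
    if direct_value is not INHERIT:
        return direct_value
    # Parents are ordered from least to most specific, so search backwards.
    for pval in reversed(parent_values):
        if pval is not INHERIT:
            return pval
    return INHERIT
```

With this sketch, a run whose paragraph style and character style both enable bold resolves to not bold, which is the behaviour the XOR loop in `run_val` encodes.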

--- a/src/calibre/ebooks/docx/to_html.py
+++ b/src/calibre/ebooks/docx/to_html.py
@ -6,15 +6,22 @@ from __future__ import (unicode_literals, division, absolute_import,
 __license__ = 'GPL v3'
 __copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'

-import sys, os
+import sys, os, re
+from collections import OrderedDict, defaultdict

 from lxml import html
 from lxml.html.builder import (
-    HTML, HEAD, TITLE, BODY, LINK, META, P, SPAN, BR)
+    HTML, HEAD, TITLE, BODY, LINK, META, P, SPAN, BR, DIV, SUP, A, DT, DL, DD, H1)

 from calibre.ebooks.docx.container import DOCX, fromstring
-from calibre.ebooks.docx.names import XPath, is_tag, barename, XML, STYLES
-from calibre.ebooks.docx.styles import Styles
+from calibre.ebooks.docx.names import (
+    XPath, is_tag, XML, STYLES, NUMBERING, FONTS, get, generate_anchor,
+    descendants, ancestor, FOOTNOTES, ENDNOTES)
+from calibre.ebooks.docx.styles import Styles, inherit, PageProperties
+from calibre.ebooks.docx.numbering import Numbering
+from calibre.ebooks.docx.fonts import Fonts
+from calibre.ebooks.docx.images import Images
+from calibre.ebooks.docx.footnotes import Footnotes
 from calibre.utils.localization import canonicalize_lang, lang_as_iso639_1

 class Text:
@ -28,13 +35,16 @@ class Text:

 class Convert(object):

-    def __init__(self, path_or_stream, dest_dir=None, log=None):
+    def __init__(self, path_or_stream, dest_dir=None, log=None, notes_text=None):
        self.docx = DOCX(path_or_stream, log=log)
        self.log = self.docx.log
+        self.notes_text = notes_text or _('Notes')
        self.dest_dir = dest_dir or os.getcwdu()
        self.mi = self.docx.metadata
        self.body = BODY()
        self.styles = Styles()
+        self.images = Images()
+        self.object_map = OrderedDict()
        self.html = HTML(
            HEAD(
                META(charset='utf-8'),
@ -60,53 +70,264 @@ class Convert(object):
        doc = self.docx.document
        relationships_by_id, relationships_by_type = self.docx.document_relationships
        self.read_styles(relationships_by_type)
-        for top_level in XPath('/w:document/w:body/*')(doc):
-            if is_tag(top_level, 'w:p'):
-                p = self.convert_p(top_level)
+        self.images(relationships_by_id)
+        self.layers = OrderedDict()
+        self.framed = [[]]
+        self.framed_map = {}
+        self.anchor_map = {}
+        self.link_map = defaultdict(list)
+
+        self.read_page_properties(doc)
+        for wp, page_properties in self.page_map.iteritems():
+            self.current_page = page_properties
+            p = self.convert_p(wp)
            self.body.append(p)
-            elif is_tag(top_level, 'w:tbl'):
-                pass  # TODO: tables
-            elif is_tag(top_level, 'w:sectPr'):
-                pass  # TODO: Last section properties
-            else:
-                self.log.debug('Unknown top-level tag: %s, ignoring' % barename(top_level.tag))
+
+        if self.footnotes.has_notes:
+            dl = DL()
+            dl.set('class', 'notes')
+            self.body.append(H1(self.notes_text))
+            self.body[-1].set('class', 'notes-header')
+            self.body.append(dl)
+            for anchor, text, note in self.footnotes:
+                dl.append(DT('[', A('←' + text, href='#back_%s' % anchor, title=text), id=anchor))
+                dl[-1][0].tail = ']'
+                dl.append(DD())
+                for wp in note:
+                    p = self.convert_p(wp)
+                    dl[-1].append(p)
+
+        self.resolve_links(relationships_by_id)
+        # TODO: tables <w:tbl> child of <w:body> (nested tables?)
+
+        self.styles.cascade(self.layers)
+
+        numbered = []
+        for html_obj, obj in self.object_map.iteritems():
+            raw = obj.get('calibre_num_id', None)
+            if raw is not None:
+                lvl, num_id = raw.partition(':')[0::2]
+                try:
+                    lvl = int(lvl)
+                except (TypeError, ValueError):
+                    lvl = 0
+                numbered.append((html_obj, num_id, lvl))
+        self.numbering.apply_markup(numbered, self.body, self.styles, self.object_map)
+        self.apply_frames()
+
        if len(self.body) > 0:
            self.body.text = '\n\t'
            for child in self.body:
                child.tail = '\n\t'
            self.body[-1].tail = '\n'
+
+        self.styles.generate_classes()
+        for html_obj, obj in self.object_map.iteritems():
+            style = self.styles.resolve(obj)
+            if style is not None:
+                css = style.css
+                if css:
+                    cls = self.styles.class_name(css)
+                    if cls:
+                        html_obj.set('class', cls)
+        for html_obj, css in self.framed_map.iteritems():
+            cls = self.styles.class_name(css)
+            if cls:
+                html_obj.set('class', cls)
+
        self.write()

+    def read_page_properties(self, doc):
+        current = []
+        self.page_map = OrderedDict()
+
+        for p in descendants(doc, 'w:p'):
+            sect = tuple(descendants(p, 'w:sectPr'))
+            if sect:
+                pr = PageProperties(sect)
+                for x in current + [p]:
+                    self.page_map[x] = pr
+                current = []
+            else:
+                current.append(p)
+        if current:
+            last = XPath('./w:body/w:sectPr')(doc)
+            pr = PageProperties(last)
+            for x in current:
+                self.page_map[x] = pr
+
    def read_styles(self, relationships_by_type):
-        sname = relationships_by_type.get(STYLES, None)
-        if sname is None:
-            name = self.docx.document_name.split('/')
-            name[-1] = 'styles.xml'
-            if self.docx.exists(name):
-                sname = name
+
+        def get_name(rtype, defname):
+            name = relationships_by_type.get(rtype, None)
+            if name is None:
+                cname = self.docx.document_name.split('/')
+                cname[-1] = defname
+                if self.docx.exists('/'.join(cname)):
+                    name = '/'.join(cname)
+            return name
+
+        nname = get_name(NUMBERING, 'numbering.xml')
+        sname = get_name(STYLES, 'styles.xml')
+        fname = get_name(FONTS, 'fontTable.xml')
+        foname = get_name(FOOTNOTES, 'footnotes.xml')
+        enname = get_name(ENDNOTES, 'endnotes.xml')
+        numbering = self.numbering = Numbering()
+        footnotes = self.footnotes = Footnotes()
+        fonts = self.fonts = Fonts()
+
+        foraw = enraw = None
+        if foname is not None:
+            try:
+                foraw = self.docx.read(foname)
+            except KeyError:
+                self.log.warn('Footnotes %s do not exist' % foname)
+        if enname is not None:
+            try:
+                enraw = self.docx.read(enname)
+            except KeyError:
+                self.log.warn('Endnotes %s do not exist' % enname)
+        footnotes(fromstring(foraw) if foraw else None, fromstring(enraw) if enraw else None)
+
+        if fname is not None:
+            embed_relationships = self.docx.get_relationships(fname)[0]
+            try:
+                raw = self.docx.read(fname)
+            except KeyError:
+                self.log.warn('Fonts table %s does not exist' % fname)
+            else:
+                fonts(fromstring(raw), embed_relationships, self.docx, self.dest_dir)
+
        if sname is not None:
            try:
                raw = self.docx.read(sname)
            except KeyError:
                self.log.warn('Styles %s do not exist' % sname)
            else:
-                self.styles(fromstring(raw))
+                self.styles(fromstring(raw), fonts)
+
+        if nname is not None:
+            try:
+                raw = self.docx.read(nname)
+            except KeyError:
+                self.log.warn('Numbering styles %s do not exist' % nname)
+            else:
+                numbering(fromstring(raw), self.styles)
+
+        self.styles.resolve_numbering(numbering)

    def write(self):
        raw = html.tostring(self.html, encoding='utf-8', doctype='<!DOCTYPE html>')
        with open(os.path.join(self.dest_dir, 'index.html'), 'wb') as f:
            f.write(raw)
+        css = self.styles.generate_css(self.dest_dir, self.docx)
+        if css:
+            with open(os.path.join(self.dest_dir, 'docx.css'), 'wb') as f:
+                f.write(css.encode('utf-8'))

    def convert_p(self, p):
        dest = P()
-        for run in XPath('descendant::w:r')(p):
-            span = self.convert_run(run)
+        self.object_map[dest] = p
+        style = self.styles.resolve_paragraph(p)
+        self.layers[p] = []
+        self.add_frame(dest, style.frame)
+
+        current_anchor = None
+        current_hyperlink = None
+
+        for x in descendants(p, 'w:r', 'w:bookmarkStart', 'w:hyperlink'):
+            if x.tag.endswith('}r'):
+                span = self.convert_run(x)
+                if current_anchor is not None:
+                    (dest if len(dest) == 0 else span).set('id', current_anchor)
+                    current_anchor = None
+                if current_hyperlink is not None:
+                    hl = ancestor(x, 'w:hyperlink')
+                    if hl is not None:
+                        self.link_map[hl].append(span)
+                    else:
+                        current_hyperlink = None
                dest.append(span)
+                self.layers[p].append(x)
+            elif x.tag.endswith('}bookmarkStart'):
+                anchor = get(x, 'w:name')
+                if anchor and anchor not in self.anchor_map:
+                    self.anchor_map[anchor] = current_anchor = generate_anchor(anchor, frozenset(self.anchor_map.itervalues()))
+            elif x.tag.endswith('}hyperlink'):
+                current_hyperlink = x
+
+        m = re.match(r'heading\s+(\d+)$', style.style_name or '', re.IGNORECASE)
+        if m is not None:
+            n = min(6, max(1, int(m.group(1))))
+            dest.tag = 'h%d' % n
+
+        if style.direction == 'rtl':
+            dest.set('dir', 'rtl')
+
+        border_runs = []
+        common_borders = []
+        for span in dest:
+            run = self.object_map[span]
+            style = self.styles.resolve_run(run)
+            if not border_runs or border_runs[-1][1].same_border(style):
+                border_runs.append((span, style))
+            elif border_runs:
+                if len(border_runs) > 1:
+                    common_borders.append(border_runs)
+                border_runs = []
+
+        for border_run in common_borders:
+            spans = []
+            bs = {}
+            for span, style in border_run:
+                style.get_border_css(bs)
+                style.clear_border_css()
+                spans.append(span)
+            if bs:
+                cls = self.styles.register(bs, 'text_border')
+                wrapper = self.wrap_elems(spans, SPAN())
+                wrapper.set('class', cls)

        return dest

+    def wrap_elems(self, elems, wrapper):
+        p = elems[0].getparent()
+        idx = p.index(elems[0])
+        p.insert(idx, wrapper)
+        wrapper.tail = elems[-1].tail
+        elems[-1].tail = None
+        for elem in elems:
+            p.remove(elem)
+            wrapper.append(elem)
+        return wrapper
+
+    def resolve_links(self, relationships_by_id):
+        for hyperlink, spans in self.link_map.iteritems():
+            span = spans[0]
+            if len(spans) > 1:
+                span = self.wrap_elems(spans, SPAN())
+            span.tag = 'a'
+            tgt = get(hyperlink, 'w:tgtFrame')
+            if tgt:
+                span.set('target', tgt)
+            tt = get(hyperlink, 'w:tooltip')
+            if tt:
+                span.set('title', tt)
+            rid = get(hyperlink, 'r:id')
+            if rid and rid in relationships_by_id:
+                span.set('href', relationships_by_id[rid])
+                continue
+            anchor = get(hyperlink, 'w:anchor')
+            if anchor and anchor in self.anchor_map:
+                span.set('href', '#' + self.anchor_map[anchor])
+                continue
+            self.log.warn('Hyperlink with unknown target (%s, %s), ignoring' %
+                          (rid, anchor))
+            span.set('href', '#')
+
    def convert_run(self, run):
        ans = SPAN()
+        self.object_map[ans] = run
        text = Text(ans, 'text', [])

        for child in run:
@ -121,6 +342,7 @@ class Convert(object):
                    text.buf.append(child.text)
            elif is_tag(child, 'w:cr'):
                text.add_elem(BR())
+                ans.append(text.elem)
            elif is_tag(child, 'w:br'):
                typ = child.get('type', None)
                if typ in {'column', 'page'}:
@ -132,11 +354,56 @@ class Convert(object):
                    else:
                        br = BR()
                text.add_elem(br)
+                ans.append(text.elem)
+            elif is_tag(child, 'w:drawing') or is_tag(child, 'w:pict'):
+                for img in self.images.to_html(child, self.current_page, self.docx, self.dest_dir):
+                    text.add_elem(img)
+                    ans.append(text.elem)
+            elif is_tag(child, 'w:footnoteReference') or is_tag(child, 'w:endnoteReference'):
+                anchor, name = self.footnotes.get_ref(child)
+                if anchor and name:
+                    l = SUP(A(name, href='#' + anchor, title=name), id='back_%s' % anchor)
+                    l.set('class', 'noteref')
+                    text.add_elem(l)
+                    ans.append(text.elem)
        if text.buf:
            setattr(text.elem, text.attr, ''.join(text.buf))
+
+        style = self.styles.resolve_run(run)
+        if style.vert_align in {'superscript', 'subscript'}:
+            ans.tag = 'sub' if style.vert_align == 'subscript' else 'sup'
+        if style.lang is not inherit:
+            ans.set('lang', style.lang)
        return ans

+    def add_frame(self, html_obj, style):
+        last_run = self.framed[-1]
+        if style is inherit:
+            if last_run:
+                self.framed.append([])
+            return
+
+        if last_run:
+            if last_run[-1][1] == style:
+                last_run.append((html_obj, style))
+            else:
+                self.framed.append([(html_obj, style)])
+        else:
+            last_run.append((html_obj, style))
+
+    def apply_frames(self):
+        for run in filter(None, self.framed):
+            style = run[0][1]
+            paras = tuple(x[0] for x in run)
+            parent = paras[0].getparent()
+            idx = parent.index(paras[0])
+            frame = DIV(*paras)
+            parent.insert(idx, frame)
+            self.framed_map[frame] = css = style.css(self.page_map[self.object_map[paras[0]]])
+            self.styles.register(css, 'frame')
+
 if __name__ == '__main__':
    from calibre.utils.logging import default_log
    default_log.filter_level = default_log.DEBUG
    Convert(sys.argv[-1], log=default_log)()
+
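
Note on the heading detection in `convert_p`: a paragraph whose style name looks like "Heading N" is emitted as an HTML heading tag. The following is a small Python 3 sketch of that mapping, assuming the intent is to clamp the level into the range 1 to 6 that HTML supports. `heading_tag` is a hypothetical helper for illustration, not a function in calibre.

```python
import re


def heading_tag(style_name):
    """Return 'h1'..'h6' for style names like 'Heading 3', else None."""
    m = re.match(r'heading\s+(\d+)$', style_name or '', re.IGNORECASE)
    if m is None:
        return None
    # Clamp: raise anything below 1 up to 1, cap anything above 6 at 6.
    n = min(6, max(1, int(m.group(1))))
    return 'h%d' % n
```

The order of `min` and `max` matters here: `min(1, max(6, n))` would evaluate to 1 for every input, so all headings would collapse to `h1`.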
--- a/src/calibre/ebooks/fb2/fb2ml.py
+++ b/src/calibre/ebooks/fb2/fb2ml.py
@ -136,7 +136,7 @@ class FB2MLizer(object):
            metadata['author'] += '<last-name>%s</last-name>' % prepare_string_for_xml(author_last)
            metadata['author'] += '</author>'
        if not metadata['author']:
-            metadata['author'] = u'<author><first-name></first-name><last-name><last-name></author>'
+            metadata['author'] = u'<author><first-name></first-name><last-name></last-name></author>'

        metadata['keywords'] = u''
        tags = list(map(unicode, self.oeb_book.metadata.subject))
--- a/src/calibre/ebooks/metadata/opf2.py
+++ b/src/calibre/ebooks/metadata/opf2.py
@ -21,7 +21,7 @@ from calibre.ebooks.metadata.book.base import Metadata
 from calibre.utils.date import parse_date, isoformat
 from calibre.utils.localization import get_lang, canonicalize_lang
 from calibre import prints, guess_type
-from calibre.utils.cleantext import clean_ascii_chars
+from calibre.utils.cleantext import clean_ascii_chars, clean_xml_chars
 from calibre.utils.config import tweaks

 class Resource(object):  # {{{
@ -560,7 +560,9 @@ class OPF(object):  # {{{
            self.package_version = 0
        self.metadata = self.metadata_path(self.root)
        if not self.metadata:
-            raise ValueError('Malformed OPF file: No <metadata> element')
+            self.metadata = [self.root.makeelement('{http://www.idpf.org/2007/opf}metadata')]
+            self.root.insert(0, self.metadata[0])
+            self.metadata[0].tail = '\n'
        self.metadata      = self.metadata[0]
        if unquote_urls:
            self.unquote_urls()
@ -1434,7 +1436,10 @@ def metadata_to_opf(mi, as_string=True, default_lang=None):
            attrib['name'] = name
        if content:
            attrib['content'] = content
+        try:
            elem = metadata.makeelement(tag, attrib=attrib)
+        except ValueError:
+            elem = metadata.makeelement(tag, attrib={k:clean_xml_chars(v) for k, v in attrib.iteritems()})
        elem.tail = '\n'+(' '*8)
        if text:
            try:
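Review note: the new `except ValueError` branch retries element creation after cleaning the attribute values, since lxml rejects strings containing characters that XML 1.0 does not allow (NUL and most other control characters). The cleaning step could look roughly like the sketch below. `strip_xml_illegal` is a hypothetical stand-in written for this note, and it is not claimed to match the behaviour of calibre's `clean_xml_chars`.

```python
import re

# Anything outside the XML 1.0 "Char" production: tab, LF, CR,
# U+0020-U+D7FF, U+E000-U+FFFD and U+10000-U+10FFFF are allowed.
_ILLEGAL_XML = re.compile(
    '[^\t\n\r\x20-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]')


def strip_xml_illegal(text):
    """Return text with characters that XML 1.0 forbids removed."""
    return _ILLEGAL_XML.sub('', text)


attrib = {'name': 'calibre:series', 'content': 'Bad\x00 title\x1f\t'}
cleaned = {k: strip_xml_illegal(v) for k, v in attrib.items()}
```

Cleaning only in the fallback path keeps the common case free of the extra pass over every attribute value.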
--- a/src/calibre/ebooks/mobi/debug/mobi8.py
+++ b/src/calibre/ebooks/mobi/debug/mobi8.py
@ -163,7 +163,8 @@ class MOBIFile(object):
            ext = 'dat'
            prefix = 'binary'
            suffix = ''
-            if sig in {b'HUFF', b'CDIC', b'INDX'}: continue
+            if sig in {b'HUFF', b'CDIC', b'INDX'}:
+                continue
            # TODO: Ignore CNCX records as well
            if sig == b'FONT':
                font = read_font_record(rec.raw)
@ -196,7 +197,6 @@ class MOBIFile(object):
            vals = list(index)[:-1] + [None, None, None, None]
            entry_map.append(Entry(*(vals[:12])))

-
        indexing_data = collect_indexing_data(entry_map, list(map(len,
            self.text_records)))
        self.indexing_data = [DOC + '\n' +textwrap.dedent('''\
--- a/src/calibre/ebooks/mobi/mobiml.py
+++ b/src/calibre/ebooks/mobi/mobiml.py
@ -16,7 +16,8 @@ from calibre.ebooks.oeb.transforms.flatcss import KeyMapper
 from calibre.utils.magick.draw import identify_data

 MBP_NS = 'http://mobipocket.com/ns/mbp'
-def MBP(name): return '{%s}%s' % (MBP_NS, name)
+def MBP(name):
+    return '{%s}%s' % (MBP_NS, name)

 MOBI_NSMAP = {None: XHTML_NS, 'mbp': MBP_NS}

@ -413,7 +414,7 @@ class MobiMLizer(object):
                        # img sizes in units other than px
                        # See #7520 for test case
                        try:
-                            pixs = int(round(float(value) / \
+                            pixs = int(round(float(value) /
                                (72./self.profile.dpi)))
                        except:
                            continue
@ -488,8 +489,6 @@ class MobiMLizer(object):
        if elem.text:
            if istate.preserve:
                text = elem.text
-            elif len(elem) > 0 and isspace(elem.text):
-                text = None
            else:
                text = COLLAPSE.sub(' ', elem.text)
        valign = style['vertical-align']
--- a/src/calibre/ebooks/mobi/reader/headers.py
+++ b/src/calibre/ebooks/mobi/reader/headers.py
@ -181,9 +181,9 @@ class BookHeader(object):
                self.codec = 'cp1252' if not user_encoding else user_encoding
                log.warn('Unknown codepage %d. Assuming %s' % (self.codepage,
                    self.codec))
-            # Some KF8 files have header length == 256 (generated by kindlegen
-            # 2.7?). See https://bugs.launchpad.net/bugs/1067310
-            max_header_length = 0x100
+            # Some KF8 files have header length == 264 (generated by kindlegen
+            # 2.9?). See https://bugs.launchpad.net/bugs/1179144
+            max_header_length = 500  # We choose 500 for future versions of kindlegen

            if (ident == 'TEXTREAD' or self.length < 0xE4 or
                    self.length > max_header_length or
--- a/src/calibre/ebooks/mobi/reader/markup.py
+++ b/src/calibre/ebooks/mobi/reader/markup.py
@ -100,7 +100,7 @@ def update_flow_links(mobi8_reader, resource_map, log):
    mr = mobi8_reader
    flows = []

-    img_pattern = re.compile(r'''(<[img\s|image\s][^>]*>)''', re.IGNORECASE)
+    img_pattern = re.compile(r'''(<[img\s|image\s|svg:image\s][^>]*>)''', re.IGNORECASE)
    img_index_pattern = re.compile(r'''['"]kindle:embed:([0-9|A-V]+)[^'"]*['"]''', re.IGNORECASE)

    tag_pattern = re.compile(r'''(<[^>]*>)''')
@ -128,7 +128,7 @@ def update_flow_links(mobi8_reader, resource_map, log):
        srcpieces = img_pattern.split(flow)
        for j in range(1, len(srcpieces), 2):
            tag = srcpieces[j]
-            if tag.startswith('<im'):
+            if tag.startswith('<im') or tag.startswith('<svg:image'):
                for m in img_index_pattern.finditer(tag):
                    num = int(m.group(1), 32)
                    href = resource_map[num-1]
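Review note: in `img_pattern` the square brackets form a character class, so the pattern matches `<` followed by any single character from that set, not the words `img`, `image` or `svg:image`. Adding `svg:image\s` to the class works because it adds `s`, `v` and `:` to the set, which also lets unrelated tags such as `<span>` through; the `startswith` check in the loop then discards them. The sketch below compares the pattern from the hunk with a stricter alternation (the `strict` pattern is a suggestion, not part of this commit) and shows the base-32 decoding of the `kindle:embed:` index.

```python
import re

# Pattern from the hunk: the square brackets form a character class
loose = re.compile(r'''(<[img\s|image\s|svg:image\s][^>]*>)''', re.IGNORECASE)
# Hypothetical stricter alternative using a non-capturing group
strict = re.compile(r'''(<(?:img|image|svg:image)\s[^>]*>)''', re.IGNORECASE)
# Index pattern as it appears in the surrounding code
index_pattern = re.compile(
    r'''['"]kindle:embed:([0-9|A-V]+)[^'"]*['"]''', re.IGNORECASE)

sample = '<span>x</span><img src="kindle:embed:000A?mime=image/jpg"/>'
loose_tags = loose.findall(sample)
strict_tags = strict.findall(sample)

# The embedded resource number is written in base 32
num = int(index_pattern.search(strict_tags[0]).group(1), 32)
```

Because the later `startswith` filter removes the extra matches, this is a tidiness point rather than a behavioural bug in the code as committed.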
--- a/src/calibre/ebooks/mobi/reader/mobi8.py
+++ b/src/calibre/ebooks/mobi/reader/mobi8.py
@ -228,7 +228,7 @@ class Mobi8Reader(object):

        self.flowinfo.append(FlowInfo(None, None, None, None))
        svg_tag_pattern = re.compile(br'''(<svg[^>]*>)''', re.IGNORECASE)
-        image_tag_pattern = re.compile(br'''(<image[^>]*>)''', re.IGNORECASE)
+        image_tag_pattern = re.compile(br'''(<(?:svg:)?image[^>]*>)''', re.IGNORECASE)
        for j in xrange(1, len(self.flows)):
            flowpart = self.flows[j]
            nstr = '%04d' % j
@ -243,7 +243,7 @@ class Mobi8Reader(object):
                    dir = None
                    fname = None
                    # strip off anything before <svg if inlining
-                    flowpart = flowpart[start:]
+                    flowpart = re.sub(br'(</?)svg:', r'\1', flowpart[start:])
                else:
                    format = 'file'
                    dir = "images"
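Review note: the inlined SVG flow now has its `svg:` tag prefix stripped so that `<svg:image>` becomes a plain `<image>`. A small standalone sketch of that substitution follows; `strip_svg_prefix` is a name invented for this note. The hunk targets Python 2, where a `str` replacement works with a bytes pattern; under Python 3 the replacement must be bytes as well, which is what the sketch uses.

```python
import re


def strip_svg_prefix(markup):
    """Remove the "svg:" prefix from opening and closing tags only."""
    # (</?) captures "<" or "</" so it can be put back unchanged
    return re.sub(br'(</?)svg:', br'\1', markup)


cleaned = strip_svg_prefix(
    b'<svg:svg><svg:image xlink:href="a.png"/></svg:svg>')
```

Attribute prefixes such as `xlink:` are untouched because the pattern is anchored to the start of a tag.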
--- a/src/calibre/ebooks/oeb/base.py
+++ b/src/calibre/ebooks/oeb/base.py
@ -373,7 +373,7 @@ def urlquote(href):
        result.append(char)
    return ''.join(result)

-def urlunquote(href):
+def urlunquote(href, error_handling='strict'):
    # unquote must run on a bytestring and will return a bytestring
    # If it runs on a unicode object, it returns a double encoded unicode
    # string: unquote(u'%C3%A4') != unquote(b'%C3%A4').decode('utf-8')
@ -383,7 +383,10 @@ def urlunquote(href):
        href = href.encode('utf-8')
    href = unquote(href)
    if want_unicode:
-        href = href.decode('utf-8')
+        # The quoted characters could have been in some encoding other than
+        # UTF-8; this often happens with old or broken web servers. There is
+        # no way to know what that encoding should be in this context.
+        href = href.decode('utf-8', error_handling)
    return href

 def urlnormalize(href):
@ -871,6 +874,7 @@ class Manifest(object):
            orig_data = data
            fname = urlunquote(self.href)
            self.oeb.log.debug('Parsing', fname, '...')
+            self.oeb.html_preprocessor.current_href = self.href
            try:
                data = parse_html(data, log=self.oeb.log,
                        decoder=self.oeb.decode,
@ -1312,9 +1316,9 @@ class Guide(object):
                         ('notes', __('Notes')),
                         ('preface', __('Preface')),
                         ('text', __('Main Text'))]
-        TYPES = set(t for t, _ in _TYPES_TITLES)
+        TYPES = set(t for t, _ in _TYPES_TITLES)  # noqa
        TITLES = dict(_TYPES_TITLES)
-        ORDER = dict((t, i) for i, (t, _) in enumerate(_TYPES_TITLES))
+        ORDER = dict((t, i) for i, (t, _) in enumerate(_TYPES_TITLES))  # noqa

        def __init__(self, oeb, type, title, href):
            self.oeb = oeb
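Review note: the new `error_handling` argument of `urlunquote` decides what happens when the percent-quoted bytes are not valid UTF-8. The effect can be reproduced with the Python 3 standard library as sketched below; `unquote_text` is a hypothetical helper for illustration and is not the calibre function.

```python
from urllib.parse import unquote_to_bytes


def unquote_text(href, error_handling='strict'):
    """Percent-decode href to bytes, then decode those bytes as UTF-8."""
    raw = unquote_to_bytes(href)
    return raw.decode('utf-8', error_handling)


# %C3%A9 is the UTF-8 encoding of e-acute; %E9 is its Latin-1 encoding
good = unquote_text('caf%C3%A9.html')
lossy = unquote_text('caf%E9.html', 'replace')

try:
    unquote_text('caf%E9.html')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

`UnicodeDecodeError` is a subclass of `ValueError`, which is why a caller can catch the strict failure with a plain `except ValueError`.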
--- a/src/calibre/ebooks/oeb/display/webview.py
+++ b/src/calibre/ebooks/oeb/display/webview.py
@ -51,7 +51,7 @@ def load_html(path, view, codec='utf-8', mime_type=None,
    loading_url = QUrl.fromLocalFile(path)
    pre_load_callback(loading_url)

-    if force_as_html or re.search(r'<[:a-zA-Z0-9-]*svg', html) is None:
+    if force_as_html or re.search(r'<[a-zA-Z0-9-]+:svg', html) is None:
        view.setHtml(html, loading_url)
    else:
        view.setContent(QByteArray(html.encode(codec)), mime_type,
@ -61,4 +61,3 @@ def load_html(path, view, codec='utf-8', mime_type=None,
        if not elem.isNull():
            return False
    return True
-
--- a/src/calibre/ebooks/oeb/iterator/init.py
+++ b/src/calibre/ebooks/oeb/iterator/init.py
@ -7,7 +7,7 @@ __license__   = 'GPL v3'
 __copyright__ = '2012, Kovid Goyal <kovid@kovidgoyal.net>'
 __docformat__ = 'restructuredtext en'

-import os, re
+import sys, os, re

 from calibre.customize.ui import available_input_formats

@ -26,17 +26,18 @@ def EbookIterator(*args, **kwargs):
    from calibre.ebooks.oeb.iterator.book import EbookIterator
    return EbookIterator(*args, **kwargs)

-def get_preprocess_html(path_to_ebook, output):
-    from calibre.ebooks.conversion.preprocess import HTMLPreProcessor
-    iterator = EbookIterator(path_to_ebook)
-    iterator.__enter__(only_input_plugin=True, run_char_count=False,
-            read_anchor_map=False)
-    preprocessor = HTMLPreProcessor(None, False)
-    with open(output, 'wb') as out:
-        for path in iterator.spine:
-            with open(path, 'rb') as f:
-                html = f.read().decode('utf-8', 'replace')
-            html = preprocessor(html, get_preprocess_html=True)
+def get_preprocess_html(path_to_ebook, output=None):
+    from calibre.ebooks.conversion.plumber import set_regex_wizard_callback, Plumber
+    from calibre.utils.logging import DevNull
+    from calibre.ptempfile import TemporaryDirectory
+    raw = {}
+    set_regex_wizard_callback(raw.__setitem__)
+    with TemporaryDirectory('_regex_wiz') as tdir:
+        pl = Plumber(path_to_ebook, os.path.join(tdir, 'a.epub'), DevNull(), for_regex_wizard=True)
+        pl.run()
+        items = [raw[item.href] for item in pl.oeb.spine if item.href in raw]
+
+    with (sys.stdout if output is None else open(output, 'wb')) as out:
+        for html in items:
            out.write(html.encode('utf-8'))
            out.write(b'\n\n' + b'-'*80 + b'\n\n')
-
--- a/src/calibre/ebooks/oeb/iterator/book.py
+++ b/src/calibre/ebooks/oeb/iterator/book.py
@ -25,7 +25,7 @@ from calibre.ebooks.oeb.transforms.cover import CoverManager
 from calibre.ebooks.oeb.iterator.spine import (SpineItem, create_indexing_data)
 from calibre.ebooks.oeb.iterator.bookmarks import BookmarksMixin

-TITLEPAGE = CoverManager.SVG_TEMPLATE.decode('utf-8').replace(\
+TITLEPAGE = CoverManager.SVG_TEMPLATE.decode('utf-8').replace(
        '__ar__', 'none').replace('__viewbox__', '0 0 600 800'
        ).replace('__width__', '600').replace('__height__', '800')

--- a/src/calibre/ebooks/oeb/parse_utils.py
+++ b/src/calibre/ebooks/oeb/parse_utils.py
@ -44,8 +44,10 @@ META_XP = XPath('/h:html/h:head/h:meta[@http-equiv="Content-Type"]')

 def merge_multiple_html_heads_and_bodies(root, log=None):
    heads, bodies = xpath(root, '//h:head'), xpath(root, '//h:body')
-    if not (len(heads) > 1 or len(bodies) > 1): return root
-    for child in root: root.remove(child)
+    if not (len(heads) > 1 or len(bodies) > 1):
+        return root
+    for child in root:
+        root.remove(child)
    head = root.makeelement(XHTML('head'))
    body = root.makeelement(XHTML('body'))
    for h in heads:
@ -368,8 +370,7 @@ def parse_html(data, log=None, decoder=None, preprocessor=None,
        meta.getparent().remove(meta)
    meta = etree.SubElement(head, XHTML('meta'),
        attrib={'http-equiv': 'Content-Type'})
-    meta.set('content', 'text/html; charset=utf-8') # Ensure content is second
-                                                    # attribute
+    meta.set('content', 'text/html; charset=utf-8')  # Ensure content is second attribute

    # Ensure has a <body/>
    if not xpath(data, '/h:html/h:body'):
--- a/src/calibre/ebooks/oeb/polish/toc.py
+++ b/src/calibre/ebooks/oeb/polish/toc.py
@ -9,7 +9,7 @@ __docformat__ = 'restructuredtext en'

 import re
 from urlparse import urlparse
-from collections import deque
+from collections import deque, Counter
 from functools import partial

 from lxml import etree
@ -29,7 +29,8 @@ class TOC(object):
    def __init__(self, title=None, dest=None, frag=None):
        self.title, self.dest, self.frag = title, dest, frag
        self.dest_exists = self.dest_error = None
-        if self.title: self.title = self.title.strip()
+        if self.title:
+            self.title = self.title.strip()
        self.parent = None
        self.children = []

@ -326,11 +327,13 @@ def create_ncx(toc, to_href, btitle, lang, uid):
    navmap = etree.SubElement(ncx, NCX('navMap'))
    spat = re.compile(r'\s+')

-    def process_node(xml_parent, toc_parent, play_order=0):
+    play_order = Counter()
+
+    def process_node(xml_parent, toc_parent):
        for child in toc_parent:
-            play_order += 1
+            play_order['c'] += 1
            point = etree.SubElement(xml_parent, NCX('navPoint'), id=uuid_id(),
-                            playOrder=str(play_order))
+                            playOrder=str(play_order['c']))
            label = etree.SubElement(point, NCX('navLabel'))
            title = child.title
            if title:
@ -341,7 +344,7 @@ def create_ncx(toc, to_href, btitle, lang, uid):
                if child.frag:
                    href += '#'+child.frag
                etree.SubElement(point, NCX('content'), src=href)
-            process_node(point, child, play_order)
+            process_node(point, child)

    process_node(navmap, toc)
    return ncx
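Review note: the old `process_node` passed `play_order` as an integer argument, so increments made while descending into children were lost when the recursion returned, and siblings at the outer level reused numbers. The sketch below reproduces both variants on a toy tree of `(name, children)` tuples. The function names and tree shape are illustrative assumptions, not the calibre data structures.

```python
from collections import Counter


def number_by_value(tree, play_order=0, out=None):
    """Old behaviour: each recursive call increments its own copy."""
    out = [] if out is None else out
    for name, children in tree:
        play_order += 1
        out.append((name, play_order))
        number_by_value(children, play_order, out)
    return out


def number_shared(tree):
    """New behaviour: one mutable counter shared by every level."""
    play_order = Counter()
    out = []

    def process(nodes):
        for name, children in nodes:
            play_order['c'] += 1
            out.append((name, play_order['c']))
            process(children)

    process(tree)
    return out


toc = [('ch1', [('ch1.1', []), ('ch1.2', [])]), ('ch2', [])]
old = number_by_value(toc)
new = number_shared(toc)
```

A `Counter` is used here only as a convenient mutable cell; Python 2 has no `nonlocal`, so a closure cannot rebind an integer in the enclosing scope.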
--- a/src/calibre/ebooks/oeb/transforms/flatcss.py
+++ b/src/calibre/ebooks/oeb/transforms/flatcss.py
@ -32,7 +32,8 @@ def dynamic_rescale_factor(node):
    classes = node.get('class', '').split(' ')
    classes = [x.replace('calibre_rescale_', '') for x in classes if
            x.startswith('calibre_rescale_')]
-    if not classes: return None
+    if not classes:
+        return None
    factor = 1.0
    for x in classes:
        try:
@ -54,7 +55,8 @@ class KeyMapper(object):
            return base
        size = float(size)
        base = float(base)
-        if abs(size - base) < 0.1: return 0
+        if abs(size - base) < 0.1:
+            return 0
        sign = -1 if size < base else 1
        endp = 0 if size < base else 36
        diff = (abs(base - size) * 3) + ((36 - size) / 100)
@ -110,7 +112,8 @@ class EmbedFontsCSSRules(object):
        self.href = None

    def __call__(self, oeb):
-        if not self.body_font_family: return None
+        if not self.body_font_family:
+            return None
        if not self.href:
            iid, href = oeb.manifest.generate(u'page_styles', u'page_styles.css')
            rules = [x.cssText for x in self.rules]
@ -228,10 +231,10 @@ class CSSFlattener(object):
            bs.append('margin-top: 0pt')
            bs.append('margin-bottom: 0pt')
            if float(self.context.margin_left) >= 0:
-                bs.append('margin-left : %gpt'%\
+                bs.append('margin-left : %gpt'%
                        float(self.context.margin_left))
            if float(self.context.margin_right) >= 0:
-                bs.append('margin-right : %gpt'%\
+                bs.append('margin-right : %gpt'%
                        float(self.context.margin_right))
            bs.extend(['padding-left: 0pt', 'padding-right: 0pt'])
            if self.page_break_on_body:
@ -277,8 +280,10 @@ class CSSFlattener(object):
        for kind in ('margin', 'padding'):
            for edge in ('bottom', 'top'):
                property = "%s-%s" % (kind, edge)
-                if property not in cssdict: continue
-                if '%' in cssdict[property]: continue
+                if property not in cssdict:
+                    continue
+                if '%' in cssdict[property]:
+                    continue
                value = style[property]
                if value == 0:
                    continue
@ -366,6 +371,11 @@ class CSSFlattener(object):
        is_drop_cap = (cssdict.get('float', None) == 'left' and 'font-size' in
                       cssdict and len(node) == 0 and node.text and
                       len(node.text) == 1)
+        is_drop_cap = is_drop_cap or (
+            # The docx input plugin generates drop caps that look like this
+            len(node) == 1 and not node.text and len(node[0]) == 0 and
+            node[0].text and not node[0].tail and len(node[0].text) == 1 and
+            'line-height' in cssdict and 'font-size' in cssdict)
        if not self.context.disable_font_rescaling and not is_drop_cap:
            _sbase = self.sbase if self.sbase is not None else \
                self.context.source.fbase
@ -436,8 +446,7 @@ class CSSFlattener(object):
            keep_classes = set()

            if cssdict:
-                items = cssdict.items()
-                items.sort()
+                items = sorted(cssdict.items())
                css = u';\n'.join(u'%s: %s' % (key, val) for key, val in items)
                classes = node.get('class', '').strip() or 'calibre'
                klass = ascii_text(STRIPNUM.sub('', classes.split()[0].replace('_', '')))
@ -519,8 +528,7 @@ class CSSFlattener(object):
            if float(self.context.margin_bottom) >= 0:
                stylizer.page_rule['margin-bottom'] = '%gpt'%\
                        float(self.context.margin_bottom)
-            items = stylizer.page_rule.items()
-            items.sort()
+            items = sorted(stylizer.page_rule.items())
            css = ';\n'.join("%s: %s" % (key, val) for key, val in items)
            css = ('@page {\n%s\n}\n'%css) if items else ''
            rules = [r.cssText for r in stylizer.font_face_rules +
@ -556,14 +564,14 @@ class CSSFlattener(object):
            body = html.find(XHTML('body'))
            fsize = self.context.dest.fbase
            self.flatten_node(body, stylizer, names, styles, pseudo_styles, fsize, item.id)
-        items = [(key, val) for (val, key) in styles.items()]
-        items.sort()
+        items = sorted([(key, val) for (val, key) in styles.items()])
        # :hover must come after link and :active must come after :hover
        psels = sorted(pseudo_styles.iterkeys(), key=lambda x :
                {'hover':1, 'active':2}.get(x, 0))
        for psel in psels:
            styles = pseudo_styles[psel]
-            if not styles: continue
+            if not styles:
+                continue
            x = sorted(((k+':'+psel, v) for v, k in styles.iteritems()))
            items.extend(x)

--- a/src/calibre/ebooks/oeb/transforms/split.py
+++ b/src/calibre/ebooks/oeb/transforms/split.py
@ -159,7 +159,11 @@ class Split(object):
        except ValueError:
            # Unparseable URL
            return url
+        try:
            href = urlnormalize(href)
+        except ValueError:
+            # href has non utf-8 quoting
+            return url
        if href in self.map:
            anchor_map = self.map[href]
            nhref = anchor_map[frag if frag else None]
@ -171,7 +175,6 @@ class Split(object):
        return url


-
 class FlowSplitter(object):
    'The actual splitting logic'

@ -313,7 +316,6 @@ class FlowSplitter(object):
        split_point  = root.xpath(path)[0]
        split_point2 = root2.xpath(path)[0]

-
        def nix_element(elem, top=True):
            # Remove elem unless top is False in which case replace elem by its
            # children
@ -373,6 +375,8 @@ class FlowSplitter(object):
        for img in root.xpath('//h:img', namespaces=NAMESPACES):
            if img.get('style', '') != 'display:none':
                return False
+        if root.xpath('//*[local-name() = "svg"]'):
+            return False
        return True

    def split_text(self, text, root, size):
@ -393,7 +397,6 @@ class FlowSplitter(object):
                buf = part
        return ans

-
    def split_to_size(self, tree):
        self.log.debug('\t\tSplitting...')
        root = tree.getroot()
@ -440,7 +443,7 @@ class FlowSplitter(object):
                               len(self.split_trees), size/1024.))
            else:
                self.log.debug(
-                        '\t\t\tSplit tree still too large: %d KB' % \
+                        '\t\t\tSplit tree still too large: %d KB' %
                                (size/1024.))
                self.split_to_size(t)

@ -546,7 +549,6 @@ class FlowSplitter(object):
            for x in toc:
                fix_toc_entry(x)

-
        if self.oeb.toc:
            fix_toc_entry(self.oeb.toc)

--- a/src/calibre/ebooks/pdf/render/links.py
+++ b/src/calibre/ebooks/pdf/render/links.py
@ -45,11 +45,15 @@ class Links(object):
            href, page, rect = link
            p, frag = href.partition('#')[0::2]
            try:
-                link = ((path, p, frag or None), self.pdf.get_pageref(page).obj, Array(rect))
+                pref = self.pdf.get_pageref(page).obj
            except IndexError:
-                self.log.warn('Unable to find page for link: %r, ignoring it' % link)
+                try:
+                    pref = self.pdf.get_pageref(page-1).obj
+                except IndexError:
+                    self.pdf.debug('Unable to find page for link: %r, ignoring it' % link)
                    continue
-            self.links.append(link)
+                self.pdf.debug('The link %s points to non-existent page, moving it one page back' % href)
+            self.links.append(((path, p, frag or None), pref, Array(rect)))

    def add_links(self):
        for link in self.links:
--- a/src/calibre/gui2/actions/choose_library.py
+++ b/src/calibre/gui2/actions/choose_library.py
@ -161,13 +161,15 @@ class ChooseLibraryAction(InterfaceAction):
    def genesis(self):
        self.base_text = _('%d books')
        self.count_changed(0)
-        self.qaction.triggered.connect(self.choose_library,
-                type=Qt.QueuedConnection)
        self.action_choose = self.menuless_qaction

        self.stats = LibraryUsageStats()
        self.popup_type = (QToolButton.InstantPopup if len(self.stats.stats) > 1 else
                QToolButton.MenuButtonPopup)
+        if len(self.stats.stats) > 1:
+            self.action_choose.triggered.connect(self.choose_library)
+        else:
+            self.qaction.triggered.connect(self.choose_library)

        self.choose_menu = self.qaction.menu()

@ -200,7 +202,6 @@ class ChooseLibraryAction(InterfaceAction):
                    type=Qt.QueuedConnection)
            self.choose_menu.addAction(ac)

-
        self.rename_separator = self.choose_menu.addSeparator()

        self.maintenance_menu = QMenu(_('Library Maintenance'))
@ -489,7 +490,8 @@ class ChooseLibraryAction(InterfaceAction):
        import gc
        from calibre.utils.mem import memory
        ref = self.dbref
-        for i in xrange(3): gc.collect()
+        for i in xrange(3):
+            gc.collect()
        if ref() is not None:
            print 'DB object alive:', ref()
            for r in gc.get_referrers(ref())[:10]:
@ -500,7 +502,6 @@ class ChooseLibraryAction(InterfaceAction):
        print
        self.dbref = self.before_mem = None

-
    def qs_requested(self, idx, *args):
        self.switch_requested(self.qs_locations[idx])

@ -546,3 +547,4 @@ class ChooseLibraryAction(InterfaceAction):
            return False

        return True
+
--- a/src/calibre/gui2/actions/show_quickview.py
+++ b/src/calibre/gui2/actions/show_quickview.py
@ -38,6 +38,13 @@ class ShowQuickviewAction(InterfaceAction):
                Quickview(self.gui, self.gui.library_view, index)
            self.current_instance.show()

+    def change_quickview_column(self, idx):
+        self.show_quickview()
+        if self.current_instance:
+            if self.current_instance.is_closed:
+                return
+            self.current_instance.change_quickview_column.emit(idx)
+
    def library_changed(self, db):
        if self.current_instance and not self.current_instance.is_closed:
            self.current_instance.set_database(db)
--- a/src/calibre/gui2/custom_column_widgets.py
+++ b/src/calibre/gui2/custom_column_widgets.py
@ -7,10 +7,10 @@ __docformat__ = 'restructuredtext en'

 from functools import partial

-from PyQt4.Qt import QComboBox, QLabel, QSpinBox, QDoubleSpinBox, QDateTimeEdit, \
-        QDateTime, QGroupBox, QVBoxLayout, QSizePolicy, QGridLayout, \
-        QSpacerItem, QIcon, QCheckBox, QWidget, QHBoxLayout, SIGNAL, \
-        QPushButton, QMessageBox, QToolButton
+from PyQt4.Qt import (QComboBox, QLabel, QSpinBox, QDoubleSpinBox, QDateTimeEdit,
+        QDateTime, QGroupBox, QVBoxLayout, QSizePolicy, QGridLayout,
+        QSpacerItem, QIcon, QCheckBox, QWidget, QHBoxLayout, SIGNAL,
+        QPushButton, QMessageBox, QToolButton, Qt)

 from calibre.utils.date import qt_to_dt, now
 from calibre.gui2.complete2 import EditWithComplete
@ -39,7 +39,6 @@ class Base(object):
    def gui_val(self):
        return self.getter()

-
    def commit(self, book_id, notify=False):
        val = self.gui_val
        val = self.normalize_ui_val(val)
@ -159,6 +158,17 @@ class DateTimeEdit(QDateTimeEdit):
    def set_to_clear(self):
        self.setDateTime(UNDEFINED_QDATETIME)

+    def keyPressEvent(self, ev):
+        if ev.key() == Qt.Key_Minus:
+            ev.accept()
+            self.setDateTime(self.minimumDateTime())
+        elif ev.key() == Qt.Key_Equal:
+            ev.accept()
+            self.setDateTime(QDateTime.currentDateTime())
+        else:
+            return QDateTimeEdit.keyPressEvent(self, ev)
+
+
 class DateTime(Base):

    def setup_ui(self, parent):
@ -595,7 +605,6 @@ class BulkBase(Base):
            self._cached_gui_val_ = self.getter()
        return self._cached_gui_val_

-
    def get_initial_value(self, book_ids):
        values = set([])
        for book_id in book_ids:
@ -1054,3 +1063,5 @@ bulk_widgets = {
        'series': BulkSeries,
        'enumeration': BulkEnumeration,
 }
+
+
--- a/src/calibre/gui2/device.py
+++ b/src/calibre/gui2/device.py
@ -122,7 +122,8 @@ def device_name_for_plugboards(device_class):
 class DeviceManager(Thread): # {{{

    def __init__(self, connected_slot, job_manager, open_feedback_slot,
-            open_feedback_msg, allow_connect_slot, sleep_time=2):
+                 open_feedback_msg, allow_connect_slot,
+                 after_callback_feedback_slot, sleep_time=2):
        '''
        :sleep_time: Time to sleep between device probes in secs
        '''
@ -150,6 +151,7 @@ class DeviceManager(Thread): # {{{
        self.ejected_devices  = set([])
        self.mount_connection_requests = Queue.Queue(0)
        self.open_feedback_slot = open_feedback_slot
+        self.after_callback_feedback_slot = after_callback_feedback_slot
        self.open_feedback_msg = open_feedback_msg
        self._device_information = None
        self.current_library_uuid = None
@ -392,6 +394,10 @@ class DeviceManager(Thread): # {{{
                        self.device.set_progress_reporter(job.report_progress)
                    self.current_job.run()
                    self.current_job = None
+                    feedback = getattr(self.device, 'user_feedback_after_callback', None)
+                    if feedback is not None:
+                        self.device.user_feedback_after_callback = None
+                        self.after_callback_feedback_slot(feedback)
                else:
                    break
            if do_sleep:
@ -850,7 +856,7 @@ class DeviceMixin(object): # {{{
        self.device_manager = DeviceManager(FunctionDispatcher(self.device_detected),
                self.job_manager, Dispatcher(self.status_bar.show_message),
                Dispatcher(self.show_open_feedback),
-                FunctionDispatcher(self.allow_connect))
+                FunctionDispatcher(self.allow_connect), Dispatcher(self.after_callback_feedback))
        self.device_manager.start()
        self.device_manager.devices_initialized.wait()
        if tweaks['auto_connect_to_folder']:
@ -862,6 +868,10 @@ class DeviceMixin(object): # {{{
                name, show_copy_button=False,
                override_icon=QIcon(icon))

+    def after_callback_feedback(self, feedback):
+        # feedback is accessed by key, so it is expected to be a mapping
+        info_dialog(self, feedback['title'], feedback['msg'], det_msg=feedback['det_msg']).show()
+
    def debug_detection(self, done):
        self.debug_detection_callback = weakref.ref(done)
        self.device_manager.debug_detection(FunctionDispatcher(self.debug_detection_done))
@ -1116,7 +1126,7 @@ class DeviceMixin(object): # {{{
            return

        dm = self.iactions['Remove Books'].delete_memory
-        if dm.has_key(job):
+        if job in dm:
            paths, model = dm.pop(job)
            self.device_manager.remove_books_from_metadata(paths,
                    self.booklists())
@ -1141,7 +1151,7 @@ class DeviceMixin(object): # {{{
    def dispatch_sync_event(self, dest, delete, specific):
        rows = self.library_view.selectionModel().selectedRows()
        if not rows or len(rows) == 0:
-            error_dialog(self, _('No books'), _('No books')+' '+\
+            error_dialog(self, _('No books'), _('No books')+' '+
                    _('selected to send')).exec_()
            return

@ -1160,7 +1170,7 @@ class DeviceMixin(object): # {{{
                if fmts:
                    for f in fmts.split(','):
                        f = f.lower()
-                        if format_count.has_key(f):
+                        if f in format_count:
                            format_count[f] += 1
                        else:
                            format_count[f] = 1
--- a/src/calibre/gui2/device_drivers/configwidget.py
+++ b/src/calibre/gui2/device_drivers/configwidget.py
@ -28,7 +28,10 @@ class ConfigWidget(QWidget, Ui_ConfigWidget):

        all_formats = set(all_formats)
        self.calibre_known_formats = device.FORMATS
+        try:
            self.device_name = device.get_gui_name()
+        except TypeError:
+            self.device_name = getattr(device, 'gui_name', None) or _('Device')
        if device.USER_CAN_ADD_NEW_FORMATS:
            all_formats = set(all_formats) | set(BOOK_EXTENSIONS)

--- a/src/calibre/gui2/dialogs/quickview.py
+++ b/src/calibre/gui2/dialogs/quickview.py
@ -6,7 +6,7 @@ __docformat__ = 'restructuredtext en'

 from PyQt4.Qt import (Qt, QDialog, QAbstractItemView, QTableWidgetItem,
                      QListWidgetItem, QByteArray, QCoreApplication,
-                      QApplication)
+                      QApplication, pyqtSignal)

 from calibre.customize.ui import find_plugin
 from calibre.gui2 import gprefs
@ -44,6 +44,8 @@ class TableItem(QTableWidgetItem):

 class Quickview(QDialog, Ui_Quickview):

+    change_quickview_column   = pyqtSignal(object)
+
    def __init__(self, gui, view, row):
        QDialog.__init__(self, gui, flags=Qt.Window)
        Ui_Quickview.__init__(self)
@ -105,6 +107,7 @@ class Quickview(QDialog, Ui_Quickview):
        self.refresh(row)

        self.view.clicked.connect(self.slave)
+        self.change_quickview_column.connect(self.slave)
        QCoreApplication.instance().aboutToQuit.connect(self.save_state)
        self.search_button.clicked.connect(self.do_search)
        view.model().new_bookdisplay_data.connect(self.book_was_changed)
@ -146,6 +149,9 @@ class Quickview(QDialog, Ui_Quickview):
        key = self.view.model().column_map[self.current_column]
        book_id = self.view.model().id(bv_row)

+        if self.current_book_id == book_id and self.current_key == key:
+            return
+
        # Only show items for categories
        if not self.db.field_metadata[key]['is_category']:
            if self.current_key is None:
@ -164,6 +170,8 @@ class Quickview(QDialog, Ui_Quickview):

        if vals:
            self.no_valid_items = False
+            if self.db.field_metadata[key]['datatype'] == 'rating':
+                vals = unicode(vals/2)
            if not isinstance(vals, list):
                vals = [vals]
            vals.sort(key=sort_key)
@ -198,8 +206,7 @@ class Quickview(QDialog, Ui_Quickview):
            sv = selected_item
        sv = sv.replace('"', r'\"')
        self.last_search = self.current_key+':"=' + sv + '"'
-        books = self.db.search_getting_ids(self.last_search,
-                                           self.db.data.search_restriction)
+        books = self.db.search(self.last_search, return_matches=True)

        self.books_table.setRowCount(len(books))
        self.books_label.setText(_('Books with selected item "{0}": {1}').
--- a/src/calibre/gui2/dialogs/template_dialog.py
+++ b/src/calibre/gui2/dialogs/template_dialog.py
@ -3,17 +3,21 @@ __copyright__ = '2008, Kovid Goyal kovid@kovidgoyal.net'
 __docformat__ = 'restructuredtext en'
 __license__   = 'GPL v3'

-import json
+import json, os, traceback

 from PyQt4.Qt import (Qt, QDialog, QDialogButtonBox, QSyntaxHighlighter, QFont,
-                      QRegExp, QApplication, QTextCharFormat, QColor, QCursor)
+                      QRegExp, QApplication, QTextCharFormat, QColor, QCursor,
+                      QIcon, QSize)

-from calibre.gui2 import error_dialog
+from calibre import sanitize_file_name_unicode
+from calibre.constants import config_dir
 from calibre.gui2.dialogs.template_dialog_ui import Ui_TemplateDialog
 from calibre.utils.formatter_functions import formatter_functions
+from calibre.utils.icu import sort_key
 from calibre.ebooks.metadata.book.base import Metadata
 from calibre.ebooks.metadata.book.formatter import SafeFormat
-from calibre.library.coloring import (displayable_columns)
+from calibre.library.coloring import (displayable_columns, color_row_key)
+from calibre.gui2 import error_dialog, choose_files, pixmap_to_data


 class ParenPosition:
@ -198,25 +202,56 @@ class TemplateHighlighter(QSyntaxHighlighter):

 class TemplateDialog(QDialog, Ui_TemplateDialog):

-    def __init__(self, parent, text, mi=None, fm=None, color_field=None):
+    def __init__(self, parent, text, mi=None, fm=None, color_field=None,
+                 icon_field_key=None, icon_rule_kind=None):
        QDialog.__init__(self, parent)
        Ui_TemplateDialog.__init__(self)
        self.setupUi(self)

        self.coloring = color_field is not None
+        self.iconing = icon_field_key is not None
+
+        cols = []
+        if fm is not None:
+            for key in sorted(displayable_columns(fm),
+                              key=lambda(k): sort_key(fm[k]['name']) if k != color_row_key else 0):
+                if key == color_row_key and not self.coloring:
+                    continue
+                from calibre.gui2.preferences.coloring import all_columns_string
+                name = all_columns_string if key == color_row_key else fm[key]['name']
+                if name:
+                    cols.append((name, key))
+
+        self.color_layout.setVisible(False)
+        self.icon_layout.setVisible(False)
+
        if self.coloring:
-            cols = sorted([k for k in displayable_columns(fm)])
-            self.colored_field.addItems(cols)
-            self.colored_field.setCurrentIndex(self.colored_field.findText(color_field))
+            self.color_layout.setVisible(True)
+            for n1, k1 in cols:
+                self.colored_field.addItem(n1, k1)
+            self.colored_field.setCurrentIndex(self.colored_field.findData(color_field))
            colors = QColor.colorNames()
            colors.sort()
            self.color_name.addItems(colors)
-        else:
-            self.colored_field.setVisible(False)
-            self.colored_field_label.setVisible(False)
-            self.color_chooser_label.setVisible(False)
-            self.color_name.setVisible(False)
-            self.color_copy_button.setVisible(False)
+        elif self.iconing:
+            self.icon_layout.setVisible(True)
+            for n1, k1 in cols:
+                self.icon_field.addItem(n1, k1)
+            self.icon_file_names = []
+            d = os.path.join(config_dir, 'cc_icons')
+            if os.path.exists(d):
+                for icon_file in os.listdir(d):
+                    icon_file = icu_lower(icon_file)
+                    if os.path.exists(os.path.join(d, icon_file)):
+                        if icon_file.endswith('.png'):
+                            self.icon_file_names.append(icon_file)
+            self.icon_file_names.sort(key=sort_key)
+            self.update_filename_box()
+            self.icon_with_text.setChecked(True)
+            if icon_rule_kind == 'icon_only':
+                self.icon_without_text.setChecked(True)
+            self.icon_field.setCurrentIndex(self.icon_field.findData(icon_field_key))
+
        if mi:
            self.mi = mi
        else:
@ -248,6 +283,8 @@ class TemplateDialog(QDialog, Ui_TemplateDialog):
        self.buttonBox.button(QDialogButtonBox.Ok).setText(_('&OK'))
        self.buttonBox.button(QDialogButtonBox.Cancel).setText(_('&Cancel'))
        self.color_copy_button.clicked.connect(self.color_to_clipboard)
+        self.filename_button.clicked.connect(self.filename_button_clicked)
+        self.icon_copy_button.clicked.connect(self.icon_to_clipboard)

        try:
            with open(P('template-functions.json'), 'rb') as f:
@ -276,11 +313,55 @@ class TemplateDialog(QDialog, Ui_TemplateDialog):
                '<a href="http://manual.calibre-ebook.com/template_ref.html">'
                '%s</a>'%tt)

+    def filename_button_clicked(self):
+        try:
+            path = choose_files(self, 'choose_category_icon',
+                        _('Select Icon'), filters=[
+                        ('Images', ['png', 'gif', 'jpg', 'jpeg'])],
+                    all_files=False, select_only_single_file=True)
+            if path:
+                icon_path = path[0]
+                icon_name = sanitize_file_name_unicode(
+                             os.path.splitext(
+                                   os.path.basename(icon_path))[0]+'.png')
+                if icon_name not in self.icon_file_names:
+                    self.icon_file_names.append(icon_name)
+                    self.update_filename_box()
+                    try:
+                        p = QIcon(icon_path).pixmap(QSize(128, 128))
+                        d = os.path.join(config_dir, 'cc_icons')
+                        if not os.path.exists(os.path.join(d, icon_name)):
+                            if not os.path.exists(d):
+                                os.makedirs(d)
+                            with open(os.path.join(d, icon_name), 'wb') as f:
+                                f.write(pixmap_to_data(p, format='PNG'))
+                    except:
+                        traceback.print_exc()
+                self.icon_files.setCurrentIndex(self.icon_files.findText(icon_name))
+                self.icon_files.adjustSize()
+        except:
+            traceback.print_exc()
+        return
+
+    def update_filename_box(self):
+        self.icon_files.clear()
+        self.icon_file_names.sort(key=sort_key)
+        self.icon_files.addItem('')
+        self.icon_files.addItems(self.icon_file_names)
+        for i,filename in enumerate(self.icon_file_names):
+            icon = QIcon(os.path.join(config_dir, 'cc_icons', filename))
+            self.icon_files.setItemIcon(i+1, icon)
+
    def color_to_clipboard(self):
        app = QApplication.instance()
        c = app.clipboard()
        c.setText(unicode(self.color_name.currentText()))

+    def icon_to_clipboard(self):
+        app = QApplication.instance()
+        c = app.clipboard()
+        c.setText(unicode(self.icon_files.currentText()))
+
    def textbox_changed(self):
        cur_text = unicode(self.textbox.toPlainText())
        if self.last_text != cur_text:
@ -324,5 +405,14 @@ class TemplateDialog(QDialog, Ui_TemplateDialog):
                    _('The template box cannot be empty'), show=True)
                return

-        self.rule = (unicode(self.colored_field.currentText()), txt)
+            self.rule = (unicode(self.colored_field.itemData(
+                                self.colored_field.currentIndex()).toString()), txt)
+        elif self.iconing:
+            rt = 'icon' if self.icon_with_text.isChecked() else 'icon_only'
+            self.rule = (rt,
+                         unicode(self.icon_field.itemData(
+                                self.icon_field.currentIndex()).toString()),
+                         txt)
+        else:
+            self.rule = ('', txt)
        QDialog.accept(self)
--- a/src/calibre/gui2/dialogs/template_dialog.ui
+++ b/src/calibre/gui2/dialogs/template_dialog.ui
@ -21,6 +21,7 @@
  </property>
  <layout class="QVBoxLayout" name="verticalLayout">
   <item>
+    <widget class="QWidget" name="color_layout">
     <layout class="QGridLayout">
      <item row="0" column="0">
       <widget class="QLabel" name="colored_field_label">
@ -62,6 +63,97 @@
       </widget>
      </item>
     </layout>
+    </widget>
+   </item>
+   <item>
+    <widget class="QWidget" name="icon_layout">
+     <layout class="QGridLayout">
+      <item row="0" column="0" colspan="2">
+       <widget class="QGroupBox">
+        <property name="title">
+         <string>Kind</string>
+        </property>
+        <layout class="QHBoxLayout">
+         <item>
+          <widget class="QRadioButton" name="icon_without_text">
+           <property name="text">
+            <string>icon with no text</string>
+           </property>
+          </widget>
+         </item>
+         <item>
+          <widget class="QRadioButton" name="icon_with_text">
+           <property name="text">
+            <string>icon with text</string>
+           </property>
+          </widget>
+         </item>
+        </layout>
+        <property name="sizePolicy">
+         <sizepolicy hsizetype="Expanding" vsizetype="Fixed">
+          <horstretch>100</horstretch>
+          <verstretch>0</verstretch>
+         </sizepolicy>
+        </property>
+       </widget>
+      </item>
+      <item row="1" column="0">
+       <widget class="QLabel" name="icon_chooser_label">
+        <property name="text">
+         <string>Apply the icon to column:</string>
+        </property>
+        <property name="buddy">
+         <cstring>icon_field</cstring>
+        </property>
+       </widget>
+      </item>
+      <item row="1" column="1">
+       <widget class="QComboBox" name="icon_field">
+       </widget>
+      </item>
+      <item row="2" column="0">
+       <widget class="QLabel" name="image_chooser_label">
+        <property name="text">
+         <string>Copy an icon file name to the clipboard:</string>
+        </property>
+        <property name="buddy">
+         <cstring>icon_files</cstring>
+        </property>
+       </widget>
+      </item>
+      <item row="2" column="1">
+       <widget class="QWidget">
+        <layout class="QHBoxLayout">
+         <item>
+          <widget class="QComboBox" name="icon_files">
+          </widget>
+         </item>
+         <item>
+          <widget class="QToolButton" name="icon_copy_button">
+           <property name="icon">
+            <iconset resource="../../../../resources/images.qrc">
+             <normaloff>:/images/edit-copy.png</normaloff>:/images/edit-copy.png</iconset>
+           </property>
+           <property name="toolTip">
+            <string>Copy the selected icon file name to the clipboard</string>
+           </property>
+          </widget>
+         </item>
+         <item>
+          <widget class="QPushButton" name="filename_button">
+           <property name="text">
+            <string>Add icon</string>
+           </property>
+           <property name="toolTip">
+            <string>Add an icon file to the set of choices</string>
+           </property>
+          </widget>
+         </item>
+        </layout>
+       </widget>
+      </item>
+     </layout>
+    </widget>
   </item>
   <item>
    <widget class="QPlainTextEdit" name="textbox"/>
--- a/src/calibre/gui2/init.py
+++ b/src/calibre/gui2/init.py
@ -161,18 +161,18 @@ class StatusBar(QStatusBar): # {{{

    def __init__(self, parent=None):
        QStatusBar.__init__(self, parent)
-        self.default_message = __appname__ + ' ' + _('version') + ' ' + \
-                self.get_version() + ' ' + _('created by Kovid Goyal')
        self.device_string = ''
        self.update_label = UpdateLabel('')
+        self.total = self.current = self.selected = 0
        self.addPermanentWidget(self.update_label)
        self.update_label.setVisible(False)
        self._font = QFont()
        self._font.setBold(True)
        self.setFont(self._font)
-        self.defmsg = QLabel(self.default_message)
+        self.defmsg = QLabel('')
        self.defmsg.setFont(self._font)
        self.addWidget(self.defmsg)
+        self.set_label()

    def initialize(self, systray=None):
        self.systray = systray
@ -180,17 +180,39 @@ class StatusBar(QStatusBar): # {{{

    def device_connected(self, devname):
        self.device_string = _('Connected ') + devname
-        self.defmsg.setText(self.default_message + ' ..::.. ' +
-                self.device_string)
+        self.set_label()
+
+    def update_state(self, total, current, selected):
+        self.total, self.current, self.selected = total, current, selected
+        self.set_label()
+
+    def set_label(self):
+        try:
+            self._set_label()
+        except:
+            import traceback
+            traceback.print_exc()
+
+    def _set_label(self):
+        msg = '%s %s %s' % (__appname__, _('version'), get_version())
+        if self.device_string:
+            msg += ' ..::.. ' + self.device_string
+        else:
+            msg += _(' %(created)s %(name)s') % dict(created=_('created by'), name='Kovid Goyal')
+
+        if self.total != self.current:
+            base = _('%(num)d of %(total)d books') % dict(num=self.current, total=self.total)
+        else:
+            base = _('%d books') % self.total
+        if self.selected > 0:
+            base = _('%(num)s, %(sel)d selected') % dict(num=base, sel=self.selected)
+
+        self.defmsg.setText('%s [%s]' % (msg, base))
        self.clearMessage()

    def device_disconnected(self):
        self.device_string = ''
-        self.defmsg.setText(self.default_message)
-        self.clearMessage()
-
-    def get_version(self):
-        return get_version()
+        self.set_label()

    def show_message(self, msg, timeout=0):
        self.showMessage(msg, timeout)
@ -312,9 +334,15 @@ class LayoutMixin(object): # {{{

    def read_layout_settings(self):
        # View states are restored automatically when set_database is called
-
        for x in ('cb', 'tb', 'bd'):
            getattr(self, x+'_splitter').restore_state()

+    def update_status_bar(self, *args):
+        v = self.current_view()
+        selected = len(v.selectionModel().selectedRows())
+        total, current = v.model().counts()
+        self.status_bar.update_state(total, current, selected)
+
 # }}}

+
--- a/src/calibre/gui2/library/delegates.py
+++ b/src/calibre/gui2/library/delegates.py
@ -9,7 +9,7 @@ import sys

 from PyQt4.Qt import (Qt, QApplication, QStyle, QIcon,  QDoubleSpinBox,
        QVariant, QSpinBox, QStyledItemDelegate, QComboBox, QTextDocument,
-        QAbstractTextDocumentLayout, QFont, QFontInfo, QDate)
+        QAbstractTextDocumentLayout, QFont, QFontInfo, QDate, QDateTimeEdit, QDateTime)

 from calibre.gui2 import UNDEFINED_QDATETIME, error_dialog, rating_font
 from calibre.constants import iswindows
@ -23,6 +23,26 @@ from calibre.gui2.dialogs.comments_dialog import CommentsDialog
 from calibre.gui2.dialogs.template_dialog import TemplateDialog
 from calibre.gui2.languages import LanguagesEdit

+class DateTimeEdit(QDateTimeEdit):  # {{{
+
+    def __init__(self, parent, format):
+        QDateTimeEdit.__init__(self, parent)
+        self.setFrame(False)
+        self.setMinimumDateTime(UNDEFINED_QDATETIME)
+        self.setSpecialValueText(_('Undefined'))
+        self.setCalendarPopup(True)
+        self.setDisplayFormat(format)
+
+    def keyPressEvent(self, ev):
+        if ev.key() == Qt.Key_Minus:
+            ev.accept()
+            self.setDateTime(self.minimumDateTime())
+        elif ev.key() == Qt.Key_Equal:
+            ev.accept()
+            self.setDateTime(QDateTime.currentDateTime())
+        else:
+            return QDateTimeEdit.keyPressEvent(self, ev)
+# }}}

 class RatingDelegate(QStyledItemDelegate):  # {{{

@ -77,12 +97,7 @@ class DateDelegate(QStyledItemDelegate): # {{{
        return format_date(qt_to_dt(d, as_utc=False), self.format)

    def createEditor(self, parent, option, index):
-        qde = QStyledItemDelegate.createEditor(self, parent, option, index)
-        qde.setDisplayFormat(self.format)
-        qde.setMinimumDateTime(UNDEFINED_QDATETIME)
-        qde.setSpecialValueText(_('Undefined'))
-        qde.setCalendarPopup(True)
-        return qde
+        return DateTimeEdit(parent, self.format)

 # }}}

@ -101,12 +116,7 @@ class PubDateDelegate(QStyledItemDelegate): # {{{
        return format_date(qt_to_dt(d, as_utc=False), self.format)

    def createEditor(self, parent, option, index):
-        qde = QStyledItemDelegate.createEditor(self, parent, option, index)
-        qde.setDisplayFormat(self.format)
-        qde.setMinimumDateTime(UNDEFINED_QDATETIME)
-        qde.setSpecialValueText(_('Undefined'))
-        qde.setCalendarPopup(True)
-        return qde
+        return DateTimeEdit(parent, self.format)

    def setEditorData(self, editor, index):
        val = index.data(Qt.EditRole).toDate()
@ -230,12 +240,7 @@ class CcDateDelegate(QStyledItemDelegate): # {{{
        return format_date(qt_to_dt(d, as_utc=False), self.format)

    def createEditor(self, parent, option, index):
-        qde = QStyledItemDelegate.createEditor(self, parent, option, index)
-        qde.setDisplayFormat(self.format)
-        qde.setMinimumDateTime(UNDEFINED_QDATETIME)
-        qde.setSpecialValueText(_('Undefined'))
-        qde.setCalendarPopup(True)
-        return qde
+        return DateTimeEdit(parent, self.format)

    def setEditorData(self, editor, index):
        m = index.model()
@ -457,7 +462,7 @@ class CcTemplateDelegate(QStyledItemDelegate): # {{{
            validation_formatter.validate(val)
        except Exception as err:
            error_dialog(self.parent(), _('Invalid template'),
-                    '<p>'+_('The template %s is invalid:')%val + \
+                    '<p>'+_('The template %s is invalid:')%val +
                    '<br>'+str(err), show=True)
        model.setData(index, QVariant(val), Qt.EditRole)

@ -469,3 +474,4 @@ class CcTemplateDelegate(QStyledItemDelegate): # {{{

 # }}}

+
--- a/src/calibre/gui2/library/models.py
+++ b/src/calibre/gui2/library/models.py
@ -6,7 +6,7 @@ __copyright__ = '2010, Kovid Goyal <kovid@kovidgoyal.net>'
 __docformat__ = 'restructuredtext en'

 import functools, re, os, traceback, errno, time
-from collections import defaultdict
+from collections import defaultdict, namedtuple

 from PyQt4.Qt import (QAbstractTableModel, Qt, pyqtSignal, QIcon, QImage,
        QModelIndex, QVariant, QDateTime, QColor, QPixmap)
@ -29,6 +29,8 @@ from calibre.gui2.library import DEFAULT_SORT
 from calibre.utils.localization import calibre_langcode_to_name
 from calibre.library.coloring import color_row_key

+Counts = namedtuple('Counts', 'total current')
+
 def human_readable(size, precision=1):
    """ Convert a size in bytes into megabytes """
    return ('%.'+str(precision)+'f') % ((size/(1024.*1024.)),)
@ -46,7 +48,7 @@ def default_image():
        _default_image = QImage(I('default_cover.png'))
    return _default_image

-class ColumnColor(object):
+class ColumnColor(object):  # {{{

    def __init__(self, formatter, colors):
        self.mi = None
@ -70,9 +72,9 @@ class ColumnColor(object):
                    return color
        except:
            pass
+# }}}

-
-class ColumnIcon(object):
+class ColumnIcon(object):  # {{{

    def __init__(self, formatter):
        self.mi = None
@ -108,6 +110,7 @@ class ColumnIcon(object):
                    return icon_bitmap
        except:
            pass
+# }}}

 class BooksModel(QAbstractTableModel):  # {{{

@ -240,7 +243,6 @@ class BooksModel(QAbstractTableModel): # {{{
            # Would like to to a join here, but the thread might be waiting to
            # do something on the GUI thread. Deadlock.

-
    def refresh_ids(self, ids, current_row=-1):
        self._clear_caches()
        rows = self.db.refresh_ids(ids)
@ -282,6 +284,13 @@ class BooksModel(QAbstractTableModel): # {{{
        self._clear_caches()
        self.count_changed_signal.emit(self.db.count())

+    def counts(self):
+        if self.db.data.search_restriction_applied():
+            total  = self.db.data.get_search_restriction_book_count()
+        else:
+            total = self.db.count()
+        return Counts(total, self.count())
+
    def row_indices(self, index):
        ''' Return list indices of all cells in index.row()'''
        return [self.index(index.row(), c) for c in range(self.columnCount(None))]
@ -332,7 +341,7 @@ class BooksModel(QAbstractTableModel): # {{{
        while True:
            row_ += 1 if forward else -1
            if row_ < 0:
-                row_ = self.count() - 1;
+                row_ = self.count() - 1
            elif row_ >= self.count():
                row_ = 0
            if self.id(row_) in self.ids_to_highlight_set:
@ -897,7 +906,11 @@ class BooksModel(QAbstractTableModel): # {{{
                ht = self.column_map[section]
                if ht == 'timestamp':  # change help text because users know this field as 'date'
                    ht = 'date'
-                return QVariant(_('The lookup/search name is "{0}"').format(ht))
+                if self.db.field_metadata[self.column_map[section]]['is_category']:
+                    is_cat = '.\n\n' + _('Click in this column and press Q to Quickview books with the same %s') % ht
+                else:
+                    is_cat = ''
+                return QVariant(_('The lookup/search name is "{0}"{1}').format(ht, is_cat))
            if role == Qt.DisplayRole:
                return QVariant(self.headers[self.column_map[section]])
            return NONE
@ -909,7 +922,6 @@ class BooksModel(QAbstractTableModel): # {{{
            return QVariant(section+1)
        return NONE

-
    def flags(self, index):
        flags = QAbstractTableModel.flags(self, index)
        if index.isValid():
@ -1017,7 +1029,7 @@ class BooksModel(QAbstractTableModel): # {{{
                return False
            val = (int(value.toInt()[0]) if column == 'rating' else
                    value.toDateTime() if column in ('timestamp', 'pubdate')
-                    else unicode(value.toString()).strip())
+                    else re.sub(ur'\s', u' ', unicode(value.toString()).strip()))
            id = self.db.id(row)
            books_to_refresh = set([id])
            if column == 'rating':
@ -1078,7 +1090,6 @@ class OnDeviceSearch(SearchQueryParser): # {{{
        'inlibrary'
    ]

-
    def __init__(self, model):
        SearchQueryParser.__init__(self, locations=self.USABLE_LOCATIONS)
        self.model = model
@ -1101,7 +1112,7 @@ class OnDeviceSearch(SearchQueryParser): # {{{
            elif query.startswith('~'):
                matchkind = REGEXP_MATCH
                query = query[1:]
-        if matchkind != REGEXP_MATCH: ### leave case in regexps because it can be significant e.g. \S \W \D
+        if matchkind != REGEXP_MATCH:  # leave case in regexps because it can be significant e.g. \S \W \D
            query = query.lower()

        if location not in self.USABLE_LOCATIONS:
@ -1133,9 +1144,9 @@ class OnDeviceSearch(SearchQueryParser): # {{{
                if locvalue == 'inlibrary':
                    continue    # this is bool, so can't match below
                try:
-                    ### Can't separate authors because comma is used for name sep and author sep
-                    ### Exact match might not get what you want. For that reason, turn author
-                    ### exactmatch searches into contains searches.
+                    # Can't separate authors because comma is used for name sep and author sep
+                    # Exact match might not get what you want. For that reason, turn author
+                    # exactmatch searches into contains searches.
                    if locvalue == 'author' and matchkind == EQUALS_MATCH:
                        m = CONTAINS_MATCH
                    else:
@ -1198,6 +1209,12 @@ class DeviceBooksModel(BooksModel): # {{{
        self.editable = ['title', 'authors', 'collections']
        self.book_in_library = None

+    def counts(self):
+        return Counts(len(self.db), len(self.map))
+
+    def count_changed(self, *args):
+        self.count_changed_signal.emit(len(self.db))
+
    def mark_for_deletion(self, job, rows, rows_are_ids=False):
        db_indices = rows if rows_are_ids else self.indices(rows)
        db_items = [self.db[i] for i in db_indices if -1 < i < len(self.db)]
@ -1237,11 +1254,13 @@ class DeviceBooksModel(BooksModel): # {{{
            if not succeeded:
                indices = self.row_indices(self.index(row, 0))
                self.dataChanged.emit(indices[0], indices[-1])
+        self.count_changed()

    def paths_deleted(self, paths):
        self.map = list(range(0, len(self.db)))
        self.resort(False)
        self.research(True)
+        self.count_changed()

    def is_row_marked_for_deletion(self, row):
        try:
@ -1272,9 +1291,9 @@ class DeviceBooksModel(BooksModel): # {{{
        if index.isValid():
            cname = self.column_map[index.column()]
            if cname in self.editable and \
-                     (cname != 'collections' or \
-                     (callable(getattr(self.db, 'supports_collections', None)) and \
-                      self.db.supports_collections() and \
+                     (cname != 'collections' or
+                     (callable(getattr(self.db, 'supports_collections', None)) and
+                      self.db.supports_collections() and
                      device_prefs['manage_device_metadata']=='manual')):
                flags |= Qt.ItemIsEditable
        return flags
@ -1304,6 +1323,7 @@ class DeviceBooksModel(BooksModel): # {{{
        self.last_search = text
        if self.last_search:
            self.searched.emit(True)
+        self.count_changed()

    def research(self, reset=True):
        self.search(self.last_search, reset)
@ -1373,6 +1393,7 @@ class DeviceBooksModel(BooksModel): # {{{
        self.map = list(range(0, len(db)))
        self.research(reset=False)
        self.resort()
+        self.count_changed()

    def cover(self, row):
        item = self.db[self.map[row]]
@ -1517,7 +1538,7 @@ class DeviceBooksModel(BooksModel): # {{{
        elif role == Qt.ToolTipRole and index.isValid():
            if self.is_row_marked_for_deletion(row):
                return QVariant(_('Marked for deletion'))
-            if cname in ['title', 'authors'] or (cname == 'collections' and \
+            if cname in ['title', 'authors'] or (cname == 'collections' and
                    self.db.supports_collections()):
                return QVariant(_("Double click to <b>edit</b> me<br><br>"))
        elif role == Qt.DecorationRole and cname == 'inlibrary':
@ -1586,3 +1607,4 @@ class DeviceBooksModel(BooksModel): # {{{

 # }}}

+
--- a/src/calibre/gui2/library/views.py
+++ b/src/calibre/gui2/library/views.py
@ -10,9 +10,9 @@ from functools import partial
 from future_builtins import map
 from collections import OrderedDict

-from PyQt4.Qt import (QTableView, Qt, QAbstractItemView, QMenu, pyqtSignal,
-    QModelIndex, QIcon, QItemSelection, QMimeData, QDrag, QApplication,
-    QPoint, QPixmap, QUrl, QImage, QPainter, QColor, QRect)
+from PyQt4.Qt import (QTableView, Qt, QAbstractItemView, QMenu, pyqtSignal, QFont,
+    QModelIndex, QIcon, QItemSelection, QMimeData, QDrag, QApplication, QStyle,
+    QPoint, QPixmap, QUrl, QImage, QPainter, QColor, QRect, QHeaderView, QStyleOptionHeader)

 from calibre.gui2.library.delegates import (RatingDelegate, PubDateDelegate,
    TextDelegate, DateDelegate, CompleteDelegate, CcTextDelegate,
@ -25,6 +25,54 @@ from calibre.gui2.library import DEFAULT_SORT
 from calibre.constants import filesystem_encoding
 from calibre import force_unicode

+class HeaderView(QHeaderView):  # {{{
+
+    def __init__(self, *args):
+        QHeaderView.__init__(self, *args)
+        self.hover = -1
+        self.current_font = QFont(self.font())
+        self.current_font.setBold(True)
+        self.current_font.setItalic(True)
+
+    def event(self, e):
+        if e.type() in (e.HoverMove, e.HoverEnter):
+            self.hover = self.logicalIndexAt(e.pos())
+        elif e.type() in (e.Leave, e.HoverLeave):
+            self.hover = -1
+        return QHeaderView.event(self, e)
+
+    def paintSection(self, painter, rect, logical_index):
+        opt = QStyleOptionHeader()
+        self.initStyleOption(opt)
+        opt.rect = rect
+        opt.section = logical_index
+        opt.orientation = self.orientation()
+        opt.textAlignment = Qt.AlignHCenter | Qt.AlignVCenter
+        model = self.parent().model()
+        opt.text = model.headerData(logical_index, opt.orientation, Qt.DisplayRole).toString()
+        if self.isSortIndicatorShown() and self.sortIndicatorSection() == logical_index:
+            opt.sortIndicator = QStyleOptionHeader.SortDown if self.sortIndicatorOrder() == Qt.AscendingOrder else QStyleOptionHeader.SortUp
+        opt.text = opt.fontMetrics.elidedText(opt.text, Qt.ElideRight, rect.width() - 4)
+        if self.isEnabled():
+            opt.state |= QStyle.State_Enabled
+            if self.window().isActiveWindow():
+                opt.state |= QStyle.State_Active
+                if self.hover == logical_index:
+                    opt.state |= QStyle.State_MouseOver
+        sm = self.selectionModel()
+        if opt.orientation == Qt.Vertical:
+            if sm.isRowSelected(logical_index, QModelIndex()):
+                opt.state |= QStyle.State_Sunken
+
+        painter.save()
+        if (
+                (opt.orientation == Qt.Horizontal and sm.currentIndex().column() == logical_index) or
+                (opt.orientation == Qt.Vertical and sm.currentIndex().row() == logical_index)):
+            painter.setFont(self.current_font)
+        self.style().drawControl(QStyle.CE_Header, opt, painter, self)
+        painter.restore()
+# }}}
+
 class PreserveViewState(object):  # {{{

    '''
@ -72,7 +120,8 @@ class PreserveViewState(object): # {{{
            return {x:getattr(self, x) for x in ('selected_ids', 'current_id',
                'vscroll', 'hscroll')}
        def fset(self, state):
-            for k, v in state.iteritems(): setattr(self, k, v)
+            for k, v in state.iteritems():
+                setattr(self, k, v)
            self.__exit__()
        return property(fget=fget, fset=fset)

@ -90,6 +139,7 @@ class BooksView(QTableView): # {{{

    def __init__(self, parent, modelcls=BooksModel, use_edit_metadata_dialog=True):
        QTableView.__init__(self, parent)
+        self.setProperty('highlight_current_item', 150)
        self.row_sizing_done = False

        if not tweaks['horizontal_scrolling_per_column']:
@ -152,12 +202,16 @@ class BooksView(QTableView): # {{{
        # {{{ Column Header setup
        self.can_add_columns = True
        self.was_restored = False
-        self.column_header = self.horizontalHeader()
+        self.column_header = HeaderView(Qt.Horizontal, self)
+        self.setHorizontalHeader(self.column_header)
        self.column_header.setMovable(True)
+        self.column_header.setClickable(True)
        self.column_header.sectionMoved.connect(self.save_state)
        self.column_header.setContextMenuPolicy(Qt.CustomContextMenu)
        self.column_header.customContextMenuRequested.connect(self.show_column_header_context_menu)
        self.column_header.sectionResized.connect(self.column_resized, Qt.QueuedConnection)
+        self.row_header = HeaderView(Qt.Vertical, self)
+        self.setVerticalHeader(self.row_header)
        # }}}

        self._model.database_changed.connect(self.database_changed)
@ -197,6 +251,16 @@ class BooksView(QTableView): # {{{
        elif action.startswith('align_'):
            alignment = action.partition('_')[-1]
            self._model.change_alignment(column, alignment)
+        elif action == 'quickview':
+            from calibre.customize.ui import find_plugin
+            qv = find_plugin('Show Quickview')
+            if qv:
+                rows = self.selectionModel().selectedRows()
+                if len(rows) > 0:
+                    current_row = rows[0].row()
+                    current_col = self.column_map.index(column)
+                    index = self.model().index(current_row, current_col)
+                    qv.actual_plugin_.change_quickview_column(index)

        self.save_state()

@ -225,7 +289,7 @@ class BooksView(QTableView): # {{{
                ac.setCheckable(True)
                ac.setChecked(True)
            if col not in ('ondevice', 'inlibrary') and \
-                    (not self.model().is_custom_column(col) or \
+                    (not self.model().is_custom_column(col) or
                    self.model().custom_columns[col]['datatype'] not in ('bool',
                        )):
                m = self.column_header_context_menu.addMenu(
@ -240,7 +304,14 @@ class BooksView(QTableView): # {{{
                            a.setCheckable(True)
                            a.setChecked(True)

-
+            if self._model.db.field_metadata[col]['is_category']:
+                act = self.column_header_context_menu.addAction(_('Quickview column %s') %
+                        name,
+                    partial(self.column_header_context_handler, action='quickview',
+                        column=col))
+                rows = self.selectionModel().selectedRows()
+                if len(rows) > 1:
+                    act.setEnabled(False)

            hidden_cols = [self.column_map[i] for i in
                    range(self.column_header.count()) if
@ -260,7 +331,6 @@ class BooksView(QTableView): # {{{
                        partial(self.column_header_context_handler,
                        action='show', column=col))

-
            self.column_header_context_menu.addSeparator()
            self.column_header_context_menu.addAction(
                    _('Shrink column if it is too wide to fit'),
@ -497,7 +567,6 @@ class BooksView(QTableView): # {{{
                        db.prefs[name] = ans
        return ans

-
    def restore_state(self):
        old_state = self.get_old_state()
        if old_state is None:
@ -820,7 +889,8 @@ class BooksView(QTableView): # {{{
        ids = frozenset(ids)
        m = self.model()
        for row in xrange(m.rowCount(QModelIndex())):
-            if len(row_map) >= len(ids): break
+            if len(row_map) >= len(ids):
+                break
            c = m.id(row)
            if c in ids:
                row_map[c] = row
@ -880,7 +950,8 @@ class BooksView(QTableView): # {{{
                pass
            return None
        def fset(self, val):
-            if val is None: return
+            if val is None:
+                return
            m = self.model()
            for row in xrange(m.rowCount(QModelIndex())):
                if m.id(row) == val:
@ -902,7 +973,8 @@ class BooksView(QTableView): # {{{
        column = ci.column()

        for i in xrange(ci.row()+1, self.row_count()):
-            if i in selected_rows: continue
+            if i in selected_rows:
+                continue
            try:
                return self.model().id(self.model().index(i, column))
            except:
@ -910,7 +982,8 @@ class BooksView(QTableView): # {{{

        # No unselected rows after the current row, look before
        for i in xrange(ci.row()-1, -1, -1):
-            if i in selected_rows: continue
+            if i in selected_rows:
+                continue
            try:
                return self.model().id(self.model().index(i, column))
            except:
--- a/src/calibre/gui2/metadata/basic_widgets.py
+++ b/src/calibre/gui2/metadata/basic_widgets.py
@ -13,7 +13,7 @@ from PyQt4.Qt import (Qt, QDateTimeEdit, pyqtSignal, QMessageBox, QIcon,
        QToolButton, QWidget, QLabel, QGridLayout, QApplication,
        QDoubleSpinBox, QListWidgetItem, QSize, QPixmap, QDialog, QMenu,
        QPushButton, QSpinBox, QLineEdit, QSizePolicy, QDialogButtonBox,
-        QAction, QCalendarWidget, QDate)
+        QAction, QCalendarWidget, QDate, QDateTime)

 from calibre.gui2.widgets import EnLineEdit, FormatList as _FormatList, ImageView
 from calibre.utils.icu import sort_key
@ -45,6 +45,9 @@ def save_dialog(parent, title, msg, det_msg=''):
    d.setStandardButtons(QMessageBox.Yes | QMessageBox.No | QMessageBox.Cancel)
    return d.exec_()

+def clean_text(x):
+    return re.sub(r'\s', ' ', x.strip())
+
 '''
 The interface common to all widgets used to set basic metadata
 class BasicMetadataWidget(object):
@ -117,7 +120,7 @@ class TitleEdit(EnLineEdit):
    def current_val(self):

        def fget(self):
-            title = unicode(self.text()).strip()
+            title = clean_text(unicode(self.text()))
            if not title:
                title = self.get_default()
            return title
@ -289,7 +292,7 @@ class AuthorsEdit(EditWithComplete):
    def current_val(self):

        def fget(self):
-            au = unicode(self.text()).strip()
+            au = clean_text(unicode(self.text()))
            if not au:
                au = self.get_default()
            return string_to_authors(au)
@ -352,7 +355,7 @@ class AuthorSortEdit(EnLineEdit):
    def current_val(self):

        def fget(self):
-            return unicode(self.text()).strip()
+            return clean_text(unicode(self.text()))

        def fset(self, val):
            if not val:
@ -472,7 +475,7 @@ class SeriesEdit(EditWithComplete):
    def current_val(self):

        def fget(self):
-            return unicode(self.currentText()).strip()
+            return clean_text(unicode(self.currentText()))

        def fset(self, val):
            if not val:
@ -1135,7 +1138,7 @@ class TagsEdit(EditWithComplete):  # {{{
    @dynamic_property
    def current_val(self):
        def fget(self):
-            return [x.strip() for x in unicode(self.text()).split(',')]
+            return [clean_text(x) for x in unicode(self.text()).split(',')]
        def fset(self, val):
            if not val:
                val = []
@ -1237,7 +1240,7 @@ class IdentifiersEdit(QLineEdit):  # {{{
    def current_val(self):
        def fget(self):
            raw = unicode(self.text()).strip()
-            parts = [x.strip() for x in raw.split(',')]
+            parts = [clean_text(x) for x in raw.split(',')]
            ans = {}
            for x in parts:
                c = x.split(':')
@ -1376,7 +1379,7 @@ class PublisherEdit(EditWithComplete):  # {{{
    def current_val(self):

        def fget(self):
-            return unicode(self.currentText()).strip()
+            return clean_text(unicode(self.currentText()))

        def fset(self, val):
            if not val:
@ -1472,6 +1475,16 @@ class DateEdit(QDateTimeEdit):
        o, c = self.original_val, self.current_val
        return o != c

+    def keyPressEvent(self, ev):
+        if ev.key() == Qt.Key_Minus:
+            ev.accept()
+            self.setDateTime(self.minimumDateTime())
+        elif ev.key() == Qt.Key_Equal:
+            ev.accept()
+            self.setDateTime(QDateTime.currentDateTime())
+        else:
+            return QDateTimeEdit.keyPressEvent(self, ev)
+
 class PubdateEdit(DateEdit):
    LABEL = _('Publishe&d:')
    FMT = 'MMM yyyy'
--- a/src/calibre/gui2/preferences/coloring.py
+++ b/src/calibre/gui2/preferences/coloring.py
@ -636,10 +636,20 @@ class RulesModel(QAbstractListModel): # {{{

    def rule_to_html(self, kind, col, rule):
        if not isinstance(rule, Rule):
+            if kind == 'color':
                return _('''
                <p>Advanced Rule for column <b>%(col)s</b>:
                <pre>%(rule)s</pre>
                ''')%dict(col=col, rule=prepare_string_for_xml(rule))
+            else:
+                return _('''
+                <p>Advanced Rule: set <b>%(typ)s</b> for column <b>%(col)s</b>:
+                <pre>%(rule)s</pre>
+                ''')%dict(col=col,
+                          typ=icon_rule_kinds[0][0]
+                            if kind == icon_rule_kinds[0][1] else icon_rule_kinds[1][0],
+                          rule=prepare_string_for_xml(rule))
+
        conditions = [self.condition_to_html(c) for c in rule.conditions]

        trans_kind = 'not found'
@ -761,7 +771,7 @@ class EditRules(QWidget): # {{{
                ' what icon to use. Click the Add Rule button below'
                ' to get started.<p>You can <b>change an existing rule</b> by'
                ' double clicking it.'))
-            self.add_advanced_button.setVisible(False)
+#             self.add_advanced_button.setVisible(False)

    def add_rule(self):
        d = RuleEditor(self.model.fm, self.pref_name)
@ -774,6 +784,7 @@ class EditRules(QWidget): # {{{
                self.changed.emit()

    def add_advanced(self):
+        if self.pref_name == 'column_color_rules':
            td = TemplateDialog(self, '', mi=self.mi, fm=self.fm, color_field='')
            if td.exec_() == td.Accepted:
                col, r = td.rule
@ -781,6 +792,15 @@ class EditRules(QWidget): # {{{
                    idx = self.model.add_rule('color', col, r)
                    self.rules_view.scrollTo(idx)
                    self.changed.emit()
+        else:
+            td = TemplateDialog(self, '', mi=self.mi, fm=self.fm, icon_field_key='')
+            if td.exec_() == td.Accepted:
+                print(td.rule)
+                typ, col, r = td.rule
+                if typ and r and col:
+                    idx = self.model.add_rule(typ, col, r)
+                    self.rules_view.scrollTo(idx)
+                    self.changed.emit()

    def edit_rule(self, index):
        try:
@ -790,8 +810,12 @@ class EditRules(QWidget): # {{{
        if isinstance(rule, Rule):
            d = RuleEditor(self.model.fm, self.pref_name)
            d.apply_rule(kind, col, rule)
-        else:
+        elif self.pref_name == 'column_color_rules':
            d = TemplateDialog(self, rule, mi=self.mi, fm=self.fm, color_field=col)
+        else:
+            d = TemplateDialog(self, rule, mi=self.mi, fm=self.fm, icon_field_key=col,
+                               icon_rule_kind=kind)
+
        if d.exec_() == d.Accepted:
            if len(d.rule) == 2: # Convert template dialog rules to a triple
                d.rule = ('color', d.rule[0], d.rule[1])
--- a/src/calibre/gui2/preferences/tweaks.py
+++ b/src/calibre/gui2/preferences/tweaks.py
@ -172,7 +172,10 @@ class Tweaks(QAbstractListModel, SearchQueryParser): # {{{
            doc.append(line[1:].strip())
        doc = '\n'.join(doc)
        while True:
+            try:
                line = lines[pos]
+            except IndexError:
+                break
            if not line.strip():
                break
            spidx1 = line.find(' ')
--- a/src/calibre/gui2/search_restriction_mixin.py
+++ b/src/calibre/gui2/search_restriction_mixin.py
@ -146,8 +146,12 @@ class CreateVirtualLibrary(QDialog):  # {{{

            <p>For example you can use a Virtual Library to only show you books with the Tag <i>"Unread"</i>
            or only books by <i>"My Favorite Author"</i> or only books in a particular series.</p>
+
+            <p>More information and examples are available in the
+            <a href="http://manual.calibre-ebook.com/virtual_libraries.html">User Manual</a>.</p>
            '''))
        hl.setWordWrap(True)
+        hl.setOpenExternalLinks(True)
        hl.setFrameStyle(hl.StyledPanel)
        gl.addWidget(hl, 0, 3, 4, 1)

--- a/src/calibre/gui2/store/stores/foyles_uk_plugin.py
+++ b/src/calibre/gui2/store/stores/foyles_uk_plugin.py
@ -1,7 +1,7 @@
 # -*- coding: utf-8 -*-

 from __future__ import (unicode_literals, division, absolute_import, print_function)
-store_version = 1 # Needed for dynamic plugin loading
+store_version = 2 # Needed for dynamic plugin loading

 __license__ = 'GPL 3'
 __copyright__ = '2011, John Schember <john@nachtimwald.com>'
@ -54,14 +54,13 @@ class FoylesUKStore(BasicStoreConfig, StorePlugin):
                id_ = ''.join(data.xpath('.//p[@class="doc-cover"]/a/@href')).strip()
                if not id_:
                    continue
+                id_ = 'http://ebooks.foyles.co.uk' + id_

                cover_url = ''.join(data.xpath('.//p[@class="doc-cover"]/a/img/@src'))
                title = ''.join(data.xpath('.//span[@class="title"]/a/text()'))
                author = ', '.join(data.xpath('.//span[@class="author"]/span[@class="author"]/text()'))
-                price = ''.join(data.xpath('.//span[@itemprop="price"]/text()'))
+                price = ''.join(data.xpath('.//span[@itemprop="price"]/text()')).strip()
                format_ = ''.join(data.xpath('.//p[@class="doc-meta-format"]/span[last()]/text()'))
-                format_, ign, drm = format_.partition(' ')
-                drm = SearchResult.DRM_LOCKED if 'DRM' in drm else SearchResult.DRM_UNLOCKED

                counter -= 1

@ -71,7 +70,7 @@ class FoylesUKStore(BasicStoreConfig, StorePlugin):
                s.author = author.strip()
                s.price = price
                s.detail_item = id_
-                s.drm = drm
+                s.drm = SearchResult.DRM_LOCKED
                s.formats = format_

                yield s
--- a/src/calibre/gui2/store/stores/koobe_plugin.py
+++ b/src/calibre/gui2/store/stores/koobe_plugin.py
@ -1,7 +1,7 @@
 # -*- coding: utf-8 -*-

 from __future__ import (division, absolute_import, print_function)
-store_version = 2  # Needed for dynamic plugin loading
+store_version = 3  # Needed for dynamic plugin loading

 __license__ = 'GPL 3'
 __copyright__ = '2013, Tomasz Długosz <tomek3d@gmail.com>'
@ -25,21 +25,20 @@ from calibre.gui2.store.web_store_dialog import WebStoreDialog
 class KoobeStore(BasicStoreConfig, StorePlugin):

    def open(self, parent=None, detail_item=None, external=False):
-        #aff_root = 'https://www.a4b-tracking.com/pl/stat-click-text-link/15/58/'
+        aff_root = 'https://www.a4b-tracking.com/pl/stat-click-text-link/15/58/'
+
        url = 'http://www.koobe.pl/'

-        #aff_url = aff_root + str(b64encode(url))
+        aff_url = aff_root + str(b64encode(url))

        detail_url = None
        if detail_item:
-            detail_url = detail_item #aff_root + str(b64encode(detail_item))
+            detail_url = aff_root + str(b64encode(detail_item))

        if external or self.config.get('open_external', False):
-            #open_url(QUrl(url_slash_cleaner(detail_url if detail_url else aff_url)))
-            open_url(QUrl(url_slash_cleaner(detail_url if detail_url else url)))
+            open_url(QUrl(url_slash_cleaner(detail_url if detail_url else aff_url)))
        else:
-            #d = WebStoreDialog(self.gui, url, parent, detail_url if detail_url else aff_url)
-            d = WebStoreDialog(self.gui, url, parent, detail_url if detail_url else url)
+            d = WebStoreDialog(self.gui, url, parent, detail_url if detail_url else aff_url)
            d.setWindowTitle(self.name)
            d.set_tags(self.config.get('tags', ''))
            d.exec_()
@ -64,7 +63,7 @@ class KoobeStore(BasicStoreConfig, StorePlugin):
                    cover_url = ''.join(data.xpath('.//div[@class="cover"]/a/img/@src'))
                    price = ''.join(data.xpath('.//span[@class="current_price"]/text()'))
                    title = ''.join(data.xpath('.//h2[@class="title"]/a/text()'))
-                    author = ''.join(data.xpath('.//h3[@class="book_author"]/a/text()'))
+                    author = ', '.join(data.xpath('.//h3[@class="book_author"]/a/text()'))
                    formats = ', '.join(data.xpath('.//div[@class="formats"]/div/div/@title'))

                    counter -= 1