merge with John's branch

Commit c1214d4006 by Tomasz Długosz, 2011-07-29 10:40:15 +02:00
199 changed files with 86897 additions and 45690 deletions

View File

@ -19,6 +19,132 @@
# new recipes:
# - title:
- version: 0.8.11
date: 2011-07-22
new features:
- title: "When doing a conversion from some format to the same format, save the original file"
description: "When calibre does a conversion from the same format to the same format, for
example, from EPUB to EPUB, the original file is saved as original_epub, so that in case the
conversion is poor, you can change the settings and run it again. The original is automatically used
every time you run a conversion with that format as input. If you want to disable this,
there is a tweak that prevents calibre from saving the originals in Preferences->Tweaks. You can
easily replace the converted version with the original in the Edit metadata dialog by right
clicking on the list of formats in the top right corner."
type: major
- title: "Conversion pipeline: Add an option to control the height of the blank lines inserted by calibre"
- title: "Drivers for bq DaVinci, Samsung Galaxy ACE GT-S5830 and Medion e-reader"
- title: "Get Books: Add stores Chitanka and Bookoteka. Remove epubbuy.de at store's request"
- title: "Content server: Add a link at the bottom of the mobile interface to switch to the full interface."
tickets: [812525]
- title: "Update the kindle icon shown when a Kindle is connected to use a picture of the Kindle 3"
tickets: [810852]
- title: "MOBI Output: When converting epub documents that have a start element in their guide, use it to mark the starting position at which the MOBI file will be opened."
tickets: [804755]
- title: "News download: Add a default Accept header to all requests"
bug fixes:
- title: "Fix regression that broke loading translations from .po files in the working directory"
- title: "Fix conversion dialog not allowing series numbers larger than 9999"
tickets: [813281]
- title: "Conversion pipeline: When adding/removing entries to the manifest, ignore unparseable URLs instead of erroring out on them"
- title: "SD Card in Azbooka not being detected"
tickets: [812750]
- title: "Conversion pipeline: Strip out large blocks of contiguous space (more than 10000 contiguous blanks) as these slow down the conversion process and are almost always indicative of an error in the input document."
- title: "ebook-convert: Abort if a keyboard interrupt is raised during parsing"
- title: "Regex builder: Show a nicer error message when the user has the file open in another program on windows."
tickets: [811641]
- title: "When converting in the GUI, set all identifiers present in the book's metadata in the output file, if the output format supports them."
improved recipes:
- NBOnline
- JBPress
- Instapaper
- Die Zeit
- Wired (UK)
new recipes:
- title: Utrinski Vesnik
author: Darko Spasovski
- title: IDG.se
author: zapt0
- title: Los Andes
author: Darko Miletic
- title: De Luns a Venres
author: Susana Sotelo Docío
- title: "Nikkei News subscription version"
author: Ado Nishimura
- version: 0.8.10
date: 2011-07-15
new features:
- title: "Add a right click menu to the cover browser. It allows you to view a book, edit metadata etc. from within the cover browser. The menu can be customized in Preferences->Toolbars"
- title: "Allow selecting and stopping multiple jobs at once in the jobs window"
tickets: [810349]
- title: "When editing metadata directly in the book list, have a little pop up menu so that all existing values can be accessed by mouse only. For example, when you edit authors, you can use the mouse to select an existing author."
- title: "Get Books: Add ebook.nl and fix price parsing for the legimi store"
- title: "Drivers for Samsung Infuse and Motorola XPERT"
- title: "Tag Browser: Make hierarchical items work in group searched terms."
bug fixes:
- title: "Allow setting numbers larger than 99 in custom series columns"
- title: "Fix a bug that caused the same news download sent via a USB connection to the device on two different days resulting in a duplicate on the device"
- title: "Ensure English in the list of interface languages in Preferences is always listed in English, so that it does not become hard to find"
- title: "SNB Output: Fix bug in handling unicode file names"
- title: "Fix sorting problem in manage categories. Fix poor performance problem when dropping multiple books onto a user category."
- title: "Remove 'empty field' error dialogs in bulk search/replace, instead setting the fields to their default value."
- title: "Fix regression that broke communicating with Kobo devices using outdated firmware"
tickets: [807832]
- title: "LRF Input: Fix conversion of LRF files with non ascii titles on some windows systems"
tickets: [807641]
improved recipes:
- Time
- Freakonomics Blog
- io9
- "Computer Act!ve"
new recipes:
- title: Techcrunch and Pecat
author: Darko Miletic
- title: Vio Mundo, IDG Now and Tijolaco
author: Diniz Bortolotto
- title: Geek and Poke, Automatiseringgids IT
author: DrMerry
- version: 0.8.9
date: 2011-07-08
@ -32,7 +158,7 @@
- title: "Conversion pipeline: Add option to control if duplicate entries are allowed when generating the Table of Contents from links."
tickets: [806095]
- title: "Metadata download: When merging results, if the query to the xisbn service hangs, wait no more than 10 seconds. Also try harder to preserve the month when downlaoding published date. Do not throw away isbnless results if there are some sources that return isbns and some that do not."
- title: "Metadata download: When merging results, if the query to the xisbn service hangs, wait no more than 10 seconds. Also try harder to preserve the month when downloading published date. Do not throw away isbnless results if there are some sources that return isbns and some that do not."
tickets: [798309]
- title: "Get Books: Remove OpenLibrary since it has the same files as archive.org. Allow direct downloading from Project Gutenberg."
@ -617,7 +743,7 @@
- version: 0.8.0
date: 2010-05-06
date: 2011-05-06
new features:
- title: "Go to http://calibre-ebook.com/new-in/eight to see what's new in 0.8.0"

View File

@ -1,39 +1,34 @@
# -*- coding: utf-8 -*-
__license__ = 'GPLv3'
from calibre.web.feeds.news import BasicNewsRecipe
class AdvancedUserRecipe1255797795(BasicNewsRecipe):
title = u'Corren'
language = 'sv'
__author__ = 'Jonas Svensson'
simultaneous_downloads = 1
no_stylesheets = True
oldest_article = 7
class AdvancedUserRecipe1311446032(BasicNewsRecipe):
title = 'Corren'
__author__ = 'Jonas Svensson'
description = 'News from Sweden'
publisher = 'Corren'
category = 'news, politics, Sweden'
oldest_article = 2
delay = 1
max_articles_per_feed = 100
remove_attributes = ['onload']
timefmt = ''
no_stylesheets = True
use_embedded_content = False
encoding = 'iso-8859-1'
language = 'sv'
feeds = [
(u'Toppnyheter (alla kategorier)', u'http://www.corren.se/inc/RssHandler.ashx?id=4122151&ripurl=http://www.corren.se/nyheter/'),
(u'Bostad', u'http://www.corren.se/inc/RssHandler.ashx?id=4122174&ripurl=http://www.corren.se/bostad/'),
(u'Ekonomi & Jobb', u'http://www.corren.se/inc/RssHandler.ashx?id=4122176&ripurl=http://www.corren.se/ekonomi/'),
(u'Kultur & Nöje', u'http://www.corren.se/inc/RssHandler.ashx?id=4122192&ripurl=http://www.corren.se/kultur/'),
(u'Mat & dryck', u'http://www.corren.se/inc/RssHandler.ashx?id=4122201&ripurl=http://www.corren.se/mat-dryck/'),
(u'Motor', u'http://www.corren.se/inc/RssHandler.ashx?id=4122203&ripurl=http://www.corren.se/motor/'),
(u'Sport', u'http://www.corren.se/inc/RssHandler.ashx?id=4122206&ripurl=http://www.corren.se/sport/'),
(u'Åsikter', u'http://www.corren.se/inc/RssHandler.ashx?id=4122223&ripurl=http://www.corren.se/asikter/'),
(u'Mjölby', u'http://www.corren.se/inc/RssHandler.ashx?id=4122235&ripurl=http://www.corren.se/ostergotland/mjolby/'),
(u'Motala', u'http://www.corren.se/inc/RssHandler.ashx?id=4122236&ripurl=http://www.corren.se/ostergotland/motala/')
]
def print_version(self, url):
url = url.replace("ekonomi/artikel.aspx", "Print.aspx")
url = url.replace("bostad/artikel.aspx", "Print.aspx")
url = url.replace("kultur/artikel.aspx", "Print.aspx")
url = url.replace("motor/artikel.aspx", "Print.aspx")
url = url.replace("mat-dryck/artikel.aspx", "Print.aspx")
url = url.replace("sport/artikel.aspx", "Print.aspx")
url = url.replace("asikter/artikel.aspx", "Print.aspx")
url = url.replace("mat-dryck/artikel.aspx", "Print.aspx")
url = url.replace("ostergotland/mjolby/artikel.aspx", "Print.aspx")
url = url.replace("ostergotland/motala/artikel.aspx", "Print.aspx")
return url.replace("nyheter/artikel.aspx", "Print.aspx")
feeds = [
(u'Toppnyheter', u'http://www.corren.se/inc/RssHandler.ashx?id=4122151&ripurl=http://www.corren.se/nyheter/')
,(u'Ekonomi', u'http://www.corren.se/inc/RssHandler.ashx?id=4122176&ripurl=http://www.corren.se/ekonomi/')
,(u'Link\xf6ping', u'http://www.corren.se/inc/RssHandler.ashx?id=4122234')
,(u'Åsikter', u'http://www.corren.se/inc/RssHandler.ashx?id=4122223,4122224,4122226,4122227,4122228,4122229,4122230')
]
keep_only_tags = [dict(name='div', attrs={'id':'article'}),dict(name='div', attrs={'class':'body'})]
remove_tags = [
dict(name='ul',attrs={'class':'functions'})
,dict(name='a',attrs={'href':'javascript*'})
,dict(name='div',attrs={'class':'box'})
,dict(name='div',attrs={'class':'functionsbottom'})
]

View File

@ -0,0 +1,32 @@
# -*- coding: utf-8 -*-
__license__ = 'GPLv3'
from calibre.web.feeds.news import BasicNewsRecipe
class AdvancedUserRecipe1311450855(BasicNewsRecipe):
title = u'Dagens Industri'
__author__ = 'Jonas Svensson'
description = 'Economy news from Sweden'
publisher = 'DI'
category = 'news, politics, Sweden'
oldest_article = 2
delay = 1
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
encoding = 'utf-8'
language = 'sv'
feeds = [(u'DI', u'http://di.se/rss')]
keep_only_tags = [dict(name='h1', attrs={'id':'ctl00_ExtraWideContentRegion_WideContentRegion_MainRegion_MainContentRegion_MainBodyRegion_headlineNormal'}),dict(name='div', attrs={'id':'articleBody'})]
remove_tags = [
dict(name='div',attrs={'class':'article-actions clear'})
,dict(name='div',attrs={'class':'article-action-popup'})
,dict(name='div',attrs={'class':'header'})
,dict(name='div',attrs={'class':'content clear'})
,dict(name='div',attrs={'id':'articleAdvertisementDiv'})
,dict(name='ul',attrs={'class':'action-list'})
]

View File

@ -1,25 +1,29 @@
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = '2009, Kovid Goyal kovid@kovidgoyal.net'
__copyright__ = '2011, Starson17'
__docformat__ = 'restructuredtext en'
from calibre.web.feeds.news import BasicNewsRecipe
class Freakonomics(BasicNewsRecipe):
title = 'Freakonomics Blog'
description = 'The Hidden side of everything'
__author__ = 'Starson17'
__author__ = 'Starson17'
__version__ = '1.02'
__date__ = '11 July 2011'
language = 'en'
cover_url = 'http://ilkerugur.files.wordpress.com/2009/04/freakonomics.jpg'
use_embedded_content= False
no_stylesheets = True
oldest_article = 30
remove_javascript = True
remove_empty_feeds = True
max_articles_per_feed = 50
feeds = [('Blog', 'http://feeds.feedburner.com/freakonomicsblog')]
keep_only_tags = [dict(name='div', attrs={'id':'header'}),
dict(name='h1'),
dict(name='h2'),
dict(name='div', attrs={'class':'entry-content'}),
]
feeds = [(u'Freakonomics Blog', u'http://www.freakonomics.com/feed/')]
keep_only_tags = [dict(name='div', attrs={'id':['content']})]
remove_tags_after = [dict(name='div', attrs={'class':['simple_socialmedia']})]
remove_tags = [dict(name='div', attrs={'class':['simple_socialmedia','single-fb-share','wp-polls']})]
extra_css = '''
h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}

View File

@ -1,5 +1,4 @@
# -*- coding: utf-8 -*-
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.web.feeds import Feed
@ -36,14 +35,13 @@ class GC_gl(BasicNewsRecipe):
def feed_to_index_append(self, feedObject, masterFeed):
for feed in feedObject:
newArticles = []
for article in feed.articles:
newArt = {
'title' : article.title,
'url' : article.url,
'date' : article.date
}
newArticles.append(newArt)
masterFeed.append((feed.title,newArticles))
for feed in feedObject:
newArticles = []
for article in feed.articles:
newArt = {
'title' : article.title,
'url' : article.url,
'date' : article.date
}
newArticles.append(newArt)
masterFeed.append((feed.title,newArticles))

View File

@ -12,7 +12,7 @@ from datetime import date
class Guardian(BasicNewsRecipe):
title = u'The Guardian / The Observer'
title = u'The Guardian and The Observer'
if date.today().weekday() == 6:
base_url = "http://www.guardian.co.uk/theobserver"
else:
@ -28,7 +28,7 @@ class Guardian(BasicNewsRecipe):
# List of section titles to ignore
# For example: ['Sport']
ignore_sections = []
timefmt = ' [%a, %d %b %Y]'
keep_only_tags = [
dict(name='div', attrs={'id':["content","article_header","main-article-info",]}),
@ -94,7 +94,7 @@ class Guardian(BasicNewsRecipe):
prefix = section_title + ': '
for subsection in s.parent.findAll('a', attrs={'class':'book-section'}):
yield (prefix + self.tag_to_string(subsection), subsection['href'])
def find_articles(self, url):
soup = self.index_to_soup(url)
div = soup.find('div', attrs={'class':'book-index'})
@ -115,7 +115,7 @@ class Guardian(BasicNewsRecipe):
'title': title, 'url':url, 'description':desc,
'date' : strftime('%a, %d %b'),
}
def parse_index(self):
try:
feeds = []

recipes/icons/losandes.png Normal file (binary image, 285 B, not shown)

A second binary icon file (119 B, not shown)

recipes/idg_se.recipe Normal file
View File

@ -0,0 +1,33 @@
__license__ = 'GPLv3'
from calibre.web.feeds.news import BasicNewsRecipe
class IDGse(BasicNewsRecipe):
title = 'IDG'
description = 'IDG.se'
language = 'sv'
__author__ = 'zapt0'
oldest_article = 1
max_articles_per_feed = 40
no_stylesheets = True
encoding = 'ISO-8859-1'
remove_javascript = True
feeds = [(u'Senaste nytt',u'http://feeds.idg.se/idg/vzzs')]
def print_version(self,url):
return url + '?articleRenderMode=print&m=print'
def get_cover_url(self):
return 'http://idgmedia.idg.se/polopoly_fs/2.3275!images/idgmedia_logo_75.jpg'
keep_only_tags = [
dict(name='h1'),
dict(name='div', attrs={'class':['divColumn1Article']}),
]
#remove ads
remove_tags = [
dict(name='div', attrs={'id':['preamble_ad']}),
dict(name='ul', attrs={'class':['share']})
]

View File

@ -1,22 +1,31 @@
from calibre import strftime
from calibre.web.feeds.news import BasicNewsRecipe
class AdvancedUserRecipe1299694372(BasicNewsRecipe):
title = u'Instapaper'
__author__ = 'Darko Miletic'
publisher = 'Instapaper.com'
category = 'info, custom, Instapaper'
oldest_article = 365
title = u'Instapaper'
__author__ = 'Darko Miletic'
publisher = 'Instapaper.com'
category = 'info, custom, Instapaper'
oldest_article = 365
max_articles_per_feed = 100
no_stylesheets = True
remove_javascript = True
remove_tags = [
dict(name='div', attrs={'id':'text_controls_toggle'})
,dict(name='script')
,dict(name='div', attrs={'id':'text_controls'})
,dict(name='div', attrs={'id':'editing_controls'})
,dict(name='div', attrs={'class':'bar bottom'})
]
use_embedded_content = False
needs_subscription = True
INDEX = u'http://www.instapaper.com'
LOGIN = INDEX + u'/user/login'
feeds = [(u'Instapaper Unread', u'http://www.instapaper.com/u'), (u'Instapaper Starred', u'http://www.instapaper.com/starred')]
feeds = [
(u'Instapaper Unread', u'http://www.instapaper.com/u'),
(u'Instapaper Starred', u'http://www.instapaper.com/starred')
]
def get_browser(self):
br = BasicNewsRecipe.get_browser()
@ -34,21 +43,28 @@ class AdvancedUserRecipe1299694372(BasicNewsRecipe):
lfeeds = self.get_feeds()
for feedobj in lfeeds:
feedtitle, feedurl = feedobj
self.report_progress(0, _('Fetching feed')+' %s...'%(feedtitle if feedtitle else feedurl))
self.report_progress(0, 'Fetching feed'+' %s...'%(feedtitle if feedtitle else feedurl))
articles = []
soup = self.index_to_soup(feedurl)
for item in soup.findAll('div', attrs={'class':'titleRow'}):
description = self.tag_to_string(item.div)
for item in soup.findAll('div', attrs={'class':'cornerControls'}):
#description = self.tag_to_string(item.div)
atag = item.a
if atag and atag.has_key('href'):
url = atag['href']
title = self.tag_to_string(atag)
date = strftime(self.timefmt)
articles.append({
'title' :title
,'date' :date
,'url' :url
,'description':description
'url' :url
})
totalfeeds.append((feedtitle, articles))
return totalfeeds
def print_version(self, url):
return 'http://www.instapaper.com' + url
def populate_article_metadata(self, article, soup, first):
article.title = soup.find('title').contents[0].strip()
def postprocess_html(self, soup, first_fetch):
for link_tag in soup.findAll(attrs={"id" : "story"}):
link_tag.insert(0,'<h1>'+soup.find('title').contents[0].strip()+'</h1>')
return soup

View File

@ -1,4 +1,4 @@
__license__ = 'GPL v3'
__license__ = 'GPL v3'
__copyright__ = "2008, Derry FitzGerald. 2009 Modified by Ray Kinsella and David O'Callaghan, 2011 Modified by Phil Burns"
'''
irishtimes.com
@ -10,7 +10,7 @@ from calibre.web.feeds.news import BasicNewsRecipe
class IrishTimes(BasicNewsRecipe):
title = u'The Irish Times'
encoding = 'ISO-8859-15'
__author__ = "Derry FitzGerald, Ray Kinsella, David O'Callaghan and Phil Burns"
__author__ = "Derry FitzGerald, Ray Kinsella, David O'Callaghan and Phil Burns"
language = 'en_IE'
timefmt = ' (%A, %B %d, %Y)'
@ -18,6 +18,7 @@ class IrishTimes(BasicNewsRecipe):
oldest_article = 1.0
max_articles_per_feed = 100
no_stylesheets = True
simultaneous_downloads= 5
r = re.compile('.*(?P<url>http:\/\/(www.irishtimes.com)|(rss.feedsportal.com\/c)\/.*\.html?).*')
remove_tags = [dict(name='div', attrs={'class':'footer'})]
@ -25,17 +26,17 @@ class IrishTimes(BasicNewsRecipe):
feeds = [
('Frontpage', 'http://www.irishtimes.com/feeds/rss/newspaper/index.rss'),
('Ireland', 'http://rss.feedsportal.com/c/851/f/10845/index.rss'),
('World', 'http://rss.feedsportal.com/c/851/f/10846/index.rss'),
('Finance', 'http://rss.feedsportal.com/c/851/f/10847/index.rss'),
('Features', 'http://rss.feedsportal.com/c/851/f/10848/index.rss'),
('Sport', 'http://rss.feedsportal.com/c/851/f/10849/index.rss'),
('Opinion', 'http://rss.feedsportal.com/c/851/f/10850/index.rss'),
('Letters', 'http://rss.feedsportal.com/c/851/f/10851/index.rss'),
('Ireland', 'http://www.irishtimes.com/feeds/rss/newspaper/ireland.rss'),
('World', 'http://www.irishtimes.com/feeds/rss/newspaper/world.rss'),
('Finance', 'http://www.irishtimes.com/feeds/rss/newspaper/finance.rss'),
('Features', 'http://www.irishtimes.com/feeds/rss/newspaper/features.rss'),
('Sport', 'http://www.irishtimes.com/feeds/rss/newspaper/sport.rss'),
('Opinion', 'http://www.irishtimes.com/feeds/rss/newspaper/opinion.rss'),
('Letters', 'http://www.irishtimes.com/feeds/rss/newspaper/letters.rss'),
('Magazine', 'http://www.irishtimes.com/feeds/rss/newspaper/magazine.rss'),
('Health', 'http://rss.feedsportal.com/c/851/f/10852/index.rss'),
('Education & Parenting', 'http://rss.feedsportal.com/c/851/f/10853/index.rss'),
('Motors', 'http://rss.feedsportal.com/c/851/f/10854/index.rss'),
('Health', 'http://www.irishtimes.com/feeds/rss/newspaper/health.rss'),
('Education & Parenting', 'http://www.irishtimes.com/feeds/rss/newspaper/education.rss'),
('Motors', 'http://www.irishtimes.com/feeds/rss/newspaper/motors.rss'),
('An Teanga Bheo', 'http://www.irishtimes.com/feeds/rss/newspaper/anteangabheo.rss'),
('Commercial Property', 'http://www.irishtimes.com/feeds/rss/newspaper/commercialproperty.rss'),
('Science Today', 'http://www.irishtimes.com/feeds/rss/newspaper/sciencetoday.rss'),
@ -49,10 +50,16 @@ class IrishTimes(BasicNewsRecipe):
def print_version(self, url):
if url.count('rss.feedsportal.com'):
u = url.replace('0Bhtml/story01.htm','_pf0Bhtml/story01.htm')
#u = url.replace('0Bhtml/story01.htm','_pf0Bhtml/story01.htm')
u = url.find('irishtimes')
u = 'http://www.irishtimes.com' + url[u + 12:]
u = u.replace('0C', '/')
u = u.replace('A', '')
u = u.replace('0Bhtml/story01.htm', '_pf.html')
else:
u = url.replace('.html','_pf.html')
return u
def get_article_url(self, article):
return article.link
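To illustrate the feedsportal decoding above, a hedged walk-through with an invented link (real feed items may differ in detail):

    # hypothetical feedsportal-wrapped link:
    #   http://rss.feedsportal.com/c/851/f/10845/s/abc/l/0Lirishtimes0N0Cnewspaper0Cireland0C20A110C0A70C290Cindex0Bhtml/story01.htm
    # url.find('irishtimes') locates the encoded host name; url[u + 12:]
    # skips 'irishtimes' plus the '0N' that encodes '.com', leaving '0C...'.
    # After replace('0C', '/'), replace('A', '') and the final substitution:
    #   http://www.irishtimes.com/newspaper/ireland/2011/07/29/index_pf.html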

View File

@ -1,4 +1,4 @@
import urllib2
import urllib2, re
from calibre.web.feeds.news import BasicNewsRecipe
class JBPress(BasicNewsRecipe):
@ -40,3 +40,12 @@ class JBPress(BasicNewsRecipe):
def print_version(self, url):
url = urllib2.urlopen(url).geturl() # resolve redirect.
return url.replace('/-/', '/print/')
def preprocess_html(self, soup):
# remove breadcrumb
h3s = soup.findAll('h3')
for h3 in h3s:
if re.compile('^JBpress&gt;').match(h3.string):
h3.extract()
return soup

recipes/losandes.recipe Normal file
View File

@ -0,0 +1,78 @@
__license__ = 'GPL v3'
__copyright__ = '2011, Darko Miletic <darko.miletic at gmail.com>'
'''
www.losandes.com.ar
'''
from calibre import strftime
from calibre.web.feeds.news import BasicNewsRecipe
class LosAndes(BasicNewsRecipe):
title = 'Los Andes'
__author__ = 'Darko Miletic'
description = 'Noticias de Mendoza, Argentina y el resto del mundo'
publisher = 'Los Andes'
category = 'news, politics, Argentina'
oldest_article = 2
max_articles_per_feed = 200
no_stylesheets = True
encoding = 'cp1252'
use_embedded_content = False
language = 'es_AR'
remove_empty_feeds = True
publication_type = 'newspaper'
masthead_url = 'http://www.losandes.com.ar/graficos/losandes.png'
extra_css = """
body{font-family: Arial,Helvetica,sans-serif }
h1,h2{font-family: "Times New Roman",Times,serif}
.fechaNota{font-weight: bold; color: gray}
"""
conversion_options = {
'comment' : description
, 'tags' : category
, 'publisher' : publisher
, 'language' : language
}
remove_tags = [
dict(name=['meta','link'])
,dict(attrs={'class':['cabecera', 'url']})
]
remove_tags_before=dict(attrs={'class':'cabecera'})
remove_tags_after=dict(attrs={'class':'url'})
feeds = [
(u'Ultimas Noticias' , u'http://www.losandes.com.ar/servicios/rss.asp?r=78' )
,(u'Politica' , u'http://www.losandes.com.ar/servicios/rss.asp?r=68' )
,(u'Economia nacional' , u'http://www.losandes.com.ar/servicios/rss.asp?r=65' )
,(u'Economia internacional' , u'http://www.losandes.com.ar/servicios/rss.asp?r=505')
,(u'Internacionales' , u'http://www.losandes.com.ar/servicios/rss.asp?r=66' )
,(u'Turismo' , u'http://www.losandes.com.ar/servicios/rss.asp?r=502')
,(u'Fincas' , u'http://www.losandes.com.ar/servicios/rss.asp?r=504')
,(u'Isha nos habla' , u'http://www.losandes.com.ar/servicios/rss.asp?r=562')
,(u'Estilo' , u'http://www.losandes.com.ar/servicios/rss.asp?r=81' )
,(u'Cultura' , u'http://www.losandes.com.ar/servicios/rss.asp?r=503')
,(u'Policiales' , u'http://www.losandes.com.ar/servicios/rss.asp?r=70' )
,(u'Deportes' , u'http://www.losandes.com.ar/servicios/rss.asp?r=69' )
,(u'Sociedad' , u'http://www.losandes.com.ar/servicios/rss.asp?r=67' )
,(u'Opinion' , u'http://www.losandes.com.ar/servicios/rss.asp?r=80' )
,(u'Editorial' , u'http://www.losandes.com.ar/servicios/rss.asp?r=76' )
,(u'Mirador' , u'http://www.losandes.com.ar/servicios/rss.asp?r=79' )
]
def print_version(self, url):
artid = url.rpartition('.')[0].rpartition('-')[2]
return "http://www.losandes.com.ar/includes/modulos/imprimir.asp?tipo=noticia&id=" + artid
def get_cover_url(self):
month = strftime("%m").lstrip('0')
day = strftime("%d").lstrip('0')
year = strftime("%Y")
return "http://www.losandes.com.ar/fotografias/fotosnoticias/" + year + "/" + month + "/" + day + "/th_tapa.jpg"
def preprocess_html(self, soup):
for item in soup.findAll(style=True):
del item['style']
return soup

View File

@ -0,0 +1,44 @@
# -*- coding: utf-8 -*-
from calibre.web.feeds.news import BasicNewsRecipe
class LV_gl(BasicNewsRecipe):
title = u'De Luns a Venres (RSS)'
__author__ = u'Susana Sotelo Docío'
description = u'O gratuíto galego'
publisher = u'Galiciaé'
category = u'news'
encoding = 'utf-8'
language = 'gl'
direction = 'ltr'
cover_url = 'http://lv.galiciae.com/new_estilos/lv/logo.gif'
oldest_article = 2
max_articles_per_feed = 200
center_navbar = False
feeds = [
(u'Galicia', u'http://lv.galiciae.com/cache/rss/sec_galicia_gl.rss'),
(u'Cultura', u'http://lv.galiciae.com/cache/rss/sec_cultura_gl.rss'),
(u'Mundo', u'http://lv.galiciae.com/cache/rss/sec_mundo_gl.rss'),
(u'Cidadanía', u'http://lv.galiciae.com/cache/rss/sec_ciudadania_gl.rss'),
(u'Tecnoloxía', u'http://lv.galiciae.com/cache/rss/sec_tecnologia_gl.rss'),
(u'España', u'http://lv.galiciae.com/cache/rss/sec_espana_gl.rss'),
(u'Deportes', u'http://lv.galiciae.com/cache/rss/sec_deportes_gl.rss'),
(u'Economía', u'http://lv.galiciae.com/cache/rss/sec_economia_gl.rss'),
(u'Lercheo', u'http://lv.galiciae.com/cache/rss/sec_gente_gl.rss'),
(u'Medio ambiente', u'http://lv.galiciae.com/cache/rss/sec_medioambiente_gl.rss'),
(u'España/Mundo', u'http://lv.galiciae.com/cache/rss/sec_espanamundo_gl.rss'),
(u'Sociedade', u'http://lv.galiciae.com/cache/rss/sec_sociedad_gl.rss'),
(u'Ciencia', u'http://lv.galiciae.com/cache/rss/sec_ciencia_gl.rss'),
(u'Motor', u'http://lv.galiciae.com/cache/rss/sec_motor_gl.rss'),
(u'Coches', u'http://lv.galiciae.com/cache/rss/sec_coches_gl.rss'),
(u'Motos', u'http://lv.galiciae.com/cache/rss/sec_motos_gl.rss'),
(u'Industriais', u'http://lv.galiciae.com/cache/rss/sec_industriales_gl.rss')
]
extra_css = u' p{text-align:left} '
html2epub_options = 'publisher="' + publisher + '"\ncomments="' + description + '"\nencoding="' + encoding + '"\ntags="' + category + '"\noverride_css=" p {text-align:left; text-indent: 0cm} "'
def print_version(self, url):
url += '?imprimir&lang=gl'
return url

View File

@ -1,11 +1,10 @@
EMAILADDRESS = 'hoge@foobar.co.jp'
from calibre.web.feeds.news import BasicNewsRecipe
class NBOnline(BasicNewsRecipe):
title = u'Nikkei Business Online'
language = 'ja'
description = u'Nikkei Business Online New articles. PLEASE NOTE: You need to edit EMAILADDRESS line of this "nbonline.recipe" file to set your e-mail address which is needed when login. (file is in "Calibre2/resources/recipes" directory.)'
description = u'Nikkei Business Online.\u6CE8\uFF1A\u30E6\u30FC\u30B6\u30FC\u540D\u306Bemail\u30A2\u30C9\u30EC\u30B9\u3068\u30E6\u30FC\u30B6\u30FC\u540D\u3092\u30BB\u30DF\u30B3\u30ED\u30F3\u3067\u533A\u5207\u3063\u3066\u5165\u308C\u3066\u304F\u3060\u3055\u3044\u3002\u4F8B\uFF1Aemail@address.jp;username . PLEASE NOTE: You need to put your email address and username into the username field separated by ; (semi-colon).'
__author__ = 'Ado Nishimura'
needs_subscription = True
oldest_article = 7
@ -23,8 +22,8 @@ class NBOnline(BasicNewsRecipe):
if self.username is not None and self.password is not None:
br.open('https://signon.nikkeibp.co.jp/front/login/?ct=p&ts=nbo')
br.select_form(name='loginActionForm')
br['email'] = EMAILADDRESS
br['userId'] = self.username
br['email'] = self.username.split(';')[0]
br['userId'] = self.username.split(';')[1]
br['password'] = self.password
br.submit()
return br
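A small illustration of the credential convention described above, using the invented address from the recipe's own description string:

    # username field as entered in calibre: 'email@address.jp;username'
    creds = 'email@address.jp;username'.split(';')
    # creds[0] -> 'email@address.jp'  (goes into br['email'])
    # creds[1] -> 'username'          (goes into br['userId'])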

View File

@ -0,0 +1,88 @@
from calibre.web.feeds.recipes import BasicNewsRecipe
import re
#import pprint, sys
#pp = pprint.PrettyPrinter(indent=4)
class NikkeiNet_paper_subscription(BasicNewsRecipe):
title = u'\u65E5\u672C\u7D4C\u6E08\u65B0\u805E\uFF08\u671D\u520A\u30FB\u5915\u520A\uFF09'
__author__ = 'Ado Nishimura'
description = u'\u65E5\u7D4C\u96FB\u5B50\u7248\u306B\u3088\u308B\u65E5\u672C\u7D4C\u6E08\u65B0\u805E\u3002\u671D\u520A\u30FB\u5915\u520A\u306F\u53D6\u5F97\u6642\u9593\u306B\u3088\u308A\u5207\u308A\u66FF\u308F\u308A\u307E\u3059\u3002\u8981\u8CFC\u8AAD'
needs_subscription = True
oldest_article = 1
max_articles_per_feed = 30
language = 'ja'
no_stylesheets = True
cover_url = 'http://parts.nikkei.com/parts/ds/images/common/logo_r1.svg'
masthead_url = 'http://parts.nikkei.com/parts/ds/images/common/logo_r1.svg'
remove_tags_before = {'class':"cmn-indent"}
remove_tags = [
# {'class':"cmn-article_move"},
# {'class':"cmn-pr_list"},
# {'class':"cmnc-zoom"},
{'class':"cmn-hide"},
{'name':'form'},
]
remove_tags_after = {'class':"cmn-indent"}
def get_browser(self):
br = BasicNewsRecipe.get_browser()
#pp.pprint(self.parse_index())
#exit(1)
#br.set_debug_http(True)
#br.set_debug_redirects(True)
#br.set_debug_responses(True)
if self.username is not None and self.password is not None:
print "----------------------------open top page----------------------------------------"
br.open('http://www.nikkei.com/')
print "----------------------------open first login form--------------------------------"
link = br.links(url_regex="www.nikkei.com/etc/accounts/login").next()
br.follow_link(link)
#response = br.response()
#print response.get_data()
print "----------------------------JS redirect(send autoPostForm)-----------------------"
br.select_form(name='autoPostForm')
br.submit()
#response = br.response()
print "----------------------------got login form---------------------------------------"
br.select_form(name='LA0210Form01')
br['LA0210Form01:LA0210Email'] = self.username
br['LA0210Form01:LA0210Password'] = self.password
br.submit()
#response = br.response()
print "----------------------------JS redirect------------------------------------------"
br.select_form(nr=0)
br.submit()
#br.set_debug_http(False)
#br.set_debug_redirects(False)
#br.set_debug_responses(False)
return br
def cleanup(self):
print "----------------------------logout-----------------------------------------------"
self.browser.open('https://regist.nikkei.com/ds/etc/accounts/logout')
def parse_index(self):
print "----------------------------get index of paper-----------------------------------"
result = []
soup = self.index_to_soup('http://www.nikkei.com/paper/')
#soup = self.index_to_soup(self.test_data())
for sect in soup.findAll('div', 'cmn-section kn-special JSID_baseSection'):
sect_title = sect.find('h3', 'cmnc-title').string
sect_result = []
for elem in sect.findAll(attrs={'class':['cmn-article_title']}):
url = 'http://www.nikkei.com' + elem.span.a['href']
url = re.sub("/article/", "/print-article/", url) # print version.
span = elem.span.a.span
if ((span is not None) and (len(span.contents) > 1)):
title = span.contents[1].string
sect_result.append(dict(title=title, url=url, date='',
description='', content=''))
result.append([sect_title, sect_result])
#pp.pprint(result)

recipes/techcrunch.recipe Normal file
View File

@ -0,0 +1,63 @@
__license__ = 'GPL v3'
__copyright__ = '2011, Darko Miletic <darko.miletic at gmail.com>'
'''
techcrunch.com
'''
from calibre.web.feeds.news import BasicNewsRecipe
class TechCrunch(BasicNewsRecipe):
title = 'TechCrunch'
__author__ = 'Darko Miletic'
description = 'IT News'
publisher = 'AOL Inc.'
category = 'news, IT'
oldest_article = 2
max_articles_per_feed = 200
no_stylesheets = True
encoding = 'utf8'
use_embedded_content = False
language = 'en'
remove_empty_feeds = True
publication_type = 'newsportal'
masthead_url = 'http://s2.wp.com/wp-content/themes/vip/tctechcrunch2/images/site-logo.png'
extra_css = """
body{font-family: Helvetica,Arial,sans-serif }
img{margin-bottom: 0.4em; display:block}
"""
conversion_options = {
'comment' : description
, 'tags' : category
, 'publisher' : publisher
, 'language' : language
}
remove_tags = [dict(name=['meta','link'])]
remove_attributes=['lang']
keep_only_tags=[
dict(name='h1', attrs={'class':'headline'})
,dict(attrs={'class':['author','post-time','body-copy']})
]
feeds = [(u'News', u'http://feeds.feedburner.com/TechCrunch/')]
def preprocess_html(self, soup):
for item in soup.findAll(style=True):
del item['style']
for item in soup.findAll('a'):
limg = item.find('img')
if item.string is not None:
str = item.string
item.replaceWith(str)
else:
if limg:
item.name = 'div'
item.attrs = []
else:
str = self.tag_to_string(item)
item.replaceWith(str)
for item in soup.findAll('img'):
if not item.has_key('alt'):
item['alt'] = 'image'
return soup

recipes/tijolaco.recipe Normal file
View File

@ -0,0 +1,24 @@
from calibre.web.feeds.recipes import BasicNewsRecipe
class Tijolaco(BasicNewsRecipe):
title = u'Tijolaco.com'
__author__ = u'Diniz Bortolotto'
description = u'Posts do Blog Tijola\xe7o.com'
oldest_article = 7
max_articles_per_feed = 50
encoding = 'utf8'
publisher = u'Brizola Neto'
category = 'politics, Brazil'
language = 'pt_BR'
publication_type = 'politics portal'
use_embedded_content = False
no_stylesheets = True
remove_javascript = True
feeds = [(u'Blog Tijola\xe7o.com', u'http://feeds.feedburner.com/Tijolacoblog')]
reverse_article_order = True
keep_only_tags = [dict(name='div', attrs={'class':'post'})]
remove_tags = [dict(name='span', attrs={'class':'com'})]

View File

@ -8,47 +8,33 @@ time.com
import re
from calibre.web.feeds.news import BasicNewsRecipe
from lxml import html
class Time(BasicNewsRecipe):
#recipe_disabled = ('This recipe has been disabled as TIME no longer'
# ' publish complete articles on the web.')
title = u'Time'
__author__ = 'Kovid Goyal and Sujata Raman'
__author__ = 'Kovid Goyal'
description = 'Weekly magazine'
encoding = 'utf-8'
no_stylesheets = True
language = 'en'
remove_javascript = True
extra_css = ''' h1 {font-family:georgia,serif;color:#000000;}
.mainHd{font-family:georgia,serif;color:#000000;}
h2 {font-family:Arial,Sans-serif;}
.name{font-family:Arial,Sans-serif; font-size:x-small;font-weight:bold; }
.date{font-family:Arial,Sans-serif; font-size:x-small ;color:#999999;}
.byline{font-family:Arial,Sans-serif; font-size:x-small ;}
.photoBkt{ font-size:x-small ;}
.vertPhoto{font-size:x-small ;}
.credits{font-family:Arial,Sans-serif; font-size:x-small ;color:gray;}
.credit{font-family:Arial,Sans-serif; font-size:x-small ;color:gray;}
.artTxt{font-family:georgia,serif;}
#content{font-family:georgia,serif;}
.caption{font-family:georgia,serif; font-size:x-small;color:#333333;}
.credit{font-family:georgia,serif; font-size:x-small;color:#999999;}
a:link{color:#CC0000;}
.breadcrumb{font-family:Arial,Sans-serif;font-size:x-small;}
'''
keep_only_tags = [
{
'class':['artHd', 'articleContent',
'entry-title','entry-meta', 'entry-content', 'thumbnail']
},
]
remove_tags = [
{'class':['content-tools', 'quigo', 'see',
'first-tier-social-tools', 'navigation', 'enlarge lightbox']},
{'id':['share-tools']},
{'rel':'lightbox'},
]
keep_only_tags = [ dict(name ="div",attrs = {"id" :["content"]}) ,
dict(name ="div",attrs = {"class" :["artHd","artTxt","photoBkt","vertPhoto","image","copy"]}) ,]
remove_tags = [ dict(name ="div",attrs = {'class':['articleFooterNav','listsByTopic','articleTools2','relatedContent','sideContent','topBannerWrap','articlePagination','nextUp',"rtCol","pagination","enlarge","contentTools2",]}),
dict(name ="span",attrs = {'class':['see']}),
dict(name ="div",attrs = {'id':['header','articleSideBar',"articleTools","articleFooter","cmBotLt","quigoPackage"]}),
dict(name ="a",attrs = {'class':['listLink']}),
dict(name ="ul",attrs = {'id':['shareSocial','tabs']}),
dict(name ="li",attrs = {'class':['back']}),
dict(name ="ul",attrs = {'class':['navCount']}),
]
recursions = 10
match_regexps = [r'/[0-9,]+-(2|3|4|5|6|7|8|9)(,\d+){0,1}.html',r'http://www.time.com/time/specials/packages/article/.*']
@ -56,10 +42,11 @@ class Time(BasicNewsRecipe):
r'<meta .+/>'), lambda m:'')]
def parse_index(self):
soup = self.index_to_soup('http://www.time.com/time/magazine')
img = soup.find('a', title="View Large Cover", href=True)
if img is not None:
cover_url = 'http://www.time.com'+img['href']
raw = self.index_to_soup('http://www.time.com/time/magazine', raw=True)
root = html.fromstring(raw)
img = root.xpath('//a[.="View Large Cover" and @href]')
if img:
cover_url = 'http://www.time.com' + img[0].get('href')
try:
nsoup = self.index_to_soup(cover_url)
img = nsoup.find('img', src=re.compile('archive/covers'))
@ -70,46 +57,48 @@ class Time(BasicNewsRecipe):
feeds = []
parent = soup.find(id='tocGuts')
for seched in parent.findAll(attrs={'class':'toc_seched'}):
section = self.tag_to_string(seched).capitalize()
articles = list(self.find_articles(seched))
feeds.append((section, articles))
parent = root.xpath('//div[@class="content-main-aside"]')[0]
for sec in parent.xpath(
'descendant::section[contains(@class, "sec-mag-section")]'):
h3 = sec.xpath('./h3')
if h3:
section = html.tostring(h3[0], encoding=unicode,
method='text').strip().capitalize()
self.log('Found section', section)
articles = list(self.find_articles(sec))
if articles:
feeds.append((section, articles))
return feeds
def find_articles(self, seched):
for a in seched.findNextSiblings( attrs={'class':['toc_hed','rule2']}):
if a.name in "div":
break
else:
yield {
'title' : self.tag_to_string(a),
'url' : 'http://www.time.com'+a['href'],
'date' : '',
'description' : self.article_description(a)
}
def find_articles(self, sec):
def article_description(self, a):
ans = []
while True:
t = a.nextSibling
if t is None:
break
a = t
if getattr(t, 'name', False):
if t.get('class', '') == 'toc_parens' or t.name == 'br':
continue
if t.name in ('div', 'a'):
break
ans.append(self.tag_to_string(t))
else:
ans.append(unicode(t))
return u' '.join(ans).replace(u'\xa0', u'').strip()
for article in sec.xpath('./article'):
h2 = article.xpath('./*[@class="entry-title"]')
if not h2: continue
a = h2[0].xpath('./a[@href]')
if not a: continue
title = html.tostring(a[0], encoding=unicode,
method='text').strip()
if not title: continue
url = a[0].get('href')
if url.startswith('/'):
url = 'http://www.time.com'+url
desc = ''
p = article.xpath('./*[@class="entry-content"]')
if p:
desc = html.tostring(p[0], encoding=unicode,
method='text')
self.log('\t', title, ':\n\t\t', desc)
yield {
'title' : title,
'url' : url,
'date' : '',
'description' : desc
}
def postprocess_html(self,soup,first):
for tag in soup.findAll(attrs ={'class':['artPag','pagination']}):
tag.extract()
return soup

View File

@ -64,7 +64,7 @@ class UnitedDaily(BasicNewsRecipe):
__author__ = 'Eddie Lau'
__version__ = '1.1'
language = 'zh-TW'
language = 'zh_TW'
publisher = 'United Daily News Group'
description = 'United Daily (Taiwan)'
category = 'News, Chinese, Taiwan'

recipes/utrinski.recipe Normal file
View File

@ -0,0 +1,71 @@
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = '2011, Darko Spasovski <darko.spasovski at gmail.com>'
'''
utrinski.com.mk
'''
import re
import datetime
from calibre.web.feeds.news import BasicNewsRecipe
class UtrinskiVesnik(BasicNewsRecipe):
__author__ = 'Darko Spasovski'
INDEX = 'http://www.utrinski.com.mk/'
title = 'Utrinski Vesnik'
description = 'Daily Macedonian newspaper'
masthead_url = 'http://www.utrinski.com.mk/images/LogoTop.jpg'
language = 'mk'
remove_javascript = True
publication_type = 'newspaper'
category = 'news, Macedonia'
oldest_article = 2
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
preprocess_regexps = [(re.compile(i[0], re.IGNORECASE | re.DOTALL), i[1]) for i in
[
## Remove anything before the start of the article.
(r'<body.*?Article start-->', lambda match: '<body>'),
## Remove anything after the end of the article.
(r'<!--Article end.*?</body>', lambda match : '</body>'),
]
]
extra_css = """
body{font-family: Arial,Helvetica,sans-serif}
.WB_UTRINSKIVESNIK_Naslov{FONT-WEIGHT: bold; FONT-SIZE: 18px; FONT-FAMILY: Arial, Verdana, Tahoma; TEXT-DECORATION: none}
"""
conversion_options = {
'comment' : description,
'tags' : category,
'language' : language,
'linearize_tables' : True
}
def parse_index(self):
soup = self.index_to_soup(self.INDEX)
feeds = []
for section in soup.findAll('a', attrs={'class':'WB_UTRINSKIVESNIK_TOCTitleBig'}):
sectionTitle = section.contents[0].string
tocItemTable = section.findAllPrevious('table')[1]
if tocItemTable is None: continue
articles = []
while True:
tocItemTable = tocItemTable.nextSibling
if tocItemTable is None: break
article = tocItemTable.findAll('a', attrs={'class': 'WB_UTRINSKIVESNIK_TocItem'})
if len(article)==0: break
title = self.tag_to_string(article[0], use_alt=True).strip()
articles.append({'title': title, 'url':'http://www.utrinski.com.mk/' + article[0]['href'], 'description':'', 'date':''})
if articles:
feeds.append((sectionTitle, articles))
return feeds
def get_cover_url(self):
datum = datetime.datetime.today().strftime('%d_%m_%Y')
return 'http://www.utrinski.com.mk/WBStorage/Files/' + datum + '.jpg'
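For example, on 29 July 2011 (the date of this commit) the method above would yield the following, assuming the site keeps this naming scheme:

    # datetime.datetime(2011, 7, 29).strftime('%d_%m_%Y')  ->  '29_07_2011'
    # cover URL: http://www.utrinski.com.mk/WBStorage/Files/29_07_2011.jpg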

recipes/vio_mundo.recipe Normal file
View File

@ -0,0 +1,30 @@
import re
from calibre.web.feeds.news import BasicNewsRecipe
class VioMundo(BasicNewsRecipe):
title = 'Blog VioMundo'
__author__ = 'Diniz Bortolotto'
description = 'Posts do Blog VioMundo'
publisher = 'Luiz Carlos Azenha'
oldest_article = 5
max_articles_per_feed = 20
category = 'news, politics, Brazil'
language = 'pt_BR'
publication_type = 'news and politics portal'
use_embedded_content = False
no_stylesheets = True
remove_javascript = True
feeds = [(u'Blog VioMundo', u'http://www.viomundo.com.br/feed')]
reverse_article_order = True
def print_version(self, url):
return url + '/print/'
remove_tags_after = dict(id='BlogContent')
preprocess_regexps = [
(re.compile(r'\|\ <u>.*</p>'),
lambda match: '</p>')
]

View File

@ -1,28 +1,29 @@
__license__ = 'GPL v3'
__copyright__ = '2010, Darko Miletic <darko.miletic at gmail.com>'
__copyright__ = '2011, Starson17 <Starson17 at gmail.com>'
'''
www.wired.co.uk
'''
from calibre import strftime
from calibre.web.feeds.news import BasicNewsRecipe
import re
class Wired_UK(BasicNewsRecipe):
title = 'Wired Magazine - UK edition'
__author__ = 'Darko Miletic'
__author__ = 'Starson17'
__version__ = 'v1.30'
__date__ = '15 July 2011'
description = 'Gaming news'
publisher = 'Conde Nast Digital'
category = 'news, games, IT, gadgets'
oldest_article = 32
oldest_article = 40
max_articles_per_feed = 100
no_stylesheets = True
encoding = 'utf-8'
use_embedded_content = False
masthead_url = 'http://www.wired.co.uk/_/media/wired-logo_UK.gif'
#masthead_url = 'http://www.wired.co.uk/_/media/wired-logo_UK.gif'
language = 'en_GB'
extra_css = ' body{font-family: Palatino,"Palatino Linotype","Times New Roman",Times,serif} img{margin-bottom: 0.8em } .img-descr{font-family: Tahoma,Arial,Helvetica,sans-serif; font-size: 0.6875em; display: block} '
index = 'http://www.wired.co.uk/wired-magazine.aspx'
index = 'http://www.wired.co.uk'
conversion_options = {
'comment' : description
@ -31,44 +32,118 @@ class Wired_UK(BasicNewsRecipe):
, 'language' : language
}
keep_only_tags = [dict(name='div', attrs={'class':'article-box'})]
remove_tags = [
dict(name=['object','embed','iframe','link'])
,dict(attrs={'class':['opts','comment','stories']})
]
remove_tags_after = dict(name='div',attrs={'class':'stories'})
keep_only_tags = [dict(name='div', attrs={'class':['layoutColumn1']})]
remove_tags = [dict(name='div',attrs={'class':['articleSidebar1','commentAddBox linkit','commentCountBox commentCountBoxBig']})]
remove_tags_after = dict(name='div',attrs={'class':['mainCopy entry-content','mainCopy']})
'''
remove_attributes = ['height','width']
,dict(name=['object','embed','iframe','link'])
,dict(attrs={'class':['opts','comment','stories']})
]
'''
def parse_index(self):
totalfeeds = []
soup = self.index_to_soup(self.index)
maincontent = soup.find('div',attrs={'class':'main-content'})
recentcontent = soup.find('ul',attrs={'class':'linkList3'})
mfeed = []
if maincontent:
st = maincontent.find(attrs={'class':'most-wired-box'})
if st:
for itt in st.findAll('a',href=True):
url = 'http://www.wired.co.uk' + itt['href']
title = self.tag_to_string(itt)
description = ''
date = strftime(self.timefmt)
mfeed.append({
'title' :title
,'date' :date
,'url' :url
,'description':description
})
totalfeeds.append(('Articles', mfeed))
if recentcontent:
for li in recentcontent.findAll('li'):
a = li.h2.a
url = self.index + a['href'] + '?page=all'
title = self.tag_to_string(a)
description = ''
date = strftime(self.timefmt)
mfeed.append({
'title' :title
,'date' :date
,'url' :url
,'description':description
})
totalfeeds.append(('Wired UK Magazine Latest News', mfeed))
popmagcontent = soup.findAll('div',attrs={'class':'sidebarLinkList'})
magcontent = popmagcontent[1]
mfeed2 = []
if magcontent:
a = magcontent.h3.a
if a:
url = self.index + a['href'] + '?page=all'
title = self.tag_to_string(a)
description = ''
date = strftime(self.timefmt)
mfeed2.append({
'title' :title
,'date' :date
,'url' :url
,'description':description
})
for li in magcontent.findAll('li'):
a = li.a
url = self.index + a['href'] + '?page=all'
title = self.tag_to_string(a)
description = ''
date = strftime(self.timefmt)
mfeed2.append({
'title' :title
,'date' :date
,'url' :url
,'description':description
})
totalfeeds.append(('Wired UK Magazine Features', mfeed2))
magsoup = self.index_to_soup(self.index + '/magazine')
startcontent = magsoup.find('h3',attrs={'class':'magSubSectionTitle titleStart'}).parent
mfeed3 = []
if startcontent:
for li in startcontent.findAll('li'):
a = li.a
url = self.index + a['href'] + '?page=all'
title = self.tag_to_string(a)
description = ''
date = strftime(self.timefmt)
mfeed3.append({
'title' :title
,'date' :date
,'url' :url
,'description':description
})
totalfeeds.append(('Wired UK Magazine More', mfeed3))
playcontent = magsoup.find('h3',attrs={'class':'magSubSectionTitle titlePlay'}).parent
mfeed4 = []
if playcontent:
for li in playcontent.findAll('li'):
a = li.a
url = self.index + a['href'] + '?page=all'
title = self.tag_to_string(a)
description = ''
date = strftime(self.timefmt)
mfeed4.append({
'title' :title
,'date' :date
,'url' :url
,'description':description
})
totalfeeds.append(('Wired UK Magazine Play', mfeed4))
return totalfeeds
def get_cover_url(self):
cover_url = None
soup = self.index_to_soup(self.index)
cover_item = soup.find('span', attrs={'class':'cover'})
cover_url = ''
soup = self.index_to_soup(self.index + '/magazine/archive')
cover_item = soup.find('div', attrs={'class':'image linkme'})
if cover_item:
cover_url = cover_item.img['src']
return cover_url
def print_version(self, url):
return url + '?page=all'
def preprocess_html(self, soup):
for tag in soup.findAll(name='p'):
if tag.find(name='span', text=re.compile(r'This article was taken from.*', re.DOTALL|re.IGNORECASE)):
tag.extract()
return soup
extra_css = '''
h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
'''

View File

@ -15,15 +15,16 @@ class ZeitDe(BasicNewsRecipe):
encoding = 'UTF-8'
__author__ = 'Martin Pitt, Sujata Raman, Ingo Paschke and Marc Toensing'
no_stylesheets = True
max_articles_per_feed = 40
remove_tags = [
dict(name='iframe'),
dict(name='div', attrs={'class':["response","pagination block","pagenav","inline link", "copyright"] }),
dict(name='p', attrs={'class':["ressortbacklink", "copyright"] }),
dict(name='div', attrs={'id':["place_5","place_4","comments"]})
]
dict(name='iframe'),
dict(name='div', attrs={'class':["response","pagination block","pagenav","inline link", "copyright"] }),
dict(name='p', attrs={'class':["ressortbacklink", "copyright"] }),
dict(name='div', attrs={'id':["place_5","place_4","comments"]})
]
keep_only_tags = [dict(id=['main'])]

View File

@ -2,18 +2,21 @@
# -*- coding: utf-8 mode: python -*-
__license__ = 'GPL v3'
__copyright__ = '2010-2011, Steffen Siebert <calibre at steffensiebert.de>'
__copyright__ = '2010, Steffen Siebert <calibre at steffensiebert.de>'
__docformat__ = 'restructuredtext de'
__version__ = '1.2'
__version__ = '1.5'
"""
Die Zeit EPUB
"""
import os, urllib2, zipfile, re
import os, zipfile, re, cStringIO
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile
from calibre import walk
from urlparse import urlparse
from contextlib import closing
from calibre.utils.magick.draw import save_cover_data_to
class ZeitEPUBAbo(BasicNewsRecipe):
@ -22,49 +25,112 @@ class ZeitEPUBAbo(BasicNewsRecipe):
language = 'de'
lang = 'de-DE'
__author__ = 'Steffen Siebert and Tobias Isenberg'
__author__ = 'Steffen Siebert, revised by Tobias Isenberg (with some code by Kovid Goyal)'
needs_subscription = True
conversion_options = {
'no_default_epub_cover' : True,
# fixing the wrong left margin
'mobi_ignore_margins' : True,
'keep_ligatures' : True,
}
preprocess_regexps = [
# filtering for correct dashes
(re.compile(r' - '), lambda match: ' '), # regular "Gedankenstrich"
(re.compile(r' -,'), lambda match: ' ,'), # "Gedankenstrich" before a comma
(re.compile(r'(?<=\d)-(?=\d)'), lambda match: ''), # number-number
# filtering for correct dashes ("Gedankenstrich" and "bis")
(re.compile(u' (-|\u2212)(?=[ ,])'), lambda match: u' \u2013'),
(re.compile(r'(?<=\d)-(?=\d)'), lambda match: u'\u2013'), # number-number
(re.compile(u'(?<=\d,)-(?= ?\u20AC)'), lambda match: u'\u2013'), # ,- Euro
# fix the number dash number dash for the title image that was broken by the previous line
(re.compile(u'(?<=\d\d\d\d)\u2013(?=\d?\d\.png)'), lambda match: '-'),
# filtering for certain dash cases
(re.compile(r'Bild - Zeitung'), lambda match: 'Bild-Zeitung'), # the obvious
(re.compile(r'EMail'), lambda match: 'E-Mail'), # the obvious
(re.compile(r'SBahn'), lambda match: 'S-Bahn'), # the obvious
(re.compile(r'UBoot'), lambda match: 'U-Boot'), # the obvious
(re.compile(r'T Shirt'), lambda match: 'T-Shirt'), # the obvious
(re.compile(r'TShirt'), lambda match: 'T-Shirt'), # the obvious
# the next two lines not only fix errors but also create new ones. this is due to additional errors in
# the typesetting such as missing commas or wrongly placed dashes. but more is fixed than broken.
(re.compile(r'(?<!und|der|\w\w,) -(?=\w)'), lambda match: '-'), # space too much before a connecting dash
(re.compile(r'(?<=\w)- (?!und\b|oder\b|wie\b|aber\b|auch\b|sondern\b|bis\b|&amp;|&\s|bzw\.|auf\b|eher\b)'), lambda match: '-'), # space too much after a connecting dash
# filtering for missing spaces before the month in long dates
(re.compile(u'(?<=\d)\.(?=(Januar|Februar|M\u00E4rz|April|Mai|Juni|Juli|August|September|Oktober|November|Dezember))'), lambda match: '. '),
# filtering for other missing spaces
(re.compile(r'Stuttgart21'), lambda match: 'Stuttgart 21'), # the obvious
(re.compile(u'(?<=\d)(?=\u20AC)'), lambda match: u'\u2013'), # Zahl[no space]Euro
(re.compile(r':(?=[^\d\s</])'), lambda match: ': '), # missing space after colon
(re.compile(u'\u00AB(?=[^\-\.:;,\?!<\)\s])'), lambda match: u'\u00AB '), # missing space after closing quotation
(re.compile(u'(?<=[^\s\(>])\u00BB'), lambda match: u' \u00BB'), # missing space before opening quotation
(re.compile(r'(?<=[a-z])(?=(I|II|III|IV|V|VI|VII|VIII|IX|X|XI|XII|XIII|XIV|XV|XVI|XVII|XVIII|XIX|XX)\.)'), lambda match: ' '), # missing space before Roman numeral
(re.compile(r'(?<=(I|V|X)\.)(?=[\w])'), lambda match: ' '), # missing space after Roman numeral
(re.compile(r'(?<=(II|IV|VI|IX|XI|XV|XX)\.)(?=[\w])'), lambda match: ' '), # missing space after Roman numeral
(re.compile(r'(?<=(III|VII|XII|XIV|XVI|XIX)\.)(?=[\w])'), lambda match: ' '), # missing space after Roman numeral
(re.compile(r'(?<=(VIII|XIII|XVII)\.)(?=[\w])'), lambda match: ' '), # missing space after Roman numeral
(re.compile(r'(?<=(XVIII)\.)(?=[\w])'), lambda match: ' '), # missing space after Roman numeral
(re.compile(r'(?<=[A-Za-zÄÖÜäöü]),(?=[A-Za-zÄÖÜäöü])'), lambda match: ', '), # missing space after comma
(re.compile(r'(?<=[a-zäöü])\.(?=[A-ZÄÖÜ][A-Za-zÄÖÜäöü])'), lambda match: '. '), # missing space after full-stop
(re.compile(r'(?<=[uU]\.) (?=a\.)'), lambda match: u'\u2008'), # fix abbreviation that was potentially broken previously
(re.compile(r'(?<=[iI]\.) (?=A\.)'), lambda match: u'\u2008'), # fix abbreviation that was potentially broken previously
(re.compile(r'(?<=[zZ]\.) (?=B\.)'), lambda match: u'\u2008'), # fix abbreviation that was potentially broken previously
(re.compile(r'(?<=\w\.) (?=[A-Z][a-z]*@)'), lambda match: ''), # fix e-mail address that was potentially broken previously
(re.compile(r'(?<=\d)[Pp]rozent'), lambda match: ' Prozent'),
(re.compile(r'\.\.\.\.+'), lambda match: '...'), # too many dots (....)
(re.compile(r'(?<=[^\s])\.\.\.'), lambda match: ' ...'), # spaces before ...
(re.compile(r'\.\.\.(?=[^\s])'), lambda match: '... '), # spaces after ...
(re.compile(r'(?<=[\[\(]) \.\.\. (?=[\]\)])'), lambda match: '...'), # fix special cases of ... in brackets
(re.compile(u'(?<=[\u00BB\u203A]) \.\.\.'), lambda match: '...'), # fix special cases of ... after a quotation mark
(re.compile(u'\.\.\. (?=[\u00AB\u2039,])'), lambda match: '...'), # fix special cases of ... before a quotation mark or comma
# fix missing spaces between numbers and any sort of units, possibly with dot
(re.compile(r'(?<=\d)(?=(Femto|Piko|Nano|Mikro|Milli|Zenti|Dezi|Hekto|Kilo|Mega|Giga|Tera|Peta|Tausend|Trilli|Kubik|Quadrat|Meter|Uhr|Jahr|Schuljahr|Seite))'), lambda match: ' '),
(re.compile(r'(?<=\d\.)(?=(Femto|Piko|Nano|Mikro|Milli|Zenti|Dezi|Hekto|Kilo|Mega|Giga|Tera|Peta|Tausend|Trilli|Kubik|Quadrat|Meter|Uhr|Jahr|Schuljahr|Seite))'), lambda match: ' '),
# fix wrong spaces
(re.compile(r'(?<=<p class="absatz">[A-ZÄÖÜ]) (?=[a-zäöü\-])'), lambda match: ''), # at beginning of paragraphs
(re.compile(u' \u00AB'), lambda match: u'\u00AB '), # before closing quotation
(re.compile(u'\u00BB '), lambda match: u' \u00BB'), # after opening quotation
# filtering for spaces in large numbers for better readability
(re.compile(r'(?<=\d\d)(?=\d\d\d[ ,\.;\)<\?!-])'), lambda match: u'\u2008'), # end of the number with some character following
(re.compile(r'(?<=\d\d)(?=\d\d\d. )'), lambda match: u'\u2008'), # end of the number with full-stop following, then space is necessary (avoid file names)
(re.compile(u'(?<=\d)(?=\d\d\d\u2008)'), lambda match: u'\u2008'), # next level
(re.compile(u'(?<=\d)(?=\d\d\d\u2008)'), lambda match: u'\u2008'), # next level
(re.compile(u'(?<=\d)(?=\d\d\d\u2008)'), lambda match: u'\u2008'), # next level
(re.compile(u'(?<=\d)(?=\d\d\d\u2008)'), lambda match: u'\u2008'), # next level
# filtering for unicode characters that are missing on the Kindle,
# try to replace them with meaningful work-arounds
(re.compile(u'\u2080'), lambda match: '<span style="font-size: 50%;">0</span>'), # subscript-0
(re.compile(u'\u2081'), lambda match: '<span style="font-size: 50%;">1</span>'), # subscript-1
(re.compile(u'\u2082'), lambda match: '<span style="font-size: 50%;">2</span>'), # subscript-2
(re.compile(u'\u2083'), lambda match: '<span style="font-size: 50%;">3</span>'), # subscript-3
(re.compile(u'\u2084'), lambda match: '<span style="font-size: 50%;">4</span>'), # subscript-4
(re.compile(u'\u2085'), lambda match: '<span style="font-size: 50%;">5</span>'), # subscript-5
(re.compile(u'\u2086'), lambda match: '<span style="font-size: 50%;">6</span>'), # subscript-6
(re.compile(u'\u2087'), lambda match: '<span style="font-size: 50%;">7</span>'), # subscript-7
(re.compile(u'\u2088'), lambda match: '<span style="font-size: 50%;">8</span>'), # subscript-8
(re.compile(u'\u2089'), lambda match: '<span style="font-size: 50%;">9</span>'), # subscript-9
(re.compile(u'\u2080'), lambda match: '<span style="font-size: 40%;">0</span>'), # subscript-0
(re.compile(u'\u2081'), lambda match: '<span style="font-size: 40%;">1</span>'), # subscript-1
(re.compile(u'\u2082'), lambda match: '<span style="font-size: 40%;">2</span>'), # subscript-2
(re.compile(u'\u2083'), lambda match: '<span style="font-size: 40%;">3</span>'), # subscript-3
(re.compile(u'\u2084'), lambda match: '<span style="font-size: 40%;">4</span>'), # subscript-4
(re.compile(u'\u2085'), lambda match: '<span style="font-size: 40%;">5</span>'), # subscript-5
(re.compile(u'\u2086'), lambda match: '<span style="font-size: 40%;">6</span>'), # subscript-6
(re.compile(u'\u2087'), lambda match: '<span style="font-size: 40%;">7</span>'), # subscript-7
(re.compile(u'\u2088'), lambda match: '<span style="font-size: 40%;">8</span>'), # subscript-8
(re.compile(u'\u2089'), lambda match: '<span style="font-size: 40%;">9</span>'), # subscript-9
# always change CO2
(re.compile(r'CO2'), lambda match: 'CO<span style="font-size: 40%;">2</span>'), # CO2
# remove *** paragraphs
(re.compile(r'<p class="absatz">\*\*\*</p>'), lambda match: ''),
# better layout for the top line of each article
(re.compile(u'(?<=DIE ZEIT N\u00B0 \d /) (?=\d\d)'), lambda match: ' 20'), # proper year in edition number
(re.compile(u'(?<=DIE ZEIT N\u00B0 \d\d /) (?=\d\d)'), lambda match: ' 20'), # proper year in edition number
(re.compile(u'(?<=>)(?=DIE ZEIT N\u00B0 \d\d / 20\d\d)'), lambda match: u' \u2014 '), # m-dash between category and DIE ZEIT
]
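A hedged illustration of one of the large-number rules above (u'\u2008' is the punctuation space the recipe inserts; the sample sentence is invented):

    import re
    s = re.sub(u'(?<=\d\d)(?=\d\d\d[ ,\.;\)<\?!-])', u'\u2008', u'rund 12500 Euro')
    # s == u'rund 12\u2008500 Euro'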
def build_index(self):
domain = "http://premium.zeit.de"
url = domain + "/abovorteile/cgi-bin/_er_member/p4z.fpl?ER_Do=getUserData&ER_NextTemplate=login_ok"
domain = "https://premium.zeit.de"
url = domain + "/abo/zeit_digital"
browser = self.get_browser()
browser.add_password("http://premium.zeit.de", self.username, self.password)
try:
browser.open(url)
except urllib2.HTTPError:
self.report_progress(0,_("Can't login to download issue"))
raise ValueError('Failed to login, check your username and password')
response = browser.follow_link(text="DIE ZEIT als E-Paper")
response = browser.follow_link(url_regex=re.compile('^http://contentserver.hgv-online.de/nodrm/fulfillment\\?distributor=zeit-online&orderid=zeit_online.*'))
# new login process
response = browser.open(url)
browser.select_form(nr=2)
browser.form['name']=self.username
browser.form['pass']=self.password
browser.submit()
# now find the correct file, we will still use the ePub file
epublink = browser.find_link(text_regex=re.compile('.*Ausgabe als Datei im ePub-Format.*'))
response = browser.follow_link(epublink)
self.report_progress(1,_('next step'))
tmp = PersistentTemporaryFile(suffix='.epub')
self.report_progress(0,_('downloading epub'))
@ -104,9 +170,45 @@ class ZeitEPUBAbo(BasicNewsRecipe):
# getting url of the cover
def get_cover_url(self):
self.log.warning('Downloading cover')
try:
inhalt = self.index_to_soup('http://www.zeit.de/inhalt')
cover_url = inhalt.find('div', attrs={'class':'singlearchive clearfix'}).img['src'].replace('icon_','')
self.log.warning('Trying PDF-based cover')
domain = "https://premium.zeit.de"
url = domain + "/abo/zeit_digital"
browser = self.get_browser()
# new login process
browser.open(url)
browser.select_form(nr=2)
browser.form['name']=self.username
browser.form['pass']=self.password
browser.submit()
# actual cover search
pdflink = browser.find_link(url_regex=re.compile('system/files/epaper/DZ/pdf/DZ_ePaper*'))
cover_url = urlparse(pdflink.base_url)[0]+'://'+urlparse(pdflink.base_url)[1]+''+(urlparse(pdflink.url)[2]).replace('ePaper_','').replace('.pdf','_001.pdf')
self.log.warning('PDF link found:')
self.log.warning(cover_url)
# download the cover (has to be here due to new login process)
with closing(browser.open(cover_url)) as r:
cdata = r.read()
from calibre.ebooks.metadata.pdf import get_metadata
stream = cStringIO.StringIO(cdata)
cdata = None
mi = get_metadata(stream)
if mi.cover_data and mi.cover_data[1]:
cdata = mi.cover_data[1]
cpath = os.path.join(self.output_dir, 'cover.jpg')
save_cover_data_to(cdata, cpath)
cover_url = cpath
except:
cover_url = 'http://images.zeit.de/bilder/titelseiten_zeit/1946/001_001.jpg'
self.log.warning('Trying low-res cover')
try:
inhalt = self.index_to_soup('http://www.zeit.de/inhalt')
cover_url = inhalt.find('div', attrs={'class':'singlearchive clearfix'}).img['src'].replace('icon_','')
except:
self.log.warning('Using static old low-res cover')
cover_url = 'http://images.zeit.de/bilder/titelseiten_zeit/1946/001_001.jpg'
return cover_url
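    # Aside: a sketch of the cover URL rewrite performed above, with a
    # hypothetical PDF file name (illustration only):
    #   base = 'https://premium.zeit.de/abo/zeit_digital'
    #   url  = '/system/files/epaper/DZ/pdf/DZ_ePaper_2011_30.pdf'
    #   urlparse(base)[0] + '://' + urlparse(base)[1] + \
    #       urlparse(url)[2].replace('ePaper_', '').replace('.pdf', '_001.pdf')
    #   # -> 'https://premium.zeit.de/system/files/epaper/DZ/pdf/DZ_2011_30_001.pdf'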

View File

@ -11,6 +11,7 @@
<link rel="stylesheet" type="text/css" href="{prefix}/static/browse/browse.css" />
<link type="text/css" href="{prefix}/static/jquery_ui/css/humanity-custom/jquery-ui-1.8.5.custom.css" rel="stylesheet" />
<link rel="stylesheet" type="text/css" href="{prefix}/static/jquery.multiselect.css" />
<link rel="apple-touch-icon" href="/static/calibre.png" />
<script type="text/javascript" src="{prefix}/static/jquery.js"></script>
<script type="text/javascript" src="{prefix}/static/jquery.corner.js"></script>

View File

@ -11,7 +11,7 @@ defaults.
'''
#: Auto increment series index
# The algorithm used to assign a new book in an existing series a series number.
# The algorithm used to assign a book added to an existing series a series number.
# New series numbers assigned using this tweak are always integer values, except
# if a constant non-integer is specified.
# Possible values are:
@ -27,7 +27,19 @@ defaults.
# series_index_auto_increment = 'next'
# series_index_auto_increment = 'next_free'
# series_index_auto_increment = 16.5
#
# Set the use_series_auto_increment_tweak_when_importing tweak to True to
# use the above values when importing/adding books. If this tweak is set to
# False (the default) then the series number will be set to 1 if it is not
# explicitly set during the import. If set to True, then the
# series index will be set according to the series_index_auto_increment setting.
# Note that the use_series_auto_increment_tweak_when_importing tweak is used
# only when a value is not provided during import. If the importing regular
# expression produces a value for series_index, or if you are reading metadata
# from books and the import plugin produces a value, then that value will
# be used irrespective of the setting of the tweak.
series_index_auto_increment = 'next'
use_series_auto_increment_tweak_when_importing = False
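# Example (hypothetical library): if a series' highest index is 3 and
# series_index_auto_increment is 'next', a newly imported book in that
# series gets series_index 4 when this tweak is True, and 1 when it is
# False (unless the import itself supplied an index).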
#: Add separator after completing an author name
# Should the completion separator be appended
@ -366,3 +378,10 @@ server_listen_on = '0.0.0.0'
# on at your own risk!
unified_title_toolbar_on_osx = False
#: Save original file when converting from same format to same format
# When calibre does a conversion from the same format to the same format, for
# example, from EPUB to EPUB, the original file is saved, so that in case the
# conversion is poor, you can tweak the settings and run it again. By setting
# this to False you can prevent calibre from saving the original file.
save_original_format = True

Binary image file changed: 5.7 KiB before, 4.3 KiB after.

View File

@ -379,7 +379,8 @@
<!-- image -->
<xsl:template match="fb:image">
<div align="center">
<img border="1">
<xsl:element name="img">
<xsl:attribute name="border">1</xsl:attribute>
<xsl:choose>
<xsl:when test="starts-with(@xlink:href,'#')">
<xsl:attribute name="src"><xsl:value-of select="substring-after(@xlink:href,'#')"/></xsl:attribute>
@ -388,7 +389,10 @@
<xsl:attribute name="src"><xsl:value-of select="@xlink:href"/></xsl:attribute>
</xsl:otherwise>
</xsl:choose>
</img>
<xsl:if test="@title">
<xsl:attribute name="title"><xsl:value-of select="@title"/></xsl:attribute>
</xsl:if>
</xsl:element>
</div>
</xsl:template>
</xsl:stylesheet>

View File

@ -1,5 +1,5 @@
" Project wide builtins
let g:pyflakes_builtins += ["dynamic_property", "__", "P", "I", "lopen", "icu_lower", "icu_upper", "icu_title", "ngettext"]
let g:pyflakes_builtins = ["_", "dynamic_property", "__", "P", "I", "lopen", "icu_lower", "icu_upper", "icu_title", "ngettext"]
python << EOFPY
import os
@ -15,7 +15,7 @@ vipy.session.initialize(project_name='calibre', src_dir=src_dir,
project_dir=project_dir, base_dir=base_dir)
def recipe_title_callback(raw):
return eval(raw.decode('utf-8'))
return eval(raw.decode('utf-8')).replace(' ', '_')
vipy.session.add_content_browser('.r', ',r', 'Recipe',
vipy.session.glob_based_iterator(os.path.join(project_dir, 'recipes', '*.recipe')),

View File

@ -25,18 +25,11 @@ class Message:
return '%s:%s: %s'%(self.filename, self.lineno, self.msg)
def check_for_python_errors(code_string, filename):
# Since compiler.parse does not reliably report syntax errors, use the
# built in compiler first to detect those.
import _ast
# First, compile into an AST and handle syntax errors.
try:
try:
compile(code_string, filename, "exec")
except MemoryError:
# Python 2.4 will raise MemoryError if the source can't be
# decoded.
if sys.version_info[:2] == (2, 4):
raise SyntaxError(None)
raise
except (SyntaxError, IndentationError), value:
tree = compile(code_string, filename, "exec", _ast.PyCF_ONLY_AST)
except (SyntaxError, IndentationError) as value:
msg = value.args[0]
(lineno, offset, text) = value.lineno, value.offset, value.text
@ -47,13 +40,11 @@ def check_for_python_errors(code_string, filename):
# bogus message that claims the encoding the file declared was
# unknown.
msg = "%s: problem decoding source" % filename
return [Message(filename, lineno, msg)]
else:
# Okay, it's syntactically valid. Now parse it into an ast and check
# it.
import compiler
checker = __import__('pyflakes.checker').checker
tree = compiler.parse(code_string)
# Okay, it's syntactically valid. Now check it.
w = checker.Checker(tree, filename)
w.messages.sort(lambda a, b: cmp(a.lineno, b.lineno))
return [Message(x.filename, x.lineno, x.message%x.message_args) for x in

View File

@ -8,11 +8,18 @@ __docformat__ = 'restructuredtext en'
import os, tempfile, shutil, subprocess, glob, re, time, textwrap
from distutils import sysconfig
from functools import partial
from setup import Command, __appname__, __version__
from setup.build_environment import pyqt
class POT(Command):
def qt_sources():
qtdir = glob.glob('/usr/src/qt-*')[-1]
j = partial(os.path.join, qtdir)
return list(map(j, [
'src/gui/widgets/qdialogbuttonbox.cpp',
]))
class POT(Command): # {{{
description = 'Update the .pot translation template'
PATH = os.path.join(Command.SRC, __appname__, 'translations')
@ -82,6 +89,8 @@ class POT(Command):
time=time.strftime('%Y-%m-%d %H:%M+%Z'))
files = self.source_files()
qt_inputs = qt_sources()
with tempfile.NamedTemporaryFile() as fl:
fl.write('\n'.join(files))
fl.flush()
@ -91,8 +100,14 @@ class POT(Command):
subprocess.check_call(['xgettext', '-f', fl.name,
'--default-domain=calibre', '-o', out.name, '-L', 'Python',
'--from-code=UTF-8', '--sort-by-file', '--omit-header',
'--no-wrap', '-k__',
'--no-wrap', '-k__', '--add-comments=NOTE:',
])
subprocess.check_call(['xgettext', '-j',
'--default-domain=calibre', '-o', out.name,
'--from-code=UTF-8', '--sort-by-file', '--omit-header',
'--no-wrap', '-kQT_TRANSLATE_NOOP:2',
] + qt_inputs)
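            # Note: -j makes this second xgettext run merge the Qt strings
            # into the existing template instead of overwriting it.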
with open(out.name, 'rb') as f:
src = f.read()
os.remove(out.name)
@ -102,10 +117,12 @@ class POT(Command):
with open(pot, 'wb') as f:
f.write(src)
self.info('Translations template:', os.path.abspath(pot))
return pot
class Translations(POT):
return pot
# }}}
class Translations(POT): # {{{
description='''Compile the translations'''
DEST = os.path.join(os.path.dirname(POT.SRC), 'resources', 'localization',
'locales')
@ -117,7 +134,6 @@ class Translations(POT):
locale = os.path.splitext(os.path.basename(po_file))[0]
return locale, os.path.join(self.DEST, locale, 'messages.mo')
def run(self, opts):
for f in self.po_files():
locale, dest = self.mo_file(f)
@ -126,7 +142,7 @@ class Translations(POT):
os.makedirs(base)
self.info('\tCompiling translations for', locale)
subprocess.check_call(['msgfmt', '-o', dest, f])
if locale in ('en_GB', 'nds', 'te', 'yi'):
if locale in ('en_GB', 'en_CA', 'en_AU', 'si', 'ur', 'sc', 'ltg', 'nds', 'te', 'yi'):
continue
pycountry = self.j(sysconfig.get_python_lib(), 'pycountry',
'locales', locale, 'LC_MESSAGES')
@ -140,17 +156,6 @@ class Translations(POT):
self.warn('No ISO 639 translations for locale:', locale,
'\nDo you have pycountry installed?')
base = os.path.join(pyqt.qt_data_dir, 'translations')
qt_translations = glob.glob(os.path.join(base, 'qt_*.qm'))
if not qt_translations:
raise Exception('Could not find qt translations')
for f in qt_translations:
locale = self.s(self.b(f))[0][3:]
dest = self.j(self.DEST, locale, 'LC_MESSAGES', 'qt.qm')
if self.e(self.d(dest)) and self.newer(dest, f):
self.info('\tCopying Qt translation for locale:', locale)
shutil.copy2(f, dest)
self.write_stats()
self.freeze_locales()
@ -201,7 +206,7 @@ class Translations(POT):
for x in (i, j, d):
if os.path.exists(x):
os.remove(x)
# }}}
class GetTranslations(Translations):

View File

@ -341,7 +341,7 @@ def random_user_agent():
def browser(honor_time=True, max_time=2, mobile_browser=False, user_agent=None):
'''
Create a mechanize browser for web scraping. The browser handles cookies,
refresh requests and ignores robots.txt. Also uses proxy if avaialable.
refresh requests and ignores robots.txt. Also uses proxy if available.
:param honor_time: If True honors pause time in refresh requests
:param max_time: Maximum time in seconds to wait during a refresh request
@ -474,7 +474,7 @@ def strftime(fmt, t=None):
def my_unichr(num):
try:
return unichr(num)
except ValueError:
except (ValueError, OverflowError):
return u'?'
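# Note: on CPython 2, unichr() raises OverflowError rather than ValueError
# when its argument is too large to fit in a C long (e.g. an absurdly large
# numeric character entity), hence the widened except clause above.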
def entity_to_unicode(match, exceptions=[], encoding='cp1252',

View File

@ -4,7 +4,7 @@ __license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal kovid@kovidgoyal.net'
__docformat__ = 'restructuredtext en'
__appname__ = u'calibre'
numeric_version = (0, 8, 9)
numeric_version = (0, 8, 11)
__version__ = u'.'.join(map(unicode, numeric_version))
__author__ = u"Kovid Goyal <kovid@kovidgoyal.net>"

View File

@ -570,7 +570,7 @@ from calibre.devices.teclast.driver import (TECLAST_K3, NEWSMY, IPAPYRUS,
from calibre.devices.sne.driver import SNE
from calibre.devices.misc import (PALMPRE, AVANT, SWEEX, PDNOVEL,
GEMEI, VELOCITYMICRO, PDNOVEL_KOBO, LUMIREAD, ALURATEK_COLOR,
TREKSTOR, EEEREADER, NEXTBOOK, ADAM, MOOVYBOOK)
TREKSTOR, EEEREADER, NEXTBOOK, ADAM, MOOVYBOOK, COBY)
from calibre.devices.folder_device.driver import FOLDER_DEVICE_FOR_CONFIG
from calibre.devices.kobo.driver import KOBO
from calibre.devices.bambook.driver import BAMBOOK
@ -705,7 +705,7 @@ plugins += [
EEEREADER,
NEXTBOOK,
ADAM,
MOOVYBOOK,
MOOVYBOOK, COBY,
ITUNES,
BOEYE_BEX,
BOEYE_BDX,
@ -1191,6 +1191,16 @@ class StoreBookotekaStore(StoreBase):
headquarters = 'PL'
formats = ['EPUB', 'PDF']
class StoreChitankaStore(StoreBase):
name = u'Моята библиотека'
author = 'Alex Stanev'
description = u'Независим сайт за DRM свободна литература на български език'
actual_plugin = 'calibre.gui2.store.stores.chitanka_plugin:ChitankaStore'
drm_free_only = True
headquarters = 'BG'
formats = ['FB2', 'EPUB', 'TXT', 'SFB']
class StoreDieselEbooksStore(StoreBase):
name = 'Diesel eBooks'
description = u'Instant access to over 2.4 million titles from hundreds of publishers including Harlequin, HarperCollins, John Wiley & Sons, McGraw-Hill, Simon & Schuster and Random House.'
@ -1218,17 +1228,6 @@ class StoreEbookscomStore(StoreBase):
formats = ['EPUB', 'LIT', 'MOBI', 'PDF']
affiliate = True
class StoreEPubBuyDEStore(StoreBase):
name = 'EPUBBuy DE'
author = 'Charles Haley'
description = u'Bei EPUBBuy.com finden Sie ausschliesslich eBooks im weitverbreiteten EPUB-Format und ohne DRM. So haben Sie die freie Wahl, wo Sie Ihr eBook lesen: Tablet, eBook-Reader, Smartphone oder einfach auf Ihrem PC. So macht eBook-Lesen Spaß!'
actual_plugin = 'calibre.gui2.store.stores.epubbuy_de_plugin:EPubBuyDEStore'
drm_free_only = True
headquarters = 'DE'
formats = ['EPUB']
affiliate = True
class StoreEBookShoppeUKStore(StoreBase):
name = 'ebookShoppe UK'
author = u'Charles Haley'
@ -1248,14 +1247,15 @@ class StoreEHarlequinStore(StoreBase):
formats = ['EPUB', 'PDF']
affiliate = True
class StoreEpubBudStore(StoreBase):
name = 'ePub Bud'
description = 'Well, it\'s pretty much just "YouTube for Children\'s eBooks". A not-for-profit organization devoted to bringing self-published children\'s books to the world.'
actual_plugin = 'calibre.gui2.store.stores.epubbud_plugin:EpubBudStore'
class StoreEKnigiStore(StoreBase):
name = u'еКниги'
author = 'Alex Stanev'
description = u'Онлайн книжарница за електронни книги и аудио риалити романи'
actual_plugin = 'calibre.gui2.store.stores.eknigi_plugin:eKnigiStore'
drm_free_only = True
headquarters = 'US'
formats = ['EPUB']
headquarters = 'BG'
formats = ['EPUB', 'PDF', 'HTML']
affiliate = True
class StoreFeedbooksStore(StoreBase):
name = 'Feedbooks'
@ -1291,6 +1291,7 @@ class StoreGoogleBooksStore(StoreBase):
headquarters = 'US'
formats = ['EPUB', 'PDF', 'TXT']
affiliate = True
class StoreGutenbergStore(StoreBase):
name = 'Project Gutenberg'
@ -1374,6 +1375,17 @@ class StoreOReillyStore(StoreBase):
headquarters = 'US'
formats = ['APK', 'DAISY', 'EPUB', 'MOBI', 'PDF']
class StoreOzonRUStore(StoreBase):
name = 'OZON.ru'
description = u'ebooks from OZON.ru'
actual_plugin = 'calibre.gui2.store.stores.ozon_ru_plugin:OzonRUStore'
author = 'Roman Mukhin'
drm_free_only = True
headquarters = 'RU'
formats = ['TXT', 'PDF', 'DJVU', 'RTF', 'DOC', 'JAR', 'FB2']
affiliate = True
class StorePragmaticBookshelfStore(StoreBase):
name = 'Pragmatic Bookshelf'
description = u'The Pragmatic Bookshelf\'s collection of programming and tech books available as ebooks.'
@ -1466,13 +1478,13 @@ plugins += [
StoreBeamEBooksDEStore,
StoreBeWriteStore,
StoreBookotekaStore,
StoreChitankaStore,
StoreDieselEbooksStore,
StoreEbookNLStore,
StoreEbookscomStore,
StoreEBookShoppeUKStore,
StoreEPubBuyDEStore,
StoreEHarlequinStore,
StoreEpubBudStore,
StoreEKnigiStore,
StoreFeedbooksStore,
StoreFoylesUKStore,
StoreGandalfStore,
@ -1486,6 +1498,7 @@ plugins += [
StoreNextoStore,
StoreOpenBooksStore,
StoreOReillyStore,
StoreOzonRUStore,
StorePragmaticBookshelfStore,
StoreSmashwordsStore,
StoreVirtualoStore,

View File

@ -8,7 +8,7 @@ __copyright__ = '2011, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
# Imports {{{
import os, shutil, uuid, json
import os, shutil, uuid, json, glob, time, tempfile
from functools import partial
import apsw
@ -25,7 +25,7 @@ from calibre.utils.config import to_json, from_json, prefs, tweaks
from calibre.utils.date import utcfromtimestamp, parse_date
from calibre.utils.filenames import is_case_sensitive
from calibre.db.tables import (OneToOneTable, ManyToOneTable, ManyToManyTable,
SizeTable, FormatsTable, AuthorsTable, IdentifiersTable)
SizeTable, FormatsTable, AuthorsTable, IdentifiersTable, CompositeTable)
# }}}
'''
@ -37,6 +37,8 @@ Differences in semantics from pysqlite:
'''
SPOOL_SIZE = 30*1024*1024
class DynamicFilter(object): # {{{
'No longer used, present for legacy compatibility'
@ -478,7 +480,6 @@ class DB(object):
remove.append(data)
continue
self.custom_column_label_map[data['label']] = data['num']
self.custom_column_num_map[data['num']] = \
self.custom_column_label_map[data['label']] = data
@ -613,10 +614,31 @@ class DB(object):
tables['size'] = SizeTable('size', self.field_metadata['size'].copy())
for label, data in self.custom_column_label_map.iteritems():
label = '#' + label
self.FIELD_MAP = {'id':0, 'title':1, 'authors':2, 'timestamp':3,
'size':4, 'rating':5, 'tags':6, 'comments':7, 'series':8,
'publisher':9, 'series_index':10, 'sort':11, 'author_sort':12,
'formats':13, 'path':14, 'pubdate':15, 'uuid':16, 'cover':17,
'au_map':18, 'last_modified':19, 'identifiers':20}
for k,v in self.FIELD_MAP.iteritems():
self.field_metadata.set_field_record_index(k, v, prefer_custom=False)
base = max(self.FIELD_MAP.itervalues())
for label_, data in self.custom_column_label_map.iteritems():
label = self.field_metadata.custom_field_prefix + label_
metadata = self.field_metadata[label].copy()
link_table = self.custom_table_names(data['num'])[1]
self.FIELD_MAP[data['num']] = base = base+1
self.field_metadata.set_field_record_index(label_, base,
prefer_custom=True)
if data['datatype'] == 'series':
# account for the series index column. Field_metadata knows that
# the series index is one larger than the series. If you change
# it here, be sure to change it there as well.
self.FIELD_MAP[str(data['num'])+'_index'] = base = base+1
self.field_metadata.set_field_record_index(label_+'_index', base,
prefer_custom=True)
if data['normalized']:
if metadata['is_multiple']:
@ -633,7 +655,16 @@ class DB(object):
metadata['table'] = link_table
tables[label] = OneToOneTable(label, metadata)
else:
tables[label] = OneToOneTable(label, metadata)
if data['datatype'] == 'composite':
tables[label] = CompositeTable(label, metadata)
else:
tables[label] = OneToOneTable(label, metadata)
self.FIELD_MAP['ondevice'] = base = base+1
self.field_metadata.set_field_record_index('ondevice', base, prefer_custom=False)
self.FIELD_MAP['marked'] = base = base+1
self.field_metadata.set_field_record_index('marked', base, prefer_custom=False)
# }}}
@property
@ -732,5 +763,57 @@ class DB(object):
pprint.pprint(table.metadata)
raise
def format_abspath(self, book_id, fmt, fname, path):
path = os.path.join(self.library_path, path)
fmt = ('.' + fmt.lower()) if fmt else ''
fmt_path = os.path.join(path, fname+fmt)
if os.path.exists(fmt_path):
return fmt_path
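        # The expected file name is missing: fall back to any file with the
        # right extension and self-repair by copying it to the expected name.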
try:
candidates = glob.glob(os.path.join(path, '*'+fmt))
except: # If path contains strange characters this throws an exc
candidates = []
if fmt and candidates and os.path.exists(candidates[0]):
shutil.copyfile(candidates[0], fmt_path)
return fmt_path
def format_metadata(self, book_id, fmt, fname, path):
path = self.format_abspath(book_id, fmt, fname, path)
ans = {}
if path is not None:
stat = os.stat(path)
ans['size'] = stat.st_size
ans['mtime'] = utcfromtimestamp(stat.st_mtime)
return ans
def cover(self, path, as_file=False, as_image=False,
as_path=False):
path = os.path.join(self.library_path, path, 'cover.jpg')
ret = None
if os.access(path, os.R_OK):
try:
f = lopen(path, 'rb')
except (IOError, OSError):
time.sleep(0.2)
f = lopen(path, 'rb')
with f:
if as_path:
pt = PersistentTemporaryFile('_dbcover.jpg')
with pt:
shutil.copyfileobj(f, pt)
return pt.name
if as_file:
ret = tempfile.SpooledTemporaryFile(SPOOL_SIZE)
shutil.copyfileobj(f, ret)
ret.seek(0)
else:
ret = f.read()
if as_image:
from PyQt4.Qt import QImage
i = QImage()
i.loadFromData(ret)
ret = i
return ret
# }}}

View File

@ -7,5 +7,380 @@ __license__ = 'GPL v3'
__copyright__ = '2011, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
import os
from collections import defaultdict
from functools import wraps, partial
from calibre.db.locking import create_locks, RecordLock
from calibre.db.fields import create_field
from calibre.ebooks.metadata.book.base import Metadata
from calibre.utils.date import now
def api(f):
f.is_cache_api = True
return f
def read_api(f):
f = api(f)
f.is_read_api = True
return f
def write_api(f):
f = api(f)
f.is_read_api = False
return f
def wrap_simple(lock, func):
@wraps(func)
def ans(*args, **kwargs):
with lock:
return func(*args, **kwargs)
return ans
class Cache(object):
def __init__(self, backend):
self.backend = backend
self.fields = {}
self.composites = set()
self.read_lock, self.write_lock = create_locks()
self.record_lock = RecordLock(self.read_lock)
self.format_metadata_cache = defaultdict(dict)
# Implement locking for all simple read/write API methods
# An unlocked version of the method is stored with the name starting
# with a leading underscore. Use the unlocked versions when the lock
# has already been acquired.
for name in dir(self):
func = getattr(self, name)
ira = getattr(func, 'is_read_api', None)
if ira is not None:
# Save original function
setattr(self, '_'+name, func)
# Wrap it in a lock
lock = self.read_lock if ira else self.write_lock
setattr(self, name, wrap_simple(lock, func))
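        # e.g. self.field_for acquires the read lock around field_for(),
        # while self._field_for is the raw, unlocked version for internal
        # use when the lock is already held.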
@property
def field_metadata(self):
return self.backend.field_metadata
def _format_abspath(self, book_id, fmt):
'''
Return absolute path to the ebook file of format `format`
WARNING: This method will return a dummy path for a network backend DB,
so do not rely on it, use format(..., as_path=True) instead.
Currently used only in calibredb list, the viewer and the catalogs (via
get_data_as_dict()).
Apart from the viewer, I don't believe any of the others do any file
I/O with the results of this call.
'''
try:
name = self.fields['formats'].format_fname(book_id, fmt)
path = self._field_for('path', book_id).replace('/', os.sep)
except:
return None
if name and path:
return self.backend.format_abspath(book_id, fmt, name, path)
def _get_metadata(self, book_id, get_user_categories=True): # {{{
mi = Metadata(None)
author_ids = self._field_ids_for('authors', book_id)
aut_list = [self._author_data(i) for i in author_ids]
aum = []
aus = {}
aul = {}
for rec in aut_list:
aut = rec['name']
aum.append(aut)
aus[aut] = rec['sort']
aul[aut] = rec['link']
mi.title = self._field_for('title', book_id,
default_value=_('Unknown'))
mi.authors = aum
mi.author_sort = self._field_for('author_sort', book_id,
default_value=_('Unknown'))
mi.author_sort_map = aus
mi.author_link_map = aul
mi.comments = self._field_for('comments', book_id)
mi.publisher = self._field_for('publisher', book_id)
n = now()
mi.timestamp = self._field_for('timestamp', book_id, default_value=n)
mi.pubdate = self._field_for('pubdate', book_id, default_value=n)
mi.uuid = self._field_for('uuid', book_id,
default_value='dummy')
mi.title_sort = self._field_for('sort', book_id,
default_value=_('Unknown'))
mi.book_size = self._field_for('size', book_id, default_value=0)
mi.ondevice_col = self._field_for('ondevice', book_id, default_value='')
mi.last_modified = self._field_for('last_modified', book_id,
default_value=n)
formats = self._field_for('formats', book_id)
mi.format_metadata = {}
if not formats:
formats = None
else:
for f in formats:
mi.format_metadata[f] = self._format_metadata(book_id, f)
formats = ','.join(formats)
mi.formats = formats
mi.has_cover = _('Yes') if self._field_for('cover', book_id,
default_value=False) else ''
mi.tags = list(self._field_for('tags', book_id, default_value=()))
mi.series = self._field_for('series', book_id)
if mi.series:
mi.series_index = self._field_for('series_index', book_id,
default_value=1.0)
mi.rating = self._field_for('rating', book_id)
mi.set_identifiers(self._field_for('identifiers', book_id,
default_value={}))
mi.application_id = book_id
mi.id = book_id
        composites = []
for key, meta in self.field_metadata.custom_iteritems():
mi.set_user_metadata(key, meta)
if meta['datatype'] == 'composite':
composites.append(key)
else:
mi.set(key, val=self._field_for(meta['label'], book_id),
extra=self._field_for(meta['label']+'_index', book_id))
for c in composites:
            mi.set(c, val=self._composite_for(c, book_id, mi))
user_cat_vals = {}
if get_user_categories:
            user_cats = self.backend.prefs['user_categories']
for ucat in user_cats:
res = []
for name,cat,ign in user_cats[ucat]:
v = mi.get(cat, None)
if isinstance(v, list):
if name in v:
res.append([name,cat])
elif name == v:
res.append([name,cat])
user_cat_vals[ucat] = res
mi.user_categories = user_cat_vals
return mi
# }}}
# Cache Layer API {{{
@api
def init(self):
'''
Initialize this cache with data from the backend.
'''
with self.write_lock:
self.backend.read_tables()
for field, table in self.backend.tables.iteritems():
self.fields[field] = create_field(field, table)
if table.metadata['datatype'] == 'composite':
self.composites.add(field)
self.fields['ondevice'] = create_field('ondevice', None)
@read_api
def field_for(self, name, book_id, default_value=None):
'''
Return the value of the field ``name`` for the book identified by
``book_id``. If no such book exists or it has no defined value for the
field ``name`` or no such field exists, then ``default_value`` is returned.
The returned value for is_multiple fields is always a tuple.
'''
if self.composites and name in self.composites:
return self.composite_for(name, book_id,
default_value=default_value)
try:
return self.fields[name].for_book(book_id, default_value=default_value)
except (KeyError, IndexError):
return default_value
@read_api
def composite_for(self, name, book_id, mi=None, default_value=''):
try:
f = self.fields[name]
except KeyError:
return default_value
if mi is None:
return f.get_value_with_cache(book_id, partial(self._get_metadata,
get_user_categories=False))
else:
return f.render_composite(book_id, mi)
@read_api
def field_ids_for(self, name, book_id):
'''
Return the ids (as a tuple) for the values that the field ``name`` has on the book
identified by ``book_id``. If there are no values, or no such book, or
no such field, an empty tuple is returned.
'''
try:
return self.fields[name].ids_for_book(book_id)
except (KeyError, IndexError):
return ()
@read_api
def books_for_field(self, name, item_id):
'''
Return all the books associated with the item identified by
``item_id``, where the item belongs to the field ``name``.
Returned value is a tuple of book ids, or the empty tuple if the item
or the field does not exist.
'''
try:
return self.fields[name].books_for(item_id)
except (KeyError, IndexError):
return ()
@read_api
def all_book_ids(self):
'''
Frozen set of all known book ids.
'''
return frozenset(self.fields['uuid'].iter_book_ids())
@read_api
def all_field_ids(self, name):
'''
Frozen set of ids for all values in the field ``name``.
'''
return frozenset(iter(self.fields[name]))
@read_api
def author_data(self, author_id):
'''
Return author data as a dictionary with keys: name, sort, link
If no author with the specified id is found an empty dictionary is
returned.
'''
try:
return self.fields['authors'].author_data(author_id)
except (KeyError, IndexError):
return {}
@read_api
def format_metadata(self, book_id, fmt, allow_cache=True):
if not fmt:
return {}
fmt = fmt.upper()
if allow_cache:
x = self.format_metadata_cache[book_id].get(fmt, None)
if x is not None:
return x
try:
name = self.fields['formats'].format_fname(book_id, fmt)
path = self._field_for('path', book_id).replace('/', os.sep)
except:
return {}
ans = {}
if path and name:
ans = self.backend.format_metadata(book_id, fmt, name, path)
self.format_metadata_cache[book_id][fmt] = ans
return ans
@api
def get_metadata(self, book_id,
get_cover=False, get_user_categories=True, cover_as_data=False):
'''
Return metadata for the book identified by book_id as a :class:`Metadata` object.
Note that the list of formats is not verified. If get_cover is True,
the cover is returned, either a path to temp file as mi.cover or if
cover_as_data is True then as mi.cover_data.
'''
with self.read_lock:
mi = self._get_metadata(book_id, get_user_categories=get_user_categories)
if get_cover:
if cover_as_data:
cdata = self.cover(book_id)
if cdata:
mi.cover_data = ('jpeg', cdata)
else:
mi.cover = self.cover(book_id, as_path=True)
return mi
@api
def cover(self, book_id,
as_file=False, as_image=False, as_path=False):
'''
Return the cover image or None. By default, returns the cover as a
bytestring.
WARNING: Using as_path will copy the cover to a temp file and return
the path to the temp file. You should delete the temp file when you are
done with it.
:param as_file: If True return the image as an open file object (a SpooledTemporaryFile)
:param as_image: If True return the image as a QImage object
:param as_path: If True return the image as a path pointing to a
temporary file
'''
with self.read_lock:
try:
path = self._field_for('path', book_id).replace('/', os.sep)
except:
return None
with self.record_lock.lock(book_id):
return self.backend.cover(path, as_file=as_file, as_image=as_image,
as_path=as_path)
@read_api
def multisort(self, fields):
all_book_ids = frozenset(self._all_book_ids())
get_metadata = partial(self._get_metadata, get_user_categories=False)
sort_keys = tuple(self.fields[field[0]].sort_keys_for_books(get_metadata,
all_book_ids) for field in fields)
if len(sort_keys) == 1:
sk = sort_keys[0]
            return sorted(all_book_ids, key=lambda i:sk[i],
                    reverse=not fields[0][1])
else:
return sorted(all_book_ids, key=partial(SortKey, fields, sort_keys))
# }}}
class SortKey(object):
def __init__(self, fields, sort_keys, book_id):
self.orders = tuple(1 if f[1] else -1 for f in fields)
self.sort_key = tuple(sk[book_id] for sk in sort_keys)
def __cmp__(self, other):
for i, order in enumerate(self.orders):
ans = cmp(self.sort_key[i], other.sort_key[i])
if ans != 0:
return ans * order
return 0
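# A small usage sketch of SortKey (hypothetical precomputed keys;
# illustration only):
#   sort_keys = ({1: 'a', 2: 'b', 3: 'a'},    # e.g. title keys
#                {1: 2,   2: 5,   3: 4})      # e.g. rating keys
#   fields = [('title', True), ('rating', False)]   # title asc, rating desc
#   sorted([1, 2, 3], key=partial(SortKey, fields, sort_keys))
#   # -> [3, 1, 2]: 1 and 3 tie on title, so the higher rating wins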
# Testing {{{
def test(library_path):
from calibre.db.backend import DB
backend = DB(library_path)
cache = Cache(backend)
cache.init()
print ('All book ids:', cache.all_book_ids())
if __name__ == '__main__':
from calibre.utils.config import prefs
test(prefs['library_path'])
# }}}

src/calibre/db/fields.py (new file, 257 lines)
View File

@ -0,0 +1,257 @@
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import (unicode_literals, division, absolute_import,
print_function)
from future_builtins import map
__license__ = 'GPL v3'
__copyright__ = '2011, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
from threading import Lock
from calibre.db.tables import ONE_ONE, MANY_ONE, MANY_MANY
from calibre.utils.icu import sort_key
class Field(object):
def __init__(self, name, table):
self.name, self.table = name, table
self.has_text_data = self.metadata['datatype'] in ('text', 'comments',
'series', 'enumeration')
self.table_type = self.table.table_type
dt = self.metadata['datatype']
self._sort_key = (sort_key if dt == 'text' else lambda x: x)
@property
def metadata(self):
return self.table.metadata
def for_book(self, book_id, default_value=None):
'''
Return the value of this field for the book identified by book_id.
When no value is found, returns ``default_value``.
'''
raise NotImplementedError()
def ids_for_book(self, book_id):
'''
Return a tuple of item ids for items associated with the book
identified by book_id. Returns an empty tuple if no such items are
found.
'''
raise NotImplementedError()
def books_for(self, item_id):
'''
Return the ids of all books associated with the item identified by
item_id as a tuple. An empty tuple is returned if no books are found.
'''
raise NotImplementedError()
def __iter__(self):
'''
Iterate over the ids for all values in this field
'''
raise NotImplementedError()
def sort_keys_for_books(self, get_metadata, all_book_ids):
'''
Return a mapping of book_id -> sort_key. The sort key is suitable for
use in sorting the list of all books by this field, via the python cmp
method.
'''
raise NotImplementedError()
class OneToOneField(Field):
def for_book(self, book_id, default_value=None):
return self.table.book_col_map.get(book_id, default_value)
def ids_for_book(self, book_id):
return (book_id,)
def books_for(self, item_id):
return (item_id,)
def __iter__(self):
return self.table.book_col_map.iterkeys()
def iter_book_ids(self):
return self.table.book_col_map.iterkeys()
def sort_keys_for_books(self, get_metadata, all_book_ids):
        return {id_ : self._sort_key(self.table.book_col_map.get(id_, '')) for id_ in
                all_book_ids}
class CompositeField(OneToOneField):
def __init__(self, *args, **kwargs):
OneToOneField.__init__(self, *args, **kwargs)
self._render_cache = {}
self._lock = Lock()
def render_composite(self, book_id, mi):
with self._lock:
ans = self._render_cache.get(book_id, None)
if ans is None:
ans = mi.get(self.metadata['label'])
with self._lock:
self._render_cache[book_id] = ans
return ans
def clear_cache(self):
with self._lock:
self._render_cache = {}
def pop_cache(self, book_id):
with self._lock:
self._render_cache.pop(book_id, None)
def get_value_with_cache(self, book_id, get_metadata):
with self._lock:
ans = self._render_cache.get(book_id, None)
if ans is None:
mi = get_metadata(book_id)
ans = mi.get(self.metadata['label'])
return ans
def sort_keys_for_books(self, get_metadata, all_book_ids):
return {id_ : sort_key(self.get_value_with_cache(id_, get_metadata)) for id_ in
all_book_ids}
class OnDeviceField(OneToOneField):
def __init__(self, name, table):
self.name = name
self.book_on_device_func = None
def book_on_device(self, book_id):
if callable(self.book_on_device_func):
return self.book_on_device_func(book_id)
return None
def set_book_on_device_func(self, func):
self.book_on_device_func = func
def for_book(self, book_id, default_value=None):
loc = []
count = 0
on = self.book_on_device(book_id)
if on is not None:
m, a, b, count = on[:4]
if m is not None:
loc.append(_('Main'))
if a is not None:
loc.append(_('Card A'))
if b is not None:
loc.append(_('Card B'))
return ', '.join(loc) + ((' (%s books)'%count) if count > 1 else '')
def __iter__(self):
return iter(())
def iter_book_ids(self):
return iter(())
def sort_keys_for_books(self, get_metadata, all_book_ids):
return {id_ : self.for_book(id_) for id_ in
all_book_ids}
class ManyToOneField(Field):
def for_book(self, book_id, default_value=None):
ids = self.table.book_col_map.get(book_id, None)
if ids is not None:
            ans = self.table.id_map[ids]
else:
ans = default_value
return ans
def ids_for_book(self, book_id):
id_ = self.table.book_col_map.get(book_id, None)
if id_ is None:
return ()
return (id_,)
def books_for(self, item_id):
return self.table.col_book_map.get(item_id, ())
def __iter__(self):
return self.table.id_map.iterkeys()
def sort_keys_for_books(self, get_metadata, all_book_ids):
        keys = {item_id : self._sort_key(item_val) for item_id, item_val in
                self.table.id_map.iteritems()}
        return {id_ : keys.get(
            self.table.book_col_map.get(id_, None), '') for id_ in all_book_ids}
class ManyToManyField(Field):
def __init__(self, *args, **kwargs):
Field.__init__(self, *args, **kwargs)
self.alphabetical_sort = self.name != 'authors'
def for_book(self, book_id, default_value=None):
ids = self.table.book_col_map.get(book_id, ())
if ids:
            ans = tuple(self.table.id_map[i] for i in ids)
else:
ans = default_value
return ans
def ids_for_book(self, book_id):
return self.table.book_col_map.get(book_id, ())
def books_for(self, item_id):
return self.table.col_book_map.get(item_id, ())
def __iter__(self):
return self.table.id_map.iterkeys()
def sort_keys_for_books(self, get_metadata, all_book_ids):
        keys = {item_id : self._sort_key(item_val) for item_id, item_val in
                self.table.id_map.iteritems()}
def sort_key_for_book(book_id):
item_ids = self.table.book_col_map.get(book_id, ())
if self.alphabetical_sort:
item_ids = sorted(item_ids, key=keys.get)
return tuple(map(keys.get, item_ids))
return {id_ : sort_key_for_book(id_) for id_ in all_book_ids}
class AuthorsField(ManyToManyField):
def author_data(self, author_id):
return {
'name' : self.table.id_map[author_id],
'sort' : self.table.asort_map[author_id],
'link' : self.table.alink_map[author_id],
}
class FormatsField(ManyToManyField):
def format_fname(self, book_id, fmt):
return self.table.fname_map[book_id][fmt.upper()]
def create_field(name, table):
    # Check the special-cased names first: the ondevice field has no backing
    # table, so table.table_type must not be touched for it.
    if name == 'authors':
        cls = AuthorsField
    elif name == 'ondevice':
        cls = OnDeviceField
    elif name == 'formats':
        cls = FormatsField
    elif table.metadata['datatype'] == 'composite':
        cls = CompositeField
    else:
        cls = {
                ONE_ONE : OneToOneField,
                MANY_ONE : ManyToOneField,
                MANY_MANY : ManyToManyField,
            }[table.table_type]
    return cls(name, table)
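# The effective dispatch, with hypothetical table objects (illustration only):
#   create_field('tags', many_many_table)    -> ManyToManyField
#   create_field('authors', many_many_table) -> AuthorsField
#   create_field('formats', many_many_table) -> FormatsField
#   create_field('ondevice', None)           -> OnDeviceField (no backing table)
#   create_field('#mycomp', composite_table) -> CompositeField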

View File

@ -7,7 +7,9 @@ __license__ = 'GPL v3'
__copyright__ = '2011, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
from threading import Lock, Condition, current_thread
from threading import Lock, Condition, current_thread, RLock
from functools import partial
from collections import Counter
class LockingError(RuntimeError):
pass
@ -37,7 +39,7 @@ def create_locks():
l = SHLock()
return RWLockWrapper(l), RWLockWrapper(l, is_shared=False)
class SHLock(object):
class SHLock(object): # {{{
'''
Shareable lock class. Used to implement the Multiple readers-single writer
paradigm. As best as I can tell, neither writer nor reader starvation
@ -79,6 +81,11 @@ class SHLock(object):
return self._acquire_exclusive(blocking)
assert not (self.is_shared and self.is_exclusive)
def owns_lock(self):
me = current_thread()
with self._lock:
return self._exclusive_owner is me or me in self._shared_owners
def release(self):
''' Release the lock. '''
# This decrements the appropriate lock counters, and if the lock
@ -189,6 +196,8 @@ class SHLock(object):
def _return_waiter(self, waiter):
self._free_waiters.append(waiter)
# }}}
class RWLockWrapper(object):
def __init__(self, shlock, is_shared=True):
@ -200,16 +209,124 @@ class RWLockWrapper(object):
return self
def __exit__(self, *args):
self.release()
def release(self):
self._shlock.release()
def owns_lock(self):
return self._shlock.owns_lock()
class RecordLock(object):
'''
Lock records identified by hashable ids. To use
rl = RecordLock()
with rl.lock(some_id):
# do something
This will lock the record identified by some_id exclusively. The lock is
recursive, which means that you can lock the same record multiple times in
the same thread.
This class co-operates with the SHLock class. If you try to lock a record
in a thread that already holds the SHLock, a LockingError is raised. This
is to prevent the possibility of a cross-lock deadlock.
A cross-lock deadlock is still possible if you first lock a record and then
acquire the SHLock, but the usage pattern for this lock makes this highly
unlikely (this lock should be acquired immediately before any file I/O on
files in the library and released immediately after).
'''
class Wrap(object):
def __init__(self, release):
self.release = release
def __enter__(self):
return self
def __exit__(self, *args, **kwargs):
self.release()
self.release = None
def __init__(self, sh_lock):
self._lock = Lock()
# This is for recycling lock objects.
self._free_locks = [RLock()]
self._records = {}
self._counter = Counter()
self.sh_lock = sh_lock
def lock(self, record_id):
if self.sh_lock.owns_lock():
raise LockingError('Current thread already holds a shared lock,'
' you cannot also ask for record lock as this could cause a'
' deadlock.')
with self._lock:
l = self._records.get(record_id, None)
if l is None:
l = self._take_lock()
self._records[record_id] = l
self._counter[record_id] += 1
l.acquire()
return RecordLock.Wrap(partial(self.release, record_id))
def release(self, record_id):
with self._lock:
l = self._records.pop(record_id, None)
if l is None:
raise LockingError('No lock acquired for record %r'%record_id)
l.release()
self._counter[record_id] -= 1
if self._counter[record_id] > 0:
self._records[record_id] = l
else:
self._return_lock(l)
def _take_lock(self):
try:
return self._free_locks.pop()
except IndexError:
return RLock()
def _return_lock(self, lock):
self._free_locks.append(lock)
# Tests {{{
if __name__ == '__main__':
import time, random, unittest
from threading import Thread
class TestSHLock(unittest.TestCase):
"""Testcases for SHLock class."""
class TestLock(unittest.TestCase):
"""Testcases for Lock classes."""
def test_owns_locks(self):
lock = SHLock()
self.assertFalse(lock.owns_lock())
lock.acquire(shared=True)
self.assertTrue(lock.owns_lock())
lock.release()
self.assertFalse(lock.owns_lock())
lock.acquire(shared=False)
self.assertTrue(lock.owns_lock())
lock.release()
self.assertFalse(lock.owns_lock())
done = []
def test():
if not lock.owns_lock():
done.append(True)
lock.acquire()
t = Thread(target=test)
t.daemon = True
t.start()
t.join(1)
self.assertEqual(len(done), 1)
lock.release()
def test_multithread_deadlock(self):
lock = SHLock()
@ -345,8 +462,38 @@ if __name__ == '__main__':
self.assertFalse(lock.is_shared)
self.assertFalse(lock.is_exclusive)
def test_record_lock(self):
shlock = SHLock()
lock = RecordLock(shlock)
suite = unittest.TestLoader().loadTestsFromTestCase(TestSHLock)
shlock.acquire()
self.assertRaises(LockingError, lock.lock, 1)
shlock.release()
with lock.lock(1):
with lock.lock(1):
pass
def dolock():
with lock.lock(1):
time.sleep(0.1)
t = Thread(target=dolock)
t.daemon = True
with lock.lock(1):
t.start()
t.join(0.2)
self.assertTrue(t.is_alive())
t.join(0.11)
self.assertFalse(t.is_alive())
t = Thread(target=dolock)
t.daemon = True
with lock.lock(2):
t.start()
t.join(0.11)
self.assertFalse(t.is_alive())
suite = unittest.TestLoader().loadTestsFromTestCase(TestLock)
unittest.TextTestRunner(verbosity=2).run(suite)
# }}}

View File

@ -12,11 +12,13 @@ from datetime import datetime
from dateutil.tz import tzoffset
from calibre.constants import plugins
from calibre.utils.date import parse_date, local_tz
from calibre.utils.date import parse_date, local_tz, UNDEFINED_DATE
from calibre.ebooks.metadata import author_to_author_sort
_c_speedup = plugins['speedup'][0]
ONE_ONE, MANY_ONE, MANY_MANY = xrange(3)
def _c_convert_timestamp(val):
if not val:
return None
@ -27,8 +29,11 @@ def _c_convert_timestamp(val):
if ret is None:
return parse_date(val, as_utc=False)
year, month, day, hour, minutes, seconds, tzsecs = ret
return datetime(year, month, day, hour, minutes, seconds,
try:
return datetime(year, month, day, hour, minutes, seconds,
tzinfo=tzoffset(None, tzsecs)).astimezone(local_tz)
except OverflowError:
return UNDEFINED_DATE.astimezone(local_tz)
class Table(object):
@ -57,6 +62,8 @@ class OneToOneTable(Table):
timestamp, size, etc.
'''
table_type = ONE_ONE
def read(self, db):
self.book_col_map = {}
idcol = 'id' if self.metadata['table'] == 'books' else 'book'
@ -73,6 +80,17 @@ class SizeTable(OneToOneTable):
'WHERE data.book=books.id) FROM books'):
self.book_col_map[row[0]] = self.unserialize(row[1])
class CompositeTable(OneToOneTable):
def read(self, db):
self.book_col_map = {}
d = self.metadata['display']
self.composite_template = d['composite_template']
self.contains_html = d['contains_html']
self.make_category = d['make_category']
self.composite_sort = d['composite_sort']
self.use_decorations = d['use_decorations']
class ManyToOneTable(Table):
'''
@ -82,9 +100,10 @@ class ManyToOneTable(Table):
Each book however has only one value for data of this type.
'''
table_type = MANY_ONE
def read(self, db):
self.id_map = {}
self.extra_map = {}
self.col_book_map = {}
self.book_col_map = {}
self.read_id_maps(db)
@ -105,6 +124,9 @@ class ManyToOneTable(Table):
self.col_book_map[row[1]].append(row[0])
self.book_col_map[row[0]] = row[1]
for key in tuple(self.col_book_map.iterkeys()):
self.col_book_map[key] = tuple(self.col_book_map[key])
class ManyToManyTable(ManyToOneTable):
'''
@ -113,6 +135,8 @@ class ManyToManyTable(ManyToOneTable):
book. For example: tags or authors.
'''
table_type = MANY_MANY
def read_maps(self, db):
for row in db.conn.execute(
'SELECT book, {0} FROM {1}'.format(
@ -124,14 +148,21 @@ class ManyToManyTable(ManyToOneTable):
self.book_col_map[row[0]] = []
self.book_col_map[row[0]].append(row[1])
for key in tuple(self.col_book_map.iterkeys()):
self.col_book_map[key] = tuple(self.col_book_map[key])
for key in tuple(self.book_col_map.iterkeys()):
self.book_col_map[key] = tuple(self.book_col_map[key])
class AuthorsTable(ManyToManyTable):
def read_id_maps(self, db):
self.alink_map = {}
self.asort_map = {}
for row in db.conn.execute(
'SELECT id, name, sort, link FROM authors'):
self.id_map[row[0]] = row[1]
self.extra_map[row[0]] = (row[2] if row[2] else
self.asort_map[row[0]] = (row[2] if row[2] else
author_to_author_sort(row[1]))
self.alink_map[row[0]] = row[3]
@ -141,14 +172,25 @@ class FormatsTable(ManyToManyTable):
pass
def read_maps(self, db):
self.fname_map = {}
for row in db.conn.execute('SELECT book, format, name FROM data'):
if row[1] is not None:
if row[1] not in self.col_book_map:
self.col_book_map[row[1]] = []
self.col_book_map[row[1]].append(row[0])
fmt = row[1].upper()
if fmt not in self.col_book_map:
self.col_book_map[fmt] = []
self.col_book_map[fmt].append(row[0])
if row[0] not in self.book_col_map:
self.book_col_map[row[0]] = []
self.book_col_map[row[0]].append((row[1], row[2]))
self.book_col_map[row[0]].append(fmt)
if row[0] not in self.fname_map:
self.fname_map[row[0]] = {}
self.fname_map[row[0]][fmt] = row[2]
for key in tuple(self.col_book_map.iterkeys()):
self.col_book_map[key] = tuple(self.col_book_map[key])
for key in tuple(self.book_col_map.iterkeys()):
self.book_col_map[key] = tuple(self.book_col_map[key])
class IdentifiersTable(ManyToManyTable):
@ -162,6 +204,9 @@ class IdentifiersTable(ManyToManyTable):
self.col_book_map[row[1]] = []
self.col_book_map[row[1]].append(row[0])
if row[0] not in self.book_col_map:
self.book_col_map[row[0]] = []
self.book_col_map[row[0]].append((row[1], row[2]))
self.book_col_map[row[0]] = {}
self.book_col_map[row[0]][row[1]] = row[2]
for key in tuple(self.col_book_map.iterkeys()):
self.col_book_map[key] = tuple(self.col_book_map[key])

src/calibre/db/view.py (new file, 109 lines)
View File

@ -0,0 +1,109 @@
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import (unicode_literals, division, absolute_import,
print_function)
__license__ = 'GPL v3'
__copyright__ = '2011, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
from functools import partial
def sanitize_sort_field_name(field_metadata, field):
field = field_metadata.search_term_to_field_key(field.lower().strip())
# translate some fields to their hidden equivalent
field = {'title': 'sort', 'authors':'author_sort'}.get(field, field)
return field
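# For example, with a populated FieldMetadata instance fm:
#   sanitize_sort_field_name(fm, 'Title')   -> 'sort'
#   sanitize_sort_field_name(fm, 'authors') -> 'author_sort'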
class View(object):
def __init__(self, cache):
self.cache = cache
self.marked_ids = {}
self._field_getters = {}
for col, idx in cache.backend.FIELD_MAP.iteritems():
if isinstance(col, int):
label = self.cache.backend.custom_column_num_map[col]['label']
label = (self.cache.backend.field_metadata.custom_field_prefix
+ label)
self._field_getters[idx] = partial(self.get, label)
else:
try:
self._field_getters[idx] = {
'id' : self._get_id,
'au_map' : self.get_author_data,
'ondevice': self.get_ondevice,
'marked' : self.get_marked,
}[col]
except KeyError:
self._field_getters[idx] = partial(self.get, col)
self._map = list(self.cache.all_book_ids())
self._map_filtered = list(self._map)
@property
def field_metadata(self):
return self.cache.field_metadata
def _get_id(self, idx, index_is_id=True):
ans = idx if index_is_id else self.index_to_id(idx)
return ans
def get_field_map_field(self, row, col, index_is_id=True):
'''
Supports the legacy FIELD_MAP interface for getting metadata. Do not use
in new code.
'''
getter = self._field_getters[col]
return getter(row, index_is_id=index_is_id)
def index_to_id(self, idx):
return self._map_filtered[idx]
def get(self, field, idx, index_is_id=True, default_value=None):
id_ = idx if index_is_id else self.index_to_id(idx)
return self.cache.field_for(field, id_)
def get_ondevice(self, idx, index_is_id=True, default_value=''):
id_ = idx if index_is_id else self.index_to_id(idx)
        return self.cache.field_for('ondevice', id_, default_value=default_value)
def get_marked(self, idx, index_is_id=True, default_value=None):
id_ = idx if index_is_id else self.index_to_id(idx)
return self.marked_ids.get(id_, default_value)
def get_author_data(self, idx, index_is_id=True, default_value=()):
'''
Return author data for all authors of the book identified by idx as a
tuple of dictionaries. The dictionaries should never be empty, unless
there is a bug somewhere. The tuple could be empty if idx points to a
non-existent book, or a book with no authors (though again, a book with
no authors should never happen).
Each dictionary has the keys: name, sort, link. Link can be an empty
string.
default_value is ignored; this method always returns a tuple
'''
id_ = idx if index_is_id else self.index_to_id(idx)
with self.cache.read_lock:
ids = self.cache._field_ids_for('authors', id_)
ans = []
for id_ in ids:
ans.append(self.cache._author_data(id_))
return tuple(ans)
def multisort(self, fields=[], subsort=False):
fields = [(sanitize_sort_field_name(self.field_metadata, x), bool(y)) for x, y in fields]
keys = self.field_metadata.sortable_field_keys()
fields = [x for x in fields if x[0] in keys]
if subsort and 'sort' not in [x[0] for x in fields]:
fields += [('sort', True)]
if not fields:
fields = [('timestamp', False)]
sorted_book_ids = self.cache.multisort(fields)
        # TODO: use sorted_book_ids to change the maps

View File

@ -39,7 +39,7 @@ class ANDROID(USBMS):
0x22b8 : { 0x41d9 : [0x216], 0x2d61 : [0x100], 0x2d67 : [0x100],
0x41db : [0x216], 0x4285 : [0x216], 0x42a3 : [0x216],
0x4286 : [0x216], 0x42b3 : [0x216], 0x42b4 : [0x216],
0x7086 : [0x0226], 0x70a8: [0x9999],
0x7086 : [0x0226], 0x70a8: [0x9999], 0x42c4 : [0x216],
},
# Sony Ericsson
@ -47,10 +47,12 @@ class ANDROID(USBMS):
# Google
0x18d1 : {
0x0001 : [0x0223],
0x4e11 : [0x0100, 0x226, 0x227],
0x4e12: [0x0100, 0x226, 0x227],
0x4e21: [0x0100, 0x226, 0x227],
0xb058: [0x0222, 0x226, 0x227]},
0x4e12 : [0x0100, 0x226, 0x227],
0x4e21 : [0x0100, 0x226, 0x227],
0xb058 : [0x0222, 0x226, 0x227]
},
# Samsung
0x04e8 : { 0x681d : [0x0222, 0x0223, 0x0224, 0x0400],
@ -60,6 +62,7 @@ class ANDROID(USBMS):
0x685e : [0x0400],
0x6860 : [0x0400],
0x6877 : [0x0400],
0x689e : [0x0400],
},
# Viewsonic
@ -124,7 +127,8 @@ class ANDROID(USBMS):
'IDEOS_TABLET', 'MYTOUCH_4G', 'UMS_COMPOSITE', 'SCH-I800_CARD',
'7', 'A956', 'A955', 'A43', 'ANDROID_PLATFORM', 'TEGRA_2',
'MB860', 'MULTI-CARD', 'MID7015A', 'INCREDIBLE', 'A7EB', 'STREAK',
'MB525', 'ANDROID2.3', 'SGH-I997']
'MB525', 'ANDROID2.3', 'SGH-I997', 'GT-I5800_CARD', 'MB612',
'GT-S5830_CARD', 'GT-S5570_CARD']
WINDOWS_CARD_A_MEM = ['ANDROID_PHONE', 'GT-I9000_CARD', 'SGH-I897',
'FILE-STOR_GADGET', 'SGH-T959', 'SAMSUNG_ANDROID', 'GT-P1000_CARD',
'A70S', 'A101IT', '7', 'INCREDIBLE', 'A7EB', 'SGH-T849_CARD',

View File

@ -35,9 +35,9 @@ class EB600(USBMS):
PRODUCT_ID = [0x1688]
BCD = [0x110]
VENDOR_NAME = ['NETRONIX', 'WOLDER']
WINDOWS_MAIN_MEM = ['EBOOK', 'MIBUK_GAMMA_6.2']
WINDOWS_CARD_A_MEM = 'EBOOK'
VENDOR_NAME = ['NETRONIX', 'WOLDER', 'MD86371']
WINDOWS_MAIN_MEM = ['EBOOK', 'MIBUK_GAMMA_6.2', 'MD86371']
WINDOWS_CARD_A_MEM = ['EBOOK', 'MD86371']
OSX_MAIN_MEM = 'EB600 Internal Storage Media'
OSX_CARD_A_MEM = 'EB600 Card Storage Media'

View File

@ -131,7 +131,7 @@ class AZBOOKA(ALEX):
description = _('Communicate with the Azbooka')
VENDOR_NAME = 'LINUX'
WINDOWS_MAIN_MEM = 'FILE-STOR_GADGET'
WINDOWS_MAIN_MEM = WINDOWS_CARD_A_MEM = 'FILE-STOR_GADGET'
MAIN_MEMORY_VOLUME_LABEL = 'Azbooka Internal Memory'

View File

@ -7,6 +7,7 @@ __docformat__ = 'restructuredtext en'
import os
import sqlite3 as sqlite
from contextlib import closing
from calibre.devices.usbms.books import BookList
from calibre.devices.kobo.books import Book
@ -22,7 +23,7 @@ class KOBO(USBMS):
gui_name = 'Kobo Reader'
description = _('Communicate with the Kobo Reader')
author = 'Timothy Legge'
version = (1, 0, 9)
version = (1, 0, 10)
dbversion = 0
fwversion = 0
@ -48,12 +49,16 @@ class KOBO(USBMS):
VIRTUAL_BOOK_EXTENSIONS = frozenset(['kobo'])
EXTRA_CUSTOMIZATION_MESSAGE = _('The Kobo supports only one collection '
'currently: the \"Im_Reading\" list. Create a tag called \"Im_Reading\" ')+\
'for automatic management'
EXTRA_CUSTOMIZATION_MESSAGE = [
_('The Kobo supports several collections including ')+\
'Read, Closed, Im_Reading ' +\
_('Create tags for automatic management'),
]
EXTRA_CUSTOMIZATION_DEFAULT = ', '.join(['tags'])
OPT_COLLECTIONS = 0
def initialize(self):
USBMS.initialize(self)
self.book_class = Book
@ -188,77 +193,78 @@ class KOBO(USBMS):
traceback.print_exc()
return changed
connection = sqlite.connect(self.normalize_path(self._main_prefix + '.kobo/KoboReader.sqlite'))
with closing(sqlite.connect(
self.normalize_path(self._main_prefix +
'.kobo/KoboReader.sqlite'))) as connection:
# return bytestrings if the content cannot be decoded as unicode
connection.text_factory = lambda x: unicode(x, "utf-8", "ignore")
# return bytestrings if the content cannot be decoded as unicode
connection.text_factory = lambda x: unicode(x, "utf-8", "ignore")
cursor = connection.cursor()
cursor = connection.cursor()
#query = 'select count(distinct volumeId) from volume_shortcovers'
#cursor.execute(query)
#for row in (cursor):
# numrows = row[0]
#cursor.close()
#query = 'select count(distinct volumeId) from volume_shortcovers'
#cursor.execute(query)
#for row in (cursor):
# numrows = row[0]
#cursor.close()
# Determine the database version
# 4 - Bluetooth Kobo Rev 2 (1.4)
# 8 - WIFI KOBO Rev 1
cursor.execute('select version from dbversion')
result = cursor.fetchone()
self.dbversion = result[0]
# Determine the database version
# 4 - Bluetooth Kobo Rev 2 (1.4)
# 8 - WIFI KOBO Rev 1
cursor.execute('select version from dbversion')
result = cursor.fetchone()
self.dbversion = result[0]
debug_print("Database Version: ", self.dbversion)
if self.dbversion >= 16:
query= 'select Title, Attribution, DateCreated, ContentID, MimeType, ContentType, ' \
'ImageID, ReadStatus, ___ExpirationStatus, FavouritesIndex, Accessibility from content where ' \
'BookID is Null and ( ___ExpirationStatus <> "3" or ___ExpirationStatus is Null)'
elif self.dbversion < 16 and self.dbversion >= 14:
query= 'select Title, Attribution, DateCreated, ContentID, MimeType, ContentType, ' \
'ImageID, ReadStatus, ___ExpirationStatus, FavouritesIndex, "-1" as Accessibility from content where ' \
'BookID is Null and ( ___ExpirationStatus <> "3" or ___ExpirationStatus is Null)'
elif self.dbversion < 14 and self.dbversion >= 8:
query= 'select Title, Attribution, DateCreated, ContentID, MimeType, ContentType, ' \
'ImageID, ReadStatus, ___ExpirationStatus, "-1" as FavouritesIndex, "-1" as Accessibility from content where ' \
'BookID is Null and ( ___ExpirationStatus <> "3" or ___ExpirationStatus is Null)'
else:
query= 'select Title, Attribution, DateCreated, ContentID, MimeType, ContentType, ' \
'ImageID, ReadStatus, "-1" as ___ExpirationStatus, "-1" as FavouritesIndex, "-1" as Accessibility from content where BookID is Null'
debug_print("Database Version: ", self.dbversion)
if self.dbversion >= 16:
query= 'select Title, Attribution, DateCreated, ContentID, MimeType, ContentType, ' \
'ImageID, ReadStatus, ___ExpirationStatus, FavouritesIndex, Accessibility from content where ' \
'BookID is Null and ( ___ExpirationStatus <> "3" or ___ExpirationStatus is Null)'
elif self.dbversion < 16 and self.dbversion >= 14:
query= 'select Title, Attribution, DateCreated, ContentID, MimeType, ContentType, ' \
'ImageID, ReadStatus, ___ExpirationStatus, FavouritesIndex, "-1" as Accessibility from content where ' \
'BookID is Null and ( ___ExpirationStatus <> "3" or ___ExpirationStatus is Null)'
elif self.dbversion < 14 and self.dbversion >= 8:
query= 'select Title, Attribution, DateCreated, ContentID, MimeType, ContentType, ' \
'ImageID, ReadStatus, ___ExpirationStatus, "-1" as FavouritesIndex, "-1" as Accessibility from content where ' \
'BookID is Null and ( ___ExpirationStatus <> "3" or ___ExpirationStatus is Null)'
else:
query= 'select Title, Attribution, DateCreated, ContentID, MimeType, ContentType, ' \
'ImageID, ReadStatus, "-1" as ___ExpirationStatus, "-1" as FavouritesIndex, "-1" as Accessibility from content where BookID is Null'
try:
cursor.execute (query)
except Exception as e:
err = str(e)
if not ('___ExpirationStatus' in err or 'FavouritesIndex' in err or
'Accessibility' in err):
raise
query= ('select Title, Attribution, DateCreated, ContentID, MimeType, ContentType, '
'ImageID, ReadStatus, "-1" as ___ExpirationStatus, "-1" as '
'FavouritesIndex, "-1" as Accessibility from content where '
'BookID is Null')
cursor.execute(query)
changed = False
for i, row in enumerate(cursor):
# self.report_progress((i+1) / float(numrows), _('Getting list of books on device...'))
if row[3].startswith("file:///usr/local/Kobo/help/"):
# These are internal to the Kobo device and do not exist
continue
path = self.path_from_contentid(row[3], row[5], row[4], oncard)
mime = mime_type_ext(path_to_ext(path)) if path.find('kepub') == -1 else 'application/epub+zip'
# debug_print("mime:", mime)
if oncard != 'carda' and oncard != 'cardb' and not row[3].startswith("file:///mnt/sd/"):
changed = update_booklist(self._main_prefix, path, row[0], row[1], mime, row[2], row[5], row[6], row[7], row[4], row[8], row[9], row[10])
# print "shortbook: " + path
elif oncard == 'carda' and row[3].startswith("file:///mnt/sd/"):
changed = update_booklist(self._card_a_prefix, path, row[0], row[1], mime, row[2], row[5], row[6], row[7], row[4], row[8], row[9], row[10])
if changed:
need_sync = True
cursor.close()
# Remove books that are no longer in the filesystem. Cache contains
# indices into the booklist if book not in filesystem, None otherwise
@ -288,56 +294,56 @@ class KOBO(USBMS):
# 2) content
debug_print('delete_via_sql: ContentID: ', ContentID, 'ContentType: ', ContentType)
with closing(sqlite.connect(self.normalize_path(self._main_prefix +
'.kobo/KoboReader.sqlite'))) as connection:
# return bytestrings if the content cannot be decoded as unicode
connection.text_factory = lambda x: unicode(x, "utf-8", "ignore")
cursor = connection.cursor()
t = (ContentID,)
cursor.execute('select ImageID from content where ContentID = ?', t)
ImageID = None
for row in cursor:
# First get the ImageID to delete the images
ImageID = row[0]
cursor.close()
cursor = connection.cursor()
if ContentType == 6 and self.dbversion < 8:
# Delete the shortcover_pages first
cursor.execute('delete from shortcover_page where shortcoverid in (select ContentID from content where BookID = ?)', t)
# Delete the volume_shortcovers second
cursor.execute('delete from volume_shortcovers where volumeid = ?', t)
# Delete the rows from content_keys
if self.dbversion >= 8:
cursor.execute('delete from content_keys where volumeid = ?', t)
# Delete the chapters associated with the book next
t = (ContentID,)
# Kobo does not delete the Book row (ie the row where the BookID is Null)
# The next server sync should remove the row
cursor.execute('delete from content where BookID = ?', t)
try:
cursor.execute('update content set ReadStatus=0, FirstTimeReading = \'true\', ___PercentRead=0, ___ExpirationStatus=3 ' \
'where BookID is Null and ContentID =?',t)
except Exception as e:
if 'no such column' not in str(e):
raise
cursor.execute('update content set ReadStatus=0, FirstTimeReading = \'true\', ___PercentRead=0 ' \
'where BookID is Null and ContentID =?',t)
connection.commit()
cursor.close()
if ImageID == None:
print "Error condition ImageID was not found"
print "You likely tried to delete a book that the kobo has not yet added to the database"
# If all this succeeds we need to delete the images files via the ImageID
return ImageID
@ -664,50 +670,49 @@ class KOBO(USBMS):
# Needs to be outside books collection as in the case of removing
# the last book from the collection the list of books is empty
# and the removal of the last book would not occur
with closing(sqlite.connect(self.normalize_path(self._main_prefix +
'.kobo/KoboReader.sqlite'))) as connection:
# return bytestrings if the content cannot be decoded as unicode
connection.text_factory = lambda x: unicode(x, "utf-8", "ignore")
if collections:
# Need to reset the collections outside the particular loops
# otherwise the last item will not be removed
self.reset_readstatus(connection, oncard)
if self.dbversion >= 14:
self.reset_favouritesindex(connection, oncard)
# Process any collections that exist
for category, books in collections.items():
debug_print("Category: ", category, " id = ", readstatuslist.get(category))
for book in books:
debug_print(' Title:', book.title, 'category: ', category)
if category not in book.device_collections:
book.device_collections.append(category)
extension = os.path.splitext(book.path)[1]
ContentType = self.get_content_type_from_extension(extension) if extension != '' else self.get_content_type_from_path(book.path)
ContentID = self.contentid_from_path(book.path, ContentType)
if category in readstatuslist.keys():
# Manage ReadStatus
self.set_readstatus(connection, ContentID, readstatuslist.get(category))
elif category == 'Shortlist' and self.dbversion >= 14:
# Manage FavouritesIndex/Shortlist
self.set_favouritesindex(connection, ContentID)
elif category in accessibilitylist.keys():
# Do not manage the Accessibility List
pass
else: # No collections
# Since no collections exist the ReadStatus needs to be reset to 0 (Unread)
debug_print("No Collections - reseting ReadStatus")
self.reset_readstatus(connection, oncard)
if self.dbversion >= 14:
debug_print("No Collections - reseting FavouritesIndex")
self.reset_favouritesindex(connection, oncard)
# debug_print('Finished update_device_database_collections', collections_attributes)
@ -723,7 +728,7 @@ class KOBO(USBMS):
opts = self.settings()
if opts.extra_customization:
collections = [x.lower().strip() for x in
opts.extra_customization[self.OPT_COLLECTIONS].split(',')]
else:
collections = []


@ -351,3 +351,29 @@ class MOOVYBOOK(USBMS):
def get_main_ebook_dir(self, for_upload=False):
return 'Books' if for_upload else self.EBOOK_DIR_MAIN
class COBY(USBMS):
name = 'COBY MP977 device interface'
gui_name = 'COBY'
description = _('Communicate with the COBY')
author = 'Kovid Goyal'
supported_platforms = ['windows', 'osx', 'linux']
# Ordered list of supported formats
FORMATS = ['epub', 'pdf']
VENDOR_ID = [0x1e74]
PRODUCT_ID = [0x7121]
BCD = [0x02]
VENDOR_NAME = 'USB_2.0'
WINDOWS_MAIN_MEM = WINDOWS_CARD_A_MEM = 'MP977_DRIVER'
EBOOK_DIR_MAIN = ''
SUPPORTS_SUB_DIRS = False
def get_carda_ebook_dir(self, for_upload=False):
if for_upload:
return 'eBooks'
return self.EBOOK_DIR_CARD_A


@ -1077,8 +1077,13 @@ class Device(DeviceConfig, DevicePlugin):
settings = self.settings()
template = self.save_template()
if mdata.tags and _('News') in mdata.tags:
try:
p = mdata.pubdate
date = (p.year, p.month, p.day)
except:
today = time.localtime()
date = (today[0], today[1], today[2])
template = "{title}_%d-%d-%d" % date
use_subdirs = self.SUPPORTS_SUB_DIRS and settings.use_subdirs
fname = sanitize(fname)


@ -94,11 +94,29 @@ class USBMS(CLI, Device):
self.report_progress(1.0, _('Get device information...'))
self.driveinfo = {}
if self._main_prefix is not None:
try:
self.driveinfo['main'] = self._update_driveinfo_file(self._main_prefix, 'main')
except (IOError, OSError) as e:
raise IOError(_('Failed to access files in the main memory of'
' your device. You should contact the device'
' manufacturer for support. Common fixes are:'
' try a different USB cable/USB port on your computer.'
' If your device has a "Reset to factory defaults" type'
' of setting somewhere, use it. Underlying error: %s')
% e)
try:
if self._card_a_prefix is not None:
self.driveinfo['A'] = self._update_driveinfo_file(self._card_a_prefix, 'A')
if self._card_b_prefix is not None:
self.driveinfo['B'] = self._update_driveinfo_file(self._card_b_prefix, 'B')
except (IOError, OSError) as e:
raise IOError(_('Failed to access files on the SD card in your'
' device. This can happen for many reasons. The SD card may be'
' corrupted, it may be too large for your device, it may be'
' write-protected, etc. Try a different SD card, or reformat'
' your SD card using the FAT32 filesystem. Also make sure'
' there are not too many files in the root of your SD card.'
' Underlying error: %s') % e)
return (self.get_gui_name(), '', '', '', self.driveinfo)
def set_driveinfo_name(self, location_code, name):


@ -159,7 +159,7 @@ def normalize(x):
return x
def calibre_cover(title, author_string, series_string=None,
output_format='jpg', title_size=46, author_size=36, logo_path=None):
title = normalize(title)
author_string = normalize(author_string)
series_string = normalize(series_string)
@ -167,7 +167,9 @@ def calibre_cover(title, author_string, series_string=None,
lines = [TextLine(title, title_size), TextLine(author_string, author_size)]
if series_string:
lines.append(TextLine(series_string, author_size))
if logo_path is None:
logo_path = I('library.png')
return create_cover_page(lines, logo_path, output_format='jpg')
UNIT_RE = re.compile(r'^(-*[0-9]*[.]?[0-9]*)\s*(%|em|ex|en|px|mm|cm|in|pt|pc)$')


@ -38,8 +38,12 @@ ENCODING_PATS = [
ENTITY_PATTERN = re.compile(r'&(\S+?);')
def strip_encoding_declarations(raw):
limit = 50*1024
for pat in ENCODING_PATS:
prefix = raw[:limit]
suffix = raw[limit:]
prefix = pat.sub('', prefix)
raw = prefix + suffix
return raw
def substitute_entites(raw):


@ -137,7 +137,9 @@ def add_pipeline_options(parser, plumber):
'extra_css', 'smarten_punctuation',
'margin_top', 'margin_left', 'margin_right',
'margin_bottom', 'change_justification',
'insert_blank_line', 'insert_blank_line_size',
'remove_paragraph_spacing',
'remove_paragraph_spacing_indent_size',
'asciiize',
]
),
@ -208,12 +210,13 @@ def add_pipeline_options(parser, plumber):
if rec.level < rec.HIGH:
option_recommendation_to_cli_option(add_option, rec)
def option_parser():
parser = OptionParser(usage=USAGE)
parser.add_option('--list-recipes', default=False, action='store_true',
help=_('List builtin recipe names. You can create an ebook from '
'a builtin recipe like this: ebook-convert "Recipe Name.recipe" '
'output.epub'))
return parser
class ProgressBar(object):


@ -366,9 +366,9 @@ OptionRecommendation(name='remove_paragraph_spacing',
OptionRecommendation(name='remove_paragraph_spacing_indent_size',
recommended_value=1.5, level=OptionRecommendation.LOW,
help=_('When calibre removes blank lines between paragraphs, it automatically '
'sets a paragraph indent, to ensure that paragraphs can be easily '
'distinguished. This option controls the width of that indent (in em).')
),
OptionRecommendation(name='prefer_metadata_cover',
@ -384,6 +384,13 @@ OptionRecommendation(name='insert_blank_line',
)
),
OptionRecommendation(name='insert_blank_line_size',
recommended_value=0.5, level=OptionRecommendation.LOW,
help=_('Set the height of the inserted blank lines (in em).'
' The height of the lines between paragraphs will be twice the value'
' set here.')
),
OptionRecommendation(name='remove_first_image',
recommended_value=False, level=OptionRecommendation.LOW,
help=_('Remove the first image from the input ebook. Useful if the '
@ -602,7 +609,7 @@ OptionRecommendation(name='sr3_replace',
input_fmt = os.path.splitext(self.input)[1]
if not input_fmt:
raise ValueError('Input file must have an extension')
input_fmt = input_fmt[1:].lower().replace('original_', '')
self.archive_input_tdir = None
if input_fmt in ARCHIVE_FMTS:
self.log('Processing archive...')
@ -1048,6 +1055,7 @@ OptionRecommendation(name='sr3_replace',
with self.output_plugin:
self.output_plugin.convert(self.oeb, self.output, self.input_plugin,
self.opts, self.log)
self.oeb.clean_temp_files()
self.ui_reporter(1.)
run_plugins_on_postprocess(self.output, self.output_fmt)


@ -303,6 +303,9 @@ class CSSPreProcessor(object):
class HTMLPreProcessor(object):
PREPROCESS = [
# Remove huge blocks of contiguous spaces as they slow down
# the following regexes pretty badly
(re.compile(r'\s{10000,}'), lambda m: ''),
# Some idiotic HTML generators (Frontpage I'm looking at you)
# Put all sorts of crap into <head>. This messes up lxml
(re.compile(r'<head[^>]*>\n*(.*?)\n*</head>', re.IGNORECASE|re.DOTALL),


@ -8,7 +8,7 @@ __docformat__ = 'restructuredtext en'
import os
from calibre import guess_type
from calibre.customize.conversion import InputFormatPlugin
from calibre.ebooks.chardet import xml_to_unicode
from calibre.ebooks.metadata.opf2 import OPF
@ -25,16 +25,50 @@ class HTMLZInput(InputFormatPlugin):
accelerators):
self.log = log
html = u''
top_levels = []
# Extract content from zip archive.
zf = ZipFile(stream)
zf.extractall()
# Find the HTML file in the archive. It needs to be
# top level.
index = u''
multiple_html = False
# Get a list of all top level files in the archive.
for x in os.listdir('.'):
if os.path.isfile(x):
top_levels.append(x)
# Try to find an index file.
for x in top_levels:
if x.lower() in ('index.html', 'index.xhtml', 'index.htm'):
index = x
break
# Look for multiple HTML files in the archive. We look at the
# top level files only as only they matter in HTMLZ.
for x in top_levels:
if os.path.splitext(x)[1].lower() in ('.html', '.xhtml', '.htm'):
# Set index to the first HTML file found if it's not
# called index.
if not index:
index = x
else:
multiple_html = True
# Warn the user if there are multiple HTML files in the archive. HTMLZ
# supports a single HTML file. A conversion of an HTMLZ archive with
# multiple HTML files probably won't turn out as the user expects. With
# multiple HTML files, ZIP input should be used in place of HTMLZ.
if multiple_html:
log.warn(_('Multiple HTML files found in the archive. Only %s will be used.') % index)
if index:
with open(index, 'rb') as tf:
html = tf.read()
else:
raise Exception(_('No top level HTML file found.'))
if not html:
raise Exception(_('Top level HTML file %s is empty') % index)
# Encoding
if options.input_encoding:
@ -75,7 +109,7 @@ class HTMLZInput(InputFormatPlugin):
# Get the cover path from the OPF.
cover_path = None
opf = None
for x in top_levels:
if os.path.splitext(x)[1].lower() == '.opf':
opf = x
break


@ -742,7 +742,7 @@ class Metadata(object):
ans += [('ISBN', unicode(self.isbn))]
ans += [(_('Tags'), u', '.join([unicode(t) for t in self.tags]))]
if self.series:
ans += [(_('Series'), unicode(self.series) + ' #%s'%self.format_series_index())]
ans += [(_('Language'), unicode(self.language))]
if self.timestamp is not None:
ans += [(_('Timestamp'), unicode(self.timestamp.isoformat(' ')))]


@ -24,10 +24,9 @@ XPath = partial(etree.XPath, namespaces=NAMESPACES)
tostring = partial(etree.tostring, method='text', encoding=unicode)
def get_metadata(stream):
""" Return fb2 metadata as a L{MetaInformation} object """
''' Return fb2 metadata as a L{MetaInformation} object '''
root = _get_fbroot(stream)
book_title = _parse_book_title(root)
authors = _parse_authors(root)
@ -166,7 +165,7 @@ def _parse_tags(root, mi):
break
def _parse_series(root, mi):
# calibre supports only 1 series: use the 1-st one
# pick up sequence but only from 1 section in preferred order
# except <src-title-info>
xp_ti = '//fb2:title-info/fb2:sequence[1]'
@ -181,11 +180,12 @@ def _parse_series(root, mi):
def _parse_isbn(root, mi):
# some people try to put several isbn in this field, but it is not allowed. try to stick to the 1-st one in this case
isbn = XPath('normalize-space(//fb2:publish-info/fb2:isbn/text())')(root)
if isbn:
# some people try to put several isbn in this field, but it is not allowed. try to stick to the 1-st one in this case
if ',' in isbn:
isbn = isbn[:isbn.index(',')]
if check_isbn(isbn):
mi.isbn = isbn
def _parse_comments(root, mi):
# pick up annotation but only from 1 section <title-info>; fallback: <src-title-info>
@ -232,4 +232,3 @@ def _get_fbroot(stream):
raw = xml_to_unicode(raw, strip_encoding_pats=True)[0]
root = etree.fromstring(raw, parser=parser)
return root


@ -22,6 +22,7 @@ from calibre.utils.date import parse_date, isoformat
from calibre.utils.localization import get_lang
from calibre import prints, guess_type
from calibre.utils.cleantext import clean_ascii_chars
from calibre.utils.config import tweaks
class Resource(object): # {{{
'''
@ -527,7 +528,12 @@ class OPF(object): # {{{
category = MetadataField('type')
rights = MetadataField('rights')
series = MetadataField('series', is_dc=False)
if tweaks['use_series_auto_increment_tweak_when_importing']:
series_index = MetadataField('series_index', is_dc=False,
formatter=float, none_is=None)
else:
series_index = MetadataField('series_index', is_dc=False,
formatter=float, none_is=1)
title_sort = TitleSortField('title_sort', is_dc=False)
rating = MetadataField('rating', is_dc=False, formatter=int)
pubdate = MetadataField('date', formatter=parse_date,
@ -1024,8 +1030,10 @@ class OPF(object): # {{{
attrib = attrib or {}
attrib['name'] = 'calibre:' + name
name = '{%s}%s' % (self.NAMESPACES['opf'], 'meta')
nsmap = dict(self.NAMESPACES)
del nsmap['opf']
elem = etree.SubElement(self.metadata, name, attrib=attrib,
nsmap=nsmap)
elem.tail = '\n'
return elem


@ -22,6 +22,7 @@ from calibre.ebooks.metadata.book.base import Metadata
from calibre.utils.date import utc_tz, as_utc
from calibre.utils.html2text import html2text
from calibre.utils.icu import lower
from calibre.utils.date import UNDEFINED_DATE
# Download worker {{{
class Worker(Thread):
@ -490,6 +491,8 @@ def identify(log, abort, # {{{
max_tags = msprefs['max_tags']
for r in results:
r.tags = r.tags[:max_tags]
if getattr(r.pubdate, 'year', 2000) <= UNDEFINED_DATE.year:
r.pubdate = None
if msprefs['swap_author_names']:
for r in results:


@ -151,7 +151,7 @@ class ISBNDB(Source):
bl = feed.find('BookList')
if bl is None:
err = tostring(feed.find('errormessage'))
raise ValueError('ISBNDb query failed:' + err)
total_results = int(bl.get('total_results'))
shown_results = int(bl.get('shown_results'))


@ -7,10 +7,15 @@ __license__ = 'GPL v3'
__copyright__ = '2011, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
import struct, datetime, sys, os, shutil
from collections import OrderedDict, defaultdict
from calibre.utils.date import utc_tz
from calibre.ebooks.mobi.langcodes import main_language, sub_language
from calibre.ebooks.mobi.utils import (decode_hex_number, decint,
get_trailing_data, decode_tbs)
from calibre.utils.magick.draw import identify_data
# PalmDB {{{
class PalmDOCAttributes(object):
class Attr(object):
@ -68,7 +73,7 @@ class PalmDB(object):
self.ident = self.type + self.creator
if self.ident not in (b'BOOKMOBI', b'TEXTREAD'):
raise ValueError('Unknown book ident: %r'%self.ident)
self.uid_seed, = struct.unpack(b'>I', self.raw[68:72])
self.next_rec_list_id = self.raw[72:76]
self.number_of_records, = struct.unpack(b'>H', self.raw[76:78])
@ -94,8 +99,9 @@ class PalmDB(object):
ans.append('Number of records: %s'%self.number_of_records)
return '\n'.join(ans)
# }}}
class Record(object): # {{{
def __init__(self, raw, header):
self.offset, self.flags, self.uid = header
@ -103,9 +109,11 @@ class Record(object):
@property
def header(self):
return 'Offset: %d Flags: %d UID: %d First 4 bytes: %r Size: %d'%(self.offset, self.flags,
self.uid, self.raw[:4], len(self.raw))
# }}}
# EXTH {{{
class EXTHRecord(object):
def __init__(self, type_, data):
@ -174,6 +182,7 @@ class EXTHHeader(object):
self.records = []
for i in xrange(self.count):
pos = self.read_record(pos)
self.records.sort(key=lambda x:x.type)
def read_record(self, pos):
type_, length = struct.unpack(b'>II', self.raw[pos:pos+8])
@ -189,9 +198,9 @@ class EXTHHeader(object):
for r in self.records:
ans.append(str(r))
return '\n'.join(ans)
# }}}
class MOBIHeader(object): # {{{
def __init__(self, record0):
self.raw = record0.raw
@ -206,10 +215,11 @@ class MOBIHeader(object):
self.number_of_text_records, self.text_record_size = \
struct.unpack(b'>HH', self.raw[8:12])
self.encryption_type_raw, = struct.unpack(b'>H', self.raw[12:14])
self.encryption_type = {
0: 'No encryption',
1: 'Old mobipocket encryption',
2: 'Mobipocket encryption'
}.get(self.encryption_type_raw, repr(self.encryption_type_raw))
self.unknown = self.raw[14:16]
self.identifier = self.raw[16:20]
@ -271,6 +281,8 @@ class MOBIHeader(object):
self.drm_flags = bin(struct.unpack(b'>I', self.raw[176:180])[0])
self.has_extra_data_flags = self.length >= 232 and len(self.raw) >= 232+16
self.has_fcis_flis = False
self.has_multibytes = self.has_indexing_bytes = self.has_uncrossable_breaks = False
self.extra_data_flags = 0
if self.has_extra_data_flags:
self.unknown4 = self.raw[180:192]
self.first_content_record, self.last_content_record = \
@ -279,9 +291,17 @@ class MOBIHeader(object):
(self.fcis_number, self.fcis_count, self.flis_number,
self.flis_count) = struct.unpack(b'>IIII',
self.raw[200:216])
self.unknown6 = self.raw[216:224]
self.srcs_record_index = struct.unpack(b'>I',
self.raw[224:228])[0]
self.num_srcs_records = struct.unpack(b'>I',
self.raw[228:232])[0]
self.unknown7 = self.raw[232:240]
self.extra_data_flags = struct.unpack(b'>I',
self.raw[240:244])[0]
self.has_multibytes = bool(self.extra_data_flags & 0b1)
self.has_indexing_bytes = bool(self.extra_data_flags & 0b10)
self.has_uncrossable_breaks = bool(self.extra_data_flags & 0b100)
self.primary_index_record, = struct.unpack(b'>I',
self.raw[244:248])
@ -311,7 +331,8 @@ class MOBIHeader(object):
ans.append('Secondary index record: %d (null val: %d)'%(
self.secondary_index_record, 0xffffffff))
ans.append('Reserved2: %r'%self.reserved2)
ans.append('First non-book record (null value: %d): %d'%(0xffffffff,
self.first_non_book_record))
ans.append('Full name offset: %d'%self.fullname_offset)
ans.append('Full name length: %d bytes'%self.fullname_length)
ans.append('Langcode: %r'%self.locale_raw)
@ -324,7 +345,7 @@ class MOBIHeader(object):
ans.append('Huffman record offset: %d'%self.huffman_record_offset)
ans.append('Huffman record count: %d'%self.huffman_record_count)
ans.append('Unknown2: %r'%self.unknown2)
ans.append('EXTH flags: %s (%s)'%(bin(self.exth_flags)[2:], self.has_exth))
if self.has_drm_data:
ans.append('Unknown3: %r'%self.unknown3)
ans.append('DRM Offset: %s'%self.drm_offset)
@ -341,8 +362,15 @@ class MOBIHeader(object):
ans.append('FLIS number: %d'% self.flis_number)
ans.append('FLIS count: %d'% self.flis_count)
ans.append('Unknown6: %r'% self.unknown6)
ans.append('SRCS record index: %d'%self.srcs_record_index)
ans.append('Number of SRCS records?: %d'%self.num_srcs_records)
ans.append('Unknown7: %r'%self.unknown7)
ans.append(('Extra data flags: %s (has multibyte: %s) '
'(has indexing: %s) (has uncrossable breaks: %s)')%(
bin(self.extra_data_flags), self.has_multibytes,
self.has_indexing_bytes, self.has_uncrossable_breaks ))
ans.append('Primary index record (null value: %d): %d'%(0xffffffff,
self.primary_index_record))
ans = '\n'.join(ans)
@ -355,8 +383,706 @@ class MOBIHeader(object):
ans += '\nRecord 0 length: %d'%len(self.raw)
return ans
# }}}
class TagX(object): # {{{
def __init__(self, raw, control_byte_count):
self.tag = ord(raw[0])
self.num_values = ord(raw[1])
self.bitmask = ord(raw[2])
# End of file = 1 iff last entry
# When it is 1 all others are 0
self.eof = ord(raw[3])
self.is_eof = (self.eof == 1 and self.tag == 0 and self.num_values == 0
and self.bitmask == 0)
def __repr__(self):
return 'TAGX(tag=%02d, num_values=%d, bitmask=%r, eof=%d)' % (self.tag,
self.num_values, bin(self.bitmask), self.eof)
# }}}
class IndexHeader(object): # {{{
def __init__(self, record):
self.record = record
raw = self.record.raw
#open('/t/index_header.bin', 'wb').write(raw)
if raw[:4] != b'INDX':
raise ValueError('Invalid Primary Index Record')
self.header_length, = struct.unpack('>I', raw[4:8])
self.unknown1 = raw[8:16]
self.index_type, = struct.unpack('>I', raw[16:20])
self.index_type_desc = {0: 'normal', 2:
'inflection', 6: 'calibre'}.get(self.index_type, 'unknown')
self.idxt_start, = struct.unpack('>I', raw[20:24])
self.index_count, = struct.unpack('>I', raw[24:28])
self.index_encoding_num, = struct.unpack('>I', raw[28:32])
self.index_encoding = {65001: 'utf-8', 1252:
'cp1252'}.get(self.index_encoding_num, 'unknown')
if self.index_encoding == 'unknown':
raise ValueError(
'Unknown index encoding: %d'%self.index_encoding_num)
self.possibly_language = raw[32:36]
self.num_index_entries, = struct.unpack('>I', raw[36:40])
self.ordt_start, = struct.unpack('>I', raw[40:44])
self.ligt_start, = struct.unpack('>I', raw[44:48])
self.num_of_ligt_entries, = struct.unpack('>I', raw[48:52])
self.num_of_cncx_blocks, = struct.unpack('>I', raw[52:56])
self.unknown2 = raw[56:180]
self.tagx_offset, = struct.unpack(b'>I', raw[180:184])
if self.tagx_offset != self.header_length:
raise ValueError('TAGX offset and header length disagree')
self.unknown3 = raw[184:self.header_length]
tagx = raw[self.header_length:]
if not tagx.startswith(b'TAGX'):
raise ValueError('Invalid TAGX section')
self.tagx_header_length, = struct.unpack('>I', tagx[4:8])
self.tagx_control_byte_count, = struct.unpack('>I', tagx[8:12])
tag_table = tagx[12:self.tagx_header_length]
if len(tag_table) % 4 != 0:
raise ValueError('Invalid Tag table')
num_tagx_entries = len(tag_table) // 4
self.tagx_entries = []
for i in range(num_tagx_entries):
self.tagx_entries.append(TagX(tag_table[i*4:(i+1)*4],
self.tagx_control_byte_count))
if self.tagx_entries and not self.tagx_entries[-1].is_eof:
raise ValueError('TAGX last entry is not EOF')
self.tagx_entries = self.tagx_entries[:-1]
idxt0_pos = self.header_length+self.tagx_header_length
last_num, consumed = decode_hex_number(raw[idxt0_pos:])
count_pos = idxt0_pos + consumed
self.ncx_count, = struct.unpack(b'>H', raw[count_pos:count_pos+2])
if last_num != self.ncx_count - 1:
raise ValueError('Last id number in the NCX != NCX count - 1')
# There may be some alignment zero bytes between the end of the idxt0
# and self.idxt_start
idxt = raw[self.idxt_start:]
if idxt[:4] != b'IDXT':
raise ValueError('Invalid IDXT header')
length_check, = struct.unpack(b'>H', idxt[4:6])
if length_check != self.header_length + self.tagx_header_length:
raise ValueError('Length check failed')
def __str__(self):
ans = ['*'*20 + ' Index Header '+ '*'*20]
a = ans.append
def u(w):
a('Unknown: %r (%d bytes) (All zeros: %r)'%(w,
len(w), not bool(w.replace(b'\0', b'')) ))
a('Header length: %d'%self.header_length)
u(self.unknown1)
a('Index Type: %s (%d)'%(self.index_type_desc, self.index_type))
a('Offset to IDXT start: %d'%self.idxt_start)
a('Number of index records: %d'%self.index_count)
a('Index encoding: %s (%d)'%(self.index_encoding,
self.index_encoding_num))
a('Unknown (possibly language?): %r'%(self.possibly_language))
a('Number of index entries: %d'% self.num_index_entries)
a('ORDT start: %d'%self.ordt_start)
a('LIGT start: %d'%self.ligt_start)
a('Number of LIGT entries: %d'%self.num_of_ligt_entries)
a('Number of cncx blocks: %d'%self.num_of_cncx_blocks)
u(self.unknown2)
a('TAGX offset: %d'%self.tagx_offset)
u(self.unknown3)
a('\n\n')
a('*'*20 + ' TAGX Header (%d bytes)'%self.tagx_header_length+ '*'*20)
a('Header length: %d'%self.tagx_header_length)
a('Control byte count: %d'%self.tagx_control_byte_count)
for i in self.tagx_entries:
a('\t' + repr(i))
a('Number of entries in the NCX: %d'% self.ncx_count)
return '\n'.join(ans)
# }}}
class Tag(object): # {{{
'''
Index entries are a collection of tags. Each tag is represented by this
class.
'''
TAG_MAP = {
1: ('offset', 'Offset in HTML'),
2: ('size', 'Size in HTML'),
3: ('label_offset', 'Offset to label in CNCX'),
4: ('depth', 'Depth of this entry in TOC'),
# The remaining tag types have to be interpreted subject to the type
# of index entry they are present in
}
INTERPRET_MAP = {
'subchapter': {
5 : ('Parent chapter index', 'parent_index')
},
'article' : {
5 : ('Class offset in cncx', 'class_offset'),
21 : ('Parent section index', 'parent_index'),
22 : ('Description offset in cncx', 'desc_offset'),
23 : ('Author offset in cncx', 'author_offset'),
},
'chapter_with_subchapters' : {
22 : ('First subchapter index', 'first_child_index'),
23 : ('Last subchapter index', 'last_child_index'),
},
'periodical' : {
5 : ('Class offset in cncx', 'class_offset'),
22 : ('First section index', 'first_child_index'),
23 : ('Last section index', 'last_child_index'),
},
'section' : {
5 : ('Class offset in cncx', 'class_offset'),
21 : ('Periodical index', 'parent_index'),
22 : ('First article index', 'first_child_index'),
23 : ('Last article index', 'last_child_index'),
},
}
def __init__(self, tagx, vals, entry_type, cncx):
self.value = vals if len(vals) > 1 else vals[0]
self.entry_type = entry_type
self.cncx_value = None
if tagx.tag in self.TAG_MAP:
self.attr, self.desc = self.TAG_MAP[tagx.tag]
else:
try:
td = self.INTERPRET_MAP[entry_type]
except:
raise ValueError('Unknown entry type: %s'%entry_type)
try:
self.desc, self.attr = td[tagx.tag]
except:
raise ValueError('Unknown tag: %d for entry type: %s'%(
tagx.tag, entry_type))
if '_offset' in self.attr:
self.cncx_value = cncx[self.value]
def __str__(self):
if self.cncx_value is not None:
return '%s : %r [%r]'%(self.desc, self.value, self.cncx_value)
return '%s : %r'%(self.desc, self.value)
# }}}
class IndexEntry(object): # {{{
'''
The index is made up of entries, each of which is represented by an
instance of this class. Index entries typically point to offsets in the
HTML, specify HTML sizes and point to text strings in the CNCX that are
used in the navigation UI.
'''
TYPES = {
# Present in book type files
0x0f : 'chapter',
0x6f : 'chapter_with_subchapters',
0x1f : 'subchapter',
# Present in periodicals
0xdf : 'periodical',
0xff : 'section',
0x3f : 'article',
}
def __init__(self, ident, entry_type, raw, cncx, tagx_entries, flags=0):
self.index = ident
self.raw = raw
self.tags = []
self.entry_type_raw = entry_type
self.byte_size = len(raw)
orig_raw = raw
try:
self.entry_type = self.TYPES[entry_type]
except KeyError:
raise ValueError('Unknown Index Entry type: %s'%hex(entry_type))
expected_tags = [tag for tag in tagx_entries if tag.bitmask &
entry_type]
for tag in expected_tags:
vals = []
for i in range(tag.num_values):
if not raw:
raise ValueError('Index entry does not match TAGX header')
val, consumed = decint(raw)
raw = raw[consumed:]
vals.append(val)
self.tags.append(Tag(tag, vals, self.entry_type, cncx))
if flags & 0b10:
# Look for optional description and author
desc_tag = [t for t in tagx_entries if t.tag == 22]
if desc_tag and raw:
val, consumed = decint(raw)
raw = raw[consumed:]
if val:
self.tags.append(Tag(desc_tag[0], [val], self.entry_type,
cncx))
if flags & 0b100:
aut_tag = [t for t in tagx_entries if t.tag == 23]
if aut_tag and raw:
val, consumed = decint(raw)
raw = raw[consumed:]
if val:
self.tags.append(Tag(aut_tag[0], [val], self.entry_type,
cncx))
self.consumed = len(orig_raw) - len(raw)
self.trailing_bytes = raw
@property
def label(self):
for tag in self.tags:
if tag.attr == 'label_offset':
return tag.cncx_value
return ''
@property
def offset(self):
for tag in self.tags:
if tag.attr == 'offset':
return tag.value
return 0
@property
def size(self):
for tag in self.tags:
if tag.attr == 'size':
return tag.value
return 0
@property
def depth(self):
for tag in self.tags:
if tag.attr == 'depth':
return tag.value
return 0
@property
def parent_index(self):
for tag in self.tags:
if tag.attr == 'parent_index':
return tag.value
return -1
@property
def first_child_index(self):
for tag in self.tags:
if tag.attr == 'first_child_index':
return tag.value
return -1
@property
def last_child_index(self):
for tag in self.tags:
if tag.attr == 'last_child_index':
return tag.value
return -1
def __str__(self):
ans = ['Index Entry(index=%s, entry_type=%s (%s), length=%d, byte_size=%d)'%(
self.index, self.entry_type, bin(self.entry_type_raw)[2:],
len(self.tags), self.byte_size)]
for tag in self.tags:
ans.append('\t'+str(tag))
if self.first_child_index != -1:
ans.append('\tNumber of children: %d'%(self.last_child_index -
self.first_child_index + 1))
if self.trailing_bytes:
ans.append('\tTrailing bytes: %r'%self.trailing_bytes)
return '\n'.join(ans)
# }}}
class IndexRecord(object): # {{{
'''
Represents all indexing information in the MOBI, apart from indexing info
in the trailing data of the text records.
'''
def __init__(self, record, index_header, cncx):
self.record = record
raw = self.record.raw
if raw[:4] != b'INDX':
raise ValueError('Invalid Primary Index Record')
u = struct.unpack
self.header_length, = u('>I', raw[4:8])
self.unknown1 = raw[8:12]
self.header_type, = u('>I', raw[12:16])
self.unknown2 = raw[16:20]
self.idxt_offset, self.idxt_count = u(b'>II', raw[20:28])
if self.idxt_offset < 192:
raise ValueError('Unknown Index record structure')
self.unknown3 = raw[28:36]
self.unknown4 = raw[36:192] # Should be 156 bytes
self.index_offsets = []
indices = raw[self.idxt_offset:]
if indices[:4] != b'IDXT':
raise ValueError("Invalid IDXT index table")
indices = indices[4:]
for i in range(self.idxt_count):
off, = u(b'>H', indices[i*2:(i+1)*2])
self.index_offsets.append(off-192)
rest = indices[(i+1)*2:]
if rest.replace(b'\0', ''): # There can be padding null bytes
raise ValueError('Extra bytes after IDXT table: %r'%rest)
indxt = raw[192:self.idxt_offset]
self.size_of_indxt_block = len(indxt)
self.indices = []
for i, off in enumerate(self.index_offsets):
try:
next_off = self.index_offsets[i+1]
except:
next_off = len(indxt)
index, consumed = decode_hex_number(indxt[off:])
entry_type = ord(indxt[off+consumed])
d, flags = 1, 0
if index_header.index_type == 6:
flags = ord(indxt[off+consumed+d])
d += 1
pos = off+consumed+d
self.indices.append(IndexEntry(index, entry_type,
indxt[pos:next_off], cncx,
index_header.tagx_entries, flags=flags))
rest = indxt[pos+self.indices[-1].consumed:]
if rest.replace(b'\0', ''): # There can be padding null bytes
raise ValueError('Extra bytes after IDXT table: %r'%rest)
def get_parent(self, index):
if index.depth < 1:
return None
parent_depth = index.depth - 1
for p in self.indices:
if p.depth != parent_depth:
continue
# Assumed completion: the parent is the entry at the parent depth
# whose extent contains this entry's offset
if p.offset <= index.offset < p.offset + p.size:
return p
def __str__(self):
ans = ['*'*20 + ' Index Record (%d bytes) '%len(self.record.raw)+ '*'*20]
a = ans.append
def u(w):
a('Unknown: %r (%d bytes) (All zeros: %r)'%(w,
len(w), not bool(w.replace(b'\0', b'')) ))
a('Header length: %d'%self.header_length)
u(self.unknown1)
a('Unknown (header type? index record number? always 1?): %d'%self.header_type)
u(self.unknown2)
a('IDXT Offset (%d block size): %d'%(self.size_of_indxt_block,
self.idxt_offset))
a('IDXT Count: %d'%self.idxt_count)
u(self.unknown3)
u(self.unknown4)
a('Index offsets: %r'%self.index_offsets)
a('\nIndex Entries (%d entries):'%len(self.indices))
for entry in self.indices:
a(str(entry)+'\n')
return '\n'.join(ans)
# }}}
class CNCX(object) : # {{{
'''
Parses the records that contain the compiled NCX (all strings from the
NCX). Presents a simple offset : string mapping interface to access the
data.
'''
def __init__(self, records, codec):
self.records = OrderedDict()
record_offset = 0
for record in records:
raw = record.raw
pos = 0
while pos < len(raw):
length, consumed = decint(raw[pos:])
if length > 0:
self.records[pos+record_offset] = raw[
pos+consumed:pos+consumed+length].decode(codec)
pos += consumed+length
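# Each CNCX record occupies its own 0x10000-wide window of the offset
# space (inferred from the record_offset increment below)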
record_offset += 0x10000
def __getitem__(self, offset):
return self.records.get(offset)
def __str__(self):
ans = ['*'*20 + ' cncx (%d strings) '%len(self.records)+ '*'*20]
for k, v in self.records.iteritems():
ans.append('%10d : %s'%(k, v))
return '\n'.join(ans)
# }}}
class TextRecord(object): # {{{
def __init__(self, idx, record, extra_data_flags, decompress):
self.trailing_data, self.raw = get_trailing_data(record.raw, extra_data_flags)
raw_trailing_bytes = record.raw[len(self.raw):]
self.raw = decompress(self.raw)
if 0 in self.trailing_data:
self.trailing_data['multibyte_overlap'] = self.trailing_data.pop(0)
if 1 in self.trailing_data:
self.trailing_data['indexing'] = self.trailing_data.pop(1)
if 2 in self.trailing_data:
self.trailing_data['uncrossable_breaks'] = self.trailing_data.pop(2)
self.trailing_data['raw_bytes'] = raw_trailing_bytes
self.idx = idx
def dump(self, folder):
name = '%06d'%self.idx
with open(os.path.join(folder, name+'.txt'), 'wb') as f:
f.write(self.raw)
with open(os.path.join(folder, name+'.trailing_data'), 'wb') as f:
for k, v in self.trailing_data.iteritems():
raw = '%s : %r\n\n'%(k, v)
f.write(raw.encode('utf-8'))
# }}}
class ImageRecord(object): # {{{
def __init__(self, idx, record, fmt):
self.raw = record.raw
self.fmt = fmt
self.idx = idx
def dump(self, folder):
name = '%06d'%self.idx
with open(os.path.join(folder, name+'.'+self.fmt), 'wb') as f:
f.write(self.raw)
# }}}
class BinaryRecord(object): # {{{
def __init__(self, idx, record):
self.raw = record.raw
sig = self.raw[:4]
name = '%06d'%idx
if sig in (b'FCIS', b'FLIS', b'SRCS'):
name += '-' + sig.decode('ascii')
elif sig == b'\xe9\x8e\r\n':
name += '-' + 'EOF'
self.name = name
def dump(self, folder):
with open(os.path.join(folder, self.name+'.bin'), 'wb') as f:
f.write(self.raw)
# }}}
class TBSIndexing(object): # {{{
def __init__(self, text_records, indices, doc_type):
self.record_indices = OrderedDict()
self.doc_type = doc_type
self.indices = indices
pos = 0
for r in text_records:
start = pos
pos += len(r.raw)
end = pos - 1
self.record_indices[r] = x = {'starts':[], 'ends':[],
'complete':[], 'geom': (start, end)}
for entry in indices:
istart, sz = entry.offset, entry.size
iend = istart + sz - 1
has_start = istart >= start and istart <= end
has_end = iend >= start and iend <= end
rec = None
if has_start and has_end:
rec = 'complete'
elif has_start and not has_end:
rec = 'starts'
elif not has_start and has_end:
rec = 'ends'
if rec:
x[rec].append(entry)
def get_index(self, idx):
for i in self.indices:
if i.index == idx: return i
raise IndexError('Index %d not found'%idx)
def __str__(self):
ans = ['*'*20 + ' TBS Indexing (%d records) '%len(self.record_indices)+ '*'*20]
for r, dat in self.record_indices.iteritems():
ans += self.dump_record(r, dat)[-1]
return '\n'.join(ans)
def dump(self, bdir):
types = defaultdict(list)
for r, dat in self.record_indices.iteritems():
tbs_type, strings = self.dump_record(r, dat)
if tbs_type == 0: continue
types[tbs_type] += strings
for typ, strings in types.iteritems():
with open(os.path.join(bdir, 'tbs_type_%d.txt'%typ), 'wb') as f:
f.write('\n'.join(strings))
def dump_record(self, r, dat):
ans = []
ans.append('\nRecord #%d: Starts at: %d Ends at: %d'%(r.idx,
dat['geom'][0], dat['geom'][1]))
s, e, c = dat['starts'], dat['ends'], dat['complete']
ans.append(('\tContains: %d index entries '
'(%d ends, %d complete, %d starts)')%tuple(map(len, (s+e+c, e,
c, s))))
byts = bytearray(r.trailing_data.get('indexing', b''))
sbyts = tuple(hex(b)[2:] for b in byts)
ans.append('TBS bytes: %s'%(' '.join(sbyts)))
for typ, entries in (('Ends', e), ('Complete', c), ('Starts', s)):
if entries:
ans.append('\t%s:'%typ)
for x in entries:
ans.append(('\t\tIndex Entry: %d (Parent index: %d, '
'Depth: %d, Offset: %d, Size: %d) [%s]')%(
x.index, x.parent_index, x.depth, x.offset, x.size, x.label))
def bin4(num):
ans = bin(num)[2:]
return bytes('0'*(4-len(ans)) + ans)
def repr_extra(x):
# Use the argument rather than the enclosing-scope variable
return str({bin4(k):v for k, v in x.iteritems()})
tbs_type = 0
is_periodical = self.doc_type in (257, 258, 259)
if len(byts):
outermost_index, extra, consumed = decode_tbs(byts, flag_size=4 if
is_periodical else 3)
byts = byts[consumed:]
for k in extra:
tbs_type |= k
ans.append('\nTBS: %d (%s)'%(tbs_type, bin4(tbs_type)))
ans.append('Outermost index: %d'%outermost_index)
ans.append('Unknown extra start bytes: %s'%repr_extra(extra))
if is_periodical: # Hierarchical periodical
byts, a = self.interpret_periodical(tbs_type, byts,
dat['geom'][0])
ans += a
if byts:
sbyts = tuple(hex(b)[2:] for b in byts)
ans.append('Remaining bytes: %s'%' '.join(sbyts))
ans.append('')
return tbs_type, ans
def interpret_periodical(self, tbs_type, byts, record_offset):
ans = []
def read_section_transitions(byts, psi=None): # {{{
if psi is None:
# Assume previous section is 1
psi = self.get_index(1)
while byts:
ai, extra, consumed = decode_tbs(byts)
byts = byts[consumed:]
if extra.get(0b0010, None) is not None:
raise ValueError('Dont know how to interpret flag 0b0010'
' while reading section transitions')
if extra.get(0b1000, None) is not None:
if len(extra) > 1:
raise ValueError('Dont know how to interpret flags'
' %r while reading section transitions'%extra)
nsi = self.get_index(psi.index+1)
ans.append('Last article in this record of section %d'
' (relative to next section index [%d]): '
'%d [%d absolute index]'%(psi.index, nsi.index, ai,
ai+nsi.index))
psi = nsi
continue
ans.append('First article in this record of section %d'
' (relative to its parent section): '
'%d [%d absolute index]'%(psi.index, ai, ai+psi.index))
num = extra.get(0b0100, None)
if num is None:
msg = ('The section %d has at most one article'
' in this record')%psi.index
else:
msg = ('Number of articles in this record of '
'section %d: %d')%(psi.index, num)
ans.append(msg)
offset = extra.get(0b0001, None)
if offset is not None:
if offset == 0:
ans.append('This record is spanned by the article:'
'%d'%(ai+psi.index))
else:
ans.append('->Offset to start of next section (%d) from start'
' of record: %d [%d absolute offset]'%(psi.index+1,
offset, offset+record_offset))
return byts
# }}}
def read_starting_section(byts): # {{{
orig = byts
si, extra, consumed = decode_tbs(byts)
byts = byts[consumed:]
if len(extra) > 1 or 0b0010 in extra or 0b1000 in extra:
raise ValueError('Dont know how to interpret flags %r'
' when reading starting section'%extra)
si = self.get_index(si)
ans.append('The section at the start of this record is:'
' %d'%si.index)
if 0b0100 in extra:
num = extra[0b0100]
ans.append('The number of articles from the section %d'
' in this record: %d'%(si.index, num))
elif 0b0001 in extra:
eof = extra[0b0001]
if eof != 0:
raise ValueError('Unknown eof value %s when reading'
' starting section. All bytes: %r'%(eof, orig))
ans.append('This record is spanned by an article from'
' the section: %d'%si.index)
return si, byts
# }}}
if tbs_type & 0b0100:
# Starting section is the first section
ssi = self.get_index(1)
else:
ssi, byts = read_starting_section(byts)
byts = read_section_transitions(byts, ssi)
return byts, ans
# }}}
class MOBIFile(object): # {{{
def __init__(self, stream):
self.raw = stream.read()
@ -384,25 +1110,109 @@ class MOBIFile(object):
self.mobi_header = MOBIHeader(self.records[0])
if 'huff' in self.mobi_header.compression.lower():
huffrecs = [self.records[r].raw for r in
xrange(self.mobi_header.huffman_record_offset,
self.mobi_header.huffman_record_offset +
self.mobi_header.huffman_record_count)]
from calibre.ebooks.mobi.huffcdic import HuffReader
huffs = HuffReader(huffrecs)
decompress = huffs.decompress
elif 'palmdoc' in self.mobi_header.compression.lower():
from calibre.ebooks.compression.palmdoc import decompress_doc
decompress = decompress_doc
else:
decompress = lambda x: x
self.index_header = self.index_record = None
self.indexing_record_nums = set()
pir = self.mobi_header.primary_index_record
if pir != 0xffffffff:
self.index_header = IndexHeader(self.records[pir])
self.cncx = CNCX(self.records[
pir+2:pir+2+self.index_header.num_of_cncx_blocks],
self.index_header.index_encoding)
self.index_record = IndexRecord(self.records[pir+1],
self.index_header, self.cncx)
self.indexing_record_nums = set(xrange(pir,
pir+2+self.index_header.num_of_cncx_blocks))
ntr = self.mobi_header.number_of_text_records
fntbr = self.mobi_header.first_non_book_record
fii = self.mobi_header.first_image_index
if fntbr == 0xffffffff:
fntbr = len(self.records)
self.text_records = [TextRecord(r, self.records[r],
self.mobi_header.extra_data_flags, decompress) for r in xrange(1,
min(len(self.records), ntr+1))]
self.image_records, self.binary_records = [], []
for i in xrange(fntbr, len(self.records)):
if i in self.indexing_record_nums:
continue
r = self.records[i]
fmt = None
if i >= fii and r.raw[:4] not in (b'FLIS', b'FCIS', b'SRCS',
b'\xe9\x8e\r\n'):
try:
width, height, fmt = identify_data(r.raw)
except:
pass
if fmt is not None:
self.image_records.append(ImageRecord(i, r, fmt))
else:
self.binary_records.append(BinaryRecord(i, r))
if self.index_record is not None:
self.tbs_indexing = TBSIndexing(self.text_records,
self.index_record.indices, self.mobi_header.type_raw)
def print_header(self, f=sys.stdout):
print (str(self.palmdb).encode('utf-8'), file=f)
print (file=f)
print ('Record headers:', file=f)
for i, r in enumerate(self.records):
print ('%6d. %s'%(i, r.header), file=f)
print (file=f)
print (str(self.mobi_header).encode('utf-8'), file=f)
# }}}
def inspect_mobi(path_or_stream, prefix='decompiled'):
stream = (path_or_stream if hasattr(path_or_stream, 'read') else
open(path_or_stream, 'rb'))
f = MOBIFile(stream)
ddir = prefix + '_' + os.path.splitext(os.path.basename(stream.name))[0]
try:
shutil.rmtree(ddir)
except:
pass
os.mkdir(ddir)
with open(os.path.join(ddir, 'header.txt'), 'wb') as out:
f.print_header(f=out)
if f.index_header is not None:
with open(os.path.join(ddir, 'index.txt'), 'wb') as out:
print(str(f.index_header), file=out)
print('\n\n', file=out)
print(str(f.cncx).encode('utf-8'), file=out)
print('\n\n', file=out)
print(str(f.index_record), file=out)
with open(os.path.join(ddir, 'tbs_indexing.txt'), 'wb') as out:
print(str(f.tbs_indexing), file=out)
f.tbs_indexing.dump(ddir)
for tdir, attr in [('text', 'text_records'), ('images', 'image_records'),
('binary', 'binary_records')]:
tdir = os.path.join(ddir, tdir)
os.mkdir(tdir)
for rec in getattr(f, attr):
rec.dump(tdir)
print ('Debug data saved to:', ddir)
def main():
inspect_mobi(sys.argv[1])
if __name__ == '__main__':
main()


@ -27,7 +27,7 @@ class MOBIOutput(OutputFormatPlugin):
),
OptionRecommendation(name='no_inline_toc',
recommended_value=False, level=OptionRecommendation.LOW,
help=_('Don\'t add Table of Contents to the book. Useful if '
'the book has its own table of contents.')),
OptionRecommendation(name='toc_title', recommended_value=None,
help=_('Title for any generated in-line table of contents.')
@ -45,6 +45,12 @@ class MOBIOutput(OutputFormatPlugin):
'the MOBI output plugin will try to convert margins specified'
' in the input document, otherwise it will ignore them.')
),
OptionRecommendation(name='mobi_toc_at_start',
recommended_value=False,
help=_('When adding the Table of Contents to the book, add it at the start of the '
'book instead of the end. Not recommended.')
),
])
def check_for_periodical(self):
@ -76,26 +82,6 @@ class MOBIOutput(OutputFormatPlugin):
else:
self.oeb.log.debug('Using mastheadImage supplied in manifest...')
def periodicalize_toc(self):
from calibre.ebooks.oeb.base import TOC
toc = self.oeb.toc
@ -150,24 +136,16 @@ class MOBIOutput(OutputFormatPlugin):
# Fix up the periodical href to point to first section href
toc.nodes[0].href = toc.nodes[0].nodes[0].href
def convert(self, oeb, output_path, input_plugin, opts, log):
self.log, self.opts, self.oeb = log, opts, oeb
from calibre.ebooks.mobi.mobiml import MobiMLizer
from calibre.ebooks.oeb.transforms.manglecase import CaseMangler
from calibre.ebooks.oeb.transforms.rasterize import SVGRasterizer, Unavailable
from calibre.ebooks.oeb.transforms.htmltoc import HTMLTOCAdder
from calibre.customize.ui import plugin_for_input_format
if not opts.no_inline_toc:
tocadder = HTMLTOCAdder(title=opts.toc_title, position='start' if
opts.mobi_toc_at_start else 'end')
tocadder(oeb, opts)
mangler = CaseMangler()
mangler(oeb, opts)
@ -179,10 +157,14 @@ class MOBIOutput(OutputFormatPlugin):
mobimlizer = MobiMLizer(ignore_tables=opts.linearize_tables)
mobimlizer(oeb, opts)
self.check_for_periodical()
write_page_breaks_after_item = input_plugin is not plugin_for_input_format('cbz')
from calibre.utils.config import tweaks
if tweaks.get('new_mobi_writer', False):
from calibre.ebooks.mobi.writer2.main import MobiWriter
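# Bare name reference, presumably so the conditional import is not
# flagged as unused (assumed intent)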
MobiWriter
else:
from calibre.ebooks.mobi.writer import MobiWriter
writer = MobiWriter(opts,
write_page_breaks_after_item=write_page_breaks_after_item)
writer(oeb, output_path)


@ -933,6 +933,9 @@ class MobiReader(object):
continue
processed_records.append(i)
data = self.sections[i][0]
if data[:4] in (b'FLIS', b'FCIS', b'SRCS', b'\xe9\x8e\r\n'):
# A FLIS, FCIS, SRCS or EOF record, ignore
continue
buf = cStringIO.StringIO(data)
image_index += 1
try:
View File
@@ -0,0 +1,363 @@
Reverse engineering the trailing byte sequences for hierarchical periodicals
===============================================================================
In the following, *vwi* means variable width integer and *fvwi* means a vwi whose lowest four bits are used as a flag. All the following information/inferences are from examining the output of kindlegen on a sample periodical. Given the general level of Amazon's incompetence, there are no guarantees that this information is the *best/most complete* way to do TBS indexing.
Sequence encoding:
0b1000 : Continuation bit
First sequences:
0b0010 : 80
0b0011 : 80 80
0b0110 : 80 2
0b0111 : 80 2 80
Other sequences:
0b0101 : 4 1a
0b0001 : c b1
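As a quick sanity check, these sequences can be decoded with the decode_tbs()
helper from calibre.ebooks.mobi.utils added elsewhere in this commit (a sketch,
not part of the kindlegen analysis)::

    from calibre.ebooks.mobi.utils import decode_tbs

    # 86 80 2: fvwi value 0 with flags 0b110, i.e. a vwi (== 0) and a
    # byte (== 2) follow, consuming all three bytes
    val, extra, consumed = decode_tbs(b'\x86\x80\x02')
    assert (val, extra, consumed) == (0, {0b0010: 0, 0b0100: 2}, 3)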
Opening record
----------------
The text record that contains the opening node for the periodical (depth=0 node in the NCX) can have TBS of 3 different forms:
1. If it has only the periodical node and no section/article nodes, TBS of type 2, like this::
Record #1: Starts at: 0 Ends at: 4095
Contains: 1 index entries (0 ends, 0 complete, 1 starts)
TBS bytes: 82 80
Starts:
Index Entry: 0 (Parent index: -1, Depth: 0, Offset: 215, Size: 68470) [j_x's Google reader]
TBS Type: 010 (2)
Outer Index entry: 0
Unknown (vwi: always 0?): 0
2. A periodical and a section node, but no article nodes, TBS type of 6, like this::
Record #1: Starts at: 0 Ends at: 4095
Contains: 2 index entries (0 ends, 0 complete, 2 starts)
TBS bytes: 86 80 2
Starts:
Index Entry: 0 (Parent index: -1, Depth: 0, Offset: 215, Size: 93254) [j_x's Google reader]
Index Entry: 1 (Parent index: 0, Depth: 1, Offset: 541, Size: 49280) [Ars Technica]
TBS Type: 110 (6)
Outer Index entry: 0
Unknown (vwi: always 0?): 0
Unknown (byte: always 2?): 2
3. If it has both the section 1 node and at least one article node, TBS of type 6, like this::
Record #1: Starts at: 0 Ends at: 4095
Contains: 4 index entries (0 ends, 1 complete, 3 starts)
TBS bytes: 86 80 2 c4 2
Complete:
Index Entry: 5 (Parent index: 1, Depth: 2, Offset: 549, Size: 1866) [Week in gaming: 3DS review, Crysis 2, George Hotz]
Starts:
Index Entry: 0 (Parent index: -1, Depth: 0, Offset: 215, Size: 79253) [j_x's Google reader]
Index Entry: 1 (Parent index: 0, Depth: 1, Offset: 541, Size: 35279) [Ars Technica]
Index Entry: 6 (Parent index: 1, Depth: 2, Offset: 2415, Size: 2764) [Week in Apple: ZFS on Mac OS X, rogue tethering, DUI apps, and more]
TBS Type: 110 (6)
Outer Index entry: 0
Unknown (vwi: always 0?): 0
Unknown (byte: always 2?): 2
Article index at start of record or first article index, relative to parent section (fvwi): 4 [5 absolute]
Number of article nodes in the record (byte): 2
If there was only a single article, instead of 2, then the last two bytes would be: c0, i.e. there would be no byte giving the number of articles in the record.
Starting record with two section transitions::
Record #1: Starts at: 0 Ends at: 4095
Contains: 7 index entries (0 ends, 4 complete, 3 starts)
TBS bytes: 86 80 2 c0 b8 c4 3
Complete:
Index Entry: 1 (Parent index: 0, Depth: 1, Offset: 564, Size: 375) [Ars Technica]
Index Entry: 5 (Parent index: 1, Depth: 2, Offset: 572, Size: 367) [Week in gaming: 3DS review, Crysis 2, George Hotz]
Index Entry: 6 (Parent index: 2, Depth: 2, Offset: 947, Size: 1014) [Max and the Magic Marker for iPad: Review]
Index Entry: 7 (Parent index: 2, Depth: 2, Offset: 1961, Size: 1077) [iPad 2 steers itself into home console gaming territory with Real Racing 2 HD]
Starts:
Index Entry: 0 (Parent index: -1, Depth: 0, Offset: 215, Size: 35372) [j_x's Google reader]
Index Entry: 2 (Parent index: 0, Depth: 1, Offset: 939, Size: 10368) [Neowin.net]
Index Entry: 8 (Parent index: 2, Depth: 2, Offset: 3038, Size: 1082) [Microsoft's Joe Belfiore still working on upcoming Zune hardware]
TBS Type: 110 (6)
Outer Index entry: 0
Unknown (vwi: always 0?): 0
Unknown (byte: always 2?): 2
Article index at start of record or first article index, relative to parent section (fvwi): 4 [5 absolute]
Remaining bytes: b8 c4 3
Starting record with three section transitions::
Record #1: Starts at: 0 Ends at: 4095
Contains: 10 index entries (0 ends, 7 complete, 3 starts)
TBS bytes: 86 80 2 c0 b8 c0 b8 c4 4
Complete:
Index Entry: 1 (Parent index: 0, Depth: 1, Offset: 564, Size: 375) [Ars Technica]
Index Entry: 2 (Parent index: 0, Depth: 1, Offset: 939, Size: 316) [Neowin.net]
Index Entry: 5 (Parent index: 1, Depth: 2, Offset: 572, Size: 367) [Week in gaming: 3DS review, Crysis 2, George Hotz]
Index Entry: 6 (Parent index: 2, Depth: 2, Offset: 947, Size: 308) [Max and the Magic Marker for iPad: Review]
Index Entry: 7 (Parent index: 3, Depth: 2, Offset: 1263, Size: 760) [OSnews Asks on Interrupts: The Results]
Index Entry: 8 (Parent index: 3, Depth: 2, Offset: 2023, Size: 693) [Apple Ditches SAMBA in Favour of Homegrown Replacement]
Index Entry: 9 (Parent index: 3, Depth: 2, Offset: 2716, Size: 747) [ITC: Apple's Mobile Products Do Not Violate Nokia Patents]
Starts:
Index Entry: 0 (Parent index: -1, Depth: 0, Offset: 215, Size: 25320) [j_x's Google reader]
Index Entry: 3 (Parent index: 0, Depth: 1, Offset: 1255, Size: 6829) [OSNews]
Index Entry: 10 (Parent index: 3, Depth: 2, Offset: 3463, Size: 666) [Transparent Monitor Embedded in Window Glass]
TBS Type: 110 (6)
Outer Index entry: 0
Unknown (vwi: always 0?): 0
Unknown (byte: always 2?): 2
Article index at start of record or first article index, relative to parent section (fvwi): 4 [5 absolute]
Remaining bytes: b8 c0 b8 c4 4
Records with no nodes
------------------------
subtype = 010
These records are spanned by a single article. They are of two types:
1. If the parent section index is 1, TBS type of 6, like this::
Record #4: Starts at: 12288 Ends at: 16383
Contains: 0 index entries (0 ends, 0 complete, 0 starts)
TBS bytes: 86 80 2 c1 80
TBS Type: 110 (6)
Outer Index entry: 0
Unknown (vwi: always 0?): 0
Unknown (byte: always 2?): 2
Article index at start of record or first article index, relative to parent section (fvwi): 4 [5 absolute]
EOF (vwi: should be 0): 0
If the record is before the first article, the TBS bytes would be: 86 80 2
2. If the parent section index is > 1, TBS type of 2, like this::
Record #14: Starts at: 53248 Ends at: 57343
Contains: 0 index entries (0 ends, 0 complete, 0 starts)
TBS bytes: 82 80 a0 1 e1 80
TBS Type: 010 (2)
Outer Index entry: 0
Unknown (vwi: always 0?): 0
Parent section index (fvwi): 2
Flags: 0
Article index at start of record or first article index, relative to parent section (fvwi): 14 [16 absolute]
EOF (vwi: should be 0): 0
Records with only article nodes
-----------------------------------
Such records have no section transitions (i.e. a section end/section start pair). They have only one or more article nodes. They are of two types:
1. If the parent section index is 1, TBS type of 7, like this::
Record #6: Starts at: 20480 Ends at: 24575
Contains: 2 index entries (1 ends, 0 complete, 1 starts)
TBS bytes: 87 80 2 80 1 84 2
Ends:
Index Entry: 9 (Parent index: 1, Depth: 2, Offset: 16453, Size: 4199) [Vaccine's success spurs whooping cough comeback]
Starts:
Index Entry: 10 (Parent index: 1, Depth: 2, Offset: 20652, Size: 4246) [Apple's mobile products do not violate Nokia patents, says ITC]
TBS Type: 111 (7)
Outer Index entry: 0
Unknown (vwi: always 0?): 0
Unknown: '\x02\x80' (vwi?: Always 256)
Article at start of record (fvwi): 8
Number of articles in record (byte): 2
If there was only one article in the record, the last two bytes would be replaced by a single byte: 80
If this record is the first record with an article, then the article at the start of the record should be the last section index. At least, that's what kindlegen does, though if you ask me, it should be the first section index.
2. If the parent section index is > 1, TBS type of 2, like this::
Record #16: Starts at: 61440 Ends at: 65535
Contains: 5 index entries (1 ends, 3 complete, 1 starts)
TBS bytes: 82 80 a1 80 1 f4 5
Ends:
Index Entry: 17 (Parent index: 2, Depth: 2, Offset: 60920, Size: 1082) [Microsoft's Joe Belfiore still working on upcoming Zune hardware]
Complete:
Index Entry: 18 (Parent index: 2, Depth: 2, Offset: 62002, Size: 1016) [Rumour: OS X Lion nearing Golden Master stage]
Index Entry: 19 (Parent index: 2, Depth: 2, Offset: 63018, Size: 1045) [iOS 4.3.1 released]
Index Entry: 20 (Parent index: 2, Depth: 2, Offset: 64063, Size: 972) [Windows 8 'system reset' image leaks]
Starts:
Index Entry: 21 (Parent index: 2, Depth: 2, Offset: 65035, Size: 1057) [Windows Phone 7: Why it's failing]
TBS Type: 010 (2)
Outer Index entry: 0
Unknown (vwi: always 0?): 0
Parent section index (fvwi) : 2
Flags: 1
Unknown (vwi: always 0?): 0
Article index at start of record or first article index, relative to parent section (fvwi): 15 [17 absolute]
Number of article nodes in the record (byte): 5
If there was only one article in the record, the last two bytes would be replaced by a single byte: f0
Records with a section transition
-----------------------------------
In such a record there is a transition from one section to the next. As such the record must have at least one article ending and one article starting, except in the case of the first section.
1. The first section::
Record #2: Starts at: 4096 Ends at: 8191
Contains: 2 index entries (0 ends, 0 complete, 2 starts)
TBS bytes: 83 80 80 90 c0
Starts:
Index Entry: 1 (Parent index: 0, Depth: 1, Offset: 7758, Size: 26279) [Ars Technica]
Index Entry: 5 (Parent index: 1, Depth: 2, Offset: 7766, Size: 1866) [Week in gaming: 3DS review, Crysis 2, George Hotz]
TBS Type: 011 (3)
Outer Index entry: 0
Unknown (vwi: always 0?): 0
Unknown (vwi: always 0?): 0
First section index (fvwi) : 1
Extra bits: 0
First section starts
Article at start of block as offset from parent index (fvwi): 4 [5 absolute]
Flags: 0
If there was more than one article at the start then the last byte would be replaced by: c4 n where n is the number of articles
2. A record with a section transition and only one article from the ending section::
Record #9: Starts at: 32768 Ends at: 36863
Contains: 6 index entries (2 ends, 2 complete, 2 starts)
TBS bytes: 83 80 80 90 1 d0 1 c8 1 d4 3
Ends:
Index Entry: 1 (Parent index: 0, Depth: 1, Offset: 7758, Size: 26279) [Ars Technica]
Index Entry: 14 (Parent index: 1, Depth: 2, Offset: 31929, Size: 2108) [Trademarked keyword sales may soon be restricted in Europe]
Complete:
Index Entry: 15 (Parent index: 2, Depth: 2, Offset: 34045, Size: 1014) [Max and the Magic Marker for iPad: Review]
Index Entry: 16 (Parent index: 2, Depth: 2, Offset: 35059, Size: 1077) [iPad 2 steers itself into home console gaming territory with Real Racing 2 HD]
Starts:
Index Entry: 2 (Parent index: 0, Depth: 1, Offset: 34037, Size: 10368) [Neowin.net]
Index Entry: 17 (Parent index: 2, Depth: 2, Offset: 36136, Size: 1082) [Microsoft's Joe Belfiore still working on upcoming Zune hardware]
TBS Type: 011 (3)
Outer Index entry: 0
Unknown (vwi: always 0?): 0
Unknown (vwi: always 0?): 0
First section index (fvwi): 1
Extra bits (flag: always 0?): 0
First article of ending section, relative to its parent's index (fvwi): 13 [14 absolute]
Last article of ending section w.r.t. starting section offset (fvwi): 12 [14 absolute]
Flags (always 8?): 8
Article index at start of record or first article index, relative to parent section (fvwi): 13 [15 absolute]
Number of article nodes in the record (byte): 3
3. A record with a section transition and more than one article from the ending section::
Record #11: Starts at: 40960 Ends at: 45055
Contains: 7 index entries (2 ends, 3 complete, 2 starts)
TBS bytes: 83 80 80 a0 2 b5 4 1a f5 2 d8 2 e0
Ends:
Index Entry: 2 (Parent index: 0, Depth: 1, Offset: 34037, Size: 10368) [Neowin.net]
Index Entry: 21 (Parent index: 2, Depth: 2, Offset: 40251, Size: 1057) [Windows Phone 7: Why it's failing]
Complete:
Index Entry: 22 (Parent index: 2, Depth: 2, Offset: 41308, Size: 1050) [RIM announces Android app support for Blackberry Playbook]
Index Entry: 23 (Parent index: 2, Depth: 2, Offset: 42358, Size: 1087) [Microsoft buys $7.5m worth of IPv4 addresses]
Index Entry: 24 (Parent index: 2, Depth: 2, Offset: 43445, Size: 960) [TechSpot: Apple iPad 2 Review]
Starts:
Index Entry: 3 (Parent index: 0, Depth: 1, Offset: 44405, Size: 6829) [OSNews]
Index Entry: 25 (Parent index: 3, Depth: 2, Offset: 44413, Size: 760) [OSnews Asks on Interrupts: The Results]
TBS Type: 011 (3)
Outer Index entry: 0
Unknown (vwi: always 0?): 0
Unknown (vwi: always 0?): 0
First section index (fvwi): 2
Extra bits (flag: always 0?): 0
First article of ending section, relative to its parent's index (fvwi): 19 [21 absolute]
Number of article nodes in the record (byte): 4
->Offset from start of record to beginning of last starting section in this record (vwi): 3445
Last article of ending section w.r.t. starting section offset (fvwi): 21 [24 absolute]
Flags (always 8?): 8
Article index at start of record or first article index, relative to parent section (fvwi): 22 [25 absolute]
The difference from the previous case is the extra two bytes that encode the offset of the opening section from the start of the record.
4. A record with multiple section transitions::
Record #9: Starts at: 32768 Ends at: 36863
Contains: 9 index entries (2 ends, 5 complete, 2 starts)
TBS bytes: 83 80 80 90 1 d0 1 c8 1 d1 c b1 1 c8 1 d4 4
Ends:
Index Entry: 1 (Parent index: 0, Depth: 1, Offset: 7758, Size: 26279) [Ars Technica]
Index Entry: 14 (Parent index: 1, Depth: 2, Offset: 31929, Size: 2108) [Trademarked keyword sales may soon be restricted in Europe]
Complete:
Index Entry: 2 (Parent index: 0, Depth: 1, Offset: 34037, Size: 316) [Neowin.net]
Index Entry: 15 (Parent index: 2, Depth: 2, Offset: 34045, Size: 308) [Max and the Magic Marker for iPad: Review]
Index Entry: 16 (Parent index: 3, Depth: 2, Offset: 34361, Size: 760) [OSnews Asks on Interrupts: The Results]
Index Entry: 17 (Parent index: 3, Depth: 2, Offset: 35121, Size: 693) [Apple Ditches SAMBA in Favour of Homegrown Replacement]
Index Entry: 18 (Parent index: 3, Depth: 2, Offset: 35814, Size: 747) [ITC: Apple's Mobile Products Do Not Violate Nokia Patents]
Starts:
Index Entry: 3 (Parent index: 0, Depth: 1, Offset: 34353, Size: 6829) [OSNews]
Index Entry: 19 (Parent index: 3, Depth: 2, Offset: 36561, Size: 666) [Transparent Monitor Embedded in Window Glass]
TBS Type: 011 (3)
Outer Index entry: 0
Unknown (vwi: always 0?): 0
Unknown (vwi: always 0?): 0
First section index (fvwi): 1
Extra bits (flag: always 0?): 0
First article of ending section, relative to its parent's index (fvwi): 13 [14 absolute]
Last article of ending section w.r.t. starting section offset (fvwi): 12 [14 absolute]
Flags (always 8?): 8
Article index at start of record or first article index, relative to parent section (fvwi): 13 [15 absolute]
->Offset from start of record to beginning of next starting section in this record: 1585
Last article of ending section w.r.t. starting section offset (fvwi): 12 [15 absolute]
Flags (always 8?): 8
Article index at start of record or first article index, relative to parent section (fvwi): 13 [16 absolute]
Number of article nodes in the record belonging to the last section (byte): 4
Ending record
----------------
Logically, ending records must have at least one article ending, one section ending and the periodical ending. They are of TBS type 2, like this::
Record #17: Starts at: 65536 Ends at: 68684
Contains: 4 index entries (3 ends, 1 complete, 0 starts)
TBS bytes: 82 80 c0 4 f4 2
Ends:
Index Entry: 0 (Parent index: -1, Depth: 0, Offset: 215, Size: 68470) [j_x's Google reader]
Index Entry: 4 (Parent index: 0, Depth: 1, Offset: 51234, Size: 17451) [Slashdot]
Index Entry: 43 (Parent index: 4, Depth: 2, Offset: 65422, Size: 1717) [US ITC May Reverse Judge's Ruling In Kodak vs. Apple]
Complete:
Index Entry: 44 (Parent index: 4, Depth: 2, Offset: 67139, Size: 1546) [Google Starts Testing Google Music Internally]
TBS Type: 010 (2)
Outer Index entry: 0
Unknown (vwi: always 0?): 0
Parent section index (fvwi): 4
Flags: 0
Article at start of block as offset from parent index (fvwi): 39 [43 absolute]
Number of nodes (byte): 2
If the record had only a single article end, the last two bytes would be replaced with: f0
If the last record has multiple section transitions, it is of type 6 and looks like::
Record #9: Starts at: 32768 Ends at: 34953
Contains: 9 index entries (3 ends, 6 complete, 0 starts)
TBS bytes: 86 80 2 1 d0 1 c8 1 d0 1 c8 1 d0 1 c8 1 d0
Ends:
Index Entry: 0 (Parent index: -1, Depth: 0, Offset: 215, Size: 34739) [j_x's Google reader]
Index Entry: 1 (Parent index: 0, Depth: 1, Offset: 7758, Size: 26279) [Ars Technica]
Index Entry: 14 (Parent index: 1, Depth: 2, Offset: 31929, Size: 2108) [Trademarked keyword sales may soon be restricted in Europe]
Complete:
Index Entry: 2 (Parent index: 0, Depth: 1, Offset: 34037, Size: 316) [Neowin.net]
Index Entry: 3 (Parent index: 0, Depth: 1, Offset: 34353, Size: 282) [OSNews]
Index Entry: 4 (Parent index: 0, Depth: 1, Offset: 34635, Size: 319) [Slashdot]
Index Entry: 15 (Parent index: 2, Depth: 2, Offset: 34045, Size: 308) [Max and the Magic Marker for iPad: Review]
Index Entry: 16 (Parent index: 3, Depth: 2, Offset: 34361, Size: 274) [OSnews Asks on Interrupts: The Results]
Index Entry: 17 (Parent index: 4, Depth: 2, Offset: 34643, Size: 311) [Leonard Nimoy Turns 80]
TBS Type: 110 (6)
Outer Index entry: 0
Unknown (vwi: always 0?): 0
Unknown (byte: always 2?): 2
Article index at start of record or first article index, relative to parent section (fvwi): 13 [14 absolute]
Remaining bytes: 1 c8 1 d0 1 c8 1 d0 1 c8 1 d0
View File
@@ -0,0 +1,334 @@
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import (unicode_literals, division, absolute_import,
print_function)
__license__ = 'GPL v3'
__copyright__ = '2011, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
import struct
from collections import OrderedDict
from calibre.utils.magick.draw import Image, save_cover_data_to, thumbnail
from calibre.ebooks import normalize
IMAGE_MAX_SIZE = 10 * 1024 * 1024
def decode_hex_number(raw):
'''
Decode a variable length number stored using hexadecimal encoding. These
numbers have a first byte that tells the number of bytes that follow.
The bytes that follow are simply the hexadecimal representation of the
number.
:param raw: Raw binary data as a bytestring
:return: The number and the number of bytes from raw that the number
occupies
'''
length, = struct.unpack(b'>B', raw[0])
raw = raw[1:1+length]
consumed = length+1
return int(raw, 16), consumed
def encode_number_as_hex(num):
'''
Encode num as a variable length hexadecimal number. Returns the
bytestring containing the encoded number. These
numbers have a first byte that tells the number of bytes that follow.
The bytes that follow are simply the hexadecimal representation of the
number.
'''
num = bytes(hex(num)[2:].upper())
nlen = len(num)
if nlen % 2 != 0:
num = b'0'+num
ans = bytearray(num)
ans.insert(0, len(num))
return bytes(ans)
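# Illustrative round-trip (a sketch, not in the original file): hex(4095) is
# 'fff', which is padded to '0FFF', so:
#   encode_number_as_hex(4095) == b'\x040FFF'
#   decode_hex_number(b'\x040FFF') == (4095, 5)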
def encint(value, forward=True):
'''
Some parts of the Mobipocket format encode data as variable-width integers.
These integers are represented big-endian with 7 bits per byte in bits 1-7.
They may be either forward-encoded, in which case only the first byte has bit 8 set,
or backward-encoded, in which case only the last byte has bit 8 set.
For example, the number 0x11111 = 0b10001000100010001 would be represented
forward-encoded as:
0x04 0x22 0x91 = 0b100 0b100010 0b10010001
And backward-encoded as:
0x84 0x22 0x11 = 0b10000100 0b100010 0b10001
This function encodes the integer ``value`` as a variable width integer and
returns the bytestring corresponding to it.
If forward is True the bytes returned are suitable for prepending to the
output buffer, otherwise they must be appended to the output buffer.
'''
if value < 0:
raise ValueError('Cannot encode negative numbers as vwi')
# Encode vwi
byts = bytearray()
while True:
b = value & 0b01111111
value >>= 7 # shift value to the right by 7 bits
byts.append(b)
if value == 0:
break
byts[0 if forward else -1] |= 0b10000000
byts.reverse()
return bytes(byts)
def decint(raw, forward=True):
'''
Read a variable width integer from the bytestring or bytearray raw and return the
integer and the number of bytes read. If forward is True bytes are read
from the start of raw, otherwise from the end of raw.
This function is the inverse of encint above, see its docs for more
details.
'''
val = 0
byts = bytearray()
src = bytearray(raw)
if not forward:
src.reverse()
for bnum in src:
byts.append(bnum & 0b01111111)
if bnum & 0b10000000:
break
if not forward:
byts.reverse()
for byte in byts:
val <<= 7 # Shift value to the left by 7 bits
val |= byte
return val, len(byts)
def test_decint(num):
for d in (True, False):
raw = encint(num, forward=d)
sz = len(raw)
if (num, sz) != decint(raw, forward=d):
raise ValueError('Failed for num %d, forward=%r: %r != %r' % (
num, d, (num, sz), decint(raw, forward=d)))
def rescale_image(data, maxsizeb=IMAGE_MAX_SIZE, dimen=None):
'''
Convert the image, setting all transparent pixels to white and changing
the format to JPEG. Ensure the resultant image has a byte size less than
maxsizeb.
If dimen is not None, generate a thumbnail of width=dimen, height=dimen
Returns the image as a bytestring
'''
if dimen is not None:
data = thumbnail(data, width=dimen, height=dimen,
compression_quality=90)[-1]
else:
# Replace transparent pixels with white pixels and convert to JPEG
data = save_cover_data_to(data, 'img.jpg', return_data=True)
if len(data) <= maxsizeb:
return data
orig_data = data
img = Image()
quality = 95
img.load(data)
while len(data) >= maxsizeb and quality >= 10:
quality -= 5
img.set_compression_quality(quality)
data = img.export('jpg')
if len(data) <= maxsizeb:
return data
orig_data = data
scale = 0.9
while len(data) >= maxsizeb and scale >= 0.05:
img = Image()
img.load(orig_data)
w, h = img.size
img.size = (int(scale*w), int(scale*h))
img.set_compression_quality(quality)
data = img.export('jpg')
scale -= 0.05
return data
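# Usage sketch (cover_data assumed to be raw image bytes): produce a
# 180x180 JPEG thumbnail kept under maxsizeb:
#   thumb = rescale_image(cover_data, dimen=180)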
def get_trailing_data(record, extra_data_flags):
'''
Given a text record as a bytestring and the extra data flags from the MOBI
header, return the trailing data as a dictionary, mapping bit number to
data as a bytestring. Also returns the record with the trailing data
removed.
:return: (trailing data dict, record without its trailing data)
'''
data = OrderedDict()
for i in xrange(16, -1, -1):
flag = 1 << i # 2**i
if flag & extra_data_flags:
if i == 0:
# Only the first two bits are used for the size since there can
# never be more than 3 trailing multibyte chars
sz = (ord(record[-1]) & 0b11) + 1
consumed = 1
else:
sz, consumed = decint(record, forward=False)
if sz > consumed:
data[i] = record[-sz:-consumed]
record = record[:-sz]
return data, record
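# e.g. with extra_data_flags == 0b10 only bit 1 is set, so the returned dict
# maps 1 -> that trailing entry's payload with its backward-encoded size vwi
# stripped (bit 1 is where the TBS indexing bytes normally live)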
def encode_trailing_data(raw):
'''
Given some data in the bytestring raw, return a bytestring of the form
<data><size>
where size is a backwards encoded vwi whose value is the length of the
entire returned bytestring. data is the bytestring passed in as raw.
This is the encoding used for trailing data entries at the end of text
records. See get_trailing_data() for details.
'''
lsize = 1
while True:
encoded = encint(len(raw) + lsize, forward=False)
if len(encoded) == lsize:
break
lsize += 1
return raw + encoded
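# e.g. encode_trailing_data(b'ab') == b'ab\x83': 0x83 is the backward-encoded
# vwi 3, the length of the entire returned bytestring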
def encode_fvwi(val, flags, flag_size=4):
'''
Encode the value val and the lowest flag_size bits of flags as an fvwi. This encoding is
used in the trailing byte sequences for indexing. Returns encoded
bytestring.
'''
ans = val << flag_size
for i in xrange(flag_size):
ans |= (flags & (1 << i))
return encint(ans)
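# e.g. encode_fvwi(1, 0b0010) == b'\x92': value 1 in the high bits, flag
# 0b0010 in the low four bits, 0b00010010 encoded as the single-byte vwi 0x92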
def decode_fvwi(byts, flag_size=4):
'''
Decode encoded fvwi. Returns number, flags, consumed
'''
arg, consumed = decint(bytes(byts))
val = arg >> flag_size
flags = 0
for i in xrange(flag_size):
flags |= (arg & (1 << i))
return val, flags, consumed
def decode_tbs(byts, flag_size=4):
'''
Trailing byte sequences for indexing consist of a series of fvwi numbers.
This function reads the first fvwi number and its associated flags. It then
uses the flags to read any more numbers that belong to the series. The
flags are the lowest 4 bits of the vwi (see the encode_fvwi function
above).
Returns the fvwi number, a dictionary mapping flag bits to the associated
data and the number of bytes consumed.
'''
byts = bytes(byts)
val, flags, consumed = decode_fvwi(byts, flag_size=flag_size)
extra = {}
byts = byts[consumed:]
if flags & 0b1000 and flag_size > 3:
extra[0b1000] = True
if flags & 0b0010:
x, consumed2 = decint(byts)
byts = byts[consumed2:]
extra[0b0010] = x
consumed += consumed2
if flags & 0b0100:
extra[0b0100] = ord(byts[0])
byts = byts[1:]
consumed += 1
if flags & 0b0001:
x, consumed2 = decint(byts)
byts = byts[consumed2:]
extra[0b0001] = x
consumed += consumed2
return val, extra, consumed
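# e.g. the TBS bytes 86 80 2 from the periodical analysis elsewhere in this
# commit decode as:
#   decode_tbs(b'\x86\x80\x02') == (0, {0b0010: 0, 0b0100: 2}, 3)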
def encode_tbs(val, extra, flag_size=4):
'''
Encode the number val and the extra data in the extra dict as an fvwi. See
decode_tbs above.
'''
flags = 0
for flag in extra:
flags |= flag
ans = encode_fvwi(val, flags, flag_size=flag_size)
if 0b0010 in extra:
ans += encint(extra[0b0010])
if 0b0100 in extra:
ans += bytes(bytearray([extra[0b0100]]))
if 0b0001 in extra:
ans += encint(extra[0b0001])
return ans
def utf8_text(text):
'''
Convert a possibly null string to utf-8 bytes, guaranteeing to return a
non-empty, normalized bytestring.
'''
if text and text.strip():
text = text.strip()
if not isinstance(text, unicode):
text = text.decode('utf-8', 'replace')
text = normalize(text).encode('utf-8')
else:
text = _('Unknown').encode('utf-8')
return text
def align_block(raw, multiple=4, pad=b'\0'):
'''
Return raw with enough pad bytes appended to ensure its length is a
multiple of the given multiple (4 by default).
'''
extra = len(raw) % multiple
if extra == 0: return raw
return raw + pad*(multiple - extra)
def detect_periodical(toc, log=None):
'''
Detect if the TOC object toc contains a periodical that conforms to the
structure required by kindlegen to generate a periodical.
'''
for node in toc.iterdescendants():
if node.depth() == 1 and node.klass != 'article':
if log is not None:
log.debug(
'Not a periodical: Deepest node does not have '
'class="article"')
return False
if node.depth() == 2 and node.klass != 'section':
if log is not None:
log.debug(
'Not a periodical: Second deepest node does not have'
' class="section"')
return False
if node.depth() == 3 and node.klass != 'periodical':
if log is not None:
log.debug('Not a periodical: Third deepest node'
' does not have class="periodical"')
return False
if node.depth() > 3:
if log is not None:
log.debug('Not a periodical: Has nodes of depth > 3')
return False
return True
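# A conforming TOC is shaped like this (a sketch; depth() counts up from the
# leaves, so articles are at depth 1, sections at 2, the periodical at 3):
#   periodical
#       section
#           article
#           article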
View File
@@ -111,7 +111,8 @@ def align_block(raw, multiple=4, pad='\0'):
def rescale_image(data, maxsizeb, dimen=None):
if dimen is not None:
data = thumbnail(data, width=dimen, height=dimen)[-1]
data = thumbnail(data, width=dimen[0], height=dimen[1],
compression_quality=90)[-1]
else:
# Replace transparent pixels with white pixels and convert to JPEG
data = save_cover_data_to(data, 'img.jpg', return_data=True)
@@ -141,7 +142,7 @@ def rescale_image(data, maxsizeb, dimen=None):
scale -= 0.05
return data
class Serializer(object):
class Serializer(object): # {{{
NSRMAP = {'': None, XML_NS: 'xml', XHTML_NS: '', MBP_NS: 'mbp'}
def __init__(self, oeb, images, write_page_breaks_after_item=True):
@@ -172,6 +173,9 @@ class Serializer(object):
hrefs = self.oeb.manifest.hrefs
buffer.write('<guide>')
for ref in self.oeb.guide.values():
# The Kindle decides where to open a book based on the presence of
# an item in the guide that looks like
# <reference type="text" title="Start" href="chapter-one.xhtml"/>
path = urldefrag(ref.href)[0]
if path not in hrefs or hrefs[path].media_type not in OEB_DOCS:
continue
@@ -215,12 +219,6 @@ class Serializer(object):
self.anchor_offset = buffer.tell()
buffer.write('<body>')
self.anchor_offset_kindle = buffer.tell()
# CybookG3 'Start Reading' link
if 'text' in self.oeb.guide:
href = self.oeb.guide['text'].href
buffer.write('<a ')
self.serialize_href(href)
buffer.write(' />')
spine = [item for item in self.oeb.spine if item.linear]
spine.extend([item for item in self.oeb.spine if not item.linear])
for item in spine:
@@ -315,16 +313,20 @@ class Serializer(object):
buffer.seek(hoff)
buffer.write('%010d' % ioff)
# }}}
class MobiWriter(object):
COLLAPSE_RE = re.compile(r'[ \t\r\n\v]+')
def __init__(self, opts, compression=PALMDOC, imagemax=None,
prefer_author_sort=False, write_page_breaks_after_item=True):
def __init__(self, opts,
write_page_breaks_after_item=True):
self.opts = opts
self.write_page_breaks_after_item = write_page_breaks_after_item
self._compression = compression or UNCOMPRESSED
self._imagemax = imagemax or OTHER_MAX_IMAGE_SIZE
self._prefer_author_sort = prefer_author_sort
self._compression = UNCOMPRESSED if getattr(opts, 'dont_compress',
False) else PALMDOC
self._imagemax = (PALM_MAX_IMAGE_SIZE if getattr(opts,
'rescale_images', False) else OTHER_MAX_IMAGE_SIZE)
self._prefer_author_sort = getattr(opts, 'prefer_author_sort', False)
self._primary_index_record = None
self._conforming_periodical_toc = False
self._indexable = False
@@ -1258,11 +1260,11 @@ class MobiWriter(object):
data = compress_doc(data)
record = StringIO()
record.write(data)
# Write trailing muti-byte sequence if any
record.write(overlap)
record.write(pack('>B', len(overlap)))
# Marshall's utf-8 break code.
if WRITE_PBREAKS :
record.write(overlap)
record.write(pack('>B', len(overlap)))
nextra = 0
pbreak = 0
running = offset
@@ -1325,6 +1327,8 @@ class MobiWriter(object):
except:
self._oeb.logger.warn('Bad image file %r' % item.href)
continue
finally:
item.unload_data_from_memory()
self._records.append(data)
if self._first_image_record is None:
self._first_image_record = len(self._records)-1
@@ -1638,6 +1642,61 @@ class MobiWriter(object):
for record in self._records:
self._write(record)
def _clean_text_value(self, text):
if text is not None and text.strip() :
text = text.strip()
if not isinstance(text, unicode):
text = text.decode('utf-8', 'replace')
text = normalize(text).encode('utf-8')
else :
text = "(none)".encode('utf-8')
return text
def _compute_offset_length(self, i, node, entries) :
h = node.href
if h not in self._id_offsets:
self._oeb.log.warning('Could not find TOC entry:', node.title)
return -1, -1
offset = self._id_offsets[h]
length = None
# Calculate length based on next entry's offset
for sibling in entries[i+1:]:
h2 = sibling.href
if h2 in self._id_offsets:
offset2 = self._id_offsets[h2]
if offset2 > offset:
length = offset2 - offset
break
if length is None:
length = self._content_length - offset
return offset, length
def _establish_document_structure(self) :
documentType = None
try :
klass = self._ctoc_map[0]['klass']
except :
klass = None
if klass == 'chapter' or klass == None :
documentType = 'book'
if self.opts.verbose > 2 :
self._oeb.logger.info("Adding a MobiBook to self._MobiDoc")
self._MobiDoc.documentStructure = MobiBook()
elif klass == 'periodical' :
documentType = klass
if self.opts.verbose > 2 :
self._oeb.logger.info("Adding a MobiPeriodical to self._MobiDoc")
self._MobiDoc.documentStructure = MobiPeriodical(self._MobiDoc.getNextNode())
self._MobiDoc.documentStructure.startAddress = self._anchor_offset_kindle
else :
raise NotImplementedError('_establish_document_structure: unrecognized klass: %s' % klass)
return documentType
# Index {{{
def _generate_index(self):
self._oeb.log('Generating INDX ...')
self._primary_index_record = None
@@ -1811,276 +1870,7 @@ class MobiWriter(object):
open(os.path.join(t, n+'.bin'), 'wb').write(self._records[-(i+1)])
self._oeb.log.debug('Index records dumped to', t)
def _clean_text_value(self, text):
if text is not None and text.strip() :
text = text.strip()
if not isinstance(text, unicode):
text = text.decode('utf-8', 'replace')
text = normalize(text).encode('utf-8')
else :
text = "(none)".encode('utf-8')
return text
def _add_to_ctoc(self, ctoc_str, record_offset):
# Write vwilen + string to ctoc
# Return offset
# Is there enough room for this string in the current ctoc record?
if 0xfbf8 - self._ctoc.tell() < 2 + len(ctoc_str):
# flush this ctoc, start a new one
# print "closing ctoc_record at 0x%X" % self._ctoc.tell()
# print "starting new ctoc with '%-50.50s ...'" % ctoc_str
# pad with 00
pad = 0xfbf8 - self._ctoc.tell()
# print "padding %d bytes of 00" % pad
self._ctoc.write('\0' * (pad))
self._ctoc_records.append(self._ctoc.getvalue())
self._ctoc.truncate(0)
self._ctoc_offset += 0x10000
record_offset = self._ctoc_offset
offset = self._ctoc.tell() + record_offset
self._ctoc.write(decint(len(ctoc_str), DECINT_FORWARD) + ctoc_str)
return offset
def _add_flat_ctoc_node(self, node, ctoc, title=None):
# Process 'chapter' or 'article' nodes only, force either to 'chapter'
t = node.title if title is None else title
t = self._clean_text_value(t)
self._last_toc_entry = t
# Create an empty dictionary for this node
ctoc_name_map = {}
# article = chapter
if node.klass == 'article' :
ctoc_name_map['klass'] = 'chapter'
else :
ctoc_name_map['klass'] = node.klass
# Add title offset to name map
ctoc_name_map['titleOffset'] = self._add_to_ctoc(t, self._ctoc_offset)
self._chapterCount += 1
# append this node's name_map to map
self._ctoc_map.append(ctoc_name_map)
return
def _add_structured_ctoc_node(self, node, ctoc, title=None):
# Process 'periodical', 'section' and 'article'
# Fetch the offset referencing the current ctoc_record
if node.klass is None :
return
t = node.title if title is None else title
t = self._clean_text_value(t)
self._last_toc_entry = t
# Create an empty dictionary for this node
ctoc_name_map = {}
# Add the klass of this node
ctoc_name_map['klass'] = node.klass
if node.klass == 'chapter':
# Add title offset to name map
ctoc_name_map['titleOffset'] = self._add_to_ctoc(t, self._ctoc_offset)
self._chapterCount += 1
elif node.klass == 'periodical' :
# Add title offset
ctoc_name_map['titleOffset'] = self._add_to_ctoc(t, self._ctoc_offset)
# Look for existing class entry 'periodical' in _ctoc_map
for entry in self._ctoc_map:
if entry['klass'] == 'periodical':
# Use the pre-existing instance
ctoc_name_map['classOffset'] = entry['classOffset']
break
else :
continue
else:
# class names should always be in CNCX 0 - no offset
ctoc_name_map['classOffset'] = self._add_to_ctoc(node.klass, 0)
self._periodicalCount += 1
elif node.klass == 'section' :
# Add title offset
ctoc_name_map['titleOffset'] = self._add_to_ctoc(t, self._ctoc_offset)
# Look for existing class entry 'section' in _ctoc_map
for entry in self._ctoc_map:
if entry['klass'] == 'section':
# Use the pre-existing instance
ctoc_name_map['classOffset'] = entry['classOffset']
break
else :
continue
else:
# class names should always be in CNCX 0 - no offset
ctoc_name_map['classOffset'] = self._add_to_ctoc(node.klass, 0)
self._sectionCount += 1
elif node.klass == 'article' :
# Add title offset/title
ctoc_name_map['titleOffset'] = self._add_to_ctoc(t, self._ctoc_offset)
# Look for existing class entry 'article' in _ctoc_map
for entry in self._ctoc_map:
if entry['klass'] == 'article':
ctoc_name_map['classOffset'] = entry['classOffset']
break
else :
continue
else:
# class names should always be in CNCX 0 - no offset
ctoc_name_map['classOffset'] = self._add_to_ctoc(node.klass, 0)
# Add description offset/description
if node.description :
d = self._clean_text_value(node.description)
ctoc_name_map['descriptionOffset'] = self._add_to_ctoc(d, self._ctoc_offset)
else :
ctoc_name_map['descriptionOffset'] = None
# Add author offset/attribution
if node.author :
a = self._clean_text_value(node.author)
ctoc_name_map['authorOffset'] = self._add_to_ctoc(a, self._ctoc_offset)
else :
ctoc_name_map['authorOffset'] = None
self._articleCount += 1
else :
raise NotImplementedError( \
'writer._generate_ctoc.add_node: title: %s has unrecognized klass: %s, playOrder: %d' % \
(node.title, node.klass, node.play_order))
# append this node's name_map to map
self._ctoc_map.append(ctoc_name_map)
def _generate_ctoc(self):
# Generate the compiled TOC strings
# Each node has 1-4 CTOC entries:
# Periodical (0xDF)
# title, class
# Section (0xFF)
# title, class
# Article (0x3F)
# title, class, description, author
# Chapter (0x0F)
# title, class
# nb: Chapters don't actually have @class, so we synthesize it
# in reader._toc_from_navpoint
toc = self._oeb.toc
reduced_toc = []
self._ctoc_map = [] # per node dictionary of {class/title/desc/author} offsets
self._last_toc_entry = None
#ctoc = StringIO()
self._ctoc = StringIO()
# Track the individual node types
self._periodicalCount = 0
self._sectionCount = 0
self._articleCount = 0
self._chapterCount = 0
#first = True
if self._conforming_periodical_toc :
self._oeb.logger.info('Generating structured CTOC ...')
for (child) in toc.iter():
if self.opts.verbose > 2 :
self._oeb.logger.info(" %s" % child)
self._add_structured_ctoc_node(child, self._ctoc)
#first = False
else :
self._oeb.logger.info('Generating flat CTOC ...')
previousOffset = -1
currentOffset = 0
for (i, child) in enumerate(toc.iterdescendants()):
# Only add chapters or articles at depth==1
# no class defaults to 'chapter'
if child.klass is None : child.klass = 'chapter'
if (child.klass == 'article' or child.klass == 'chapter') and child.depth() == 1 :
if self.opts.verbose > 2 :
self._oeb.logger.info("adding (klass:%s depth:%d) %s to flat ctoc" % \
(child.klass, child.depth(), child) )
# Test to see if this child's offset is the same as the previous child's
# offset, skip it
h = child.href
if h is None:
self._oeb.logger.warn(' Ignoring TOC entry with no href:',
child.title)
continue
if h not in self._id_offsets:
self._oeb.logger.warn(' Ignoring missing TOC entry:',
unicode(child))
continue
currentOffset = self._id_offsets[h]
# print "_generate_ctoc: child offset: 0x%X" % currentOffset
if currentOffset != previousOffset :
self._add_flat_ctoc_node(child, self._ctoc)
reduced_toc.append(child)
previousOffset = currentOffset
else :
self._oeb.logger.warn(" Ignoring redundant href: %s in '%s'" % (h, child.title))
else :
if self.opts.verbose > 2 :
self._oeb.logger.info("skipping class: %s depth %d at position %d" % \
(child.klass, child.depth(),i))
# Update the TOC with our edited version
self._oeb.toc.nodes = reduced_toc
# Instantiate a MobiDocument(mobitype)
if (not self._periodicalCount and not self._sectionCount and not self._articleCount) or \
not self.opts.mobi_periodical :
mobiType = 0x002
elif self._periodicalCount:
pt = None
if self._oeb.metadata.publication_type:
x = unicode(self._oeb.metadata.publication_type[0]).split(':')
if len(x) > 1:
pt = x[1]
mobiType = {'newspaper':0x101}.get(pt, 0x103)
else :
raise NotImplementedError('_generate_ctoc: Unrecognized document structure')
self._MobiDoc = MobiDocument(mobiType)
if self.opts.verbose > 2 :
structType = 'book'
if mobiType > 0x100 :
structType = 'flat periodical' if mobiType == 0x102 else 'structured periodical'
self._oeb.logger.info("Instantiating a %s MobiDocument of type 0x%X" % (structType, mobiType ) )
if mobiType > 0x100 :
self._oeb.logger.info("periodicalCount: %d sectionCount: %d articleCount: %d"% \
(self._periodicalCount, self._sectionCount, self._articleCount) )
else :
self._oeb.logger.info("chapterCount: %d" % self._chapterCount)
# Apparently the CTOC must end with a null byte
self._ctoc.write('\0')
ctoc = self._ctoc.getvalue()
rec_count = len(self._ctoc_records)
self._oeb.logger.info(" CNCX utilization: %d %s %.0f%% full" % \
(rec_count + 1, 'records, last record' if rec_count else 'record,',
len(ctoc)/655) )
return align_block(ctoc)
# Index nodes {{{
def _write_periodical_node(self, indxt, indices, index, offset, length, count, firstSection, lastSection) :
pos = 0xc0 + indxt.tell()
indices.write(pack('>H', pos)) # Save the offset for IDXTIndices
@@ -2172,48 +1962,8 @@ class MobiWriter(object):
indxt.write(decint(self._ctoc_map[index]['titleOffset'], DECINT_FORWARD)) # vwi title offset in CNCX
indxt.write(decint(0, DECINT_FORWARD)) # unknown byte
def _compute_offset_length(self, i, node, entries) :
h = node.href
if h not in self._id_offsets:
self._oeb.log.warning('Could not find TOC entry:', node.title)
return -1, -1
# }}}
offset = self._id_offsets[h]
length = None
# Calculate length based on next entry's offset
for sibling in entries[i+1:]:
h2 = sibling.href
if h2 in self._id_offsets:
offset2 = self._id_offsets[h2]
if offset2 > offset:
length = offset2 - offset
break
if length is None:
length = self._content_length - offset
return offset, length
def _establish_document_structure(self) :
documentType = None
try :
klass = self._ctoc_map[0]['klass']
except :
klass = None
if klass == 'chapter' or klass == None :
documentType = 'book'
if self.opts.verbose > 2 :
self._oeb.logger.info("Adding a MobiBook to self._MobiDoc")
self._MobiDoc.documentStructure = MobiBook()
elif klass == 'periodical' :
documentType = klass
if self.opts.verbose > 2 :
self._oeb.logger.info("Adding a MobiPeriodical to self._MobiDoc")
self._MobiDoc.documentStructure = MobiPeriodical(self._MobiDoc.getNextNode())
self._MobiDoc.documentStructure.startAddress = self._anchor_offset_kindle
else :
raise NotImplementedError('_establish_document_structure: unrecognized klass: %s' % klass)
return documentType
def _generate_section_indices(self, child, currentSection, myPeriodical, myDoc ) :
sectionTitles = list(child.iter())[1:]
@@ -2491,6 +2241,270 @@ class MobiWriter(object):
last_name, c = self._add_periodical_structured_articles(myDoc, indxt, indices)
return align_block(indxt.getvalue()), c, align_block(indices.getvalue()), last_name
# }}}
# CTOC {{{
def _add_to_ctoc(self, ctoc_str, record_offset):
# Write vwilen + string to ctoc
# Return offset
# Is there enough room for this string in the current ctoc record?
if 0xfbf8 - self._ctoc.tell() < 2 + len(ctoc_str):
# flush this ctoc, start a new one
# print "closing ctoc_record at 0x%X" % self._ctoc.tell()
# print "starting new ctoc with '%-50.50s ...'" % ctoc_str
# pad with 00
pad = 0xfbf8 - self._ctoc.tell()
# print "padding %d bytes of 00" % pad
self._ctoc.write('\0' * (pad))
self._ctoc_records.append(self._ctoc.getvalue())
self._ctoc.truncate(0)
self._ctoc_offset += 0x10000
record_offset = self._ctoc_offset
offset = self._ctoc.tell() + record_offset
self._ctoc.write(decint(len(ctoc_str), DECINT_FORWARD) + ctoc_str)
return offset
def _add_flat_ctoc_node(self, node, ctoc, title=None):
# Process 'chapter' or 'article' nodes only, force either to 'chapter'
t = node.title if title is None else title
t = self._clean_text_value(t)
self._last_toc_entry = t
# Create an empty dictionary for this node
ctoc_name_map = {}
# article = chapter
if node.klass == 'article' :
ctoc_name_map['klass'] = 'chapter'
else :
ctoc_name_map['klass'] = node.klass
# Add title offset to name map
ctoc_name_map['titleOffset'] = self._add_to_ctoc(t, self._ctoc_offset)
self._chapterCount += 1
# append this node's name_map to map
self._ctoc_map.append(ctoc_name_map)
return
def _add_structured_ctoc_node(self, node, ctoc, title=None):
# Process 'periodical', 'section' and 'article'
# Fetch the offset referencing the current ctoc_record
if node.klass is None :
return
t = node.title if title is None else title
t = self._clean_text_value(t)
self._last_toc_entry = t
# Create an empty dictionary for this node
ctoc_name_map = {}
# Add the klass of this node
ctoc_name_map['klass'] = node.klass
if node.klass == 'chapter':
# Add title offset to name map
ctoc_name_map['titleOffset'] = self._add_to_ctoc(t, self._ctoc_offset)
self._chapterCount += 1
elif node.klass == 'periodical' :
# Add title offset
ctoc_name_map['titleOffset'] = self._add_to_ctoc(t, self._ctoc_offset)
# Look for existing class entry 'periodical' in _ctoc_map
for entry in self._ctoc_map:
if entry['klass'] == 'periodical':
# Use the pre-existing instance
ctoc_name_map['classOffset'] = entry['classOffset']
break
else :
continue
else:
# class names should always be in CNCX 0 - no offset
ctoc_name_map['classOffset'] = self._add_to_ctoc(node.klass, 0)
self._periodicalCount += 1
elif node.klass == 'section' :
# Add title offset
ctoc_name_map['titleOffset'] = self._add_to_ctoc(t, self._ctoc_offset)
# Look for existing class entry 'section' in _ctoc_map
for entry in self._ctoc_map:
if entry['klass'] == 'section':
# Use the pre-existing instance
ctoc_name_map['classOffset'] = entry['classOffset']
break
else :
continue
else:
# class names should always be in CNCX 0 - no offset
ctoc_name_map['classOffset'] = self._add_to_ctoc(node.klass, 0)
self._sectionCount += 1
elif node.klass == 'article' :
# Add title offset/title
ctoc_name_map['titleOffset'] = self._add_to_ctoc(t, self._ctoc_offset)
# Look for existing class entry 'article' in _ctoc_map
for entry in self._ctoc_map:
if entry['klass'] == 'article':
ctoc_name_map['classOffset'] = entry['classOffset']
break
else :
continue
else:
# class names should always be in CNCX 0 - no offset
ctoc_name_map['classOffset'] = self._add_to_ctoc(node.klass, 0)
# Add description offset/description
if node.description :
d = self._clean_text_value(node.description)
ctoc_name_map['descriptionOffset'] = self._add_to_ctoc(d, self._ctoc_offset)
else :
ctoc_name_map['descriptionOffset'] = None
# Add author offset/attribution
if node.author :
a = self._clean_text_value(node.author)
ctoc_name_map['authorOffset'] = self._add_to_ctoc(a, self._ctoc_offset)
else :
ctoc_name_map['authorOffset'] = None
self._articleCount += 1
else :
raise NotImplementedError( \
'writer._generate_ctoc.add_node: title: %s has unrecognized klass: %s, playOrder: %d' % \
(node.title, node.klass, node.play_order))
# append this node's name_map to map
self._ctoc_map.append(ctoc_name_map)
def _generate_ctoc(self):
# Generate the compiled TOC strings
# Each node has 1-4 CTOC entries:
# Periodical (0xDF)
# title, class
# Section (0xFF)
# title, class
# Article (0x3F)
# title, class, description, author
# Chapter (0x0F)
# title, class
# nb: Chapters don't actually have @class, so we synthesize it
# in reader._toc_from_navpoint
toc = self._oeb.toc
reduced_toc = []
self._ctoc_map = [] # per node dictionary of {class/title/desc/author} offsets
self._last_toc_entry = None
#ctoc = StringIO()
self._ctoc = StringIO()
# Track the individual node types
self._periodicalCount = 0
self._sectionCount = 0
self._articleCount = 0
self._chapterCount = 0
#first = True
if self._conforming_periodical_toc :
self._oeb.logger.info('Generating structured CTOC ...')
for (child) in toc.iter():
if self.opts.verbose > 2 :
self._oeb.logger.info(" %s" % child)
self._add_structured_ctoc_node(child, self._ctoc)
#first = False
else :
self._oeb.logger.info('Generating flat CTOC ...')
previousOffset = -1
currentOffset = 0
for (i, child) in enumerate(toc.iterdescendants()):
# Only add chapters or articles at depth==1
# no class defaults to 'chapter'
if child.klass is None : child.klass = 'chapter'
if (child.klass == 'article' or child.klass == 'chapter') and child.depth() == 1 :
if self.opts.verbose > 2 :
self._oeb.logger.info("adding (klass:%s depth:%d) %s to flat ctoc" % \
(child.klass, child.depth(), child) )
# Test to see if this child's offset is the same as the previous child's
# offset, skip it
h = child.href
if h is None:
self._oeb.logger.warn(' Ignoring TOC entry with no href:',
child.title)
continue
if h not in self._id_offsets:
self._oeb.logger.warn(' Ignoring missing TOC entry:',
unicode(child))
continue
currentOffset = self._id_offsets[h]
# print "_generate_ctoc: child offset: 0x%X" % currentOffset
if currentOffset != previousOffset :
self._add_flat_ctoc_node(child, self._ctoc)
reduced_toc.append(child)
previousOffset = currentOffset
else :
self._oeb.logger.warn(" Ignoring redundant href: %s in '%s'" % (h, child.title))
else :
if self.opts.verbose > 2 :
self._oeb.logger.info("skipping class: %s depth %d at position %d" % \
(child.klass, child.depth(),i))
# Update the TOC with our edited version
self._oeb.toc.nodes = reduced_toc
# Instantiate a MobiDocument(mobitype)
if (not self._periodicalCount and not self._sectionCount and not self._articleCount) or \
not self.opts.mobi_periodical :
mobiType = 0x002
elif self._periodicalCount:
pt = None
if self._oeb.metadata.publication_type:
x = unicode(self._oeb.metadata.publication_type[0]).split(':')
if len(x) > 1:
pt = x[1]
mobiType = {'newspaper':0x101}.get(pt, 0x103)
else :
raise NotImplementedError('_generate_ctoc: Unrecognized document structure')
self._MobiDoc = MobiDocument(mobiType)
if self.opts.verbose > 2 :
structType = 'book'
if mobiType > 0x100 :
structType = 'flat periodical' if mobiType == 0x102 else 'structured periodical'
self._oeb.logger.info("Instantiating a %s MobiDocument of type 0x%X" % (structType, mobiType ) )
if mobiType > 0x100 :
self._oeb.logger.info("periodicalCount: %d sectionCount: %d articleCount: %d"% \
(self._periodicalCount, self._sectionCount, self._articleCount) )
else :
self._oeb.logger.info("chapterCount: %d" % self._chapterCount)
# Apparently the CTOC must end with a null byte
self._ctoc.write('\0')
ctoc = self._ctoc.getvalue()
rec_count = len(self._ctoc_records)
self._oeb.logger.info(" CNCX utilization: %d %s %.0f%% full" % \
(rec_count + 1, 'records, last record' if rec_count else 'record,',
len(ctoc)/655) )
return align_block(ctoc)
# }}}
class HTMLRecordData(object):
""" A data structure containing indexing/navigation data for an HTML record """
View File
@@ -0,0 +1,16 @@
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import (unicode_literals, division, absolute_import,
print_function)
__license__ = 'GPL v3'
__copyright__ = '2011, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
UNCOMPRESSED = 1
PALMDOC = 2
HUFFDIC = 17480
PALM_MAX_IMAGE_SIZE = 63 * 1024
RECORD_SIZE = 0x1000 # 4096 (Text record size (uncompressed))
View File
@@ -0,0 +1,727 @@
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import (unicode_literals, division, absolute_import,
print_function)
from future_builtins import filter
__license__ = 'GPL v3'
__copyright__ = '2011, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
from struct import pack
from cStringIO import StringIO
from collections import OrderedDict, defaultdict
from calibre.ebooks.mobi.writer2 import RECORD_SIZE
from calibre.ebooks.mobi.utils import (encint, encode_number_as_hex,
encode_tbs, align_block, utf8_text, detect_periodical)
class CNCX(object): # {{{
'''
Create the CNCX records. These are records containing all the strings from
the NCX. Each record is of the form: <vwi string size><utf-8 encoded
string>
'''
MAX_STRING_LENGTH = 500
def __init__(self, toc, is_periodical):
self.strings = OrderedDict()
for item in toc.iterdescendants(breadth_first=True):
self.strings[item.title] = 0
if is_periodical:
self.strings[item.klass] = 0
self.records = []
offset = 0
buf = StringIO()
for key in tuple(self.strings.iterkeys()):
utf8 = utf8_text(key[:self.MAX_STRING_LENGTH])
l = len(utf8)
sz_bytes = encint(l)
raw = sz_bytes + utf8
if 0xfbf8 - buf.tell() < 6 + len(raw):
# Records in PDB files cannot be larger than 0x10000, so we
# stop well before that.
pad = 0xfbf8 - buf.tell()
buf.write(b'\0' * pad)
self.records.append(buf.getvalue())
buf.truncate(0)
offset = len(self.records) * 0x10000
buf.write(raw)
self.strings[key] = offset
offset += len(raw)
self.records.append(align_block(buf.getvalue()))
def __getitem__(self, string):
return self.strings[string]
# }}}
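# Usage sketch (matching the Indexer class below): the indexer builds a CNCX
# from the TOC and later looks up label offsets by string, e.g.:
#   cncx = CNCX(oeb.toc, is_periodical)
#   label_offset = cncx[node.title]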
class IndexEntry(object): # {{{
TAG_VALUES = {
'offset': 1,
'size': 2,
'label_offset': 3,
'depth': 4,
'class_offset': 5,
'parent_index': 21,
'first_child_index': 22,
'last_child_index': 23,
}
RTAG_MAP = {v:k for k, v in TAG_VALUES.iteritems()}
BITMASKS = [1, 2, 3, 4, 5, 21, 22, 23,]
def __init__(self, offset, label_offset, depth=0, class_offset=None):
self.offset, self.label_offset = offset, label_offset
self.depth, self.class_offset = depth, class_offset
self.length = 0
self.index = 0
self.parent_index = None
self.first_child_index = None
self.last_child_index = None
def __repr__(self):
return ('IndexEntry(offset=%r, depth=%r, length=%r, index=%r,'
' parent_index=%r)')%(self.offset, self.depth, self.length,
self.index, self.parent_index)
@dynamic_property
def size(self):
def fget(self): return self.length
def fset(self, val): self.length = val
return property(fget=fget, fset=fset, doc='Alias for length')
@classmethod
def tagx_block(cls, for_periodical=True):
buf = bytearray()
def add_tag(tag, num_values=1):
buf.append(tag)
buf.append(num_values)
# bitmask
buf.append(1 << (cls.BITMASKS.index(tag)))
# eof
buf.append(0)
for tag in xrange(1, 5):
add_tag(tag)
if for_periodical:
for tag in (5, 21, 22, 23):
add_tag(tag)
# End of TAGX record
for i in xrange(3): buf.append(0)
buf.append(1)
header = b'TAGX'
header += pack(b'>I', 12+len(buf)) # table length
header += pack(b'>I', 1) # control byte count
return header + bytes(buf)
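# e.g. tagx_block(for_periodical=False) is 32 bytes: b'TAGX', the table
# length (32) and control byte count (1) as big-endian ints, four 4-byte tag
# entries (offset, size, label_offset, depth) and a 4-byte end-of-record entry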
@property
def next_offset(self):
return self.offset + self.length
@property
def tag_nums(self):
for i in range(1, 5):
yield i
for attr in ('class_offset', 'parent_index', 'first_child_index',
'last_child_index'):
if getattr(self, attr) is not None:
yield self.TAG_VALUES[attr]
@property
def entry_type(self):
ans = 0
for tag in self.tag_nums:
ans |= (1 << self.BITMASKS.index(tag)) # 1 << x == 2**x
return ans
@property
def bytestring(self):
buf = StringIO()
buf.write(encode_number_as_hex(self.index))
et = self.entry_type
buf.write(bytes(bytearray([et])))
for tag in self.tag_nums:
attr = self.RTAG_MAP[tag]
val = getattr(self, attr)
buf.write(encint(val))
ans = buf.getvalue()
return ans
# }}}
class TBS(object): # {{{
'''
Take the list of index nodes starting/ending on a record and calculate the
trailing byte sequence for the record.
'''
def __init__(self, data, is_periodical, first=False, section_map={},
after_first=False):
self.section_map = section_map
#import pprint
#pprint.pprint(data)
#print()
if is_periodical:
# The starting bytes.
# The value is zero which I think indicates the periodical
# index entry. The values for the various flags seem to be
# unused. If the 0b100 is present, it means that the record
# deals with section 1 (or is the final record with section
# transitions).
self.type_010 = encode_tbs(0, {0b010: 0}, flag_size=3)
self.type_011 = encode_tbs(0, {0b010: 0, 0b001: 0},
flag_size=3)
self.type_110 = encode_tbs(0, {0b100: 2, 0b010: 0},
flag_size=3)
self.type_111 = encode_tbs(0, {0b100: 2, 0b010: 0, 0b001:
0}, flag_size=3)
if not data:
byts = b''
if after_first:
# This can happen if a record contains only text between
# the periodical start and the first section
byts = self.type_011
self.bytestring = byts
else:
depth_map = defaultdict(list)
for x in ('starts', 'ends', 'completes'):
for idx in data[x]:
depth_map[idx.depth].append(idx)
for l in depth_map.itervalues():
l.sort(key=lambda x:x.offset)
self.periodical_tbs(data, first, depth_map)
else:
if not data:
self.bytestring = b''
else:
self.book_tbs(data, first)
def periodical_tbs(self, data, first, depth_map):
buf = StringIO()
has_section_start = (depth_map[1] and
set(depth_map[1]).intersection(set(data['starts'])))
spanner = data['spans']
parent_section_index = -1
if depth_map[0]:
# We have a terminal record
# Find the first non periodical node
first_node = None
for nodes in (depth_map[1], depth_map[2]):
for node in nodes:
if (first_node is None or (node.offset, node.depth) <
(first_node.offset, first_node.depth)):
first_node = node
typ = (self.type_110 if has_section_start else self.type_010)
# parent_section_index is needed for the last record
if first_node is not None and first_node.depth > 0:
parent_section_index = (first_node.index if first_node.depth
== 1 else first_node.parent_index)
else:
parent_section_index = max(self.section_map.iterkeys())
else:
# Non terminal record
if spanner is not None:
# record is spanned by a single article
parent_section_index = spanner.parent_index
typ = (self.type_110 if parent_section_index == 1 else
self.type_010)
elif not depth_map[1]:
# has only article nodes, i.e. spanned by a section
parent_section_index = depth_map[2][0].parent_index
typ = (self.type_111 if parent_section_index == 1 else
self.type_010)
else:
# has section transitions
if depth_map[2]:
parent_section_index = depth_map[2][0].parent_index
else:
parent_section_index = depth_map[1][0].index
typ = self.type_011
buf.write(typ)
if typ not in (self.type_110, self.type_111) and parent_section_index > 0:
# Write starting section information
if spanner is None:
num_articles = len([a for a in depth_map[1] if a.parent_index
== parent_section_index])
extra = {}
if num_articles > 1:
extra = {0b0100: num_articles}
else:
extra = {0b0001: 0}
buf.write(encode_tbs(parent_section_index, extra))
if spanner is None:
articles = depth_map[2]
sections = set([self.section_map[a.parent_index] for a in
articles])
sections = sorted(sections, key=lambda x:x.offset)
section_map = {s:[a for a in articles if a.parent_index ==
s.index] for s in sections}
for i, section in enumerate(sections):
# All the articles in this record that belong to section
articles = section_map[section]
first_article = articles[0]
last_article = articles[-1]
num = len(articles)
try:
next_sec = sections[i+1]
except:
next_sec = None
extra = {}
if num > 1:
extra[0b0100] = num
if i == 0 and next_sec is not None:
# Write offset to next section from start of record
# For some reason kindlegen only writes this offset
# for the first section transition. Imitate it.
extra[0b0001] = next_sec.offset - data['offset']
buf.write(encode_tbs(first_article.index-section.index, extra))
if next_sec is not None:
buf.write(encode_tbs(last_article.index-next_sec.index,
{0b1000: 0}))
else:
buf.write(encode_tbs(spanner.index - parent_section_index,
{0b0001: 0}))
self.bytestring = buf.getvalue()
def book_tbs(self, data, first):
self.bytestring = b''
# }}}
class Indexer(object): # {{{
def __init__(self, serializer, number_of_text_records,
size_of_last_text_record, opts, oeb):
self.serializer = serializer
self.number_of_text_records = number_of_text_records
self.text_size = (RECORD_SIZE * (self.number_of_text_records-1) +
size_of_last_text_record)
self.oeb = oeb
self.log = oeb.log
self.opts = opts
self.is_periodical = detect_periodical(self.oeb.toc, self.log)
self.log('Generating MOBI index for a %s'%('periodical' if
self.is_periodical else 'book'))
self.is_flat_periodical = False
if self.is_periodical:
periodical_node = iter(oeb.toc).next()
sections = tuple(periodical_node)
self.is_flat_periodical = len(sections) == 1
self.records = []
self.cncx = CNCX(oeb.toc, self.is_periodical)
if self.is_periodical:
self.indices = self.create_periodical_index()
else:
self.indices = self.create_book_index()
self.records.append(self.create_index_record())
self.records.insert(0, self.create_header())
self.records.extend(self.cncx.records)
self.calculate_trailing_byte_sequences()
def create_index_record(self): # {{{
header_length = 192
buf = StringIO()
indices = self.indices
# Write index entries
offsets = []
for i in indices:
offsets.append(buf.tell())
buf.write(i.bytestring)
index_block = align_block(buf.getvalue())
# Write offsets to index entries as an IDXT block
idxt_block = b'IDXT'
buf.truncate(0)
for offset in offsets:
buf.write(pack(b'>H', header_length+offset))
idxt_block = align_block(idxt_block + buf.getvalue())
body = index_block + idxt_block
header = b'INDX'
buf.truncate(0)
buf.write(pack(b'>I', header_length))
buf.write(b'\0'*4) # Unknown
buf.write(pack(b'>I', 1)) # Header type? Or index record number?
buf.write(b'\0'*4) # Unknown
# IDXT block offset
buf.write(pack(b'>I', header_length + len(index_block)))
# Number of index entries
buf.write(pack(b'>I', len(offsets)))
# Unknown
buf.write(b'\xff'*8)
# Unknown
buf.write(b'\0'*156)
header += buf.getvalue()
ans = header + body
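# 0x10000 (64KB) appears to be the maximum size the format allows for a
# single index record.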
if len(ans) > 0x10000:
raise ValueError('Too many entries (%d) in the TOC'%len(offsets))
return ans
# }}}
def create_header(self): # {{{
buf = StringIO()
tagx_block = IndexEntry.tagx_block(self.is_periodical)
header_length = 192
# Ident 0 - 4
buf.write(b'INDX')
# Header length 4 - 8
buf.write(pack(b'>I', header_length))
# Unknown 8-16
buf.write(b'\0'*8)
# Index type: 0 - normal, 2 - inflection 16 - 20
buf.write(pack(b'>I', 2))
# IDXT offset 20-24
buf.write(pack(b'>I', 0)) # Filled in later
# Number of index records 24-28
buf.write(pack(b'>I', len(self.records)))
# Index Encoding 28-32
buf.write(pack(b'>I', 65001)) # utf-8
# Unknown 32-36
buf.write(b'\xff'*4)
# Number of index entries 36-40
buf.write(pack(b'>I', len(self.indices)))
# ORDT offset 40-44
buf.write(pack(b'>I', 0))
# LIGT offset 44-48
buf.write(pack(b'>I', 0))
# Number of LIGT entries 48-52
buf.write(pack(b'>I', 0))
# Number of CNCX records 52-56
buf.write(pack(b'>I', len(self.cncx.records)))
# Unknown 56-180
buf.write(b'\0'*124)
# TAGX offset 180-184
buf.write(pack(b'>I', header_length))
# Unknown 184-192
buf.write(b'\0'*8)
# TAGX block
buf.write(tagx_block)
num = len(self.indices)
# The index of the last entry in the NCX
buf.write(encode_number_as_hex(num-1))
# The number of entries in the NCX
buf.write(pack(b'>H', num))
# Padding
pad = (4 - (buf.tell()%4))%4
if pad:
buf.write(b'\0'*pad)
idxt_offset = buf.tell()
buf.write(b'IDXT')
buf.write(pack(b'>H', header_length + len(tagx_block)))
buf.write(b'\0')
buf.seek(20)
buf.write(pack(b'>I', idxt_offset))
return align_block(buf.getvalue())
# }}}
def create_book_index(self): # {{{
indices = []
seen = set()
id_offsets = self.serializer.id_offsets
for node in self.oeb.toc.iterdescendants():
try:
offset = id_offsets[node.href]
label = self.cncx[node.title]
except:
self.log.warn('TOC item %s not found in document'%node.href)
continue
if offset in seen:
continue
seen.add(offset)
index = IndexEntry(offset, label)
indices.append(index)
indices.sort(key=lambda x:x.offset)
# Set lengths
for i, index in enumerate(indices):
try:
next_offset = indices[i+1].offset
except:
next_offset = self.serializer.body_end_offset
index.length = next_offset - index.offset
# Remove empty nodes
indices = [i for i in indices if i.length > 0]
# Set index values
for i, index in enumerate(indices):
index.index = i
# Set lengths again to close up any gaps left by filtering
for i, index in enumerate(indices):
try:
next_offset = indices[i+1].offset
except:
next_offset = self.serializer.body_end_offset
index.length = next_offset - index.offset
return indices
# }}}
def create_periodical_index(self): # {{{
periodical_node = iter(self.oeb.toc).next()
periodical_node_offset = self.serializer.body_start_offset
periodical_node_size = (self.serializer.body_end_offset -
periodical_node_offset)
normalized_sections = []
id_offsets = self.serializer.id_offsets
periodical = IndexEntry(periodical_node_offset,
self.cncx[periodical_node.title],
class_offset=self.cncx[periodical_node.klass])
periodical.length = periodical_node_size
periodical.first_child_index = 1
seen_sec_offsets = set()
seen_art_offsets = set()
for sec in periodical_node:
normalized_articles = []
try:
offset = id_offsets[sec.href]
label = self.cncx[sec.title]
klass = self.cncx[sec.klass]
except:
continue
if offset in seen_sec_offsets:
continue
seen_sec_offsets.add(offset)
section = IndexEntry(offset, label, class_offset=klass, depth=1)
section.parent_index = 0
for art in sec:
try:
offset = id_offsets[art.href]
label = self.cncx[art.title]
klass = self.cncx[art.klass]
except:
continue
if offset in seen_art_offsets:
continue
seen_art_offsets.add(offset)
article = IndexEntry(offset, label, class_offset=klass,
depth=2)
normalized_articles.append(article)
if normalized_articles:
normalized_articles.sort(key=lambda x:x.offset)
normalized_sections.append((section, normalized_articles))
normalized_sections.sort(key=lambda x:x[0].offset)
# Set lengths
for s, x in enumerate(normalized_sections):
sec, normalized_articles = x
try:
sec.length = normalized_sections[s+1][0].offset - sec.offset
except:
sec.length = self.serializer.body_end_offset - sec.offset
for i, art in enumerate(normalized_articles):
try:
art.length = normalized_articles[i+1].offset - art.offset
except:
art.length = sec.offset + sec.length - art.offset
# Filter
for i, x in list(enumerate(normalized_sections)):
sec, normalized_articles = x
normalized_articles = list(filter(lambda x: x.length > 0,
normalized_articles))
normalized_sections[i] = (sec, normalized_articles)
normalized_sections = list(filter(lambda x: x[0].length > 0 and x[1],
normalized_sections))
# Set indices
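# Numbering scheme: the periodical node is index 0, sections are
# numbered 1..S in offset order, and all articles follow, so a
# section's children occupy a contiguous index range.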
i = 0
for sec, articles in normalized_sections:
i += 1
sec.index = i
sec.parent_index = 0
for sec, articles in normalized_sections:
for art in articles:
i += 1
art.index = i
art.parent_index = sec.index
for sec, normalized_articles in normalized_sections:
sec.first_child_index = normalized_articles[0].index
sec.last_child_index = normalized_articles[-1].index
# Set lengths again to close up any gaps left by filtering
for s, x in enumerate(normalized_sections):
sec, articles = x
try:
next_offset = normalized_sections[s+1][0].offset
except:
next_offset = self.serializer.body_end_offset
sec.length = next_offset - sec.offset
for a, art in enumerate(articles):
try:
next_offset = articles[a+1].offset
except:
next_offset = sec.next_offset
art.length = next_offset - art.offset
# Sanity check
for s, x in enumerate(normalized_sections):
sec, articles = x
try:
next_sec = normalized_sections[s+1][0]
except:
if (sec.length == 0 or sec.next_offset !=
self.serializer.body_end_offset):
raise ValueError('Invalid section layout')
else:
if next_sec.offset != sec.next_offset or sec.length == 0:
raise ValueError('Invalid section layout')
for a, art in enumerate(articles):
try:
next_art = articles[a+1]
except:
if (art.length == 0 or art.next_offset !=
sec.next_offset):
raise ValueError('Invalid article layout')
else:
if art.length == 0 or art.next_offset != next_art.offset:
raise ValueError('Invalid article layout')
# Flatten
indices = [periodical]
for sec, articles in normalized_sections:
indices.append(sec)
periodical.last_child_index = sec.index
for sec, articles in normalized_sections:
for a in articles:
indices.append(a)
return indices
# }}}
# TBS {{{
def calculate_trailing_byte_sequences(self):
self.tbs_map = {}
found_node = False
sections = [i for i in self.indices if i.depth == 1]
section_map = OrderedDict((i.index, i) for i in
sorted(sections, key=lambda x:x.offset))
deepest = max(i.depth for i in self.indices)
for i in xrange(self.number_of_text_records):
offset = i * RECORD_SIZE
next_offset = offset + RECORD_SIZE
data = {'ends':[], 'completes':[], 'starts':[],
'spans':None, 'offset':offset, 'record_number':i+1}
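# Classify every index node against this record's window
# [offset, next_offset): 'completes' if both endpoints fall inside,
# 'starts'/'ends' if only one does, and 'spans' if a deepest-level
# node covers the entire record.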
for index in self.indices:
if index.offset >= next_offset:
# Node starts after current record
if index.depth == deepest:
break
else:
continue
if index.next_offset <= offset:
# Node ends before current record
continue
if index.offset >= offset:
# Node starts in current record
if index.next_offset <= next_offset:
# Node ends in current record
data['completes'].append(index)
else:
data['starts'].append(index)
else:
# Node starts before current record
if index.next_offset <= next_offset:
# Node ends in current record
data['ends'].append(index)
elif index.depth == deepest:
data['spans'] = index
if (data['ends'] or data['completes'] or data['starts'] or
data['spans'] is not None):
self.tbs_map[i+1] = TBS(data, self.is_periodical, first=not
found_node, section_map=section_map)
found_node = True
else:
self.tbs_map[i+1] = TBS({}, self.is_periodical, first=False,
after_first=found_node, section_map=section_map)
def get_trailing_byte_sequence(self, num):
return self.tbs_map[num].bytestring
# }}}
# }}}

View File

@ -0,0 +1,565 @@
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import (unicode_literals, division, absolute_import,
print_function)
__license__ = 'GPL v3'
__copyright__ = '2011, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
import re, random, time
from cStringIO import StringIO
from struct import pack
from calibre.ebooks import normalize
from calibre.ebooks.oeb.base import OEB_RASTER_IMAGES
from calibre.ebooks.mobi.writer2.serializer import Serializer
from calibre.ebooks.compression.palmdoc import compress_doc
from calibre.ebooks.mobi.langcodes import iana2mobi
from calibre.utils.filenames import ascii_filename
from calibre.ebooks.mobi.writer2 import (PALMDOC, UNCOMPRESSED, RECORD_SIZE)
from calibre.ebooks.mobi.utils import (rescale_image, encint,
encode_trailing_data)
from calibre.ebooks.mobi.writer2.indexer import Indexer
EXTH_CODES = {
'creator': 100,
'publisher': 101,
'description': 103,
'identifier': 104,
'subject': 105,
'pubdate': 106,
'review': 107,
'contributor': 108,
'rights': 109,
'type': 111,
'source': 112,
'title': 503,
}
# Disabled as I don't care about uncrossable breaks
WRITE_UNCROSSABLE_BREAKS = False
MAX_THUMB_SIZE = 16 * 1024
MAX_THUMB_DIMEN = (180, 240)
class MobiWriter(object):
COLLAPSE_RE = re.compile(r'[ \t\r\n\v]+')
def __init__(self, opts, write_page_breaks_after_item=True):
self.opts = opts
self.write_page_breaks_after_item = write_page_breaks_after_item
self.compression = UNCOMPRESSED if opts.dont_compress else PALMDOC
self.prefer_author_sort = opts.prefer_author_sort
self.last_text_record_idx = 1
def __call__(self, oeb, path_or_stream):
self.log = oeb.log
if hasattr(path_or_stream, 'write'):
return self.dump_stream(oeb, path_or_stream)
with open(path_or_stream, 'w+b') as stream:
return self.dump_stream(oeb, stream)
def write(self, *args):
for datum in args:
self.stream.write(datum)
def tell(self):
return self.stream.tell()
def dump_stream(self, oeb, stream):
self.oeb = oeb
self.stream = stream
self.records = [None]
self.generate_content()
self.generate_record0()
self.write_header()
self.write_content()
def generate_content(self):
self.map_image_names()
self.generate_text()
# Index records come after text records
self.generate_index()
self.write_uncrossable_breaks()
# Image records come after index records
self.generate_images()
# Indexing {{{
def generate_index(self):
self.primary_index_record_idx = None
try:
self.indexer = Indexer(self.serializer, self.last_text_record_idx,
len(self.records[self.last_text_record_idx]),
self.opts, self.oeb)
except:
self.log.exception('Failed to generate MOBI index:')
else:
self.primary_index_record_idx = len(self.records)
for i in xrange(len(self.records)):
if i == 0: continue
tbs = self.indexer.get_trailing_byte_sequence(i)
self.records[i] += encode_trailing_data(tbs)
self.records.extend(self.indexer.records)
@property
def is_periodical(self):
return (self.primary_index_record_idx is not None and
self.indexer.is_periodical)
# }}}
def write_uncrossable_breaks(self): # {{{
'''
Write information about uncrossable breaks (non-linear items in
the spine).
'''
if not WRITE_UNCROSSABLE_BREAKS:
return
breaks = self.serializer.breaks
for i in xrange(1, self.last_text_record_idx+1):
offset = i * RECORD_SIZE
pbreak = 0
running = offset
buf = StringIO()
while breaks and (breaks[0] - offset) < RECORD_SIZE:
pbreak = (breaks.pop(0) - running) >> 3
encoded = encint(pbreak)
buf.write(encoded)
running += pbreak << 3
encoded = encode_trailing_data(buf.getvalue())
self.records[i] += encoded
# }}}
# Images {{{
def map_image_names(self):
'''
Map image names to record indices, ensuring that the masthead image,
if present, has index number 1.
'''
index = 1
self.images = images = {}
mh_href = None
if 'masthead' in self.oeb.guide:
mh_href = self.oeb.guide['masthead'].href
images[mh_href] = 1
index += 1
for item in self.oeb.manifest.values():
if item.media_type in OEB_RASTER_IMAGES:
if item.href == mh_href: continue
images[item.href] = index
index += 1
def generate_images(self):
self.oeb.logger.info('Serializing images...')
images = [(index, href) for href, index in self.images.iteritems()]
images.sort()
self.first_image_record = None
for _, href in images:
item = self.oeb.manifest.hrefs[href]
try:
data = rescale_image(item.data)
except:
self.oeb.logger.warn('Bad image file %r' % item.href)
continue
finally:
item.unload_data_from_memory()
self.records.append(data)
if self.first_image_record is None:
self.first_image_record = len(self.records) - 1
def add_thumbnail(self, item):
try:
data = rescale_image(item.data, dimen=MAX_THUMB_DIMEN,
maxsizeb=MAX_THUMB_SIZE)
except IOError:
self.oeb.logger.warn('Bad image file %r' % item.href)
return None
manifest = self.oeb.manifest
id, href = manifest.generate('thumbnail', 'thumbnail.jpeg')
manifest.add(id, href, 'image/jpeg', data=data)
index = len(self.images) + 1
self.images[href] = index
self.records.append(data)
return index
# }}}
# Text {{{
def generate_text(self):
self.oeb.logger.info('Serializing markup content...')
self.serializer = Serializer(self.oeb, self.images,
write_page_breaks_after_item=self.write_page_breaks_after_item)
text = self.serializer()
self.text_length = len(text)
text = StringIO(text)
nrecords = 0
if self.compression != UNCOMPRESSED:
self.oeb.logger.info(' Compressing markup content...')
while text.tell() < self.text_length:
data, overlap = self.read_text_record(text)
if self.compression == PALMDOC:
data = compress_doc(data)
data += overlap
data += pack(b'>B', len(overlap))
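# The overlap bytes are appended to the record followed by a single
# byte giving their count; this matches bit 0b1 of the extra data
# flags written in the MOBI header (multibyte overlap present).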
self.records.append(data)
nrecords += 1
self.last_text_record_idx = nrecords
def read_text_record(self, text):
'''
Return a Palmdoc record of size RECORD_SIZE from the text file object.
In case the record ends in the middle of a multibyte character, return
the overlap as well.
Returns data, overlap: where both are byte strings. overlap is the
extra bytes needed to complete the truncated multibyte character.
'''
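# Worked example (hypothetical): if the record boundary falls one byte
# into the three-byte UTF-8 sequence for U+20AC (0xE2 0x82 0xAC), data
# ends with 0xE2 and overlap is b'\x82\xac', so the truncated character
# can be completed when the record is decoded.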
opos = text.tell()
text.seek(0, 2)
# npos is the position of the next record
npos = min((opos + RECORD_SIZE, text.tell()))
# Number of bytes from the next record needed to complete the last
# character in this record
extra = 0
last = b''
while not last.decode('utf-8', 'ignore'):
# last contains no valid utf-8 characters
size = len(last) + 1
text.seek(npos - size)
last = text.read(size)
# last now has one valid utf-8 char and possibly some bytes that belong
# to a truncated char
try:
last.decode('utf-8', 'strict')
except UnicodeDecodeError:
# There are some truncated bytes in last
prev = len(last)
while True:
text.seek(npos - prev)
last = text.read(len(last) + 1)
try:
last.decode('utf-8')
except UnicodeDecodeError:
pass
else:
break
extra = len(last) - prev
text.seek(opos)
data = text.read(RECORD_SIZE)
overlap = text.read(extra)
text.seek(npos)
return data, overlap
# }}}
def generate_record0(self): # MOBI header {{{
metadata = self.oeb.metadata
exth = self.build_exth()
last_content_record = len(self.records) - 1
# FCIS/FLIS (seem to serve no purpose)
flis_number = len(self.records)
self.records.append(
b'FLIS\0\0\0\x08\0\x41\0\0\0\0\0\0\xff\xff\xff\xff\0\x01\0\x03\0\0\0\x03\0\0\0\x01'+
b'\xff'*4)
fcis = b'FCIS\x00\x00\x00\x14\x00\x00\x00\x10\x00\x00\x00\x01\x00\x00\x00\x00'
fcis += pack(b'>I', self.text_length)
fcis += b'\x00\x00\x00\x00\x00\x00\x00\x20\x00\x00\x00\x08\x00\x01\x00\x01\x00\x00\x00\x00'
fcis_number = len(self.records)
self.records.append(fcis)
# EOF record
self.records.append(b'\xE9\x8E\x0D\x0A')
record0 = StringIO()
# The MOBI Header
record0.write(pack(b'>HHIHHHH',
self.compression, # compression type
0, # Unused
self.text_length, # Text length
self.last_text_record_idx, # Number of text records or last tr idx
RECORD_SIZE, # Text record size
0, # Unused
0 # Unused
)) # 0 - 15 (0x0 - 0xf)
uid = random.randint(0, 0xffffffff)
title = normalize(unicode(metadata.title[0])).encode('utf-8')
# 0x0 - 0x3
record0.write(b'MOBI')
# 0x4 - 0x7 : Length of header
# 0x8 - 0x11 : MOBI type
# type meaning
# 0x002 MOBI book (chapter - chapter navigation)
# 0x101 News - Hierarchical navigation with sections and articles
# 0x102 News feed - Flat navigation
# 0x103 News magazine - same as 0x101
# 0xC - 0xF : Text encoding (65001 is utf-8)
# 0x10 - 0x13 : UID
# 0x14 - 0x17 : Generator version
bt = 0x002
if self.primary_index_record_idx is not None:
if self.indexer.is_flat_periodical:
bt = 0x102
elif self.indexer.is_periodical:
bt = 0x103
record0.write(pack(b'>IIIII',
0xe8, bt, 65001, uid, 6))
# 0x18 - 0x1f : Unknown
record0.write(b'\xff' * 8)
# 0x20 - 0x23 : Secondary index record
record0.write(pack(b'>I', 0xffffffff))
# 0x24 - 0x3f : Unknown
record0.write(b'\xff' * 28)
# 0x40 - 0x43 : Offset of first non-text record
record0.write(pack(b'>I',
self.last_text_record_idx + 1))
# 0x44 - 0x4b : title offset, title length
record0.write(pack(b'>II',
0xe8 + 16 + len(exth), len(title)))
# 0x4c - 0x4f : Language specifier
record0.write(iana2mobi(
str(metadata.language[0])))
# 0x50 - 0x57 : Input language and Output language
record0.write(b'\0' * 8)
# 0x58 - 0x5b : Format version
# 0x5c - 0x5f : First image record number
record0.write(pack(b'>II',
6, self.first_image_record if self.first_image_record else
len(self.records)-1))
# 0x60 - 0x63 : First HUFF/CDIC record number
# 0x64 - 0x67 : Number of HUFF/CDIC records
# 0x68 - 0x6b : First DATP record number
# 0x6c - 0x6f : Number of DATP records
record0.write(b'\0' * 16)
# 0x70 - 0x73 : EXTH flags
# Bit 6 (0b1000000) being set indicates the presence of an EXTH header
# The purpose of the other bits is unknown
exth_flags = 0b1011000
if self.is_periodical:
exth_flags |= 0b1000
record0.write(pack(b'>I', exth_flags))
# 0x74 - 0x93 : Unknown
record0.write(b'\0' * 32)
# 0x94 - 0x97 : DRM offset
# 0x98 - 0x9b : DRM count
# 0x9c - 0x9f : DRM size
# 0xa0 - 0xa3 : DRM flags
record0.write(pack(b'>IIII',
0xffffffff, 0xffffffff, 0, 0))
# 0xa4 - 0xaf : Unknown
record0.write(b'\0'*12)
# 0xb0 - 0xb1 : First content record number
# 0xb2 - 0xb3 : last content record number
# (Includes Image, DATP, HUFF, DRM)
record0.write(pack(b'>HH', 1, last_content_record))
# 0xb4 - 0xb7 : Unknown
record0.write(b'\0\0\0\x01')
# 0xb8 - 0xbb : FCIS record number
record0.write(pack(b'>I', fcis_number))
# 0xbc - 0xbf : Unknown (FCIS record count?)
record0.write(pack(b'>I', 1))
# 0xc0 - 0xc3 : FLIS record number
record0.write(pack(b'>I', flis_number))
# 0xc4 - 0xc7 : Unknown (FLIS record count?)
record0.write(pack(b'>I', 1))
# 0xc8 - 0xcf : Unknown
record0.write(b'\0'*8)
# 0xd0 - 0xdf : Unknown
record0.write(pack(b'>IIII', 0xffffffff, 0, 0xffffffff, 0xffffffff))
# 0xe0 - 0xe3 : Extra record data
# Extra record data flags:
# - 0b1 : <extra multibyte bytes><size>
# - 0b10 : <TBS indexing description of this HTML record><size>
# - 0b100: <uncrossable breaks><size>
# Setting bit 2 (0x2) disables <guide><reference type="start"> functionality
extra_data_flags = 0b1 # Has multibyte overlap bytes
if self.primary_index_record_idx is not None:
extra_data_flags |= 0b10
if WRITE_UNCROSSABLE_BREAKS:
extra_data_flags |= 0b100
record0.write(pack(b'>I', extra_data_flags))
# 0xe4 - 0xe7 : Primary index record
record0.write(pack(b'>I', 0xffffffff if self.primary_index_record_idx
is None else self.primary_index_record_idx))
record0.write(exth)
record0.write(title)
record0 = record0.getvalue()
# Add some buffer so that Amazon can add encryption information if this
# MOBI is submitted for publication
record0 += (b'\0' * (1024*8))
self.records[0] = record0
# }}}
def build_exth(self): # EXTH Header {{{
oeb = self.oeb
exth = StringIO()
nrecs = 0
for term in oeb.metadata:
if term not in EXTH_CODES: continue
code = EXTH_CODES[term]
items = oeb.metadata[term]
if term == 'creator':
if self.prefer_author_sort:
creators = [normalize(unicode(c.file_as or c)) for c in items]
else:
creators = [normalize(unicode(c)) for c in items]
items = ['; '.join(creators)]
for item in items:
data = self.COLLAPSE_RE.sub(' ', normalize(unicode(item)))
if term == 'identifier':
if data.lower().startswith('urn:isbn:'):
data = data[9:]
elif item.scheme.lower() == 'isbn':
pass
else:
continue
data = data.encode('utf-8')
exth.write(pack(b'>II', code, len(data) + 8))
exth.write(data)
nrecs += 1
if term == 'rights' :
try:
rights = normalize(unicode(oeb.metadata.rights[0])).encode('utf-8')
except:
rights = b'Unknown'
exth.write(pack(b'>II', EXTH_CODES['rights'], len(rights) + 8))
exth.write(rights)
nrecs += 1
# Write UUID as ASIN
uuid = None
from calibre.ebooks.oeb.base import OPF
for x in oeb.metadata['identifier']:
if ((x.get(OPF('scheme'), None) or '').lower() == 'uuid' or
unicode(x).startswith('urn:uuid:')):
uuid = unicode(x).split(':')[-1]
break
if uuid is None:
from uuid import uuid4
uuid = str(uuid4())
if isinstance(uuid, unicode):
uuid = uuid.encode('utf-8')
exth.write(pack(b'>II', 113, len(uuid) + 8))
exth.write(uuid)
nrecs += 1
# Write cdetype (EBOK identifies a regular book)
if not self.is_periodical:
data = b'EBOK'
exth.write(pack(b'>II', 501, len(data)+8))
exth.write(data)
nrecs += 1
# Add a publication date entry
datestr = None
if oeb.metadata['date']:
datestr = str(oeb.metadata['date'][0])
elif oeb.metadata['timestamp']:
datestr = str(oeb.metadata['timestamp'][0])
if datestr is not None:
datestr = bytes(datestr)
datestr = datestr.replace(b'+00:00', b'Z')
exth.write(pack(b'>II', EXTH_CODES['pubdate'], len(datestr) + 8))
exth.write(datestr)
nrecs += 1
else:
raise NotImplementedError("missing date or timestamp needed for mobi_periodical")
# Write the same creator info as kindlegen 1.2
for code, val in [(204, 202), (205, 1), (206, 2), (207, 33307)]:
exth.write(pack(b'>II', code, 12))
exth.write(pack(b'>I', val))
nrecs += 1
if (oeb.metadata.cover and
unicode(oeb.metadata.cover[0]) in oeb.manifest.ids):
id = unicode(oeb.metadata.cover[0])
item = oeb.manifest.ids[id]
href = item.href
if href in self.images:
index = self.images[href] - 1
exth.write(pack(b'>III', 0xc9, 0x0c, index))
exth.write(pack(b'>III', 0xcb, 0x0c, 0))
nrecs += 2
index = self.add_thumbnail(item)
if index is not None:
exth.write(pack(b'>III', 0xca, 0x0c, index - 1))
nrecs += 1
exth = exth.getvalue()
trail = len(exth) % 4
pad = b'\0' * (4 - trail) # Always pad w/ at least 1 byte
exth = [b'EXTH', pack(b'>II', len(exth) + 12, nrecs), exth, pad]
return b''.join(exth)
# }}}
def write_header(self): # PalmDB header {{{
'''
Write the PalmDB header
'''
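# Each entry in the PalmDB record list is 8 bytes: a 4-byte big-endian
# offset, one attribute byte (zero here) and a 3-byte unique id (2*i
# below); the list is followed by two bytes of padding, which is why
# the first offset is tell() + 8*nrecords + 2.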
title = ascii_filename(unicode(self.oeb.metadata.title[0]))[:31]
title = title + (b'\0' * (32 - len(title)))
now = int(time.time())
nrecords = len(self.records)
self.write(title, pack(b'>HHIIIIII', 0, 0, now, now, 0, 0, 0, 0),
b'BOOK', b'MOBI', pack(b'>IIH', nrecords, 0, nrecords))
offset = self.tell() + (8 * nrecords) + 2
for i, record in enumerate(self.records):
self.write(pack(b'>I', offset), b'\0', pack(b'>I', 2*i)[1:])
offset += len(record)
self.write(b'\0\0')
# }}}
def write_content(self):
for record in self.records:
self.write(record)

View File

@ -0,0 +1,247 @@
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import (unicode_literals, division, absolute_import,
print_function)
__license__ = 'GPL v3'
__copyright__ = '2011, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
from calibre.ebooks.oeb.base import (OEB_DOCS, XHTML, XHTML_NS, XML_NS,
namespace, prefixname, urlnormalize)
from calibre.ebooks.mobi.mobiml import MBP_NS
from collections import defaultdict
from urlparse import urldefrag
from cStringIO import StringIO
class Serializer(object):
NSRMAP = {'': None, XML_NS: 'xml', XHTML_NS: '', MBP_NS: 'mbp'}
def __init__(self, oeb, images, write_page_breaks_after_item=True):
'''
Write all the HTML markup in oeb into a single in-memory buffer
containing a single html document with links replaced by offsets into
the buffer.
:param oeb: OEBBook object that encapsulates the document to be
processed.
:param images: Mapping of image hrefs (urlnormalized) to image record
indices.
:param write_page_breaks_after_item: If True a MOBIpocket pagebreak tag
is written after every element of the spine in ``oeb``.
'''
self.oeb = oeb
self.images = images
self.logger = oeb.logger
self.write_page_breaks_after_item = write_page_breaks_after_item
# Mapping of hrefs (urlnormalized) to the offset in the buffer where
# the resource pointed to by the href lives. Used at the end to fill in
# the correct values into all filepos="..." links.
self.id_offsets = {}
# Mapping of hrefs (urlnormalized) to a list of offsets into the buffer
# where filepos="..." elements are written corresponding to links that
# point to the href. This is used at the end to fill in the correct values.
self.href_offsets = defaultdict(list)
# List of offsets in the buffer of non linear items in the spine. These
# become uncrossable breaks in the MOBI
self.breaks = []
def __call__(self):
'''
Return the document serialized as a single UTF-8 encoded bytestring.
'''
buf = self.buf = StringIO()
buf.write(b'<html>')
self.serialize_head()
self.serialize_body()
buf.write(b'</html>')
self.fixup_links()
return buf.getvalue()
def serialize_head(self):
buf = self.buf
buf.write(b'<head>')
if len(self.oeb.guide) > 0:
self.serialize_guide()
buf.write(b'</head>')
def serialize_guide(self):
'''
The Kindle decides where to open a book based on the presence of
an item in the guide that looks like
<reference type="text" title="Start" href="chapter-one.xhtml"/>
Similarly, an item with type="toc" controls where the Goto Table of
Contents operation on the Kindle goes.
'''
buf = self.buf
hrefs = self.oeb.manifest.hrefs
buf.write(b'<guide>')
for ref in self.oeb.guide.values():
path = urldefrag(ref.href)[0]
if path not in hrefs or hrefs[path].media_type not in OEB_DOCS:
continue
buf.write(b'<reference type="')
if ref.type.startswith('other.') :
self.serialize_text(ref.type.replace('other.',''), quot=True)
else:
self.serialize_text(ref.type, quot=True)
buf.write(b'" ')
if ref.title is not None:
buf.write(b'title="')
self.serialize_text(ref.title, quot=True)
buf.write(b'" ')
self.serialize_href(ref.href)
# Space required or won't work, I kid you not
buf.write(b' />')
buf.write(b'</guide>')
def serialize_href(self, href, base=None):
'''
Serialize the href attribute of an <a> or <reference> tag. It is
serialized as filepos="0000000000" and a pointer to its location is
stored in self.href_offsets so that the correct value can be filled in
at the end.
'''
hrefs = self.oeb.manifest.hrefs
path, frag = urldefrag(urlnormalize(href))
if path and base:
path = base.abshref(path)
if path and path not in hrefs:
return False
buf = self.buf
item = hrefs[path] if path else None
if item and item.spine_position is None:
return False
path = item.href if item else base.href
href = '#'.join((path, frag)) if frag else path
buf.write(b'filepos=')
self.href_offsets[href].append(buf.tell())
buf.write(b'0000000000')
return True
def serialize_body(self):
'''
Serialize all items in the spine of the document. Non linear items are
moved to the end.
'''
buf = self.buf
self.anchor_offset = buf.tell()
buf.write(b'<body>')
self.body_start_offset = buf.tell()
spine = [item for item in self.oeb.spine if item.linear]
spine.extend([item for item in self.oeb.spine if not item.linear])
for item in spine:
self.serialize_item(item)
self.body_end_offset = buf.tell()
buf.write(b'</body>')
def serialize_item(self, item):
'''
Serialize an individual item from the spine of the input document.
A reference to this item is stored in self.href_offsets
'''
buf = self.buf
if not item.linear:
self.breaks.append(buf.tell() - 1)
self.id_offsets[urlnormalize(item.href)] = buf.tell()
# Kindle periodical articles are contained in a <div> tag
buf.write(b'<div>')
for elem in item.data.find(XHTML('body')):
self.serialize_elem(elem, item)
# Kindle periodical article end marker
buf.write(b'<div></div>')
if self.write_page_breaks_after_item:
buf.write(b'<mbp:pagebreak/>')
buf.write(b'</div>')
self.anchor_offset = None
def serialize_elem(self, elem, item, nsrmap=NSRMAP):
buf = self.buf
if not isinstance(elem.tag, basestring) \
or namespace(elem.tag) not in nsrmap:
return
tag = prefixname(elem.tag, nsrmap)
# Previous layers take care of @name
id_ = elem.attrib.pop('id', None)
if id_:
href = '#'.join((item.href, id_))
offset = self.anchor_offset or buf.tell()
self.id_offsets[urlnormalize(href)] = offset
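# An empty <a> with no remaining attributes (its id was popped above)
# carries no content of its own, so once its offset has been recorded
# it can be dropped from the output entirely.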
if self.anchor_offset is not None and \
tag == 'a' and not elem.attrib and \
not len(elem) and not elem.text:
return
self.anchor_offset = buf.tell()
buf.write(b'<')
buf.write(tag.encode('utf-8'))
if elem.attrib:
for attr, val in elem.attrib.items():
if namespace(attr) not in nsrmap:
continue
attr = prefixname(attr, nsrmap)
buf.write(b' ')
if attr == 'href':
if self.serialize_href(val, item):
continue
elif attr == 'src':
href = urlnormalize(item.abshref(val))
if href in self.images:
index = self.images[href]
buf.write(b'recindex="%05d"' % index)
continue
buf.write(attr.encode('utf-8'))
buf.write(b'="')
self.serialize_text(val, quot=True)
buf.write(b'"')
buf.write(b'>')
if elem.text or len(elem) > 0:
if elem.text:
self.anchor_offset = None
self.serialize_text(elem.text)
for child in elem:
self.serialize_elem(child, item)
if child.tail:
self.anchor_offset = None
self.serialize_text(child.tail)
buf.write(b'</%s>' % tag.encode('utf-8'))
def serialize_text(self, text, quot=False):
text = text.replace('&', '&amp;')
text = text.replace('<', '&lt;')
text = text.replace('>', '&gt;')
text = text.replace(u'\u00AD', '') # Soft-hyphen
if quot:
text = text.replace('"', '&quot;')
self.buf.write(text.encode('utf-8'))
def fixup_links(self):
'''
Fill in the correct values for all filepos="..." links with the offsets
of the linked to content (as stored in id_offsets).
'''
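# The placeholders were written as the fixed-width b'0000000000' (ten
# bytes), so seeking back and writing '%010d' % offset replaces each
# one in place without shifting any other offsets in the buffer.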
buf = self.buf
id_offsets = self.id_offsets
for href, hoffs in self.href_offsets.items():
# Iterate over all filepos items
if href not in id_offsets:
self.logger.warn('Hyperlink target %r not found' % href)
# Link to the top of the document, better than just ignoring
href, _ = urldefrag(href)
if href in self.id_offsets:
ioff = self.id_offsets[href]
for hoff in hoffs:
buf.seek(hoff)
buf.write(b'%010d' % ioff)

View File

@ -1180,8 +1180,9 @@ class Manifest(object):
if memory is None:
from calibre.ptempfile import PersistentTemporaryFile
pt = PersistentTemporaryFile(suffix='_oeb_base_mem_unloader.img')
pt.write(self._data)
pt.close()
with pt:
pt.write(self._data)
self.oeb._temp_files.append(pt.name)
def loader(*args):
with open(pt.name, 'rb') as f:
ans = f.read()
@ -1196,8 +1197,6 @@ class Manifest(object):
self._loader = loader2
self._data = None
def __str__(self):
data = self.data
if isinstance(data, etree._Element):
@ -1681,11 +1680,18 @@ class TOC(object):
return True
return False
def iterdescendants(self):
def iterdescendants(self, breadth_first=False):
"""Iterate over all descendant nodes in depth-first order."""
for child in self.nodes:
for node in child.iter():
yield node
if breadth_first:
for child in self.nodes:
yield child
for child in self.nodes:
for node in child.iterdescendants(breadth_first=True):
yield node
else:
for child in self.nodes:
for node in child.iter():
yield node
def __iter__(self):
"""Iterate over all immediate child nodes."""
@ -1913,6 +1919,14 @@ class OEBBook(object):
self.toc = TOC()
self.pages = PageList()
self.auto_generated_toc = True
self._temp_files = []
def clean_temp_files(self):
for path in self._temp_files:
try:
os.remove(path)
except:
pass
@classmethod
def generate(cls, opts):

View File

@ -92,7 +92,7 @@ class EbookIterator(object):
self.config = DynamicConfig(name='iterator')
ext = os.path.splitext(pathtoebook)[1].replace('.', '').lower()
ext = re.sub(r'(x{0,1})htm(l{0,1})', 'html', ext)
self.ebook_ext = ext
self.ebook_ext = ext.replace('original_', '')
def search(self, text, index, backwards=False):
text = text.lower()

View File

@ -163,6 +163,8 @@ class OEBReader(object):
if item.media_type in check:
try:
item.data
except KeyboardInterrupt:
raise
except:
self.logger.exception('Failed to parse content in %s'%
item.href)
@ -186,8 +188,13 @@ class OEBReader(object):
href, _ = urldefrag(href)
if not href:
continue
href = item.abshref(urlnormalize(href))
scheme = urlparse(href).scheme
try:
href = item.abshref(urlnormalize(href))
scheme = urlparse(href).scheme
except:
self.oeb.log.exception(
'Skipping invalid href: %r'%href)
continue
if not scheme and href not in known:
new.add(href)
elif item.media_type in OEB_STYLES:

View File

@ -318,7 +318,8 @@ class CSSFlattener(object):
for edge in ('top', 'bottom'):
cssdict['%s-%s'%(prop, edge)] = '0pt'
if self.context.insert_blank_line:
cssdict['margin-top'] = cssdict['margin-bottom'] = '0.5em'
cssdict['margin-top'] = cssdict['margin-bottom'] = \
'%fem'%self.context.insert_blank_line_size
if self.context.remove_paragraph_spacing:
cssdict['text-indent'] = "%1.1fem" % self.context.remove_paragraph_spacing_indent_size

View File

@ -36,5 +36,8 @@ class Clean(object):
href = urldefrag(self.oeb.guide[x].href)[0]
if x.lower() not in ('cover', 'titlepage', 'masthead', 'toc',
'title-page', 'copyright-page', 'start'):
item = self.oeb.guide[x]
if item.title and item.title.lower() == 'start':
continue
self.oeb.guide.remove(x)

View File

@ -45,9 +45,10 @@ body > .calibre_toc_block {
}
class HTMLTOCAdder(object):
def __init__(self, title=None, style='nested'):
def __init__(self, title=None, style='nested', position='end'):
self.title = title
self.style = style
self.position = position
@classmethod
def config(cls, cfg):
@ -98,7 +99,10 @@ class HTMLTOCAdder(object):
self.add_toc_level(body, oeb.toc)
id, href = oeb.manifest.generate('contents', 'contents.xhtml')
item = oeb.manifest.add(id, href, XHTML_MIME, data=contents)
oeb.spine.add(item, linear=False)
if self.position == 'end':
oeb.spine.add(item, linear=False)
else:
oeb.spine.insert(0, item, linear=True)
oeb.guide.add('toc', 'Table of Contents', href)
def add_toc_level(self, elem, toc):

View File

@ -47,15 +47,19 @@ def meta_info_to_oeb_metadata(mi, m, log, override_input_metadata=False):
m.add('series', mi.series)
elif override_input_metadata:
m.clear('series')
if not mi.is_null('isbn'):
identifiers = mi.get_identifiers()
set_isbn = False
for typ, val in identifiers.iteritems():
has = False
if typ.lower() == 'isbn':
set_isbn = True
for x in m.identifier:
if x.scheme.lower() == 'isbn':
x.content = mi.isbn
if x.scheme.lower() == typ.lower():
x.content = val
has = True
if not has:
m.add('identifier', mi.isbn, scheme='ISBN')
elif override_input_metadata:
m.add('identifier', val, scheme=typ.upper())
if override_input_metadata and not set_isbn:
m.filter('identifier', lambda x: x.scheme.lower() == 'isbn')
if not mi.is_null('language'):
m.clear('language')

View File

@ -47,7 +47,10 @@ class ManifestTrimmer(object):
item.data is not None:
hrefs = [r[2] for r in iterlinks(item.data)]
for href in hrefs:
href = item.abshref(urlnormalize(href))
try:
href = item.abshref(urlnormalize(href))
except:
continue
if href in oeb.manifest.hrefs:
found = oeb.manifest.hrefs[href]
if found not in used:

View File

@ -165,6 +165,7 @@ class PDFWriter(QObject): # {{{
printer = get_pdf_printer(self.opts)
printer.setOutputFileName(item_path)
self.view.print_(printer)
printer.abort()
self._render_book()
def _delete_tmpdir(self):
@ -186,6 +187,7 @@ class PDFWriter(QObject): # {{{
draw_image_page(printer, painter, p,
preserve_aspect_ratio=self.opts.preserve_cover_aspect_ratio)
painter.end()
printer.abort()
def _write(self):

View File

@ -15,7 +15,6 @@ APP_UID = 'libprs500'
from calibre.constants import (islinux, iswindows, isbsd, isfrozen, isosx,
config_dir)
from calibre.utils.config import Config, ConfigProxy, dynamic, JSONConfig
from calibre.utils.localization import set_qt_translator
from calibre.ebooks.metadata import MetaInformation
from calibre.utils.date import UNDEFINED_DATE
@ -631,6 +630,22 @@ class ResizableDialog(QDialog):
nw = min(self.width(), nw)
self.resize(nw, nh)
class Translator(QTranslator):
'''
Translator to load translations for strings in Qt from the calibre
translations. Does not support advanced features of Qt like disambiguation
and plural forms.
'''
def translate(self, *args, **kwargs):
try:
src = unicode(args[1])
except:
return u''
t = _
return t(src)
gui_thread = None
qt_app = None
@ -677,9 +692,8 @@ class Application(QApplication):
def load_translations(self):
if self._translator is not None:
self.removeTranslator(self._translator)
self._translator = QTranslator(self)
if set_qt_translator(self._translator):
self.installTranslator(self._translator)
self._translator = Translator(self)
self.installTranslator(self._translator)
def event(self, e):
if callable(self.file_event_hook) and e.type() == QEvent.FileOpen:

View File

@ -12,7 +12,7 @@ from PyQt4.Qt import QModelIndex, QMenu
from calibre.gui2 import error_dialog, Dispatcher
from calibre.gui2.tools import convert_single_ebook, convert_bulk_ebook
from calibre.utils.config import prefs
from calibre.utils.config import prefs, tweaks
from calibre.gui2.actions import InterfaceAction
from calibre.customize.ui import plugin_for_input_format
@ -118,6 +118,8 @@ class ConvertAction(InterfaceAction):
def queue_convert_jobs(self, jobs, changed, bad, rows, previous,
converted_func, extra_job_args=[]):
for func, args, desc, fmt, id, temp_files in jobs:
func, _, same_fmt = func.partition(':')
same_fmt = same_fmt == 'same_fmt'
input_file = args[0]
input_fmt = os.path.splitext(input_file)[1]
core_usage = 1
@ -131,6 +133,7 @@ class ConvertAction(InterfaceAction):
job = self.gui.job_manager.run_job(Dispatcher(converted_func),
func, args=args, description=desc,
core_usage=core_usage)
job.conversion_of_same_fmt = same_fmt
args = [temp_files, fmt, id]+extra_job_args
self.conversion_jobs[job] = tuple(args)
@ -166,14 +169,18 @@ class ConvertAction(InterfaceAction):
if job.failed:
self.gui.job_exception(job)
return
same_fmt = getattr(job, 'conversion_of_same_fmt', False)
fmtf = temp_files[-1].name
if os.stat(fmtf).st_size < 1:
raise Exception(_('Empty output file, '
'probably the conversion process crashed'))
db = self.gui.current_db
if same_fmt and tweaks['save_original_format']:
db.save_original_format(book_id, fmt, notify=False)
with open(temp_files[-1].name, 'rb') as data:
self.gui.library_view.model().db.add_format(book_id, \
fmt, data, index_is_id=True)
db.add_format(book_id, fmt, data, index_is_id=True)
self.gui.status_bar.show_message(job.description + \
(' completed'), 2000)
finally:

View File

@ -81,7 +81,7 @@ class MultiDeleter(QObject):
class DeleteAction(InterfaceAction):
name = 'Remove Books'
action_spec = (_('Remove books'), 'trash.png', None, _('Del'))
action_spec = (_('Remove books'), 'trash.png', None, 'Del')
action_type = 'current'
def genesis(self):

View File

@ -128,7 +128,8 @@ class ViewAction(InterfaceAction):
self.gui.unsetCursor()
def _view_file(self, name):
ext = os.path.splitext(name)[1].upper().replace('.', '')
ext = os.path.splitext(name)[1].upper().replace('.',
'').replace('ORIGINAL_', '')
viewer = 'lrfviewer' if ext == 'LRF' else 'ebook-viewer'
internal = ext in config['internally_viewed_formats']
self._launch_viewer(name, viewer, internal)

View File

@ -133,6 +133,7 @@ def render_data(mi, use_roman_numbers=True, all_fields=False):
authors = []
formatter = EvalFormatter()
for aut in mi.authors:
link = ''
if mi.author_link_map[aut]:
link = mi.author_link_map[aut]
elif gprefs.get('default_author_link'):

View File

@ -24,7 +24,10 @@ class LookAndFeelWidget(Widget, Ui_Form):
'font_size_mapping', 'line_height', 'minimum_line_height',
'linearize_tables', 'smarten_punctuation',
'disable_font_rescaling', 'insert_blank_line',
'remove_paragraph_spacing', 'remove_paragraph_spacing_indent_size','input_encoding',
'remove_paragraph_spacing',
'remove_paragraph_spacing_indent_size',
'insert_blank_line_size',
'input_encoding',
'asciiize', 'keep_ligatures']
)
for val, text in [

View File

@ -6,7 +6,7 @@
<rect>
<x>0</x>
<y>0</y>
<width>600</width>
<width>642</width>
<height>500</height>
</rect>
</property>
@ -31,7 +31,7 @@
</property>
</widget>
</item>
<item row="1" column="1" colspan="2">
<item row="1" column="1">
<widget class="QDoubleSpinBox" name="opt_base_font_size">
<property name="suffix">
<string> pt</string>
@ -97,6 +97,29 @@
</item>
</layout>
</item>
<item row="3" column="0">
<widget class="QLabel" name="label_6">
<property name="text">
<string>Minimum &amp;line height:</string>
</property>
<property name="buddy">
<cstring>opt_minimum_line_height</cstring>
</property>
</widget>
</item>
<item row="3" column="1">
<widget class="QDoubleSpinBox" name="opt_minimum_line_height">
<property name="suffix">
<string> %</string>
</property>
<property name="decimals">
<number>1</number>
</property>
<property name="maximum">
<double>900.000000000000000</double>
</property>
</widget>
</item>
<item row="4" column="0">
<widget class="QLabel" name="label">
<property name="text">
@ -107,7 +130,7 @@
</property>
</widget>
</item>
<item row="4" column="1" colspan="2">
<item row="4" column="1">
<widget class="QDoubleSpinBox" name="opt_line_height">
<property name="suffix">
<string> pt</string>
@ -127,6 +150,13 @@
</property>
</widget>
</item>
<item row="5" column="1" colspan="2">
<widget class="EncodingComboBox" name="opt_input_encoding">
<property name="editable">
<bool>true</bool>
</property>
</widget>
</item>
<item row="6" column="0" colspan="2">
<widget class="QCheckBox" name="opt_remove_paragraph_spacing">
<property name="text">
@ -134,48 +164,58 @@
</property>
</widget>
</item>
<item row="6" column="2" colspan="2">
<layout class="QHBoxLayout" name="horizontalLayout_2">
<item>
<widget class="QLabel" name="label_4">
<property name="text">
<string>Indent size:</string>
</property>
<property name="alignment">
<set>Qt::AlignRight|Qt::AlignTrailing|Qt::AlignVCenter</set>
</property>
</widget>
</item>
<item>
<widget class="QDoubleSpinBox" name="opt_remove_paragraph_spacing_indent_size">
<property name="toolTip">
<string>&lt;p&gt;When calibre removes inter paragraph spacing, it automatically sets a paragraph indent, to ensure that paragraphs can be easily distinguished. This option controls the width of that indent.</string>
</property>
<property name="suffix">
<string> em</string>
</property>
<property name="decimals">
<number>1</number>
</property>
</widget>
</item>
</layout>
</item>
<item row="7" column="0">
<widget class="QLabel" name="label_5">
<item row="7" column="0" colspan="2">
<widget class="QCheckBox" name="opt_insert_blank_line">
<property name="text">
<string>Text justification:</string>
<string>Insert &amp;blank line between paragraphs</string>
</property>
</widget>
</item>
<item row="7" column="4">
<widget class="QDoubleSpinBox" name="opt_insert_blank_line_size">
<property name="suffix">
<string> em</string>
</property>
<property name="decimals">
<number>1</number>
</property>
</widget>
</item>
<item row="8" column="0">
<widget class="QLabel" name="label_5">
<property name="text">
<string>Text &amp;justification:</string>
</property>
<property name="buddy">
<cstring>opt_change_justification</cstring>
</property>
</widget>
</item>
<item row="8" column="2" colspan="3">
<widget class="QComboBox" name="opt_change_justification"/>
</item>
<item row="9" column="0">
<widget class="QCheckBox" name="opt_linearize_tables">
<property name="text">
<string>&amp;Linearize tables</string>
</property>
</widget>
</item>
<item row="11" column="0" colspan="4">
<item row="9" column="1" colspan="4">
<widget class="QCheckBox" name="opt_asciiize">
<property name="text">
<string>&amp;Transliterate unicode characters to ASCII</string>
</property>
</widget>
</item>
<item row="10" column="1" colspan="2">
<widget class="QCheckBox" name="opt_keep_ligatures">
<property name="text">
<string>Keep &amp;ligatures</string>
</property>
</widget>
</item>
<item row="12" column="0" colspan="5">
<widget class="QGroupBox" name="groupBox">
<property name="title">
<string>Extra &amp;CSS</string>
@ -187,27 +227,16 @@
</layout>
</widget>
</item>
<item row="7" column="2" colspan="2">
<widget class="QComboBox" name="opt_change_justification"/>
</item>
<item row="8" column="1" colspan="3">
<widget class="QCheckBox" name="opt_asciiize">
<property name="text">
<string>&amp;Transliterate unicode characters to ASCII</string>
<item row="6" column="4">
<widget class="QDoubleSpinBox" name="opt_remove_paragraph_spacing_indent_size">
<property name="toolTip">
<string>&lt;p&gt;When calibre removes inter paragraph spacing, it automatically sets a paragraph indent, to ensure that paragraphs can be easily distinguished. This option controls the width of that indent.</string>
</property>
</widget>
</item>
<item row="9" column="0">
<widget class="QCheckBox" name="opt_insert_blank_line">
<property name="text">
<string>Insert &amp;blank line</string>
<property name="suffix">
<string> em</string>
</property>
</widget>
</item>
<item row="9" column="1" colspan="2">
<widget class="QCheckBox" name="opt_keep_ligatures">
<property name="text">
<string>Keep &amp;ligatures</string>
<property name="decimals">
<number>1</number>
</property>
</widget>
</item>
@ -218,33 +247,29 @@
</property>
</widget>
</item>
<item row="3" column="0">
<widget class="QLabel" name="label_6">
<item row="6" column="3">
<widget class="QLabel" name="label_4">
<property name="text">
<string>Minimum &amp;line height:</string>
<string>&amp;Indent size:</string>
</property>
<property name="alignment">
<set>Qt::AlignRight|Qt::AlignTrailing|Qt::AlignVCenter</set>
</property>
<property name="buddy">
<cstring>opt_minimum_line_height</cstring>
<cstring>opt_remove_paragraph_spacing_indent_size</cstring>
</property>
</widget>
</item>
<item row="3" column="1" colspan="2">
<widget class="QDoubleSpinBox" name="opt_minimum_line_height">
<property name="suffix">
<string> %</string>
<item row="7" column="3">
<widget class="QLabel" name="label_7">
<property name="text">
<string>&amp;Line size:</string>
</property>
<property name="decimals">
<number>1</number>
<property name="alignment">
<set>Qt::AlignRight|Qt::AlignTrailing|Qt::AlignVCenter</set>
</property>
<property name="maximum">
<double>900.000000000000000</double>
</property>
</widget>
</item>
<item row="5" column="1" colspan="3">
<widget class="EncodingComboBox" name="opt_input_encoding">
<property name="editable">
<bool>true</bool>
<property name="buddy">
<cstring>opt_insert_blank_line_size</cstring>
</property>
</widget>
</item>

View File

@ -240,7 +240,7 @@
<string>Book </string>
</property>
<property name="maximum">
<double>9999.989999999999782</double>
<double>9999999999.99</double>
</property>
<property name="value">
<double>1.000000000000000</double>

View File

@ -24,7 +24,7 @@ class PluginWidget(Widget, Ui_Form):
def __init__(self, parent, get_option, get_help, db=None, book_id=None):
Widget.__init__(self, parent,
['prefer_author_sort', 'rescale_images', 'toc_title',
'mobi_ignore_margins',
'mobi_ignore_margins', 'mobi_toc_at_start',
'dont_compress', 'no_inline_toc', 'masthead_font','personal_doc']
)
from calibre.utils.fonts import fontconfig

View File

@ -27,21 +27,21 @@
<item row="1" column="1">
<widget class="QLineEdit" name="opt_toc_title"/>
</item>
<item row="2" column="0" colspan="2">
<item row="4" column="0" colspan="2">
<widget class="QCheckBox" name="opt_rescale_images">
<property name="text">
<string>Rescale images for &amp;Palm devices</string>
</property>
</widget>
</item>
<item row="3" column="0" colspan="2">
<item row="5" column="0" colspan="2">
<widget class="QCheckBox" name="opt_prefer_author_sort">
<property name="text">
<string>Use author &amp;sort for author</string>
</property>
</widget>
</item>
<item row="4" column="0">
<item row="6" column="0">
<widget class="QCheckBox" name="opt_dont_compress">
<property name="text">
<string>Disable compression of the file contents</string>
@ -55,7 +55,7 @@
</property>
</widget>
</item>
<item row="6" column="0" colspan="2">
<item row="8" column="0" colspan="2">
<widget class="QGroupBox" name="groupBox">
<property name="title">
<string>Kindle options</string>
@ -101,7 +101,7 @@
</layout>
</widget>
</item>
<item row="7" column="0">
<item row="9" column="0">
<spacer name="verticalSpacer_2">
<property name="orientation">
<enum>Qt::Vertical</enum>
@ -114,7 +114,14 @@
</property>
</spacer>
</item>
<item row="5" column="0">
<item row="2" column="0" colspan="2">
<widget class="QCheckBox" name="opt_mobi_toc_at_start">
<property name="text">
<string>Put generated Table of Contents at &amp;start of book instead of end</string>
</property>
</widget>
</item>
<item row="3" column="0">
<widget class="QCheckBox" name="opt_mobi_ignore_margins">
<property name="text">
<string>Ignore &amp;margins</string>

View File

@ -7,8 +7,8 @@ __docformat__ = 'restructuredtext en'
import re, os
from PyQt4.QtCore import SIGNAL, Qt, pyqtSignal
from PyQt4.QtGui import QDialog, QWidget, QDialogButtonBox, \
QBrush, QTextCursor, QTextEdit
from PyQt4.QtGui import (QDialog, QWidget, QDialogButtonBox,
QBrush, QTextCursor, QTextEdit)
from calibre.gui2.convert.regex_builder_ui import Ui_RegexBuilder
from calibre.gui2.convert.xexp_edit_ui import Ui_Form as Ui_Edit
@ -16,6 +16,7 @@ from calibre.gui2 import error_dialog, choose_files
from calibre.ebooks.oeb.iterator import EbookIterator
from calibre.ebooks.conversion.preprocess import HTMLPreProcessor
from calibre.gui2.dialogs.choose_format import ChooseFormatDialog
from calibre.constants import iswindows
class RegexBuilder(QDialog, Ui_RegexBuilder):
@ -134,8 +135,18 @@ class RegexBuilder(QDialog, Ui_RegexBuilder):
_('Cannot build regex using the GUI builder without a book.'),
show=True)
return False
fpath = db.format(book_id, format, index_is_id=True,
as_path=True)
try:
fpath = db.format(book_id, format, index_is_id=True,
as_path=True)
except OSError:
if iswindows:
import traceback
error_dialog(self, _('Could not open file'),
_('Could not open the file, do you have it open in'
' another program?'), show=True,
det_msg=traceback.format_exc())
return False
raise
try:
self.open_book(fpath)
finally:

View File

@ -723,6 +723,7 @@ class BulkSeries(BulkBase):
layout.addWidget(self.force_number)
self.series_start_number = QSpinBox(parent)
self.series_start_number.setMinimum(1)
self.series_start_number.setMaximum(9999999)
self.series_start_number.setProperty("value", 1)
layout.addWidget(self.series_start_number)
layout.addItem(QSpacerItem(20, 10, QSizePolicy.Expanding, QSizePolicy.Minimum))

View File

@ -29,9 +29,6 @@
<property name="alternatingRowColors">
<bool>true</bool>
</property>
<property name="selectionMode">
<enum>QAbstractItemView::SingleSelection</enum>
</property>
<property name="selectionBehavior">
<enum>QAbstractItemView::SelectRows</enum>
</property>
@ -46,7 +43,7 @@
<item>
<widget class="QPushButton" name="kill_button">
<property name="text">
<string>&amp;Stop selected job</string>
<string>&amp;Stop selected jobs</string>
</property>
</widget>
</item>

View File

@ -183,7 +183,6 @@ class Quickview(QDialog, Ui_Quickview):
self.items.blockSignals(False)
def indicate_no_items(self):
print 'no items'
self.no_valid_items = True
self.items.clear()
self.items.addItem(QListWidgetItem(_('**No items found**')))

View File

@ -172,8 +172,9 @@ class JobManager(QAbstractTableModel): # {{{
if job.is_finished:
self.job_done.emit(len(self.unfinished_jobs()))
if needs_reset:
self.layoutAboutToBeChanged.emit()
self.jobs.sort()
self.reset()
self.layoutChanged.emit()
else:
for job in jobs:
idx = self.jobs.index(job)
@ -267,7 +268,8 @@ class JobManager(QAbstractTableModel): # {{{
# }}}
# Jobs UI {{{
class ProgressBarDelegate(QAbstractItemDelegate):
class ProgressBarDelegate(QAbstractItemDelegate): # {{{
def sizeHint(self, option, index):
return QSize(120, 30)
@ -284,8 +286,9 @@ class ProgressBarDelegate(QAbstractItemDelegate):
opts.progress = percent
opts.text = QString(_('Unavailable') if percent == 0 else '%d%%'%percent)
QApplication.style().drawControl(QStyle.CE_ProgressBar, opts, painter)
# }}}
class DetailView(QDialog, Ui_Dialog):
class DetailView(QDialog, Ui_Dialog): # {{{
def __init__(self, parent, job):
QDialog.__init__(self, parent)
@ -318,8 +321,9 @@ class DetailView(QDialog, Ui_Dialog):
self.next_pos = f.tell()
if more:
self.log.appendPlainText(more.decode('utf-8', 'replace'))
# }}}
class JobsButton(QFrame):
class JobsButton(QFrame): # {{{
def __init__(self, horizontal=False, size=48, parent=None):
QFrame.__init__(self, parent)
@ -404,6 +408,7 @@ class JobsButton(QFrame):
self.stop()
QCoreApplication.instance().alert(self, 5000)
# }}}
class JobsDialog(QDialog, Ui_JobsDialog):
@ -446,7 +451,6 @@ class JobsDialog(QDialog, Ui_JobsDialog):
except:
pass
def show_job_details(self, index):
row = index.row()
job = self.jobs_view.model().row_to_job(row)
@ -455,18 +459,23 @@ class JobsDialog(QDialog, Ui_JobsDialog):
d.timer.stop()
def show_details(self, *args):
for index in self.jobs_view.selectedIndexes():
index = self.jobs_view.currentIndex()
if index.isValid():
self.show_job_details(index)
return
def kill_job(self, *args):
if question_dialog(self, _('Are you sure?'), _('Do you really want to stop the selected job?')):
for index in self.jobs_view.selectionModel().selectedRows():
row = index.row()
rows = [index.row() for index in
self.jobs_view.selectionModel().selectedRows()]
if question_dialog(self, _('Are you sure?'),
ngettext('Do you really want to stop the selected job?',
'Do you really want to stop all the selected jobs?',
len(rows))):
for row in rows:
self.model.kill_job(row, self)
def kill_all_jobs(self, *args):
if question_dialog(self, _('Are you sure?'), _('Do you really want to stop all non-device jobs?')):
if question_dialog(self, _('Are you sure?'),
_('Do you really want to stop all non-device jobs?')):
self.model.kill_all_jobs()
def closeEvent(self, e):

View File

@ -16,7 +16,7 @@ from PyQt4.Qt import (QColor, Qt, QModelIndex, QSize, QApplication,
from calibre.gui2 import UNDEFINED_QDATE, error_dialog
from calibre.gui2.widgets import EnLineEdit
from calibre.gui2.complete import MultiCompleteLineEdit
from calibre.gui2.complete import MultiCompleteLineEdit, MultiCompleteComboBox
from calibre.utils.date import now, format_date
from calibre.utils.config import tweaks
from calibre.utils.formatter import validation_formatter
@@ -166,13 +166,26 @@ class TextDelegate(QStyledItemDelegate): # {{{
def createEditor(self, parent, option, index):
if self.auto_complete_function:
editor = MultiCompleteLineEdit(parent)
editor = MultiCompleteComboBox(parent)
editor.set_separator(None)
complete_items = [i[1] for i in self.auto_complete_function()]
editor.update_items_cache(complete_items)
for item in sorted(complete_items, key=sort_key):
editor.addItem(item)
ct = index.data(Qt.DisplayRole).toString()
editor.setEditText(ct)
editor.lineEdit().selectAll()
else:
editor = EnLineEdit(parent)
return editor
def setModelData(self, editor, model, index):
if isinstance(editor, MultiCompleteComboBox):
val = editor.lineEdit().text()
model.setData(index, QVariant(val), Qt.EditRole)
else:
QStyledItemDelegate.setModelData(self, editor, model, index)
#}}}
class CompleteDelegate(QStyledItemDelegate): # {{{
@@ -188,7 +201,7 @@ class CompleteDelegate(QStyledItemDelegate): # {{{
def createEditor(self, parent, option, index):
if self.db and hasattr(self.db, self.items_func_name):
col = index.model().column_map[index.column()]
editor = MultiCompleteLineEdit(parent)
editor = MultiCompleteComboBox(parent)
editor.set_separator(self.sep)
editor.set_space_before_sep(self.space_before_sep)
if self.sep == '&':
@@ -199,9 +212,21 @@ class CompleteDelegate(QStyledItemDelegate): # {{{
all_items = list(self.db.all_custom(
label=self.db.field_metadata.key_to_label(col)))
editor.update_items_cache(all_items)
for item in sorted(all_items, key=sort_key):
editor.addItem(item)
ct = index.data(Qt.DisplayRole).toString()
editor.setEditText(ct)
editor.lineEdit().selectAll()
else:
editor = EnLineEdit(parent)
return editor
def setModelData(self, editor, model, index):
if isinstance(editor, MultiCompleteComboBox):
val = editor.lineEdit().text()
model.setData(index, QVariant(val), Qt.EditRole)
else:
QStyledItemDelegate.setModelData(self, editor, model, index)
# }}}
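Both TextDelegate and CompleteDelegate now hand out an editable MultiCompleteComboBox seeded with the column's existing values (sorted with ICU's sort_key), so existing entries can be picked with the mouse alone, and both override setModelData to commit the embedded line edit's text, presumably so that freshly typed values matching no list item are committed as-is.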
class CcDateDelegate(QStyledItemDelegate): # {{{

View File

@@ -11,10 +11,10 @@ import textwrap, re, os
from PyQt4.Qt import (Qt, QDateEdit, QDate, pyqtSignal, QMessageBox,
QIcon, QToolButton, QWidget, QLabel, QGridLayout, QApplication,
QDoubleSpinBox, QListWidgetItem, QSize, QPixmap, QDialog,
QPushButton, QSpinBox, QLineEdit, QSizePolicy, QDialogButtonBox)
QDoubleSpinBox, QListWidgetItem, QSize, QPixmap, QDialog, QMenu,
QPushButton, QSpinBox, QLineEdit, QSizePolicy, QDialogButtonBox, QAction)
from calibre.gui2.widgets import EnLineEdit, FormatList, ImageView
from calibre.gui2.widgets import EnLineEdit, FormatList as _FormatList, ImageView
from calibre.gui2.complete import MultiCompleteLineEdit, MultiCompleteComboBox
from calibre.utils.icu import sort_key
from calibre.utils.config import tweaks, prefs
@@ -33,6 +33,7 @@ from calibre.gui2.comments_editor import Editor
from calibre.library.comments import comments_to_html
from calibre.gui2.dialogs.tag_editor import TagEditor
from calibre.utils.icu import strcmp
from calibre.ptempfile import PersistentTemporaryFile
def save_dialog(parent, title, msg, det_msg=''):
d = QMessageBox(parent)
@@ -572,7 +573,9 @@ class BuddyLabel(QLabel): # {{{
self.setAlignment(Qt.AlignRight|Qt.AlignVCenter)
# }}}
class Format(QListWidgetItem): # {{{
# Formats {{{
class Format(QListWidgetItem):
def __init__(self, parent, ext, size, path=None, timestamp=None):
self.path = path
@@ -588,13 +591,52 @@ class Format(QListWidgetItem): # {{{
self.setToolTip(text)
self.setStatusTip(text)
# }}}
class OrigAction(QAction):
class FormatsManager(QWidget): # {{{
restore_fmt = pyqtSignal(object)
def __init__(self, fmt, parent):
self.fmt = fmt.replace('ORIGINAL_', '')
QAction.__init__(self, _('Restore %s from the original')%self.fmt, parent)
self.triggered.connect(self._triggered)
def _triggered(self):
self.restore_fmt.emit(self.fmt)
class FormatList(_FormatList):
restore_fmt = pyqtSignal(object)
def __init__(self, parent):
_FormatList.__init__(self, parent)
self.setContextMenuPolicy(Qt.DefaultContextMenu)
def contextMenuEvent(self, event):
originals = [self.item(x).ext.upper() for x in range(self.count())]
originals = [x for x in originals if x.startswith('ORIGINAL_')]
if not originals:
return
self.cm = cm = QMenu(self)
for fmt in originals:
action = OrigAction(fmt, cm)
action.restore_fmt.connect(self.restore_fmt)
cm.addAction(action)
cm.popup(event.globalPos())
event.accept()
def remove_format(self, fmt):
for i in range(self.count()):
f = self.item(i)
if f.ext.upper() == fmt.upper():
self.takeItem(i)
break
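One subtlety in contextMenuEvent above: QMenu.popup() shows the menu asynchronously and returns immediately, unlike the blocking exec_(), so the menu is deliberately parked on self.cm; a purely local QMenu could be garbage-collected as soon as the handler returns, dismissing the menu on screen.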
class FormatsManager(QWidget):
def __init__(self, parent, copy_fmt):
QWidget.__init__(self, parent)
self.dialog = parent
self.copy_fmt = copy_fmt
self.changed = False
self.l = l = QGridLayout()
@@ -628,6 +670,7 @@ class FormatsManager(QWidget): # {{{
self.formats = FormatList(self)
self.formats.setAcceptDrops(True)
self.formats.formats_dropped.connect(self.formats_dropped)
self.formats.restore_fmt.connect(self.restore_fmt)
self.formats.delete_format.connect(self.remove_format)
self.formats.itemDoubleClicked.connect(self.show_format)
self.formats.setDragDropMode(self.formats.DropOnly)
@@ -640,7 +683,7 @@ class FormatsManager(QWidget): # {{{
l.addWidget(self.remove_format_button, 2, 2, 1, 1)
l.addWidget(self.formats, 0, 1, 3, 1)
self.temp_files = []
def initialize(self, db, id_):
self.changed = False
@@ -694,6 +737,16 @@ class FormatsManager(QWidget): # {{{
[(_('Books'), BOOK_EXTENSIONS)])
self._add_formats(files)
def restore_fmt(self, fmt):
pt = PersistentTemporaryFile(suffix='_restore_fmt.'+fmt.lower())
ofmt = 'ORIGINAL_'+fmt
with pt:
self.copy_fmt(ofmt, pt)
self._add_formats((pt.name,))
self.temp_files.append(pt.name)
self.changed = True
self.formats.remove_format(ofmt)
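restore_fmt leans on calibre's PersistentTemporaryFile, which survives being closed so its path can be handed to _add_formats; the paths are tracked in self.temp_files and deleted in break_cycles (below). A rough stdlib equivalent of the pattern, with hypothetical names:

import tempfile

def restore_copy(copy_fmt, fmt):
    # delete=False keeps the file on disk after the handle is closed,
    # mirroring PersistentTemporaryFile's behaviour.
    tf = tempfile.NamedTemporaryFile(suffix='_restore_fmt.' + fmt.lower(),
                                     delete=False)
    with tf:
        copy_fmt('ORIGINAL_' + fmt, tf)
    return tf.name  # caller re-adds this path, and removes it later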
def _add_formats(self, paths):
added = False
if not paths:
@@ -774,6 +827,13 @@ class FormatsManager(QWidget): # {{{
def break_cycles(self):
self.dialog = None
self.copy_fmt = None
for name in self.temp_files:
try:
os.remove(name)
except:
pass
self.temp_files = []
# }}}
class Cover(ImageView): # {{{

View File

@@ -145,7 +145,7 @@ class MetadataSingleDialogBase(ResizableDialog):
self.series_index = SeriesIndexEdit(self, self.series)
self.basic_metadata_widgets.extend([self.series, self.series_index])
self.formats_manager = FormatsManager(self)
self.formats_manager = FormatsManager(self, self.copy_fmt)
self.basic_metadata_widgets.append(self.formats_manager)
self.formats_manager.metadata_from_format_button.clicked.connect(
self.metadata_from_format)
@@ -240,6 +240,8 @@ class MetadataSingleDialogBase(ResizableDialog):
else:
self.view_format.emit(self.book_id, fmt)
def copy_fmt(self, fmt, f):
self.db.copy_format_to(self.book_id, fmt, f, index_is_id=True)
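Note the shape of this hook: the dialog passes its bound copy_fmt method into FormatsManager (see the constructor change above), so the widget can pull format data out of the database without holding a db reference of its own; FormatsManager.break_cycles correspondingly nulls copy_fmt to break the resulting reference cycle.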
def do_layout(self):
raise NotImplementedError()

View File

@@ -105,13 +105,18 @@ class ConfigWidget(ConfigWidgetBase, Ui_Form):
r('cover_flow_queue_length', config, restart_required=True)
def get_esc_lang(l):
if l == 'en':
return 'English'
return get_language(l)
lang = get_lang()
if lang is None or lang not in available_translations():
lang = 'en'
items = [(l, get_language(l)) for l in available_translations() \
items = [(l, get_esc_lang(l)) for l in available_translations() \
if l != lang]
if lang != 'en':
items.append(('en', get_language('en')))
items.append(('en', get_esc_lang('en')))
items.sort(cmp=lambda x, y: cmp(x[1].lower(), y[1].lower()))
choices = [(y, x) for x, y in items]
# Default language is the autodetected one
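A note on get_esc_lang above: get_language() returns a language's name translated into the active UI language, so under a non-English UI get_language('en') need not be the literal string 'English'; the helper pins that one value. A small illustration with a hypothetical translation table:

# Stand-in for calibre.utils.localization.get_language, which returns
# names translated into the current UI language (German here).
translated = {'en': 'Englisch', 'de': 'Deutsch'}

def get_language(l):
    return translated.get(l, l)

def get_esc_lang(l):
    if l == 'en':
        return 'English'
    return get_language(l)

print get_esc_lang('en')  # 'English', never the translated form
print get_esc_lang('de')  # 'Deutsch'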

View File

@@ -14,30 +14,18 @@
<string>Form</string>
</property>
<layout class="QGridLayout" name="gridLayout">
<item row="0" column="0">
<widget class="QLabel" name="label">
<property name="font">
<font>
<weight>75</weight>
<bold>true</bold>
</font>
</property>
<property name="text">
<string>Choose the &amp;toolbar to customize:</string>
</property>
<property name="buddy">
<cstring>what</cstring>
</property>
</widget>
</item>
<item row="0" column="1" colspan="3">
<item row="0" column="0" colspan="5">
<widget class="QComboBox" name="what">
<property name="font">
<font>
<pointsize>20</pointsize>
<weight>75</weight>
<bold>true</bold>
</font>
</property>
<property name="toolTip">
<string>Choose the toolbar to customize</string>
</property>
<property name="sizeAdjustPolicy">
<enum>QComboBox::AdjustToMinimumContentsLengthWithIcon</enum>
</property>
@@ -46,7 +34,7 @@
</property>
</widget>
</item>
<item row="1" column="0" rowspan="2">
<item row="1" column="0" colspan="2">
<widget class="QGroupBox" name="groupBox">
<property name="title">
<string>A&amp;vailable actions</string>
@@ -74,7 +62,67 @@
</layout>
</widget>
</item>
<item row="1" column="2" rowspan="2">
<item row="1" column="2">
<layout class="QVBoxLayout" name="verticalLayout_3">
<item>
<widget class="QToolButton" name="add_action_button">
<property name="toolTip">
<string>Add selected actions to toolbar</string>
</property>
<property name="text">
<string>...</string>
</property>
<property name="icon">
<iconset resource="../../../../resources/images.qrc">
<normaloff>:/images/forward.png</normaloff>:/images/forward.png</iconset>
</property>
<property name="iconSize">
<size>
<width>24</width>
<height>24</height>
</size>
</property>
</widget>
</item>
<item>
<spacer name="verticalSpacer">
<property name="orientation">
<enum>Qt::Vertical</enum>
</property>
<property name="sizeType">
<enum>QSizePolicy::Fixed</enum>
</property>
<property name="sizeHint" stdset="0">
<size>
<width>20</width>
<height>40</height>
</size>
</property>
</spacer>
</item>
<item>
<widget class="QToolButton" name="remove_action_button">
<property name="toolTip">
<string>Remove selected actions from toolbar</string>
</property>
<property name="text">
<string>...</string>
</property>
<property name="icon">
<iconset resource="../../../../resources/images.qrc">
<normaloff>:/images/back.png</normaloff>:/images/back.png</iconset>
</property>
<property name="iconSize">
<size>
<width>24</width>
<height>24</height>
</size>
</property>
</widget>
</item>
</layout>
</item>
<item row="1" column="3" colspan="2">
<widget class="QGroupBox" name="groupBox_2">
<property name="title">
<string>&amp;Current actions</string>
@@ -162,66 +210,6 @@
</layout>
</widget>
</item>
<item row="1" column="1" rowspan="2">
<layout class="QVBoxLayout" name="verticalLayout_3">
<item>
<widget class="QToolButton" name="add_action_button">
<property name="toolTip">
<string>Add selected actions to toolbar</string>
</property>
<property name="text">
<string>...</string>
</property>
<property name="icon">
<iconset resource="../../../../resources/images.qrc">
<normaloff>:/images/forward.png</normaloff>:/images/forward.png</iconset>
</property>
<property name="iconSize">
<size>
<width>24</width>
<height>24</height>
</size>
</property>
</widget>
</item>
<item>
<spacer name="verticalSpacer">
<property name="orientation">
<enum>Qt::Vertical</enum>
</property>
<property name="sizeType">
<enum>QSizePolicy::Fixed</enum>
</property>
<property name="sizeHint" stdset="0">
<size>
<width>20</width>
<height>40</height>
</size>
</property>
</spacer>
</item>
<item>
<widget class="QToolButton" name="remove_action_button">
<property name="toolTip">
<string>Remove selected actions from toolbar</string>
</property>
<property name="text">
<string>...</string>
</property>
<property name="icon">
<iconset resource="../../../../resources/images.qrc">
<normaloff>:/images/back.png</normaloff>:/images/back.png</iconset>
</property>
<property name="iconSize">
<size>
<width>24</width>
<height>24</height>
</size>
</property>
</widget>
</item>
</layout>
</item>
</layout>
</widget>
<resources>

View File

@@ -6,6 +6,8 @@ __license__ = 'GPL 3'
__copyright__ = '2011, John Schember <john@nachtimwald.com>'
__docformat__ = 'restructuredtext en'
from calibre.utils.filenames import ascii_filename
class StorePlugin(object): # {{{
'''
A plugin representing an online ebook repository (store). The store can
@@ -43,7 +45,7 @@ class StorePlugin(object): # {{{
The easiest way to handle affiliate money payouts is to randomly select
between the author's affiliate id and calibre's affiliate id so that
70% of the time the author's id is used.
See declined.txt for a list of stores that do not want to be included.
'''
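The affiliate guidance above amounts to a weighted coin flip; a minimal sketch (hypothetical helper, not part of the StorePlugin API):

import random

def pick_affiliate_id(author_id, calibre_id):
    # Use the author's affiliate id 70% of the time, calibre's the rest.
    return author_id if random.random() < 0.7 else calibre_id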
@@ -53,7 +55,7 @@ class StorePlugin(object): # {{{
self.gui = gui
self.name = name
self.base_plugin = None
self.config = JSONConfig('store/stores/' + self.name)
self.config = JSONConfig('store/stores/' + ascii_filename(self.name))
def open(self, gui, parent=None, detail_item=None, external=False):
'''
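The JSONConfig change above matters because the plugin's name becomes part of a config-file path, so a store name containing non-ASCII or path-unsafe characters would otherwise yield a broken filename. The real sanitizer is calibre.utils.filenames.ascii_filename; a rough stdlib approximation of the idea:

import re, unicodedata

def ascii_filename_sketch(name, substitute='_'):
    # Decompose accented characters, drop what is left outside ASCII,
    # then replace characters that are unsafe in file names.
    decomposed = unicodedata.normalize('NFKD', unicode(name))
    ascii_only = decomposed.encode('ascii', 'ignore')
    return re.sub(r'[\\/:*?"<>|]', substitute, ascii_only)

print ascii_filename_sketch(u'Libr\xe9ria R\xedzzoli')  # 'Libreria Rizzoli'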

View File

@@ -4,3 +4,4 @@ or asked not to be included in the store integration.
* Borders (http://www.borders.com/).
* Indigo (http://www.chapters.indigo.ca/).
* Libraria Rizzoli (http://libreriarizzoli.corriere.it/).
* EPubBuy DE: reason: too much traffic for too few sales

Some files were not shown because too many files have changed in this diff.