sync with Kovid's branch

2025-08-30 23:00:21 -04:00 · 2013-03-25 22:58:46 +01:00 · 2013-03-25 22:58:46 +01:00 · 3ed040d74b
commit 3ed040d74b
parent 39058f2552 4f6709d754
18 changed files with 371 additions and 152 deletions
--- a/manual/conversion.rst
+++ b/manual/conversion.rst
@ -434,6 +434,18 @@ a number of older formats either do not support a metadata based Table of Conten
 documents do not have one. In these cases, the options in this section can help you automatically
 generate a Table of Contents in the converted ebook, based on the actual content in the input document.

+.. note:: Using these options can be a little challenging to get exactly right.
+    If you prefer creating/editing the Table of Contents by hand, convert to
+    the EPUB or AZW3 formats and select the checkbox at the bottom of the
+    screen that says 
+    :guilabel:`Manually fine-tune the ToC after conversion is completed`.
+    This will launch the ToC Editor tool after the conversion. It allows you to
+    create entries in the Table of Contents by simply clicking the place in the
+    book where you want the entry to point. You can also use the ToC Editor by
+    itself, without doing a conversion. Go to :guilabel:`Preferences->Toolbars`
+    and add the ToC Editor to the main toolbar. Then just select the book you
+    want to edit and click the ToC Editor button.
+
 The first option is :guilabel:`Force use of auto-generated Table of Contents`. By checking this option
 you can have |app| override any Table of Contents found in the metadata of the input document with the
 auto generated one. 
@ -456,7 +468,7 @@ For example, to remove all entries titles "Next" or "Previous" use::

    Next|Previous

-Finally, the :guilabel:`Level 1,2,3 TOC` options allow you to create a sophisticated multi-level Table of Contents.
+The :guilabel:`Level 1,2,3 TOC` options allow you to create a sophisticated multi-level Table of Contents.
 They are XPath expressions that match tags in the intermediate XHTML produced by the conversion pipeline. See the 
 :ref:`conversion-introduction` for how to get access to this XHTML. Also read the :ref:`xpath-tutorial`, to learn
 how to construct XPath expressions. Next to each option is a button that launches a wizard to help with the creation
--- a/manual/faq.rst
+++ b/manual/faq.rst
@ -87,7 +87,9 @@ this bug.

 How do I convert a collection of HTML files in a specific order?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-In order to convert a collection of HTML files in a specific oder, you have to create a table of contents file. That is, another HTML file that contains links to all the other files in the desired order. Such a file looks like::
+In order to convert a collection of HTML files in a specific order, you have to
+create a table of contents file. That is, another HTML file that contains links
+to all the other files in the desired order. Such a file looks like::

   <html>
      <body>
@ -102,18 +104,35 @@ In order to convert a collection of HTML files in a specific oder, you have to c
      </body>
   </html>

-Then just add this HTML file to the GUI and use the convert button to create your ebook. 
+Then, just add this HTML file to the GUI and use the convert button to create
+your ebook. You can use the option in the Table of Contents section in the
+conversion dialog to control how the Table of Contents is generated.

-.. note:: By default, when adding HTML files, |app| follows links in the files in *depth first* order. This means that if file A.html links to B.html and C.html and D.html, but B.html also links to D.html, then the files will be in the order A.html, B.html, D.html, C.html. If instead you want the order to be A.html, B.html, C.html, D.html then you must tell |app| to add your files in *breadth first* order. Do this by going to Preferences->Plugins and customizing the HTML to ZIP plugin.
+.. note:: By default, when adding HTML files, |app| follows links in the files
+    in *depth first* order. This means that if file A.html links to B.html and
+    C.html and D.html, but B.html also links to D.html, then the files will be
+    in the order A.html, B.html, D.html, C.html. If instead you want the order
+    to be A.html, B.html, C.html, D.html then you must tell |app| to add your
+    files in *breadth first* order. Do this by going to Preferences->Plugins
+    and customizing the HTML to ZIP plugin.

 The EPUB I produced with |app| is not valid?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-|app| does not guarantee that an EPUB produced by it is valid. The only guarantee it makes is that if you feed it valid XHTML 1.1 + CSS 2.1 it will output a valid EPUB. |app| is designed for ebook consumers, not producers. It tries hard to ensure that EPUBs it produces actually work as intended on a wide variety of devices, a goal that is incompatible with producing valid EPUBs, and one that is far more important to the vast majority of its users. If you need a tool that always produces valid EPUBs, |app| is not for you.
+|app| does not guarantee that an EPUB produced by it is valid. The only
+guarantee it makes is that if you feed it valid XHTML 1.1 + CSS 2.1 it will
+output a valid EPUB. |app| is designed for ebook consumers, not producers. It
+tries hard to ensure that EPUBs it produces actually work as intended on a wide
+variety of devices, a goal that is incompatible with producing valid EPUBs, and
+one that is far more important to the vast majority of its users. If you need a
+tool that always produces valid EPUBs, |app| is not for you.

 How do I use some of the advanced features of the conversion tools?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- You can get help on any individual feature of the converters by mousing over it in the GUI or running ``ebook-convert dummy.html .epub -h`` at a terminal. A good place to start is to look at the following demo files that demonstrate some of the advanced features:
+ You can get help on any individual feature of the converters by mousing over
+ it in the GUI or running ``ebook-convert dummy.html .epub -h`` at a terminal.
+ A good place to start is to look at the following demo files that demonstrate
+ some of the advanced features:
  * `html-demo.zip <http://calibre-ebook.com/downloads/html-demo.zip>`_


@ -126,11 +145,11 @@ Device Integration

 What devices does |app| support?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-|app| can directly connect to all the major (and most of the minor) ebook reading devices,
-smarthphones, tablets, etc.
-In addition, using the :guilabel:`Connect to folder` function you can use it with any ebook reader that exports itself as a USB disk. 
-You can even connect to Apple devices (via iTunes), using the :guilabel:`Connect to iTunes`
-function.
+|app| can directly connect to all the major (and most of the minor) ebook
+reading devices, smartphones, tablets, etc.  In addition, using the
+:guilabel:`Connect to folder` function you can use it with any ebook reader
+that exports itself as a USB disk.  You can even connect to Apple devices (via
+iTunes), using the :guilabel:`Connect to iTunes` function.

 .. _devsupport:
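
Reviewer sketch for the *depth first* versus *breadth first* note in the faq.rst hunk above. This is a minimal, self-contained illustration and not calibre's HTML to ZIP code; the `LINKS` table and both function names are assumptions made up for the example, using the A.html to D.html case from the note.

```python
from collections import deque

# Hypothetical link graph from the note's example:
# A.html links to B, C and D; B.html also links to D.
LINKS = {
    'A.html': ['B.html', 'C.html', 'D.html'],
    'B.html': ['D.html'],
    'C.html': [],
    'D.html': [],
}


def depth_first(root, links):
    """Follow each file's own links before moving on to its siblings."""
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        order.append(name)
        for child in links.get(name, []):
            visit(child)

    visit(root)
    return order


def breadth_first(root, links):
    """Add every file linked from the current file before going deeper."""
    order, seen = [], {root}
    queue = deque([root])
    while queue:
        name = queue.popleft()
        order.append(name)
        for child in links.get(name, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order


print(depth_first('A.html', LINKS))
print(breadth_first('A.html', LINKS))
```

The two calls reproduce the two orderings the note describes: D.html is pulled forward in the depth-first case because B.html links to it.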

--- a/recipes/eclipseonline.recipe
+++ b/recipes/eclipseonline.recipe
@ -0,0 +1,38 @@
+from calibre.web.feeds.news import BasicNewsRecipe
+class EclipseOnline(BasicNewsRecipe):
+	
+	#
+	# oldest_article specifies the maximum age, in days, of posts to retrieve.
+	# The default of 32 is intended to work well with a "days of month = 1"
+	# recipe schedule to download "monthly issues" of Eclipse Online.
+	# Increase this value to include additional posts. However, the RSS feed
+	# currently only includes the 10 most recent posts, so that's the max.
+	#
+	oldest_article = 32
+	
+	title = u'Eclipse Online'
+	description = u'"Where strange and wonderful things happen, where reality is eclipsed for a little while with something magical and new." Eclipse Online is edited by Jonathan Strahan and published online by Night Shade Books. http://www.nightshadebooks.com/category/eclipse/'
+	publication_type = 'magazine'
+	language = 'en'
+	
+	__author__ = u'Jim DeVona'
+	__version__ = '1.0'
+	
+	# For now, use this Eclipse Online logo as the ebook cover image.
+	# (Disable the cover_url line to let Calibre generate a default cover, including date.)
+	cover_url = 'http://www.nightshadebooks.com/wp-content/uploads/2012/10/Eclipse-Logo.jpg'
+		
+	# Extract the "post" div containing the story (minus redundant metadata) from each page.
+	keep_only_tags = [dict(name='div', attrs={'class':lambda x: x and 'post' in x})]
+	remove_tags = [dict(name='span', attrs={'class': ['post-author', 'post-category', 'small']})]
+
+	# Nice plain markup (like Eclipse's) works best for most e-readers.
+	# Disregard any special styling rules, but center illustrations.
+	auto_cleanup = False
+	no_stylesheets = True
+	remove_attributes = ['style', 'align']
+	extra_css = '.wp-caption {text-align: center;} .wp-caption-text {font-size: small; font-style: italic;}'
+	
+	# Tell Calibre where to look for article links. It will proceed to retrieve
+	# these posts and format them into an ebook according to the above rules.
+	feeds = ['http://www.nightshadebooks.com/category/eclipse/feed/']
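
Reviewer sketch for the `keep_only_tags` matcher in the new Eclipse Online recipe. It only exercises the predicate in isolation, assuming the matcher receives either `None` (no class attribute) or the raw class string; the function name `is_post` and the sample class values are hypothetical.

```python
def is_post(class_value):
    """Same test as the recipe's lambda: truthy class containing 'post'."""
    return bool(class_value and 'post' in class_value)


# Hypothetical class attribute values a page might contain.
samples = [None, '', 'sidebar', 'post', 'post type-post hentry', 'repost-widget']
for value in samples:
    print(repr(value), is_post(value))
```

Because this is a substring test, a class such as `repost-widget` would also match. That may be acceptable for this site, but it is worth knowing if unrelated divs start showing up in the output.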
--- a/recipes/hindu.recipe
+++ b/recipes/hindu.recipe
@ -2,7 +2,6 @@ from __future__ import with_statement
 __license__ = 'GPL 3'
 __copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'

-import time
 from calibre.web.feeds.news import BasicNewsRecipe

 class TheHindu(BasicNewsRecipe):
@ -14,44 +13,42 @@ class TheHindu(BasicNewsRecipe):
    max_articles_per_feed = 100
    no_stylesheets = True

-    keep_only_tags = [dict(id='content')]
-    remove_tags = [dict(attrs={'class':['article-links', 'breadcr']}),
-            dict(id=['email-section', 'right-column', 'printfooter', 'topover',
-                     'slidebox', 'th_footer'])]
+    auto_cleanup = True
+

    extra_css = '.photo-caption { font-size: smaller }'

-    def preprocess_raw_html(self, raw, url):
-        return raw.replace('<body><p>', '<p>').replace('</p></body>', '</p>')
-
-    def postprocess_html(self, soup, first_fetch):
-        for t in soup.findAll(['table', 'tr', 'td','center']):
-            t.name = 'div'
-        return soup
-
    def parse_index(self):
-        today = time.strftime('%Y-%m-%d')
-        soup = self.index_to_soup(
-                'http://www.thehindu.com/todays-paper/tp-index/?date=' + today)
-        div = soup.find(id='left-column')
-        feeds = []
+        soup = self.index_to_soup('http://www.thehindu.com/todays-paper/')
+        div = soup.find('div', attrs={'id':'left-column'})
+        soup.find(id='subnav-tpbar').extract()
+
+
+
        current_section = None
        current_articles = []
-        for x in div.findAll(['h3', 'div']):
-            if current_section and x.get('class', '') == 'tpaper':
-                a = x.find('a', href=True)
-                if a is not None:
-                    title = self.tag_to_string(a)
-                    self.log('\tFound article:', title)
-                    current_articles.append({'url':a['href']+'?css=print',
-                        'title':title, 'date': '',
-                        'description':''})
-            if x.name == 'h3':
-                if current_section and current_articles:
+        feeds = []
+        for x in div.findAll(['a', 'span']):
+            if x.name == 'span' and x.get('class', '') == 's-link':
+                # Section heading found
+                if current_articles and current_section:
                    feeds.append((current_section, current_articles))
                current_section = self.tag_to_string(x)
-                self.log('Found section:', current_section)
                current_articles = []
+                self.log('\tFound section:', current_section)
+            elif x.name == 'a':
+                title = self.tag_to_string(x)
+                url = x.get('href', False)
+                if not url or not title:
+                    continue
+                self.log('\t\tFound article:', title)
+                self.log('\t\t\t', url)
+                current_articles.append({'title': title, 'url':url,
+                    'description':'', 'date':''})
+
+        if current_articles and current_section:
+            feeds.append((current_section, current_articles))
+
        return feeds

-
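
Reviewer sketch of the section/article grouping that the rewritten `parse_index` performs. It works on plain tuples rather than parsed HTML so it can run on its own; the function name `group_into_feeds`, the `(kind, text, url)` tuple shape and the example.com URLs are assumptions for illustration only.

```python
def group_into_feeds(items):
    """Group a flat stream of section headings and article links.

    items: iterable of (kind, text, url), where kind is 'section' or 'article'.
    Returns a list of (section_title, [article_dict, ...]) tuples.
    """
    feeds = []
    current_section, current_articles = None, []
    for kind, text, url in items:
        if kind == 'section':
            # Flush the previous section only if it collected any articles.
            if current_section and current_articles:
                feeds.append((current_section, current_articles))
            current_section, current_articles = text, []
        elif kind == 'article':
            if not url or not text:
                continue
            current_articles.append(
                {'title': text, 'url': url, 'description': '', 'date': ''})
    # The last section has no following heading, so flush it after the loop.
    if current_section and current_articles:
        feeds.append((current_section, current_articles))
    return feeds


stream = [
    ('article', 'Orphan', 'http://example.com/0'),   # before any section
    ('section', 'National', None),
    ('article', 'Story A', 'http://example.com/a'),
    ('article', '', 'http://example.com/untitled'),  # skipped: no title
    ('section', 'Empty', None),                      # dropped: no articles
    ('section', 'Sport', None),
    ('article', 'Story B', 'http://example.com/b'),
]
for section, articles in group_into_feeds(stream):
    print(section, [a['title'] for a in articles])
```

Note that links seen before the first section heading are discarded, which mirrors the behavior of the loop in the hunk above.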
--- a/recipes/interia_fakty.recipe
+++ b/recipes/interia_fakty.recipe
@ -5,7 +5,7 @@ __copyright__ = u'2010-2013, Tomasz Dlugosz <tomek3d@gmail.com>'
 '''
 fakty.interia.pl
 '''
-
+import re
 from calibre.web.feeds.news import BasicNewsRecipe

 class InteriaFakty(BasicNewsRecipe):
--- a/recipes/irish_times.recipe
+++ b/recipes/irish_times.recipe
@ -1,65 +1,62 @@
 __license__  = 'GPL v3'
-__copyright__ = "2008, Derry FitzGerald. 2009 Modified by Ray Kinsella and David O'Callaghan, 2011 Modified by Phil Burns"
+__copyright__ = "2008, Derry FitzGerald. 2009 Modified by Ray Kinsella and David O'Callaghan, 2011 Modified by Phil Burns, 2013 Tom Scholl"
 '''
 irishtimes.com
 '''
-import re
+import urlparse, re

 from calibre.web.feeds.news import BasicNewsRecipe
+from calibre.ptempfile import PersistentTemporaryFile
+

 class IrishTimes(BasicNewsRecipe):
    title          = u'The Irish Times'
-    encoding  = 'ISO-8859-15'
-    __author__    = "Derry FitzGerald, Ray Kinsella, David O'Callaghan and Phil Burns"
+    __author__    = "Derry FitzGerald, Ray Kinsella, David O'Callaghan and Phil Burns, Tom Scholl"
    language = 'en_IE'
-    timefmt = ' (%A, %B %d, %Y)'

+    masthead_url = 'http://www.irishtimes.com/assets/images/generic/website/logo_theirishtimes.png'

+    encoding = 'utf-8'
    oldest_article = 1.0
    max_articles_per_feed = 100
+    remove_empty_feeds = True
    no_stylesheets = True
-    simultaneous_downloads= 5
-
-    r = re.compile('.*(?P<url>http:\/\/(www.irishtimes.com)|(rss.feedsportal.com\/c)\/.*\.html?).*')
-    remove_tags    = [dict(name='div', attrs={'class':'footer'})]
-    extra_css      = 'p, div { margin: 0pt; border: 0pt; text-indent: 0.5em } .headline {font-size: large;} \n .fact { padding-top: 10pt  }'
+    temp_files = []
+    articles_are_obfuscated = True

    feeds          = [
-                      ('Frontpage', 'http://www.irishtimes.com/feeds/rss/newspaper/index.rss'),
-                      ('Ireland', 'http://www.irishtimes.com/feeds/rss/newspaper/ireland.rss'),
-                      ('World', 'http://www.irishtimes.com/feeds/rss/newspaper/world.rss'),
-                      ('Finance', 'http://www.irishtimes.com/feeds/rss/newspaper/finance.rss'),
-                      ('Features', 'http://www.irishtimes.com/feeds/rss/newspaper/features.rss'),
-                      ('Sport', 'http://www.irishtimes.com/feeds/rss/newspaper/sport.rss'),
-                      ('Opinion', 'http://www.irishtimes.com/feeds/rss/newspaper/opinion.rss'),
-                      ('Letters', 'http://www.irishtimes.com/feeds/rss/newspaper/letters.rss'),
-                      ('Magazine', 'http://www.irishtimes.com/feeds/rss/newspaper/magazine.rss'),
-                      ('Health', 'http://www.irishtimes.com/feeds/rss/newspaper/health.rss'),
-                      ('Education & Parenting', 'http://www.irishtimes.com/feeds/rss/newspaper/education.rss'),
-                      ('Motors', 'http://www.irishtimes.com/feeds/rss/newspaper/motors.rss'),
-                      ('An Teanga Bheo', 'http://www.irishtimes.com/feeds/rss/newspaper/anteangabheo.rss'),
-                      ('Commercial Property', 'http://www.irishtimes.com/feeds/rss/newspaper/commercialproperty.rss'),
-                      ('Science Today', 'http://www.irishtimes.com/feeds/rss/newspaper/sciencetoday.rss'),
-                      ('Property', 'http://www.irishtimes.com/feeds/rss/newspaper/property.rss'),
-                      ('The Tickets', 'http://www.irishtimes.com/feeds/rss/newspaper/theticket.rss'),
-                      ('Weekend', 'http://www.irishtimes.com/feeds/rss/newspaper/weekend.rss'),
-                      ('News features', 'http://www.irishtimes.com/feeds/rss/newspaper/newsfeatures.rss'),
-                      ('Obituaries', 'http://www.irishtimes.com/feeds/rss/newspaper/obituaries.rss'),
+                      ('News', 'http://www.irishtimes.com/cmlink/the-irish-times-news-1.1319192'),
+                      ('World', 'http://www.irishtimes.com/cmlink/irishtimesworldfeed-1.1321046'),
+                      ('Politics', 'http://www.irishtimes.com/cmlink/irish-times-politics-rss-1.1315953'),
+                      ('Business', 'http://www.irishtimes.com/cmlink/the-irish-times-business-1.1319195'),
+                      ('Culture', 'http://www.irishtimes.com/cmlink/the-irish-times-culture-1.1319213'),
+                      ('Sport', 'http://www.irishtimes.com/cmlink/the-irish-times-sport-1.1319194'),
+                      ('Debate', 'http://www.irishtimes.com/cmlink/debate-1.1319211'),
+                      ('Life & Style', 'http://www.irishtimes.com/cmlink/the-irish-times-life-style-1.1319214'),
                    ]


-    def print_version(self, url):
-        if url.count('rss.feedsportal.com'):
-            #u = url.replace('0Bhtml/story01.htm','_pf0Bhtml/story01.htm')
-            u = url.find('irishtimes')
-            u = 'http://www.irishtimes.com' + url[u + 12:]
-            u = u.replace('0C', '/')
-            u = u.replace('A', '')
-            u = u.replace('0Bhtml/story01.htm', '_pf.html')
-        else:
-            u = url.replace('.html','_pf.html')
-        return u
+    def get_obfuscated_article(self, url):
+        # Insert a pic from the original url, but use content from the print url
+        pic = None
+        pics = self.index_to_soup(url)
+        div = pics.find('div', {'class' : re.compile('image-carousel')})
+        if div:
+            pic = div.img
+            if pic:
+                try:
+                    pic['src'] = urlparse.urljoin(url, pic['src'])
+                    pic.extract()
+                except Exception:
+                    pic = None
+
+        content = self.index_to_soup(url + '?mode=print&ot=example.AjaxPageLayout.ot')
+        if pic:
+            content.p.insert(0, pic)
+
+        self.temp_files.append(PersistentTemporaryFile('_fa.html'))
+        self.temp_files[-1].write(content.prettify())
+        self.temp_files[-1].close()
+        return self.temp_files[-1].name

-    def get_article_url(self, article):
-        return article.link
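
Reviewer sketch of the two URL manipulations in `get_obfuscated_article`. The recipe targets Python 2 (`urlparse.urljoin`); this sketch uses the Python 3 equivalent, `urllib.parse.urljoin`, so it can run standalone. The helper names and the example.com URLs are hypothetical; the query suffix is copied from the hunk above.

```python
from urllib.parse import urljoin

# Query suffix the recipe appends to fetch the print layout.
PRINT_SUFFIX = '?mode=print&ot=example.AjaxPageLayout.ot'


def absolute_image_url(article_url, src):
    """Resolve an image src against the page it was found on."""
    return urljoin(article_url, src)


def print_url(article_url):
    """Build the print-view URL the same way the recipe does."""
    return article_url + PRINT_SUFFIX


page = 'http://example.com/news/story-1.123'
print(absolute_image_url(page, '/images/pic.jpg'))
print(absolute_image_url(page, 'pic.jpg'))
print(print_url(page))
```

Plain concatenation in `print_url` assumes the feed URLs carry no query string of their own; if they ever do, the suffix would need `&` instead of `?`.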

--- a/recipes/ledevoir.recipe
+++ b/recipes/ledevoir.recipe
@ -2,7 +2,7 @@ __license__   = 'GPL v3'
 __author__    = 'Lorenzo Vigentini and Olivier Daigle'
 __copyright__ = '2012, Lorenzo Vigentini <l.vigentini at gmail.com>, Olivier Daigle <odaigle _at nuvucameras __dot__ com>'
 __version__     = 'v1.01'
-__date__        = '22, December 2012'
+__date__        = '17, March 2013'
 __description__   = 'Canadian Paper '

 '''
@ -12,6 +12,7 @@ http://www.ledevoir.com/
 import re

 from calibre.web.feeds.news import BasicNewsRecipe
+from calibre.utils.magick import Image

 class ledevoir(BasicNewsRecipe):
    author        = 'Lorenzo Vigentini'
@ -28,10 +29,14 @@ class ledevoir(BasicNewsRecipe):

    oldest_article = 1
    max_articles_per_feed = 200
+    min_articles_per_feed = 0
    use_embedded_content  = False
    recursion             = 10
    needs_subscription    = 'optional'

+    compress_news_images = True
+    compress_news_images_auto_size = 4
+
    filterDuplicates = False
    url_list = []

@ -66,16 +71,16 @@ class ledevoir(BasicNewsRecipe):

    feeds          = [
                       (u'A la une', 'http://www.ledevoir.com/rss/manchettes.xml'),
-#                       (u'Édition complete', 'http://feeds2.feedburner.com/fluxdudevoir'),
-#                       (u'Opinions', 'http://www.ledevoir.com/rss/opinions.xml'),
-#                       (u'Chroniques', 'http://www.ledevoir.com/rss/chroniques.xml'),
-#                       (u'Politique', 'http://www.ledevoir.com/rss/section/politique.xml?id=51'),
-#                       (u'International', 'http://www.ledevoir.com/rss/section/international.xml?id=76'),
-#                       (u'Culture', 'http://www.ledevoir.com/rss/section/culture.xml?id=48'),
-#                       (u'Environnement', 'http://www.ledevoir.com/rss/section/environnement.xml?id=78'),
-#                       (u'Societe', 'http://www.ledevoir.com/rss/section/societe.xml?id=52'),
-#                       (u'Economie', 'http://www.ledevoir.com/rss/section/economie.xml?id=49'),
-#                       (u'Sports', 'http://www.ledevoir.com/rss/section/sports.xml?id=85'),
+                       (u'Édition complete', 'http://feeds2.feedburner.com/fluxdudevoir'),
+                       (u'Opinions', 'http://www.ledevoir.com/rss/opinions.xml'),
+                       (u'Chroniques', 'http://www.ledevoir.com/rss/chroniques.xml'),
+                       (u'Politique', 'http://www.ledevoir.com/rss/section/politique.xml?id=51'),
+                       (u'International', 'http://www.ledevoir.com/rss/section/international.xml?id=76'),
+                       (u'Culture', 'http://www.ledevoir.com/rss/section/culture.xml?id=48'),
+                       (u'Environnement', 'http://www.ledevoir.com/rss/section/environnement.xml?id=78'),
+                       (u'Societe', 'http://www.ledevoir.com/rss/section/societe.xml?id=52'),
+                       (u'Economie', 'http://www.ledevoir.com/rss/section/economie.xml?id=49'),
+                       (u'Sports', 'http://www.ledevoir.com/rss/section/sports.xml?id=85'),
                       (u'Art de vivre', 'http://www.ledevoir.com/rss/section/art-de-vivre.xml?id=50')
                     ]

@ -113,3 +118,23 @@ class ledevoir(BasicNewsRecipe):
        self.url_list.append(url)
        return url

+'''
+    def postprocess_html(self, soup, first):
+        #process all the images. assumes that the new html has the correct path
+        if first == 0:
+          return soup
+
+        for tag in soup.findAll(lambda tag: tag.name.lower()=='img' and tag.has_key('src')):
+             iurl = tag['src']
+             img = Image()
+             img.open(iurl)
+        #     width, height = img.size
+        #     print 'img is: ', iurl, 'width is: ', width, 'height is: ', height 
+             if img < 0:
+                raise RuntimeError('Out of memory')
+             img.set_compression_quality(30)
+             img.save(iurl)
+        return soup
+'''
+        
+           
--- a/src/calibre/db/tests/writing.py
+++ b/src/calibre/db/tests/writing.py
@ -252,7 +252,21 @@ class WritingTest(BaseTest):
        ae(cache.field_for('author_sort', 1), 'GoyaL, KoviD')
        ae(cache.field_for('author_sort', 3), 'GoyaL, KoviD & Layog, Divok')

-        # TODO: identifiers, languages
+        # Languages
+        f = cache.fields['languages']
+        ae(f.table.id_map, {1: 'eng', 2: 'deu'})
+        ae(sf('languages', {1:''}), set([1]))
+        ae(cache.field_for('languages', 1), ())
+        ae(sf('languages', {2:('und',)}), set([2]))
+        af(f.table.id_map)
+        ae(sf('languages', {1:'eng,fra,deu', 2:'es,Dutch', 3:'English'}), {1, 2, 3})
+        ae(cache.field_for('languages', 1), ('eng', 'fra', 'deu'))
+        ae(cache.field_for('languages', 2), ('spa', 'nld'))
+        ae(cache.field_for('languages', 3), ('eng',))
+        ae(sf('languages', {3:None}), set([3]))
+        ae(cache.field_for('languages', 3), ())
+
+        # TODO: identifiers

    # }}}

--- a/src/calibre/db/write.py
+++ b/src/calibre/db/write.py
@ -15,6 +15,7 @@ from calibre.constants import preferred_encoding, ispy3
 from calibre.ebooks.metadata import author_to_author_sort
 from calibre.utils.date import (parse_only_date, parse_date, UNDEFINED_DATE,
                                isoformat)
+from calibre.utils.localization import canonicalize_lang
 from calibre.utils.icu import strcmp

 if ispy3:
@ -96,6 +97,15 @@ def adapt_bool(x):
            x = bool(int(x))
    return x if x is None else bool(x)

+def adapt_languages(to_tuple, x):
+    ans = []
+    for lang in to_tuple(x):
+        lc = canonicalize_lang(lang)
+        if not lc or lc in ans or lc in ('und', 'zxx', 'mis', 'mul'):
+            continue
+        ans.append(lc)
+    return tuple(ans)
+
 def get_adapter(name, metadata):
    dt = metadata['datatype']
    if dt == 'text':
@ -133,6 +143,8 @@ def get_adapter(name, metadata):
        return lambda x: ans(x) or UNDEFINED_DATE
    if name == 'series_index':
        return lambda x: 1.0 if ans(x) is None else ans(x)
+    if name == 'languages':
+        return partial(adapt_languages, ans)

    return ans
 # }}}
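
Reviewer sketch of how the new `adapt_languages` adapter behaves, checked against the expectations in the writing.py test hunk above. `adapt_languages` itself is copied from the diff; `canonicalize_lang` and `to_tuple` here are small hypothetical stand-ins (the real ones live in calibre and cover far more codes and input types), so this only illustrates the de-duplication and filtering logic.

```python
from functools import partial

# Hypothetical stand-in for calibre.utils.localization.canonicalize_lang.
_CANONICAL = {
    'eng': 'eng', 'english': 'eng', 'fra': 'fra', 'deu': 'deu',
    'es': 'spa', 'dutch': 'nld', 'und': 'und',
}


def canonicalize_lang(raw):
    return _CANONICAL.get(raw.strip().lower())


def to_tuple(x):
    """Stand-in for the underlying adapter: comma-separated text to tuple."""
    if x is None:
        return ()
    if isinstance(x, str):
        x = x.split(',')
    return tuple(s.strip() for s in x if s.strip())


def adapt_languages(to_tuple, x):
    ans = []
    for lang in to_tuple(x):
        lc = canonicalize_lang(lang)
        # Skip unknown codes, duplicates and the "no real language" codes.
        if not lc or lc in ans or lc in ('und', 'zxx', 'mis', 'mul'):
            continue
        ans.append(lc)
    return tuple(ans)


adapter = partial(adapt_languages, to_tuple)
for value in ('eng,fra,deu', 'es,Dutch', 'English', ('und',), None):
    print(repr(value), adapter(value))
```

Binding the first argument with `partial` matches how `get_adapter` returns the adapter in the diff, so callers still pass a single value.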
--- a/src/calibre/gui2/actions/convert.py
+++ b/src/calibre/gui2/actions/convert.py
@ -167,8 +167,8 @@ class ConvertAction(InterfaceAction):
    def queue_convert_jobs(self, jobs, changed, bad, rows, previous,
            converted_func, extra_job_args=[], rows_are_ids=False):
        for func, args, desc, fmt, id, temp_files in jobs:
-            func, _, same_fmt = func.partition(':')
-            same_fmt = same_fmt == 'same_fmt'
+            func, _, parts = func.partition(':')
+            parts = set(parts.split(';'))
            input_file = args[0]
            input_fmt = os.path.splitext(input_file)[1]
            core_usage = 1
@ -182,7 +182,8 @@ class ConvertAction(InterfaceAction):
                job = self.gui.job_manager.run_job(Dispatcher(converted_func),
                                            func, args=args, description=desc,
                                            core_usage=core_usage)
-                job.conversion_of_same_fmt = same_fmt
+                job.conversion_of_same_fmt = 'same_fmt' in parts
+                job.manually_fine_tune_toc = 'manually_fine_tune_toc' in parts
                args = [temp_files, fmt, id]+extra_job_args
                self.conversion_jobs[job] = tuple(args)

@ -223,6 +224,7 @@ class ConvertAction(InterfaceAction):
                self.gui.job_exception(job)
                return
            same_fmt = getattr(job, 'conversion_of_same_fmt', False)
+            manually_fine_tune_toc = getattr(job, 'manually_fine_tune_toc', False)
            fmtf = temp_files[-1].name
            if os.stat(fmtf).st_size < 1:
                raise Exception(_('Empty output file, '
@ -248,4 +250,7 @@ class ConvertAction(InterfaceAction):
            current = self.gui.library_view.currentIndex()
            if current.isValid():
                self.gui.library_view.model().current_changed(current, QModelIndex())
+        if manually_fine_tune_toc:
+            self.gui.iactions['Edit ToC'].do_one(book_id, fmt.upper())
+
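
Reviewer sketch of the `func:flag;flag` convention that `queue_convert_jobs` now parses. The helper name `split_job_spec` and the job name `gui_convert` are assumptions for illustration; only the `same_fmt` and `manually_fine_tune_toc` flag names come from the diff.

```python
def split_job_spec(spec):
    """Split 'name:flag1;flag2' into the function name and a set of flags."""
    func, _, parts = spec.partition(':')
    # Dropping empty strings keeps the set empty when no flags are given.
    flags = {p for p in parts.split(';') if p}
    return func, flags


func, flags = split_job_spec('gui_convert:same_fmt;manually_fine_tune_toc')
print(func, sorted(flags))
print('same_fmt' in flags, 'manually_fine_tune_toc' in flags)
print(split_job_spec('gui_convert'))
```

The code in the diff does not filter empty strings, so a spec with no flags yields a set containing `''`. The membership tests it performs still give the right answer, so the filter here is only a tidiness choice.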

--- a/src/calibre/gui2/convert/bulk.py
+++ b/src/calibre/gui2/convert/bulk.py
@ -88,6 +88,7 @@ class BulkConfig(Config):
        ps = widget_factory(PageSetupWidget)
        sd = widget_factory(StructureDetectionWidget)
        toc = widget_factory(TOCWidget)
+        toc.manually_fine_tune_toc.hide()

        output_widget = self.plumber.output_plugin.gui_configuration_widget(
                self.stack, self.plumber.get_option_by_name,
--- a/src/calibre/gui2/convert/single.py
+++ b/src/calibre/gui2/convert/single.py
@ -165,6 +165,12 @@ class Config(ResizableDialog, Ui_Dialog):
    def output_format(self):
        return unicode(self.output_formats.currentText()).lower()

+    @property
+    def manually_fine_tune_toc(self):
+        for i in xrange(self.stack.count()):
+            w = self.stack.widget(i)
+            if hasattr(w, 'manually_fine_tune_toc'):
+                return w.manually_fine_tune_toc.isChecked()

    def setup_pipeline(self, *args):
        oidx = self.groups.currentIndex().row()
@ -191,6 +197,8 @@ class Config(ResizableDialog, Ui_Dialog):
        ps = widget_factory(PageSetupWidget)
        sd = widget_factory(StructureDetectionWidget)
        toc = widget_factory(TOCWidget)
+        from calibre.gui2.actions.toc_edit import SUPPORTED
+        toc.manually_fine_tune_toc.setVisible(output_format.upper() in SUPPORTED)
        debug = widget_factory(DebugWidget)

        output_widget = self.plumber.output_plugin.gui_configuration_widget(
--- a/src/calibre/gui2/convert/toc.ui
+++ b/src/calibre/gui2/convert/toc.ui
@ -6,22 +6,32 @@
   <rect>
    <x>0</x>
    <y>0</y>
-    <width>436</width>
-    <height>382</height>
+    <width>596</width>
+    <height>493</height>
   </rect>
  </property>
  <property name="windowTitle">
   <string>Form</string>
  </property>
  <layout class="QGridLayout" name="gridLayout">
+   <item row="7" column="1">
+    <widget class="QSpinBox" name="opt_toc_threshold"/>
+   </item>
   <item row="1" column="0" colspan="2">
+    <widget class="QCheckBox" name="opt_use_auto_toc">
+     <property name="text">
+      <string>&amp;Force use of auto-generated Table of Contents</string>
+     </property>
+    </widget>
+   </item>
+   <item row="2" column="0" colspan="2">
    <widget class="QCheckBox" name="opt_no_chapters_in_toc">
     <property name="text">
      <string>Do not add &amp;detected chapters to the Table of Contents</string>
     </property>
    </widget>
   </item>
-   <item row="3" column="0">
+   <item row="6" column="0">
    <widget class="QLabel" name="label_10">
     <property name="text">
      <string>Number of &amp;links to add to Table of Contents</string>
@ -31,34 +41,7 @@
     </property>
    </widget>
   </item>
-   <item row="3" column="1">
-    <widget class="QSpinBox" name="opt_max_toc_links">
-     <property name="maximum">
-      <number>10000</number>
-     </property>
-    </widget>
-   </item>
-   <item row="4" column="0">
-    <widget class="QLabel" name="label_16">
-     <property name="text">
-      <string>Chapter &amp;threshold</string>
-     </property>
-     <property name="buddy">
-      <cstring>opt_toc_threshold</cstring>
-     </property>
-    </widget>
-   </item>
-   <item row="4" column="1">
-    <widget class="QSpinBox" name="opt_toc_threshold"/>
-   </item>
-   <item row="0" column="0" colspan="2">
-    <widget class="QCheckBox" name="opt_use_auto_toc">
-     <property name="text">
-      <string>&amp;Force use of auto-generated Table of Contents</string>
-     </property>
-    </widget>
-   </item>
-   <item row="5" column="0">
+   <item row="8" column="0">
    <widget class="QLabel" name="label">
     <property name="text">
      <string>TOC &amp;Filter:</string>
@ -68,19 +51,27 @@
     </property>
    </widget>
   </item>
-   <item row="5" column="1">
-    <widget class="QLineEdit" name="opt_toc_filter"/>
-   </item>
-   <item row="6" column="0" colspan="2">
-    <widget class="XPathEdit" name="opt_level1_toc" native="true"/>
-   </item>
-   <item row="7" column="0" colspan="2">
-    <widget class="XPathEdit" name="opt_level2_toc" native="true"/>
-   </item>
-   <item row="8" column="0" colspan="2">
+   <item row="11" column="0" colspan="2">
    <widget class="XPathEdit" name="opt_level3_toc" native="true"/>
   </item>
-   <item row="9" column="0">
+   <item row="6" column="1">
+    <widget class="QSpinBox" name="opt_max_toc_links">
+     <property name="maximum">
+      <number>10000</number>
+     </property>
+    </widget>
+   </item>
+   <item row="7" column="0">
+    <widget class="QLabel" name="label_16">
+     <property name="text">
+      <string>Chapter &amp;threshold</string>
+     </property>
+     <property name="buddy">
+      <cstring>opt_toc_threshold</cstring>
+     </property>
+    </widget>
+   </item>
+   <item row="13" column="0">
    <spacer name="verticalSpacer">
     <property name="orientation">
      <enum>Qt::Vertical</enum>
@ -93,13 +84,47 @@
     </property>
    </spacer>
   </item>
-   <item row="2" column="0" colspan="2">
+   <item row="3" column="0" colspan="2">
    <widget class="QCheckBox" name="opt_duplicate_links_in_toc">
     <property name="text">
      <string>Allow &amp;duplicate links when creating the Table of Contents</string>
     </property>
    </widget>
   </item>
+   <item row="10" column="0" colspan="2">
+    <widget class="XPathEdit" name="opt_level2_toc" native="true"/>
+   </item>
+   <item row="8" column="1">
+    <widget class="QLineEdit" name="opt_toc_filter"/>
+   </item>
+   <item row="9" column="0" colspan="2">
+    <widget class="XPathEdit" name="opt_level1_toc" native="true"/>
+   </item>
+   <item row="0" column="0" colspan="2">
+    <widget class="QLabel" name="label_2">
+     <property name="text">
+      <string>&lt;a href=&quot;http://manual.calibre-ebook.com/conversion.html#table-of-contents&quot;&gt;Help with using these options to generate a Table of Contents&lt;/a&gt;</string>
+     </property>
+     <property name="wordWrap">
+      <bool>true</bool>
+     </property>
+     <property name="openExternalLinks">
+      <bool>true</bool>
+     </property>
+    </widget>
+   </item>
+   <item row="12" column="0" colspan="2">
+    <widget class="QCheckBox" name="manually_fine_tune_toc">
+     <property name="toolTip">
+      <string>This option will cause calibre to pop up the Table of Contents Editor tool,
+ which will allow you to manually edit the Table of Contents, to fix any errors
+ caused by automatic generation.</string>
+     </property>
+     <property name="text">
+      <string>&amp;Manually fine-tune the ToC after conversion is completed</string>
+     </property>
+    </widget>
+   </item>
  </layout>
 </widget>
 <customwidgets>
--- a/src/calibre/gui2/dialogs/book_info.py
+++ b/src/calibre/gui2/dialogs/book_info.py
@ -5,7 +5,7 @@ __docformat__ = 'restructuredtext en'


 from PyQt4.Qt import (QCoreApplication, SIGNAL, QModelIndex, QTimer, Qt,
-    QDialog, QPixmap, QIcon, QSize, QPalette)
+    QDialog, QPixmap, QIcon, QSize, QPalette, QShortcut, QKeySequence)

 from calibre.gui2.dialogs.book_info_ui import Ui_BookInfo
 from calibre.gui2 import dynamic
@ -43,6 +43,14 @@ class BookInfo(QDialog, Ui_BookInfo):
        self.fit_cover.stateChanged.connect(self.toggle_cover_fit)
        self.cover.resizeEvent = self.cover_view_resized
        self.cover.cover_changed.connect(self.cover_changed)
+        self.ns = QShortcut(QKeySequence('Alt+Right'), self)
+        self.ns.activated.connect(self.next)
+        self.ps = QShortcut(QKeySequence('Alt+Left'), self)
+        self.ps.activated.connect(self.previous)
+        self.next_button.setToolTip(_('Next [%s]')%
+                unicode(self.ns.key().toString(QKeySequence.NativeText)))
+        self.previous_button.setToolTip(_('Previous [%s]')%
+                unicode(self.ps.key().toString(QKeySequence.NativeText)))

        desktop = QCoreApplication.instance().desktop()
        screen_height = desktop.availableGeometry().height() - 100
--- a/src/calibre/gui2/email.py
+++ b/src/calibre/gui2/email.py
@ -160,7 +160,7 @@ def email_news(mi, remove, get_fmts, done, job_manager):
    return sent_mails

 plugboard_email_value = 'email'
-plugboard_email_formats = ['epub', 'mobi']
+plugboard_email_formats = ['epub', 'mobi', 'azw3']

 class EmailMixin(object): # {{{

--- a/src/calibre/gui2/preferences/conversion.py
+++ b/src/calibre/gui2/preferences/conversion.py
@ -61,6 +61,8 @@ class Base(ConfigWidgetBase, Ui_Form):
        for w in widgets:
            w.changed_signal.connect(self.changed_signal)
            self.stack.addWidget(w)
+            if isinstance(w, TOCWidget):
+                w.manually_fine_tune_toc.hide()

        self.list.currentChanged = self.category_current_changed
        self.list.setCurrentIndex(self.model.index(0))
--- a/src/calibre/gui2/toc/location.py
+++ b/src/calibre/gui2/toc/location.py
@ -11,10 +11,11 @@ from base64 import b64encode

 from PyQt4.Qt import (QWidget, QGridLayout, QListWidget, QSize, Qt, QUrl,
                      pyqtSlot, pyqtSignal, QVBoxLayout, QFrame, QLabel,
-                      QLineEdit, QTimer)
+                      QLineEdit, QTimer, QPushButton, QIcon)
 from PyQt4.QtWebKit import QWebView, QWebPage, QWebElement

 from calibre.ebooks.oeb.display.webview import load_html
+from calibre.gui2 import error_dialog, question_dialog
 from calibre.utils.logging import default_log

 class Page(QWebPage): # {{{
@ -115,16 +116,26 @@ class ItemEdit(QWidget):
        self.dest_list = dl = QListWidget(self)
        dl.setMinimumWidth(250)
        dl.currentItemChanged.connect(self.current_changed)
-        l.addWidget(dl, 1, 0)
+        l.addWidget(dl, 1, 0, 2, 1)

        self.view = WebView(self)
        self.view.elem_clicked.connect(self.elem_clicked)
-        l.addWidget(self.view, 1, 1)
+        l.addWidget(self.view, 1, 1, 1, 3)

        self.f = f = QFrame()
        f.setFrameShape(f.StyledPanel)
        f.setMinimumWidth(250)
-        l.addWidget(f, 1, 2)
+        l.addWidget(f, 1, 4, 2, 1)
+        self.search_text = s = QLineEdit(self)
+        s.setPlaceholderText(_('Search for text...'))
+        l.addWidget(s, 2, 1, 1, 1)
+        self.ns_button = b = QPushButton(QIcon(I('arrow-down.png')), _('Find &next'), self)
+        b.clicked.connect(self.find_next)
+        l.addWidget(b, 2, 2, 1, 1)
+        self.ps_button = b = QPushButton(QIcon(I('arrow-up.png')), _('Find &previous'), self)
+        l.addWidget(b, 2, 3, 1, 1)
+        b.clicked.connect(self.find_previous)
+        l.setRowStretch(1, 10)
        l = f.l = QVBoxLayout()
        f.setLayout(l)

@ -156,6 +167,42 @@ class ItemEdit(QWidget):

        l.addStretch()

+    def keyPressEvent(self, ev):
+        if ev.key() in (Qt.Key_Return, Qt.Key_Enter) and self.search_text.hasFocus():
+            # Prevent pressing enter in the search box from triggering the dialog's accept() method
+            ev.accept()
+            return
+        return super(ItemEdit, self).keyPressEvent(ev)
+
+    def find(self, forwards=True):
+        text = unicode(self.search_text.text()).strip()
+        flags = QWebPage.FindFlags(0) if forwards else QWebPage.FindBackward
+        d = self.dest_list
+        if d.count() == 1:
+            flags |= QWebPage.FindWrapsAroundDocument
+        if not self.view.findText(text, flags) and text:
+            if d.count() == 1:
+                return error_dialog(self, _('No match found'),
+                    _('No match found for: %s')%text, show=True)
+
+            delta = 1 if forwards else -1
+            current = unicode(d.currentItem().data(Qt.DisplayRole).toString())
+            next_index = (d.currentRow() + delta)%d.count()
+            next = unicode(d.item(next_index).data(Qt.DisplayRole).toString())
+            msg = '<p>'+_('No matches for %(text)s found in the current file [%(current)s].'
+                          ' Do you want to search in the %(which)s file [%(next)s]?')
+            msg = msg%dict(text=text, current=current, next=next,
+                           which=_('next') if forwards else _('previous'))
+            if question_dialog(self, _('No match found'), msg):
+                self.pending_search = self.find_next if forwards else self.find_previous
+                d.setCurrentRow(next_index)
+
+    def find_next(self):
+        return self.find()
+
+    def find_previous(self):
+        return self.find(forwards=False)
+
    def load(self, container):
        self.container = container
        spine_names = [container.abspath_to_name(p) for p in
@ -175,6 +222,10 @@ class ItemEdit(QWidget):
        self.view.load_js()
        self.dest_label.setText(self.base_msg + '<br>' + _('File:') + ' ' +
                                name + '<br>' + _('Top of the file'))
+        if hasattr(self, 'pending_search'):
+            f = self.pending_search
+            del self.pending_search
+            f()

    def __call__(self, item, where):
        self.current_item, self.current_where = item, where
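
Note on the `find` method added above: when the current file has no match, the search offers to continue in the neighbouring file, and the target row is computed with a modulo so that it wraps at either end of the list. A minimal sketch of that index arithmetic alone, in plain Python with no Qt dependency; `neighbour_index` and the file names are hypothetical and used only for illustration, they are not part of calibre.

```python
def neighbour_index(current_row, count, forwards=True):
    """Return the row of the next (or previous) file, wrapping at the ends.

    Mirrors the expression (d.currentRow() + delta) % d.count() from the
    diff. Hypothetical helper, for illustration only.
    """
    delta = 1 if forwards else -1
    # Python's % returns a non-negative result for a positive divisor,
    # so stepping back from row 0 lands on the last row.
    return (current_row + delta) % count


if __name__ == '__main__':
    files = ['ch1.html', 'ch2.html', 'ch3.html']  # made-up spine names
    print(files[neighbour_index(2, len(files))])         # wraps forward to ch1.html
    print(files[neighbour_index(0, len(files), False)])  # wraps backward to ch3.html
```

With a single file in the list the helper always returns row 0, which is presumably why the diff handles `d.count() == 1` separately by letting the search wrap around the document and reporting an error if nothing is found.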
--- a/src/calibre/gui2/tools.py
+++ b/src/calibre/gui2/tools.py
@ -82,8 +82,13 @@ def convert_single_ebook(parent, db, book_ids, auto_conversion=False, # {{{
                args = [in_file.name, out_file.name, recs]
                temp_files.append(out_file)
                func = 'gui_convert_override'
+                parts = []
+                if not auto_conversion and d.manually_fine_tune_toc:
+                    parts.append('manually_fine_tune_toc')
                if same_fmt:
-                    func += ':same_fmt'
+                    parts.append('same_fmt')
+                if parts:
+                    func += ':%s'%(';'.join(parts))
                jobs.append((func, args, desc, d.output_format.upper(), book_id, temp_files))

                changed = True
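
Note on the hunk above: the job function name now carries optional flags, joined with `;` after a single `:`. The sketch below reproduces that composition in isolation. `build_func` is a hypothetical name, and `split_func` is only an assumed inverse, since the code that consumes this string is not shown in this part of the diff.

```python
def build_func(base, manually_fine_tune_toc=False, same_fmt=False):
    """Compose the job function string the way the hunk above does."""
    parts = []
    if manually_fine_tune_toc:
        # In the diff this is additionally guarded by `not auto_conversion`.
        parts.append('manually_fine_tune_toc')
    if same_fmt:
        parts.append('same_fmt')
    if parts:
        base += ':%s' % ';'.join(parts)
    return base


def split_func(func):
    """Assumed inverse: split 'name:flag1;flag2' into (name, set of flags)."""
    name, sep, flags = func.partition(':')
    return name, (set(flags.split(';')) if flags else set())


if __name__ == '__main__':
    f = build_func('gui_convert_override', manually_fine_tune_toc=True, same_fmt=True)
    print(f)  # gui_convert_override:manually_fine_tune_toc;same_fmt
    print(split_func(f)[0])  # gui_convert_override
```

Collecting the flags in a list keeps the string well formed whichever combination is set, which the previous `func += ':same_fmt'` could not do once a second flag was needed.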