Sync to trunk.

2025-07-09 03:04:10 -04:00 · 2011-12-17 22:17:51 -05:00 · b9558edd63
commit b9558edd63
parent 92e18a4755 69aa538660
211 changed files with 75816 additions and 40788 deletions
--- a/.bzrignore
+++ b/.bzrignore
@@ -2,6 +2,7 @@
 .check-cache.pickle
 src/calibre/plugins
 resources/images.qrc
 src/calibre/ebooks/oeb/display/test/*.js
 src/calibre/manual/.build/
 src/calibre/manual/cli/
 src/calibre/manual/template_ref.rst
@@ -15,6 +16,7 @@ resources/ebook-convert-complete.pickle
 resources/builtin_recipes.xml
 resources/builtin_recipes.zip
 resources/template-functions.json
 resources/display/*.js
 setup/installer/windows/calibre/build.log
 src/calibre/translations/.errors
 src/cssutils/.svn/
--- a/Changelog.yaml
+++ b/Changelog.yaml
@@ -19,6 +19,125 @@
 #   new recipes:
 #     - title: 
 - version: 0.8.31
  date: 2011-12-16
  new features:
    - title: "Conversion engine: When parsing invalid XHTML use the HTML 5 algorithm, for greater robustness."
      tickets: [901466]
    - title: "Driver for PocketBook 611 and Lenovo IdeaPad"
    - title: "Allow customization of the order in which custom column editing is performed in the edit metadata dialog. Setting is available via Preferences->Tweaks."
      tickets: [902731]
    - title: "MOBI news download: Allow recipes to set a thumbnail for entries in the periodical table of contents. Currently used by the NYTimes, WSJ, Independent, Guardian and Globe and Mail recipes"
      tickets: [900130]
    - title: "E-book viewer: Add an option to the right click menu to search for the currently selected word"
    - title: "Automatically hide the no internet connection available error message if the connection is restored before the user clicks OK"
  bug fixes:
    - title: "Fix comments not hidden in Book details panel when they are turned off via Preferences->Look & Feel->Book Details"
    - title: "E-book viewer: Do not popup an error message if the user tries to use the mouse wheel to scroll before a document is loaded."
      tickets: [903449] 
    - title: "Add docx to the list of ebook extensions."
      tickets: [903452]
    - title: "When downloading metadata from non-English Amazon websites, do not correct the case of book titles."
    - title: "Fix regression in 0.8.30 that broke bulk conversion of a single book." 
      tickets: [902506]
    - title: "When minimized to system tray do not display the no internet connection error as a dialog box, instead use a system tray notification"
    - title: "Catalog generation: Include the series_index field for custom series columns as well"
    - title: "Comic Input: Do not rescale images when using the Tablet output profile (or any output profile with a screen size larger than 3000x3000)"
    - title: "HTML Input: Ignore unparseable URLs instead of crashing on them."
      tickets: [902372] 
  improved recipes:
    - La Republica
    - CND
    - Berliner Zeitung
    - Zaman Gazetesi
  new recipes:
    - title: CND Weekly
      author: Derek Liang
    - title: descopera.org 
      author: Marius Ignatescu
    - title: Rynek Zdrowia 
      author: spi630
 - version: 0.8.30
  date: 2011-12-09
  new features:
    - title: "Get Books: Add amazon.es and amazon.it"
    - title: "Bulk convert dialog: Disable the Use saved conversion settings checkbox when none of the books being converted has saved conversion settings"
    - title: "ebook-viewer: Add a command line switch to specify the position at which the file should be opened."
      tickets: [899325]
    - title: "Distribute calibre source code compressed with xz instead of gzip for a 40% reduction in size"
  bug fixes:
    - title: "Get Books: Fix ebooks.com and amazon.fr. Fix cover display in Diesel ebooks store."
    - title: "HTML Input: Fix regression that broke processing of a small fraction of HTML files encoded in a multi-byte character encoding."
      tickets: [899691]
    - title: "Greatly reduce the delay at the end of a bulk metadata edit operation that operates on a very large number (thousands) of books"
    - title: "Template language: Fix the subitems formatter function to split only when the period is surrounded by non-white space and not another period"
    - title: "Fix ampersands in titles not displaying in the Cover Browser"
    - title: "MOBI Output: Do not ignore an empty anchor at the end of a block element."
    - title: "MOBI Output: Handle links to inline anchors placed inside large blocks of text correctly, i.e. the link should not point to the start of the block."
      tickets: [899831]
    - title: "E-book viewer: Fix searching for text that is represented as entities in the underlying HTML."
      tickets: [899573]
    - title: "Have the Esc shortcut perform exactly the same set of actions as clicking the clear button."
      tickets: [900048]
    - title: "Prevent the adding books dialog from becoming too wide"
    - title: "Fix custom column editing not behaving correctly with the Previous button in the edit metadata dialog."
      tickets: [899836]
    - title: "T1 driver. More fixes to datetime handling to try to convince the T1's buggy firmware to not rescan metadata."
      tickets: [899514]
    - title: "Only allow searching via non accented author names if the user interface language in calibre is set to English."
      tickets: [899227]
  improved recipes:
    - Die Zeit subscription
    - Metro UK
    - suedeutsche.de
  new recipes:
    - title: Blues News 
      author: Oskar Kunicki
    - title: "TVXS"
      author: Hargikas
 - version: 0.8.29
  date: 2011-12-02
--- a/recipes/adventure_zone_pl.recipe
+++ b/recipes/adventure_zone_pl.recipe
@@ -1,19 +1,38 @@
 from calibre.web.feeds.news import BasicNewsRecipe
-
+import re
 class Adventure_zone(BasicNewsRecipe):
    title          = u'Adventure Zone'
    __author__        = 'fenuks'
    description   = 'Adventure zone - adventure games from A to Z'
    category       = 'games'
    language       = 'pl'
    oldest_article = 15
    max_articles_per_feed = 100
    no_stylesheets = True
    oldest_article = 20
    max_articles_per_feed = 100
    use_embedded_content=False
    preprocess_regexps     = [(re.compile(r"<td class='capmain'>Komentarze</td>", re.IGNORECASE), lambda m: '')]
    remove_tags_before= dict(name='td', attrs={'class':'main-bg'})
-    remove_tags_after= dict(name='td', attrs={'class':'main-body middle-border'})
+    remove_tags= [dict(name='img', attrs={'alt':'Drukuj'})]
    remove_tags_after= dict(id='comments')
    extra_css              = '.main-bg{text-align: left;}  td.capmain{ font-size: 22px; }'
    feeds          = [(u'Nowinki', u'http://www.adventure-zone.info/fusion/feeds/news.php')]
    def parse_feeds (self): 
      feeds = BasicNewsRecipe.parse_feeds(self) 
      soup=self.index_to_soup(u'http://www.adventure-zone.info/fusion/feeds/news.php')
      tag=soup.find(name='channel')
      titles=[]
      for r in tag.findAll(name='image'):
          r.extract()
      art=tag.findAll(name='item')
      for i in art:
            titles.append(i.title.string)
      for feed in feeds:
        for article in feed.articles[:]:
            article.title=titles[feed.articles.index(article)]
      return feeds
    def get_cover_url(self):
        soup = self.index_to_soup('http://www.adventure-zone.info/fusion/news.php')
        cover=soup.find(id='box_OstatninumerAZ')
@@ -22,17 +41,10 @@ class Adventure_zone(BasicNewsRecipe):
    def skip_ad_pages(self, soup):
-        skip_tag = soup.body.findAll(name='a')
-        if skip_tag is not None:
-            for r in skip_tag:
-                 if 'articles.php?' in r['href']:
-                     if r.strong is not None:
-                         word=r.strong.string
-                         if ('zapowied' or 'recenzj') in word:
-                             return self.index_to_soup('http://www.adventure-zone.info/fusion/print.php?type=A&item_id'+r['href'][r['href'].find('_id')+3:], raw=True)
-        else:
-            None
+        skip_tag = soup.body.find(name='td', attrs={'class':'main-bg'})
+        skip_tag = skip_tag.findAll(name='a')
+        for r in skip_tag:
+           if r.strong:
+                 word=r.strong.string
+                 if word and (('zapowied' in word) or ('recenzj' in word)  or ('solucj' in word)):
+                   return self.index_to_soup('http://www.adventure-zone.info/fusion/print.php?type=A&item'+r['href'][r['href'].find('article_id')+7:], raw=True)
    def print_version(self, url):
        return url.replace('news.php?readmore', 'print.php?type=N&item_id')
--- a/recipes/astro_news_pl.recipe
+++ b/recipes/astro_news_pl.recipe
@@ -1,5 +1,4 @@
 from calibre.web.feeds.news import BasicNewsRecipe
 class AstroNEWS(BasicNewsRecipe):
    title          = u'AstroNEWS'
    __author__        = 'fenuks'
@@ -8,11 +7,16 @@ class AstroNEWS(BasicNewsRecipe):
    language       = 'pl'
    oldest_article = 8
    max_articles_per_feed = 100
-    auto_cleanup = True
+    #extra_css= 'table {text-align: left;}'
    no_stylesheets=True
    cover_url='http://news.astronet.pl/img/logo_news.jpg'
-   # no_stylesheets= True
+    remove_tags=[dict(name='hr')]
    feeds          = [(u'Wiadomości', u'http://news.astronet.pl/rss.cgi')]
    def print_version(self, url):
        return url.replace('astronet.pl/', 'astronet.pl/print.cgi?')
    def preprocess_html(self, soup):
        for item in soup.findAll(align=True):
            del item['align']
        return soup
--- a/recipes/berliner_zeitung.recipe
+++ b/recipes/berliner_zeitung.recipe
@@ -1,61 +1,44 @@
 from calibre.web.feeds.recipes import BasicNewsRecipe
-import re
+
 '''Calibre recipe to convert the RSS feeds of the Berliner Zeitung to an ebook.'''
 class SportsIllustratedRecipe(BasicNewsRecipe) :
-    __author__    = 'ape'
+    __author__    = 'a.peter'
-    __copyright__ = 'ape'
+    __copyright__ = 'a.peter'
    __license__   = 'GPL v3'
    language      = 'de'
-    description   = 'Berliner Zeitung'
+    description   = 'Berliner Zeitung RSS'
-    version       = 2
+    version       = 4
    title         = u'Berliner Zeitung'
    timefmt       = ' [%d.%m.%Y]'
    #oldest_article = 7.0
    no_stylesheets = True
    remove_javascript = True
    use_embedded_content = False
    publication_type = 'newspaper'
-    keep_only_tags = [dict(name='div', attrs={'class':'teaser t_split t_artikel'})]
+    remove_tags_before = dict(name='div', attrs={'class':'newstype'})
    remove_tags_after = [dict(id='article_text')]
-    INDEX = 'http://www.berlinonline.de/berliner-zeitung/'
-
-    def parse_index(self):
-        base = 'http://www.berlinonline.de'
-        answer = []
-        articles = {}
-        more = 1
-
-        soup = self.index_to_soup(self.INDEX)
-
-        # Get list of links to ressorts from index page
-        ressort_list = soup.findAll('ul', attrs={'class': re.compile('ressortlist')})
-        for ressort in ressort_list[0].findAll('a'):
-            feed_title = ressort.string
-            print 'Analyzing', feed_title
-            if not articles.has_key(feed_title):
-                articles[feed_title] = []
-                answer.append(feed_title)
-            # Load ressort page.
-            feed = self.index_to_soup('http://www.berlinonline.de' + ressort['href'])
-            # find mainbar div which contains the list of all articles
-            for article_container in feed.findAll('div', attrs={'class': re.compile('mainbar')}):
-                # iterate over all articles
-                for article_teaser in article_container.findAll('div', attrs={'class': re.compile('teaser')}):
-                    # extract title of article
-                    if article_teaser.h3 != None:
-                        article = {'title' : article_teaser.h3.a.string, 'date' : u'', 'url'  : base + article_teaser.h3.a['href'], 'description' : u''}
-                        articles[feed_title].append(article)
-                    else:
-                        # Skip teasers for missing photos
-                        if article_teaser.div.p.contents[0].find('Foto:') > -1:
-                            continue
-                        article = {'title': 'Weitere Artikel ' + str(more), 'date': u'', 'url': base + article_teaser.div.p.a['href'], 'description': u''}
-                        articles[feed_title].append(article)
-                        more += 1
-        answer = [[key, articles[key]] for key in answer if articles.has_key(key)]
-        return answer
+    feeds = [(u'Startseite', u'http://www.berliner-zeitung.de/home/10808950,10808950,view,asFeed.xml'),
+             (u'Politik', u'http://www.berliner-zeitung.de/home/10808018,10808018,view,asFeed.xml'),
+             (u'Wirtschaft', u'http://www.berliner-zeitung.de/home/10808230,10808230,view,asFeed.xml'),
+             (u'Berlin', u'http://www.berliner-zeitung.de/home/10809148,10809148,view,asFeed.xml'),
+             (u'Brandenburg', u'http://www.berliner-zeitung.de/home/10809312,10809312,view,asFeed.xml'),
+             (u'Wissenschaft', u'http://www.berliner-zeitung.de/home/10808894,10808894,view,asFeed.xml'),
+             (u'Digital', u'http://www.berliner-zeitung.de/home/10808718,10808718,view,asFeed.xml'),
+             (u'Kultur', u'http://www.berliner-zeitung.de/home/10809150,10809150,view,asFeed.xml'),
+             (u'Panorama', u'http://www.berliner-zeitung.de/home/10808334,10808334,view,asFeed.xml'),
+             (u'Sport', u'http://www.berliner-zeitung.de/home/10808794,10808794,view,asFeed.xml'),
+             (u'Hertha', u'http://www.berliner-zeitung.de/home/10808800,10808800,view,asFeed.xml'),
+             (u'Union', u'http://www.berliner-zeitung.de/home/10808802,10808802,view,asFeed.xml'),
+             (u'Verkehr', u'http://www.berliner-zeitung.de/home/10809298,10809298,view,asFeed.xml'),
+             (u'Polizei', u'http://www.berliner-zeitung.de/home/10809296,10809296,view,asFeed.xml'),
+             (u'Meinung', u'http://www.berliner-zeitung.de/home/10808020,10808020,view,asFeed.xml')]
    def get_masthead_url(self):
-        return 'http://www.berlinonline.de/.img/berliner-zeitung/blz_logo.gif'
+        return 'http://www.berliner-zeitung.de/image/view/10810244,7040611,data,logo.png'
    def print_version(self, url):
        return url.replace('.html', ',view,printVersion.html')
--- a/recipes/biolog_pl.recipe
+++ b/recipes/biolog_pl.recipe
@@ -0,0 +1,19 @@
 # vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
 from calibre.web.feeds.news import BasicNewsRecipe
 class Biolog_pl(BasicNewsRecipe):
    title          = u'Biolog.pl'
    oldest_article = 7
    max_articles_per_feed = 100
    remove_empty_feeds=True
    __author__        = 'fenuks'
    description   = u'Przyrodnicze aktualności ze świata nauki (codziennie aktualizowane), kurs biologii, testy i sprawdziany, forum dyskusyjne.'
    category       = 'biology'
    language       = 'pl'
    cover_url='http://www.biolog.pl/naukowy,portal,biolog.png'
    no_stylesheets = True
    #keeps_only_tags=[dict(id='main')]
    remove_tags_before=dict(id='main')
    remove_tags_after=dict(name='a', attrs={'name':'komentarze'})
    remove_tags=[dict(name='img', attrs={'alt':'Komentarze'})]
    feeds          = [(u'Wszystkie', u'http://www.biolog.pl/backend.php'), (u'Medycyna', u'http://www.biolog.pl/medycyna-rss.php'), (u'Ekologia', u'http://www.biolog.pl/rss-ekologia.php'), (u'Genetyka i biotechnologia', u'http://www.biolog.pl/rss-biotechnologia.php'), (u'Botanika', u'http://www.biolog.pl/rss-botanika.php'), (u'Le\u015bnictwo', u'http://www.biolog.pl/rss-lesnictwo.php'), (u'Zoologia', u'http://www.biolog.pl/rss-zoologia.php')]
--- a/recipes/blues.recipe
+++ b/recipes/blues.recipe
@@ -0,0 +1,26 @@
 __license__   = 'GPL v3'
 __copyright__ = '2011, Oskar Kunicki <rakso at interia.pl>'
 '''
 Changelog:
 2011-11-27
 News from BluesRSS.info
 '''
 from calibre.web.feeds.news import BasicNewsRecipe
 class BluesRSS(BasicNewsRecipe):
    title                     = 'Blues News'
    __author__          = 'Oskar Kunicki'
    description           ='Blues news from around the world'
    publisher             = 'BluesRSS.info'
    category              = 'news, blues, USA,UK'
    oldest_article        = 5
    max_articles_per_feed = 100
    language              = 'en'
    cover_url             = 'http://bluesrss.info/cover.jpg'
    masthead_url       = 'http://bluesrss.info/cover.jpg'
    no_stylesheets = True
    remove_tags    = [dict(name='div', attrs={'class':'wp-pagenavi'})]
    feeds = [(u'News', u'http://bluesrss.info/feed/')]
--- a/recipes/cnd.recipe
+++ b/recipes/cnd.recipe
@@ -23,7 +23,9 @@ class TheCND(BasicNewsRecipe):
 	remove_tags		= [dict(name='table', attrs={'align':'right'}), dict(name='img', attrs={'src':'http://my.cnd.org/images/logo.gif'}), dict(name='hr', attrs={}), dict(name='small', attrs={})]
 	no_stylesheets	 = True
-	preprocess_regexps = [(re.compile(r'<!--.*?-->', re.DOTALL), lambda m: '')]
+	preprocess_regexps = [  (re.compile(r'<!--.*?-->', re.DOTALL), lambda m: ''),
 				(re.compile('<table width.*?</table>', re.DOTALL), lambda m: ''),
 				]
 	def print_version(self, url):
 		if url.find('news/article.php') >= 0:
@@ -46,16 +48,18 @@ class TheCND(BasicNewsRecipe):
 			title = self.tag_to_string(a)
 			self.log('\tFound article: ', title, 'at', url)
 			date = a.nextSibling
 			if re.search('cm', date):
 				continue
 			if (date is not None) and len(date)>2:
 				if not articles.has_key(date):
 					articles[date] = []
 				articles[date].append({'title':title, 'url':url, 'description': '', 'date':''})
 				self.log('\t\tAppend to : ', date)
-		self.log('log articles', articles)
+		#self.log('log articles', articles)
 		mostCurrent = sorted(articles).pop()
-		self.title = 'CND ' + mostCurrent
+		self.title = 'CND ' + mostCurrent		
-
+		
 		feeds.append((self.title, articles[mostCurrent]))
 		return feeds
--- a/recipes/cnd_weekly.recipe
+++ b/recipes/cnd_weekly.recipe
@@ -0,0 +1,72 @@
 #!/usr/bin/env  python
 __license__   = 'GPL v3'
 __copyright__ = '2010, Derek Liang <Derek.liang.ca @@@at@@@ gmail.com>'
 '''
 cnd.org
 '''
 import re
 from calibre.web.feeds.news import BasicNewsRecipe
 class TheCND(BasicNewsRecipe):
 	title	  = 'CND Weekly'
 	__author__ = 'Derek Liang'
 	description = ''
 	INDEX = 'http://cnd.org'
 	language = 'zh'
 	conversion_options = {'linearize_tables':True}
 	remove_tags_before = dict(name='div', id='articleHead')
 	remove_tags_after  = dict(id='copyright')
 	remove_tags		= [dict(name='table', attrs={'align':'right'}), dict(name='img', attrs={'src':'http://my.cnd.org/images/logo.gif'}), dict(name='hr', attrs={}), dict(name='small', attrs={})]
 	no_stylesheets	 = True
 	preprocess_regexps = [  (re.compile(r'<!--.*?-->', re.DOTALL), lambda m: ''),
 				(re.compile('<table width.*?</table>', re.DOTALL), lambda m: ''),
 				]
 	def print_version(self, url):
 		if url.find('news/article.php') >= 0:
 			return re.sub("^[^=]*", "http://my.cnd.org/modules/news/print.php?storyid", url)
 		else:
 			return re.sub("^[^=]*", "http://my.cnd.org/modules/wfsection/print.php?articleid", url)
 	def parse_index(self):
 		soup = self.index_to_soup(self.INDEX)
 		feeds = []
 		articles = {}
 		for a in soup.findAll('a', attrs={'target':'_cnd'}):
 			url = a['href']
 			if url.find('article.php') < 0 :
 				continue
 			if url.startswith('/'):
 				url = 'http://cnd.org'+url
 			title = self.tag_to_string(a)
 			date = a.nextSibling
 			if not re.search('cm', date):
 				continue
 			self.log('\tFound article: ', title, 'at', url, '@', date)
 			if (date is not None) and len(date)>2:
 				if not articles.has_key(date):
 					articles[date] = []
 				articles[date].append({'title':title, 'url':url, 'description': '', 'date':''})
 				self.log('\t\tAppend to : ', date)
 		sorted_articles = sorted(articles)
 		while sorted_articles:
 			mostCurrent = sorted_articles.pop()
 			self.title = 'CND ' + mostCurrent
 			feeds.append((self.title, articles[mostCurrent]))
 		return feeds
 	def populate_article_metadata(self, article, soup, first):
 		header = soup.find('h3')
 		self.log('header: ' + self.tag_to_string(header))
 		pass
--- a/recipes/computerworld_pl.recipe
+++ b/recipes/computerworld_pl.recipe
@@ -0,0 +1,22 @@
 # vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
 from calibre.web.feeds.news import BasicNewsRecipe
 class Computerworld_pl(BasicNewsRecipe):
    title          = u'Computerworld.pl'
    __author__        = 'fenuks'
    description   = u'Serwis o IT w przemyśle, finansach, handlu, administracji oraz rynku IT i telekomunikacyjnym - wiadomości, opinie, analizy, porady prawne'
    category       = 'IT'
    language       = 'pl'
    no_stylesheets=True
    oldest_article = 7
    max_articles_per_feed = 100
    keep_only_tags=[dict(name='div', attrs={'id':'s'})]
    remove_tags_after=dict(name='div', attrs={'class':'rMobi'})
    remove_tags=[dict(name='div', attrs={'class':['nnav', 'rMobi']}), dict(name='table', attrs={'class':'ramka_slx'})]
    feeds          = [(u'Wiadomo\u015bci', u'http://rssout.idg.pl/cw/news_iso.xml')]
    def get_cover_url(self):
        soup = self.index_to_soup('http://www.computerworld.pl/')
        cover=soup.find(name='img', attrs={'class':'prawo'})
        self.cover_url=cover['src']
        return getattr(self, 'cover_url', self.cover_url)
--- a/recipes/datasport.recipe
+++ b/recipes/datasport.recipe
@@ -0,0 +1,15 @@
 __license__   = 'GPL v3'
 __author__    = 'faber1971'
 description   = 'Italian soccer news website - v1.00 (17, December 2011)'
 from calibre.web.feeds.news import BasicNewsRecipe
 class AdvancedUserRecipe1324114272(BasicNewsRecipe):
    title          = u'Datasport'
    language = 'it'
    __author__ = 'faber1971'
    oldest_article = 1
    max_articles_per_feed = 100
    auto_cleanup = True
    feeds          = [(u'Datasport', u'http://www.datasport.it/calcio/rss.xml')]
--- a/recipes/descopera_org.recipe
+++ b/recipes/descopera_org.recipe
@@ -0,0 +1,27 @@
 # -*- coding: utf-8 -*-
 '''
 descopera.org
 '''
 from calibre.web.feeds.news import BasicNewsRecipe
 class Descopera(BasicNewsRecipe):
    title = u'Descoperă.org'
    __author__  = 'Marius Ignătescu'
    description = 'Descoperă. Placerea de a cunoaște'
    publisher = 'descopera.org'
    category = 'science, technology, culture, history, earth'
    language = 'ro'
    oldest_article = 14
    max_articles_per_feed = 100
    encoding = 'utf8'
    no_stylesheets = True
    extra_css = ' body{ font-family: Verdana,Helvetica,Arial,sans-serif } .introduction{font-weight: bold} .story-feature{display: block; padding: 0; border: 1px solid; width: 40%; font-size: small} .story-feature h2{text-align: center; text-transform: uppercase} '
    keep_only_tags    = [dict(name='div', attrs={'class':['post']})]
    remove_tags = [dict(name='div', attrs={'class':['topnav', 'box_a', 'shr-bookmarks shr-bookmarks-expand shr-bookmarks-center shr-bookmarks-bg-knowledge']})]
    remove_attributes = ['width','height']
    cover_url = 'http://www.descopera.org/wp-content/themes/dorg/styles/default/img/b_top.png?width=400'
    feeds  = [(u'Articles', u'http://www.descopera.org/feed/')]
    def preprocess_html(self, soup):
        return self.adeify_images(soup)
--- a/recipes/dziennik_pl.recipe
+++ b/recipes/dziennik_pl.recipe
@@ -0,0 +1,58 @@
 # vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
 from calibre.web.feeds.news import BasicNewsRecipe
 import re
 class Dziennik_pl(BasicNewsRecipe):
    title          = u'Dziennik.pl'
    __author__        = 'fenuks'
    description   = u'Wiadomości z kraju i ze świata. Wiadomości gospodarcze. Znajdziesz u nas informacje, wydarzenia, komentarze, opinie.'
    category       = 'newspaper'
    language       = 'pl'
    cover_url='http://6.s.dziennik.pl/images/og_dziennik.jpg'
    no_stylesheets = True
    oldest_article = 7
    max_articles_per_feed = 100
    remove_javascript=True
    remove_empty_feeds=True
    preprocess_regexps     = [(re.compile("Komentarze:"), lambda m: '')]
    keep_only_tags=[dict(id='article')]
    remove_tags=[dict(name='div', attrs={'class':['art_box_dodatki', 'new_facebook_icons2', 'leftArt', 'article_print', 'quiz-widget']}), dict(name='a', attrs={'class':'komentarz'})]
    feeds          = [(u'Wszystko', u'http://rss.dziennik.pl/Dziennik-PL/'),
 		(u'Wiadomości', u'http://rss.dziennik.pl/Dziennik-Wiadomosci'),
 		(u'Gospodarka', u'http://rss.dziennik.pl/Dziennik-Gospodarka'),
 		(u'Kobieta', u'http://rss.dziennik.pl/Dziennik-Kobieta'),
 		(u'Auto', u'http://rss.dziennik.pl/Dziennik-Auto'),
 		(u'Rozrywka', u'http://rss.dziennik.pl/Dziennik-Rozrywka'),
 		(u'Film', u'http://rss.dziennik.pl/Dziennik-Film'),
 		(u'Muzyka' , u'http://rss.dziennik.pl/Dziennik-Muzyka'),
 		(u'Kultura', u'http://rss.dziennik.pl/Dziennik-Kultura'),
 		(u'Nauka', u'http://rss.dziennik.pl/Dziennik-Nauka'),
 		(u'Podróże', u'http://rss.dziennik.pl/Dziennik-Podroze/'),
 		(u'Nieruchomości', u'http://rss.dziennik.pl/Dziennik-Nieruchomosci')]
    def append_page(self, soup, appendtag):
        tag=soup.find('a', attrs={'class':'page_next'})
        if tag:
            appendtag.find('div', attrs={'class':'article_paginator'}).extract()
        while tag:
            soup2= self.index_to_soup(tag['href'])
            tag=soup2.find('a', attrs={'class':'page_next'})
            if not tag:
                for r in appendtag.findAll('div', attrs={'class':'art_src'}):
                    r.extract()
            pagetext = soup2.find(name='div', attrs={'class':'article_body'})
            for dictionary in self.remove_tags:
                 v=pagetext.findAll(name=dictionary['name'], attrs=dictionary['attrs'])
                 for delete in v:
                     delete.extract()
            pos = len(appendtag.contents)
            appendtag.insert(pos, pagetext)
            if appendtag.find('div', attrs={'class':'article_paginator'}):
                appendtag.find('div', attrs={'class':'article_paginator'}).extract()
    def preprocess_html(self, soup):
         self.append_page(soup, soup.body)
         return soup
--- a/recipes/emuzica_pl.recipe
+++ b/recipes/emuzica_pl.recipe
@@ -0,0 +1,16 @@
 # vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
 from calibre.web.feeds.news import BasicNewsRecipe
 class eMuzyka(BasicNewsRecipe):
    title          = u'eMuzyka'
    __author__        = 'fenuks'
    description   = u'Emuzyka to największa i najpopularniejsza strona o muzyce w Polsce'
    category       = 'music'
    language       = 'pl'
    cover_url='http://s.emuzyka.pl/img/emuzyka_invert_small.jpg'
    no_stylesheets = True
    oldest_article = 7
    max_articles_per_feed = 100
    keep_only_tags=[dict(name='div', attrs={'id':'news_container'}), dict(name='h3'), dict(name='div', attrs={'class':'review_text'})]
    remove_tags=[dict(name='span', attrs={'id':'date'})]
    feeds          = [(u'Aktualno\u015bci', u'http://www.emuzyka.pl/rss.php?f=1'), (u'Recenzje', u'http://www.emuzyka.pl/rss.php?f=2')]
--- a/recipes/fisco_oggi.recipe
+++ b/recipes/fisco_oggi.recipe
@@ -0,0 +1,18 @@
 __license__   = 'GPL v3'
 __author__    = 'faber1971'
 description   = 'Website of Italian Government Income Agency (about revenue, taxation, taxes)- v1.00 (17, December 2011)'
 from calibre.web.feeds.news import BasicNewsRecipe
 class AdvancedUserRecipe1324112023(BasicNewsRecipe):
    title          = u'Fisco Oggi'
    language = 'it'
    __author__ = 'faber1971'
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup = True
    remove_javascript = True
    no_stylesheets = True
    feeds          = [(u'Attualit\xe0', u'http://www.fiscooggi.it/taxonomy/term/1/feed'), (u'Normativa', u'http://www.fiscooggi.it/taxonomy/term/5/feed'), (u'Giurisprudenza', u'http://www.fiscooggi.it/taxonomy/term/8/feed'), (u'Dati e statistiche', u'http://www.fiscooggi.it/taxonomy/term/12/feed'), (u'Analisi e commenti', u'http://www.fiscooggi.it/taxonomy/term/13/feed'), (u'Bilancio e contabilit\xe0', u'http://www.fiscooggi.it/taxonomy/term/576/feed'), (u'Dalle regioni', u'http://www.fiscooggi.it/taxonomy/term/16/feed'), (u'Dal mondo', u'http://www.fiscooggi.it/taxonomy/term/17/feed')]
--- a/recipes/focus_pl.recipe
+++ b/recipes/focus_pl.recipe
@@ -1,57 +1,68 @@
-# -*- coding: utf-8 -*-
+import re
 from calibre.web.feeds.news import BasicNewsRecipe
-class Focus_pl(BasicNewsRecipe):
+class FocusRecipe(BasicNewsRecipe):
-    title          = u'Focus.pl'
+    __license__ = 'GPL v3'
-    oldest_article = 15
+    __author__ = u'intromatyk <intromatyk@gmail.com>'
-    max_articles_per_feed = 100
+    language = 'pl'
-    __author__        = 'fenuks'
+    version = 1
-    language       = 'pl'
+
-    description ='polish scientific monthly magazine'
+    title = u'Focus'
    publisher = u'Gruner + Jahr Polska'
    category = u'News'
    description = u'Newspaper'
    category='magazine'
    cover_url=''
    remove_empty_feeds= True
    no_stylesheets=True
-    remove_tags_before=dict(name='div', attrs={'class':'h2 h2f'})
+    oldest_article = 7
-    remove_tags_after=dict(name='div', attrs={'class':'clear'})
+    max_articles_per_feed = 100000
-    feeds          = [(u'Wszystkie kategorie', u'http://focus.pl.feedsportal.com/c/32992/f/532692/index.rss'),
+    recursions = 0
-	(u'Nauka', u'http://focus.pl.feedsportal.com/c/32992/f/532693/index.rss'),
+
-	(u'Historia', u'http://focus.pl.feedsportal.com/c/32992/f/532694/index.rss'),
+    no_stylesheets = True
-	(u'Cywilizacja', u'http://focus.pl.feedsportal.com/c/32992/f/532695/index.rss'),
+    remove_javascript = True
-	(u'Sport', u'http://focus.pl.feedsportal.com/c/32992/f/532696/index.rss'),
+    encoding = 'utf-8'
-	(u'Technika', u'http://focus.pl.feedsportal.com/c/32992/f/532697/index.rss'),
+    # Seems to work best, but YMMV
-	(u'Przyroda', u'http://focus.pl.feedsportal.com/c/32992/f/532698/index.rss'),
+    simultaneous_downloads = 5
-	(u'Technologie', u'http://focus.pl.feedsportal.com/c/32992/f/532699/index.rss'),
+
-	(u'Warto wiedzieć', u'http://focus.pl.feedsportal.com/c/32992/f/532700/index.rss'),
+    r = re.compile('.*(?P<url>http:\/\/(www.focus.pl)|(rss.feedsportal.com\/c)\/.*\.html?).*')
    keep_only_tags =[]
    keep_only_tags.append(dict(name = 'div', attrs = {'id' : 'cll'}))
    remove_tags =[]
    remove_tags.append(dict(name = 'div', attrs = {'class' : 'ulm noprint'}))
    remove_tags.append(dict(name = 'div', attrs = {'class' : 'txb'}))
    remove_tags.append(dict(name = 'div', attrs = {'class' : 'h2'}))
    remove_tags.append(dict(name = 'ul', attrs = {'class' : 'txu'}))
    remove_tags.append(dict(name = 'div', attrs = {'class' : 'ulc'}))
    extra_css = '''
                    body {font-family: verdana, arial, helvetica, geneva, sans-serif ;}
                    h1{text-align: left;}
                    h2{font-size: medium; font-weight: bold;}
                    p.lead {font-weight: bold; text-align: left;}
                    .authordate {font-size: small; color: #696969;}
                    .fot{font-size: x-small; color: #666666;}
                    '''    
-
+    feeds          = [
-]
+                            ('Nauka', 'http://focus.pl.feedsportal.com/c/32992/f/532693/index.rss'),
                            ('Historia', 'http://focus.pl.feedsportal.com/c/32992/f/532694/index.rss'),
                            ('Cywilizacja', 'http://focus.pl.feedsportal.com/c/32992/f/532695/index.rss'),
                            ('Sport', 'http://focus.pl.feedsportal.com/c/32992/f/532696/index.rss'),
                            ('Technika', 'http://focus.pl.feedsportal.com/c/32992/f/532697/index.rss'),
                            ('Przyroda', 'http://focus.pl.feedsportal.com/c/32992/f/532698/index.rss'),
                            ('Technologie', 'http://focus.pl.feedsportal.com/c/32992/f/532699/index.rss'),                            
                          ]
    def skip_ad_pages(self, soup):
-          tag=soup.find(name='a')
+        if ('advertisement' in soup.find('title').string.lower()):
-          if tag:
+            href = soup.find('a').get('href')
-            new_soup=self.index_to_soup(tag['href']+ 'do-druku/1/', raw=True)
+            return self.index_to_soup(href, raw=True)
-            return new_soup
+        else:
-
+            return None
    def append_page(self, appendtag):
        tag=appendtag.find(name='div', attrs={'class':'arrows'})
        if tag:
            nexturl='http://www.focus.pl/'+tag.a['href']
            for rem in appendtag.findAll(name='div', attrs={'class':'klik-nav'}):
                rem.extract()
            while nexturl:
                 soup2=self.index_to_soup(nexturl)
                 nexturl=None
                 pagetext=soup2.find(name='div', attrs={'class':'txt'})
                 tag=pagetext.find(name='div', attrs={'class':'arrows'})
                 for r in tag.findAll(name='a'):
                     if u'Następne' in r.string:
                         nexturl='http://www.focus.pl/'+r['href']
                 for rem in pagetext.findAll(name='div', attrs={'class':'klik-nav'}):
                     rem.extract()
                 pos = len(appendtag.contents)
                 appendtag.insert(pos, pagetext)
    def get_cover_url(self):
        soup=self.index_to_soup('http://www.focus.pl/magazyn/')
@ -60,7 +71,14 @@ class Focus_pl(BasicNewsRecipe):
            self.cover_url='http://www.focus.pl/' + tag.a['href']
            return getattr(self, 'cover_url', self.cover_url)
-
+    def print_version(self, url):
-    def preprocess_html(self, soup):
+     if url.count ('focus.pl.feedsportal.com'):
-         self.append_page(soup.body)
+            u = url.find('focus0Bpl')
-         return soup
+            u = 'http://www.focus.pl/' + url[u + 11:]
            u = u.replace('0C', '/')
            u = u.replace('A', '')
            u = u.replace ('0E','-')
            u = u.replace('/nc/1//story01.htm', '/do-druku/1')
     else:
            u = url.replace('/nc/1','/do-druku/1')           
     return u
--- a/recipes/globe_and_mail.recipe
+++ b/recipes/globe_and_mail.recipe
@ -51,6 +51,13 @@ class AdvancedUserRecipe1287083651(BasicNewsRecipe):
            {'class':['articleTools', 'pagination', 'Ads', 'topad',
                'breadcrumbs', 'footerNav', 'footerUtil', 'downloadlinks']}]
    def populate_article_metadata(self, article, soup, first):
        if first and hasattr(self, 'add_toc_thumbnail'):
            picdiv = soup.find('img')
            if picdiv is not None:
                self.add_toc_thumbnail(article,picdiv['src'])
    #Use the mobile version rather than the web version
    def print_version(self, url):
        return url.rpartition('?')[0] + '?service=mobile'
--- a/recipes/guardian.recipe
+++ b/recipes/guardian.recipe
@ -79,6 +79,12 @@ class Guardian(BasicNewsRecipe):
              url = None
          return url
    def populate_article_metadata(self, article, soup, first):
        if first and hasattr(self, 'add_toc_thumbnail'):
            picdiv = soup.find('img')
            if picdiv is not None:
                self.add_toc_thumbnail(article,picdiv['src'])
    def preprocess_html(self, soup):
          # multiple html sections in soup, useful stuff in the first
--- a/recipes/hackernews.recipe
+++ b/recipes/hackernews.recipe
@ -9,9 +9,9 @@ from calibre.ptempfile import PersistentTemporaryFile
 from urlparse import urlparse
 import re
-class HackerNews(BasicNewsRecipe):
+class HNWithCommentsLink(BasicNewsRecipe):
-    title                 = 'Hacker News'
+    title                 = 'HN With Comments Link'
-    __author__            = 'Tom Scholl'
+    __author__            = 'Tom Scholl & David Kerschner'
    description           = u'Hacker News, run by Y Combinator. Anything that good hackers would find interesting, with a focus on programming and startups.'
    publisher             = 'Y Combinator'
    category              = 'news, programming, it, technology'
@ -80,6 +80,11 @@ class HackerNews(BasicNewsRecipe):
        body = body + comments
        return u'<html><title>' + title + u'</title><body>' + body + '</body></html>'
    def parse_feeds(self):
        a = super(HNWithCommentsLink, self).parse_feeds()
        self.hn_articles = a[0].articles
        return a
    def get_obfuscated_article(self, url):
        if url.startswith('http://news.ycombinator.com'):
            content = self.get_hn_content(url)
@ -97,6 +102,13 @@ class HackerNews(BasicNewsRecipe):
            else:
                content = self.get_readable_content(url)
            article = 0
            for a in self.hn_articles:
                if a.url == url:
                    article = a
        content = re.sub(r'</body>\s*</html>\s*$', '', content) + article.summary + '</body></html>'
        self.temp_files.append(PersistentTemporaryFile('_fa.html'))
        self.temp_files[-1].write(content)
        self.temp_files[-1].close()
--- a/recipes/icons/biolog_pl.png
+++ b/recipes/icons/biolog_pl.png
--- a/recipes/icons/blues.png
+++ b/recipes/icons/blues.png
--- a/recipes/icons/computerworld_pl.png
+++ b/recipes/icons/computerworld_pl.png
--- a/recipes/icons/descopera_org.png
+++ b/recipes/icons/descopera_org.png
--- a/recipes/icons/dziennik_pl.png
+++ b/recipes/icons/dziennik_pl.png
--- a/recipes/icons/kosmonauta_pl.png
+++ b/recipes/icons/kosmonauta_pl.png
--- a/recipes/icons/mlody_technik_pl.recipe
+++ b/recipes/icons/mlody_technik_pl.recipe
--- a/recipes/icons/zaman.png
+++ b/recipes/icons/zaman.png
--- a/recipes/independent.recipe
+++ b/recipes/independent.recipe
@ -104,6 +104,12 @@ class TheIndependentNew(BasicNewsRecipe):
            url = None
        return url
    def populate_article_metadata(self, article, soup, first):
        if first and hasattr(self, 'add_toc_thumbnail'):
            picdiv = soup.find('img')
            if picdiv is not None:
                self.add_toc_thumbnail(article,picdiv['src'])
    def preprocess_html(self, soup):
        #remove 'advertorial articles'
@ -266,12 +272,15 @@ class TheIndependentNew(BasicNewsRecipe):
    def _insertRatingStars(self,soup,item):
-        if item.contents is None:
+        if item.contents is None or len(item.contents) < 1:
            return
        rating = item.contents[0]
-        if not rating.isdigit():
+
-            return None
+        try:
-        rating = int(item.contents[0])
+            rating = float(item.contents[0])
        except:
            print 'Could not convert decimal rating to star: malformatted float.'
            return
        for i in range(1,6):
            star = Tag(soup,'img')
            if i <= rating:
--- a/recipes/kosmonauta_pl.recipe
+++ b/recipes/kosmonauta_pl.recipe
@ -0,0 +1,14 @@
 # vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
 from calibre.web.feeds.news import BasicNewsRecipe
 class Kosmonauta(BasicNewsRecipe):
    title          = u'Kosmonauta.net'
    __author__        = 'fenuks'
    description   = u'polskojęzyczny portal w całości dedykowany misjom kosmicznym i badaniom kosmosu.'
    category       = 'astronomy'
    language       = 'pl'
    cover_url='http://bi.gazeta.pl/im/4/10393/z10393414X,Kosmonauta-net.jpg'
    no_stylesheets = True
    oldest_article = 7
    max_articles_per_feed = 100
    feeds          = [(u'Kosmonauta.net', u'http://www.kosmonauta.net/index.php/feed/rss.html')]
--- a/recipes/la_republica.recipe
+++ b/recipes/la_republica.recipe
@ -1,13 +1,12 @@
 __license__   = 'GPL v3'
 __author__    = 'Lorenzo Vigentini, based on Darko Miletic, Gabriele Marini'
 __copyright__ = '2009-2011, Darko Miletic <darko.miletic at gmail.com>, Lorenzo Vigentini <l.vigentini at gmail.com>'
-description   = 'Italian daily newspaper - v1.01 (04, January 2010); 16.05.2010 new version; 17.10.2011 new version'
+description   = 'Italian daily newspaper - v1.01 (04, January 2010); 16.05.2010 new version; 17.10.2011 new version; 14.12.2011 new version'
 '''
 http://www.repubblica.it/
 '''
 import re
 from calibre.ptempfile import PersistentTemporaryFile
 from calibre.web.feeds.news import BasicNewsRecipe
@ -32,12 +31,6 @@ class LaRepubblica(BasicNewsRecipe):
                              """
    remove_attributes = ['width','height','lang','xmlns:og','xmlns:fb']
    preprocess_regexps = [
        (re.compile(r'.*?<head>', re.DOTALL|re.IGNORECASE), lambda match: '<head>'),
        (re.compile(r'<head>.*?<title>', re.DOTALL|re.IGNORECASE), lambda match: '<head><title>'),
        (re.compile(r'</title>.*?</head>', re.DOTALL|re.IGNORECASE), lambda match: '</title></head>')
    ]
    def get_article_url(self, article):
        link = BasicNewsRecipe.get_article_url(self, article)
@ -73,15 +66,15 @@ class LaRepubblica(BasicNewsRecipe):
    remove_tags        = [
                            dict(name=['object','link','meta','iframe','embed']),
                            dict(name='span',attrs={'class':'linkindice'}),
-                            dict(name='div', attrs={'class':'bottom-mobile'}),
+                            dict(name='div', attrs={'class':['bottom-mobile','adv adv-middle-inline']}),
-                            dict(name='div', attrs={'id':['rssdiv','blocco']}),
+                            dict(name='div', attrs={'id':['rssdiv','blocco','fb-like-head']}),
-                            dict(name='div', attrs={'class':'utility'}),
+                            dict(name='div', attrs={'class':['utility','fb-like-button','archive-button']}),
                            dict(name='div', attrs={'class':'generalbox'}),
                            dict(name='ul', attrs={'id':'hystory'})
                         ]
    feeds          = [
-                       (u'Rilievo', u'http://www.repubblica.it/rss/homepage/rss2.0.xml'),
+                       (u'Homepage', u'http://www.repubblica.it/rss/homepage/rss2.0.xml'),
                       (u'Cronaca', u'http://www.repubblica.it/rss/cronaca/rss2.0.xml'),
                       (u'Esteri', u'http://www.repubblica.it/rss/esteri/rss2.0.xml'),
                       (u'Economia', u'http://www.repubblica.it/rss/economia/rss2.0.xml'),
@ -110,3 +103,5 @@ class LaRepubblica(BasicNewsRecipe):
            del item['style']           
        return soup
    def preprocess_raw_html(self, raw, url):
       return '<html><head>'+raw[raw.find('</head>'):]
--- a/recipes/metro_news_nl.recipe
+++ b/recipes/metro_news_nl.recipe
@ -15,13 +15,13 @@ try:
    SHOWDEBUG1 = mlog.showdebuglevel(1)
    SHOWDEBUG2 = mlog.showdebuglevel(2)
 except:
-    print 'drMerry debuglogger not found, skipping debug options'
+    #print 'drMerry debuglogger not found, skipping debug options'
    SHOWDEBUG0 = False
    SHOWDEBUG1 = False
    SHOWDEBUG2 = False
    KEEPSTATS = False
-print ('level0: %s\nlevel1: %s\nlevel2: %s' % (SHOWDEBUG0,SHOWDEBUG1,SHOWDEBUG2))
+#print ('level0: %s\nlevel1: %s\nlevel2: %s' % (SHOWDEBUG0,SHOWDEBUG1,SHOWDEBUG2))
 ''' Version 1.2, updated cover image to match the changed website.
 added info date on title
--- a/recipes/mlody_technik_pl.recipe
+++ b/recipes/mlody_technik_pl.recipe
@ -0,0 +1,15 @@
 # vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
 from calibre.web.feeds.news import BasicNewsRecipe
 class Mlody_technik(BasicNewsRecipe):
    title          = u'Mlody technik'
    __author__        = 'fenuks'
    description   = u'Młody technik'
    category       = 'science'
    language       = 'pl'
    cover_url='http://science-everywhere.pl/wp-content/uploads/2011/10/mt12.jpg'
    no_stylesheets = True
    oldest_article = 7
    max_articles_per_feed = 100
    #keep_only_tags=[dict(id='container')]
    feeds          = [(u'Artyku\u0142y', u'http://www.mt.com.pl/feed')]
--- a/recipes/naczytniki.recipe
+++ b/recipes/naczytniki.recipe
@ -7,6 +7,7 @@ class naczytniki(BasicNewsRecipe):
    language       = 'pl'
    description ='everything about e-readers'
    category='readers'
    no_stylesheets=True
    oldest_article = 7
    max_articles_per_feed = 100
    remove_tags_after= dict(name='div', attrs={'class':'sociable'})
--- a/recipes/nowa_fantastyka.recipe
+++ b/recipes/nowa_fantastyka.recipe
@ -1,20 +1,21 @@
 # -*- coding: utf-8 -*-
 from calibre.web.feeds.news import BasicNewsRecipe
 class Nowa_Fantastyka(BasicNewsRecipe):
    title          = u'Nowa Fantastyka'
    oldest_article = 7
    __author__        = 'fenuks'
    language       = 'pl'
    encoding='latin2'
    description ='site for fantasy readers'
    category='fantasy'
    max_articles_per_feed = 100
    INDEX='http://www.fantastyka.pl/'
    no_stylesheets=True
    needs_subscription = 'optional'
    remove_tags_before=dict(attrs={'class':'belka1-tlo-md'})
    #remove_tags_after=dict(name='span', attrs={'class':'naglowek-oceny'})
    remove_tags_after=dict(name='td', attrs={'class':'belka1-bot'})
-    remove_tags=[dict(attrs={'class':'avatar2'})]
+    remove_tags=[dict(attrs={'class':'avatar2'}), dict(name='span', attrs={'class':'alert-oceny'}), dict(name='img', attrs={'src':['obrazki/sledz1.png', 'obrazki/print.gif', 'obrazki/mlnf.gif']}), dict(name='b', text='Dodaj komentarz'),dict(name='a', attrs={'href':'http://www.fantastyka.pl/10,1727.html'})]
    feeds          = []
    def find_articles(self, url):
        articles = []
@ -45,3 +46,13 @@ class Nowa_Fantastyka(BasicNewsRecipe):
        cover=soup.find(name='img', attrs={'class':'okladka'})
        self.cover_url=self.INDEX+ cover['src']
        return getattr(self, 'cover_url', self.cover_url)
    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None and self.password is not None:
            br.open('http://www.fantastyka.pl/')
            br.select_form(nr=0)
            br['login']   = self.username
            br['pass'] = self.password
            br.submit()
        return br
--- a/recipes/nytimes.recipe
+++ b/recipes/nytimes.recipe
@ -1,5 +1,5 @@
 #!/usr/bin/env  python
-
+# -*- coding: utf-8 -*-
 __license__   = 'GPL v3'
 __copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'
 '''
@ -707,6 +707,16 @@ class NYTimes(BasicNewsRecipe):
 		return soup
    def populate_article_metadata(self, article, soup, first):
        if first and hasattr(self, 'add_toc_thumbnail'):
            idxdiv = soup.find('div',attrs={'class':'articleSpanImage'})
            if idxdiv is not None:
                if idxdiv.img:
                    self.add_toc_thumbnail(article, idxdiv.img['src'])
            else:
                img = soup.find('img')
                if img is not None:
                    self.add_toc_thumbnail(article, img['src'])
        shortparagraph = ""
        try:
            if len(article.text_summary.strip()) == 0:
--- a/recipes/nytimes_sub.recipe
+++ b/recipes/nytimes_sub.recipe
@ -855,6 +855,16 @@ class NYTimes(BasicNewsRecipe):
        return soup
    def populate_article_metadata(self, article, soup, first):
        if first and hasattr(self, 'add_toc_thumbnail'):
            idxdiv = soup.find('div',attrs={'class':'articleSpanImage'})
            if idxdiv is not None:
                if idxdiv.img:
                    self.add_toc_thumbnail(article, idxdiv.img['src'])
            else:
                img = soup.find('img')
                if img is not None:
                    self.add_toc_thumbnail(article, img['src'])
        shortparagraph = ""
        try:
            if len(article.text_summary.strip()) == 0:
--- a/recipes/rynek_zdrowia.recipe
+++ b/recipes/rynek_zdrowia.recipe
@ -0,0 +1,21 @@
 from calibre.web.feeds.news import BasicNewsRecipe
 class rynekzdrowia(BasicNewsRecipe):
    title          = u'Rynek Zdrowia'
    __author__ = u'spi630'
    language = 'pl'
    masthead_url = 'http://k.rynekzdrowia.pl/images/headerLogo.png'
    cover_url = 'http://k.rynekzdrowia.pl/images/headerLogo.png'
    oldest_article = 3
    max_articles_per_feed = 25
    no_stylesheets = True
    auto_cleanup = True
    remove_empty_feeds=True
    remove_tags_before = dict(name='h3')
    feeds          = [(u'Finanse i Zarz\u0105dzanie', u'http://www.rynekzdrowia.pl/Kanal/finanse.html'), (u'Inwestycje', u'http://www.rynekzdrowia.pl/Kanal/inwestycje.html'), (u'Aparatura i wyposa\u017cenie', u'http://www.rynekzdrowia.pl/Kanal/aparatura.html'), (u'Informatyka', u'http://www.rynekzdrowia.pl/Kanal/informatyka.html'), (u'Prawo', u'http://www.rynekzdrowia.pl/Kanal/prawo.html'), (u'Polityka zdrowotna', u'http://www.rynekzdrowia.pl/Kanal/polityka_zdrowotna.html'), (u'Ubezpieczenia Zdrowotne', u'http://www.rynekzdrowia.pl/Kanal/ubezpieczenia.html'), (u'Farmacja', u'http://www.rynekzdrowia.pl/Kanal/farmacja.html'), (u'Badania i rozw\xf3j', u'http://www.rynekzdrowia.pl/Kanal/badania.html'), (u'Nauka', u'http://www.rynekzdrowia.pl/Kanal/nauka.html'), (u'Po godzinach', u'http://www.rynekzdrowia.pl/Kanal/godziny.html'), (u'Us\u0142ugi medyczne', u'http://www.rynekzdrowia.pl/Kanal/uslugi.html')]
    def print_version(self, url):
        url = url.replace('.html', ',drukuj.html')
        return url
--- a/recipes/spiders_web_pl.recipe
+++ b/recipes/spiders_web_pl.recipe
@ -8,8 +8,8 @@ class SpidersWeb(BasicNewsRecipe):
    cover_url      = 'http://www.spidersweb.pl/wp-content/themes/spiderweb/img/Logo.jpg'
    category       = 'IT, WEB'
    language       = 'pl'
    no_stylesheets=True
    max_articles_per_feed = 100
-    remove_tags_before=dict(name="h1", attrs={'class':'Title'})
+    keep_only_tags=[dict(id='Post')]
-    remove_tags_after=dict(name="div", attrs={'class':'Text'})
+    remove_tags=[dict(name='div', attrs={'class':['Comments', 'Shows', 'Post-Tags']})]
    remove_tags=[dict(name='div', attrs={'class':['Tags', 'CommentCount FloatL', 'Show FloatL']})]
    feeds          = [(u'Wpisy', u'http://www.spidersweb.pl/feed')]
--- a/recipes/sueddeutsche.recipe
+++ b/recipes/sueddeutsche.recipe
@ -6,54 +6,21 @@ __copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'
 Fetch sueddeutsche.de
 '''
 from calibre.web.feeds.news import BasicNewsRecipe
 class Sueddeutsche(BasicNewsRecipe):
    title = u'sueddeutsche.de'
    description = 'News from Germany'
-    __author__ = 'Oliver Niesner and Armin Geller' #AGe 2011-11-25
+    __author__ = 'Oliver Niesner and Armin Geller' #Update AGe 2011-12-16
    use_embedded_content   = False
    timefmt = ' [%d %b %Y]'
    oldest_article = 7
    max_articles_per_feed = 50
    no_stylesheets = True
    language = 'de'
    encoding = 'utf-8'
    remove_javascript = True
-    cover_url  = 'http://polpix.sueddeutsche.com/polopoly_fs/1.1219199.1322239289!/image/image.jpg_gen/derivatives/860x860/image.jpg' # 2011-11-25 AGe
+    auto_cleanup = True
-
+    cover_url  = 'http://polpix.sueddeutsche.com/polopoly_fs/1.1237395.1324054345!/image/image.jpg_gen/derivatives/860x860/image.jpg' # 2011-12-16 AGe
    remove_tags = [ dict(name='link'), dict(name='iframe'),
                    dict(name='div', attrs={'id':["bookmarking","themenbox","artikelfoot","CAD_AD",
                          "SKY_AD","NT1_AD","navbar1","sdesiteheader"]}),
                    dict(name='div', attrs={'class':["similar-article-box","artikelliste","nteaser301bg",
                                 "pages closed","basebox right narrow","headslot galleried"]}),
                    dict(name='div', attrs={'class':["articleDistractor","listHeader","listHeader2","hr2",
                             "item","videoBigButton","articlefooter full-column",
                                                     "bildbanderolle full-column","footerCopy padleft5"]}),
                    dict(name='p', attrs={'class':["ressortartikeln","artikelFliestext","entry-summary"]}),
                    dict(name='div', attrs={'style':["position:relative;"]}),
                    dict(name='span', attrs={'class':["nlinkheaderteaserschwarz","artikelLink","r10000000"]}),
                    dict(name='table', attrs={'class':["stoerBS","kommentare","footer","pageBoxBot","pageAktiv","bgcontent"]}),
                    dict(name='ul', attrs={'class':["breadcrumb","articles","activities","sitenav","actions"]}),
                    dict(name='td', attrs={'class':["artikelDruckenRight"]}),
                    dict(name='p', text = "ANZEIGE")
                     ]
    remove_tags_after = [dict(name='div', attrs={'class':["themenbox full-column"]})]
    extra_css = '''
                    h2{font-family:Arial,Helvetica,sans-serif; font-size: x-small; color: #003399;}
                    a{font-family:Arial,Helvetica,sans-serif; font-style:italic;}
                    .dachzeile p{font-family:Arial,Helvetica,sans-serif; font-size: x-small; }
                    h1{ font-family:Arial,Helvetica,sans-serif;  font-size:x-large; font-weight:bold;}
                    .artikelTeaser{font-family:Arial,Helvetica,sans-serif; font-size: x-small; font-weight:bold; }
                    body{font-family:Arial,Helvetica,sans-serif; }
                    .photo {font-family:Arial,Helvetica,sans-serif; font-size: x-small; color: #666666;}                 '''
    feeds = [
              (u'Politik', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EPolitik%24?output=rss'),
              (u'Wirtschaft', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EWirtschaft%24?output=rss'),
@ -62,7 +29,7 @@ class Sueddeutsche(BasicNewsRecipe):
              (u'Sport', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5ESport%24?output=rss'),
              (u'Leben', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5ELeben%24?output=rss'),
              (u'Karriere', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EKarriere%24?output=rss'),
-              (u'München & Region', u'http://www.sueddeutsche.de/app/service/rss/ressort/muenchen/rss.xml'), # AGe 2011-11-13
+              (u'München & Region', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EMünchen&Region%24?output=rss'),
              (u'Bayern', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EBayern%24?output=rss'),
              (u'Medien', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EMedien%24?output=rss'),
              (u'Digital', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EDigital%24?output=rss'),
@ -76,7 +43,12 @@ class Sueddeutsche(BasicNewsRecipe):
              (u'Service', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EService%24?output=rss'), # sometimes only
              (u'Verlag', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EVerlag%24?output=rss'),   # sometimes only
            ]
-
+# AGe 2011-12-16: Redirect handling fixed using the reusable recipe code from kiklop74.
 # Feed is:                    http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5ESport%24?output=rss
 # Article download source is: http://sz.de/1.1237295 (Ski Alpin: Der Erfolg kommt, der Trainer geht)
 # Article source is:          http://www.sueddeutsche.de/sport/ski-alpin-der-erfolg-kommt-der-trainer-geht-1.1237295
 # Article printversion is:    http://www.sueddeutsche.de/sport/2.220/ski-alpin-der-erfolg-kommt-der-trainer-geht-1.1237295
    def print_version(self, url):
-        main, sep, id = url.rpartition('/')
+        n_url=self.browser.open_novisit(url).geturl()
        main, sep, id = n_url.rpartition('/')
        return main + '/2.220/' + id
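The URL rewrite described in the comments above can be sketched in isolation. This is a minimal illustration and not part of the commit: `to_print_url` is a hypothetical helper name, and it assumes the sz.de short link has already been resolved to the full article URL (the recipe does that step with `self.browser.open_novisit(url).geturl()`).

```python
def to_print_url(resolved_url):
    # Split at the last '/' so the article slug is separated from its section path.
    main, sep, slug = resolved_url.rpartition('/')
    # Insert the '2.220' path segment that the recipe uses for the print version.
    return main + '/2.220/' + slug


# Example URL taken from the recipe's own comments.
article = 'http://www.sueddeutsche.de/sport/ski-alpin-der-erfolg-kommt-der-trainer-geht-1.1237295'
print(to_print_url(article))
```

Resolving the redirect first matters because the short form (`http://sz.de/1.1237295`) has no section path, so the same split would put `2.220` in the wrong place.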
--- a/recipes/telegraph_uk.recipe
+++ b/recipes/telegraph_uk.recipe
@ -59,6 +59,11 @@ class TelegraphUK(BasicNewsRecipe):
                        ,(u'Travel'        , u'http://www.telegraph.co.uk/travel/rss'                                            )
                        ,(u'How about that?', u'http://www.telegraph.co.uk/news/newstopics/howaboutthat/rss'                     )
                         ]
    def populate_article_metadata(self, article, soup, first):
        if first and hasattr(self, 'add_toc_thumbnail'):
            picdiv = soup.find('img')
            if picdiv is not None:
                self.add_toc_thumbnail(article,picdiv['src'])
    def get_article_url(self, article):
        url = article.get('link', None)
--- a/recipes/tuttojove.recipe
+++ b/recipes/tuttojove.recipe
@ -0,0 +1,17 @@
 __license__   = 'GPL v3'
 __author__    = 'faber1971'
 description   = 'Italian website on Juventus F.C. - v1.00 (17, December 2011)'
 from calibre.web.feeds.news import BasicNewsRecipe
 class AdvancedUserRecipe1305984536(BasicNewsRecipe):
    title          = u'tuttojuve'
    description = 'Juventus'
    language = 'it'
    __author__ = 'faber1971'
    oldest_article = 1
    max_articles_per_feed = 100
    feeds          = [(u'notizie', u'http://feeds.tuttojuve.com/rss/'), (u'da vinovo', u'http://feeds.tuttojuve.com/rss/?c=10'), (u'primo piano', u'http://feeds.tuttojuve.com/rss/?c=16'), (u'editoriale', u'http://feeds.tuttojuve.com/rss/?c=3'), (u'il punto', u'http://feeds.tuttojuve.com/rss/?c=8'), (u'pagelle', u'http://feeds.tuttojuve.com/rss/?c=9'), (u'avversario', u'http://feeds.tuttojuve.com/rss/?c=11')]
    def print_version(self, url):
        return self.browser.open_novisit(url).geturl()
--- a/recipes/wsj.recipe
+++ b/recipes/wsj.recipe
@ -57,6 +57,12 @@ class WallStreetJournal(BasicNewsRecipe):
                        'username and password')
        return br
    def populate_article_metadata(self, article, soup, first):
        if first and hasattr(self, 'add_toc_thumbnail'):
            picdiv = soup.find('img')
            if picdiv is not None:
                self.add_toc_thumbnail(article,picdiv['src'])
    def postprocess_html(self, soup, first):
        for tag in soup.findAll(name=['table', 'tr', 'td']):
            tag.name = 'div'
--- a/recipes/wsj_free.recipe
+++ b/recipes/wsj_free.recipe
@ -44,6 +44,12 @@ class WallStreetJournal(BasicNewsRecipe):
                    ]
    remove_tags_after = [dict(id="article_story_body"), {'class':"article story"},]
    def populate_article_metadata(self, article, soup, first):
        if first and hasattr(self, 'add_toc_thumbnail'):
            picdiv = soup.find('img')
            if picdiv is not None:
                self.add_toc_thumbnail(article,picdiv['src'])
    def postprocess_html(self, soup, first):
        for tag in soup.findAll(name=['table', 'tr', 'td']):
            tag.name = 'div'
--- a/recipes/zaman.recipe
+++ b/recipes/zaman.recipe
@ -5,9 +5,10 @@ from calibre.web.feeds.news import BasicNewsRecipe
 class Zaman (BasicNewsRecipe):
    title                  = u'ZAMAN Gazetesi'
    description            =  ' Zaman Gazetesi''nin internet sitesinden günlük haberler'
    __author__             = u'thomass'
    oldest_article         = 2
-    max_articles_per_feed  =100
+    max_articles_per_feed  =50
   # no_stylesheets         = True
    #delay                  = 1
    #use_embedded_content   = False
@ -16,19 +17,19 @@ class Zaman (BasicNewsRecipe):
    category               = 'news, haberler,TR,gazete'
    language               = 'tr'
    publication_type = 'newspaper '
-    extra_css              = ' body{ font-family: Verdana,Helvetica,Arial,sans-serif } .introduction{font-weight: bold} .story-feature{display: block; padding: 0; border: 1px solid; width: 40%; font-size: small} .story-feature h2{text-align: center; text-transform: uppercase} '
+    extra_css              = '.buyukbaslik{font-weight: bold; font-size: 18px;color:#0000FF}'#body{ font-family: Verdana,Helvetica,Arial,sans-serif } .introduction{font-weight: bold} .story-feature{display: block; padding: 0; border: 1px solid; width: 40%; font-size: small} .story-feature h2{text-align: center; text-transform: uppercase} '
    conversion_options = {
                            'tags'            : category
                            ,'language'        : language
                            ,'publisher'       : publisher
-                            ,'linearize_tables': False
+                            ,'linearize_tables': True
                         }
    cover_img_url = 'https://fbcdn-profile-a.akamaihd.net/hprofile-ak-snc4/188140_81722291869_2111820_n.jpg'
    masthead_url = 'http://medya.zaman.com.tr/extentions/zaman.com.tr/img/section/logo-section.png'
-    keep_only_tags      = [dict(name='div', attrs={'id':[ 'news-detail-content']}), dict(name='td', attrs={'class':['columnist-detail','columnist_head']})  ]
+    #keep_only_tags      = [dict(name='div', attrs={'id':[ 'news-detail-content']}), dict(name='td', attrs={'class':['columnist-detail','columnist_head']})  ]
-    remove_tags = [ dict(name='div', attrs={'id':['news-detail-news-text-font-size','news-detail-gallery','news-detail-news-bottom-social']}),dict(name='div', attrs={'class':['radioEmbedBg','radyoProgramAdi']}),dict(name='a', attrs={'class':['webkit-html-attribute-value webkit-html-external-link']}),dict(name='table', attrs={'id':['yaziYorumTablosu']}),dict(name='img', attrs={'src':['http://medya.zaman.com.tr/pics/paylas.gif','http://medya.zaman.com.tr/extentions/zaman.com.tr/img/columnist/ma-16.png']})]
+    remove_tags = [ dict(name='img', attrs={'src':['http://medya.zaman.com.tr/zamantryeni/pics/zamanonline.gif']})]#,dict(name='div', attrs={'class':['radioEmbedBg','radyoProgramAdi']}),dict(name='a', attrs={'class':['webkit-html-attribute-value webkit-html-external-link']}),dict(name='table', attrs={'id':['yaziYorumTablosu']}),dict(name='img', attrs={'src':['http://medya.zaman.com.tr/pics/paylas.gif','http://medya.zaman.com.tr/extentions/zaman.com.tr/img/columnist/ma-16.png']})
    #remove_attributes = ['width','height']
@ -37,7 +38,8 @@ class Zaman (BasicNewsRecipe):
    feeds          = [
                      ( u'Anasayfa', u'http://www.zaman.com.tr/anasayfa.rss'),
                      ( u'Son Dakika', u'http://www.zaman.com.tr/sondakika.rss'),
-                      ( u'En çok Okunanlar', u'http://www.zaman.com.tr/max_all.rss'),
+                      #( u'En çok Okunanlar', u'http://www.zaman.com.tr/max_all.rss'),
                      #( u'Manşet', u'http://www.zaman.com.tr/manset.rss'),
                      ( u'Gündem', u'http://www.zaman.com.tr/gundem.rss'),
                      ( u'Yazarlar', u'http://www.zaman.com.tr/yazarlar.rss'),
                      ( u'Politika', u'http://www.zaman.com.tr/politika.rss'),
@ -45,11 +47,20 @@ class Zaman (BasicNewsRecipe):
                      ( u'Dış Haberler', u'http://www.zaman.com.tr/dishaberler.rss'),
                      ( u'Yorumlar', u'http://www.zaman.com.tr/yorumlar.rss'),
                      ( u'Röportaj', u'http://www.zaman.com.tr/roportaj.rss'),
                      ( u'Dizi Yazı', u'http://www.zaman.com.tr/dizi.rss'),
                      ( u'Bilişim', u'http://www.zaman.com.tr/bilisim.rss'),
                      ( u'Otomotiv', u'http://www.zaman.com.tr/otomobil.rss'),
                      ( u'Spor', u'http://www.zaman.com.tr/spor.rss'),
                      ( u'Kürsü', u'http://www.zaman.com.tr/kursu.rss'),
                      ( u'Eğitim', u'http://www.zaman.com.tr/egitim.rss'),
                      ( u'Kültür Sanat', u'http://www.zaman.com.tr/kultursanat.rss'),
                      ( u'Televizyon', u'http://www.zaman.com.tr/televizyon.rss'),
-                      ( u'Manşet', u'http://www.zaman.com.tr/manset.rss'),
+                      ( u'Aile', u'http://www.zaman.com.tr/aile.rss'), 
-
+                      ( u'Cuma Eki', u'http://www.zaman.com.tr/cuma.rss'),
                      ( u'Cumaertesi Eki', u'http://www.zaman.com.tr/cumaertesi.rss'),
                      ( u'Pazar Eki', u'http://www.zaman.com.tr/pazar.rss'),
                        ]
    def print_version(self, url):
     return url.replace('http://www.zaman.com.tr/haber.do?haberno=', 'http://www.zaman.com.tr/yazdir.do?haberno=')
--- a/resources/default_tweaks.py
+++ b/resources/default_tweaks.py
@ -409,6 +409,17 @@ locale_for_sorting =  ''
 # columns. If False, one column is used.
 metadata_single_use_2_cols_for_custom_fields = True
 #: Order of custom column(s) in edit metadata
 # Controls the order that custom columns are listed in edit metadata single
 # and bulk. The columns listed in the tweak are displayed first and in the
 # order provided. Any columns not listed are displayed after the listed ones,
 # in alphabetical order. Do note that this tweak does not change the size of
 # the edit widgets. Putting comments widgets in this list may result in some
 # odd widget spacing when using two-column mode.
 # Enter a comma-separated list of custom field lookup names, as in
 # metadata_edit_custom_column_order = ['#genre', '#mytags', '#etc']
 metadata_edit_custom_column_order = []
 #: The number of seconds to wait before sending emails
 # The number of seconds to wait before sending emails when using a
 # public email server like gmail or hotmail. Default is: 5 minutes
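The ordering rule described for metadata_edit_custom_column_order (listed columns first, in the given order, then everything else alphabetically) can be sketched roughly as follows. This is only an illustration: order_custom_columns is a hypothetical helper, not calibre's implementation, and it assumes the alphabetical part sorts by lookup name.

```python
def order_custom_columns(all_columns, preferred):
    """Return the columns named in `preferred` first, in that order,
    followed by all remaining columns in alphabetical order.

    Hypothetical helper for illustration only; assumes sorting is done
    on the lookup name (e.g. '#genre').
    """
    available = set(all_columns)
    listed = []
    for name in preferred:
        # Skip names that do not exist and ignore duplicates in the tweak.
        if name in available and name not in listed:
            listed.append(name)
    rest = sorted(c for c in all_columns if c not in listed)
    return listed + rest


columns = ['#rating', '#genre', '#etc', '#mytags', '#series']
ordered = order_custom_columns(columns, ['#genre', '#mytags', '#missing'])
print(ordered)
```

With an empty tweak value the result is simply the alphabetical list, which matches the default of metadata_edit_custom_column_order = [].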
--- a/session.vim
+++ b/session.vim
@ -1,5 +1,5 @@
 " Project wide builtins
-let g:pyflakes_builtins = ["_", "dynamic_property", "__", "P", "I", "lopen", "icu_lower", "icu_upper", "icu_title", "ngettext"]
+let $PYFLAKES_BUILTINS = "_,dynamic_property,__,P,I,lopen,icu_lower,icu_upper,icu_title,ngettext"
 python << EOFPY
 import os, sys
--- a/setup/commands.py
+++ b/setup/commands.py
@ -11,7 +11,7 @@ __all__ = [
        'build', 'build_pdf2xml', 'server',
        'gui',
        'develop', 'install',
-        'kakasi', 'resources',
+        'kakasi', 'coffee', 'resources',
        'check',
        'sdist',
        'manual', 'tag_release',
@ -49,9 +49,10 @@ gui = GUI()
 from setup.check import Check
 check = Check()
-from setup.resources import Resources, Kakasi
+from setup.resources import Resources, Kakasi, Coffee
 resources = Resources()
 kakasi = Kakasi()
 coffee = Coffee()
 from setup.publish import Manual, TagRelease, Stage1, Stage2, \
        Stage3, Stage4, Stage5, Publish
--- a/setup/install.py
+++ b/setup/install.py
@ -6,7 +6,7 @@ __license__   = 'GPL v3'
 __copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
 __docformat__ = 'restructuredtext en'
-import sys, os, textwrap, subprocess, shutil, tempfile, atexit, shlex
+import sys, os, textwrap, subprocess, shutil, tempfile, atexit, shlex, glob
 from setup import (Command, islinux, isbsd, basenames, modules, functions,
        __appname__, __version__)
@ -296,13 +296,14 @@ class Sdist(Command):
        for x in open('.bzrignore').readlines():
            if not x.startswith('resources/'): continue
            p = x.strip().replace('/', os.sep)
-            d = self.j(tdir, os.path.dirname(p))
-            if not self.e(d):
-                os.makedirs(d)
-            if os.path.isdir(p):
-                shutil.copytree(p, self.j(tdir, p))
-            else:
-                shutil.copy2(p, d)
+            for p in glob.glob(p):
+                d = self.j(tdir, os.path.dirname(p))
+                if not self.e(d):
+                    os.makedirs(d)
+                if os.path.isdir(p):
+                    shutil.copytree(p, self.j(tdir, p))
+                else:
+                    shutil.copy2(p, d)
        for x in os.walk(os.path.join(self.SRC, 'calibre')):
            for f in x[-1]:
                if not f.endswith('_ui.py'): continue
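The Sdist change above expands each resources/ entry of .bzrignore with glob, so wildcard entries such as resources/display/*.js copy every matching file. A minimal standalone sketch of that idea, written for Python 3, is shown here. The function name copy_ignored_resources and the explicit src_root/dest_root parameters are assumptions made for the example; the real command works relative to the current directory and uses its own self.j/self.e helpers.

```python
import glob
import os
import shutil
import tempfile


def copy_ignored_resources(patterns, src_root, dest_root):
    """Copy every path under src_root matching a 'resources/' pattern
    into the same relative location under dest_root."""
    copied = []
    for pattern in patterns:
        if not pattern.startswith('resources/'):
            continue
        pattern = pattern.strip().replace('/', os.sep)
        # A pattern may match zero, one or many paths.
        for path in glob.glob(os.path.join(src_root, pattern)):
            rel = os.path.relpath(path, src_root)
            target_dir = os.path.join(dest_root, os.path.dirname(rel))
            if not os.path.exists(target_dir):
                os.makedirs(target_dir)
            if os.path.isdir(path):
                shutil.copytree(path, os.path.join(dest_root, rel))
            else:
                shutil.copy2(path, target_dir)
            copied.append(rel)
    return sorted(copied)


# Demo with throwaway directories and made-up file names.
src_root = tempfile.mkdtemp()
dest_root = tempfile.mkdtemp()
os.makedirs(os.path.join(src_root, 'resources', 'display'))
for name in ('a.js', 'b.js', 'notes.txt'):
    with open(os.path.join(src_root, 'resources', 'display', name), 'w') as f:
        f.write('// generated\n')

copied = copy_ignored_resources(
    ['resources/display/*.js\n', 'src/calibre/plugins\n'], src_root, dest_root)
print(copied)
```

Only the two .js files are copied in this demo: notes.txt does not match the wildcard and the src/ entry is skipped by the resources/ prefix check.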
--- a/setup/iso_639/ca.po
+++ b/setup/iso_639/ca.po
@ -12,14 +12,14 @@ msgstr ""
 "Report-Msgid-Bugs-To: Debian iso-codes team <pkg-isocodes-"
 "devel@lists.alioth.debian.org>\n"
 "POT-Creation-Date: 2011-11-25 14:01+0000\n"
-"PO-Revision-Date: 2011-11-22 16:45+0000\n"
+"PO-Revision-Date: 2011-12-14 19:48+0000\n"
 "Last-Translator: Ferran Rius <frius64@hotmail.com>\n"
 "Language-Team: Catalan <linux@softcatala.org>\n"
 "MIME-Version: 1.0\n"
 "Content-Type: text/plain; charset=UTF-8\n"
 "Content-Transfer-Encoding: 8bit\n"
-"X-Launchpad-Export-Date: 2011-11-26 05:10+0000\n"
+"X-Launchpad-Export-Date: 2011-12-15 05:18+0000\n"
-"X-Generator: Launchpad (build 14381)\n"
+"X-Generator: Launchpad (build 14487)\n"
 "Language: ca\n"
 #. name for aaa
@ -9348,7 +9348,7 @@ msgstr "Seit-Kaitetu"
 #. name for hil
 msgid "Hiligaynon"
-msgstr ""
+msgstr "Hiligainon"
 #. name for hin
 msgid "Hindi"
@ -9356,39 +9356,39 @@ msgstr "Hindi"
 #. name for hio
 msgid "Tsoa"
-msgstr ""
+msgstr "Tsoa"
 #. name for hir
 msgid "Himarimã"
-msgstr ""
+msgstr "Himarimà"
 #. name for hit
 msgid "Hittite"
-msgstr ""
+msgstr "Hittita"
 #. name for hiw
 msgid "Hiw"
-msgstr ""
+msgstr "Hiw"
 #. name for hix
 msgid "Hixkaryána"
-msgstr ""
+msgstr "Hishkaryana"
 #. name for hji
 msgid "Haji"
-msgstr ""
+msgstr "Aji"
 #. name for hka
 msgid "Kahe"
-msgstr ""
+msgstr "Kahe"
 #. name for hke
 msgid "Hunde"
-msgstr ""
+msgstr "Hunde"
 #. name for hkk
 msgid "Hunjara-Kaina Ke"
-msgstr ""
+msgstr "Hunjara"
 #. name for hks
 msgid "Hong Kong Sign Language"
@ -9396,27 +9396,27 @@ msgstr "Llenguatge de signes de Hong Kong"
 #. name for hla
 msgid "Halia"
-msgstr ""
+msgstr "Halia"
 #. name for hlb
 msgid "Halbi"
-msgstr ""
+msgstr "Halbi"
 #. name for hld
 msgid "Halang Doan"
-msgstr ""
+msgstr "Halang Doan"
 #. name for hle
 msgid "Hlersu"
-msgstr ""
+msgstr "Sansu"
 #. name for hlt
 msgid "Nga La"
-msgstr ""
+msgstr "Nga La"
 #. name for hlu
 msgid "Luwian; Hieroglyphic"
-msgstr ""
+msgstr "Luvi; jeroglífic"
 #. name for hma
 msgid "Miao; Southern Mashan"
@ -9424,7 +9424,7 @@ msgstr "Miao; Mashan meridional"
 #. name for hmb
 msgid "Songhay; Humburi Senni"
-msgstr ""
+msgstr "Songhai; central"
 #. name for hmc
 msgid "Miao; Central Huishui"
@ -9440,11 +9440,11 @@ msgstr "Miao; Huishui oriental"
 #. name for hmf
 msgid "Hmong Don"
-msgstr ""
+msgstr "Miao; Don"
 #. name for hmg
 msgid "Hmong; Southwestern Guiyang"
-msgstr ""
+msgstr "Miao; Guiyang sudoccidental"
 #. name for hmh
 msgid "Miao; Southwestern Huishui"
@ -9456,11 +9456,11 @@ msgstr "Miao; Huishui septentrional"
 #. name for hmj
 msgid "Ge"
-msgstr ""
+msgstr "Ge"
 #. name for hmk
 msgid "Maek"
-msgstr ""
+msgstr "Maek"
 #. name for hml
 msgid "Miao; Luopohe"
@ -9472,11 +9472,11 @@ msgstr "Miao; Mashan central"
 #. name for hmn
 msgid "Hmong"
-msgstr ""
+msgstr "Hmong (macrollengua)"
 #. name for hmo
 msgid "Hiri Motu"
-msgstr ""
+msgstr "Hiri Motu"
 #. name for hmp
 msgid "Miao; Northern Mashan"
@ -9488,7 +9488,7 @@ msgstr "Miao; Qiandong oriental"
 #. name for hmr
 msgid "Hmar"
-msgstr ""
+msgstr "Hmar"
 #. name for hms
 msgid "Miao; Southern Qiandong"
@ -9496,15 +9496,15 @@ msgstr "Miao; Qiandong meridional"
 #. name for hmt
 msgid "Hamtai"
-msgstr ""
+msgstr "Hamtai"
 #. name for hmu
 msgid "Hamap"
-msgstr ""
+msgstr "Hamap"
 #. name for hmv
 msgid "Hmong Dô"
-msgstr ""
+msgstr "Miao; Do"
 #. name for hmw
 msgid "Miao; Western Mashan"
@ -9520,19 +9520,19 @@ msgstr "Miao; Shua"
 #. name for hna
 msgid "Mina (Cameroon)"
-msgstr ""
+msgstr "Mina (Camerun)"
 #. name for hnd
 msgid "Hindko; Southern"
-msgstr ""
+msgstr "Hindko; meridional"
 #. name for hne
 msgid "Chhattisgarhi"
-msgstr ""
+msgstr "Chattisgarbi"
 #. name for hnh
 msgid "//Ani"
-msgstr ""
+msgstr "Ani"
 #. name for hni
 msgid "Hani"
@ -9540,7 +9540,7 @@ msgstr ""
 #. name for hnj
 msgid "Hmong Njua"
-msgstr ""
+msgstr "Miao; Hmong Njua"
 #. name for hnn
 msgid "Hanunoo"
@ -9548,7 +9548,7 @@ msgstr ""
 #. name for hno
 msgid "Hindko; Northern"
-msgstr ""
+msgstr "Hindko; septentrional"
 #. name for hns
 msgid "Hindustani; Caribbean"
@ -11800,7 +11800,7 @@ msgstr ""
 #. name for khq
 msgid "Songhay; Koyra Chiini"
-msgstr ""
+msgstr "Songhai; Koyra"
 #. name for khr
 msgid "Kharia"
@ -17288,7 +17288,7 @@ msgstr ""
 #. name for mww
 msgid "Hmong Daw"
-msgstr ""
+msgstr "Miao; blanc"
 #. name for mwx
 msgid "Mediak"
@ -28680,7 +28680,7 @@ msgstr ""
 #. name for xlu
 msgid "Luwian; Cuneiform"
-msgstr ""
+msgstr "Luvi; cuneïforme"
 #. name for xly
 msgid "Elymian"
--- a/setup/iso_639/uk.po
+++ b/setup/iso_639/uk.po
@ -12,14 +12,14 @@ msgstr ""
 "Report-Msgid-Bugs-To: Debian iso-codes team <pkg-isocodes-"
 "devel@lists.alioth.debian.org>\n"
 "POT-Creation-Date: 2011-11-25 14:01+0000\n"
-"PO-Revision-Date: 2011-09-27 15:33+0000\n"
+"PO-Revision-Date: 2011-12-03 15:11+0000\n"
-"Last-Translator: Kovid Goyal <Unknown>\n"
+"Last-Translator: Yuri Chornoivan <yurchor@gmail.com>\n"
 "Language-Team: Ukrainian <translation-team-uk@lists.sourceforge.net>\n"
 "MIME-Version: 1.0\n"
 "Content-Type: text/plain; charset=UTF-8\n"
 "Content-Transfer-Encoding: 8bit\n"
-"X-Launchpad-Export-Date: 2011-11-26 05:43+0000\n"
+"X-Launchpad-Export-Date: 2011-12-04 04:43+0000\n"
-"X-Generator: Launchpad (build 14381)\n"
+"X-Generator: Launchpad (build 14418)\n"
 "Language: uk\n"
 #. name for aaa
@ -17956,7 +17956,7 @@ msgstr "ндоола"
 #. name for nds
 msgid "German; Low"
-msgstr ""
+msgstr "нижньонімецька"
 #. name for ndt
 msgid "Ndunga"
--- a/setup/resources.py
+++ b/setup/resources.py
@ -6,7 +6,7 @@ __license__   = 'GPL v3'
 __copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
 __docformat__ = 'restructuredtext en'
-import os, cPickle, re, shutil, marshal, zipfile, glob
+import os, cPickle, re, shutil, marshal, zipfile, glob, subprocess, time
 from zlib import compress
 from setup import Command, basenames, __appname__
@ -23,7 +23,70 @@ def get_opts_from_parser(parser):
        for o in g.option_list:
            for x in do_opt(o): yield x
-class Kakasi(Command):
+class Coffee(Command): # {{{
    description = 'Compile coffeescript files into javascript'
    COFFEE_DIRS = {'ebooks/oeb/display': 'display'}
    def add_options(self, parser):
        parser.add_option('--watch', '-w', action='store_true', default=False,
                help='Autocompile when .coffee files are changed')
        parser.add_option('--show-js', action='store_true', default=False,
                help='Display the generated javascript')
    def run(self, opts):
        self.do_coffee_compile(opts)
        if opts.watch:
            try:
                while True:
                    time.sleep(0.5)
                    self.do_coffee_compile(opts, timestamp=True,
                            ignore_errors=True)
            except KeyboardInterrupt:
                pass
    def show_js(self, jsfile):
        from pygments.lexers import JavascriptLexer
        from pygments.formatters import TerminalFormatter
        from pygments import highlight
        with open(jsfile, 'rb') as f:
            raw = f.read()
        print highlight(raw, JavascriptLexer(), TerminalFormatter())
    def do_coffee_compile(self, opts, timestamp=False, ignore_errors=False):
        for toplevel, dest in self.COFFEE_DIRS.iteritems():
            dest = self.j(self.RESOURCES, dest)
            for x in glob.glob(self.j(self.SRC, __appname__, toplevel, '*.coffee')):
                js = self.j(dest, os.path.basename(x.rpartition('.')[0]+'.js'))
                if self.newer(js, x):
                    print ('\t%sCompiling %s'%(time.strftime('[%H:%M:%S] ') if
                        timestamp else '', os.path.basename(x)))
                    try:
                        subprocess.check_call(['coffee', '-c', '-o', dest, x])
                    except:
                        print ('\n\tCompilation of %s failed'%os.path.basename(x))
                        if ignore_errors:
                            with open(js, 'wb') as f:
                                f.write('// Compilation from coffeescript failed')
                        else:
                            raise SystemExit(1)
                    else:
                        if opts.show_js:
                            self.show_js(js)
                            print ('#'*80)
                            print ('#'*80)
    def clean(self):
        for toplevel, dest in self.COFFEE_DIRS.iteritems():
            dest = self.j(self.RESOURCES, dest)
            for x in glob.glob(self.j(self.SRC, __appname__, toplevel, '*.coffee')):
                x = x.rpartition('.')[0] + '.js'
                x = self.j(dest, os.path.basename(x))
                if os.path.exists(x):
                    os.remove(x)
 # }}}
 class Kakasi(Command): # {{{
    description = 'Compile resources for unihandecode'
@ -62,9 +125,6 @@ class Kakasi(Command):
            self.info('\tGenerating kanadict')
            self.mkkanadict(src, dest)
        return
    def mkitaiji(self, src, dst):
        dic = {}
        for line in open(src, "r"):
@ -125,11 +185,12 @@ class Kakasi(Command):
        kakasi = self.j(self.RESOURCES, 'localization', 'pykakasi')
        if os.path.exists(kakasi):
            shutil.rmtree(kakasi)
 # }}}
-class Resources(Command):
+class Resources(Command): # {{{
    description = 'Compile various needed calibre resources'
-    sub_commands = ['kakasi']
+    sub_commands = ['kakasi', 'coffee']
    def run(self, opts):
        scripts = {}
@ -223,13 +284,13 @@ class Resources(Command):
            x = self.j(self.RESOURCES, x+'.pickle')
            if os.path.exists(x):
                os.remove(x)
-        from setup.commands import kakasi
+        from setup.commands import kakasi, coffee
        kakasi.clean()
        coffee.clean()
        for x in ('builtin_recipes.xml', 'builtin_recipes.zip',
                'template-functions.json'):
            x = self.j(self.RESOURCES, x)
            if os.path.exists(x):
                os.remove(x)
-
+# }}}
--- a/setup/translations.py
+++ b/setup/translations.py
@ -215,32 +215,34 @@ class GetTranslations(Translations): # {{{
    description = 'Get updated translations from Launchpad'
    BRANCH = 'lp:~kovid/calibre/translations'
-    @classmethod
-    def modified_translations(cls):
-        raw = subprocess.Popen(['bzr', 'status'],
+    @property
+    def modified_translations(self):
+        raw = subprocess.Popen(['bzr', 'status', '-S', self.PATH],
                stdout=subprocess.PIPE).stdout.read().strip()
+        ans = []
        for line in raw.splitlines():
            line = line.strip()
-            if line.startswith(cls.PATH) and line.endswith('.po'):
-                yield line
+            if line.startswith('M') and line.endswith('.po'):
+                ans.append(line.split()[-1])
+        return ans
    def run(self, opts):
-        if len(list(self.modified_translations())) == 0:
+        if not self.modified_translations:
            subprocess.check_call(['bzr', 'merge', self.BRANCH])
-        if len(list(self.modified_translations())) == 0:
-            print 'No updated translations available'
-        else:
-            subprocess.check_call(['bzr', 'commit', '-m',
-                'IGN:Updated translations', self.PATH])
        self.check_for_errors()
-    @classmethod
-    def check_for_errors(cls):
+        if self.modified_translations:
+            subprocess.check_call(['bzr', 'commit', '-m',
+                'IGN:Updated translations', self.PATH])
+        else:
+            print('No updated translations available')
+    def check_for_errors(self):
        errors = os.path.join(tempfile.gettempdir(), 'calibre-translation-errors')
        if os.path.exists(errors):
            shutil.rmtree(errors)
        os.mkdir(errors)
-        pofilter = ('pofilter', '-i', cls.PATH, '-o', errors,
+        pofilter = ('pofilter', '-i', self.PATH, '-o', errors,
                '-t', 'accelerators', '-t', 'escapes', '-t', 'variables',
                #'-t', 'xmltags',
                #'-t', 'brackets',
@ -253,23 +255,20 @@ class GetTranslations(Translations): # {{{
                '-t', 'printf')
        subprocess.check_call(pofilter)
        errfiles = glob.glob(errors+os.sep+'*.po')
-        subprocess.check_call(['gvim', '-f', '-p', '--']+errfiles)
-        for f in errfiles:
-            with open(f, 'r+b') as f:
-                raw = f.read()
-                raw = re.sub(r'# \(pofilter\).*', '', raw)
-                f.seek(0)
-                f.truncate()
-                f.write(raw)
-        subprocess.check_call(['pomerge', '-t', cls.PATH, '-i', errors, '-o',
-            cls.PATH])
-        if len(list(cls.modified_translations())) > 0:
-            subprocess.call(['bzr', 'diff', cls.PATH])
-            yes = raw_input('Merge corrections? [y/n]: ').strip()
-            if yes in ['', 'y']:
-                subprocess.check_call(['bzr', 'commit', '-m',
-                    'IGN:Translation corrections', cls.PATH])
+        if errfiles:
+            subprocess.check_call(['gvim', '-f', '-p', '--']+errfiles)
+            for f in errfiles:
+                with open(f, 'r+b') as f:
+                    raw = f.read()
+                    raw = re.sub(r'# \(pofilter\).*', '', raw)
+                    f.seek(0)
+                    f.truncate()
+                    f.write(raw)
+            subprocess.check_call(['pomerge', '-t', self.PATH, '-i', errors, '-o',
+                self.PATH])
+            return True
+        return False
 # }}}
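The reworked modified_translations property keeps only lines of the short status output that start with M and end in .po, and returns the last whitespace-separated token as the path. A small self-contained sketch of that filtering, written for Python 3, follows. The function name modified_po_files and the sample status text are made up for illustration and are not real bzr output captured from this branch.

```python
def modified_po_files(status_output):
    """Return the paths of modified .po files from short-status text.

    Mirrors the filtering in the diff above: strip each line, keep those
    starting with 'M' and ending in '.po', take the last token as the path.
    """
    modified = []
    for line in status_output.splitlines():
        line = line.strip()
        if line.startswith('M') and line.endswith('.po'):
            modified.append(line.split()[-1])
    return modified


# Hypothetical sample input, assumed to resemble short status lines.
sample = """\
 M  setup/iso_639/ca.po
 M  setup/iso_639/uk.po
?   setup/iso_639/notes.txt
 M  setup/translations.py
"""

print(modified_po_files(sample))
```

Because the path is taken as the last whitespace-separated token, a path containing spaces would be truncated by this approach.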
--- a/src/calibre/init.py
+++ b/src/calibre/init.py
@ -558,11 +558,11 @@ xml_entity_to_unicode = partial(entity_to_unicode, result_exceptions = {
    '>' : '&gt;',
    '&' : '&amp;'})
-def replace_entities(raw):
+def replace_entities(raw, encoding='cp1252'):
-    return _ent_pat.sub(entity_to_unicode, raw)
+    return _ent_pat.sub(partial(entity_to_unicode, encoding=encoding), raw)
-def xml_replace_entities(raw):
+def xml_replace_entities(raw, encoding='cp1252'):
-    return _ent_pat.sub(xml_entity_to_unicode, raw)
+    return _ent_pat.sub(partial(xml_entity_to_unicode, encoding=encoding), raw)
 def prepare_string_for_xml(raw, attribute=False):
    raw = _ent_pat.sub(entity_to_unicode, raw)
--- a/src/calibre/constants.py
+++ b/src/calibre/constants.py
@ -4,7 +4,7 @@ __license__   = 'GPL v3'
 __copyright__ = '2008, Kovid Goyal kovid@kovidgoyal.net'
 __docformat__ = 'restructuredtext en'
 __appname__   = u'calibre'
-numeric_version = (0, 8, 29)
+numeric_version = (0, 8, 31)
 __version__   = u'.'.join(map(unicode, numeric_version))
 __author__    = u"Kovid Goyal <kovid@kovidgoyal.net>"
--- a/src/calibre/customize/init.py
+++ b/src/calibre/customize/init.py
@ -451,6 +451,10 @@ class CatalogPlugin(Plugin): # {{{
                           'series_index','series','size','tags','timestamp',
                           'title_sort','title','uuid','languages'])
        all_custom_fields = set(db.custom_field_keys())
        for field in list(all_custom_fields):
            fm = db.field_metadata[field]
            if fm['datatype'] == 'series':
                all_custom_fields.add(field+'_index')
        all_fields = all_std_fields.union(all_custom_fields)
        if opts.fields != 'all':
--- a/src/calibre/devices/android/driver.py
+++ b/src/calibre/devices/android/driver.py
@ -143,6 +143,9 @@ class ANDROID(USBMS):
            # Kobo
            0x2237: { 0x2208 : [0x0226] },
            # Lenovo
            0x17ef : { 0x7421 : [0x0216] },
            }
    EBOOK_DIR_MAIN = ['eBooks/import', 'wordplayer/calibretransfer', 'Books',
            'sdcard/ebooks']
@ -155,7 +158,7 @@ class ANDROID(USBMS):
            'GT-I5700', 'SAMSUNG', 'DELL', 'LINUX', 'GOOGLE', 'ARCHOS',
            'TELECHIP', 'HUAWEI', 'T-MOBILE', 'SEMC', 'LGE', 'NVIDIA',
            'GENERIC-', 'ZTE', 'MID', 'QUALCOMM', 'PANDIGIT', 'HYSTON',
-            'VIZIO', 'GOOGLE', 'FREESCAL', 'KOBO_INC']
+            'VIZIO', 'GOOGLE', 'FREESCAL', 'KOBO_INC', 'LENOVO']
    WINDOWS_MAIN_MEM = ['ANDROID_PHONE', 'A855', 'A853', 'INC.NEXUS_ONE',
            '__UMS_COMPOSITE', '_MB200', 'MASS_STORAGE', '_-_CARD', 'SGH-I897',
            'GT-I9000', 'FILE-STOR_GADGET', 'SGH-T959', 'SAMSUNG_ANDROID',
@ -167,12 +170,13 @@ class ANDROID(USBMS):
            'MB525', 'ANDROID2.3', 'SGH-I997', 'GT-I5800_CARD', 'MB612',
            'GT-S5830_CARD', 'GT-S5570_CARD', 'MB870', 'MID7015A',
            'ALPANDIGITAL', 'ANDROID_MID', 'VTAB1008', 'EMX51_BBG_ANDROI',
-            'UMS', '.K080', 'P990', 'LTE', 'MB853', 'GT-S5660_CARD']
+            'UMS', '.K080', 'P990', 'LTE', 'MB853', 'GT-S5660_CARD', 'A107']
    WINDOWS_CARD_A_MEM = ['ANDROID_PHONE', 'GT-I9000_CARD', 'SGH-I897',
            'FILE-STOR_GADGET', 'SGH-T959', 'SAMSUNG_ANDROID', 'GT-P1000_CARD',
            'A70S', 'A101IT', '7', 'INCREDIBLE', 'A7EB', 'SGH-T849_CARD',
            '__UMS_COMPOSITE', 'SGH-I997_CARD', 'MB870', 'ALPANDIGITAL',
-            'ANDROID_MID', 'P990_SD_CARD', '.K080', 'LTE_CARD', 'MB853']
+            'ANDROID_MID', 'P990_SD_CARD', '.K080', 'LTE_CARD', 'MB853',
            'A1-07___C0541A4F']
    OSX_MAIN_MEM = 'Android Device Main Memory'
--- a/src/calibre/devices/eb600/driver.py
+++ b/src/calibre/devices/eb600/driver.py
@ -173,8 +173,9 @@ class INVESBOOK(EB600):
    FORMATS = ['epub', 'mobi', 'prc', 'fb2', 'html', 'pdf', 'rtf', 'txt']
    BCD         = [0x110, 0x323]
-    VENDOR_NAME = ['INVES_E6', 'INVES-WI']
+    VENDOR_NAME = ['INVES_E6', 'INVES-WI', 'POCKETBO']
-    WINDOWS_MAIN_MEM = WINDOWS_CARD_A_MEM = ['00INVES_E600', 'INVES-WIBOOK']
+    WINDOWS_MAIN_MEM = WINDOWS_CARD_A_MEM = ['00INVES_E600', 'INVES-WIBOOK',
            'OK_POCKET_611_61']
 class BOOQ(EB600):
    name = 'Booq Device Interface'
--- a/src/calibre/ebooks/init.py
+++ b/src/calibre/ebooks/init.py
@ -30,7 +30,7 @@ BOOK_EXTENSIONS = ['lrf', 'rar', 'zip', 'rtf', 'lit', 'txt', 'txtz', 'text', 'ht
                   'html', 'htmlz', 'xhtml', 'pdf', 'pdb', 'pdr', 'prc', 'mobi', 'azw', 'doc',
                   'epub', 'fb2', 'djv', 'djvu', 'lrx', 'cbr', 'cbz', 'cbc', 'oebzip',
                   'rb', 'imp', 'odt', 'chm', 'tpz', 'azw1', 'pml', 'pmlz', 'mbp', 'tan', 'snb',
-                   'xps', 'oxps', 'azw4', 'book', 'zbf', 'pobi']
+                   'xps', 'oxps', 'azw4', 'book', 'zbf', 'pobi', 'docx']
 class HTMLRenderer(object):
--- a/src/calibre/ebooks/comic/input.py
+++ b/src/calibre/ebooks/comic/input.py
@ -17,6 +17,10 @@ from calibre.ptempfile import PersistentTemporaryDirectory
 from calibre.utils.ipc.server import Server
 from calibre.utils.ipc.job import ParallelJob
 # If the specified screen has either dimension larger than this value, no image
 # rescaling is done (we assume that it is a tablet output profile)
 MAX_SCREEN_SIZE = 3000
 def extract_comic(path_to_comic_file):
    '''
    Un-archive the comic file.
@ -141,7 +145,7 @@ class PageProcessor(list): # {{{
                    newsizey = int(newsizex / aspect)
                    deltax = 0
                    deltay = (SCRHEIGHT - newsizey) / 2
-                if newsizex < 20000 and newsizey < 20000:
+                if newsizex < MAX_SCREEN_SIZE and newsizey < MAX_SCREEN_SIZE:
                    # Too large and resizing fails, so better
                    # to leave it as original size
                    wand.size = (newsizex, newsizey)
@ -165,14 +169,14 @@ class PageProcessor(list): # {{{
                    newsizey = int(newsizex / aspect)
                    deltax = 0
                    deltay = (wscreeny - newsizey) / 2
-                if newsizex < 20000 and newsizey < 20000:
+                if newsizex < MAX_SCREEN_SIZE and newsizey < MAX_SCREEN_SIZE:
                    # Too large and resizing fails, so better
                    # to leave it as original size
                    wand.size = (newsizex, newsizey)
                    wand.set_border_color(pw)
                    wand.add_border(pw, deltax, deltay)
            else:
-                if SCRWIDTH < 20000 and SCRHEIGHT < 20000:
+                if SCRWIDTH < MAX_SCREEN_SIZE and SCRHEIGHT < MAX_SCREEN_SIZE:
                    wand.size = (SCRWIDTH, SCRHEIGHT)
            if not self.opts.dont_sharpen:
--- a/src/calibre/ebooks/epub/output.py
+++ b/src/calibre/ebooks/epub/output.py
@ -229,7 +229,10 @@ class EPUBOutput(OutputFormatPlugin):
            if opts.extract_to is not None:
                from calibre.utils.zipfile import ZipFile
                if os.path.exists(opts.extract_to):
-                    shutil.rmtree(opts.extract_to)
+                    if os.path.isdir(opts.extract_to):
                        shutil.rmtree(opts.extract_to)
                    else:
                        os.remove(opts.extract_to)
                os.mkdir(opts.extract_to)
                with ZipFile(output_path) as zf:
                    zf.extractall(path=opts.extract_to)
--- a/src/calibre/ebooks/html/input.py
+++ b/src/calibre/ebooks/html/input.py
@ -148,7 +148,11 @@ class HTMLFile(object):
                url = match.group(i)
                if url:
                    break
-            link = self.resolve(url)
+            try:
                link = self.resolve(url)
            except ValueError:
                # Unparseable URL, ignore
                continue
            if link not in self.links:
                self.links.append(link)
--- a/src/calibre/ebooks/metadata/sources/amazon.py
+++ b/src/calibre/ebooks/metadata/sources/amazon.py
@ -16,7 +16,8 @@ from lxml.html import tostring
 from calibre import as_unicode
 from calibre.ebooks.metadata import check_isbn
-from calibre.ebooks.metadata.sources.base import Source, Option
+from calibre.ebooks.metadata.sources.base import (Source, Option, fixcase,
        fixauthors)
 from calibre.utils.cleantext import clean_ascii_chars
 from calibre.ebooks.chardet import xml_to_unicode
 from calibre.ebooks.metadata.book.base import Metadata
@ -509,6 +510,15 @@ class Amazon(Source):
        return domain
    def clean_downloaded_metadata(self, mi):
        if mi.title and self.domain in ('com', 'uk'):
            mi.title = fixcase(mi.title)
        mi.authors = fixauthors(mi.authors)
        if self.domain in ('com', 'uk'):
            mi.tags = list(map(fixcase, mi.tags))
        mi.isbn = check_isbn(mi.isbn)
    def create_query(self, log, title=None, authors=None, identifiers={}, # {{{
            domain=None):
        if domain is None:
--- a/src/calibre/ebooks/metadata/toc.py
+++ b/src/calibre/ebooks/metadata/toc.py
@ -31,7 +31,7 @@ class TOC(list):
    def __init__(self, href=None, fragment=None, text=None, parent=None, play_order=0,
                 base_path=os.getcwd(), type='unknown', author=None,
-                 description=None):
+                 description=None, toc_thumbnail=None):
        self.href = href
        self.fragment = fragment
        if not self.fragment:
@ -43,6 +43,7 @@ class TOC(list):
        self.type = type
        self.author = author
        self.description = description
        self.toc_thumbnail = toc_thumbnail
    def __str__(self):
        lines = ['TOC: %s#%s'%(self.href, self.fragment)]
@ -72,12 +73,12 @@ class TOC(list):
        entry.parent = None
    def add_item(self, href, fragment, text, play_order=None, type='unknown',
-            author=None, description=None):
+            author=None, description=None, toc_thumbnail=None):
        if play_order is None:
            play_order = (self[-1].play_order if len(self) else self.play_order) + 1
        self.append(TOC(href=href, fragment=fragment, text=text, parent=self,
                        base_path=self.base_path, play_order=play_order,
-                        type=type, author=author, description=description))
+                        type=type, author=author, description=description, toc_thumbnail=toc_thumbnail))
        return self[-1]
    def top_level_items(self):
@ -269,6 +270,9 @@ class TOC(list):
            if desc:
                desc = re.sub(r'\s+', ' ', desc)
                elem.append(C.meta(desc, name='description'))
            idx = getattr(np, 'toc_thumbnail', None)
            if idx:
                elem.append(C.meta(idx, name='toc_thumbnail'))   
            parent.append(elem)
            for np2 in np:
                navpoint(elem, np2)
--- a/src/calibre/ebooks/mobi/debug.py
+++ b/src/calibre/ebooks/mobi/debug.py
@ -656,11 +656,11 @@ class Tag(object): # {{{
                        ' image record associated with this article',
                        'image_index'),
                    70 : ('Description offset in cncx', 'desc_offset'),
-                    71 : ('Image attribution offset in cncx',
-                        'image_attr_offset'),
+                    71 : ('Author offset in cncx', 'author_offset'),
                    72 : ('Image caption offset in cncx',
                        'image_caption_offset'),
-                    73 : ('Author offset in cncx', 'author_offset'),
+                    73 : ('Image attribution offset in cncx',
+                        'image_attr_offset'),
            },
            'chapter_with_subchapters' : {
--- a/src/calibre/ebooks/mobi/reader.py
+++ b/src/calibre/ebooks/mobi/reader.py
@ -973,7 +973,8 @@ class MobiReader(object):
                continue
            processed_records.append(i)
            data  = self.sections[i][0]
-            if data[:4] in (b'FLIS', b'FCIS', b'SRCS', b'\xe9\x8e\r\n'):
+            if data[:4] in {b'FLIS', b'FCIS', b'SRCS', b'\xe9\x8e\r\n',
                    b'RESC', b'BOUN', b'FDST', b'DATP'}:
                # A FLIS, FCIS, SRCS, RESC, BOUN, FDST, DATP or EOF record, ignore
                continue
            buf = cStringIO.StringIO(data)
--- a/src/calibre/ebooks/mobi/writer2/indexer.py
+++ b/src/calibre/ebooks/mobi/writer2/indexer.py
@ -136,7 +136,8 @@ class IndexEntry(object):
            'last_child_index': 23,
            'image_index': 69,
            'desc_offset': 70,
-            'author_offset': 73,
+            'author_offset': 71,
    }
    RTAG_MAP = {v:k for k, v in TAG_VALUES.iteritems()}
@ -754,6 +755,13 @@ class Indexer(object): # {{{
                normalized_articles.append(article)
                article.author_offset = self.cncx[art.author]
                article.desc_offset = self.cncx[art.description]
                if getattr(art, 'toc_thumbnail', None) is not None:
                    try:
                        ii = self.serializer.images[art.toc_thumbnail] - 1
                        if ii > -1:
                            article.image_index = ii
                    except KeyError:
                        pass # Image not found in serializer
            if normalized_articles:
                normalized_articles.sort(key=lambda x:x.offset)
--- a/src/calibre/ebooks/mobi/writer2/main.py
+++ b/src/calibre/ebooks/mobi/writer2/main.py
@ -161,7 +161,7 @@ class MobiWriter(object):
        index = 1
        mh_href = None
-        if 'masthead' in oeb.guide:
+        if 'masthead' in oeb.guide and oeb.guide['masthead'].href:
            mh_href = oeb.guide['masthead'].href
            self.image_records.append(None)
            index += 1
--- a/src/calibre/ebooks/mobi/writer2/serializer.py
+++ b/src/calibre/ebooks/mobi/writer2/serializer.py
@ -178,7 +178,11 @@ class Serializer(object):
        at the end.
        '''
        hrefs = self.oeb.manifest.hrefs
-        path, frag = urldefrag(urlnormalize(href))
+        try:
            path, frag = urldefrag(urlnormalize(href))
        except ValueError:
            # Unparseable URL
            return False
        if path and base:
            path = base.abshref(path)
        if path and path not in hrefs:
--- a/src/calibre/ebooks/oeb/base.py
+++ b/src/calibre/ebooks/oeb/base.py
@ -16,15 +16,13 @@ from urllib import unquote as urlunquote
 from lxml import etree, html
 from calibre.constants import filesystem_encoding, __version__
 from calibre.translations.dynamic import translate
-from calibre.ebooks.chardet import xml_to_unicode, strip_encoding_declarations
+from calibre.ebooks.chardet import xml_to_unicode
 from calibre.ebooks.oeb.entitydefs import ENTITYDEFS
 from calibre.ebooks.conversion.preprocess import CSSPreProcessor
-from calibre import isbytestring, as_unicode, get_types_map
+from calibre import (isbytestring, as_unicode, get_types_map)
-
+from calibre.ebooks.oeb.parse_utils import (barename, XHTML_NS, RECOVER_PARSER,
-RECOVER_PARSER = etree.XMLParser(recover=True, no_network=True)
+        namespace, XHTML, parse_html, NotHTML)
 XML_NS       = 'http://www.w3.org/XML/1998/namespace'
 XHTML_NS     = 'http://www.w3.org/1999/xhtml'
 OEB_DOC_NS   = 'http://openebook.org/namespaces/oeb-document/1.0/'
 OPF1_NS      = 'http://openebook.org/namespaces/oeb-package/1.0/'
 OPF2_NS      = 'http://www.idpf.org/2007/opf'
@ -55,9 +53,6 @@ OPF2_NSMAP   = {'opf': OPF2_NS, 'dc': DC11_NS, 'dcterms': DCTERMS_NS,
 def XML(name):
    return '{%s}%s' % (XML_NS, name)
 def XHTML(name):
    return '{%s}%s' % (XHTML_NS, name)
 def OPF(name):
    return '{%s}%s' % (OPF2_NS, name)
@ -279,22 +274,11 @@ PREFIXNAME_RE = re.compile(r'^[^:]+[:][^:]+')
 XMLDECL_RE    = re.compile(r'^\s*<[?]xml.*?[?]>')
 CSSURL_RE     = re.compile(r'''url[(](?P<q>["']?)(?P<url>[^)]+)(?P=q)[)]''')
 def element(parent, *args, **kwargs):
    if parent is not None:
        return etree.SubElement(parent, *args, **kwargs)
    return etree.Element(*args, **kwargs)
 def namespace(name):
    if '}' in name:
        return name.split('}', 1)[0][1:]
    return ''
 def barename(name):
    if '}' in name:
        return name.split('}', 1)[1]
    return name
 def prefixname(name, nsrmap):
    if not isqname(name):
        return name
@ -373,25 +357,6 @@ def urlnormalize(href):
    parts = (urlquote(part) for part in parts)
    return urlunparse(parts)
 def merge_multiple_html_heads_and_bodies(root, log=None):
    heads, bodies = xpath(root, '//h:head'), xpath(root, '//h:body')
    if not (len(heads) > 1 or len(bodies) > 1): return root
    for child in root: root.remove(child)
    head = root.makeelement(XHTML('head'))
    body = root.makeelement(XHTML('body'))
    for h in heads:
        for x in h:
            head.append(x)
    for b in bodies:
        for x in b:
            body.append(x)
    map(root.append, (head, body))
    if log is not None:
        log.warn('Merging multiple <head> and <body> sections')
    return root
 class DummyHandler(logging.Handler):
@ -418,10 +383,6 @@ class OEBError(Exception):
    """Generic OEB-processing error."""
    pass
 class NotHTML(OEBError):
    '''Raised when a file that should be HTML (as per manifest) is not'''
    pass
 class NullContainer(object):
    """An empty container.
@ -801,7 +762,6 @@ class Manifest(object):
        """
        NUM_RE = re.compile('^(.*)([0-9][0-9.]*)(?=[.]|$)')
        META_XP = XPath('/h:html/h:head/h:meta[@http-equiv="Content-Type"]')
        def __init__(self, oeb, id, href, media_type,
                     fallback=None, loader=str, data=None):
@ -830,244 +790,17 @@ class Manifest(object):
                return None
            return etree.fromstring(data, parser=RECOVER_PARSER)
        def clean_word_doc(self, data):
            prefixes = []
            for match in re.finditer(r'xmlns:(\S+?)=".*?microsoft.*?"', data):
                prefixes.append(match.group(1))
            if prefixes:
                self.oeb.log.warn('Found microsoft markup, cleaning...')
                # Remove empty tags as they are not rendered by browsers
                # but can become renderable HTML tags like <p/> if the
                # document is parsed by an HTML parser
                pat = re.compile(
                        r'<(%s):([a-zA-Z0-9]+)[^>/]*?></\1:\2>'%('|'.join(prefixes)),
                        re.DOTALL)
                data = pat.sub('', data)
                pat = re.compile(
                        r'<(%s):([a-zA-Z0-9]+)[^>/]*?/>'%('|'.join(prefixes)))
                data = pat.sub('', data)
            return data
        def _parse_xhtml(self, data):
            orig_data = data
-            self.oeb.log.debug('Parsing', self.href, '...')
-            # Convert to Unicode and normalize line endings
+            fname = urlunquote(self.href)
+            self.oeb.log.debug('Parsing', fname, '...')
            data = self.oeb.decode(data)
            data = strip_encoding_declarations(data)
            data = self.oeb.html_preprocessor(data)
            # There could be null bytes in data if it had &#0; entities in it
            data = data.replace('\0', '')
            # Remove DOCTYPE declaration as it messes up parsing
            # In particular, it causes tostring to insert xmlns
            # declarations, which messes up the coercing logic
            idx = data.find('<html')
            if idx == -1:
                idx = data.find('<HTML')
            if idx > -1:
                pre = data[:idx]
                data = data[idx:]
                if '<!DOCTYPE' in pre:
                    user_entities = {}
                    for match in re.finditer(r'<!ENTITY\s+(\S+)\s+([^>]+)', pre):
                        val = match.group(2)
                        if val.startswith('"') and val.endswith('"'):
                            val = val[1:-1]
                        user_entities[match.group(1)] = val
                    if user_entities:
                        pat = re.compile(r'&(%s);'%('|'.join(user_entities.keys())))
                        data = pat.sub(lambda m:user_entities[m.group(1)], data)
            # Setting huge_tree=True causes crashes in windows with large files
            parser = etree.XMLParser(no_network=True)
            # Try with more & more drastic measures to parse
            def first_pass(data):
                try:
                    data = etree.fromstring(data, parser=parser)
                except etree.XMLSyntaxError as err:
                    self.oeb.log.debug('Initial parse failed, using more'
                            ' forgiving parsers')
                    repl = lambda m: ENTITYDEFS.get(m.group(1), m.group(0))
                    data = ENTITY_RE.sub(repl, data)
                    try:
                        data = etree.fromstring(data, parser=parser)
                    except etree.XMLSyntaxError as err:
                        self.oeb.logger.warn('Parsing file %r as HTML' % self.href)
                        if err.args and err.args[0].startswith('Excessive depth'):
                            from calibre.utils.soupparser import fromstring
                            data = fromstring(data)
                        else:
                            data = html.fromstring(data)
                        data.attrib.pop('xmlns', None)
                        for elem in data.iter(tag=etree.Comment):
                            if elem.text:
                                elem.text = elem.text.strip('-')
                        data = etree.tostring(data, encoding=unicode)
                        try:
                            data = etree.fromstring(data, parser=parser)
                        except etree.XMLSyntaxError:
                            data = etree.fromstring(data, parser=RECOVER_PARSER)
                return data
            try:
-                data = self.clean_word_doc(data)
-            except:
-                pass
-            data = first_pass(data)
-
-            if data.tag == 'HTML':
+                data = parse_html(data, log=self.oeb.log,
+                        decoder=self.oeb.decode,
+                        preprocessor=self.oeb.html_preprocessor,
+                        filename=fname, non_html_file_tags={'ncx'})
+            except NotHTML:
+                return self._parse_xml(orig_data)
                # Lower case all tag and attribute names
                data.tag = data.tag.lower()
                for x in data.iterdescendants():
                    try:
                        x.tag = x.tag.lower()
                        for key, val in list(x.attrib.iteritems()):
                            del x.attrib[key]
                            key = key.lower()
                            x.attrib[key] = val
                    except:
                        pass
            # Handle weird (non-HTML/fragment) files
            if barename(data.tag) != 'html':
                if barename(data.tag) == 'ncx':
                    return self._parse_xml(orig_data)
                self.oeb.log.warn('File %r does not appear to be (X)HTML'%self.href)
                nroot = etree.fromstring('<html></html>')
                has_body = False
                for child in list(data):
                    if isinstance(child.tag, (unicode, str)) and barename(child.tag) == 'body':
                        has_body = True
                        break
                parent = nroot
                if not has_body:
                    self.oeb.log.warn('File %r appears to be a HTML fragment'%self.href)
                    nroot = etree.fromstring('<html><body/></html>')
                    parent = nroot[0]
                for child in list(data.iter()):
                    oparent = child.getparent()
                    if oparent is not None:
                        oparent.remove(child)
                    parent.append(child)
                data = nroot
            # Force into the XHTML namespace
            if not namespace(data.tag):
                self.oeb.log.warn('Forcing', self.href, 'into XHTML namespace')
                data.attrib['xmlns'] = XHTML_NS
                data = etree.tostring(data, encoding=unicode)
                try:
                    data = etree.fromstring(data, parser=parser)
                except:
                    data = data.replace(':=', '=').replace(':>', '>')
                    data = data.replace('<http:/>', '')
                    try:
                        data = etree.fromstring(data, parser=parser)
                    except etree.XMLSyntaxError:
                        self.oeb.logger.warn('Stripping comments from %s'%
                                self.href)
                        data = re.compile(r'<!--.*?-->', re.DOTALL).sub('',
                                data)
                        data = data.replace(
                            "<?xml version='1.0' encoding='utf-8'?><o:p></o:p>",
                            '')
                        data = data.replace("<?xml version='1.0' encoding='utf-8'??>", '')
                        try:
                            data = etree.fromstring(data,
                                    parser=RECOVER_PARSER)
                        except etree.XMLSyntaxError:
                            self.oeb.logger.warn('Stripping meta tags from %s'%
                                self.href)
                            data = re.sub(r'<meta\s+[^>]+?>', '', data)
                            data = etree.fromstring(data, parser=RECOVER_PARSER)
            elif namespace(data.tag) != XHTML_NS:
                # OEB_DOC_NS, but possibly others
                ns = namespace(data.tag)
                attrib = dict(data.attrib)
                nroot = etree.Element(XHTML('html'),
                    nsmap={None: XHTML_NS}, attrib=attrib)
                for elem in data.iterdescendants():
                    if isinstance(elem.tag, basestring) and \
                       namespace(elem.tag) == ns:
                        elem.tag = XHTML(barename(elem.tag))
                for elem in data:
                    nroot.append(elem)
                data = nroot
            data = merge_multiple_html_heads_and_bodies(data, self.oeb.logger)
            # Ensure has a <head/>
            head = xpath(data, '/h:html/h:head')
            head = head[0] if head else None
            if head is None:
                self.oeb.logger.warn(
                    'File %r missing <head/> element' % self.href)
                head = etree.Element(XHTML('head'))
                data.insert(0, head)
                title = etree.SubElement(head, XHTML('title'))
                title.text = self.oeb.translate(__('Unknown'))
            elif not xpath(data, '/h:html/h:head/h:title'):
                self.oeb.logger.warn(
                    'File %r missing <title/> element' % self.href)
                title = etree.SubElement(head, XHTML('title'))
                title.text = self.oeb.translate(__('Unknown'))
            # Remove any encoding-specifying <meta/> elements
            for meta in self.META_XP(data):
                meta.getparent().remove(meta)
            etree.SubElement(head, XHTML('meta'),
                attrib={'http-equiv': 'Content-Type',
                        'content': '%s; charset=utf-8' % XHTML_NS})
            # Ensure has a <body/>
            if not xpath(data, '/h:html/h:body'):
                body = xpath(data, '//h:body')
                if body:
                    body = body[0]
                    body.getparent().remove(body)
                    data.append(body)
                else:
                    self.oeb.logger.warn(
                        'File %r missing <body/> element' % self.href)
                    etree.SubElement(data, XHTML('body'))
            # Remove microsoft office markup
            r = [x for x in data.iterdescendants(etree.Element) if 'microsoft-com' in x.tag]
            for x in r:
                x.tag = XHTML('span')
            # Remove lang redefinition inserted by the amazing Microsoft Word!
            body = xpath(data, '/h:html/h:body')[0]
            for key in list(body.attrib.keys()):
                if key == 'lang' or key.endswith('}lang'):
                    body.attrib.pop(key)
            def remove_elem(a):
                p = a.getparent()
                idx = p.index(a) -1
                p.remove(a)
                if a.tail:
                    if idx <= 0:
                        if p.text is None:
                            p.text = ''
                        p.text += a.tail
                    else:
                        if p[idx].tail is None:
                            p[idx].tail = ''
                        p[idx].tail += a.tail
            # Remove hyperlinks with no content as they cause rendering
            # artifacts in browser based renderers
            # Also remove empty <b>, <u> and <i> tags
            for a in xpath(data, '//h:a[@href]|//h:i|//h:b|//h:u'):
                if a.get('id', None) is None and a.get('name', None) is None \
                        and len(a) == 0 and not a.text:
                    remove_elem(a)
            # Convert <br>s with content into paragraphs as ADE can't handle
            # them
            for br in xpath(data, '//h:br'):
                if len(br) > 0 or br.text:
                    br.tag = XHTML('div')
            return data
        def _parse_txt(self, data):
@ -1629,9 +1362,10 @@ class TOC(object):
    :attr:`id`: Optional unique identifier for this node.
    :attr:`author`: Optional author attribution for periodicals <mbp:>
    :attr:`description`: Optional description attribute for periodicals <mbp:>
+    :attr:`toc_thumbnail`: Optional toc thumbnail image
    """
    def __init__(self, title=None, href=None, klass=None, id=None,
-            play_order=None, author=None, description=None):
+            play_order=None, author=None, description=None, toc_thumbnail=None):
        self.title = title
        self.href = urlnormalize(href) if href else href
        self.klass = klass
@ -1643,10 +1377,11 @@ class TOC(object):
        self.play_order = play_order
        self.author = author
        self.description = description
+        self.toc_thumbnail = toc_thumbnail
-    def add(self, title, href, klass=None, id=None, play_order=0, author=None, description=None):
+    def add(self, title, href, klass=None, id=None, play_order=0, author=None, description=None, toc_thumbnail=None):
        """Create and return a new sub-node of this node."""
-        node = TOC(title, href, klass, id, play_order, author, description)
+        node = TOC(title, href, klass, id, play_order, author, description, toc_thumbnail)
        self.nodes.append(node)
        return node
--- a/src/calibre/ebooks/oeb/display/cfi.coffee
+++ b/src/calibre/ebooks/oeb/display/cfi.coffee
@ -0,0 +1,225 @@
 #!/usr/bin/env coffee
 # vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
 ###
 Copyright 2011, Kovid Goyal <kovid@kovidgoyal.net>
 Released under the GPLv3 License
 ###
 #
 log = (error) ->
    if error
        if window?.console?.log
            window.console.log(error)
        else if process?.stdout?.write
            process.stdout.write(error + '\n')
 # CFI escaping {{{
 escape_for_cfi = (raw) ->
    if raw
        for c in ['^', '[', ']', ',', '(', ')', ';', '~', '@', '-', '!']
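            # String.replace with a string pattern only replaces the first
            # occurrence, so split/join is used to escape every occurrence
            raw = raw.split(c).join('^'+c)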
    raw
 unescape_from_cfi = (raw) ->
    ans = raw
    if raw
        dropped = false
        ans = []
        for c in raw
            if not dropped and c == '^'
                dropped = true
                continue
            dropped = false
            ans.push(c)
        ans = ans.join('')
    ans
 # }}}
 fstr = (d) -> # {{{
    # Convert a timestamp floating point number to a string
    ans = ""
    if ( d < 0 )
        ans = "-"
        d = -d
    n = Math.floor(d)
    ans += n
    n = Math.round((d-n)*100)
    if( n != 0 )
        ans += "."
        ans += if (n % 10 == 0) then (n/10) else n
    ans
 # }}}
 class CanonicalFragmentIdentifier
    # This class is a namespace to expose CFI functions via the window.cfi
    # object
    constructor: () ->
    encode: (doc, node, offset, tail) -> # {{{
        cfi = tail or ""
        # Handle the offset, if any
        switch node.nodeType
            when 1 # Element node
                if typeof(offset) == 'number'
                    node = node.childNodes.item(offset)
            when 3, 4, 5, 6 # Text/entity/CDATA node
                offset or= 0
                while true
                    p = node.previousSibling
                    if (p?.nodeType not in [3, 4, 5, 6])
                        break
                    offset += p.nodeValue.length
                    node = p
                cfi = ":" + offset + cfi
            else # Not handled
                log("Offsets for nodes of type #{ node.nodeType } are not handled")
        # Construct the path to node from root
        until node == doc
            p = node.parentNode
            if not p
                if node.nodeType == 9 # Document node (iframe)
                    win = node.defaultView
                    if win.frameElement
                        node = win.frameElement
                        cfi = "!" + cfi
                        continue
                break
            # Compute the CFI step index of node among its siblings:
            # elements and PIs get even indices, text runs get odd ones
            index = 0
            child = p.firstChild
            while true
                index |= 1
                if child.nodeType in [1, 7]
                    index++
                if child == node
                    break
                child = child.nextSibling
            # Add id assertions for robustness where possible
            id = node.getAttribute?('id')
            idspec = if id then "[#{ escape_for_cfi(id) }]" else ''
            cfi = '/' + index + idspec + cfi
            node = p
        cfi
    # }}}
    decode: (cfi, doc=window?.document) -> # {{{
        simple_node_regex = ///
            ^/(\d+)          # The node count
              (\[[^\]]*\])?  # The optional id assertion
        ///
        error = null
        node = doc
        until cfi.length <= 0 or error
            if ( (r = cfi.match(simple_node_regex)) isnt null ) # Path step
                target = parseInt(r[1])
                assertion = r[2]
                if assertion
                    assertion = unescape_from_cfi(assertion.slice(1, assertion.length-1))
                index = 0
                child = node.firstChild
                while true
                    if not child
                        if assertion # Try to use the assertion to find the node
                            child = doc.getElementById(assertion)
                            if child
                                node = child
                        if not child
                            error = "No matching child found for CFI: " + cfi
                        break
                    index |= 1 # Increment index by 1 if it is even
                    if child.nodeType in [1, 7] # We have an element or a PI
                        index++
                    if ( index == target )
                        cfi = cfi.substr(r[0].length)
                        node = child
                        break
                    child = child.nextSibling
            else if cfi[0] == '!' # Indirection
                if node.contentDocument
                    node = node.contentDocument
                    cfi = cfi.substr(1)
                else
                    error = "Cannot reference #{ node.nodeName }'s content:" + cfi
            else
                break
        if error
            log(error)
            return null
        point = {}
        error = null
        point
    # }}}
    at: (x, y, doc=window?.document) -> # {{{
        cdoc = doc
        target = null
        cwin = cdoc.defaultView
        tail = ''
        offset = null
        name = null
        # Drill down into iframes, etc.
        while true
            target = cdoc.elementFromPoint x, y
            if not target or target.localName == 'html'
                log("No element at (#{ x }, #{ y })")
                return null
            name = target.localName
            if name not in ['iframe', 'embed', 'object']
                break
            cd = target.contentDocument
            if not cd
                break
            x = x + cwin.pageXOffset - target.offsetLeft
            y = y + cwin.pageYOffset - target.offsetTop
            cdoc = cd
            cwin = cdoc.defaultView
        target.normalize()
        if name in ['audio', 'video']
            tail = "~" + fstr target.currentTime
        if name in ['img', 'video']
            px = ((x + cwin.scrollX - target.offsetLeft)*100)/target.offsetWidth
            py = ((y + cwin.scrollY - target.offsetTop)*100)/target.offsetHeight
            tail = "#{ tail }@#{ fstr px },#{ fstr py }"
        else if name != 'audio'
            if cdoc.caretRangeFromPoint # WebKit
                range = cdoc.caretRangeFromPoint(x, y)
                if range
                    target = range.startContainer
                    offset = range.startOffset
            else
                # TODO: implement a span bisection algorithm for UAs
                # without caretRangeFromPoint (Gecko, IE)
        this.encode(doc, target, offset, tail)
    # }}}
 if window?
    window.cfi = new CanonicalFragmentIdentifier()
 else if process?
    # Some debugging code goes here to be run with the coffee interpreter
    cfi = new CanonicalFragmentIdentifier()
    t = 'a^!,1'
    log(t)
    log(escape_for_cfi(t))
    log(unescape_from_cfi(escape_for_cfi(t)))
--- a/src/calibre/ebooks/oeb/display/test/cfi-test.coffee
+++ b/src/calibre/ebooks/oeb/display/test/cfi-test.coffee
@ -0,0 +1,24 @@
 #!/usr/bin/env coffee
 # vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
 ###
 Copyright 2011, Kovid Goyal <kovid@kovidgoyal.net>
 Released under the GPLv3 License
 ###
 viewport_top = (node) ->
    $(node).offset().top - window.pageYOffset
 viewport_left = (node) ->
    $(node).offset().left - window.pageXOffset
 window.onload = ->
    h1 = document.getElementsByTagName('h1')[0]
    x = h1.scrollLeft + 150
    y = viewport_top(h1) + h1.offsetHeight/2
    e = document.elementFromPoint x, y
    if e.getAttribute('id') != 'first-h1'
        alert 'Failed to find top h1'
        return
    alert window.cfi.at x, y
--- a/src/calibre/ebooks/oeb/display/test/test.html
+++ b/src/calibre/ebooks/oeb/display/test/test.html
@ -0,0 +1,14 @@
 <!DOCTYPE html>
 <html>
    <head>
        <title>Testing CFI functionality</title>
        <script type="text/javascript" src="cfi.js"></script>
        <script type="text/javascript" src="jquery.js"></script>
        <script type="text/javascript" src="cfi-test.js"></script>
    </head>
    <body>
        <h1 id="first-h1" style="border: solid 1px red">Testing CFI functionality</h1>
    </body>
 </html>
--- a/src/calibre/ebooks/oeb/display/test/test.py
+++ b/src/calibre/ebooks/oeb/display/test/test.py
@ -0,0 +1,26 @@
 #!/usr/bin/env python
 # vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
 from __future__ import (unicode_literals, division, absolute_import,
                        print_function)
 __license__   = 'GPL v3'
 __copyright__ = '2011, Kovid Goyal <kovid@kovidgoyal.net>'
 __docformat__ = 'restructuredtext en'
 import os
 try:
    from calibre.utils.coffeescript import serve
 except ImportError:
    import init_calibre
    if False: init_calibre, serve
    from calibre.utils.coffeescript import serve
 def run_devel_server():
    os.chdir(os.path.dirname(__file__))
    serve(['../cfi.coffee', 'cfi-test.coffee'])
 if __name__ == '__main__':
    run_devel_server()
--- a/src/calibre/ebooks/oeb/entitydefs.py
+++ b/src/calibre/ebooks/oeb/entitydefs.py
@ -1,256 +0,0 @@
 """
 Replacement for htmlentitydefs which uses purely numeric entities.
 """
 __license__   = 'GPL v3'
 __copyright__ = '2008, Marshall T. Vandegrift <llasram@gmail.com>'
 ENTITYDEFS = \
    {'AElig': '&#198;',
     'Aacute': '&#193;',
     'Acirc': '&#194;',
     'Agrave': '&#192;',
     'Alpha': '&#913;',
     'Aring': '&#197;',
     'Atilde': '&#195;',
     'Auml': '&#196;',
     'Beta': '&#914;',
     'Ccedil': '&#199;',
     'Chi': '&#935;',
     'Dagger': '&#8225;',
     'Delta': '&#916;',
     'ETH': '&#208;',
     'Eacute': '&#201;',
     'Ecirc': '&#202;',
     'Egrave': '&#200;',
     'Epsilon': '&#917;',
     'Eta': '&#919;',
     'Euml': '&#203;',
     'Gamma': '&#915;',
     'Iacute': '&#205;',
     'Icirc': '&#206;',
     'Igrave': '&#204;',
     'Iota': '&#921;',
     'Iuml': '&#207;',
     'Kappa': '&#922;',
     'Lambda': '&#923;',
     'Mu': '&#924;',
     'Ntilde': '&#209;',
     'Nu': '&#925;',
     'OElig': '&#338;',
     'Oacute': '&#211;',
     'Ocirc': '&#212;',
     'Ograve': '&#210;',
     'Omega': '&#937;',
     'Omicron': '&#927;',
     'Oslash': '&#216;',
     'Otilde': '&#213;',
     'Ouml': '&#214;',
     'Phi': '&#934;',
     'Pi': '&#928;',
     'Prime': '&#8243;',
     'Psi': '&#936;',
     'Rho': '&#929;',
     'Scaron': '&#352;',
     'Sigma': '&#931;',
     'THORN': '&#222;',
     'Tau': '&#932;',
     'Theta': '&#920;',
     'Uacute': '&#218;',
     'Ucirc': '&#219;',
     'Ugrave': '&#217;',
     'Upsilon': '&#933;',
     'Uuml': '&#220;',
     'Xi': '&#926;',
     'Yacute': '&#221;',
     'Yuml': '&#376;',
     'Zeta': '&#918;',
     'aacute': '&#225;',
     'acirc': '&#226;',
     'acute': '&#180;',
     'aelig': '&#230;',
     'agrave': '&#224;',
     'alefsym': '&#8501;',
     'alpha': '&#945;',
     'and': '&#8743;',
     'ang': '&#8736;',
     'aring': '&#229;',
     'asymp': '&#8776;',
     'atilde': '&#227;',
     'auml': '&#228;',
     'bdquo': '&#8222;',
     'beta': '&#946;',
     'brvbar': '&#166;',
     'bull': '&#8226;',
     'cap': '&#8745;',
     'ccedil': '&#231;',
     'cedil': '&#184;',
     'cent': '&#162;',
     'chi': '&#967;',
     'circ': '&#710;',
     'clubs': '&#9827;',
     'cong': '&#8773;',
     'copy': '&#169;',
     'crarr': '&#8629;',
     'cup': '&#8746;',
     'curren': '&#164;',
     'dArr': '&#8659;',
     'dagger': '&#8224;',
     'darr': '&#8595;',
     'deg': '&#176;',
     'delta': '&#948;',
     'diams': '&#9830;',
     'divide': '&#247;',
     'eacute': '&#233;',
     'ecirc': '&#234;',
     'egrave': '&#232;',
     'empty': '&#8709;',
     'emsp': '&#8195;',
     'ensp': '&#8194;',
     'epsilon': '&#949;',
     'equiv': '&#8801;',
     'eta': '&#951;',
     'eth': '&#240;',
     'euml': '&#235;',
     'euro': '&#8364;',
     'exist': '&#8707;',
     'fnof': '&#402;',
     'forall': '&#8704;',
     'frac12': '&#189;',
     'frac14': '&#188;',
     'frac34': '&#190;',
     'frasl': '&#8260;',
     'gamma': '&#947;',
     'ge': '&#8805;',
     'hArr': '&#8660;',
     'harr': '&#8596;',
     'hearts': '&#9829;',
     'hellip': '&#8230;',
     'iacute': '&#237;',
     'icirc': '&#238;',
     'iexcl': '&#161;',
     'igrave': '&#236;',
     'image': '&#8465;',
     'infin': '&#8734;',
     'int': '&#8747;',
     'iota': '&#953;',
     'iquest': '&#191;',
     'isin': '&#8712;',
     'iuml': '&#239;',
     'kappa': '&#954;',
     'lArr': '&#8656;',
     'lambda': '&#955;',
     'lang': '&#9001;',
     'laquo': '&#171;',
     'larr': '&#8592;',
     'lceil': '&#8968;',
     'ldquo': '&#8220;',
     'le': '&#8804;',
     'lfloor': '&#8970;',
     'lowast': '&#8727;',
     'loz': '&#9674;',
     'lrm': '&#8206;',
     'lsaquo': '&#8249;',
     'lsquo': '&#8216;',
     'macr': '&#175;',
     'mdash': '&#8212;',
     'micro': '&#181;',
     'middot': '&#183;',
     'minus': '&#8722;',
     'mu': '&#956;',
     'nabla': '&#8711;',
     'nbsp': '&#160;',
     'ndash': '&#8211;',
     'ne': '&#8800;',
     'ni': '&#8715;',
     'not': '&#172;',
     'notin': '&#8713;',
     'nsub': '&#8836;',
     'ntilde': '&#241;',
     'nu': '&#957;',
     'oacute': '&#243;',
     'ocirc': '&#244;',
     'oelig': '&#339;',
     'ograve': '&#242;',
     'oline': '&#8254;',
     'omega': '&#969;',
     'omicron': '&#959;',
     'oplus': '&#8853;',
     'or': '&#8744;',
     'ordf': '&#170;',
     'ordm': '&#186;',
     'oslash': '&#248;',
     'otilde': '&#245;',
     'otimes': '&#8855;',
     'ouml': '&#246;',
     'para': '&#182;',
     'part': '&#8706;',
     'permil': '&#8240;',
     'perp': '&#8869;',
     'phi': '&#966;',
     'pi': '&#960;',
     'piv': '&#982;',
     'plusmn': '&#177;',
     'pound': '&#163;',
     'prime': '&#8242;',
     'prod': '&#8719;',
     'prop': '&#8733;',
     'psi': '&#968;',
     'rArr': '&#8658;',
     'radic': '&#8730;',
     'rang': '&#9002;',
     'raquo': '&#187;',
     'rarr': '&#8594;',
     'rceil': '&#8969;',
     'rdquo': '&#8221;',
     'real': '&#8476;',
     'reg': '&#174;',
     'rfloor': '&#8971;',
     'rho': '&#961;',
     'rlm': '&#8207;',
     'rsaquo': '&#8250;',
     'rsquo': '&#8217;',
     'sbquo': '&#8218;',
     'scaron': '&#353;',
     'sdot': '&#8901;',
     'sect': '&#167;',
     'shy': '&#173;',
     'sigma': '&#963;',
     'sigmaf': '&#962;',
     'sim': '&#8764;',
     'spades': '&#9824;',
     'sub': '&#8834;',
     'sube': '&#8838;',
     'sum': '&#8721;',
     'sup': '&#8835;',
     'sup1': '&#185;',
     'sup2': '&#178;',
     'sup3': '&#179;',
     'supe': '&#8839;',
     'szlig': '&#223;',
     'tau': '&#964;',
     'there4': '&#8756;',
     'theta': '&#952;',
     'thetasym': '&#977;',
     'thinsp': '&#8201;',
     'thorn': '&#254;',
     'tilde': '&#732;',
     'times': '&#215;',
     'trade': '&#8482;',
     'uArr': '&#8657;',
     'uacute': '&#250;',
     'uarr': '&#8593;',
     'ucirc': '&#251;',
     'ugrave': '&#249;',
     'uml': '&#168;',
     'upsih': '&#978;',
     'upsilon': '&#965;',
     'uuml': '&#252;',
     'weierp': '&#8472;',
     'xi': '&#958;',
     'yacute': '&#253;',
     'yen': '&#165;',
     'yuml': '&#255;',
     'zeta': '&#950;',
     'zwj': '&#8205;',
     'zwnj': '&#8204;'}
--- a/src/calibre/ebooks/oeb/parse_utils.py
+++ b/src/calibre/ebooks/oeb/parse_utils.py
@ -0,0 +1,347 @@
 #!/usr/bin/env python
 # vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
 from __future__ import (unicode_literals, division, absolute_import,
                        print_function)
 __license__   = 'GPL v3'
 __copyright__ = '2011, Kovid Goyal <kovid@kovidgoyal.net>'
 __docformat__ = 'restructuredtext en'
 import re
 from lxml import etree, html
 from calibre import xml_replace_entities, force_unicode
 from calibre.constants import filesystem_encoding
 from calibre.ebooks.chardet import xml_to_unicode, strip_encoding_declarations
 RECOVER_PARSER = etree.XMLParser(recover=True, no_network=True)
 XHTML_NS     = 'http://www.w3.org/1999/xhtml'
 class NotHTML(Exception):
    def __init__(self, root_tag):
        Exception.__init__(self, 'Data is not HTML')
        self.root_tag = root_tag
 def barename(name):
    return name.rpartition('}')[-1]
 def namespace(name):
    if '}' in name:
        return name.split('}', 1)[0][1:]
    return ''
 def XHTML(name):
    return '{%s}%s' % (XHTML_NS, name)
 def xpath(elem, expr):
    return elem.xpath(expr, namespaces={'h':XHTML_NS})
 def XPath(expr):
    return etree.XPath(expr, namespaces={'h':XHTML_NS})
 META_XP = XPath('/h:html/h:head/h:meta[@http-equiv="Content-Type"]')
 def merge_multiple_html_heads_and_bodies(root, log=None):
    heads, bodies = xpath(root, '//h:head'), xpath(root, '//h:body')
    if not (len(heads) > 1 or len(bodies) > 1): return root
    for child in root: root.remove(child)
    head = root.makeelement(XHTML('head'))
    body = root.makeelement(XHTML('body'))
    for h in heads:
        for x in h:
            head.append(x)
    for b in bodies:
        for x in b:
            body.append(x)
    map(root.append, (head, body))
    if log is not None:
        log.warn('Merging multiple <head> and <body> sections')
    return root
 def _html5_parse(data):
    import html5lib
    data = html5lib.parse(data, treebuilder='lxml').getroot()
    html_ns = [ns for ns, val in data.nsmap.iteritems() if (val == XHTML_NS and
            ns is not None)]
    if html_ns:
        # html5lib causes the XHTML namespace to not
        # be set as the default namespace
        nsmap = dict(data.nsmap)
        nsmap[None] = XHTML_NS
        for x in html_ns:
            nsmap.pop(x)
        nroot = etree.Element(data.tag, nsmap=nsmap,
                attrib=dict(data.attrib))
        nroot.text = data.text
        nroot.tail = data.tail
        for child in data:
            nroot.append(child)
        data = nroot
    return data
 def _html4_parse(data, prefer_soup=False):
    if prefer_soup:
        from calibre.utils.soupparser import fromstring
        data = fromstring(data)
    else:
        data = html.fromstring(data)
    data.attrib.pop('xmlns', None)
    for elem in data.iter(tag=etree.Comment):
        if elem.text:
            elem.text = elem.text.strip('-')
    data = etree.tostring(data, encoding=unicode)
    # Setting huge_tree=True causes crashes in windows with large files
    parser = etree.XMLParser(no_network=True)
    try:
        data = etree.fromstring(data, parser=parser)
    except etree.XMLSyntaxError:
        data = etree.fromstring(data, parser=RECOVER_PARSER)
    return data
 def clean_word_doc(data, log):
    prefixes = []
    for match in re.finditer(r'xmlns:(\S+?)=".*?microsoft.*?"', data):
        prefixes.append(match.group(1))
    if prefixes:
        log.warn('Found microsoft markup, cleaning...')
        # Remove empty tags as they are not rendered by browsers
        # but can become renderable HTML tags like <p/> if the
        # document is parsed by an HTML parser
        pat = re.compile(
                r'<(%s):([a-zA-Z0-9]+)[^>/]*?></\1:\2>'%('|'.join(prefixes)),
                re.DOTALL)
        data = pat.sub('', data)
        pat = re.compile(
                r'<(%s):([a-zA-Z0-9]+)[^>/]*?/>'%('|'.join(prefixes)))
        data = pat.sub('', data)
    return data
 def parse_html(data, log=None, decoder=None, preprocessor=None,
        filename='<string>', non_html_file_tags=frozenset()):
    if log is None:
        from calibre.utils.logging import default_log
        log = default_log
    filename = force_unicode(filename, enc=filesystem_encoding)
    if not isinstance(data, unicode):
        if decoder is not None:
            data = decoder(data)
        else:
            data = xml_to_unicode(data)[0]
    data = strip_encoding_declarations(data)
    if preprocessor is not None:
        data = preprocessor(data)
    # There could be null bytes in data if it had &#0; entities in it
    data = data.replace('\0', '')
    # Remove DOCTYPE declaration as it messes up parsing
    # In particular, it causes tostring to insert xmlns
    # declarations, which messes up the coercing logic
    idx = data.find('<html')
    if idx == -1:
        idx = data.find('<HTML')
    if idx > -1:
        pre = data[:idx]
        data = data[idx:]
        if '<!DOCTYPE' in pre: # Handle user defined entities
            user_entities = {}
            for match in re.finditer(r'<!ENTITY\s+(\S+)\s+([^>]+)', pre):
                val = match.group(2)
                if val.startswith('"') and val.endswith('"'):
                    val = val[1:-1]
                user_entities[match.group(1)] = val
            if user_entities:
                pat = re.compile(r'&(%s);'%('|'.join(user_entities.keys())))
                data = pat.sub(lambda m:user_entities[m.group(1)], data)
    data = clean_word_doc(data, log)
    # Setting huge_tree=True causes crashes in windows with large files
    parser = etree.XMLParser(no_network=True)
    # Try with more & more drastic measures to parse
    try:
        data = etree.fromstring(data, parser=parser)
    except etree.XMLSyntaxError:
        log.debug('Initial parse failed, using more'
                ' forgiving parsers')
        data = xml_replace_entities(data)
        try:
            data = etree.fromstring(data, parser=parser)
        except etree.XMLSyntaxError:
            log.debug('Parsing %s as HTML' % filename)
            try:
                data = _html5_parse(data)
            except:
                log.exception(
                    'HTML 5 parsing failed, falling back to older parsers')
                data = _html4_parse(data)
    if data.tag == 'HTML':
        # Lower case all tag and attribute names
        data.tag = data.tag.lower()
        for x in data.iterdescendants():
            try:
                x.tag = x.tag.lower()
                for key, val in list(x.attrib.iteritems()):
                    del x.attrib[key]
                    key = key.lower()
                    x.attrib[key] = val
            except:
                pass
    if barename(data.tag) != 'html':
        if barename(data.tag) in non_html_file_tags:
            raise NotHTML(data.tag)
        log.warn('File %r does not appear to be (X)HTML'%filename)
        nroot = etree.fromstring('<html></html>')
        has_body = False
        for child in list(data):
            if isinstance(child.tag, (unicode, str)) and barename(child.tag) == 'body':
                has_body = True
                break
        parent = nroot
        if not has_body:
            log.warn('File %r appears to be a HTML fragment'%filename)
            nroot = etree.fromstring('<html><body/></html>')
            parent = nroot[0]
        for child in list(data.iter()):
            oparent = child.getparent()
            if oparent is not None:
                oparent.remove(child)
            parent.append(child)
        data = nroot
    # Force into the XHTML namespace
    if not namespace(data.tag):
        log.warn('Forcing', filename, 'into XHTML namespace')
        data.attrib['xmlns'] = XHTML_NS
        data = etree.tostring(data, encoding=unicode)
        try:
            data = etree.fromstring(data, parser=parser)
        except:
            data = data.replace(':=', '=').replace(':>', '>')
            data = data.replace('<http:/>', '')
            try:
                data = etree.fromstring(data, parser=parser)
            except etree.XMLSyntaxError:
                log.warn('Stripping comments from %s'%
                        filename)
                data = re.compile(r'<!--.*?-->', re.DOTALL).sub('',
                        data)
                data = data.replace(
                    "<?xml version='1.0' encoding='utf-8'?><o:p></o:p>",
                    '')
                data = data.replace("<?xml version='1.0' encoding='utf-8'??>", '')
                try:
                    data = etree.fromstring(data,
                            parser=RECOVER_PARSER)
                except etree.XMLSyntaxError:
                    log.warn('Stripping meta tags from %s'% filename)
                    data = re.sub(r'<meta\s+[^>]+?>', '', data)
                    data = etree.fromstring(data, parser=RECOVER_PARSER)
    elif namespace(data.tag) != XHTML_NS:
        # OEB_DOC_NS, but possibly others
        ns = namespace(data.tag)
        attrib = dict(data.attrib)
        nroot = etree.Element(XHTML('html'),
            nsmap={None: XHTML_NS}, attrib=attrib)
        for elem in data.iterdescendants():
            if isinstance(elem.tag, basestring) and \
                namespace(elem.tag) == ns:
                elem.tag = XHTML(barename(elem.tag))
        for elem in data:
            nroot.append(elem)
        data = nroot
    data = merge_multiple_html_heads_and_bodies(data, log)
    # Ensure has a <head/>
    head = xpath(data, '/h:html/h:head')
    head = head[0] if head else None
    if head is None:
        log.warn('File %s missing <head/> element' % filename)
        head = etree.Element(XHTML('head'))
        data.insert(0, head)
        title = etree.SubElement(head, XHTML('title'))
        title.text = _('Unknown')
    elif not xpath(data, '/h:html/h:head/h:title'):
        log.warn('File %s missing <title/> element' % filename)
        title = etree.SubElement(head, XHTML('title'))
        title.text = _('Unknown')
    # Remove any encoding-specifying <meta/> elements
    for meta in META_XP(data):
        meta.getparent().remove(meta)
    etree.SubElement(head, XHTML('meta'),
        attrib={'http-equiv': 'Content-Type',
                'content': '%s; charset=utf-8' % XHTML_NS})
    # Ensure has a <body/>
    if not xpath(data, '/h:html/h:body'):
        body = xpath(data, '//h:body')
        if body:
            body = body[0]
            body.getparent().remove(body)
            data.append(body)
        else:
            log.warn('File %s missing <body/> element' % filename)
            etree.SubElement(data, XHTML('body'))
    # Remove microsoft office markup
    r = [x for x in data.iterdescendants(etree.Element) if 'microsoft-com' in x.tag]
    for x in r:
        x.tag = XHTML('span')
    # Remove lang redefinition inserted by the amazing Microsoft Word!
    body = xpath(data, '/h:html/h:body')[0]
    for key in list(body.attrib.keys()):
        if key == 'lang' or key.endswith('}lang'):
            body.attrib.pop(key)
    def remove_elem(a):
        p = a.getparent()
        idx = p.index(a) -1
        p.remove(a)
        if a.tail:
            if idx <= 0:
                if p.text is None:
                    p.text = ''
                p.text += a.tail
            else:
                if p[idx].tail is None:
                    p[idx].tail = ''
                p[idx].tail += a.tail
    # Remove hyperlinks with no content as they cause rendering
    # artifacts in browser based renderers
    # Also remove empty <b>, <u> and <i> tags
    for a in xpath(data, '//h:a[@href]|//h:i|//h:b|//h:u'):
        if a.get('id', None) is None and a.get('name', None) is None \
                and len(a) == 0 and not a.text:
            remove_elem(a)
    # Convert <br>s with content into paragraphs as ADE can't handle
    # them
    for br in xpath(data, '//h:br'):
        if len(br) > 0 or br.text:
            br.tag = XHTML('div')
    # Remove any stray text in the <head> section and format it nicely
    data.text = '\n  '
    head = xpath(data, '//h:head')
    if head:
        head = head[0]
        head.text = '\n    '
        head.tail = '\n  '
        for child in head:
            child.tail = '\n    '
        child.tail = '\n  '
    return data
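Review note, not part of the commit: the user-defined entity handling inside `parse_html` above can be tried in isolation. The sketch below is a minimal reconstruction under stated assumptions: the helper name `expand_user_entities` and the sample document are hypothetical, the `re.escape` call is an addition that the committed code does not have, and it is written for Python 3 whereas the diff targets Python 2.

```python
import re


def expand_user_entities(data):
    """Inline entities declared before <html>, then drop that prolog.

    Hypothetical helper that mirrors the DOCTYPE handling in parse_html.
    """
    idx = data.find('<html')
    if idx == -1:
        idx = data.find('<HTML')
    if idx == -1:
        return data
    pre, body = data[:idx], data[idx:]
    if '<!DOCTYPE' not in pre:
        return body
    entities = {}
    for match in re.finditer(r'<!ENTITY\s+(\S+)\s+([^>]+)', pre):
        val = match.group(2)
        if val.startswith('"') and val.endswith('"'):
            val = val[1:-1]
        entities[match.group(1)] = val
    if entities:
        # re.escape is an assumption of this sketch; the commit joins the raw keys
        pat = re.compile(r'&(%s);' % '|'.join(re.escape(k) for k in entities))
        body = pat.sub(lambda m: entities[m.group(1)], body)
    return body


sample = ('<!DOCTYPE html [<!ENTITY pub "Example Press">]>'
          '<html><body>&pub; &amp;</body></html>')
result = expand_user_entities(sample)
```

Only entities declared in the prolog are replaced; standard references such as `&amp;` are left for the later parsing stages.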
--- a/src/calibre/ebooks/oeb/reader.py
+++ b/src/calibre/ebooks/oeb/reader.py
@ -19,16 +19,15 @@ from calibre.ebooks.oeb.base import OPF1_NS, OPF2_NS, OPF2_NSMAP, DC11_NS, \
 from calibre.ebooks.oeb.base import OEB_DOCS, OEB_STYLES, OEB_IMAGES, \
    PAGE_MAP_MIME, JPEG_MIME, NCX_MIME, SVG_MIME
 from calibre.ebooks.oeb.base import XMLDECL_RE, COLLAPSE_RE, \
-    ENTITY_RE, MS_COVER_TYPE, iterlinks
+    MS_COVER_TYPE, iterlinks
 from calibre.ebooks.oeb.base import namespace, barename, XPath, xpath, \
                                    urlnormalize, BINARY_MIME, \
                                    OEBError, OEBBook, DirContainer
 from calibre.ebooks.oeb.writer import OEBWriter
-from calibre.ebooks.oeb.entitydefs import ENTITYDEFS
 from calibre.utils.localization import get_lang
 from calibre.ptempfile import TemporaryDirectory
 from calibre.constants import __appname__, __version__
-from calibre import guess_type
+from calibre import guess_type, xml_replace_entities
 __all__ = ['OEBReader']
@ -107,8 +106,7 @@ class OEBReader(object):
        try:
            opf = etree.fromstring(data)
        except etree.XMLSyntaxError:
-            repl = lambda m: ENTITYDEFS.get(m.group(1), m.group(0))
-            data = ENTITY_RE.sub(repl, data)
+            data = xml_replace_entities(data, encoding=None)
            try:
                opf = etree.fromstring(data)
                self.logger.warn('OPF contains invalid HTML named entities')
@ -371,8 +369,15 @@ class OEBReader(object):
            else :
                description = None
            index_image = xpath(child,
                    'descendant::calibre:meta[@name = "toc_thumbnail"]')
            toc_thumbnail = (index_image[0].text if index_image else None)
            if not toc_thumbnail or not toc_thumbnail.strip():
                toc_thumbnail = None
            node = toc.add(title, href, id=id, klass=klass,
-                    play_order=po, description=description, author=author)
+                    play_order=po, description=description, author=author,
                           toc_thumbnail=toc_thumbnail)
            self._toc_from_navpoint(item, node, child)
--- a/src/calibre/ebooks/oeb/transforms/filenames.py
+++ b/src/calibre/ebooks/oeb/transforms/filenames.py
@ -159,15 +159,18 @@ class FlatFilenames(object): # {{{
                continue
            data = item.data
+            isp = item.spine_position
            nhref = oeb.manifest.generate(href=nhref)[1]
+            if isp is not None:
+                oeb.spine.remove(item)
+            oeb.manifest.remove(item)
            nitem = oeb.manifest.add(item.id, nhref, item.media_type, data=data,
                                     fallback=item.fallback)
            self.rename_map[item.href] = nhref
            self.renamed_items_map[nhref] = item
-            if item.spine_position is not None:
-                oeb.spine.insert(item.spine_position, nitem, item.linear)
-                oeb.spine.remove(item)
-            oeb.manifest.remove(item)
+            if isp is not None:
+                oeb.spine.insert(isp, nitem, item.linear)
        if self.rename_map:
            self.log('Found non-flat filenames, renaming to support broken'
--- a/src/calibre/ebooks/oeb/transforms/split.py
+++ b/src/calibre/ebooks/oeb/transforms/split.py
@ -154,7 +154,11 @@ class Split(object):
    def rewrite_links(self, url):
        href, frag = urldefrag(url)
-        href = self.current_item.abshref(href)
+        try:
            href = self.current_item.abshref(href)
        except ValueError:
            # Unparseable URL
            return url
        if href in self.map:
            anchor_map = self.map[href]
            nhref = anchor_map[frag if frag else None]
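Review note, not part of the commit: the guard added to `rewrite_links` returns the link untouched when it cannot be resolved. A minimal sketch of the same pattern follows, assuming Python 3's `urllib.parse`; `rewrite_link`, `base` and `mapping` are hypothetical names, and `urljoin` stands in for calibre's `abshref`.

```python
from urllib.parse import urldefrag, urljoin


def rewrite_link(url, base, mapping):
    """Map a link to its new target, leaving unparseable URLs untouched."""
    try:
        href, frag = urldefrag(url)
        href = urljoin(base, href)
    except ValueError:
        # Unparseable URL (for example an unterminated IPv6 literal)
        return url
    if href in mapping:
        href = mapping[href]
        return href + ('#' + frag if frag else '')
    return url


base = 'http://example.com/book/index.html'
mapping = {'http://example.com/book/ch1.html':
           'http://example.com/book/ch1_split_000.html'}
good = rewrite_link('ch1.html#a', base, mapping)
bad = rewrite_link('http://[bad', base, mapping)
```

Returning the original string keeps a single malformed link from aborting the whole transform.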
--- a/src/calibre/ebooks/oeb/transforms/unsmarten.py
+++ b/src/calibre/ebooks/oeb/transforms/unsmarten.py
@ -16,7 +16,7 @@ class UnsmartenPunctuation(object):
    def unsmarten(self, root):
        for x in self.html_tags(root):
-            if not barename(x) == 'pre':
+            if not barename(x.tag) == 'pre':
                if getattr(x, 'text', None):
                    x.text = unsmarten_text(x.text)
                if getattr(x, 'tail', None) and x.tail:
--- a/src/calibre/gui2/book_details.py
+++ b/src/calibre/gui2/book_details.py
@ -56,8 +56,11 @@ def render_html(mi, css, vertical, widget, all_fields=False): # {{{
        </body>
    <html>
    '''%(f, c, css)
    fm = getattr(mi, 'field_metadata', field_metadata)
    fl = dict(get_field_list(fm))
    show_comments = (all_fields or fl.get('comments', True))
    comments = u''
-    if mi.comments:
+    if mi.comments and show_comments:
        comments = comments_to_html(force_unicode(mi.comments))
    right_pane = u'<div id="comments" class="comments">%s</div>'%comments
--- a/src/calibre/gui2/catalog/catalog_bibtex.py
+++ b/src/calibre/gui2/catalog/catalog_bibtex.py
@ -35,7 +35,10 @@ class PluginWidget(QWidget, Ui_Form):
        self.all_fields = [x for x in FIELDS if x != 'all']
        #add custom columns
-        self.all_fields.extend([x for x in sorted(db.custom_field_keys())])
+        for x in sorted(db.custom_field_keys()):
            self.all_fields.append(x)
            if db.field_metadata[x]['datatype'] == 'series':
                self.all_fields.append(x+'_index')
        #populate
        for x in self.all_fields:
            QListWidgetItem(x, self.db_fields)
--- a/src/calibre/gui2/catalog/catalog_csv_xml.py
+++ b/src/calibre/gui2/catalog/catalog_csv_xml.py
@ -33,6 +33,9 @@ class PluginWidget(QWidget, Ui_Form):
            self.all_fields.append(x)
            QListWidgetItem(x, self.db_fields)
            fm = db.field_metadata[x]
            if fm['datatype'] == 'series':
                QListWidgetItem(x+'_index', self.db_fields)
    def initialize(self, name, db):
        self.name = name
--- a/src/calibre/gui2/cover_flow.py
+++ b/src/calibre/gui2/cover_flow.py
@ -70,7 +70,7 @@ if pictureflow is not None:
                    ans = ''
            except:
                ans = ''
-            return ans
+            return ans.replace('&', '&&')
        def subtitle(self, index):
            try:
--- a/src/calibre/gui2/custom_column_widgets.py
+++ b/src/calibre/gui2/custom_column_widgets.py
@ -8,7 +8,7 @@ __docformat__ = 'restructuredtext en'
 from functools import partial
 from PyQt4.Qt import QComboBox, QLabel, QSpinBox, QDoubleSpinBox, QDateTimeEdit, \
-        QDateTime, QGroupBox, QVBoxLayout, QSizePolicy, \
+        QDateTime, QGroupBox, QVBoxLayout, QSizePolicy, QGridLayout, \
        QSpacerItem, QIcon, QCheckBox, QWidget, QHBoxLayout, SIGNAL, \
        QPushButton
@ -401,70 +401,106 @@ widgets = {
        'enumeration': Enumeration
 }
-def field_sort_key(y, x=None):
+def field_sort_key(y, fm=None):
-    m1 = x[y]
+    m1 = fm[y]
-    n1 = 'zzzzz' if m1['datatype'] == 'comments' else m1['name']
+    name = icu_lower(m1['name'])
    n1 = 'zzzzz' + name if m1['datatype'] == 'comments' else name
    return sort_key(n1)
 def populate_metadata_page(layout, db, book_id, bulk=False, two_column=False, parent=None):
-    def widget_factory(type, col):
+    def widget_factory(typ, key):
        if bulk:
-            w = bulk_widgets[type](db, col, parent)
+            w = bulk_widgets[typ](db, key, parent)
        else:
-            w = widgets[type](db, col, parent)
+            w = widgets[typ](db, key, parent)
        if book_id is not None:
            w.initialize(book_id)
        return w
-    x = db.custom_column_num_map
+    fm = db.field_metadata
    cols = list(x)
    cols.sort(key=partial(field_sort_key, x=x))
    count_non_comment = len([c for c in cols if x[c]['datatype'] != 'comments'])
-    layout.setColumnStretch(1, 10)
+    # Get list of all non-composite custom fields. We must make widgets for these
    fields = fm.custom_field_keys(include_composites=False)
    cols_to_display = fields
    cols_to_display.sort(key=partial(field_sort_key, fm=fm))
    # This will contain the fields in the order to display them
    cols = []
    # The fields named here must be first in the widget list
    tweak_cols = tweaks['metadata_edit_custom_column_order']
    comments_in_tweak = 0
    for key in (tweak_cols or ()):
        # Add the key if it really exists in the database
        if key in cols_to_display:
            cols.append(key)
            if fm[key]['datatype'] == 'comments':
                comments_in_tweak += 1
    # Add all the remaining fields
    comments_not_in_tweak = 0
    for key in cols_to_display:
        if key not in cols:
            cols.append(key)
            if fm[key]['datatype'] == 'comments':
                comments_not_in_tweak += 1
    count = len(cols)
    layout_rows_for_comments = 9
    if two_column:
-        turnover_point = (count_non_comment+1)/2
+        turnover_point = ((count-comments_not_in_tweak+1) +
-        layout.setColumnStretch(3, 10)
+                          comments_in_tweak*(layout_rows_for_comments-1))/2
    else:
        # Avoid problems with multi-line widgets
-        turnover_point = count_non_comment + 1000
+        turnover_point = count + 1000
    ans = []
-    column = row = comments_row = 0
+    column = row = base_row = max_row = 0
-    for col in cols:
+    for key in cols:
-        if not x[col]['editable']:
+        if not fm[key]['is_editable']:
            continue # this almost never happens
        dt = fm[key]['datatype']
        if dt == 'composite' or (bulk and dt == 'comments'):
            continue
-        dt = x[col]['datatype']
+        w = widget_factory(dt, fm[key]['colnum'])
        if dt == 'composite':
            continue
        if dt == 'comments':
            continue
        w = widget_factory(dt, col)
        ans.append(w)
        if two_column and dt == 'comments':
            # Here for compatibility with old layout. Comments always started
            # in the left column
            comments_in_tweak -= 1
            # no special processing if the comment field was named in the tweak
            if comments_in_tweak < 0 and comments_not_in_tweak > 0:
                # Force a turnover, adding comments widgets below max_row.
                # Save the row to return to if we turn over again
                column = 0
                row = max_row
                base_row = row
                turnover_point = row + (comments_not_in_tweak * layout_rows_for_comments)/2
                comments_not_in_tweak = 0
        l = QGridLayout()
        if dt == 'comments':
            layout.addLayout(l, row, column, layout_rows_for_comments, 1)
            layout.setColumnStretch(column, 100)
            row += layout_rows_for_comments
        else:
            layout.addLayout(l, row, column, 1, 1)
            layout.setColumnStretch(column, 100)
            row += 1
        for c in range(0, len(w.widgets), 2):
            w.widgets[c].setWordWrap(True)
            w.widgets[c].setBuddy(w.widgets[c+1])
            layout.addWidget(w.widgets[c], row, column)
            layout.addWidget(w.widgets[c+1], row, column+1)
            row += 1
        comments_row = max(comments_row, row)
        if row >= turnover_point:
            column += 2
            turnover_point = count_non_comment + 1000
            row = 0
    if not bulk: # Add the comments fields
        row = comments_row
        column = 0
        for col in cols:
            dt = x[col]['datatype']
            if dt != 'comments':
-                continue
+                w.widgets[c].setWordWrap(True)
-            w = widget_factory(dt, col)
+                w.widgets[c].setBuddy(w.widgets[c+1])
-            ans.append(w)
+                l.addWidget(w.widgets[c], c, 0)
-            layout.addWidget(w.widgets[0], row, column, 1, 2)
+                l.addWidget(w.widgets[c+1], c, 1)
-            if two_column and column == 0:
+                l.setColumnStretch(1, 10000)
-                column = 2
+            else:
-                continue
+                l.addWidget(w.widgets[0], 0, 0, 1, 2)
-            column = 0
+        l.addItem(QSpacerItem(0, 0, vPolicy=QSizePolicy.Expanding), c, 0, 1, 1)
-            row += 1
+        max_row = max(max_row, row)
        if row >= turnover_point:
            column = 1
            turnover_point = count + 1000
            row = base_row
    items = []
    if len(ans) > 0:
        items.append(QSpacerItem(10, 10, QSizePolicy.Minimum,
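Review note, not part of the commit: the column ordering introduced in `populate_metadata_page` puts the keys named in the `metadata_edit_custom_column_order` tweak first and then appends the remaining keys in their sorted order. The sketch below shows only that ordering step; `order_columns` and the sample keys are hypothetical, and skipping duplicate tweak entries is an assumption of the sketch rather than something the committed loop does.

```python
def order_columns(sorted_keys, tweak_order):
    """Return keys with the tweak-named ones first, then the rest."""
    cols = []
    for key in (tweak_order or ()):
        # Only keep keys that really exist; dedup is an assumption of this sketch
        if key in sorted_keys and key not in cols:
            cols.append(key)
    for key in sorted_keys:
        if key not in cols:
            cols.append(key)
    return cols


available = ['#genre', '#notes', '#rating', '#read']
ordered = order_columns(available, ['#read', '#missing', '#genre'])
unchanged = order_columns(available, None)
```

Keys in the tweak that do not exist in the library are ignored, so a stale tweak value does not break the dialog.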
--- a/src/calibre/gui2/dialogs/add_from_isbn.py
+++ b/src/calibre/gui2/dialogs/add_from_isbn.py
@ -12,7 +12,7 @@ from PyQt4.Qt import QDialog, QApplication
 from calibre.gui2.dialogs.add_from_isbn_ui import Ui_Dialog
 from calibre.ebooks.metadata import check_isbn
 from calibre.constants import iswindows
-from calibre.gui2 import gprefs
+from calibre.gui2 import gprefs, question_dialog, error_dialog
 class AddFromISBN(QDialog, Ui_Dialog):
@ -44,6 +44,7 @@ class AddFromISBN(QDialog, Ui_Dialog):
        tags = list(filter(None, [x.strip() for x in tags]))
        gprefs['add from ISBN tags'] = tags
        self.set_tags = tags
        bad = set()
        for line in unicode(self.isbn_box.toPlainText()).strip().splitlines():
            line = line.strip()
            if not line:
@ -64,5 +65,19 @@ class AddFromISBN(QDialog, Ui_Dialog):
                        os.access(parts[1], os.R_OK) and os.path.isfile(parts[1]):
                        book['path'] = parts[1]
                    self.books.append(book)
            else:
                bad.add(parts[0])
        if bad:
            if self.books:
                if not question_dialog(self, _('Some invalid ISBNs'),
                    _('Some of the ISBNs you entered were invalid. They will'
                        ' be ignored. Click Show Details to see which ones.'
                        ' Do you want to proceed?'), det_msg='\n'.join(bad),
                    show_copy_button=True):
                    return
            else:
                return error_dialog(self, _('All invalid ISBNs'),
                        _('All the ISBNs you entered were invalid. No books'
                            ' can be added.'), show=True)
        QDialog.accept(self, *args)
--- a/src/calibre/gui2/dialogs/scheduler.py
+++ b/src/calibre/gui2/dialogs/scheduler.py
@ -419,6 +419,13 @@ class Scheduler(QObject):
        QObject.__init__(self, parent)
        self.internet_connection_failed = False
        self._parent = parent
        self.no_internet_msg = _('Cannot download news as no internet connection '
                'is active')
        self.no_internet_dialog = d = error_dialog(self._parent,
                self.no_internet_msg, _('No internet connection'),
                show_copy_button=False)
        d.setModal(False)
        self.recipe_model = RecipeModel()
        self.db = db
        self.lock = QMutex(QMutex.Recursive)
@ -434,7 +441,7 @@ class Scheduler(QObject):
        self.news_menu.addAction(self.cac)
        self.news_menu.addSeparator()
        self.all_action = self.news_menu.addAction(
-                _('Download all scheduled new sources'),
+                _('Download all scheduled news sources'),
                self.download_all_scheduled)
        self.timer = QTimer(self)
@ -523,7 +530,6 @@ class Scheduler(QObject):
        finally:
            self.lock.unlock()
    def download_clicked(self, urn):
        if urn is not None:
            return self.download(urn)
@ -534,18 +540,25 @@ class Scheduler(QObject):
    def download_all_scheduled(self):
        self.download_clicked(None)
-    def download(self, urn):
+    def has_internet_connection(self):
        self.lock.lock()
        if not internet_connected():
            if not self.internet_connection_failed:
                self.internet_connection_failed = True
-                d = error_dialog(self._parent, _('No internet connection'),
+                if self._parent.is_minimized_to_tray:
-                        _('Cannot download news as no internet connection '
+                    self._parent.status_bar.show_message(self.no_internet_msg,
-                            'is active'))
+                            5000)
-                d.setModal(False)
+                elif not self.no_internet_dialog.isVisible():
-                d.show()
+                    self.no_internet_dialog.show()
            return False
        self.internet_connection_failed = False
        if self.no_internet_dialog.isVisible():
            self.no_internet_dialog.hide()
        return True
    def download(self, urn):
        self.lock.lock()
        if not self.has_internet_connection():
            return False
        doit = urn not in self.download_queue
        self.lock.unlock()
        if doit:
@ -555,7 +568,9 @@ class Scheduler(QObject):
    def check(self):
        recipes = self.recipe_model.get_to_be_downloaded_recipes()
        for urn in recipes:
-            self.download(urn)
+            if not self.download(urn):
                # No internet connection, we will try again in a minute
                break
 if __name__ == '__main__':
    from calibre.gui2 import is_ok_to_use_qt
--- a/src/calibre/gui2/preferences/toolbar.py
+++ b/src/calibre/gui2/preferences/toolbar.py
@ -28,11 +28,11 @@ class BaseModel(QAbstractListModel):
    def name_to_action(self, name, gui):
        if name == 'Donate':
-            return FakeAction(name, 'donate.png',
+            return FakeAction(_('Donate'), 'donate.png',
                    dont_add_to=frozenset(['context-menu',
                        'context-menu-device']))
        if name == 'Location Manager':
-            return FakeAction(name, None,
+            return FakeAction(_('Location Manager'), None,
                    _('Switch between library and device views'),
                    dont_add_to=frozenset(['menubar', 'toolbar',
                        'toolbar-child', 'context-menu',
--- a/src/calibre/gui2/ui.py
+++ b/src/calibre/gui2/ui.py
@ -723,10 +723,10 @@ class Main(MainWindow, MainWindowMixin, DeviceMixin, EmailMixin, # {{{
        self.write_settings()
        if self.system_tray_icon.isVisible():
            if not dynamic['systray_msg'] and not isosx:
-                info_dialog(self, 'calibre', 'calibre '+\
+                info_dialog(self, 'calibre', 'calibre '+ \
                        _('will keep running in the system tray. To close it, '
                        'choose <b>Quit</b> in the context menu of the '
-                        'system tray.')).exec_()
+                        'system tray.'), show_copy_button=False).exec_()
                dynamic['systray_msg'] = True
            self.hide_windows()
            e.ignore()
--- a/src/calibre/gui2/viewer/documentview.py
+++ b/src/calibre/gui2/viewer/documentview.py
@ -537,6 +537,12 @@ class DocumentView(QWebView): # {{{
        self.dictionary_action.setShortcut(Qt.CTRL+Qt.Key_L)
        self.dictionary_action.triggered.connect(self.lookup)
        self.addAction(self.dictionary_action)
        self.search_action = QAction(QIcon(I('dictionary.png')),
                _('&Search for next occurrence'), self)
        self.search_action.setShortcut(Qt.CTRL+Qt.Key_S)
        self.search_action.triggered.connect(self.search_next)
        self.addAction(self.search_action)
        self.goto_location_action = QAction(_('Go to...'), self)
        self.goto_location_menu = m = QMenu(self)
        self.goto_location_actions = a = {
@ -620,6 +626,7 @@ class DocumentView(QWebView): # {{{
        text = unicode(self.selectedText())
        if text:
            menu.insertAction(list(menu.actions())[0], self.dictionary_action)
            menu.insertAction(list(menu.actions())[0], self.search_action)
        menu.addSeparator()
        menu.addAction(self.goto_location_action)
        menu.exec_(ev.globalPos())
@ -630,6 +637,12 @@ class DocumentView(QWebView): # {{{
            if t:
                self.manager.lookup(t.split()[0])
    def search_next(self):
        if self.manager is not None:
            t = unicode(self.selectedText()).strip()
            if t:
                self.manager.search.set_search_string(t)
    def set_manager(self, manager):
        self.manager = manager
        self.scrollbar = manager.horizontal_scrollbar
--- a/src/calibre/gui2/viewer/main.py
+++ b/src/calibre/gui2/viewer/main.py
@ -758,11 +758,12 @@ class EbookViewer(MainWindow, Ui_EbookViewer):
        self.set_page_number(frac)
    def next_document(self):
-        if self.current_index < len(self.iterator.spine) - 1:
+        if (hasattr(self, 'current_index') and self.current_index <
                len(self.iterator.spine) - 1):
            self.load_path(self.iterator.spine[self.current_index+1])
    def previous_document(self):
-        if self.current_index > 0:
+        if hasattr(self, 'current_index') and self.current_index > 0:
            self.load_path(self.iterator.spine[self.current_index-1], pos=1.0)
    def keyPressEvent(self, event):
--- a/src/calibre/library/catalog.py
+++ b/src/calibre/library/catalog.py
@ -347,7 +347,9 @@ class BIBTEX(CatalogPlugin): # {{{
            for field in fields:
                if field.startswith('#'):
-                        item = db.get_field(entry['id'],field,index_is_id=True)
+                    item = db.get_field(entry['id'],field,index_is_id=True)
+                    if isinstance(item, (bool, float, int)):
+                        item = repr(item)
                elif field == 'title_sort':
                    item = entry['sort']
                else:
@ -391,7 +393,7 @@ class BIBTEX(CatalogPlugin): # {{{
                elif field == 'isbn' :
                    # Could be 9, 10 or 13 digits
-                    bibtex_entry.append(u'isbn = "%s"' % re.sub(u'[\D]', u'', item))
+                    bibtex_entry.append(u'isbn = "%s"' % re.sub(u'[^0-9xX]', u'', item))
                elif field == 'formats' :
                    #Add file path if format is selected
@ -413,7 +415,8 @@ class BIBTEX(CatalogPlugin): # {{{
                    bibtex_entry.append(u'month = "%s"' % bibtexdict.utf8ToBibtex(strftime("%b", item)))
                elif field.startswith('#') :
-                    bibtex_entry.append(u'%s = "%s"' % (field[1:], bibtexdict.utf8ToBibtex(item)))
+                    bibtex_entry.append(u'custom_%s = "%s"' % (field[1:],
+                        bibtexdict.utf8ToBibtex(item)))
                else:
                    # elif field in ['title', 'publisher', 'cover', 'uuid', 'ondevice',
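Reviewer note on the ISBN hunk above: the old pattern `[\D]` strips every non-digit, which also drops the `X` check character that a 10-digit ISBN may end with. A minimal sketch of the difference, assuming the intent is to keep digits and `X` (the helper names `strip_non_digits` and `keep_isbn_chars` are hypothetical and not part of calibre):

```python
import re


def strip_non_digits(raw):
    # Old behaviour: remove everything that is not a digit.
    return re.sub(r'\D', '', raw)


def keep_isbn_chars(raw):
    # Assumed intent of the new pattern: keep digits and the X check character.
    return re.sub(r'[^0-9xX]', '', raw)


print(strip_non_digits('0-8044-2957-X'))  # 080442957
print(keep_isbn_chars('0-8044-2957-X'))   # 080442957X
```

Note that the caret inside the character class matters: without it, `re.sub` would remove the digits and `X` instead of keeping them.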
--- a/src/calibre/library/cli.py
+++ b/src/calibre/library/cli.py
@ -64,8 +64,17 @@ def do_list(db, fields, afields, sort_by, ascending, search_text, line_width, se
    data = db.get_data_as_dict(prefix, authors_as_string=True)
    fields = ['id'] + fields
    title_fields = fields
-    fields = [db.custom_column_label_map[x[1:]]['num'] if x[0]=='*'
-            else x for x in fields]
+    def field_name(f):
+        ans = f
+        if f[0] == '*':
+            if f.endswith('_index'):
+                fkey = f[1:-len('_index')]
+                num = db.custom_column_label_map[fkey]['num']
+                ans = '%d_index'%num
+            else:
+                ans = db.custom_column_label_map[f[1:]]['num']
+        return ans
+    fields = list(map(field_name, fields))
    for f in data:
        fmts = [x for x in f['formats'] if x is not None]
@ -121,8 +130,10 @@ def do_list(db, fields, afields, sort_by, ascending, search_text, line_width, se
 def list_option_parser(db=None):
    fields = set(FIELDS)
    if db is not None:
-        for f in db.custom_column_label_map:
+        for f, data in db.custom_column_label_map.iteritems():
            fields.add('*'+f)
+            if data['datatype'] == 'series':
+                fields.add('*'+f+'_index')
    parser = get_parser(_(
 '''\
@ -161,8 +172,10 @@ def command_list(args, dbpath):
    opts, args = parser.parse_args(sys.argv[:1] + args)
    afields = set(FIELDS)
    if db is not None:
-        for f in db.custom_column_label_map:
+        for f, data in db.custom_column_label_map.iteritems():
            afields.add('*'+f)
+            if data['datatype'] == 'series':
+                afields.add('*'+f+'_index')
    fields = [str(f.strip().lower()) for f in opts.fields.split(',')]
    if 'all' in fields:
        fields = sorted(list(afields))
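Reviewer note on the `cli.py` hunks above: the new `field_name` helper maps a `*label` field to its custom column number and a `*label_index` field to `'<num>_index'`. A standalone sketch of that mapping, assuming a label map shaped like `db.custom_column_label_map` (the `label_map` contents below are made up for illustration, and the map is passed as a parameter here rather than read from `db`):

```python
def field_name(f, label_map):
    # Plain fields pass through unchanged.
    ans = f
    if f[0] == '*':
        if f.endswith('_index'):
            # '*myseries_index' -> look up 'myseries', return '<num>_index'
            fkey = f[1:-len('_index')]
            ans = '%d_index' % label_map[fkey]['num']
        else:
            # '*myseries' -> the custom column number itself
            ans = label_map[f[1:]]['num']
    return ans


# Hypothetical label map with a single series-type custom column.
label_map = {'myseries': {'num': 3, 'datatype': 'series'}}

print(field_name('title', label_map))            # title
print(field_name('*myseries', label_map))        # 3
print(field_name('*myseries_index', label_map))  # 3_index
```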
--- a/src/calibre/library/database.py
+++ b/src/calibre/library/database.py
@ -1089,8 +1089,12 @@ ALTER TABLE books ADD COLUMN isbn TEXT DEFAULT "" COLLATE NOCASE;
        ids = tuple(ids)
        if len(ids) > 50000:
            return True
+        if len(ids) == 1:
+            ids = '(%d)'%ids[0]
+        else:
+            ids = repr(ids)
        return self.conn.get('''
-            SELECT data FROM conversion_options WHERE book IN %r AND
+            SELECT data FROM conversion_options WHERE book IN %s AND
        format=? LIMIT 1'''%(ids,), (format,), all=False) is not None
    def delete_conversion_options(self, id, format, commit=True):
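Reviewer note on the `database.py` hunk above: formatting a one-element tuple with `%r` yields `(7,)`, and the trailing comma is not valid inside a SQL `IN (...)` list, which is presumably why the single-id case is now handled separately. A minimal sketch of the same logic (the helper name `sql_in_list` is hypothetical and not part of calibre):

```python
def sql_in_list(ids):
    """Render integer ids as the parenthesised list of a SQL IN clause."""
    ids = tuple(ids)
    if len(ids) == 1:
        # repr((7,)) is '(7,)'; the trailing comma would break the query.
        return '(%d)' % ids[0]
    return repr(ids)


print(repr((7,)))           # (7,)
print(sql_in_list([7]))     # (7)
print(sql_in_list([1, 2]))  # (1, 2)
```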
--- a/src/calibre/library/database2.py
+++ b/src/calibre/library/database2.py
@ -3376,11 +3376,15 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
        '''
        if prefix is None:
            prefix = self.library_path
-        FIELDS = set(['title', 'sort', 'authors', 'author_sort', 'publisher', 'rating',
-            'timestamp', 'size', 'tags', 'comments', 'series', 'series_index',
-            'uuid', 'pubdate', 'last_modified', 'identifiers', 'languages'])
-        for x in self.custom_column_num_map:
-            FIELDS.add(x)
+        fdata = self.custom_column_num_map
+
+        FIELDS = set(['title', 'sort', 'authors', 'author_sort', 'publisher',
+            'rating', 'timestamp', 'size', 'tags', 'comments', 'series',
+            'series_index', 'uuid', 'pubdate', 'last_modified', 'identifiers',
+            'languages']).union(set(fdata))
+        for x, data in fdata.iteritems():
+            if data['datatype'] == 'series':
+                FIELDS.add('%d_index'%x)
        data = []
        for record in self.data:
            if record is None: continue
--- a/src/calibre/library/save_to_disk.py
+++ b/src/calibre/library/save_to_disk.py
@ -154,7 +154,7 @@ class Formatter(TemplateFormatter):
                    return self.composite_values[key]
                self.composite_values[key] = 'RECURSIVE_COMPOSITE FIELD (S2D) ' + key
                self.composite_values[key] = \
-                    self.vformat(b['display']['composite_template'], [], kwargs)
+                    self.evaluate(b['display']['composite_template'], [], kwargs)
                return self.composite_values[key]
            if key in kwargs:
                val = kwargs[key]
--- a/src/calibre/manual/customize.rst
+++ b/src/calibre/manual/customize.rst
@ -47,7 +47,7 @@ Overriding icons, templates, etcetera
 |app| allows you to override the static resources, like icons, templates, javascript, etc. with customized versions that you like.
 All static resources are stored in the resources sub-folder of the calibre install location. On Windows, this is usually
-:file:`C:\Program Files\Calibre2\resources`. On OS X, :file:`/Applications/calibre.app/Contents/Resources/resources/`. On linux, if you are using the binary installer
+:file:`C:/Program Files/Calibre2/resources`. On OS X, :file:`/Applications/calibre.app/Contents/Resources/resources/`. On linux, if you are using the binary installer
 from the calibre website it will be :file:`/opt/calibre/resources`. These paths can change depending on where you choose to install |app|. 
 You should not change the files in this resources folder, as your changes will get overwritten the next time you update |app|. Instead, go to
--- a/src/calibre/manual/template_lang.rst
+++ b/src/calibre/manual/template_lang.rst
@ -112,7 +112,7 @@ Functions are always applied before format specifications. See further down for
 The syntax for using functions is ``{field:function(arguments)}``, or ``{field:function(arguments)|prefix|suffix}``. Arguments are separated by commas. Commas inside arguments must be preceded by a backslash ( '\\' ). The last (or only) argument cannot contain a closing parenthesis ( ')' ). Functions return the value of the field used in the template, suitably modified.
-If you have programming experience, please note that the syntax in this mode (single function) is not what you might expect. Strings are not quoted. Spaces are significant. All arguments must be constants; there is no sub-evaluation. Use :ref:`template program mode <template_mode>` and :ref:`general program mode <general_mode>` to avoid these differences.
+If you have programming experience, please note that the syntax in this mode (single function) is not what you might expect. Strings are not quoted. Spaces are significant. All arguments must be constants; there is no sub-evaluation. **Do not use subtemplates (`{ ... }`) as function arguments.** Instead, use :ref:`template program mode <template_mode>` and :ref:`general program mode <general_mode>`.
 Many functions use regular expressions. In all cases, regular expression matching is case-insensitive.
--- a/src/calibre/translations/af.po
+++ b/src/calibre/translations/af.po
--- a/src/calibre/translations/ar.po
+++ b/src/calibre/translations/ar.po
--- a/src/calibre/translations/ast.po
+++ b/src/calibre/translations/ast.po