GwR integration of epub/mobi

2025-11-14 02:26:58 -05:00 · 2010-01-22 11:03:04 -07:00
commit 64611c1b0d
parent 0fe72a4b42 5419bac497
26 changed files with 10099 additions and 7324 deletions
--- a/Changelog.yaml
+++ b/Changelog.yaml
@@ -4,6 +4,131 @@
 # for important features/bug fixes.
 # Also, each release can have new and improved recipes.
 - version: 0.6.35
  date: 2010-01-22
  new features:
    - title: Catalog generation
      type: major
      description: >
        "You can now easily generate a catalog of all books in your calibre library by clicking the arrow next to the convert button. The catalog can be in one of several formats: XML, CSV, EPUB and MOBI, with scope for future formats via plugins. If you generate the catalog in an e-book format, it will be automatically sent to your e-book reader the next time you connect it, allowing you to easily browse your collection on the reader itself."
    - title: "RTF Input: Support for unicode characters. Needs testing."
      type: major
      tickets: [4501]
    - title: "Add Quick Start Guide by John Schember to calibre library on first run of calibre"
      type: major
    - title: "Improve handling of justification"
      description: >
        "Now calibre will explicitly change the justification of all left aligned paragraphs to justified or vice versa depending on the justification setting. This should make it possible to robustly convert all content to either justified or not. calibre will not touch centered or right aligned content."
    - title: "E-book viewer: Fit images to viewer window (can be turned off via Preferences)"
    - title: "Add section on E-book viewer to User Manual"
    - title: "Development environment: First look for resources in the location pointed to by CALIBRE_DEVELOP_FROM. If not found, use the normal resource location"
    - title: "When reading metadata from filenames, with the Swap author names option checked, improve the logic used to detect author last name."
      tickets: [4620]
    - title: "News downloads: When getting an article URL from a RSS feed, look first for an original article link. This speeds up the download of news services that use a syndication service like feedburner or pheedo to publish their RSS feeds."
  bug fixes:
    - "Windows device detection: Don't do expensive polling while waiting for device disconnect. This should fix the problems people have with their floppy drive being activated while an e-book reader is connected"
    - title: "PML Input: Fix creation of metadata Table of Contents"
      tickets: [5633]
    - title: "Fix Tag browser not updating after using delete specific format actions"
      tickets: [4632]
    - title: "MOBI Output: Don't die when converting EPUB files with SVG covers"
    - title: "Nook driver: Remove the # character from filenames when sending to device"
      tickets: [4629]
    - title: "Workaround for bug in QtWebKit on windows that could cause crashes when using the next page button in the e-book viewer for certain files"
      tickets: [4606]
    - title: "MOBI Input: Rescale img width and height attributes that were specified in em units"
      tickets: [4608]
    - title: "ebook-meta: Fix setting of series metadata"
    - title: "RTF metadata: Fix reading metadata from very small files"
    - title: "Conversion pipeline: Don't error out if the user sets an invalid chapter detection XPath"
    - title: "Fix main mem and card being swapped in pocketbook detection on OS X"
    - title: "Welcome wizard: Set the language to english if the user doesn't explicitly change the language. This ensures that the language will be english on windows by default"
    - title: "Fix bug in OEBWriter that could cause writing out of resources in subdirectories with URL unsafe names to fail"
  new recipes:
    - title: Frankfurter Rundschau
      author: Justus Bisser
    - title: The Columbia Journalism Review
      author: XanthanGum
    - title: Various CanWest Canadian news sources
      author: Nick Redding
    - title: digitaljournal.com
      author: Darko Miletic
    - title: Pajamas Media
      author: Krittika Goyal
    - title: Algemeen Dagblad
      author: kwetal
    - title: "The Reader's Digest"
      author: BrianG
    - title: The Yemen Times
      author: kwetal
    - title: The Kitsap Sun
      author: Darko Miletic
    - title: drivelry.com
      author: Krittika Goyal
    - title: New recipe for Google Reader that downloads unread articles instead of just starred ones
      author: rollercoaster
    - title: Le Devoir
      author: Lorenzo Vigentini
    - title: Joop
      author: kwetal
    - title: Various computer magazines
      author: Lorenzo Vigentini
    - title: The Wall Street Journal (free parts)
      author: Nick Redding
    - title: Journal of Nephrology
      author: Krittika Goyal
    - title: stuff.co.nz
      author: Krittika Goyal
  improved recipes:
    - Physics Today
    - Wall Street Journal
    - American Spectator
    - FTD
    - The National Post
    - Blic
 - version: 0.6.34
  date: 2010-01-15
--- a/resources/recipes/fr_online.recipe
+++ b/resources/recipes/fr_online.recipe
@@ -0,0 +1,67 @@
 __license__   = 'GPL v3'
 __copyright__ = '2009, Justus Bisser <justus.bisser at gmail.com>'
 '''
 fr-online.de
 '''
 import re
 from calibre.web.feeds.news import BasicNewsRecipe
 class Spiegel_ger(BasicNewsRecipe):
    title                 = 'Frankfurter Rundschau'
    __author__            = 'Justus Bisser'
    description           = "Dies ist die Online-Ausgabe der Frankfurter Rundschau. Um die abgerufenen individuell einzustellen bearbeiten sie die Liste im erweiterten Modus. Die Feeds findet man auf http://www.fr-online.de/verlagsservice/fr_newsreader/?em_cnt=574255"
    publisher             = 'Druck- und Verlagshaus Frankfurt am Main GmbH'
    category              = 'FR Online, Frankfurter Rundschau, Nachrichten, News,Dienste, RSS, RSS, Feedreader, Newsfeed, iGoogle, Netvibes, Widget'
    oldest_article        = 7
    max_articles_per_feed = 100
    language              = 'de'
    lang                  = 'de-DE'
    no_stylesheets        = True
    use_embedded_content  = False
    #encoding              = 'cp1252'
    conversion_options = {
                          'comment'          : description
                        , 'tags'             : category
                        , 'publisher'        : publisher
                        , 'language'         : lang
                        }
    recursions = 0
    max_articles_per_feed = 100
    #keep_only_tags = [dict(name='div', attrs={'class':'text'})]
    #tags_remove = [dict(name='div', attrs={'style':'text-align: left; margin: 4px 0px 0px 4px; width: 200px; float: right;'})]
    remove_attributes = ['style']
    feeds = []
    #remove_tags_before = [dict(name='div', attrs={'style':'padding-left: 0px;'})]
    #remove_tags_after = [dict(name='div', attrs={'class':'box_head_text'})]
    # enable for all news
    allNews = 0
    if allNews:
        feeds = [(u'Frankfurter Rundschau', u'http://www.fr-online.de/rss/sport/index.xml')]
    else:
        #select the feeds you like
        feeds = [(u'Nachrichten', u'http://www.fr-online.de/rss/politik/index.xml')]
        feeds.append((u'Kommentare und Analysen', u'http://www.fr-online.de/rss/meinung/index.xml'))
        feeds.append((u'Dokumentationen', u'http://www.fr-online.de/rss/dokumentation/index.xml'))
        feeds.append((u'Deutschlandtrend', u'http://www.fr-online.de/rss/deutschlandtrend/index.xml'))
        feeds.append((u'Wirtschaft', u'http://www.fr-online.de/rss/wirtschaft/index.xml'))
        feeds.append((u'Sport', u'http://www.fr-online.de/rss/sport/index.xml'))
        feeds.append((u'Feuilleton', u'http://www.fr-online.de/rss/feuilleton/index.xml'))
        feeds.append((u'Panorama', u'http://www.fr-online.de/rss/panorama/index.xml'))
        feeds.append((u'Rhein Main und Hessen', u'http://www.fr-online.de/rss/hessen/index.xml'))
        feeds.append((u'Fitness und Gesundheit', u'http://www.fr-online.de/rss/fit/index.xml'))
        feeds.append((u'Multimedia', u'http://www.fr-online.de/rss/multimedia/index.xml'))
        feeds.append((u'Wissen und Bildung', u'http://www.fr-online.de/rss/wissen/index.xml'))
    def get_article_url(self, article):
        url = article.link
        regex = re.compile("0C[0-9]{6,8}0A?")
        liste = regex.findall(url)
        string = liste.pop(0)
        string = string[2:len(string)-1]
        return "http://www.fr-online.de/_em_cms/_globals/print.php?em_cnt=" + string
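The `get_article_url` method above pulls an article id out of the feed link and builds a print-view URL from it. A minimal standalone sketch of that extraction in Python 3 follows. It assumes the feed link embeds the id as `0C<digits>0A`; the function name `print_url_from_link` is hypothetical, and unlike the recipe (where `liste.pop(0)` raises on an empty list) it returns `None` when no id is found.

```python
import re

PRINT_URL = "http://www.fr-online.de/_em_cms/_globals/print.php?em_cnt="
# Same pattern as the recipe: "0C", 6 to 8 digits, "0", optional "A".
ID_PATTERN = re.compile(r"0C[0-9]{6,8}0A?")


def print_url_from_link(link):
    """Return the print-view URL for a feed link, or None if no id is found."""
    match = ID_PATTERN.search(link)
    if match is None:
        return None
    token = match.group(0)
    # Mirror the recipe's slice: drop the "0C" prefix and the final character.
    return PRINT_URL + token[2:len(token) - 1]
```

Returning `None` is a design choice of this sketch, not behaviour taken from the commit; a recipe would still need to decide how to treat such articles.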
--- a/resources/recipes/ftd.recipe
+++ b/resources/recipes/ftd.recipe
@@ -9,16 +9,16 @@ from calibre.web.feeds.news import BasicNewsRecipe
 class FTDe(BasicNewsRecipe):
-    
+
    title = 'FTD'
    description = 'Financial Times Deutschland'
    __author__ = 'Oliver Niesner'
    use_embedded_content   = False
    timefmt = ' [%d %b %Y]'
-    language = _('German')
+    language = 'de'
    max_articles_per_feed = 40
    no_stylesheets = True
-    
+
    remove_tags = [dict(id='navi_top'),
 		   dict(id='topbanner'),
 		   dict(id='seitenkopf'),
@@ -83,8 +83,8 @@ class FTDe(BasicNewsRecipe):
 		   dict(name='div', attrs={'class':'articleOptionFootFrame'}),
 		   dict(name='div', attrs={'class':'artikelsplitfaq'})]
    #remove_tags_after = [dict(name='a', attrs={'class':'more'})]
-    
+
-    feeds =  [ ('Finanzen', 'http://www.ftd.de/rss2/finanzen/maerkte'), 
+    feeds =  [ ('Finanzen', 'http://www.ftd.de/rss2/finanzen/maerkte'),
 	       ('Meinungshungrige', 'http://www.ftd.de/rss2/meinungshungrige'),
 	       ('Unternehmen', 'http://www.ftd.de/rss2/unternehmen'),
 	       ('Politik', 'http://www.ftd.de/rss2/politik'),
@@ -95,8 +95,8 @@ class FTDe(BasicNewsRecipe):
 	       ('Auto', 'http://www.ftd.de/rss2/auto'),
 	       ('Lifestyle', 'http://www.ftd.de/rss2/lifestyle')
-	     ] 
+	     ]
-    
+
    def print_version(self, url):
        return url.replace('.html', '.html?mode=print')
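The `print_version` hook above maps an article URL to its print view by appending a query string. A small sketch of the same substitution as a plain function; the sample URL in the usage note is hypothetical.

```python
def print_version(url):
    """Rewrite an article URL to its print view, as the FTD recipe does."""
    # str.replace rewrites every occurrence of ".html", not only the last one.
    return url.replace('.html', '.html?mode=print')
```

For example, `print_version('http://www.ftd.de/politik/story.html')` yields the same URL with `?mode=print` appended.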
--- a/src/calibre/ebooks/pml/pmlconverter.py
+++ b/src/calibre/ebooks/pml/pmlconverter.py
@@ -8,6 +8,7 @@ __license__   = 'GPL v3'
 __copyright__ = '2009, John Schember <john@nachtimwald.com>'
 __docformat__ = 'restructuredtext en'
 import os
 import re
 import StringIO
@@ -198,14 +199,26 @@ class PML_HTMLizer(object):
    def start_line(self):
        start = u''
        div = []
        span = []
        other = []
        for key, val in self.state.items():
            if val[0]:
-                if key in self.STATES_VALUE_REQ:
+                if key in self.DIV_STATES:
-                    start += self.STATES_TAGS[key][0] % val[1]
+                    div.append((key, val[1]))
-                elif key in self.STATES_VALUE_REQ_2:
+                elif key in self.SPAN_STATES:
-                    start += self.STATES_TAGS[key][0] % (val[1], val[1])
+                    span.append((key, val[1]))
                else:
-                    start += self.STATES_TAGS[key][0]
+                    other.append((key, val[1]))
        for key, val in other+div+span:
            if key in self.STATES_VALUE_REQ:
                start += self.STATES_TAGS[key][0] % val
            elif key in self.STATES_VALUE_REQ_2:
                start += self.STATES_TAGS[key][0] % (val, val)
            else:
                start += self.STATES_TAGS[key][0]
        return u'<p>%s' % start
@@ -518,7 +531,7 @@ class PML_HTMLizer(object):
                    elif c == 'C':
                        line.read(1)
                        id = 'pml_toc-%s' % len(self.toc)
-                        self.toc.add_item(self.file_name, id, self.code_value(line))
+                        self.toc.add_item(os.path.basename(self.file_name), id, self.code_value(line))
                        text = '<span id="%s"></span>' % id
                    elif c == 'n':
                        pass
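The `start_line` change above stops emitting opening tags in dictionary order and instead groups active states so that the remaining tags come first, then div-level tags, then span-level tags. A simplified Python 3 sketch of that ordering follows. The tag tables here are hypothetical stand-ins for the ones `PML_HTMLizer` defines, and the `STATES_VALUE_REQ_2` branch is omitted for brevity.

```python
# Hypothetical tag tables; the real PML_HTMLizer defines its own.
STATES_TAGS = {
    'c': ('<div style="text-align: center;">', '</div>'),
    'b': ('<span style="font-weight: bold;">', '</span>'),
    'a': ('<a href="#%s">', '</a>'),
}
STATES_VALUE_REQ = ['a']
DIV_STATES = ['c']
SPAN_STATES = ['b']


def start_line(state):
    """Build the opening markup for a line from the active states."""
    div, span, other = [], [], []
    for key, (active, value) in state.items():
        if not active:
            continue
        if key in DIV_STATES:
            div.append((key, value))
        elif key in SPAN_STATES:
            span.append((key, value))
        else:
            other.append((key, value))
    start = ''
    # Fixed emission order: everything else, then divs, then spans.
    for key, value in other + div + span:
        if key in STATES_VALUE_REQ:
            start += STATES_TAGS[key][0] % value
        else:
            start += STATES_TAGS[key][0]
    return '<p>%s' % start
```

Because the three groups are concatenated in a fixed order, the relative order of tag kinds no longer depends on how the state dictionary happens to iterate.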
--- a/src/calibre/gui2/catalog/catalog_epub_mobi.py
+++ b/src/calibre/gui2/catalog/catalog_epub_mobi.py
@@ -9,14 +9,22 @@ __docformat__ = 'restructuredtext en'
 from calibre.gui2 import gprefs
 from catalog_epub_mobi_ui import Ui_Form
 from calibre.ebooks.conversion.config import load_defaults
 from PyQt4.Qt import QWidget
 class PluginWidget(QWidget,Ui_Form):
    TITLE = _('EPUB/MOBI Options')
    HELP  = _('Options specific to')+' EPUB/MOBI '+_('output')
-    # Indicates whether this plugin wants its output synced to the connected device
+    OPTION_FIELDS = [('exclude_genre','\[[\w ]*\]'),
                     ('exclude_tags','~'),
                     ('read_tag','+'),
                     ('note_tag','*')]
    # Output synced to the connected device?
    sync_enabled = True
    # Formats supported by this plugin
    formats = set(['epub','mobi'])
    def __init__(self, parent=None):
@@ -26,20 +34,25 @@ class PluginWidget(QWidget,Ui_Form):
    def initialize(self, name):
        self.name = name
        # Restore options from last use here
-        print "gui2.catalog.catalog_epub_mobi:initialize(): need to restore options"
+        print "gui2.catalog.catalog_epub_mobi:initialize(): Retrieving options"
-        
+        for opt in self.OPTION_FIELDS:
-    def options(self):
+            opt_value = gprefs[self.name + '_' + opt[0]]
-        OPTION_FIELDS = ['exclude_genre','exclude_tags','read_tag','note_tag','output_profile']
+            print "Restoring %s: %s" % (self.name + '_' + opt[0], opt_value)
            setattr(self,opt[0], unicode(opt_value))
-        # Save the current options
+    def options(self):
-        print "gui2.catalog.catalog_epub_mobi:options(): need to save options"
+
-        
+        # Save/return the current options
-        # Return a dictionary with current options
+        # getattr() returns text value of QLineEdit control
-        print "gui2.catalog.catalog_epub_mobi:options(): need to return options"
+        print "gui2.catalog.catalog_epub_mobi:options(): Saving options"
        print "gui2.catalog.catalog_epub_mobi:options(): using hard-coded options"
        opts_dict = {}
-        for opt in OPTION_FIELDS:
+        for opt in self.OPTION_FIELDS:
-            opts_dict[opt] = str(getattr(self,opt).text()).split(',')
+            opt_value = unicode(getattr(self,opt[0]))
            print "writing %s to gprefs" % opt_value
            gprefs.set(self.name + '_' + opt[0], opt_value)
            opts_dict[opt[0]] = opt_value.split(',')
        opts_dict['output_profile'] = [load_defaults('page_setup')['output_profile']]
        return opts_dict
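The widget above persists each catalog option under a key built from the plugin name and the field name, and returns the options as comma-split lists. A minimal Python 3 sketch of that round trip follows, using a plain `dict` as a stand-in for `gprefs`. The helper names `restore_options` and `save_options` are hypothetical, and treating the second element of each `OPTION_FIELDS` tuple as a fallback default is an assumption: the diff does not show it being used that way.

```python
OPTION_FIELDS = [('exclude_genre', r'\[[\w ]*\]'),
                 ('exclude_tags', '~'),
                 ('read_tag', '+'),
                 ('note_tag', '*')]


def restore_options(prefs, name):
    """Read saved option strings, falling back to the assumed defaults."""
    restored = {}
    for field, default in OPTION_FIELDS:
        restored[field] = prefs.get(name + '_' + field, default)
    return restored


def save_options(prefs, name, values):
    """Store each option string and return the comma-split option lists."""
    opts = {}
    for field, _default in OPTION_FIELDS:
        value = values[field]
        prefs[name + '_' + field] = value
        opts[field] = value.split(',')
    return opts
```

Falling back to a default on a missing key keeps the first run, before anything has been saved, from failing on lookup.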
--- a/src/calibre/gui2/catalog/catalog_epub_mobi.ui
+++ b/src/calibre/gui2/catalog/catalog_epub_mobi.ui
@@ -58,19 +58,6 @@
    <string>Additional note tag prefix:</string>
   </property>
  </widget>
  <widget class="QLabel" name="label_5">
   <property name="geometry">
    <rect>
     <x>20</x>
     <y>140</y>
     <width>181</width>
     <height>17</height>
    </rect>
   </property>
   <property name="text">
    <string>Output profile:</string>
   </property>
  </widget>
  <widget class="QLineEdit" name="exclude_genre">
   <property name="geometry">
    <rect>
@@ -148,22 +135,6 @@
    <string>*</string>
   </property>
  </widget>
  <widget class="QLineEdit" name="output_profile">
   <property name="geometry">
    <rect>
     <x>300</x>
     <y>140</y>
     <width>231</width>
     <height>22</height>
    </rect>
   </property>
   <property name="toolTip">
    <string extracomment="Tooltip comment here"/>
   </property>
   <property name="text">
    <string>kindle2</string>
   </property>
  </widget>
 </widget>
 <resources/>
 <connections/>
--- a/src/calibre/gui2/dialogs/catalog.py
+++ b/src/calibre/gui2/dialogs/catalog.py
@@ -123,14 +123,14 @@ class Catalog(QDialog, Ui_Dialog):
        if self.sync.isEnabled():
            self.sync.setChecked(dynamic.get('catalog_sync_to_device', True))
-        self.format.currentIndexChanged.connect(self.format_changed)
+        self.format.currentIndexChanged.connect(self.show_plugin_tab)
        self.show_plugin_tab(None)
    def show_plugin_tab(self, idx):
        cf = unicode(self.format.currentText()).lower()
        while self.tabs.count() > 1:
-            self.tabs.remove(1)
+            self.tabs.removeTab(1)
        for pw in self.widgets:
            if cf in pw.formats:
                self.tabs.addTab(pw, pw.TITLE)
--- a/src/calibre/library/catalog.py
+++ b/src/calibre/library/catalog.py
@@ -267,7 +267,6 @@ class EPUB_MOBI(CatalogPlugin):
                          "Applies to: ePub, MOBI output formats"))
                          ] 
    class NumberToText(object):
        ''' 
        Converts numbers to text
--- a/src/calibre/manual/gui.rst
+++ b/src/calibre/manual/gui.rst
@@ -124,7 +124,7 @@ Convert e-books
 ~~~~~~~~~~~~~~~~~~~~~~
 .. |cei| image:: images/convert_ebooks.png
-|cei| Ebooks can be converted from a number of formats into the LRF format (for the SONY Reader). Note that ebooks you purchase will typically have `Digital Rights Management <http://en.wikipedia.org/wiki/Digital_rights_management>`_ *(DRM)*. |app| will not convert these ebooks. For many DRM formats, it is easy to remove the DRM, but as this is illegal, you have to find tools to liberate your books yourself and then use |app| to convert them.
+|cei| Ebooks can be converted from a number of formats into the LRF format (for the SONY Reader). Note that ebooks you purchase will typically have `Digital Rights Management <http://bugs.calibre-ebook.com/wiki/DRM>`_ *(DRM)*. |app| will not convert these ebooks. For many DRM formats, it is easy to remove the DRM, but as this is illegal, you have to find tools to liberate your books yourself and then use |app| to convert them.
 For most people, conversion should be a simple 1-click affair. But if you want to learn more about the conversion process, see :ref:`conversion`.
@@ -134,7 +134,7 @@ The :guilabel:`Convert E-books` action has three variations, accessed by the arr
    2. **Bulk convert**: This allows you to specify options only once to convert a number of ebooks in bulk.
-    3. **Set conversion defaults**: Allows you to set the default settings for future conversions.
+    3. **Create catalog**: This action allows you to generate a complete listing, with all metadata, of the books in your library, in several formats such as XML, CSV, EPUB and MOBI. The catalog will contain all the books currently showing in the library view, so you can use the search features to limit the books to be catalogued. In addition, if you select multiple books using the mouse, only those books will be added to the catalog. If you generate the catalog in an e-book format such as EPUB or MOBI, the catalog will be automatically sent to your e-book reader the next time you connect it.
 .. _view:
--- a/src/calibre/translations/ar.po
+++ b/src/calibre/translations/ar.po
--- a/src/calibre/translations/de.po
+++ b/src/calibre/translations/de.po
--- a/src/calibre/translations/el.po
+++ b/src/calibre/translations/el.po
--- a/src/calibre/translations/es.po
+++ b/src/calibre/translations/es.po
--- a/src/calibre/translations/fr.po
+++ b/src/calibre/translations/fr.po
--- a/src/calibre/translations/gl.po
+++ b/src/calibre/translations/gl.po
--- a/src/calibre/translations/it.po
+++ b/src/calibre/translations/it.po
--- a/src/calibre/translations/lv.po
+++ b/src/calibre/translations/lv.po
--- a/src/calibre/translations/nb.po
+++ b/src/calibre/translations/nb.po
--- a/src/calibre/translations/nl.po
+++ b/src/calibre/translations/nl.po
--- a/src/calibre/translations/pl.po
+++ b/src/calibre/translations/pl.po
--- a/src/calibre/translations/pt_BR.po
+++ b/src/calibre/translations/pt_BR.po
--- a/src/calibre/translations/ru.po
+++ b/src/calibre/translations/ru.po
--- a/src/calibre/translations/sq.po
+++ b/src/calibre/translations/sq.po
--- a/src/calibre/translations/sv.po
+++ b/src/calibre/translations/sv.po
--- a/src/calibre/translations/tr.po
+++ b/src/calibre/translations/tr.po
--- a/src/calibre/translations/zh_TW.po
+++ b/src/calibre/translations/zh_TW.po