Sync to trunk.

commit b81bb527b5
Author: John Schember
Date: 2012-03-17 11:06:59 -04:00

147 changed files with 101527 additions and 59242 deletions

View File

@@ -19,6 +19,127 @@
# new recipes:
#   - title:

+- version: 0.8.43
+  date: 2012-03-16
+
+  new features:
+    - title: "Template language: Speed up evaluation of general program mode templates by pre-compiling them to Python. If you experience errors with this optimization, you can turn it off via Preferences->Tweaks. Also other miscellaneous optimizations in evaluating templates with composite columns."
+    - title: "MOBI Output: Add an option to not convert all images to JPEG when creating MOBI files. For maximum compatibility of the produced MOBI files, do not use this option."
+      tickets: [954025]
+    - title: "Add an iPad 3 Output Profile"
+
+  bug fixes:
+    - title: "KF8 Input: Add support for KF8 files with obfuscated embedded fonts"
+      tickets: [953260]
+    - title: "Make the stars in the book list a little larger on Windows >= Vista"
+    - title: "Revised periodical section layout for touchscreen devices, resolving an iBooks problem with tables spanning multiple pages"
+    - title: "Read dc:contributor metadata from MOBI files"
+    - title: "MOBI Output: Fix a regression that caused the generated thumbnail embedded in calibre-produced MOBI files to be a large, low quality image instead of a small, high quality image. You would have been affected by this bug only if you directly used the output from calibre, without exporting it via send to device or save to disk."
+      tickets: [954254]
+    - title: "KF8 Input: Recognize OpenType embedded fonts as well."
+      tickets: [954728]
+    - title: "Fix regression in 0.8.41 that caused file:/// URLs to stop working in the news download system on Windows."
+      tickets: [955581]
+    - title: "When setting metadata in MOBI files, fix the cover not being updated if the MOBI file has its first image record as the cover"
+    - title: "Fix column coloring rules based on the size column not working"
+      tickets: [953737]
+
+  improved recipes:
+    - Microwaves and RF
+    - idg.se
+
+  new recipes:
+    - title: SatMagazine
+      author: kiavash
+
+- version: 0.8.42
+  date: 2012-03-12
+
+  new features:
+    - title: "Support for reading Amazon's new KF8 format"
+      type: major
+      description: "calibre can now both view and convert MOBI files that contain Amazon's new KF8 (Kindle Fire) format"
+    - title: "Add a tweak to Preferences->Tweaks to control the font size used in the book details panel"
+      tickets: [948357]
+    - title: "Allow specifying a list of file types to exclude when automatically adding files from a folder"
+      tickets: [943025]
+    - title: "Show ratings in the book details panel as stars. Also allow the user to change the alignment of the ratings column in the main book list. No longer display the stars in blue; instead their color can be customized via the column coloring rules, like any other column"
+    - title: "When setting metadata in EPUB, ensure that the <meta name=cover> tag has its name attribute first. Needed for the Nook."
+    - title: "Drivers for the Novo 7, LG G2x and Zenithink T-280"
+      tickets: [941671, 940625, 940527]
+    - title: "Update linux binaries to Qt 4.8.0"
+
+  bug fixes:
+    - title: "Fix some RAR files causing crashes on OS X (updated libunrar.dylib in the OS X build)"
+      tickets: [951185]
+    - title: "MOBI Output: Ignore the Table of Contents pointed to by the guide, if it contains no links"
+    - title: "ODT Input: Ignore margin declarations in ODT styles if more specific margin-* declarations are present"
+      tickets: [941134]
+    - title: "Conversion pipeline: Fix @import rules being ignored in CSS stylesheets that have comments on their first few lines."
+    - title: "EPUB Input: When extracting the contents of EPUB files on Windows, do not error out if one or more of the components in the EPUB file have filepaths containing characters that are invalid for the Windows filesystem; instead, just replace those characters, since those entries are likely to be errors in the zip container anyway."
+      tickets: [950081]
+    - title: "Textile Output: Fix issue with blockquotes and sentences getting removed."
+    - title: "MOBI Output: When using the prefer author sort conversion option, handle multiple authors better."
+      tickets: [947146]
+    - title: "Fix regression in 0.8.41 that broke direct connection to iDevices on Windows"
+      tickets: [944534]
+    - title: "Fix the bulk metadata download completed popup causing a crash if the Esc key is pressed."
+      tickets: [943056]
+    - title: "Fix rating values doubled in CSV/XML catalogs"
+      tickets: [942790]
+    - title: "EPUB Input: Remove non-markup documents from the spine automatically, instead of erroring out"
+    - title: "When formatting ratings in templates, etc., do not include an unnecessary .0"
+    - title: "Calibre Portable: Do not allow Calibre Portable to run if it is placed in a location whose path is too long. Also hide the library location setup in the welcome wizard when running the portable build."
+    - title: "Fix regression in 0.8.41 that broke calibre if the TMP or TEMP environment variable is set to the root of a drive."
+      tickets: [952284]
+    - title: "Fix display of ratings-type custom fields in the content server"
+      tickets: [940600]
+
+  improved recipes:
+    - La Jornada
+    - Chicago Tribune
+    - Mediapart
+    - rue89
+
+  new recipes:
+    - title: Racjonalista
+      author: Racjonlista
+    - title: JAPAA
+      author: adoucette
+
- version: 0.8.41
  date: 2012-02-24

View File

@@ -4,7 +4,7 @@ from calibre.web.feeds.news import BasicNewsRecipe

class IDGse(BasicNewsRecipe):
    title = 'IDG'
-   __author__ = 'zapt0'
+   __author__ = 'Stanislav Khromov'
    language = 'sv'
    description = 'IDG.se'
    oldest_article = 1

@@ -15,6 +15,9 @@ class IDGse(BasicNewsRecipe):

    feeds = [(u'Dagens IDG-nyheter',u'http://feeds.idg.se/idg/ETkj?format=xml')]

+   def get_article_url(self, article):
+       return article.get('guid', None)
+
    def print_version(self,url):
        return url + '?articleRenderMode=print&m=print'

View File

@@ -15,7 +15,7 @@ import re

from calibre.web.feeds.news import BasicNewsRecipe
from calibre.utils.magick import Image

-class Microwave_and_RF(BasicNewsRecipe):
+class Microwaves_and_RF(BasicNewsRecipe):

    Convert_Grayscale = False  # Convert images to gray scale or not

@@ -25,9 +25,9 @@ class Microwave_and_RF(BasicNewsRecipe):

    # Add sections that want to be included from the magazine
    include_sections = []

-   title = u'Microwave and RF'
-   __author__ = 'kiavash'
-   description = u'Microwave and RF Monthly Magazine'
+   title = u'Microwaves and RF'
+   __author__ = u'kiavash'
+   description = u'Microwaves and RF Monthly Magazine'
    publisher = 'Penton Media, Inc.'
    publication_type = 'magazine'
    site = 'http://mwrf.com'

@@ -96,9 +96,16 @@ class Microwave_and_RF(BasicNewsRecipe):

    def parse_index(self):

-       # Fetches the main page of Microwave and RF
+       # Fetches the main page of Microwaves and RF
        soup = self.index_to_soup(self.site)

+       # First page has the ad, let's find the redirect address.
+       url = soup.find('span', attrs={'class':'commonCopy'}).find('a').get('href')
+       if url.startswith('/'):
+           url = self.site + url
+
+       soup = self.index_to_soup(url)
+
        # Searches the site for Issue ID link then returns the href address
        # pointing to the latest issue
        latest_issue = soup.find('a', attrs={'href':lambda x: x and 'IssueID' in x}).get('href')

recipes/satmagazine.recipe Normal file
View File

@@ -0,0 +1,155 @@
#!/usr/bin/env python
##
## Title:        SatMagazine
##
## License:      GNU General Public License v3 - http://www.gnu.org/copyleft/gpl.html
##
## Written:      Feb 2012
## Last Edited:  Mar 2012
##
# Feb 2012: Initial release

__license__ = 'GNU General Public License v3 - http://www.gnu.org/copyleft/gpl.html'

'''
satmagazine.com
'''

import re
from calibre.web.feeds.news import BasicNewsRecipe

class SatMagazine(BasicNewsRecipe):

    title = u'SatMagazine'
    description = u'North American Satellite Markets...'
    publisher = 'Satnews Publishers'
    publication_type = 'magazine'
    INDEX = 'http://www.satmagazine.com/cgi-bin/display_edition.cgi'

    __author__ = 'kiavash'
    language = 'en'

    asciiize = True
    timeout = 120
    simultaneous_downloads = 2

    # Flattens all the tables to make it compatible with Nook
    conversion_options = {'linearize_tables': True}

    keep_only_tags = [dict(name='span', attrs={'class':'story'})]

    no_stylesheets = True
    remove_javascript = True

    remove_attributes = ['border', 'cellspacing', 'align', 'cellpadding', 'colspan',
            'valign', 'vspace', 'hspace', 'alt', 'width', 'height']

    # Specify extra CSS - overrides ALL other CSS (i.e. added last).
    extra_css = 'body { font-family: verdana, helvetica, sans-serif; } \
                 .introduction, .first { font-weight: bold; } \
                 .cross-head { font-weight: bold; font-size: 125%; } \
                 .cap, .caption { display: block; font-size: 80%; font-style: italic; } \
                 .cap, .caption, .caption img, .caption span { display: block; margin: 5px auto; } \
                 .byl, .byd, .byline img, .byline-name, .byline-title, .author-name, .author-position, \
                    .correspondent-portrait img, .byline-lead-in, .name, .bbc-role { display: block; \
                    font-size: 80%; font-style: italic; margin: 1px auto; } \
                 .story-date, .published { font-size: 80%; } \
                 table { width: 100%; } \
                 td img { display: block; margin: 5px auto; } \
                 ul { padding-top: 10px; } \
                 ol { padding-top: 10px; } \
                 li { padding-top: 5px; padding-bottom: 5px; } \
                 h1 { font-size: 175%; font-weight: bold; } \
                 h2 { font-size: 150%; font-weight: bold; } \
                 h3 { font-size: 125%; font-weight: bold; } \
                 h4, h5, h6 { font-size: 100%; font-weight: bold; }'

    # Remove the line breaks, href links and float left/right and picture width/height.
    preprocess_regexps = [(re.compile(r'<br[ ]*/>', re.IGNORECASE), lambda m: ''),
                          (re.compile(r'<br[ ]*clear.*/>', re.IGNORECASE), lambda m: ''),
                          (re.compile(r'<a.*?>'), lambda h1: ''),
                          (re.compile(r'</a>'), lambda h2: ''),
                          (re.compile(r'float:.*?'), lambda h3: ''),
                          (re.compile(r'width:.*?px'), lambda h4: ''),
                          (re.compile(r'height:.*?px'), lambda h5: '')
                         ]

    def parse_index(self):

        article_info = []
        feeds = []

        soup = self.index_to_soup(self.INDEX)

        # Find Cover image
        cover = soup.find('img', src=True, alt='Cover Image')
        if cover is not None:
            self.cover_url = cover['src']
            self.log('Found Cover image:', self.cover_url)

        soup = soup.find('div', attrs={'id':'middlecontent'})  # main part of the site that has the articles

        # Find the Magazine date
        ts = soup.find('span', attrs={'class':'master_heading'})  # contains the string with the date
        ds = ' '.join(self.tag_to_string(ts).strip().split()[:2])
        self.log('Found Current Issue:', ds)
        self.timefmt = ' [%s]'%ds

        #sections = soup.findAll('span', attrs={'class':'upper_heading'})
        articles = soup.findAll('span', attrs={'class':'heading'})
        descriptions = soup.findAll('span', attrs={'class':'story'})

        title_number = 0

        # Goes through all the articles one by one and sorts them out
        for article in articles:
            title = self.tag_to_string(article)
            url = article.find('a').get('href')

            self.log('\tFound article:', title, 'at', url)
            desc = self.tag_to_string(descriptions[title_number])
            #self.log('\t\t', desc)

            article_info.append({'title':title, 'url':url, 'description':desc,
                'date':self.timefmt})

            title_number = title_number + 1

        if article_info:
            feeds.append((self.title, article_info))

        return feeds

    def preprocess_html(self, soup):

        # Finds all the images
        for figure in soup.findAll('img', attrs={'src': True}):

            # if the image is an ad then remove it.
            if (figure['alt'].find('_ad_') >= 0) or (figure['alt'].find('_snipe_') >= 0):
                del figure['src']
                del figure['alt']
                del figure['border']
                del figure['hspace']
                del figure['vspace']
                del figure['align']
                del figure['size']
                figure.name = 'font'
                continue

            figure['style'] = 'display:block'  # adds \n before and after the image

        # Makes the title stand out
        for title in soup.findAll('b'):
            title.name = 'h3'

        # Removes all unrelated links
        for link in soup.findAll('a', attrs={'href': True}):
            link.name = 'font'
            del link['href']
            del link['target']

        return soup

View File

@@ -11,7 +11,7 @@ class Sueddeutsche(BasicNewsRecipe):

    title = u'Süddeutsche.de'  # 2012-01-26 AGe Correct Title
    description = 'News from Germany, Access to online content'  # 2012-01-26 AGe
    __author__ = 'Oliver Niesner and Armin Geller'  # Update AGe 2012-01-26
-   publisher = 'Süddeutsche Zeitung'  # 2012-01-26 AGe add
+   publisher = u'Süddeutsche Zeitung'  # 2012-01-26 AGe add
    category = 'news, politics, Germany'  # 2012-01-26 AGe add
    timefmt = ' [%a, %d %b %Y]'  # 2012-01-26 AGe add %a
    oldest_article = 7

View File

@@ -9,10 +9,10 @@ from calibre.web.feeds.news import BasicNewsRecipe
from calibre import strftime

class SueddeutcheZeitung(BasicNewsRecipe):
-   title = 'Süddeutsche Zeitung'
+   title = u'Süddeutsche Zeitung'
    __author__ = 'Darko Miletic'
    description = 'News from Germany. Access to paid content.'
-   publisher = 'Süddeutsche Zeitung'
+   publisher = u'Süddeutsche Zeitung'
    category = 'news, politics, Germany'
    no_stylesheets = True
    oldest_article = 2

View File

@@ -502,3 +502,13 @@ tweak_book_prefer = 'epub'
# negative number to increase or decrease the font size.
change_book_details_font_size_by = 0

+#: Compile General Program Mode templates to Python
+# Compiled general program mode templates are significantly faster than
+# interpreted templates. Setting this tweak to True causes calibre to compile
+# (in most cases) general program mode templates. Setting it to False causes
+# calibre to use the old behavior -- interpreting the templates. Set the tweak
+# to False if some compiled templates produce incorrect values.
+# Default:    compile_gpm_templates = True
+# No compile: compile_gpm_templates = False
+compile_gpm_templates = True
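For context, general program mode templates are the ones whose body begins with program:. A minimal illustrative template of the kind this tweak pre-compiles (the fields used here are just examples, not part of this commit) looks like:

    program:
        strcat(field('author_sort'), ' - ', field('title'))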

Binary file not shown (image changed: 30 KiB before, 85 KiB after).

View File

@@ -14,7 +14,7 @@ from setup.build_environment import msvc, MT, RC
from setup.installer.windows.wix import WixMixIn

OPENSSL_DIR = r'Q:\openssl'
-QT_DIR = 'Q:\\Qt\\4.7.3'
+QT_DIR = 'Q:\\Qt\\4.8.0'
QT_DLLS = ['Core', 'Gui', 'Network', 'Svg', 'WebKit', 'Xml', 'XmlPatterns']
LIBUNRAR = 'C:\\Program Files\\UnrarDLL\\unrar.dll'
SW = r'C:\cygwin\home\kovid\sw'

View File

@@ -97,7 +97,9 @@ Now, run configure and make::

-no-plugin-manifests is needed so that loading the plugins does not fail looking for the CRT assembly

-configure -opensource -release -qt-zlib -qt-gif -qt-libmng -qt-libpng -qt-libtiff -qt-libjpeg -release -platform win32-msvc2008 -no-qt3support -webkit -xmlpatterns -no-phonon -no-style-plastique -no-style-cleanlooks -no-style-motif -no-style-cde -no-declarative -no-scripttools -no-audio-backend -no-multimedia -no-dbus -no-openvg -no-opengl -no-qt3support -confirm-license -nomake examples -nomake demos -nomake docs -no-plugin-manifests -openssl -I Q:\openssl\include -L Q:\openssl\lib && nmake
+configure -opensource -release -qt-zlib -qt-libmng -qt-libpng -qt-libtiff -qt-libjpeg -release -platform win32-msvc2008 -no-qt3support -webkit -xmlpatterns -no-phonon -no-style-plastique -no-style-cleanlooks -no-style-motif -no-style-cde -no-declarative -no-scripttools -no-audio-backend -no-multimedia -no-dbus -no-openvg -no-opengl -no-qt3support -confirm-license -nomake examples -nomake demos -nomake docs -no-plugin-manifests -openssl -I Q:\openssl\include -L Q:\openssl\lib && nmake
+
+Add the path to the bin folder inside the Qt dir to your system PATH.

SIP
-----

View File

@@ -18,14 +18,14 @@ msgstr ""
"Report-Msgid-Bugs-To: Debian iso-codes team <pkg-isocodes-"
"devel@lists.alioth.debian.org>\n"
"POT-Creation-Date: 2011-11-25 14:01+0000\n"
-"PO-Revision-Date: 2012-01-14 02:30+0000\n"
+"PO-Revision-Date: 2012-03-05 19:08+0000\n"
-"Last-Translator: Wolfgang Rohdewald <wolfgang@rohdewald.de>\n"
+"Last-Translator: Dennis Baudys <Unknown>\n"
"Language-Team: German <debian-l10n-german@lists.debian.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
-"X-Launchpad-Export-Date: 2012-01-15 05:18+0000\n"
+"X-Launchpad-Export-Date: 2012-03-06 04:47+0000\n"
-"X-Generator: Launchpad (build 14664)\n"
+"X-Generator: Launchpad (build 14900)\n"
"Language: de\n"

#. name for aaa

@@ -5871,7 +5871,7 @@ msgstr ""

#. name for cym
msgid "Welsh"
-msgstr "Kymrisch"
+msgstr "Walisisch"

#. name for cyo
msgid "Cuyonon"

File diff suppressed because it is too large.

View File

@@ -8,14 +8,14 @@ msgstr ""
"Project-Id-Version: calibre\n"
"Report-Msgid-Bugs-To: FULL NAME <EMAIL@ADDRESS>\n"
"POT-Creation-Date: 2011-11-25 14:01+0000\n"
-"PO-Revision-Date: 2011-12-17 09:29+0000\n"
+"PO-Revision-Date: 2012-03-11 10:13+0000\n"
"Last-Translator: Jellby <Unknown>\n"
"Language-Team: Spanish <es@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
-"X-Launchpad-Export-Date: 2011-12-18 04:37+0000\n"
+"X-Launchpad-Export-Date: 2012-03-12 04:38+0000\n"
-"X-Generator: Launchpad (build 14525)\n"
+"X-Generator: Launchpad (build 14933)\n"

#. name for aaa
msgid "Ghotuo"

@@ -1779,7 +1779,7 @@ msgstr "Awiyaana"

#. name for auz
msgid "Arabic; Uzbeki"
-msgstr "Árabe uzbeco"
+msgstr "Árabe uzbeko"

#. name for ava
msgid "Avaric"

@@ -22207,7 +22207,7 @@ msgstr "Roglai septentrional"

#. name for roh
msgid "Romansh"
-msgstr ""
+msgstr "Romanche"

#. name for rol
msgid "Romblomanon"

@@ -22607,7 +22607,7 @@ msgstr ""

#. name for sci
msgid "Creole Malay; Sri Lankan"
-msgstr "Malo criollo de Sri Lanka"
+msgstr "Malayo criollo de Sri Lanka"

#. name for sck
msgid "Sadri"

@@ -26987,15 +26987,15 @@ msgstr ""

#. name for uzb
msgid "Uzbek"
-msgstr "Uzbeco"
+msgstr "Uzbeko"

#. name for uzn
msgid "Uzbek; Northern"
-msgstr "Uzbeco septentrional"
+msgstr "Uzbeko septentrional"

#. name for uzs
msgid "Uzbek; Southern"
-msgstr "Uzbeco meridional"
+msgstr "Uzbeko meridional"

#. name for vaa
msgid "Vaagri Booli"

@@ -30319,7 +30319,7 @@ msgstr ""

#. name for zhn
msgid "Zhuang; Nong"
-msgstr "Zhuang nong"
+msgstr "Chuang nong"

#. name for zho
msgid "Chinese"

View File

@@ -9,67 +9,67 @@ msgstr ""
"Report-Msgid-Bugs-To: Debian iso-codes team <pkg-isocodes-"
"devel@lists.alioth.debian.org>\n"
"POT-Creation-Date: 2011-11-25 14:01+0000\n"
-"PO-Revision-Date: 2011-09-27 15:37+0000\n"
+"PO-Revision-Date: 2012-03-06 13:55+0000\n"
-"Last-Translator: Piarres Beobide <pi@beobide.net>\n"
+"Last-Translator: Asier Iturralde Sarasola <Unknown>\n"
"Language-Team: Euskara <itzulpena@comtropos.com>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
-"X-Launchpad-Export-Date: 2011-11-26 05:07+0000\n"
+"X-Launchpad-Export-Date: 2012-03-07 05:12+0000\n"
-"X-Generator: Launchpad (build 14381)\n"
+"X-Generator: Launchpad (build 14907)\n"
"Language: eu\n"

#. name for aaa
msgid "Ghotuo"
-msgstr ""
+msgstr "Ghotuo"

#. name for aab
msgid "Alumu-Tesu"
-msgstr ""
+msgstr "Alumu-Tesu"

#. name for aac
msgid "Ari"
-msgstr ""
+msgstr "Ari"

#. name for aad
msgid "Amal"
-msgstr ""
+msgstr "Amal"

#. name for aae
msgid "Albanian; Arbëreshë"
-msgstr ""
+msgstr "Albaniera; Arbëreshë"

#. name for aaf
msgid "Aranadan"
-msgstr ""
+msgstr "Aranadan"

#. name for aag
msgid "Ambrak"
-msgstr ""
+msgstr "Ambrak"

#. name for aah
msgid "Arapesh; Abu'"
-msgstr ""
+msgstr "Arapesh; Abu'"

#. name for aai
msgid "Arifama-Miniafia"
-msgstr ""
+msgstr "Arifama-Miniafia"

#. name for aak
msgid "Ankave"
-msgstr ""
+msgstr "Ankave"

#. name for aal
msgid "Afade"
-msgstr ""
+msgstr "Afade"

#. name for aam
msgid "Aramanik"
-msgstr ""
+msgstr "Aramanik"

#. name for aan
msgid "Anambé"
-msgstr ""
+msgstr "Anambé"

#. name for aao
msgid "Arabic; Algerian Saharan"

@@ -77,107 +77,107 @@ msgstr ""

#. name for aap
msgid "Arára; Pará"
-msgstr ""
+msgstr "Arára; Pará"

#. name for aaq
msgid "Abnaki; Eastern"
-msgstr ""
+msgstr "Abnaki; Ekialdekoa"

#. name for aar
msgid "Afar"
-msgstr ""
+msgstr "Afarera"

#. name for aas
msgid "Aasáx"
-msgstr ""
+msgstr "Aasáx"

#. name for aat
msgid "Albanian; Arvanitika"
-msgstr ""
+msgstr "Albaniera; Arvanitika"

#. name for aau
msgid "Abau"
-msgstr ""
+msgstr "Abau"

#. name for aaw
msgid "Solong"
-msgstr ""
+msgstr "Solong"

#. name for aax
msgid "Mandobo Atas"
-msgstr ""
+msgstr "Mandobo Atas"

#. name for aaz
msgid "Amarasi"
-msgstr ""
+msgstr "Amarasi"

#. name for aba
msgid "Abé"
-msgstr ""
+msgstr "Abé"

#. name for abb
msgid "Bankon"
-msgstr ""
+msgstr "Bankon"

#. name for abc
msgid "Ayta; Ambala"
-msgstr ""
+msgstr "Ayta; Ambala"

#. name for abd
msgid "Manide"
-msgstr ""
+msgstr "Manide"

#. name for abe
msgid "Abnaki; Western"
-msgstr ""
+msgstr "Abnaki; Mendebaldekoa"

#. name for abf
msgid "Abai Sungai"
-msgstr ""
+msgstr "Abai Sungai"

#. name for abg
msgid "Abaga"
-msgstr ""
+msgstr "Abaga"

#. name for abh
msgid "Arabic; Tajiki"
-msgstr ""
+msgstr "Arabiera; Tajiki"

#. name for abi
msgid "Abidji"
-msgstr ""
+msgstr "Abidji"

#. name for abj
msgid "Aka-Bea"
-msgstr ""
+msgstr "Aka-Bea"

#. name for abk
msgid "Abkhazian"
-msgstr ""
+msgstr "Abkhazera"

#. name for abl
msgid "Lampung Nyo"
-msgstr ""
+msgstr "Lampung Nyo"

#. name for abm
msgid "Abanyom"
-msgstr ""
+msgstr "Abanyom"

#. name for abn
msgid "Abua"
-msgstr ""
+msgstr "Abua"

#. name for abo
msgid "Abon"
-msgstr ""
+msgstr "Abon"

#. name for abp
msgid "Ayta; Abellen"
-msgstr ""
+msgstr "Ayta; Abellen"

#. name for abq
msgid "Abaza"
-msgstr ""
+msgstr "Abazera"

#. name for abr
msgid "Abron"
File diff suppressed because it is too large.

View File

@@ -8,119 +8,119 @@ msgstr ""
"Report-Msgid-Bugs-To: Debian iso-codes team <pkg-isocodes-"
"devel@lists.alioth.debian.org>\n"
"POT-Creation-Date: 2011-11-25 14:01+0000\n"
-"PO-Revision-Date: 2011-09-27 15:42+0000\n"
+"PO-Revision-Date: 2012-03-14 21:30+0000\n"
-"Last-Translator: Kovid Goyal <Unknown>\n"
+"Last-Translator: Иван Старчевић <ivanstar61@gmail.com>\n"
"Language-Team: Serbian <gnu@prevod.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
-"X-Launchpad-Export-Date: 2011-11-26 05:36+0000\n"
+"X-Launchpad-Export-Date: 2012-03-15 04:45+0000\n"
-"X-Generator: Launchpad (build 14381)\n"
+"X-Generator: Launchpad (build 14933)\n"
"Language: sr\n"

#. name for aaa
msgid "Ghotuo"
-msgstr ""
+msgstr "Готуо"

#. name for aab
msgid "Alumu-Tesu"
-msgstr ""
+msgstr "Алуму-Тесу"

#. name for aac
msgid "Ari"
-msgstr ""
+msgstr "Ари"

#. name for aad
msgid "Amal"
-msgstr ""
+msgstr "Амал"

#. name for aae
msgid "Albanian; Arbëreshë"
-msgstr ""
+msgstr "Албански; Арбереше"

#. name for aaf
msgid "Aranadan"
-msgstr ""
+msgstr "Аранадан"

#. name for aag
msgid "Ambrak"
-msgstr ""
+msgstr "Амбрак"

#. name for aah
msgid "Arapesh; Abu'"
-msgstr ""
+msgstr "Арабеш; Абу'"

#. name for aai
msgid "Arifama-Miniafia"
-msgstr ""
+msgstr "Арифама-Миниафиа"

#. name for aak
msgid "Ankave"
-msgstr ""
+msgstr "Анкаве"

#. name for aal
msgid "Afade"
-msgstr ""
+msgstr "Афаде"

#. name for aam
msgid "Aramanik"
-msgstr ""
+msgstr "Араманик"

#. name for aan
msgid "Anambé"
-msgstr ""
+msgstr "Анамбе"

#. name for aao
msgid "Arabic; Algerian Saharan"
-msgstr ""
+msgstr "Арапски; Алжирска Сахара"

#. name for aap
msgid "Arára; Pará"
-msgstr ""
+msgstr "Арара;Пара"

#. name for aaq
msgid "Abnaki; Eastern"
-msgstr ""
+msgstr "Абнаки;Источни"

#. name for aar
msgid "Afar"
-msgstr "афар"
+msgstr "Афар"

#. name for aas
msgid "Aasáx"
-msgstr ""
+msgstr "Асакс"

#. name for aat
msgid "Albanian; Arvanitika"
-msgstr ""
+msgstr "Албански (арванитска)"

#. name for aau
msgid "Abau"
-msgstr ""
+msgstr "Абау"

#. name for aaw
msgid "Solong"
-msgstr ""
+msgstr "Солонг"

#. name for aax
msgid "Mandobo Atas"
-msgstr ""
+msgstr "Мандобо Атас"

#. name for aaz
msgid "Amarasi"
-msgstr ""
+msgstr "Амараси"

#. name for aba
msgid "Abé"
-msgstr ""
+msgstr "Абе"

#. name for abb
msgid "Bankon"
-msgstr ""
+msgstr "Банкон"

#. name for abc
msgid "Ayta; Ambala"
-msgstr ""
+msgstr "Аита;Амбала"

#. name for abd
msgid "Manide"

@@ -128,235 +128,235 @@ msgstr ""

#. name for abe
msgid "Abnaki; Western"
-msgstr ""
+msgstr "Абнаки; Западни"

#. name for abf
msgid "Abai Sungai"
-msgstr ""
+msgstr "Абаи Сунгаи"

#. name for abg
msgid "Abaga"
-msgstr ""
+msgstr "Абага"

#. name for abh
msgid "Arabic; Tajiki"
-msgstr ""
+msgstr "Арапски; Таџики"

#. name for abi
msgid "Abidji"
-msgstr ""
+msgstr "Абиџи"

#. name for abj
msgid "Aka-Bea"
-msgstr ""
+msgstr "Ака-Беа"

#. name for abk
msgid "Abkhazian"
-msgstr "абкаски"
+msgstr "Абхазијски"

#. name for abl
msgid "Lampung Nyo"
-msgstr ""
+msgstr "Лампунг Нио"

#. name for abm
msgid "Abanyom"
-msgstr ""
+msgstr "Абањјом"

#. name for abn
msgid "Abua"
-msgstr ""
+msgstr "Абуа"

#. name for abo
msgid "Abon"
-msgstr ""
+msgstr "Абон"

#. name for abp
msgid "Ayta; Abellen"
-msgstr ""
+msgstr "Ајта (абелијска)"

#. name for abq
msgid "Abaza"
-msgstr ""
+msgstr "Абаза"

#. name for abr
msgid "Abron"
-msgstr ""
+msgstr "Аброн"

#. name for abs
msgid "Malay; Ambonese"
-msgstr ""
+msgstr "Малајски; Амбонијски"

#. name for abt
msgid "Ambulas"
-msgstr ""
+msgstr "Амбулас"

#. name for abu
msgid "Abure"
-msgstr ""
+msgstr "Абуре"

#. name for abv
msgid "Arabic; Baharna"
-msgstr ""
+msgstr "Арапски (Бахреин)"

#. name for abw
msgid "Pal"
-msgstr ""
+msgstr "Пал"

#. name for abx
msgid "Inabaknon"
-msgstr ""
+msgstr "Инабакнон"

#. name for aby
msgid "Aneme Wake"
-msgstr ""
+msgstr "Анем Ваке"

#. name for abz
msgid "Abui"
-msgstr ""
+msgstr "Абуи"

#. name for aca
msgid "Achagua"
-msgstr ""
+msgstr "Ачагуа"

#. name for acb
msgid "Áncá"
-msgstr ""
+msgstr "Анка"

#. name for acd
msgid "Gikyode"
-msgstr ""
+msgstr "Гикиод"

#. name for ace
msgid "Achinese"
-msgstr "акинески"
+msgstr "Акинески"

#. name for acf
msgid "Creole French; Saint Lucian"
-msgstr ""
+msgstr "Креолски француски; Сент Лусија"

#. name for ach
msgid "Acoli"
-msgstr "аколи"
+msgstr "Аколи"

#. name for aci
msgid "Aka-Cari"
-msgstr ""
+msgstr "Ака-Кари"

#. name for ack
msgid "Aka-Kora"
-msgstr ""
+msgstr "Ака-Кора"

#. name for acl
msgid "Akar-Bale"
-msgstr ""
+msgstr "Акар-Бале"

#. name for acm
msgid "Arabic; Mesopotamian"
-msgstr ""
+msgstr "Арапски (Месопотамија)"

#. name for acn
msgid "Achang"
-msgstr ""
+msgstr "Ачанг"

#. name for acp
msgid "Acipa; Eastern"
-msgstr ""
+msgstr "Акипа;Источни"

#. name for acq
msgid "Arabic; Ta'izzi-Adeni"
-msgstr ""
+msgstr "Арапски; Северни Јемен"

#. name for acr
msgid "Achi"
-msgstr ""
+msgstr "Ачи"

#. name for acs
msgid "Acroá"
-msgstr ""
+msgstr "Акроа"

#. name for act
msgid "Achterhoeks"
-msgstr ""
+msgstr "Ахтерхекс"

#. name for acu
msgid "Achuar-Shiwiar"
-msgstr ""
+msgstr "Ачуар-Шивиар"

#. name for acv
msgid "Achumawi"
-msgstr ""
+msgstr "Ачумави"

#. name for acw
msgid "Arabic; Hijazi"
-msgstr ""
+msgstr "Арапски;Хиџази"

#. name for acx
msgid "Arabic; Omani"
-msgstr ""
+msgstr "Арапски;Оман"

#. name for acy
msgid "Arabic; Cypriot"
-msgstr ""
+msgstr "Арапски;Кипар"

#. name for acz
msgid "Acheron"
-msgstr ""
+msgstr "Ачерон"

#. name for ada
msgid "Adangme"
-msgstr "адангме"
+msgstr "Адангме"

#. name for adb
msgid "Adabe"
-msgstr ""
+msgstr "Адабе"

#. name for add
msgid "Dzodinka"
-msgstr ""
+msgstr "Ђодинка"

#. name for ade
msgid "Adele"
-msgstr ""
+msgstr "Аделе"

#. name for adf
msgid "Arabic; Dhofari"
-msgstr ""
+msgstr "Арапски;Дофари"

#. name for adg
msgid "Andegerebinha"
-msgstr ""
+msgstr "Андегеребина"

#. name for adh
msgid "Adhola"
-msgstr ""
+msgstr "Адола"

#. name for adi
msgid "Adi"
-msgstr ""
+msgstr "Ади"

#. name for adj
msgid "Adioukrou"
-msgstr ""
+msgstr "Адиокру"

#. name for adl
msgid "Galo"
-msgstr ""
+msgstr "Гало"

#. name for adn
msgid "Adang"
-msgstr ""
+msgstr "Аданг"

#. name for ado
msgid "Abu"
-msgstr ""
+msgstr "Абу"

#. name for adp
msgid "Adap"
-msgstr ""
+msgstr "Адап"

#. name for adq
msgid "Adangbe"
-msgstr ""
+msgstr "Адангбе"

#. name for adr
msgid "Adonara"

@@ -364,59 +364,59 @@ msgstr ""

#. name for ads
msgid "Adamorobe Sign Language"
-msgstr ""
+msgstr "Адамороб знаковни језик"

#. name for adt
msgid "Adnyamathanha"
-msgstr ""
+msgstr "Адњаматана"

#. name for adu
msgid "Aduge"
-msgstr ""
+msgstr "Адуге"

#. name for adw
msgid "Amundava"
-msgstr ""
+msgstr "Амундава"

#. name for adx
msgid "Tibetan; Amdo"
-msgstr ""
+msgstr "Тибетански;Амдо"

#. name for ady
msgid "Adyghe"
-msgstr ""
+msgstr "Адиге"

#. name for adz
msgid "Adzera"
-msgstr ""
+msgstr "Адзера"

#. name for aea
msgid "Areba"
-msgstr ""
+msgstr "Ареба"

#. name for aeb
msgid "Arabic; Tunisian"
-msgstr ""
+msgstr "Арапски;Туниски"

#. name for aec
msgid "Arabic; Saidi"
-msgstr ""
+msgstr "Арапски (Горњи Египат)"

#. name for aed
msgid "Argentine Sign Language"
-msgstr ""
+msgstr "Аргентински знаковни језик"

#. name for aee
msgid "Pashayi; Northeast"
-msgstr ""
+msgstr "Пашаи (североисточни)"

#. name for aek
msgid "Haeke"
-msgstr ""
+msgstr "Хаеке"

#. name for ael
msgid "Ambele"
-msgstr ""
+msgstr "Амбеле"

#. name for aem
msgid "Arem"

@@ -460,15 +460,15 @@ msgstr ""

#. name for afd
msgid "Andai"
-msgstr ""
+msgstr "Андаи"

#. name for afe
msgid "Putukwam"
-msgstr ""
+msgstr "Путуквам"

#. name for afg
msgid "Afghan Sign Language"
-msgstr ""
+msgstr "Афганистански знаковни језик"

#. name for afh
msgid "Afrihili"

@@ -476,7 +476,7 @@ msgstr "африхили"

#. name for afi
msgid "Akrukay"
-msgstr ""
+msgstr "Акрукај"

#. name for afk
msgid "Nanubae"

@@ -484,15 +484,15 @@ msgstr ""

#. name for afn
msgid "Defaka"
-msgstr ""
+msgstr "Дефака"

#. name for afo
msgid "Eloyi"
-msgstr ""
+msgstr "Елоји"

#. name for afp
msgid "Tapei"
-msgstr ""
+msgstr "Тапеи"

#. name for afr
msgid "Afrikaans"

@@ -500,51 +500,51 @@ msgstr "африканс"

#. name for afs
msgid "Creole; Afro-Seminole"
-msgstr ""
+msgstr "Креолски;Афричко-Семинолслки"

#. name for aft
msgid "Afitti"
-msgstr ""
+msgstr "Афити"

#. name for afu
msgid "Awutu"
-msgstr ""
+msgstr "Авуту"

#. name for afz
msgid "Obokuitai"
-msgstr ""
+msgstr "Обокуитаи"

#. name for aga
msgid "Aguano"
-msgstr ""
+msgstr "Агвано"

#. name for agb
msgid "Legbo"
-msgstr ""
+msgstr "Легбо"

#. name for agc
msgid "Agatu"
-msgstr ""
+msgstr "Агату"

#. name for agd
msgid "Agarabi"
-msgstr ""
+msgstr "Агараби"

#. name for age
msgid "Angal"
-msgstr ""
+msgstr "Ангал"

#. name for agf
msgid "Arguni"
-msgstr ""
+msgstr "Аргуни"

#. name for agg
msgid "Angor"
-msgstr ""
+msgstr "Ангор"

#. name for agh
msgid "Ngelima"
-msgstr ""
+msgstr "Нгелима"

#. name for agi
msgid "Agariya"

@@ -588,15 +588,15 @@ msgstr ""

#. name for agt
msgid "Agta; Central Cagayan"
-msgstr ""
+msgstr "Агта;Централно Кагајански"

#. name for agu
msgid "Aguacateco"
-msgstr ""
+msgstr "Агвакатеко"

#. name for agv
msgid "Dumagat; Remontado"
-msgstr ""
+msgstr "Думагат;Ремонтадо"

#. name for agw
msgid "Kahua"

@@ -604,27 +604,27 @@ msgstr ""

#. name for agx
msgid "Aghul"
-msgstr ""
+msgstr "Агхул"

#. name for agy
msgid "Alta; Southern"
-msgstr ""
+msgstr "Алта;Јужни"

#. name for agz
msgid "Agta; Mt. Iriga"
-msgstr ""
+msgstr "Агта;Мт.Ирига"

#. name for aha
msgid "Ahanta"
-msgstr ""
+msgstr "Аханта"

#. name for ahb
msgid "Axamb"
-msgstr ""
+msgstr "Аксамб"

#. name for ahg
msgid "Qimant"
-msgstr ""
+msgstr "Кимант"

#. name for ahh
msgid "Aghu"

@@ -668,95 +668,95 @@ msgstr ""

#. name for aht
msgid "Ahtena"
-msgstr ""
+msgstr "Ахтена"

#. name for aia
msgid "Arosi"
-msgstr ""
+msgstr "Ароси"

#. name for aib
msgid "Ainu (China)"
-msgstr ""
+msgstr "Аину(Кина)"

#. name for aic
msgid "Ainbai"
-msgstr ""
+msgstr "Аинбаи"

#. name for aid
msgid "Alngith"
-msgstr ""
+msgstr "Алнгит"

#. name for aie
msgid "Amara"
-msgstr ""
+msgstr "Амара"

#. name for aif
msgid "Agi"
-msgstr ""
+msgstr "Аги"

#. name for aig
msgid "Creole English; Antigua and Barbuda"
-msgstr ""
+msgstr "Креолски Енглески;Антигва и Барбуда"

#. name for aih
msgid "Ai-Cham"
-msgstr ""
+msgstr "Аи-Чам"

#. name for aii
msgid "Neo-Aramaic; Assyrian"
-msgstr ""
+msgstr "Ново-Арамејски;Асирски"

#. name for aij
msgid "Lishanid Noshan"
-msgstr ""
+msgstr "Лианид Ношан"

#. name for aik
msgid "Ake"
-msgstr ""
+msgstr "Аке"

#. name for ail
msgid "Aimele"
-msgstr ""
+msgstr "Ајмеле"

#. name for aim
msgid "Aimol"
-msgstr ""
+msgstr "Ајмол"

#. name for ain
msgid "Ainu (Japan)"
-msgstr ""
+msgstr "Аину(Јапан)"

#. name for aio
msgid "Aiton"
-msgstr ""
+msgstr "Аитон"

#. name for aip
msgid "Burumakok"
-msgstr ""
+msgstr "Бурумакок"

#. name for aiq
msgid "Aimaq"
-msgstr ""
+msgstr "Ајмак"

#. name for air
msgid "Airoran"
-msgstr ""
+msgstr "Ајроран"

#. name for ais
msgid "Amis; Nataoran"
-msgstr ""
+msgstr "Амис;Натаоран"

#. name for ait
msgid "Arikem"
-msgstr ""
+msgstr "Арикем"

#. name for aiw
msgid "Aari"
-msgstr ""
+msgstr "Аари"

#. name for aix
msgid "Aighon"
-msgstr ""
+msgstr "Аигхон"

#. name for aiy
msgid "Ali"

@@ -764,35 +764,35 @@ msgstr ""

#. name for aja
msgid "Aja (Sudan)"
-msgstr ""
+msgstr "Аја(Судан)"

#. name for ajg
msgid "Aja (Benin)"
-msgstr ""
+msgstr "Аја(Бенин)"

#. name for aji
msgid "Ajië"
-msgstr ""
+msgstr "Ајие"

#. name for ajp
msgid "Arabic; South Levantine"
-msgstr ""
+msgstr "Арапски;Јужно-Левантински"

#. name for ajt
msgid "Arabic; Judeo-Tunisian"
-msgstr ""
+msgstr "Арапски;Јудео-Туниски"

#. name for aju
msgid "Arabic; Judeo-Moroccan"
-msgstr ""
+msgstr "Арапски;Јудео-Марокански"

#. name for ajw
msgid "Ajawa"
-msgstr ""
+msgstr "Ајава"

#. name for ajz
msgid "Karbi; Amri"
-msgstr ""
+msgstr "Карби;Амри"

#. name for aka
msgid "Akan"

@@ -800,35 +800,35 @@ msgstr "акан"

#. name for akb
msgid "Batak Angkola"
-msgstr ""
+msgstr "Батак Ангкола"

#. name for akc
msgid "Mpur"
-msgstr ""
+msgstr "Мпур"

#. name for akd
msgid "Ukpet-Ehom"
-msgstr ""
+msgstr "Укпет-Ехом"

#. name for ake
msgid "Akawaio"
-msgstr ""
+msgstr "Акавајо"

#. name for akf
msgid "Akpa"
-msgstr ""
+msgstr "Акипа"

#. name for akg
msgid "Anakalangu"
-msgstr ""
+msgstr "Анакалангу"

#. name for akh
msgid "Angal Heneng"
-msgstr ""
+msgstr "Ангал Хененг"

#. name for aki
msgid "Aiome"
-msgstr ""
+msgstr "Ајоме"

#. name for akj
msgid "Aka-Jeru"
View File

@@ -151,7 +151,8 @@ class Translations(POT): # {{{
            self.info('\tCopying ISO 639 translations')
            subprocess.check_call(['msgfmt', '-o', dest, iso639])
        elif locale not in ('en_GB', 'en_CA', 'en_AU', 'si', 'ur', 'sc',
-               'ltg', 'nds', 'te', 'yi', 'fo', 'sq', 'ast', 'ml', 'ku'):
+               'ltg', 'nds', 'te', 'yi', 'fo', 'sq', 'ast', 'ml', 'ku',
+               'fr_CA'):
            self.warn('No ISO 639 translations for locale:', locale)

        self.write_stats()

View File

@@ -132,12 +132,15 @@ class UploadInstallers(Command): # {{{
        with open(os.path.join(tdir, 'fmap'), 'wb') as fo:
            for f, desc in files.iteritems():
                fo.write('%s: %s\n'%(f, desc))

+       while True:
            try:
                send_data(tdir)
            except:
                print('\nUpload to staging failed, retrying in a minute')
                time.sleep(60)
-               send_data(tdir)
+           else:
+               break
def upload_to_google(self, replace): def upload_to_google(self, replace):
gdata = get_google_data() gdata = get_google_data()

View File

@@ -4,7 +4,7 @@ __license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal kovid@kovidgoyal.net'
__docformat__ = 'restructuredtext en'
__appname__ = u'calibre'
-numeric_version = (0, 8, 41)
+numeric_version = (0, 8, 43)
__version__ = u'.'.join(map(unicode, numeric_version))
__author__ = u"Kovid Goyal <kovid@kovidgoyal.net>"

View File

@@ -263,7 +263,7 @@ class MOBIMetadataReader(MetadataReaderPlugin):
    description = _('Read metadata from %s files')%'MOBI'

    def get_metadata(self, stream, ftype):
-       from calibre.ebooks.mobi.reader import get_metadata
+       from calibre.ebooks.metadata.mobi import get_metadata
        return get_metadata(stream)

class ODTMetadataReader(MetadataReaderPlugin):

View File

@@ -379,6 +379,7 @@ class iPadOutput(OutputProfile):
            /* Feed summary formatting */
            .article_summary {
                display:inline-block;
+               padding-bottom:0.5em;
            }
            .feed {
                font-family:sans-serif;

@@ -431,6 +432,15 @@ class iPadOutput(OutputProfile):
            '''
            # }}}

+class iPad3Output(iPadOutput):
+
+   screen_size = comic_screen_size = (2048, 1536)
+   dpi = 264.0
+   name = 'iPad 3'
+   short_name = 'ipad3'
+   description = _('Intended for the iPad 3 and similar devices with a '
+           'resolution of 1536x2048')
+
class TabletOutput(iPadOutput):
    name = 'Tablet'
    short_name = 'tablet'

@@ -754,7 +764,7 @@ class PocketBook900Output(OutputProfile):

output_profiles = [OutputProfile, SonyReaderOutput, SonyReader300Output,
        SonyReader900Output, MSReaderOutput, MobipocketOutput, HanlinV3Output,
        HanlinV5Output, CybookG3Output, CybookOpusOutput, KindleOutput,
-       iPadOutput, KoboReaderOutput, TabletOutput, SamsungGalaxy,
+       iPadOutput, iPad3Output, KoboReaderOutput, TabletOutput, SamsungGalaxy,
        SonyReaderLandscapeOutput, KindleDXOutput, IlliadOutput,
        IRexDR1000Output, IRexDR800Output, JetBook5Output, NookOutput,
        BambookOutput, NookColorOutput, PocketBook900Output, GenericEink,

View File

@@ -51,8 +51,9 @@ Run an embedded python interpreter.
            'with sqlite3 works.')
    parser.add_option('-p', '--py-console', help='Run python console',
            default=False, action='store_true')
-   parser.add_option('-m', '--inspect-mobi',
-           help='Inspect the MOBI file at the specified path', default=None)
+   parser.add_option('-m', '--inspect-mobi', action='store_true',
+           default=False,
+           help='Inspect the MOBI file(s) at the specified path(s)')
    parser.add_option('--test-build', help='Test binary modules in build',
            action='store_true', default=False)

@@ -232,9 +233,13 @@ def main(args=sys.argv):
        if len(args) > 1 and os.access(args[-1], os.R_OK):
            sql_dump = args[-1]
        reinit_db(opts.reinitialize_db, sql_dump=sql_dump)
-   elif opts.inspect_mobi is not None:
+   elif opts.inspect_mobi:
        from calibre.ebooks.mobi.debug import inspect_mobi
-       inspect_mobi(opts.inspect_mobi)
+       for path in args[1:]:
+           prints('Inspecting:', path)
+           inspect_mobi(path)
+           print
    elif opts.test_build:
        from calibre.test_build import test
        test()
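With this change --inspect-mobi becomes a flag and the paths come from the remaining command line arguments, so several files can be inspected in one run. A hypothetical invocation (file names invented for illustration):

    calibre-debug --inspect-mobi book1.mobi book2.mobi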

View File

@@ -81,7 +81,7 @@ class ANDROID(USBMS):
            0x4e11 : [0x0100, 0x226, 0x227],
            0x4e12 : [0x0100, 0x226, 0x227],
            0x4e21 : [0x0100, 0x226, 0x227, 0x231],
-           0x4e22 : [0x0100, 0x226, 0x227],
+           0x4e22 : [0x0100, 0x226, 0x227, 0x231],
            0xb058 : [0x0222, 0x226, 0x227],
            0x0ff9 : [0x0226],
            0xdddd : [0x216],

@@ -194,7 +194,8 @@ class ANDROID(USBMS):
            '__UMS_COMPOSITE', 'SGH-I997_CARD', 'MB870', 'ALPANDIGITAL',
            'ANDROID_MID', 'P990_SD_CARD', '.K080', 'LTE_CARD', 'MB853',
            'A1-07___C0541A4F', 'XT912', 'MB855', 'XT910', 'BOOK_A10_CARD',
-           'USB_2.0_DRIVER', 'I9100T', 'P999DW_SD_CARD', 'KTABLET_PC']
+           'USB_2.0_DRIVER', 'I9100T', 'P999DW_SD_CARD', 'KTABLET_PC',
+           'FILE-CD_GADGET']

    OSX_MAIN_MEM = 'Android Device Main Memory'

View File

@@ -10,7 +10,7 @@ Generates and writes an APNX page mapping file.

import struct

-from calibre.ebooks.mobi.reader import MobiReader
+from calibre.ebooks.mobi.reader.mobi6 import MobiReader
from calibre.ebooks.pdb.header import PdbHeaderReader
from calibre.utils.logging import default_log

View File

@@ -31,7 +31,7 @@ BOOK_EXTENSIONS = ['lrf', 'rar', 'zip', 'rtf', 'lit', 'txt', 'txtz', 'text', 'ht
        'epub', 'fb2', 'djv', 'djvu', 'lrx', 'cbr', 'cbz', 'cbc', 'oebzip',
        'rb', 'imp', 'odt', 'chm', 'tpz', 'azw1', 'pml', 'pmlz', 'mbp', 'tan', 'snb',
        'xps', 'oxps', 'azw4', 'book', 'zbf', 'pobi', 'docx', 'md',
-       'textile', 'markdown']
+       'textile', 'markdown', 'ibook', 'iba']

class HTMLRenderer(object):

View File

@@ -190,12 +190,22 @@ class EPUBOutput(OutputFormatPlugin):
                if x.get(OPF('scheme'), None).lower() == 'uuid' or unicode(x).startswith('urn:uuid:'):
                    uuid = unicode(x).split(':')[-1]
                    break

+       encrypted_fonts = getattr(input_plugin, 'encrypted_fonts', [])
+
        if uuid is None:
            self.log.warn('No UUID identifier found')
            from uuid import uuid4
            uuid = str(uuid4())
            oeb.metadata.add('identifier', uuid, scheme='uuid', id=uuid)

+       if encrypted_fonts and not uuid.startswith('urn:uuid:'):
+           # Apparently ADE requires this value to start with urn:uuid:
+           # for some absurd reason, or it will throw a hissy fit and refuse
+           # to use the obfuscated fonts.
+           for x in identifiers:
+               if unicode(x) == uuid:
+                   x.content = 'urn:uuid:'+uuid
+
        with TemporaryDirectory(u'_epub_output') as tdir:
            from calibre.customize.ui import plugin_for_output_format
            metadata_xml = None

@@ -210,7 +220,6 @@ class EPUBOutput(OutputFormatPlugin):
            opf = [x for x in os.listdir(tdir) if x.endswith('.opf')][0]
            self.condense_ncx([os.path.join(tdir, x) for x in os.listdir(tdir)\
                    if x.endswith('.ncx')][0])
-           encrypted_fonts = getattr(input_plugin, 'encrypted_fonts', [])
            encryption = None
            if encrypted_fonts:
                encryption = self.encrypt_fonts(encrypted_fonts, tdir, uuid)

View File

@@ -3,8 +3,26 @@ __license__ = 'GPL 3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

+import os
+
from calibre.customize.conversion import InputFormatPlugin

+def run_mobi_unpack(stream, options, log, accelerators):
+   from mobiunpack.mobi_unpack import Mobi8Reader
+   from calibre.customize.ui import plugin_for_input_format
+   from calibre.ptempfile import PersistentTemporaryDirectory
+
+   wdir = PersistentTemporaryDirectory('_unpack_space')
+   m8r = Mobi8Reader(stream, wdir)
+   if m8r.isK8():
+       epub_path = m8r.processMobi8()
+       epub_input = plugin_for_input_format('epub')
+       for opt in epub_input.options:
+           setattr(options, opt.option.name, opt.recommended_value)
+       options.input_encoding = m8r.getCodec()
+       return epub_input.convert(open(epub_path,'rb'), options,
+               'epub', log, accelerators)
+
class MOBIInput(InputFormatPlugin):

    name = 'MOBI Input'

@@ -14,18 +32,38 @@ class MOBIInput(InputFormatPlugin):

    def convert(self, stream, options, file_ext, log,
            accelerators):

-       from calibre.ebooks.mobi.reader import MobiReader
+       if os.environ.get('USE_MOBIUNPACK', None) is not None:
+           pos = stream.tell()
+           try:
+               return run_mobi_unpack(stream, options, log, accelerators)
+           except Exception:
+               log.exception('mobi_unpack code not working')
+               stream.seek(pos)
+
+       from calibre.ebooks.mobi.reader.mobi6 import MobiReader
        from lxml import html
        parse_cache = {}
        try:
            mr = MobiReader(stream, log, options.input_encoding,
                    options.debug_pipeline)
-           mr.extract_content(u'.', parse_cache)
+           if mr.kf8_type is None:
+               mr.extract_content(u'.', parse_cache)
        except:
            mr = MobiReader(stream, log, options.input_encoding,
                    options.debug_pipeline, try_extra_data_fix=True)
-           mr.extract_content(u'.', parse_cache)
+           if mr.kf8_type is None:
+               mr.extract_content(u'.', parse_cache)
+
+       if mr.kf8_type is not None:
+           log('Found KF8 MOBI of type %r'%mr.kf8_type)
+           from calibre.ebooks.mobi.reader.mobi8 import Mobi8Reader
+           mr = Mobi8Reader(mr, log)
+           opf = os.path.abspath(mr())
+           self.encrypted_fonts = mr.encrypted_fonts
+           return opf
+
        raw = parse_cache.pop('calibre_raw_mobi_markup', False)
        if raw:
            if isinstance(raw, unicode):
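A sketch of how this experimental path would be exercised, assuming a POSIX shell and that the third-party mobiunpack package is importable; the code only checks that USE_MOBIUNPACK is set (its value does not matter), and the file names are invented for illustration:

    USE_MOBIUNPACK=1 ebook-convert book.mobi book.epub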

View File

@ -18,9 +18,6 @@ class MOBIOutput(OutputFormatPlugin):
file_type = 'mobi' file_type = 'mobi'
options = set([ options = set([
OptionRecommendation(name='rescale_images', recommended_value=False,
help=_('Modify images to meet Palm device size limitations.')
),
OptionRecommendation(name='prefer_author_sort', OptionRecommendation(name='prefer_author_sort',
recommended_value=False, level=OptionRecommendation.LOW, recommended_value=False, level=OptionRecommendation.LOW,
help=_('When present, use author sort field as author.') help=_('When present, use author sort field as author.')
@ -59,7 +56,16 @@ class MOBIOutput(OutputFormatPlugin):
            help=_('Enable sharing of book content via Facebook etc. '
                ' on the Kindle. WARNING: Using this feature means that '
                ' the book will not auto sync its last read position '
                ' on multiple devices. Complain to Amazon.')
),
OptionRecommendation(name='mobi_keep_original_images',
recommended_value=False,
help=_('By default calibre converts all images to JPEG format '
'in the output MOBI file. This is for maximum compatibility '
'as some older MOBI viewers have problems with other image '
'formats. This option tells calibre not to do this. '
'Useful if your document contains lots of GIF/PNG images that '
'become very large when converted to JPEG.')),
    ])
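
    # Illustrative usage, assuming calibre's usual mapping of conversion
    # option names to command line switches (underscores become hyphens):
    #   ebook-convert book.epub book.mobi --mobi-keep-original-images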
    def check_for_periodical(self):
@ -167,12 +173,7 @@ class MOBIOutput(OutputFormatPlugin):
        mobimlizer(oeb, opts)
        self.check_for_periodical()
        write_page_breaks_after_item = input_plugin is not plugin_for_input_format('cbz')
        from calibre.ebooks.mobi.writer2.main import MobiWriter
        writer = MobiWriter(opts,
                write_page_breaks_after_item=write_page_breaks_after_item)
        writer(oeb, output_path)

View File

@ -289,10 +289,17 @@ class CSSPreProcessor(object):
        data = self.MS_PAT.sub(self.ms_sub, data)
        if not add_namespace:
            return data
# Remove comments as the following namespace logic will break if there
# are commented lines before the first @import or @charset rule. Since
# the conversion will remove all stylesheets anyway, we don't lose
# anything
data = re.sub(ur'/\*.*?\*/', u'', data, flags=re.DOTALL)
        ans, namespaced = [], False
        for line in data.splitlines():
            ll = line.lstrip()
            if not (namespaced or ll.startswith('@import') or not ll or
                        ll.startswith('@charset')):
                ans.append(XHTML_CSS_NAMESPACE.strip())
                namespaced = True
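
        # Illustrative input/output for the logic above (assumed example):
        # given a stylesheet like
        #     /* banner */
        #     @import url(fonts.css);
        #     body { color: black }
        # the comment is stripped first, so the XHTML_CSS_NAMESPACE rule is
        # appended after the @import line instead of before it, keeping the
        # @import valid.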

View File

@ -9,16 +9,19 @@ import copy, traceback
from calibre import prints
from calibre.constants import DEBUG
from calibre.ebooks.metadata.book import (SC_COPYABLE_FIELDS,
        SC_FIELDS_COPY_NOT_NULL, STANDARD_METADATA_FIELDS,
        TOP_LEVEL_IDENTIFIERS, ALL_METADATA_FIELDS)
from calibre.library.field_metadata import FieldMetadata
from calibre.utils.date import isoformat, format_date
from calibre.utils.icu import sort_key
from calibre.utils.formatter import TemplateFormatter
# Special sets used to optimize the performance of getting and setting
# attributes on Metadata objects
SIMPLE_GET = frozenset(STANDARD_METADATA_FIELDS - TOP_LEVEL_IDENTIFIERS)
SIMPLE_SET = frozenset(SIMPLE_GET - {'identifiers'})
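
# A minimal sketch of the same fast-path pattern (illustrative, not part of
# the Metadata class): route a known set of fields through a single dict
# lookup before falling back to normal attribute lookup.
class _FastFields(object):
    SIMPLE = frozenset({'title', 'authors'})

    def __init__(self):
        object.__setattr__(self, '_data', {'title': None, 'authors': []})

    def __getattribute__(self, field):
        if field in _FastFields.SIMPLE:
            return object.__getattribute__(self, '_data').get(field, None)
        return object.__getattribute__(self, field)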
def human_readable(size, precision=2):
    """ Convert a size in bytes into megabytes """
    return ('%.'+str(precision)+'f'+ 'MB') % ((size/(1024.*1024.)),)
@ -136,6 +139,8 @@ class Metadata(object):
    def __getattribute__(self, field):
        _data = object.__getattribute__(self, '_data')
if field in SIMPLE_GET:
return _data.get(field, None)
        if field in TOP_LEVEL_IDENTIFIERS:
            return _data.get('identifiers').get(field, None)
        if field == 'language':
@ -143,8 +148,6 @@ class Metadata(object):
                return _data.get('languages', [])[0]
            except:
                return NULL_VALUES['language']
        try:
            return object.__getattribute__(self, field)
        except AttributeError:
@ -173,7 +176,11 @@ class Metadata(object):
    def __setattr__(self, field, val, extra=None):
        _data = object.__getattribute__(self, '_data')
        if field in SIMPLE_SET:
if val is None:
val = copy.copy(NULL_VALUES.get(field, None))
_data[field] = val
elif field in TOP_LEVEL_IDENTIFIERS:
            field, val = self._clean_identifier(field, val)
            identifiers = _data['identifiers']
            identifiers.pop(field, None)
@ -188,10 +195,6 @@ class Metadata(object):
            if val and val.lower() != 'und':
                langs = [val]
            _data['languages'] = langs
        elif field in _data['user_metadata'].iterkeys():
            _data['user_metadata'][field]['#value#'] = val
            _data['user_metadata'][field]['#extra#'] = extra
@ -404,9 +407,19 @@ class Metadata(object):
        '''
        if metadata is None:
            traceback.print_stack()
            return

        um = {}
        for key, meta in metadata.iteritems():
            m = meta.copy()
            if '#value#' not in m:
                if m['datatype'] == 'text' and m['is_multiple']:
                    m['#value#'] = []
                else:
                    m['#value#'] = None
            um[key] = m
_data = object.__getattribute__(self, '_data')
_data['user_metadata'].update(um)
    def set_user_metadata(self, field, metadata):
        '''
@ -420,9 +433,11 @@ class Metadata(object):
        if metadata is None:
            traceback.print_stack()
            return
        m = dict(metadata)
        # Copying the elements should not be necessary. The objects referenced
        # in the dict should not change. Of course, they can be replaced.
# for k,v in metadata.iteritems():
# m[k] = copy.copy(v)
        if '#value#' not in m:
            if m['datatype'] == 'text' and m['is_multiple']:
                m['#value#'] = []
@ -543,6 +558,7 @@ class Metadata(object):
                    # Happens if x is not a text, is_multiple field
                    # on self
                    lstags = []
self_tags = []
            ot, st = map(frozenset, (lotags, lstags))
            for t in st.intersection(ot):
                sidx = lstags.index(t)

View File

@ -9,15 +9,21 @@ __copyright__ = '2009, Kovid Goyal kovid@kovidgoyal.net and ' \
                'Marshall T. Vandegrift <llasram@gmail.com>'
__docformat__ = 'restructuredtext en'
import os, cStringIO, imghdr
from struct import pack, unpack
from cStringIO import StringIO

from calibre.ebooks import normalize
from calibre.ebooks.mobi import MobiError, MAX_THUMB_DIMEN
from calibre.ebooks.mobi.utils import rescale_image
from calibre.ebooks.mobi.langcodes import iana2mobi
from calibre.utils.date import now as nowf
def is_image(ss):
if ss is None:
return False
return imghdr.what(None, ss[:200]) is not None
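
# Illustrative behaviour (assumed examples): imghdr only needs the magic
# bytes at the start of the record, e.g.
#   is_image(b'\x89PNG\r\n\x1a\n' + b'\x00' * 192)  -> True
#   is_image(None)                                   -> False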
class StreamSlicer(object):

    def __init__(self, stream, start=0, stop=None):
@ -160,11 +166,10 @@ class MetadataUpdater(object):
            if id == 106:
                self.timestamp = content
            elif id == 201:
                rindex, = self.cover_rindex, = unpack('>I', content)
                if rindex > 0 :
                    self.cover_record = self.record(rindex + image_base)
            elif id == 202:
                rindex, = self.thumbnail_rindex, = unpack('>I', content)
                if rindex > 0 :
                    self.thumbnail_record = self.record(rindex + image_base)
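
                # The '>i' -> '>I' change matters for the 0xffffffff
                # sentinel used by these records, e.g. (illustrative):
                #   unpack('>i', b'\xff\xff\xff\xff') -> (-1,)
                #   unpack('>I', b'\xff\xff\xff\xff') -> (4294967295,)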
@ -415,17 +420,17 @@ class MetadataUpdater(object):
            except:
                pass
        else:
            if is_image(self.cover_record):
                size = len(self.cover_record)
                cover = rescale_image(data, size)
                if len(cover) <= size:
                    cover += b'\0' * (size - len(cover))
                    self.cover_record[:] = cover
            if is_image(self.thumbnail_record):
                size = len(self.thumbnail_record)
                thumbnail = rescale_image(data, size, dimen=MAX_THUMB_DIMEN)
                if len(thumbnail) <= size:
                    thumbnail += b'\0' * (size - len(thumbnail))
                    self.thumbnail_record[:] = thumbnail
        return
@ -433,3 +438,75 @@ def set_metadata(stream, mi):
    mu = MetadataUpdater(stream)
    mu.update(mi)
    return
def get_metadata(stream):
from calibre.ebooks.metadata import MetaInformation
from calibre.ptempfile import TemporaryDirectory
from calibre.ebooks.mobi.reader.headers import MetadataHeader
from calibre.ebooks.mobi.reader.mobi6 import MobiReader
from calibre import CurrentDir
try:
from PIL import Image as PILImage
PILImage
except ImportError:
import Image as PILImage
stream.seek(0)
try:
raw = stream.read(3)
except:
raw = ''
stream.seek(0)
if raw == b'TPZ':
from calibre.ebooks.metadata.topaz import get_metadata
return get_metadata(stream)
from calibre.utils.logging import Log
log = Log()
try:
mi = MetaInformation(os.path.basename(stream.name), [_('Unknown')])
except:
mi = MetaInformation(_('Unknown'), [_('Unknown')])
mh = MetadataHeader(stream, log)
if mh.title and mh.title != _('Unknown'):
mi.title = mh.title
if mh.exth is not None:
if mh.exth.mi is not None:
mi = mh.exth.mi
else:
size = 1024**3
if hasattr(stream, 'seek') and hasattr(stream, 'tell'):
pos = stream.tell()
stream.seek(0, 2)
size = stream.tell()
stream.seek(pos)
if size < 4*1024*1024:
with TemporaryDirectory('_mobi_meta_reader') as tdir:
with CurrentDir(tdir):
mr = MobiReader(stream, log)
parse_cache = {}
mr.extract_content(tdir, parse_cache)
if mr.embedded_mi is not None:
mi = mr.embedded_mi
if hasattr(mh.exth, 'cover_offset'):
cover_index = mh.first_image_index + mh.exth.cover_offset
data = mh.section_data(int(cover_index))
else:
try:
data = mh.section_data(mh.first_image_index)
except:
data = ''
buf = cStringIO.StringIO(data)
try:
im = PILImage.open(buf)
except:
log.exception('Failed to read MOBI cover')
else:
obuf = cStringIO.StringIO()
im.convert('RGB').save(obuf, format='JPEG')
mi.cover_data = ('jpg', obuf.getvalue())
return mi

View File

@ -1148,7 +1148,8 @@ class OPFCreator(Metadata):
        self.manifest = Manifest.from_paths(entries)
        self.manifest.set_basedir(self.base_path)
    def create_manifest_from_files_in(self, files_and_dirs,
            exclude=lambda x:False):
        entries = []

        def dodir(dir):
@ -1156,7 +1157,7 @@ class OPFCreator(Metadata):
                root, files = spec[0], spec[-1]
                for name in files:
                    path = os.path.join(root, name)
                    if os.path.isfile(path) and not exclude(path):
                        entries.append((path, None))

        for i in files_and_dirs:

View File

@ -46,7 +46,7 @@ class TOC(list):
        self.toc_thumbnail = toc_thumbnail

    def __str__(self):
        lines = ['TOC: %s#%s %s'%(self.href, self.fragment, self.text)]
        for child in self:
            c = str(child).splitlines()
            for l in c:

View File

@ -6,3 +6,8 @@ __copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'
class MobiError(Exception):
    pass
MAX_THUMB_SIZE = 16 * 1024
MAX_THUMB_DIMEN = (180, 240)

View File

@ -14,8 +14,11 @@ from lxml import html
from calibre.utils.date import utc_tz
from calibre.ebooks.mobi.langcodes import main_language, sub_language
from calibre.ebooks.mobi.reader.headers import NULL_INDEX
from calibre.ebooks.mobi.reader.index import (parse_index_record,
parse_tagx_section)
from calibre.ebooks.mobi.utils import (decode_hex_number, decint,
        get_trailing_data, decode_tbs, read_font_record)
from calibre.utils.magick.draw import identify_data

def format_bytes(byts):
@ -151,6 +154,10 @@ class EXTHRecord(object):
                117 : 'adult',
                118 : 'retailprice',
                119 : 'retailpricecurrency',
121 : 'KF8 header section index',
125 : 'KF8 resources (images/fonts) count',
129 : 'KF8 cover URI',
131 : 'KF8 unknown count',
                201 : 'coveroffset',
                202 : 'thumboffset',
                203 : 'hasfakecover',
@ -169,9 +176,10 @@ class EXTHRecord(object):
                503 : 'updatedtitle',
        }.get(self.type, repr(self.type))
        if (self.name in {'coveroffset', 'thumboffset', 'hasfakecover',
                'Creator Major Version', 'Creator Minor Version',
                'Creator Build Number', 'Creator Software', 'startreading'} or
                self.type in {121, 125, 131}):
            self.data, = struct.unpack(b'>I', self.data)

    def __str__(self):
@ -338,9 +346,9 @@ class MOBIHeader(object): # {{{
        ans.append('File version: %d'%self.file_version)
        ans.append('Reserved: %r'%self.reserved)
        ans.append('Secondary index record: %d (null val: %d)'%(
            self.secondary_index_record, NULL_INDEX))
        ans.append('Reserved2: %r'%self.reserved2)
        ans.append('First non-book record (null value: %d): %d'%(NULL_INDEX,
            self.first_non_book_record))
        ans.append('Full name offset: %d'%self.fullname_offset)
        ans.append('Full name length: %d bytes'%self.fullname_length)
@ -379,7 +387,7 @@ class MOBIHeader(object): # {{{
                '(has indexing: %s) (has uncrossable breaks: %s)')%(
                    bin(self.extra_data_flags), self.has_multibytes,
                    self.has_indexing_bytes, self.has_uncrossable_breaks ))
        ans.append('Primary index record (null value: %d): %d'%(NULL_INDEX,
            self.primary_index_record))

        ans = '\n'.join(ans)
@ -399,14 +407,10 @@ class MOBIHeader(object): # {{{
class TagX(object): # {{{

    def __init__(self, tag, num_values, bitmask, eof):
        self.tag, self.num_values, self.bitmask, self.eof = (tag, num_values,
                bitmask, eof)
        self.num_of_values = num_values
        self.is_eof = (self.eof == 1 and self.tag == 0 and self.num_values == 0
                and self.bitmask == 0)
@ -453,14 +457,7 @@ class SecondaryIndexHeader(object): # {{{
            raise ValueError('Invalid TAGX section')
        self.tagx_header_length, = struct.unpack('>I', tagx[4:8])
        self.tagx_control_byte_count, = struct.unpack('>I', tagx[8:12])
        self.tagx_entries = [TagX(*x) for x in parse_tagx_section(tagx)[1]]
        if self.tagx_entries and not self.tagx_entries[-1].is_eof:
            raise ValueError('TAGX last entry is not EOF')
@ -528,7 +525,8 @@ class IndexHeader(object): # {{{
            raise ValueError('Invalid Primary Index Record')

        self.header_length, = struct.unpack('>I', raw[4:8])
        self.unknown1 = raw[8:12]
        self.header_type, = struct.unpack('>I', raw[12:16])
        self.index_type, = struct.unpack('>I', raw[16:20])
        self.index_type_desc = {0: 'normal', 2:
                'inflection', 6: 'calibre'}.get(self.index_type, 'unknown')
@ -557,14 +555,7 @@ class IndexHeader(object): # {{{
            raise ValueError('Invalid TAGX section')
        self.tagx_header_length, = struct.unpack('>I', tagx[4:8])
        self.tagx_control_byte_count, = struct.unpack('>I', tagx[8:12])
        self.tagx_entries = [TagX(*x) for x in parse_tagx_section(tagx)[1]]
        if self.tagx_entries and not self.tagx_entries[-1].is_eof:
            raise ValueError('TAGX last entry is not EOF')
@ -598,6 +589,7 @@ class IndexHeader(object): # {{{
        a('Header length: %d'%self.header_length)
        u(self.unknown1)
a('Header type: %d'%self.header_type)
        a('Index Type: %s (%d)'%(self.index_type_desc, self.index_type))
        a('Offset to IDXT start: %d'%self.idxt_start)
        a('Number of index records: %d'%self.index_count)
@ -634,77 +626,40 @@ class Tag(object): # {{{
    TAG_MAP = {
                1: ('offset', 'Offset in HTML'),
                2: ('size', 'Size in HTML'),
                3: ('label_offset', 'Label offset in CNCX'),
                4: ('depth', 'Depth of this entry in TOC'),
                5: ('class_offset', 'Class offset in CNCX'),
                6: ('pos_fid', 'File Index'),

                11: ('secondary', '[unknown, unknown, '
                    'tag type from TAGX in primary index header]'),

                21: ('parent_index', 'Parent'),
                22: ('first_child_index', 'First child'),
                23: ('last_child_index', 'Last child'),
69 : ('image_index', 'Offset from first image record to the'
' image record associated with this entry'
' (masthead for periodical or thumbnail for'
' article entry).'),
70 : ('desc_offset', 'Description offset in cncx'),
71 : ('author_offset', 'Author offset in cncx'),
72 : ('image_caption_offset', 'Image caption offset in cncx'),
73 : ('image_attr_offset', 'Image attribution offset in cncx'),
    }

    def __init__(self, tag_type, vals, cncx):
        self.value = vals if len(vals) > 1 else vals[0] if vals else None
        self.cncx_value = None
        if tag_type in self.TAG_MAP:
            self.attr, self.desc = self.TAG_MAP[tag_type]
        else:
            print ('Unknown tag value: %s'%tag_type)
            self.desc = '??Unknown (tag value: %d)'%tag_type
            self.attr = 'unknown'

        if '_offset' in self.attr:
            self.cncx_value = cncx[self.value]
@ -719,74 +674,18 @@ class IndexEntry(object): # {{{
    '''
    The index is made up of entries, each of which is represented by an
    instance of this class. Index entries typically point to offsets in the
    HTML, specify HTML sizes and point to text strings in the CNCX that are
    used in the navigation UI.
    '''
    def __init__(self, ident, entry, cncx):
        try:
            self.index = int(ident, 16)
        except ValueError:
            self.index = ident
        self.tags = [Tag(tag_type, vals, cncx) for tag_type, vals in
                entry.iteritems()]

    @property
    def label(self):
@ -837,103 +736,22 @@ class IndexEntry(object): # {{{
                return tag.value
        return -1
    def __str__(self):
        ans = ['Index Entry(index=%s, length=%d)'%(
            self.index, len(self.tags))]
        for tag in self.tags:
            if tag.value is not None:
                ans.append('\t'+str(tag))
        if self.first_child_index != -1:
            ans.append('\tNumber of children: %d'%(self.last_child_index -
                self.first_child_index + 1))
return '\n'.join(ans)
# }}}
@ -945,58 +763,25 @@ class IndexRecord(object): # {{{
    in the trailing data of the text records.
    '''
    def __init__(self, records, index_header, cncx):
        self.alltext = None
        table = OrderedDict()
        tags = [TagX(x.tag, x.num_values, x.bitmask, x.eof) for x in
                index_header.tagx_entries]

        for record in records:
            raw = record.raw

            if raw[:4] != b'INDX':
                raise ValueError('Invalid Primary Index Record')

            parse_index_record(table, record.raw,
                    index_header.tagx_control_byte_count, tags,
                    index_header.index_encoding, strict=True)

        self.indices = []

        for ident, entry in table.iteritems():
            self.indices.append(IndexEntry(ident, entry, cncx))

    def get_parent(self, index):
        if index.depth < 1:
@ -1006,24 +791,12 @@ class IndexRecord(object): # {{{
            if p.depth != parent_depth:
                continue
    def __str__(self):
        ans = ['*'*20 + ' Index Entries (%d entries) '%len(self.indices)+ '*'*20]
        a = ans.append
        def u(w):
            a('Unknown: %r (%d bytes) (All zeros: %r)'%(w,
                len(w), not bool(w.replace(b'\0', b'')) ))
        for entry in self.indices:
            offset = entry.offset
            a(str(entry))
@ -1149,6 +922,25 @@ class BinaryRecord(object): # {{{
# }}}
class FontRecord(object): # {{{
def __init__(self, idx, record):
self.raw = record.raw
name = '%06d'%idx
self.font = read_font_record(self.raw)
if self.font['err']:
raise ValueError('Failed to read font record: %s Headers: %s'%(
self.font['err'], self.font['headers']))
self.payload = (self.font['font_data'] if self.font['font_data'] else
self.font['raw_data'])
self.name = '%s.%s'%(name, self.font['ext'])
def dump(self, folder):
with open(os.path.join(folder, self.name), 'wb') as f:
f.write(self.payload)
# }}}
class TBSIndexing(object): # {{{

    def __init__(self, text_records, indices, doc_type):
@ -1179,7 +971,7 @@ class TBSIndexing(object): # {{{
    def get_index(self, idx):
        for i in self.indices:
            if i.index in {idx, unicode(idx)}: return i
        raise IndexError('Index %d not found'%idx)

    def __str__(self):
@ -1212,7 +1004,7 @@ class TBSIndexing(object): # {{{
            if entries:
                ans.append('\t%s:'%typ)
                for x in entries:
                    ans.append(('\t\tIndex Entry: %s (Parent index: %s, '
                        'Depth: %d, Offset: %d, Size: %d) [%s]')%(
                        x.index, x.parent_index, x.depth, x.offset, x.size, x.label))

        def bin4(num):
@ -1309,18 +1101,18 @@ class TBSIndexing(object): # {{{
                        ' when reading starting section'%extra)
                si = self.get_index(si)
                ans.append('The section at the start of this record is:'
                        ' %s'%si.index)
                if 0b0100 in extra:
                    num = extra[0b0100]
                    ans.append('The number of articles from the section %d'
                            ' in this record: %s'%(si.index, num))
                elif 0b0001 in extra:
                    eof = extra[0b0001]
                    if eof != 0:
                        raise ValueError('Unknown eof value %s when reading'
                                ' starting section. All bytes: %r'%(eof, orig))
                    ans.append('??This record has more than one article from '
                            ' the section: %s'%si.index)
            return si, byts

    # }}}
@ -1382,34 +1174,37 @@ class MOBIFile(object): # {{{
        self.index_header = self.index_record = None
        self.indexing_record_nums = set()
        pir = self.mobi_header.primary_index_record
        if pir != NULL_INDEX:
            self.index_header = IndexHeader(self.records[pir])
            numi = self.index_header.index_count
            self.cncx = CNCX(self.records[
                pir+1+numi:pir+1+numi+self.index_header.num_of_cncx_blocks],
                self.index_header.index_encoding)
            self.index_record = IndexRecord(self.records[pir+1:pir+1+numi],
                    self.index_header, self.cncx)
            self.indexing_record_nums = set(xrange(pir,
                pir+1+numi+self.index_header.num_of_cncx_blocks))
        self.secondary_index_record = self.secondary_index_header = None
        sir = self.mobi_header.secondary_index_record
        if sir != NULL_INDEX:
            self.secondary_index_header = SecondaryIndexHeader(self.records[sir])
            numi = self.secondary_index_header.index_count
            self.indexing_record_nums.add(sir)
            self.secondary_index_record = IndexRecord(
                    self.records[sir+1:sir+1+numi], self.secondary_index_header, self.cncx)
            self.indexing_record_nums |= set(xrange(sir+1, sir+1+numi))

        ntr = self.mobi_header.number_of_text_records
        fntbr = self.mobi_header.first_non_book_record
        fii = self.mobi_header.first_image_index
        if fntbr == NULL_INDEX:
            fntbr = len(self.records)
        self.text_records = [TextRecord(r, self.records[r],
            self.mobi_header.extra_data_flags, decompress) for r in xrange(1,
            min(len(self.records), ntr+1))]
        self.image_records, self.binary_records = [], []
        self.font_records = []
        image_index = 0
        for i in xrange(fntbr, len(self.records)):
            if i in self.indexing_record_nums or i in self.huffman_record_nums:
@ -1419,13 +1214,15 @@ class MOBIFile(object): # {{{
                fmt = None
                if i >= fii and r.raw[:4] not in {b'FLIS', b'FCIS', b'SRCS',
                        b'\xe9\x8e\r\n', b'RESC', b'BOUN', b'FDST', b'DATP',
                        b'AUDI', b'VIDE', b'FONT'}:
                    try:
                        width, height, fmt = identify_data(r.raw)
                    except:
                        pass
                if fmt is not None:
                    self.image_records.append(ImageRecord(image_index, r, fmt))
                elif r.raw[:4] == b'FONT':
                    self.font_records.append(FontRecord(i, r))
                else:
                    self.binary_records.append(BinaryRecord(i, r))
@ -1465,6 +1262,7 @@ def inspect_mobi(path_or_stream, ddir=None): # {{{
            of.write(rec.raw)
            alltext += rec.raw
        of.seek(0)
if f.mobi_header.file_version < 8:
        root = html.fromstring(alltext.decode('utf-8'))
        with open(os.path.join(ddir, 'pretty.html'), 'wb') as of:
            of.write(html.tostring(root, pretty_print=True, encoding='utf-8',
@ -1490,7 +1288,7 @@ def inspect_mobi(path_or_stream, ddir=None): # {{{
    f.tbs_indexing.dump(ddir)

    for tdir, attr in [('text', 'text_records'), ('images', 'image_records'),
            ('binary', 'binary_records'), ('font', 'font_records')]:
        tdir = os.path.join(ddir, tdir)
        os.mkdir(tdir)
        for rec in getattr(f, attr):

View File

@ -0,0 +1,11 @@
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import (unicode_literals, division, absolute_import,
print_function)
__license__ = 'GPL v3'
__copyright__ = '2012, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

View File

@ -0,0 +1,261 @@
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import (absolute_import, print_function)
__license__ = 'GPL v3'
__copyright__ = '2012, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
import struct, re, os
from calibre import replace_entities
from calibre.utils.date import parse_date
from calibre.ebooks.mobi import MobiError
from calibre.ebooks.metadata import MetaInformation
from calibre.ebooks.mobi.langcodes import main_language, sub_language, mobi2iana
NULL_INDEX = 0xffffffff
class EXTHHeader(object): # {{{
def __init__(self, raw, codec, title):
self.doctype = raw[:4]
self.length, self.num_items = struct.unpack('>LL', raw[4:12])
raw = raw[12:]
pos = 0
self.mi = MetaInformation(_('Unknown'), [_('Unknown')])
self.has_fake_cover = True
self.start_offset = None
left = self.num_items
self.kf8_header = None
while left > 0:
left -= 1
idx, size = struct.unpack('>LL', raw[pos:pos + 8])
content = raw[pos + 8:pos + size]
pos += size
if idx >= 100 and idx < 200:
self.process_metadata(idx, content, codec)
elif idx == 203:
self.has_fake_cover = bool(struct.unpack('>L', content)[0])
elif idx == 201:
co, = struct.unpack('>L', content)
if co < NULL_INDEX:
self.cover_offset = co
elif idx == 202:
self.thumbnail_offset, = struct.unpack('>L', content)
elif idx == 501:
# cdetype
pass
elif idx == 502:
# last update time
pass
elif idx == 503: # Long title
# Amazon seems to regard this as the definitive book title
# rather than the title from the PDB header. In fact when
# sending MOBI files through Amazon's email service if the
# title contains non ASCII chars or non filename safe chars
# they are messed up in the PDB header
try:
title = content.decode(codec)
except:
pass
#else:
# print 'unknown record', idx, repr(content)
if title:
self.mi.title = replace_entities(title)
def process_metadata(self, idx, content, codec):
if idx == 100:
if self.mi.is_null('authors'):
self.mi.authors = []
au = content.decode(codec, 'ignore').strip()
self.mi.authors.append(au)
if re.match(r'\S+?\s*,\s+\S+', au.strip()):
self.mi.author_sort = au.strip()
elif idx == 101:
self.mi.publisher = content.decode(codec, 'ignore').strip()
elif idx == 103:
self.mi.comments = content.decode(codec, 'ignore')
elif idx == 104:
self.mi.isbn = content.decode(codec, 'ignore').strip().replace('-', '')
elif idx == 105:
if not self.mi.tags:
self.mi.tags = []
self.mi.tags.extend([x.strip() for x in content.decode(codec,
'ignore').split(';')])
self.mi.tags = list(set(self.mi.tags))
elif idx == 106:
try:
self.mi.pubdate = parse_date(content, as_utc=False)
except:
pass
elif idx == 108:
self.mi.book_producer = content.decode(codec, 'ignore').strip()
elif idx == 113:
pass # ASIN or UUID
elif idx == 116:
self.start_offset, = struct.unpack(b'>L', content)
elif idx == 121:
self.kf8_header, = struct.unpack(b'>L', content)
#else:
# print 'unhandled metadata record', idx, repr(content)
# }}}
class BookHeader(object):
def __init__(self, raw, ident, user_encoding, log, try_extra_data_fix=False):
self.log = log
self.compression_type = raw[:2]
self.records, self.records_size = struct.unpack('>HH', raw[8:12])
self.encryption_type, = struct.unpack('>H', raw[12:14])
if ident == 'TEXTREAD':
self.codepage = 1252
if len(raw) <= 16:
self.codec = 'cp1252'
self.extra_flags = 0
self.title = _('Unknown')
self.language = 'ENGLISH'
self.sublanguage = 'NEUTRAL'
self.exth_flag, self.exth = 0, None
self.ancient = True
self.first_image_index = -1
self.mobi_version = 1
else:
self.ancient = False
self.doctype = raw[16:20]
self.length, self.type, self.codepage, self.unique_id, \
self.version = struct.unpack('>LLLLL', raw[20:40])
try:
self.codec = {
1252: 'cp1252',
65001: 'utf-8',
}[self.codepage]
except (IndexError, KeyError):
self.codec = 'cp1252' if not user_encoding else user_encoding
log.warn('Unknown codepage %d. Assuming %s' % (self.codepage,
self.codec))
# There exists some broken DRM removal tool that removes DRM but
# leaves the DRM fields in the header yielding a header size of
# 0xF8. The actual value of max_header_length should be 0xE8 but
# it's changed to accommodate this silly tool. Hopefully that will
# not break anything else.
max_header_length = 0xF8
if (ident == 'TEXTREAD' or self.length < 0xE4 or
self.length > max_header_length or
(try_extra_data_fix and self.length == 0xE4)):
self.extra_flags = 0
else:
self.extra_flags, = struct.unpack('>H', raw[0xF2:0xF4])
if self.compression_type == 'DH':
self.huff_offset, self.huff_number = struct.unpack('>LL',
raw[0x70:0x78])
toff, tlen = struct.unpack('>II', raw[0x54:0x5c])
tend = toff + tlen
self.title = raw[toff:tend] if tend < len(raw) else _('Unknown')
langcode = struct.unpack('!L', raw[0x5C:0x60])[0]
langid = langcode & 0xFF
sublangid = (langcode >> 10) & 0xFF
self.language = main_language.get(langid, 'ENGLISH')
self.sublanguage = sub_language.get(sublangid, 'NEUTRAL')
self.mobi_version = struct.unpack('>I', raw[0x68:0x6c])[0]
self.first_image_index = struct.unpack('>L', raw[0x6c:0x6c + 4])[0]
self.exth_flag, = struct.unpack('>L', raw[0x80:0x84])
self.exth = None
if not isinstance(self.title, unicode):
self.title = self.title.decode(self.codec, 'replace')
if self.exth_flag & 0x40:
try:
self.exth = EXTHHeader(raw[16 + self.length:], self.codec,
self.title)
self.exth.mi.uid = self.unique_id
try:
self.exth.mi.language = mobi2iana(langid, sublangid)
except:
self.log.exception('Unknown language code')
except:
self.log.exception('Invalid EXTH header')
self.exth_flag = 0
self.ncxidx = NULL_INDEX
if len(raw) >= 0xF8:
self.ncxidx, = struct.unpack_from(b'>L', raw, 0xF4)
if self.mobi_version >= 8:
self.skelidx, = struct.unpack_from('>L', raw, 0xFC)
# Index into <div> sections in raw_ml
self.dividx, = struct.unpack_from('>L', raw, 0xF8)
# Index into Other files
self.othidx, = struct.unpack_from('>L', raw, 0x104)
# need to use the FDST record to find out how to properly
# unpack the raw_ml into pieces it is simply a table of start
# and end locations for each flow piece
self.fdstidx, = struct.unpack_from('>L', raw, 0xC0)
self.fdstcnt, = struct.unpack_from('>L', raw, 0xC4)
# if cnt is 1 or less, fdst section number can be garbage
if self.fdstcnt <= 1:
self.fdstidx = NULL_INDEX
else: # Null values
self.skelidx = self.dividx = self.othidx = self.fdstidx = \
NULL_INDEX
class MetadataHeader(BookHeader):
def __init__(self, stream, log):
self.stream = stream
self.ident = self.identity()
self.num_sections = self.section_count()
if self.num_sections >= 2:
header = self.header()
BookHeader.__init__(self, header, self.ident, None, log)
else:
self.exth = None
def identity(self):
self.stream.seek(60)
ident = self.stream.read(8).upper()
if ident not in ['BOOKMOBI', 'TEXTREAD']:
raise MobiError('Unknown book type: %s' % ident)
return ident
def section_count(self):
self.stream.seek(76)
return struct.unpack('>H', self.stream.read(2))[0]
def section_offset(self, number):
self.stream.seek(78 + number * 8)
return struct.unpack('>LBBBB', self.stream.read(8))[0]
def header(self):
section_headers = []
# First section with the metadata
section_headers.append(self.section_offset(0))
# Second section used to get the length of the first
section_headers.append(self.section_offset(1))
end_off = section_headers[1]
off = section_headers[0]
self.stream.seek(off)
return self.stream.read(end_off - off)
def section_data(self, number):
start = self.section_offset(number)
if number == self.num_sections -1:
end = os.stat(self.stream.name).st_size
else:
end = self.section_offset(number + 1)
self.stream.seek(start)
try:
return self.stream.read(end - start)
except OverflowError:
self.stream.seek(start)
return self.stream.read()

View File

@ -0,0 +1,214 @@
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import (unicode_literals, division, absolute_import,
print_function)
__license__ = 'GPL v3'
__copyright__ = '2012, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
import struct
from collections import OrderedDict, namedtuple
from calibre.ebooks.mobi.utils import (decint, count_set_bits,
decode_string)
TagX = namedtuple('TagX', 'tag num_of_values bitmask eof')
PTagX = namedtuple('PTagX', 'tag value_count value_bytes num_of_values')
class InvalidFile(ValueError):
pass
def check_signature(data, signature):
if data[:len(signature)] != signature:
raise InvalidFile('Not a valid %r section'%signature)
class NotAnINDXRecord(InvalidFile):
pass
class NotATAGXSection(InvalidFile):
pass
def format_bytes(byts):
byts = bytearray(byts)
byts = [hex(b)[2:] for b in byts]
return ' '.join(byts)
def parse_indx_header(data):
check_signature(data, b'INDX')
words = (
'len', 'nul1', 'type', 'gen', 'start', 'count', 'code',
'lng', 'total', 'ordt', 'ligt', 'nligt', 'ncncx'
)
num = len(words)
values = struct.unpack(bytes('>%dL' % num), data[4:4*(num+1)])
return dict(zip(words, values))
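
# Illustrative use (synthetic data, assumed example): the 13 named fields
# are big-endian longs immediately after the INDX magic, so
#   hdr = b'INDX' + struct.pack(b'>13L', *range(13))
#   parse_indx_header(hdr)['type']  -> 2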
class CNCX(object): # {{{
'''
Parses the records that contain the compiled NCX (all strings from the
NCX). Presents a simple offset : string mapping interface to access the
data.
'''
def __init__(self, records, codec):
self.records = OrderedDict()
record_offset = 0
for raw in records:
pos = 0
while pos < len(raw):
length, consumed = decint(raw[pos:])
if length > 0:
try:
self.records[pos+record_offset] = raw[
pos+consumed:pos+consumed+length].decode(codec)
except:
byts = raw[pos:]
r = format_bytes(byts)
print ('CNCX entry at offset %d has unknown format %s'%(
pos+record_offset, r))
self.records[pos+record_offset] = r
pos = len(raw)
pos += consumed+length
record_offset += 0x10000
def __getitem__(self, offset):
return self.records.get(offset)
def get(self, offset, default=None):
return self.records.get(offset, default)
# }}}
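
# Illustrative use (synthetic record, assuming decint's MOBI varint
# encoding, where the high bit marks the final byte, so 0x85 -> length 5):
#   cncx = CNCX([b'\x85Hello'], 'utf-8')
#   cncx[0]  -> u'Hello'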
def parse_tagx_section(data):
check_signature(data, b'TAGX')
tags = []
first_entry_offset, = struct.unpack_from(b'>L', data, 4)
control_byte_count, = struct.unpack_from(b'>L', data, 8)
for i in xrange(12, first_entry_offset, 4):
vals = list(bytearray(data[i:i+4]))
tags.append(TagX(*vals))
return control_byte_count, tags
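
# Illustrative use (synthetic section, assumed example): one real tag entry
# followed by the mandatory EOF entry.
#   tagx = (b'TAGX' + struct.pack(b'>LL', 20, 1) +
#           b'\x01\x01\x01\x00' + b'\x00\x00\x00\x01')
#   parse_tagx_section(tagx)
#   -> (1, [TagX(tag=1, num_of_values=1, bitmask=1, eof=0),
#           TagX(tag=0, num_of_values=0, bitmask=0, eof=1)])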
def get_tag_map(control_byte_count, tagx, data, strict=False):
ptags = []
ans = {}
control_bytes = list(bytearray(data[:control_byte_count]))
data = data[control_byte_count:]
for x in tagx:
if x.eof == 0x01:
control_bytes = control_bytes[1:]
continue
value = control_bytes[0] & x.bitmask
if value != 0:
value_count = value_bytes = None
if value == x.bitmask:
if count_set_bits(x.bitmask) > 1:
# If all bits of masked value are set and the mask has more
# than one bit, a variable width value will follow after
# the control bytes which defines the length of bytes (NOT
# the value count!) which will contain the corresponding
# variable width values.
value_bytes, consumed = decint(data)
data = data[consumed:]
else:
value_count = 1
else:
# Shift bits to get the masked value.
mask = x.bitmask
while mask & 0b1 == 0:
mask >>= 1
value >>= 1
value_count = value
ptags.append(PTagX(x.tag, value_count, value_bytes,
x.num_of_values))
for x in ptags:
values = []
if x.value_count is not None:
# Read value_count * values_per_entry variable width values.
for _ in xrange(x.value_count * x.num_of_values):
byts, consumed = decint(data)
data = data[consumed:]
values.append(byts)
else: # value_bytes is not None
# Convert value_bytes to variable width values.
total_consumed = 0
while total_consumed < x.value_bytes:
# Does this work for values_per_entry != 1?
byts, consumed = decint(data)
data = data[consumed:]
total_consumed += consumed
values.append(byts)
if total_consumed != x.value_bytes:
err = ("Error: Should consume %s bytes, but consumed %s" %
(x.value_bytes, total_consumed))
if strict:
raise ValueError(err)
else:
print(err)
ans[x.tag] = values
# Test that all bytes have been processed
if data.replace(b'\0', b''):
err = ("Warning: There are unprocessed index bytes left: %s" %
format_bytes(data))
if strict:
raise ValueError(err)
else:
print(err)
return ans
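
# Illustrative use, continuing the synthetic TAGX above (assumes decint's
# varint encoding): one control byte (0b1 -> tag 1 present) followed by a
# single value byte 0x81 (-> 1).
#   cbc, tags = parse_tagx_section(tagx)
#   get_tag_map(cbc, tags, b'\x01\x81')  -> {1: [1]}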
def parse_index_record(table, data, control_byte_count, tags, codec,
strict=False):
header = parse_indx_header(data)
idxt_pos = header['start']
if data[idxt_pos:idxt_pos+4] != b'IDXT':
print ('WARNING: Invalid INDX record')
entry_count = header['count']
# loop through to build up the IDXT position starts
idx_positions= []
for j in xrange(entry_count):
pos, = struct.unpack_from(b'>H', data, idxt_pos + 4 + (2 * j))
idx_positions.append(pos)
# The last entry ends before the IDXT tag (but there might be zero fill
# bytes we need to ignore!)
idx_positions.append(idxt_pos)
# For each entry in the IDXT build up the tag map and any associated
# text
for j in xrange(entry_count):
start, end = idx_positions[j:j+2]
rec = data[start:end]
ident, consumed = decode_string(rec, codec=codec)
rec = rec[consumed:]
tag_map = get_tag_map(control_byte_count, tags, rec, strict=strict)
table[ident] = tag_map
def read_index(sections, idx, codec):
table, cncx = OrderedDict(), CNCX([], codec)
data = sections[idx][0]
indx_header = parse_indx_header(data)
indx_count = indx_header['count']
if indx_header['ncncx'] > 0:
off = idx + indx_count + 1
cncx_records = [x[0] for x in sections[off:off+indx_header['ncncx']]]
cncx = CNCX(cncx_records, codec)
tag_section_start = indx_header['len']
control_byte_count, tags = parse_tagx_section(data[tag_section_start:])
for i in xrange(idx + 1, idx + 1 + indx_count):
# Index record
data = sections[i][0]
parse_index_record(table, data, control_byte_count, tags, codec)
return table, cncx

View File

@ -0,0 +1,309 @@
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import (unicode_literals, division, absolute_import,
print_function)
__license__ = 'GPL v3'
__copyright__ = '2012, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
import re, os
def update_internal_links(mobi8_reader):
# need to update all links that are internal which
# are based on positions within the xhtml files **BEFORE**
# cutting and pasting any pieces into the xhtml text files
# kindle:pos:fid:XXXX:off:YYYYYYYYYY (used for internal link within xhtml)
# XXXX is the offset in records into divtbl
# YYYYYYYYYYYY is a base32 number you add to the divtbl insertpos to get final position
mr = mobi8_reader
# pos:fid pattern
posfid_pattern = re.compile(br'''(<a.*?href=.*?>)''', re.IGNORECASE)
posfid_index_pattern = re.compile(br'''['"]kindle:pos:fid:([0-9|A-V]+):off:([0-9|A-V]+).*?["']''')
parts = []
for part in mr.parts:
srcpieces = posfid_pattern.split(part)
for j in xrange(1, len(srcpieces), 2):
tag = srcpieces[j]
if tag.startswith(b'<'):
for m in posfid_index_pattern.finditer(tag):
posfid = m.group(1)
offset = m.group(2)
filename, idtag = mr.get_id_tag_by_pos_fid(posfid, offset)
suffix = (b'#' + idtag) if idtag else b''
replacement = filename.encode(mr.header.codec) + suffix
tag = posfid_index_pattern.sub(replacement, tag, 1)
srcpieces[j] = tag
part = ''.join([x.decode(mr.header.codec) for x in srcpieces])
parts.append(part)
# All parts are now unicode and have no internal links
return parts
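
# The pos:fid identifiers matched above are base-32 numbers over the digits
# 0-9 and A-V, so they decode with int(x, 32), e.g. (illustrative):
#   int('001A', 32)  -> 42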
def remove_kindlegen_markup(parts):
# we can safely remove all of the Kindlegen generated aid tags
find_tag_with_aid_pattern = re.compile(r'''(<[^>]*\said\s*=[^>]*>)''',
re.IGNORECASE)
within_tag_aid_position_pattern = re.compile(r'''\said\s*=['"][^'"]*['"]''')
for i in xrange(len(parts)):
part = parts[i]
srcpieces = find_tag_with_aid_pattern.split(part)
for j in range(len(srcpieces)):
tag = srcpieces[j]
if tag.startswith('<'):
for m in within_tag_aid_position_pattern.finditer(tag):
replacement = ''
tag = within_tag_aid_position_pattern.sub(replacement, tag,
1)
srcpieces[j] = tag
part = "".join(srcpieces)
parts[i] = part
# we can safely remove all of the Kindlegen generated data-AmznPageBreak tags
find_tag_with_AmznPageBreak_pattern = re.compile(
r'''(<[^>]*\sdata-AmznPageBreak=[^>]*>)''', re.IGNORECASE)
within_tag_AmznPageBreak_position_pattern = re.compile(
r'''\sdata-AmznPageBreak=['"][^'"]*['"]''')
for i in xrange(len(parts)):
part = parts[i]
srcpieces = find_tag_with_AmznPageBreak_pattern.split(part)
for j in range(len(srcpieces)):
tag = srcpieces[j]
if tag.startswith('<'):
for m in within_tag_AmznPageBreak_position_pattern.finditer(tag):
replacement = ''
tag = within_tag_AmznPageBreak_position_pattern.sub(replacement, tag, 1)
srcpieces[j] = tag
part = "".join(srcpieces)
parts[i] = part
def update_flow_links(mobi8_reader, resource_map, log):
# kindle:embed:XXXX?mime=image/gif (png, jpeg, etc) (used for images)
# kindle:flow:XXXX?mime=YYYY/ZZZ (used for style sheets, svg images, etc)
# kindle:embed:XXXX (used for fonts)
mr = mobi8_reader
flows = []
    # Match <img ...> and <image ...> tags; the original character class
    # [img\s|image\s] over-matched single characters
    img_pattern = re.compile(r'''(<(?:img|image)[^>]*>)''', re.IGNORECASE)
img_index_pattern = re.compile(r'''['"]kindle:embed:([0-9|A-V]+)[^'"]*['"]''', re.IGNORECASE)
tag_pattern = re.compile(r'''(<[^>]*>)''')
flow_pattern = re.compile(r'''['"]kindle:flow:([0-9|A-V]+)\?mime=([^'"]+)['"]''', re.IGNORECASE)
url_pattern = re.compile(r'''(url\(.*?\))''', re.IGNORECASE)
url_img_index_pattern = re.compile(r'''kindle:embed:([0-9|A-V]+)\?mime=image/[^\)]*''', re.IGNORECASE)
font_index_pattern = re.compile(r'''kindle:embed:([0-9|A-V]+)''', re.IGNORECASE)
url_css_index_pattern = re.compile(r'''kindle:flow:([0-9|A-V]+)\?mime=text/css[^\)]*''', re.IGNORECASE)
for flow in mr.flows:
if flow is None: # 0th flow is None
flows.append(flow)
continue
if not isinstance(flow, unicode):
flow = flow.decode(mr.header.codec)
# links to raster image files from image tags
# image_pattern
srcpieces = img_pattern.split(flow)
for j in range(1, len(srcpieces), 2):
tag = srcpieces[j]
if tag.startswith('<im'):
for m in img_index_pattern.finditer(tag):
num = int(m.group(1), 32)
href = resource_map[num-1]
if href:
replacement = '"%s"'%('../'+ href)
tag = img_index_pattern.sub(replacement, tag, 1)
else:
log.warn('Referenced image %s was not recognized '
'as a valid image in %s' % (num, tag))
srcpieces[j] = tag
flow = "".join(srcpieces)
# replacements inside css url():
srcpieces = url_pattern.split(flow)
for j in range(1, len(srcpieces), 2):
tag = srcpieces[j]
# process links to raster image files
for m in url_img_index_pattern.finditer(tag):
num = int(m.group(1), 32)
href = resource_map[num-1]
if href:
replacement = '"%s"'%('../'+ href)
tag = url_img_index_pattern.sub(replacement, tag, 1)
else:
log.warn('Referenced image %s was not recognized as a '
'valid image in %s' % (num, tag))
# process links to fonts
for m in font_index_pattern.finditer(tag):
num = int(m.group(1), 32)
href = resource_map[num-1]
if href is None:
log.warn('Referenced font %s was not recognized as a '
'valid font in %s' % (num, tag))
else:
replacement = '"%s"'%('../'+ href)
if href.endswith('.failed'):
replacement = '"%s"'%('failed-'+href)
tag = font_index_pattern.sub(replacement, tag, 1)
# process links to other css pieces
for m in url_css_index_pattern.finditer(tag):
num = int(m.group(1), 32)
fi = mr.flowinfo[num]
replacement = '"../' + fi.dir + '/' + fi.fname + '"'
tag = url_css_index_pattern.sub(replacement, tag, 1)
srcpieces[j] = tag
flow = "".join(srcpieces)
# flow pattern not inside url()
srcpieces = re.split(tag_pattern, flow)
for j in range(1, len(srcpieces), 2):
tag = srcpieces[j]
if tag.startswith('<'):
for m in re.finditer(flow_pattern, tag):
num = int(m.group(1), 32)
fi = mr.flowinfo[num]
if fi.format == 'inline':
flowtext = mr.flows[num]
tag = flowtext
else:
replacement = '"../' + fi.dir + '/' + fi.fname + '"'
tag = flow_pattern.sub(replacement, tag, 1)
srcpieces[j] = tag
flow = "".join(srcpieces)
flows.append(flow)
# All flows are now unicode and have links resolved
return flows
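
# For illustration, how flow_pattern above decodes a (made-up) flow href:
#     m = flow_pattern.search('"kindle:flow:0002?mime=text/css"')
#     int(m.group(1), 32), m.group(2)  # -> (2, 'text/css')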
def insert_flows_into_markup(parts, flows, mobi8_reader):
mr = mobi8_reader
# kindle:flow:XXXX?mime=YYYY/ZZZ (used for style sheets, svg images, etc)
tag_pattern = re.compile(r'''(<[^>]*>)''')
flow_pattern = re.compile(r'''['"]kindle:flow:([0-9|A-V]+)\?mime=([^'"]+)['"]''', re.IGNORECASE)
for i in xrange(len(parts)):
part = parts[i]
# flow pattern
srcpieces = tag_pattern.split(part)
for j in range(1, len(srcpieces),2):
tag = srcpieces[j]
if tag.startswith('<'):
for m in flow_pattern.finditer(tag):
num = int(m.group(1), 32)
fi = mr.flowinfo[num]
if fi.format == 'inline':
tag = flows[num]
else:
replacement = '"../' + fi.dir + '/' + fi.fname + '"'
tag = flow_pattern.sub(replacement, tag, 1)
srcpieces[j] = tag
part = "".join(srcpieces)
# store away modified version
parts[i] = part
def insert_images_into_markup(parts, resource_map, log):
# Handle any embedded raster images links in the xhtml text
# kindle:embed:XXXX?mime=image/gif (png, jpeg, etc) (used for images)
    # Match <img ...> and <image ...> tags; the original character class
    # [img\s|image\s] over-matched single characters
    img_pattern = re.compile(r'''(<(?:img|image)[^>]*>)''', re.IGNORECASE)
img_index_pattern = re.compile(r'''['"]kindle:embed:([0-9|A-V]+)[^'"]*['"]''')
for i in xrange(len(parts)):
part = parts[i]
#[partnum, dir, filename, beg, end, aidtext] = self.k8proc.partinfo[i]
# links to raster image files
# image_pattern
srcpieces = img_pattern.split(part)
for j in range(1, len(srcpieces), 2):
tag = srcpieces[j]
if tag.startswith('<im'):
for m in img_index_pattern.finditer(tag):
num = int(m.group(1), 32)
href = resource_map[num-1]
if href:
replacement = '"%s"'%('../' + href)
tag = img_index_pattern.sub(replacement, tag, 1)
else:
log.warn('Referenced image %s was not recognized as '
'a valid image in %s' % (num, tag))
srcpieces[j] = tag
part = "".join(srcpieces)
# store away modified version
parts[i] = part
def upshift_markup(parts):
tag_pattern = re.compile(r'''(<(?:svg)[^>]*>)''', re.IGNORECASE)
for i in xrange(len(parts)):
part = parts[i]
# tag pattern
srcpieces = re.split(tag_pattern, part)
for j in range(1, len(srcpieces), 2):
tag = srcpieces[j]
if tag[:4].lower() == '<svg':
tag = tag.replace('preserveaspectratio','preserveAspectRatio')
tag = tag.replace('viewbox','viewBox')
srcpieces[j] = tag
part = "".join(srcpieces)
# store away modified version
parts[i] = part
def expand_mobi8_markup(mobi8_reader, resource_map, log):
# First update all internal links that are based on offsets
parts = update_internal_links(mobi8_reader)
# Remove pointless markup inserted by kindlegen
remove_kindlegen_markup(parts)
# Handle substitutions for the flows pieces first as they may
# be inlined into the xhtml text
flows = update_flow_links(mobi8_reader, resource_map, log)
# Insert inline flows into the markup
insert_flows_into_markup(parts, flows, mobi8_reader)
# Insert raster images into markup
insert_images_into_markup(parts, resource_map, log)
# Perform general markup cleanups
upshift_markup(parts)
# Update the parts and flows stored in the reader
mobi8_reader.parts = parts
mobi8_reader.flows = flows
# write out the parts and file flows
os.mkdir('text') # directory containing all parts
spine = []
for i, part in enumerate(parts):
pi = mobi8_reader.partinfo[i]
with open(os.path.join(pi.type, pi.filename), 'wb') as f:
f.write(part.encode('utf-8'))
spine.append(f.name)
for i, flow in enumerate(flows):
fi = mobi8_reader.flowinfo[i]
if fi.format == 'file':
if not os.path.exists(fi.dir):
os.mkdir(fi.dir)
with open(os.path.join(fi.dir, fi.fname), 'wb') as f:
f.write(flow.encode('utf-8'))
return spine

View File

@@ -1,10 +1,12 @@
-__license__ = 'GPL v3'
-__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'
-'''
-Read data from .mobi files
-'''
-import shutil, os, re, struct, textwrap, cStringIO, sys
+#!/usr/bin/env python
+# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
+from __future__ import (absolute_import, print_function)
+
+__license__ = 'GPL v3'
+__copyright__ = '2012, Kovid Goyal <kovid@kovidgoyal.net>'
+__docformat__ = 'restructuredtext en'
+
+import shutil, os, re, struct, textwrap, cStringIO
 
 try:
     from PIL import Image as PILImage
@@ -14,235 +16,22 @@ except ImportError:
 
 from lxml import html, etree
 
-from calibre import xml_entity_to_unicode, CurrentDir, entity_to_unicode, \
-    replace_entities
+from calibre import (xml_entity_to_unicode, entity_to_unicode)
 from calibre.utils.filenames import ascii_filename
-from calibre.utils.date import parse_date
 from calibre.utils.cleantext import clean_ascii_chars
-from calibre.ptempfile import TemporaryDirectory
 from calibre.ebooks import DRMError, unit_convert
 from calibre.ebooks.chardet import ENCODING_PATS
 from calibre.ebooks.mobi import MobiError
 from calibre.ebooks.mobi.huffcdic import HuffReader
-from calibre.ebooks.mobi.langcodes import main_language, sub_language, mobi2iana
 from calibre.ebooks.compression.palmdoc import decompress_doc
 from calibre.ebooks.metadata import MetaInformation
 from calibre.ebooks.metadata.opf2 import OPFCreator, OPF
 from calibre.ebooks.metadata.toc import TOC
+from calibre.ebooks.mobi.reader.headers import BookHeader
 
 class TopazError(ValueError):
     pass
class EXTHHeader(object):
def __init__(self, raw, codec, title):
self.doctype = raw[:4]
self.length, self.num_items = struct.unpack('>LL', raw[4:12])
raw = raw[12:]
pos = 0
self.mi = MetaInformation(_('Unknown'), [_('Unknown')])
self.has_fake_cover = True
left = self.num_items
while left > 0:
left -= 1
id, size = struct.unpack('>LL', raw[pos:pos + 8])
content = raw[pos + 8:pos + size]
pos += size
if id >= 100 and id < 200:
self.process_metadata(id, content, codec)
elif id == 203:
self.has_fake_cover = bool(struct.unpack('>L', content)[0])
elif id == 201:
co, = struct.unpack('>L', content)
if co < 1e7:
self.cover_offset = co
elif id == 202:
self.thumbnail_offset, = struct.unpack('>L', content)
elif id == 501:
# cdetype
pass
elif id == 502:
# last update time
pass
elif id == 503: # Long title
# Amazon seems to regard this as the definitive book title
# rather than the title from the PDB header. In fact when
# sending MOBI files through Amazon's email service if the
# title contains non ASCII chars or non filename safe chars
# they are messed up in the PDB header
try:
title = content.decode(codec)
except:
pass
#else:
# print 'unknown record', id, repr(content)
if title:
self.mi.title = replace_entities(title)
def process_metadata(self, id, content, codec):
if id == 100:
if self.mi.authors == [_('Unknown')]:
self.mi.authors = []
au = content.decode(codec, 'ignore').strip()
self.mi.authors.append(au)
if re.match(r'\S+?\s*,\s+\S+', au.strip()):
self.mi.author_sort = au.strip()
elif id == 101:
self.mi.publisher = content.decode(codec, 'ignore').strip()
elif id == 103:
self.mi.comments = content.decode(codec, 'ignore')
elif id == 104:
self.mi.isbn = content.decode(codec, 'ignore').strip().replace('-', '')
elif id == 105:
if not self.mi.tags:
self.mi.tags = []
self.mi.tags.extend([x.strip() for x in content.decode(codec,
'ignore').split(';')])
self.mi.tags = list(set(self.mi.tags))
elif id == 106:
try:
self.mi.pubdate = parse_date(content, as_utc=False)
except:
pass
elif id == 108:
pass # Producer
elif id == 113:
pass # ASIN or UUID
#else:
# print 'unhandled metadata record', id, repr(content)
class BookHeader(object):
def __init__(self, raw, ident, user_encoding, log, try_extra_data_fix=False):
self.log = log
self.compression_type = raw[:2]
self.records, self.records_size = struct.unpack('>HH', raw[8:12])
self.encryption_type, = struct.unpack('>H', raw[12:14])
if ident == 'TEXTREAD':
self.codepage = 1252
if len(raw) <= 16:
self.codec = 'cp1252'
self.extra_flags = 0
self.title = _('Unknown')
self.language = 'ENGLISH'
self.sublanguage = 'NEUTRAL'
self.exth_flag, self.exth = 0, None
self.ancient = True
self.first_image_index = -1
self.mobi_version = 1
else:
self.ancient = False
self.doctype = raw[16:20]
self.length, self.type, self.codepage, self.unique_id, \
self.version = struct.unpack('>LLLLL', raw[20:40])
try:
self.codec = {
1252: 'cp1252',
65001: 'utf-8',
}[self.codepage]
except (IndexError, KeyError):
self.codec = 'cp1252' if not user_encoding else user_encoding
log.warn('Unknown codepage %d. Assuming %s' % (self.codepage,
self.codec))
# There exists some broken DRM removal tool that removes DRM but
# leaves the DRM fields in the header yielding a header size of
# 0xF8. The actual value of max_header_length should be 0xE8 but
# it's changed to accommodate this silly tool. Hopefully that will
# not break anything else.
max_header_length = 0xF8
if (ident == 'TEXTREAD' or self.length < 0xE4 or
self.length > max_header_length or
(try_extra_data_fix and self.length == 0xE4)):
self.extra_flags = 0
else:
self.extra_flags, = struct.unpack('>H', raw[0xF2:0xF4])
if self.compression_type == 'DH':
self.huff_offset, self.huff_number = struct.unpack('>LL', raw[0x70:0x78])
toff, tlen = struct.unpack('>II', raw[0x54:0x5c])
tend = toff + tlen
self.title = raw[toff:tend] if tend < len(raw) else _('Unknown')
langcode = struct.unpack('!L', raw[0x5C:0x60])[0]
langid = langcode & 0xFF
sublangid = (langcode >> 10) & 0xFF
self.language = main_language.get(langid, 'ENGLISH')
self.sublanguage = sub_language.get(sublangid, 'NEUTRAL')
self.mobi_version = struct.unpack('>I', raw[0x68:0x6c])[0]
self.first_image_index = struct.unpack('>L', raw[0x6c:0x6c + 4])[0]
self.exth_flag, = struct.unpack('>L', raw[0x80:0x84])
self.exth = None
if not isinstance(self.title, unicode):
self.title = self.title.decode(self.codec, 'replace')
if self.exth_flag & 0x40:
try:
self.exth = EXTHHeader(raw[16 + self.length:], self.codec, self.title)
self.exth.mi.uid = self.unique_id
try:
self.exth.mi.language = mobi2iana(langid, sublangid)
except:
self.log.exception('Unknown language code')
except:
self.log.exception('Invalid EXTH header')
self.exth_flag = 0
class MetadataHeader(BookHeader):
def __init__(self, stream, log):
self.stream = stream
self.ident = self.identity()
self.num_sections = self.section_count()
if self.num_sections >= 2:
header = self.header()
BookHeader.__init__(self, header, self.ident, None, log)
else:
self.exth = None
def identity(self):
self.stream.seek(60)
ident = self.stream.read(8).upper()
if ident not in ['BOOKMOBI', 'TEXTREAD']:
raise MobiError('Unknown book type: %s' % ident)
return ident
def section_count(self):
self.stream.seek(76)
return struct.unpack('>H', self.stream.read(2))[0]
def section_offset(self, number):
self.stream.seek(78 + number * 8)
return struct.unpack('>LBBBB', self.stream.read(8))[0]
def header(self):
section_headers = []
# First section with the metadata
section_headers.append(self.section_offset(0))
        # Second section used to get the length of the first
section_headers.append(self.section_offset(1))
end_off = section_headers[1]
off = section_headers[0]
self.stream.seek(off)
return self.stream.read(end_off - off)
def section_data(self, number):
start = self.section_offset(number)
if number == self.num_sections -1:
end = os.stat(self.stream.name).st_size
else:
end = self.section_offset(number + 1)
self.stream.seek(start)
try:
return self.stream.read(end - start)
except OverflowError:
return self.stream.read(os.stat(self.stream.name).st_size - start)
class MobiReader(object):
    PAGE_BREAK_PAT = re.compile(
        r'<\s*/{0,1}\s*mbp:pagebreak((?:\s+[^/>]*){0,1})/{0,1}\s*>\s*(?:<\s*/{0,1}\s*mbp:pagebreak\s*/{0,1}\s*>)*',
@@ -312,15 +101,47 @@ class MobiReader(object):
             self.sections.append((section(i), self.section_headers[i]))
 
-        self.book_header = BookHeader(self.sections[0][0], self.ident,
+        self.book_header = bh = BookHeader(self.sections[0][0], self.ident,
             user_encoding, self.log, try_extra_data_fix=try_extra_data_fix)
         self.name = self.name.decode(self.book_header.codec, 'replace')
+        self.kf8_type = None
+        k8i = getattr(self.book_header.exth, 'kf8_header', None)
+
+        if self.book_header.mobi_version == 8:
+            self.kf8_type = 'standalone'
+        elif k8i is not None:  # Check for joint mobi 6 and kf 8 file
+            try:
+                raw = self.sections[k8i-1][0]
+            except:
+                raw = None
+            if raw == b'BOUNDARY':
+                try:
+                    self.book_header = BookHeader(self.sections[k8i][0],
+                            self.ident, user_encoding, self.log)
+                    # The following are only correct in the Mobi 6
+                    # header not the Mobi 8 header
+                    for x in ('first_image_index',):
+                        setattr(self.book_header, x, getattr(bh, x))
+                    if hasattr(self.book_header, 'huff_offset'):
+                        self.book_header.huff_offset += k8i
+                    self.kf8_type = 'joint'
+                    self.kf8_boundary = k8i-1
+                except:
+                    self.book_header = bh
+
+    def check_for_drm(self):
+        if self.book_header.encryption_type != 0:
+            try:
+                name = self.book_header.exth.mi.title
+            except:
+                name = self.name
+            if not name:
+                name = self.name
+            raise DRMError(name)
 
     def extract_content(self, output_dir, parse_cache):
         output_dir = os.path.abspath(output_dir)
-        if self.book_header.encryption_type != 0:
-            raise DRMError(self.name)
+        self.check_for_drm()
 
         processed_records = self.extract_text()
         if self.debug is not None:
             parse_cache['calibre_raw_mobi_markup'] = self.mobi_html
@@ -330,6 +151,7 @@ class MobiReader(object):
         self.processed_html = self.processed_html.replace('</</', '</')
         self.processed_html = re.sub(r'</([a-zA-Z]+)<', r'</\1><',
                 self.processed_html)
+        self.processed_html = self.processed_html.replace(u'\ufeff', '')
         # Remove tags of the form <xyz: ...> as they can cause issues further
         # along the pipeline
         self.processed_html = re.sub(r'</{0,1}[a-zA-Z]+:\s+[^>]*>', '',
@@ -916,11 +738,12 @@ class MobiReader(object):
         trail_size = self.sizeof_trailing_entries(data)
         return data[:len(data)-trail_size]
 
-    def extract_text(self):
+    def extract_text(self, offset=1):
         self.log.debug('Extracting text...')
-        text_sections = [self.text_section(i) for i in range(1,
-            min(self.book_header.records + 1, len(self.sections)))]
-        processed_records = list(range(0, self.book_header.records + 1))
+        text_sections = [self.text_section(i) for i in xrange(offset,
+            min(self.book_header.records + offset, len(self.sections)))]
+        processed_records = list(range(offset-1, self.book_header.records +
+            offset))
 
         self.mobi_html = ''
@@ -1027,63 +850,6 @@ class MobiReader(object):
             self.image_names.append(os.path.basename(path))
             im.save(open(path, 'wb'), format='JPEG')
def get_metadata(stream):
stream.seek(0)
try:
raw = stream.read(3)
except:
raw = ''
stream.seek(0)
if raw == 'TPZ':
from calibre.ebooks.metadata.topaz import get_metadata
return get_metadata(stream)
from calibre.utils.logging import Log
log = Log()
try:
mi = MetaInformation(os.path.basename(stream.name), [_('Unknown')])
except:
mi = MetaInformation(_('Unknown'), [_('Unknown')])
mh = MetadataHeader(stream, log)
if mh.title and mh.title != _('Unknown'):
mi.title = mh.title
if mh.exth is not None:
if mh.exth.mi is not None:
mi = mh.exth.mi
else:
size = sys.maxint
if hasattr(stream, 'seek') and hasattr(stream, 'tell'):
pos = stream.tell()
stream.seek(0, 2)
size = stream.tell()
stream.seek(pos)
if size < 4*1024*1024:
with TemporaryDirectory('_mobi_meta_reader') as tdir:
with CurrentDir(tdir):
mr = MobiReader(stream, log)
parse_cache = {}
mr.extract_content(tdir, parse_cache)
if mr.embedded_mi is not None:
mi = mr.embedded_mi
if hasattr(mh.exth, 'cover_offset'):
cover_index = mh.first_image_index + mh.exth.cover_offset
data = mh.section_data(int(cover_index))
else:
try:
data = mh.section_data(mh.first_image_index)
except:
data = ''
buf = cStringIO.StringIO(data)
try:
im = PILImage.open(buf)
except:
log.exception('Failed to read MOBI cover')
else:
obuf = cStringIO.StringIO()
im.convert('RGB').save(obuf, format='JPEG')
mi.cover_data = ('jpg', obuf.getvalue())
return mi
def test_mbp_regex():
    for raw, m in {
        '<mbp:pagebreak></mbp:pagebreak>':'',

View File

@@ -0,0 +1,392 @@
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import (unicode_literals, division, absolute_import,
print_function)
__license__ = 'GPL v3'
__copyright__ = '2012, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
import struct, re, os, imghdr
from collections import namedtuple
from itertools import repeat
from calibre.ebooks.mobi.reader.headers import NULL_INDEX
from calibre.ebooks.mobi.reader.index import read_index
from calibre.ebooks.mobi.reader.ncx import read_ncx, build_toc
from calibre.ebooks.mobi.reader.markup import expand_mobi8_markup
from calibre.ebooks.metadata.opf2 import Guide, OPFCreator
from calibre.ebooks.mobi.utils import read_font_record
Part = namedtuple('Part',
'num type filename start end aid')
Elem = namedtuple('Elem',
'insert_pos toc_text file_number sequence_number start_pos '
'length')
FlowInfo = namedtuple('FlowInfo',
'type format dir fname')
class Mobi8Reader(object):
def __init__(self, mobi6_reader, log):
self.mobi6_reader, self.log = mobi6_reader, log
self.header = mobi6_reader.book_header
self.encrypted_fonts = []
def __call__(self):
self.mobi6_reader.check_for_drm()
offset = 1
res_end = len(self.mobi6_reader.sections)
if self.mobi6_reader.kf8_type == 'joint':
offset = self.mobi6_reader.kf8_boundary + 2
res_end = self.mobi6_reader.kf8_boundary
self.processed_records = self.mobi6_reader.extract_text(offset=offset)
self.raw_ml = self.mobi6_reader.mobi_html
with open('debug-raw.html', 'wb') as f:
f.write(self.raw_ml)
self.kf8_sections = self.mobi6_reader.sections[offset-1:]
first_resource_index = self.header.first_image_index
if first_resource_index in {-1, NULL_INDEX}:
first_resource_index = self.header.records + 1
self.resource_sections = \
self.mobi6_reader.sections[first_resource_index:res_end]
self.cover_offset = getattr(self.header.exth, 'cover_offset', None)
self.read_indices()
self.build_parts()
guide = self.create_guide()
ncx = self.create_ncx()
resource_map = self.extract_resources()
spine = self.expand_text(resource_map)
return self.write_opf(guide, ncx, spine, resource_map)
def read_indices(self):
self.flow_table = (0, NULL_INDEX)
if self.header.fdstidx != NULL_INDEX:
header = self.kf8_sections[self.header.fdstidx][0]
if header[:4] != b'FDST':
raise ValueError('KF8 does not have a valid FDST record')
num_sections, = struct.unpack_from(b'>L', header, 0x08)
sections = header[0x0c:]
self.flow_table = struct.unpack_from(b'>%dL' % (num_sections*2),
sections, 0)[::2] + (NULL_INDEX,)
self.files = []
if self.header.skelidx != NULL_INDEX:
table = read_index(self.kf8_sections, self.header.skelidx,
self.header.codec)[0]
File = namedtuple('File',
'file_number name divtbl_count start_position length')
for i, text in enumerate(table.iterkeys()):
tag_map = table[text]
self.files.append(File(i, text, tag_map[1][0],
tag_map[6][0], tag_map[6][1]))
self.elems = []
if self.header.dividx != NULL_INDEX:
table, cncx = read_index(self.kf8_sections, self.header.dividx,
self.header.codec)
for i, text in enumerate(table.iterkeys()):
tag_map = table[text]
toc_text = cncx[tag_map[2][0]]
self.elems.append(Elem(int(text), toc_text, tag_map[3][0],
tag_map[4][0], tag_map[6][0], tag_map[6][1]))
self.guide = []
if self.header.othidx != NULL_INDEX:
table, cncx = read_index(self.kf8_sections, self.header.othidx,
self.header.codec)
Item = namedtuple('Item',
'type title div_frag_num')
for i, ref_type in enumerate(table.iterkeys()):
tag_map = table[ref_type]
# ref_type, ref_title, div/frag number
title = cncx[tag_map[1][0]]
fileno = None
if 3 in tag_map.keys():
fileno = tag_map[3][0]
if 6 in tag_map.keys():
fileno = tag_map[6][0]
self.guide.append(Item(ref_type.decode(self.header.codec),
title, fileno))
def build_parts(self):
raw_ml = self.mobi6_reader.mobi_html
self.flows = []
self.flowinfo = []
# now split the raw_ml into its flow pieces
for j in xrange(0, len(self.flow_table)-1):
start = self.flow_table[j]
end = self.flow_table[j+1]
if end == NULL_INDEX:
end = len(raw_ml)
self.flows.append(raw_ml[start:end])
# the first piece represents the xhtml text
text = self.flows[0]
self.flows[0] = b''
# walk the <skeleton> and <div> tables to build original source xhtml
# files *without* destroying any file position information needed for
# later href processing and create final list of file separation start:
# stop points and etc in partinfo
self.parts = []
self.partinfo = []
divptr = 0
baseptr = 0
for skelnum, skelname, divcnt, skelpos, skellen in self.files:
baseptr = skelpos + skellen
skeleton = text[skelpos:baseptr]
for i in xrange(divcnt):
insertpos, idtext, filenum, seqnum, startpos, length = \
self.elems[divptr]
if i == 0:
aidtext = idtext[12:-2]
filename = 'part%04d.html' % filenum
part = text[baseptr:baseptr + length]
insertpos = insertpos - skelpos
skeleton = skeleton[0:insertpos] + part + skeleton[insertpos:]
baseptr = baseptr + length
divptr += 1
self.parts.append(skeleton)
self.partinfo.append(Part(skelnum, 'text', filename, skelpos,
baseptr, aidtext))
        # The primary css style sheet is typically stored next, followed by
        # any snippets of code that were previously inlined in the original
        # xhtml but have been stripped out and placed here. This can include
        # local CDATA snippets and svg sections. The problem is that for most
        # browsers and ereaders you cannot use <img src="imageXXXX.svg" /> to
        # import an svg image that itself uses an <image/> tag to import some
        # raster image - it should work according to the spec, but does not
        # in almost all browsers and ereaders, and it causes epub validation
        # issues because those raster images are in the manifest but not in
        # the xhtml text, since they are only referenced from an svg image.
        # So we need to check the remaining flow pieces to see if they are
        # css or svg images. If svg images, we must check whether they have
        # an <image/> tag and, if so, inline them into the xhtml text pieces.
        # There may be other sorts of pieces stored here, but until we see
        # one in the wild to reverse engineer, we won't be able to tell.
self.flowinfo.append(FlowInfo(None, None, None, None))
svg_tag_pattern = re.compile(br'''(<svg[^>]*>)''', re.IGNORECASE)
image_tag_pattern = re.compile(br'''(<image[^>]*>)''', re.IGNORECASE)
for j in xrange(1, len(self.flows)):
flowpart = self.flows[j]
nstr = '%04d' % j
m = svg_tag_pattern.search(flowpart)
            if m is not None:
# svg
typ = 'svg'
start = m.start()
m2 = image_tag_pattern.search(flowpart)
                if m2 is not None:
format = 'inline'
dir = None
fname = None
# strip off anything before <svg if inlining
flowpart = flowpart[start:]
else:
format = 'file'
dir = "images"
fname = 'svgimg' + nstr + '.svg'
else:
# search for CDATA and if exists inline it
if flowpart.find('[CDATA[') >= 0:
typ = 'css'
flowpart = '<style type="text/css">\n' + flowpart + '\n</style>\n'
format = 'inline'
dir = None
fname = None
else:
# css - assume as standalone css file
typ = 'css'
format = 'file'
dir = "styles"
fname = nstr + '.css'
self.flows[j] = flowpart
self.flowinfo.append(FlowInfo(typ, format, dir, fname))
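
        # For illustration, the skeleton/fragment stitching above is plain
        # slicing at insertpos (toy values):
        #     skel, frag = b'<div></div>', b'<p>x</p>'
        #     skel[:5] + frag + skel[5:]  # -> b'<div><p>x</p></div>'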
def get_file_info(self, pos):
''' Get information about the part (file) that exists at pos in
the raw markup '''
for part in self.partinfo:
if pos >= part.start and pos < part.end:
return part
return Part(*repeat(None, len(Part._fields)))
def get_id_tag_by_pos_fid(self, posfid, offset):
# first convert kindle:pos:fid and offset info to position in file
row = int(posfid, 32)
off = int(offset, 32)
[insertpos, idtext, filenum, seqnm, startpos, length] = self.elems[row]
pos = insertpos + off
fname = self.get_file_info(pos).filename
# an existing "id=" must exist in original xhtml otherwise it would not
# have worked for linking. Amazon seems to have added its own
# additional "aid=" inside tags whose contents seem to represent some
# position information encoded into Base32 name.
# so find the closest "id=" before position the file by actually
# searching in that file
idtext = self.get_id_tag(pos)
return fname, idtext
def get_id_tag(self, pos):
# find the correct tag by actually searching in the destination
# textblock at position
fi = self.get_file_info(pos)
if fi.num is None and fi.start is None:
raise ValueError('No file contains pos: %d'%pos)
textblock = self.parts[fi.num]
id_map = []
npos = pos - fi.start
        # if npos is inside a tag, move it past that tag's closing '>' marker
pgt = textblock.find(b'>', npos)
plt = textblock.find(b'<', npos)
if pgt < plt:
npos = pgt + 1
# find id links only inside of tags
# inside any < > pair find all "id=' and return whatever is inside
# the quotes
id_pattern = re.compile(br'''<[^>]*\sid\s*=\s*['"]([^'"]*)['"][^>]*>''',
re.IGNORECASE)
for m in re.finditer(id_pattern, textblock):
id_map.append((m.start(), m.group(1)))
if not id_map:
# Found no id in the textblock, link must be to top of file
return b''
# if npos is before first id= inside a tag, return the first
if npos < id_map[0][0]:
return id_map[0][1]
# if npos is after the last id= inside a tag, return the last
if npos > id_map[-1][0]:
return id_map[-1][1]
# otherwise find last id before npos
for i, item in enumerate(id_map):
if npos < item[0]:
return id_map[i-1][1]
return id_map[0][1]
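
        # For illustration, the id search above on made-up markup:
        #     pat = re.compile(br'''<[^>]*\sid\s*=\s*['"]([^'"]*)['"][^>]*>''')
        #     [(m.start(), m.group(1)) for m in
        #             pat.finditer(b'<p id="c1">x</p><h2 id="c2">y</h2>')]
        #     # -> [(0, b'c1'), (16, b'c2')]; npos 20 resolves to b'c2'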
def create_guide(self):
guide = Guide()
for ref_type, ref_title, fileno in self.guide:
elem = self.elems[fileno]
fi = self.get_file_info(elem.insert_pos)
idtext = self.get_id_tag(elem.insert_pos).decode(self.header.codec)
linktgt = fi.filename
if idtext:
                linktgt += '#' + idtext
g = Guide.Reference('%s/%s'%(fi.type, linktgt), os.getcwdu())
g.title, g.type = ref_title, ref_type
guide.append(g)
so = self.header.exth.start_offset
if so not in {None, NULL_INDEX}:
fi = self.get_file_info(so)
if fi.filename is not None:
idtext = self.get_id_tag(so).decode(self.header.codec)
linktgt = fi.filename
if idtext:
linktgt += '#' + idtext
g = Guide.Reference('%s/%s'%(fi.type, linktgt), os.getcwdu())
g.title, g.type = 'start', 'text'
guide.append(g)
return guide
def create_ncx(self):
index_entries = read_ncx(self.kf8_sections, self.header.ncxidx,
self.header.codec)
# Add href and anchor info to the index entries
for entry in index_entries:
pos = entry['pos']
fi = self.get_file_info(pos)
#print (11111111, fi, entry['pos_fid'])
if fi.filename is None:
raise ValueError('Index entry has invalid pos: %d'%pos)
idtag = self.get_id_tag(pos).decode(self.header.codec)
entry['href'] = '%s/%s'%(fi.type, fi.filename)
entry['idtag'] = idtag
# Build the TOC object
return build_toc(index_entries)
def extract_resources(self):
resource_map = []
for x in ('fonts', 'images'):
os.mkdir(x)
for i, sec in enumerate(self.resource_sections):
fname_idx = i+1
data = sec[0]
typ = data[:4]
href = None
if typ in {b'FLIS', b'FCIS', b'SRCS', b'\xe9\x8e\r\n',
b'RESC', b'BOUN', b'FDST', b'DATP', b'AUDI', b'VIDE'}:
pass # Ignore these records
elif typ == b'FONT':
font = read_font_record(data)
href = "fonts/%05d.%s" % (fname_idx, font['ext'])
if font['err']:
self.log.warn('Reading font record %d failed: %s'%(
fname_idx, font['err']))
if font['headers']:
self.log.debug('Font record headers: %s'%font['headers'])
with open(href.replace('/', os.sep), 'wb') as f:
f.write(font['font_data'] if font['font_data'] else
font['raw_data'])
if font['encrypted']:
self.encrypted_fonts.append(href)
else:
imgtype = imghdr.what(None, data)
if imgtype is None:
imgtype = 'unknown'
href = 'images/%05d.%s'%(fname_idx, imgtype)
with open(href.replace('/', os.sep), 'wb') as f:
f.write(data)
resource_map.append(href)
return resource_map
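
        # For illustration: records other than the known control records are
        # sniffed purely from their bytes via imghdr, e.g. (made-up payload):
        #     imghdr.what(None, b'\x89PNG\r\n\x1a\n' + b'\x00'*16)  # -> 'png'
        # so the extension in images/NNNNN.ext reflects the actual payload,
        # not a declared mime type.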
def expand_text(self, resource_map):
return expand_mobi8_markup(self, resource_map, self.log)
def write_opf(self, guide, toc, spine, resource_map):
mi = self.header.exth.mi
if (self.cover_offset is not None and self.cover_offset <
len(resource_map)):
mi.cover = resource_map[self.cover_offset]
opf = OPFCreator(os.getcwdu(), mi)
opf.guide = guide
def exclude(path):
return os.path.basename(path) == 'debug-raw.html'
opf.create_manifest_from_files_in([os.getcwdu()], exclude=exclude)
opf.create_spine(spine)
opf.set_toc(toc)
with open('metadata.opf', 'wb') as of, open('toc.ncx', 'wb') as ncx:
opf.render(of, ncx, 'toc.ncx')
return 'metadata.opf'

View File

@@ -0,0 +1,99 @@
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import (unicode_literals, division, absolute_import,
print_function)
__license__ = 'GPL v3'
__copyright__ = '2012, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
import os
from calibre.ebooks.metadata.toc import TOC
from calibre.ebooks.mobi.reader.headers import NULL_INDEX
from calibre.ebooks.mobi.reader.index import read_index
tag_fieldname_map = {
1: ['pos',0],
2: ['len',0],
3: ['noffs',0],
4: ['hlvl',0],
5: ['koffs',0],
6: ['pos_fid',0],
21: ['parent',0],
22: ['child1',0],
23: ['childn',0],
69: ['image_index',0],
70 : ['desc_offset', 0], # 'Description offset in cncx'
71 : ['author_offset', 0], # 'Author offset in cncx'
72 : ['image_caption_offset', 0], # 'Image caption offset in cncx',
73 : ['image_attr_offset', 0], # 'Image attribution offset in cncx',
}
default_entry = {
'pos': -1,
'len': 0,
'noffs': -1,
'text' : "Unknown Text",
'hlvl' : -1,
'kind' : "Unknown Class",
'pos_fid' : None,
'parent' : -1,
'child1' : -1,
'childn' : -1,
'description': None,
'author': None,
'image_caption': None,
'image_attribution': None,
}
def read_ncx(sections, index, codec):
index_entries = []
if index != NULL_INDEX:
table, cncx = read_index(sections, index, codec)
for num, x in enumerate(table.iteritems()):
text, tag_map = x
entry = default_entry.copy()
entry['name'] = text
entry['num'] = num
for tag in tag_fieldname_map.iterkeys():
fieldname, i = tag_fieldname_map[tag]
if tag in tag_map:
fieldvalue = tag_map[tag][i]
if tag == 6:
# Appears to be an idx into the KF8 elems table with an
# offset
fieldvalue = tuple(tag_map[tag])
entry[fieldname] = fieldvalue
for which, name in {3:'text', 5:'kind', 70:'description',
71:'author', 72:'image_caption',
73:'image_attribution'}.iteritems():
if tag == which:
entry[name] = cncx.get(fieldvalue,
default_entry[name])
index_entries.append(entry)
return index_entries
def build_toc(index_entries):
ans = TOC(base_path=os.getcwdu())
levels = {x['hlvl'] for x in index_entries}
num_map = {-1: ans}
level_map = {l:[x for x in index_entries if x['hlvl'] == l] for l in
levels}
for lvl in sorted(levels):
for item in level_map[lvl]:
parent = num_map[item['parent']]
child = parent.add_item(item['href'], item['idtag'], item['text'])
num_map[item['num']] = child
# Set play orders in depth first order
for i, item in enumerate(ans.flat()):
item.play_order = i
return ans
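
For orientation, a minimal sketch of driving build_toc by hand; the entries
here are hypothetical (real ones come from read_ncx and are annotated with
href/idtag by the caller in mobi8.py), and it assumes a working calibre
environment:

from calibre.ebooks.mobi.reader.ncx import build_toc, default_entry

chapter = dict(default_entry, num=0, parent=-1, hlvl=0, text='Chapter 1',
        href='text/part0000.html', idtag='c1')
section = dict(default_entry, num=1, parent=0, hlvl=1, text='Section 1.1',
        href='text/part0000.html', idtag='s11')
toc = build_toc([chapter, section])  # Section 1.1 nests under Chapter 1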

View File

@@ -7,7 +7,7 @@ __license__ = 'GPL v3'
 __copyright__ = '2011, Kovid Goyal <kovid@kovidgoyal.net>'
 __docformat__ = 'restructuredtext en'
 
-import struct
+import struct, string, imghdr, zlib
 from collections import OrderedDict
 
 from calibre.utils.magick.draw import Image, save_cover_data_to, thumbnail
@@ -15,7 +15,13 @@ from calibre.ebooks import normalize
 
 IMAGE_MAX_SIZE = 10 * 1024 * 1024
 
-def decode_hex_number(raw):
+def decode_string(raw, codec='utf-8'):
+    length, = struct.unpack(b'>B', raw[0])
+    raw = raw[1:1+length]
+    consumed = length+1
+    return raw.decode(codec), consumed
+
+def decode_hex_number(raw, codec='utf-8'):
     '''
     Return a variable length number encoded using hexadecimal encoding. These
     numbers have the first byte which tells the number of bytes that follow.
@@ -25,13 +31,16 @@ def decode_hex_number(raw):
 
     :param raw: Raw binary data as a bytestring
     :return: The number and the number of bytes from raw that the number
-    occupies
+    occupies.
     '''
-    length, = struct.unpack(b'>B', raw[0])
-    raw = raw[1:1+length]
-    consumed = length+1
+    raw, consumed = decode_string(raw, codec=codec)
     return int(raw, 16), consumed
 
+def encode_string(raw):
+    ans = bytearray(bytes(raw))
+    ans.insert(0, len(ans))
+    return bytes(ans)
+
 def encode_number_as_hex(num):
     '''
     Encode num as a variable length encoded hexadecimal number. Returns the
@@ -44,9 +53,7 @@ def encode_number_as_hex(num):
     nlen = len(num)
     if nlen % 2 != 0:
         num = b'0'+num
-    ans = bytearray(num)
-    ans.insert(0, len(num))
-    return bytes(ans)
+    return encode_string(num)
def encint(value, forward=True):
    '''
@@ -124,12 +131,18 @@ def rescale_image(data, maxsizeb=IMAGE_MAX_SIZE, dimen=None):
     to JPEG. Ensure the resultant image has a byte size less than
     maxsizeb.
 
-    If dimen is not None, generate a thumbnail of width=dimen, height=dimen
+    If dimen is not None, generate a thumbnail of
+    width=dimen, height=dimen or width, height = dimen (depending on the type
+    of dimen)
 
     Returns the image as a bytestring
     '''
     if dimen is not None:
-        data = thumbnail(data, width=dimen, height=dimen,
+        if hasattr(dimen, '__len__'):
+            width, height = dimen
+        else:
+            width = height = dimen
+        data = thumbnail(data, width=width, height=height,
                 compression_quality=90)[-1]
     else:
         # Replace transparent pixels with white pixels and convert to JPEG
@@ -340,4 +353,151 @@ def detect_periodical(toc, log=None):
             return False
     return True
def count_set_bits(num):
if num < 0:
num = -num
ans = 0
while num > 0:
ans += (num & 0b1)
num >>= 1
return ans
def to_base(num, base=32):
digits = string.digits + string.ascii_uppercase
sign = 1 if num >= 0 else -1
if num == 0: return '0'
num *= sign
ans = []
while num:
ans.append(digits[(num % base)])
num //= base
if sign < 0:
ans.append('-')
ans.reverse()
return ''.join(ans)
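
# For illustration: to_base is the inverse of int(x, base) for this digit
# set, e.g. to_base(255, base=32) == '7V' and int('7V', 32) == 255, which is
# exactly the encoding used by the kindle:pos:fid/off fields.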
def mobify_image(data):
'Convert PNG images to GIF as the idiotic Kindle cannot display some PNG'
what = imghdr.what(None, data)
if what == 'png':
im = Image()
im.load(data)
data = im.export('gif')
return data
def read_zlib_header(header):
header = bytearray(header)
# See sec 2.2 of RFC 1950 for the zlib stream format
# http://www.ietf.org/rfc/rfc1950.txt
if (header[0]*256 + header[1])%31 != 0:
return None, 'Bad zlib header, FCHECK failed'
cmf = header[0] & 0b1111
cinfo = header[0] >> 4
if cmf != 8:
return None, 'Unknown zlib compression method: %d'%cmf
if cinfo > 7:
return None, 'Invalid CINFO field in zlib header: %d'%cinfo
fdict = (header[1]&0b10000)>>5
if fdict != 0:
return None, 'FDICT based zlib compression not supported'
wbits = cinfo + 8
return wbits, None
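
# For illustration: the standard two-byte zlib header 0x78 0x9c (deflate,
# 32KB window) passes these checks:
#     read_zlib_header(b'\x78\x9c')  # -> (15, None)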
def read_font_record(data, extent=1040): # {{{
'''
Return the font encoded in the MOBI FONT record represented by data.
The return value in a dict with fields raw_data, font_data, err, ext,
headers.
:param extent: The number of obfuscated bytes. So far I have only
encountered files with 1040 obfuscated bytes. If you encounter an
obfuscated record for which this function fails, try different extent
values (easily automated).
raw_data is the raw data in the font record
font_data is the decoded font_data or None if an error occurred
err is not None if some error occurred
ext is the font type (ttf for TrueType, dat for unknown and failed if an
error occurred)
headers is the list of decoded headers from the font record or None if
decoding failed
'''
# Format:
# bytes 0 - 3: 'FONT'
# bytes 4 - 7: Uncompressed size
# bytes 8 - 11: flags
# bit 1 - zlib compression
# bit 2 - XOR obfuscated
# bytes 12 - 15: offset to start of compressed data
# bytes 16 - 19: length of XOR string
    # bytes 20 - 23: offset to start of XOR data
# The zlib compressed data begins with 2 bytes of header and
# has 4 bytes of checksum at the end
ans = {'raw_data':data, 'font_data':None, 'err':None, 'ext':'failed',
'headers':None, 'encrypted':False}
try:
usize, flags, dstart, xor_len, xor_start = struct.unpack_from(
b'>LLLLL', data, 4)
except:
ans['err'] = 'Failed to read font record header fields'
return ans
font_data = data[dstart:]
ans['headers'] = {'usize':usize, 'flags':bin(flags), 'xor_len':xor_len,
'xor_start':xor_start, 'dstart':dstart}
if flags & 0b10:
# De-obfuscate the data
key = bytearray(data[xor_start:xor_start+xor_len])
buf = bytearray(font_data)
extent = len(font_data) if extent is None else extent
extent = min(extent, len(font_data))
for n in xrange(extent):
buf[n] ^= key[n%xor_len] # XOR of buf and key
font_data = bytes(buf)
ans['encrypted'] = True
if flags & 0b1:
# ZLIB compressed data
wbits, err = read_zlib_header(font_data[:2])
if err is not None:
ans['err'] = err
return ans
adler32, = struct.unpack_from(b'>I', font_data, len(font_data) - 4)
try:
# remove two bytes of zlib header and 4 bytes of trailing checksum
# negative wbits indicates no standard gzip header
font_data = zlib.decompress(font_data[2:-4], -wbits, usize)
except Exception as e:
ans['err'] = 'Failed to zlib decompress font data (%s)'%e
return ans
if len(font_data) != usize:
ans['err'] = 'Uncompressed font size mismatch'
return ans
if False:
# For some reason these almost never match, probably Amazon has a
# buggy Adler32 implementation
sig = (zlib.adler32(font_data) & 0xffffffff)
if sig != adler32:
ans['err'] = ('Adler checksum did not match. Stored: %d '
'Calculated: %d')%(adler32, sig)
return ans
ans['font_data'] = font_data
sig = font_data[:4]
ans['ext'] = ('ttf' if sig in {b'\0\1\0\0', b'true', b'ttcf'}
else 'otf' if sig == b'OTTO' else 'dat')
return ans
# }}}
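
To make the record layout above concrete, here is a minimal sketch that
builds an unobfuscated, uncompressed FONT record (flags=0, data starting at
byte 24) and parses it back; the payload is made up and a working calibre
environment is assumed:

import struct
from calibre.ebooks.mobi.utils import read_font_record

payload = b'\x00\x01\x00\x00' + b'\x00'*12  # fake sfnt/TTF signature + padding
record = b'FONT' + struct.pack(b'>LLLLL',
        len(payload),  # uncompressed size
        0,             # flags: no zlib, no XOR obfuscation
        24,            # offset to start of data (4 byte tag + 20 byte header)
        0, 0)          # XOR key length and offset (unused here)
record += payload
info = read_font_record(record)
print(info['ext'], info['font_data'] == payload)  # ttf True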

File diff suppressed because it is too large

View File

@@ -18,9 +18,10 @@ from calibre.ebooks.compression.palmdoc import compress_doc
 from calibre.ebooks.mobi.langcodes import iana2mobi
 from calibre.utils.filenames import ascii_filename
 from calibre.ebooks.mobi.writer2 import (PALMDOC, UNCOMPRESSED, RECORD_SIZE)
-from calibre.ebooks.mobi.utils import (rescale_image, encint,
+from calibre.ebooks.mobi.utils import (rescale_image, encint, mobify_image,
         encode_trailing_data, align_block, detect_periodical)
 from calibre.ebooks.mobi.writer2.indexer import Indexer
+from calibre.ebooks.mobi import MAX_THUMB_DIMEN, MAX_THUMB_SIZE
 
 EXTH_CODES = {
     'creator': 100,
@@ -46,9 +47,6 @@ EXTH_CODES = {
 # Disabled as I dont care about uncrossable breaks
 WRITE_UNCROSSABLE_BREAKS = False
 
-MAX_THUMB_SIZE = 16 * 1024
-MAX_THUMB_DIMEN = (180, 240)
-
 class MobiWriter(object):
     COLLAPSE_RE = re.compile(r'[ \t\r\n\v]+')
@@ -181,7 +179,11 @@ class MobiWriter(object):
         for item in self.oeb.manifest.values():
             if item.media_type not in OEB_RASTER_IMAGES: continue
             try:
-                data = rescale_image(item.data)
+                data = item.data
+                if self.opts.mobi_keep_original_images:
+                    data = mobify_image(data)
+                else:
+                    data = rescale_image(data)
             except:
                 oeb.logger.warn('Bad image file %r' % item.href)
                 continue

View File

@@ -832,22 +832,8 @@ class Manifest(object):
 
     def _parse_css(self, data):
-        from cssutils.css import CSSRule
-        from cssutils import CSSParser, log
+        from cssutils import CSSParser, log, resolveImports
         log.setLevel(logging.WARN)
 
-        def get_style_rules_from_import(import_rule):
-            ans = []
-            if not import_rule.styleSheet:
-                return ans
-            rules = import_rule.styleSheet.cssRules
-            for rule in rules:
-                if rule.type == CSSRule.IMPORT_RULE:
-                    ans.extend(get_style_rules_from_import(rule))
-                elif rule.type in (CSSRule.FONT_FACE_RULE,
-                        CSSRule.STYLE_RULE):
-                    ans.append(rule)
-            return ans
-
         self.oeb.log.debug('Parsing', self.href, '...')
         data = self.oeb.decode(data)
         data = self.oeb.css_preprocessor(data, add_namespace=True)
@@ -855,19 +841,8 @@ class Manifest(object):
                 fetcher=self.override_css_fetch or self._fetch_css,
                 log=_css_logger)
         data = parser.parseString(data, href=self.href)
+        data = resolveImports(data)
         data.namespaces['h'] = XHTML_NS
-        import_rules = list(data.cssRules.rulesOfType(CSSRule.IMPORT_RULE))
-        rules_to_append = []
-        insert_index = None
-        for r in data.cssRules.rulesOfType(CSSRule.STYLE_RULE):
-            insert_index = data.cssRules.index(r)
-            break
-        for rule in import_rules:
-            rules_to_append.extend(get_style_rules_from_import(rule))
-        for r in reversed(rules_to_append):
-            data.insertRule(r, index=insert_index)
-        for rule in import_rules:
-            data.deleteRule(rule)
         return data
 
     def _fetch_css(self, path):
@@ -880,7 +855,8 @@ class Manifest(object):
             self.oeb.logger.warn('CSS import of non-CSS file %r' % path)
             return (None, None)
         data = item.data.cssText
-        return ('utf-8', data)
+        enc = None if isinstance(data, unicode) else 'utf-8'
+        return (enc, data)
 
 # }}}
@@ -1487,9 +1463,17 @@ class TOC(object):
         except ValueError:
             return 1
 
-    def __str__(self):
-        return 'TOC: %s --> %s'%(self.title, self.href)
+    def get_lines(self, lvl=0):
+        ans = [(u'\t'*lvl) + u'TOC: %s --> %s'%(self.title, self.href)]
+        for child in self:
+            ans.extend(child.get_lines(lvl+1))
+        return ans
+
+    def __str__(self):
+        return b'\n'.join([x.encode('utf-8') for x in self.get_lines()])
+
+    def __unicode__(self):
+        return u'\n'.join(self.get_lines())
 
     def to_opf1(self, tour):
         for node in self.nodes:

View File

@@ -352,9 +352,12 @@ def parse_html(data, log=None, decoder=None, preprocessor=None,
         title = etree.SubElement(head, XHTML('title'))
         title.text = _('Unknown')
     elif not xpath(data, '/h:html/h:head/h:title'):
-        log.warn('File %s missing <title/> element' % filename)
         title = etree.SubElement(head, XHTML('title'))
         title.text = _('Unknown')
+    # Ensure <title> is not empty
+    title = xpath(data, '/h:html/h:head/h:title')[0]
+    if not title.text or not title.text.strip():
+        title.text = _('Unknown')
     # Remove any encoding-specifying <meta/> elements
     for meta in META_XP(data):
         meta.getparent().remove(meta)

View File

@@ -8,7 +8,7 @@ __copyright__ = '2008, Marshall T. Vandegrift <llasram@gmail.com>'
 
 from calibre.ebooks.oeb.base import XML, XHTML, XHTML_NS
 from calibre.ebooks.oeb.base import XHTML_MIME, CSS_MIME
-from calibre.ebooks.oeb.base import element
+from calibre.ebooks.oeb.base import element, XPath
 
 __all__ = ['HTMLTOCAdder']
@@ -62,18 +62,24 @@ class HTMLTOCAdder(object):
         return cls(title=opts.toc_title)
 
     def __call__(self, oeb, context):
+        has_toc = getattr(getattr(oeb, 'toc', False), 'nodes', False)
+
         if 'toc' in oeb.guide:
             # Ensure toc pointed to in <guide> is in spine
             from calibre.ebooks.oeb.base import urlnormalize
             href = urlnormalize(oeb.guide['toc'].href)
             if href in oeb.manifest.hrefs:
                 item = oeb.manifest.hrefs[href]
-                if oeb.spine.index(item) < 0:
-                    oeb.spine.add(item, linear=False)
-                return
+                if (hasattr(item.data, 'xpath') and
+                    XPath('//h:a[@href]')(item.data)):
+                    if oeb.spine.index(item) < 0:
+                        oeb.spine.add(item, linear=False)
+                    return
+                elif has_toc:
+                    oeb.guide.remove('toc')
             else:
                 oeb.guide.remove('toc')
-        if not getattr(getattr(oeb, 'toc', False), 'nodes', False):
+        if not has_toc:
             return
         oeb.logger.info('Generating in-line TOC...')
         title = self.title or oeb.translate(DEFAULT_TITLE)

View File

@@ -36,7 +36,9 @@ class RescaleImages(object):
                 ext = 'JPEG'
 
             raw = item.data
-            if not raw: continue
+            if hasattr(raw, 'xpath') or not raw:
+                # Probably an svg image
+                continue
             try:
                 img = Image()
                 img.load(raw)

View File

@@ -234,6 +234,8 @@ class RTFMLizer(object):
             # Process tags that need special processing and that do not have inner
             # text. Usually these require an argument
             if tag == 'img':
-                src = os.path.basename(elem.get('src'))
+                src = elem.get('src')
+                if src:
+                    src = os.path.basename(elem.get('src'))
                 block_start = ''
                 block_end = ''

View File

@@ -21,7 +21,8 @@ class PluginWidget(Widget, Ui_Form):
 
     def __init__(self, parent, get_option, get_help, db=None, book_id=None):
         Widget.__init__(self, parent,
-                ['prefer_author_sort', 'rescale_images', 'toc_title',
+                ['prefer_author_sort', 'toc_title',
+                'mobi_keep_original_images',
                 'mobi_ignore_margins', 'mobi_toc_at_start',
                 'dont_compress', 'no_inline_toc', 'share_not_sync',
                 'personal_doc']#, 'mobi_navpoints_only_deepest']

View File

@@ -6,7 +6,7 @@
   <rect>
    <x>0</x>
    <y>0</y>
-   <width>521</width>
+   <width>588</width>
    <height>342</height>
   </rect>
  </property>
@@ -14,47 +14,6 @@
   <string>Form</string>
  </property>
  <layout class="QGridLayout" name="gridLayout">
<item row="1" column="0">
<widget class="QLabel" name="label">
<property name="text">
<string>&amp;Title for Table of Contents:</string>
</property>
<property name="buddy">
<cstring>opt_toc_title</cstring>
</property>
</widget>
</item>
<item row="1" column="1">
<widget class="QLineEdit" name="opt_toc_title"/>
</item>
<item row="4" column="0" colspan="2">
<widget class="QCheckBox" name="opt_rescale_images">
<property name="text">
<string>Rescale images for &amp;Palm devices</string>
</property>
</widget>
</item>
<item row="5" column="0" colspan="2">
<widget class="QCheckBox" name="opt_prefer_author_sort">
<property name="text">
<string>Use author &amp;sort for author</string>
</property>
</widget>
</item>
<item row="6" column="0">
<widget class="QCheckBox" name="opt_dont_compress">
<property name="text">
<string>Disable compression of the file contents</string>
</property>
</widget>
</item>
<item row="0" column="0">
<widget class="QCheckBox" name="opt_no_inline_toc">
<property name="text">
<string>Do not add Table of Contents to book</string>
</property>
</widget>
</item>
<item row="8" column="0" colspan="2"> <item row="8" column="0" colspan="2">
<widget class="QGroupBox" name="groupBox"> <widget class="QGroupBox" name="groupBox">
<property name="title"> <property name="title">
@@ -125,6 +84,47 @@
     </property>
    </widget>
   </item>
<item row="4" column="0" colspan="2">
<widget class="QCheckBox" name="opt_prefer_author_sort">
<property name="text">
<string>Use author &amp;sort for author</string>
</property>
</widget>
</item>
<item row="1" column="0">
<widget class="QLabel" name="label">
<property name="text">
<string>&amp;Title for Table of Contents:</string>
</property>
<property name="buddy">
<cstring>opt_toc_title</cstring>
</property>
</widget>
</item>
<item row="1" column="1">
<widget class="QLineEdit" name="opt_toc_title"/>
</item>
<item row="6" column="0">
<widget class="QCheckBox" name="opt_dont_compress">
<property name="text">
<string>Disable compression of the file contents</string>
</property>
</widget>
</item>
<item row="0" column="0">
<widget class="QCheckBox" name="opt_no_inline_toc">
<property name="text">
<string>Do not add Table of Contents to book</string>
</property>
</widget>
</item>
<item row="5" column="0" colspan="2">
<widget class="QCheckBox" name="opt_mobi_keep_original_images">
<property name="text">
<string>Do not convert all images to &amp;JPEG (may result in images not working in older viewers)</string>
</property>
</widget>
</item>
</layout> </layout>
</widget> </widget>
<resources/> <resources/>

View File

@@ -5,11 +5,14 @@ __license__ = 'GPL v3'
 __copyright__ = '2010, Kovid Goyal <kovid@kovidgoyal.net>'
 __docformat__ = 'restructuredtext en'
 
+import sys
+
 from PyQt4.Qt import (Qt, QApplication, QStyle, QIcon, QDoubleSpinBox,
         QVariant, QSpinBox, QStyledItemDelegate, QComboBox, QTextDocument,
         QAbstractTextDocumentLayout, QFont, QFontInfo)
 
 from calibre.gui2 import UNDEFINED_QDATETIME, error_dialog, rating_font
+from calibre.constants import iswindows
 from calibre.gui2.widgets import EnLineEdit
 from calibre.gui2.complete import MultiCompleteLineEdit, MultiCompleteComboBox
 from calibre.utils.date import now, format_date, qt_to_dt
@@ -27,7 +30,10 @@ class RatingDelegate(QStyledItemDelegate): # {{{
         QStyledItemDelegate.__init__(self, *args, **kwargs)
         self.rf = QFont(rating_font())
         self.em = Qt.ElideMiddle
-        self.rf.setPointSize(QFontInfo(QApplication.font()).pointSize())
+        delta = 0
+        if iswindows and sys.getwindowsversion().major >= 6:
+            delta = 2
+        self.rf.setPointSize(QFontInfo(QApplication.font()).pointSize()+delta)
 
     def createEditor(self, parent, option, index):
         sb = QStyledItemDelegate.createEditor(self, parent, option, index)

View File

@@ -170,7 +170,7 @@
<item row="8" column="0" colspan="2">
<widget class="QCheckBox" name="opt_remember_window_size">
<property name="text">
-<string>Remember last used &amp;window size</string>
<string>Remember last used &amp;window size and layout</string>
</property>
</widget>
</item>

View File

@@ -689,7 +689,6 @@ class DocumentView(QWebView): # {{{
            self.manager.load_started()
        self.loading_url = QUrl.fromLocalFile(path)
        if has_svg:
-           prints('Rendering as XHTML...')
            self.setContent(QByteArray(html.encode(path.encoding)), mt, QUrl.fromLocalFile(path))
        else:
            self.setHtml(html, self.loading_url)

View File

@@ -36,6 +36,15 @@ class JavaScriptLoader(object):

    def __init__(self, dynamic_coffeescript=False):
        self._dynamic_coffeescript = dynamic_coffeescript
        if self._dynamic_coffeescript:
            try:
                from calibre.utils.serve_coffee import compile_coffeescript
                compile_coffeescript
            except:
                self._dynamic_coffeescript = False
                print ('WARNING: Failed to load serve_coffee, not compiling '
                        'coffeescript dynamically.')

        self._cache = {}
        self._hp_cache = {}

View File

@@ -15,6 +15,7 @@ from PyQt4.Qt import (QIcon, QFont, QLabel, QListWidget, QAction,
        QMenu, QStringListModel, QCompleter, QStringList,
        QTimer, QRect, QFontDatabase, QGraphicsView)

from calibre.constants import iswindows
from calibre.gui2 import (NONE, error_dialog, pixmap_to_data, gprefs,
        warning_dialog)
from calibre.gui2.filename_pattern_ui import Ui_Form
@@ -365,7 +366,7 @@ class FontFamilyModel(QAbstractListModel): # {{{
        self.families = list(qt_families.intersection(set(self.families)))
        self.families.sort()
        self.families[:0] = [_('None')]
-       self.font = QFont('sansserif')
        self.font = QFont('Verdana' if iswindows else 'sansserif')

    def rowCount(self, *args):
        return len(self.families)

View File

@@ -242,11 +242,18 @@ class PocketBook900(PocketBook):

class iPhone(Device):

-   name = 'iPad or iPhone/iTouch + Stanza'
    name = 'iPhone/iTouch'
    output_format = 'EPUB'
    manufacturer = 'Apple'
    id = 'iphone'
    supports_color = True
    output_profile = 'ipad'

class iPad(iPhone):

    name = 'iPad'
    id = 'ipad'
    output_profile = 'ipad3'

class Android(Device):

View File

@@ -117,6 +117,9 @@ class Rule(object): # {{{
            'lt': ('1', '', ''),
            'gt': ('', '', '1')
        }[action]
        if col == 'size':
            return "cmp(booksize(), %s, '%s', '%s', '%s')" % (val, lt, eq, gt)
        else:
            return "cmp(raw_field('%s'), %s, '%s', '%s', '%s')" % (col, val, lt, eq, gt)

    def rating_condition(self, col, action, val):
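
For illustration, here is what the new size branch produces. Assuming a hypothetical rule that colors books larger than 1048576 bytes (the 'gt' action maps to ('', '', '1') above), the emitted template test would be:

    cmp(booksize(), 1048576, '', '', '1')

so the rule fires only when booksize() exceeds the value, instead of going through raw_field() as other columns do.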

View File

@@ -227,6 +227,25 @@ class CustomColumns(object):
        return self.conn.get('''SELECT extra FROM %s
                WHERE book=?'''%lt, (idx,), all=False)

    def get_custom_and_extra(self, idx, label=None, num=None, index_is_id=False):
        if label is not None:
            data = self.custom_column_label_map[label]
        if num is not None:
            data = self.custom_column_num_map[num]
        idx = idx if index_is_id else self.id(idx)
        row = self.data._data[idx]
        ans = row[self.FIELD_MAP[data['num']]]
        if data['is_multiple'] and data['datatype'] == 'text':
            ans = ans.split(data['multiple_seps']['cache_to_list']) if ans else []
            if data['display'].get('sort_alpha', False):
                ans.sort(cmp=lambda x,y:cmp(x.lower(), y.lower()))
        if data['datatype'] != 'series':
            return (ans, None)
        ign,lt = self.custom_table_names(data['num'])
        extra = self.conn.get('''SELECT extra FROM %s
                WHERE book=?'''%lt, (idx,), all=False)
        return (ans, extra)

    # convenience methods for tag editing
    def get_custom_items_with_ids(self, label=None, num=None):
        if label is not None:
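
A quick usage sketch of the new accessor; the database handle, book id, and the column label 'myseries' are assumed for illustration. It fetches the custom value and any series extra in a single call, replacing the old get_custom/get_custom_extra pair:

    val, extra = db.get_custom_and_extra(book_id, label='myseries', index_is_id=True)
    # val:   the column value (a list for multiple-valued text columns)
    # extra: the series index for series-type columns, otherwise None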

View File

@@ -910,7 +910,15 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
        Convenience method to return metadata as a :class:`Metadata` object.
        Note that the list of formats is not verified.
        '''
-       row = self.data._data[idx] if index_is_id else self.data[idx]
        idx = idx if index_is_id else self.id(idx)
        try:
            row = self.data._data[idx]
        except:
            row = None
        if row is None:
            raise ValueError('No book with id: %d'%idx)

        fm = self.FIELD_MAP
        mi = Metadata(None, template_cache=self.formatter_template_cache)
@@ -948,14 +956,13 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
        mi.book_size = row[fm['size']]
        mi.ondevice_col= row[fm['ondevice']]
        mi.last_modified = row[fm['last_modified']]
-       id = idx if index_is_id else self.id(idx)
        formats = row[fm['formats']]
        mi.format_metadata = {}
        if not formats:
            good_formats = None
        else:
            formats = sorted(formats.split(','))
-           mi.format_metadata = FormatMetadata(self, id, formats)
            mi.format_metadata = FormatMetadata(self, idx, formats)
            good_formats = FormatsList(formats, mi.format_metadata)
        mi.formats = good_formats
        tags = row[fm['tags']]
@@ -968,19 +975,18 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
        if mi.series:
            mi.series_index = row[fm['series_index']]
        mi.rating = row[fm['rating']]
-       mi.set_identifiers(self.get_identifiers(id, index_is_id=True))
        mi.set_identifiers(self.get_identifiers(idx, index_is_id=True))
-       mi.application_id = id
        mi.application_id = idx
-       mi.id = id
        mi.id = idx

        mi.set_all_user_metadata(self.field_metadata.custom_field_metadata())
        for key, meta in self.field_metadata.custom_iteritems():
-           mi.set_user_metadata(key, meta)
            if meta['datatype'] == 'composite':
                mi.set(key, val=row[meta['rec_index']])
            else:
-               mi.set(key, val=self.get_custom(idx, label=meta['label'],
-                   index_is_id=index_is_id),
-                   extra=self.get_custom_extra(idx, label=meta['label'],
-                   index_is_id=index_is_id))
                val, extra = self.get_custom_and_extra(idx, label=meta['label'],
                        index_is_id=True)
                mi.set(key, val=val, extra=extra)

        user_cats = self.prefs['user_categories']
        user_cat_vals = {}
@@ -999,12 +1005,12 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
        if get_cover:
            if cover_as_data:
-               cdata = self.cover(id, index_is_id=True)
                cdata = self.cover(idx, index_is_id=True)
                if cdata:
                    mi.cover_data = ('jpeg', cdata)
            else:
-               mi.cover = self.cover(id, index_is_id=True, as_path=True)
                mi.cover = self.cover(idx, index_is_id=True, as_path=True)
-       mi.has_cover = _('Yes') if self.has_cover(id) else ''
        mi.has_cover = _('Yes') if self.has_cover(idx) else ''
        return mi

    def has_book(self, mi):
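
With the id lookup consolidated at the top of get_metadata(), a bad id now fails fast with a ValueError instead of producing a partially filled object. A minimal caller sketch (the db handle and book_id are assumed):

    try:
        mi = db.get_metadata(book_id, index_is_id=True, get_cover=False)
    except ValueError:
        # no book with this id exists in the cache
        mi = None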

View File

@@ -388,6 +388,7 @@ class FieldMetadata(dict):

    def __init__(self):
        self._field_metadata = copy.deepcopy(self._field_metadata_prototype)
        self._tb_cats = OrderedDict()
        self._tb_custom_fields = {}
        self._search_term_map = {}
        self.custom_label_to_key_map = {}
        for k,v in self._field_metadata:
@@ -477,10 +478,8 @@ class FieldMetadata(dict):
            yield (key, self._tb_cats[key])

    def custom_iteritems(self):
-       for key in self._tb_cats:
-           fm = self._tb_cats[key]
-           if fm['is_custom']:
-               yield (key, self._tb_cats[key])
        for key, meta in self._tb_custom_fields.iteritems():
            yield (key, meta)

    def items(self):
        return list(self.iteritems())
@@ -516,6 +515,8 @@ class FieldMetadata(dict):
        return l

    def custom_field_metadata(self, include_composites=True):
        if include_composites:
            return self._tb_custom_fields
        l = {}
        for k in self.custom_field_keys(include_composites):
            l[k] = self._tb_cats[k]
@@ -537,6 +538,7 @@ class FieldMetadata(dict):
                'is_custom':True, 'is_category':is_category,
                'link_column':'value','category_sort':'value',
                'is_csp' : is_csp, 'is_editable': is_editable,}
        self._tb_custom_fields[key] = self._tb_cats[key]
        self._add_search_terms_to_map(key, [key])
        self.custom_label_to_key_map[label] = key
        if datatype == 'series':
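
A minimal sketch of the new iteration contract (a field_metadata instance from an open library database is assumed); custom columns now live in a dedicated map populated at registration time, so no per-key is_custom filtering is needed:

    for key, meta in db.field_metadata.custom_iteritems():
        # key is the lookup name (e.g. '#myseries'), meta its metadata dict
        print(key, meta['datatype'])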

View File

@@ -29,6 +29,7 @@ It can convert every input format in the following list, to every output format.
PRC is a generic format, |app| supports PRC files with TextRead and MOBIBook headers.
PDB is also a generic format. |app| supports eReader, Plucker, PML and zTxt PDB files.
DJVU support is only for converting DJVU files that contain embedded text. These are typically generated by OCR software.
MOBI books can be of two types, Mobi6 and KF8. |app| currently fully supports Mobi6 and supports conversion from, but not to, KF8.

.. _best-source-formats:

View File

@@ -57,7 +57,7 @@ For example, assume you want to use the template::

    {series} - {series_index} - {title}

If the book has no series, the answer will be ``- - title``. Many people would rather the result be simply ``title``, without the hyphens. To do this, use the extended syntax ``{field:|prefix_text|suffix_text}``. When you use this syntax, if field has the value SERIES then the result will be ``prefix_textSERIESsuffix_text``. If field has no value, then the result will be the empty string (nothing); the prefix and suffix are ignored. The prefix and suffix can contain blanks. **Do not use subtemplates (`{ ... }`) or functions (see below) as the prefix or the suffix.**

Using this syntax, we can solve the above series problem with the template::
@@ -112,7 +112,7 @@ Functions are always applied before format specifications. See further down for

The syntax for using functions is ``{field:function(arguments)}``, or ``{field:function(arguments)|prefix|suffix}``. Arguments are separated by commas. Commas inside arguments must be preceded by a backslash ( '\\' ). The last (or only) argument cannot contain a closing parenthesis ( ')' ). Functions return the value of the field used in the template, suitably modified.

Important: If you have programming experience, please note that the syntax in this mode (single function) is not what you might expect. Strings are not quoted. Spaces are significant. All arguments must be constants; there is no sub-evaluation. **Do not use subtemplates (`{ ... }`) as function arguments.** Instead, use :ref:`template program mode <template_mode>` and :ref:`general program mode <general_mode>`.

Many functions use regular expressions. In all cases, regular expression matching is case-insensitive.
@@ -245,6 +245,7 @@ The following functions are available in addition to those described in single-function mode:

* ``current_library_name()`` -- return the last name on the path to the current calibre library. This function can be called in template program mode using the template ``{:'current_library_name()'}``.
* ``days_between(date1, date2)`` -- return the number of days between ``date1`` and ``date2``. The number is positive if ``date1`` is greater than ``date2``, otherwise negative. If either ``date1`` or ``date2`` are not dates, the function returns the empty string.
* ``divide(x, y)`` -- returns x / y. Throws an exception if either x or y are not numbers.
* ``eval(string)`` -- evaluates the string as a program, passing the local variables (those ``assign`` ed to). This permits using the template processor to construct complex results from local variables.
* ``field(name)`` -- returns the metadata field named by ``name``.
* ``first_non_empty(value, value, ...)`` -- returns the first value that is not empty. If all values are empty, then the empty value is returned. You can have as many values as you want. (An example combining this with ``field()`` and ``format_date()`` appears after this list.)
* ``format_date(x, date_format)`` -- format the value ``x``, which must be a date field, using ``date_format``, returning a string. The formatting codes are::
@@ -269,7 +270,19 @@

    AP   : use a 12-hour clock instead of a 24-hour clock, with 'AP' replaced by the localized string for AM or PM.
    iso  : the date with time and timezone. Must be the only format present.

* ``finish_formatting(val, fmt, prefix, suffix)`` -- apply the format, prefix, and suffix to a value in the same way as done in a template like ``{series_index:05.2f| - |- }``. This function is provided to ease conversion of complex single-function- or template-program-mode templates to :ref:`general program mode <general_mode>` (see below) to take advantage of GPM template compilation. For example, the following program produces the same output as the above template::

    program: finish_formatting(field("series_index"), "05.2f", " - ", " - ")

  Another example: for the template ``{series:re(([^\s])[^\s]+(\s|$),\1)}{series_index:0>2s| - | - }{title}`` use::

    program:
        strcat(
            re(field('series'), '([^\s])[^\s]+(\s|$)', '\1'),
            finish_formatting(field('series_index'), '0>2s', ' - ', ' - '),
            field('title')
        )
* ``formats_modtimes(date_format)`` -- return a comma-separated list of colon-separated items representing modification times for the formats of a book. The date_format parameter specifies how the date is to be formatted. See the format_date function for details. You can use the select function to get the mod time for a specific format. Note that format names are always uppercase, as in EPUB.
* ``formats_sizes()`` -- return a comma-separated list of colon-separated items representing sizes in bytes of the formats of a book. You can use the select function to get the size for a specific format. Note that format names are always uppercase, as in EPUB.
* ``has_cover()`` -- return ``Yes`` if the book has a cover, otherwise return the empty string
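
As a small illustration of combining the functions above in general program mode (the choice of columns is ours, not the manual's): a template that shows the series if there is one, and otherwise the publication year::

    program:
        first_non_empty(
            field('series'),
            format_date(field('pubdate'), 'yyyy')
        )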
@@ -312,7 +325,7 @@ Using general program mode

For more complicated template programs, it is sometimes easier to avoid template syntax (all the `{` and `}` characters), instead writing a more classical-looking program. You can do this in |app| by beginning the template with `program:`. In this case, no template processing is done. The special variable `$` is not set. It is up to your program to produce the correct results.

One advantage of `program:` mode is that the brackets are no longer special. For example, it is not necessary to use `[[` and `]]` when using the `template()` function. Another advantage is that program mode templates are compiled to Python and can run much faster than templates in the other two modes. Speed improvement depends on the complexity of the templates; the more complicated the template the more the improvement. Compilation is turned off or on using the tweak ``compile_gpm_templates`` (Compile General Program Mode templates to Python). The main reason to turn off compilation is if a compiled template does not work, in which case please file a bug report.

The following example is a `program:` mode implementation of a recipe on the MobileRead forum: "Put series into the title, using either initials or a shortened form. Strip leading articles from the series name (any)." For example, for the book The Two Towers in the Lord of the Rings series, the recipe gives `LotR [02] The Two Towers`. Using standard templates, the recipe requires three custom columns and a plugboard, as explained in the following:

(The remaining file diffs were suppressed by the viewer because they are too large; among them is the new file src/calibre/translations/cy.po, 19613 lines. Some files were not shown because too many files changed in this diff.)