Sync to trunk.

John Schember 2011-10-19 19:39:05 -04:00
commit c58302f6a7
171 changed files with 33995 additions and 27526 deletions

View File

@ -19,6 +19,94 @@
# new recipes:
# - title:
- version: 0.8.22
date: 2011-10-14
new features:
- title: "Input plugin for OCR-ed DJVU files (i.e. .djvu files that contain text. Only the text is converted)"
type: major
- title: "Driver for the SONY PRS T1"
- title: "Add a 'Back' button to the metadata download dialog while downloading covers, so that you can go back and select a different match if you dont lke the covers, instead of having to re-do the entire download."
tickets: [855055]
- title: "Add an option in Preferences->Saving to disk to not show files in file browser after saving to disk"
- title: "Get Books: Add the amazon.fr store. Remove leading 'by' from author names. Fix encoding issues with non English titles/names"
- title: "Driver for Onyx BOOX A61S/X61S"
tickets: [872741]
- title: "Kobo: Add support for uploading new covers to the device without converting the ePub. You can just resend the book to have the cover updated"
- title: "Make it a little harder to ignore the fact that there are multiple toolbars when customizing toolbars"
tickets: [864589]
bug fixes:
- title: "MOBI Input: Remove invalid tags of the form <xyz: >"
tickets: [872883]
- title: "calibredb add_format does not refresh running calibre instance"
tickets: [872961]
- title: "Conversion pipeline: Translate <font face> to CSS font-family"
tickets: [871388]
- title: "When sending email add a Date: header so that amavis does not consider the emails to be spam"
- title: "Fix for the problem where setting the restriction to an empty current search clears the restriction box but does not clear the restriction."
tickets: [871921]
- title: "Fix generation of column coloring rules for date/time columns"
- title: "Fix plugboard problem where customizations to formats accepted by a device were ignored."
- title: "Enable adding of various actions to the toolbar when device is connected (they had been erroneously marked as being non-addable)"
- title: "Fixable content in library check is not hidden after repair"
tickets: [864096]
- title: "Catalog generation: Handle a corrupted thumbnail cache."
- title: "Do not error out when user clicks stop selected job with no job selected."
tickets: [863766]
improved recipes:
- automatiseringgids
- CNET
- Geek and Poke
- Gosc Niedzielny
- Dilbert
- Economist
- Ming Pao
- Metro UK
- Heise Online
- FAZ.net
- Houston Chronicle
- Slate
- Descopera
new recipes:
- title: WoW Insider
author: Krittika Goyal
- title: Merco Press and Penguin news
author: Russell Phillips
- title: Defense News
author: Darko Miletic
- title: Revista Piaui
author: Eduardo Simoes
- title: Dark Horizons
author: Jaded
- title: Various polish news sources
author: fenuks
- version: 0.8.21
date: 2011-09-30
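
A hedged illustration of the Date-header fix noted in the changelog above (not calibre's actual code): servers running amavis score mail without a Date: header as spam, so one is set on the outgoing message.

from email.mime.text import MIMEText
from email.utils import formatdate

msg = MIMEText('Your news feeds are attached.')
msg['Subject'] = 'News delivered by calibre'
msg['Date'] = formatdate(localtime=True)  # RFC 2822 date, e.g. 'Fri, 14 Oct 2011 09:00:00 -0400'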

recipes/20minutes.recipe Normal file
View File

@ -0,0 +1,71 @@
# -*- coding: utf-8 -*-
__license__ = 'GPL v3'
__copyright__ = '2011 Aurélien Chabot <contact@aurelienchabot.fr>'
'''
20minutes.fr
'''
import re
from calibre.web.feeds.recipes import BasicNewsRecipe
class Minutes(BasicNewsRecipe):
title = '20 minutes'
__author__ = 'calibre'
description = 'Actualités'
encoding = 'cp1252'
publisher = '20minutes.fr'
category = 'Actualités, France, Monde'
language = 'fr'
use_embedded_content = False
timefmt = ' [%d %b %Y]'
max_articles_per_feed = 15
no_stylesheets = True
remove_empty_feeds = True
filterDuplicates = True
extra_css = '''
h1 {font-size:xx-large; font-family:Arial,Helvetica,sans-serif;}
.mna-details {font-size:xx-small; color:#4D4D4D; font-family:Arial,Helvetica,sans-serif;}
.mna-image {font-size:xx-small; color:#4D4D4D; font-family:Arial,Helvetica,sans-serif;}
.mna-body {font-size:medium; font-family:Arial,Helvetica,sans-serif;}
'''
remove_tags = [
dict(name='iframe'),
dict(name='div', attrs={'class':['mn-section-heading']}),
dict(name='a', attrs={'href':['#commentaires']}),
dict(name='div', attrs={'class':['mn-right']}),
dict(name='div', attrs={'class':['mna-box']}),
dict(name='div', attrs={'class':['mna-comment-call']}),
dict(name='div', attrs={'class':['mna-tools']}),
dict(name='div', attrs={'class':['mn-trilist']})
]
keep_only_tags = [dict(id='mn-article')]
remove_tags_after = dict(name='div', attrs={'class':['mna-body','mna-signature']})
feeds = [
('France', 'http://www.20minutes.fr/rss/actu-france.xml'),
('International', 'http://www.20minutes.fr/rss/monde.xml'),
('Tech/Web', 'http://www.20minutes.fr/rss/hightech.xml'),
('Sciences', 'http://www.20minutes.fr/rss/sciences.xml'),
('Economie', 'http://www.20minutes.fr/rss/economie.xml'),
('Politique', 'http://www.20minutes.fr/rss/politique.xml'),
(u'Médias', 'http://www.20minutes.fr/rss/media.xml'),
('Cinema', 'http://www.20minutes.fr/rss/cinema.xml'),
('People', 'http://www.20minutes.fr/rss/people.xml'),
('Culture', 'http://www.20minutes.fr/rss/culture.xml'),
('Sport', 'http://www.20minutes.fr/rss/sport.xml'),
('Paris', 'http://www.20minutes.fr/rss/paris.xml'),
('Lyon', 'http://www.20minutes.fr/rss/lyon.xml'),
('Toulouse', 'http://www.20minutes.fr/rss/toulouse.xml')
]
def preprocess_html(self, soup):
for item in soup.findAll(style=True):
del item['style']
return soup
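
The preprocess_html hook above strips inline style attributes so the recipe's extra_css rules win; a minimal illustration of its effect on a hypothetical fragment:

from calibre.ebooks.BeautifulSoup import BeautifulSoup

soup = BeautifulSoup('<p style="color:red">Bonjour</p>')
for item in soup.findAll(style=True):
    del item['style']
print soup  # -> <p>Bonjour</p>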

View File

@ -10,27 +10,15 @@ class autogids(BasicNewsRecipe):
publisher = 'AutomatiseringGids'
category = 'Nieuws, IT, Nederlandstalig'
simultaneous_downloads = 5
#delay = 1
timefmt = ' [%A, %d %B, %Y]'
#timefmt = ''
timefmt = ' [%a, %d %B, %Y]'
no_stylesheets = True
remove_javascript = True
remove_empty_feeds = True
publication_type = 'newspaper'
encoding = 'utf-8'
cover_url = 'http://www.automatiseringgids.nl/siteimg/header_logo.gif'
keep_only_tags = [dict(id=['content'])]
extra_css = '.artikelheader {font-size:0.8em; color: #666;} .artikelintro {font-weight:bold} div.imgArticle {float: right; margin: 0 0em 1em 1em; display: block; position: relative; } \
h2 { margin: 0 0 0.5em; min-height: 30px; font-size: 1.5em; letter-spacing: -0.2px; margin: 0 0 0.5em; color: black; font-weight: bold; line-height: 1.2em; padding: 4px 3px 0; }'
cover_url = 'http://www.automatiseringgids.nl/binaries/content/gallery/ag/marketing/ag-avatar-100x50.jpg'
keep_only_tags = [dict(name='div', attrs={'class':['content']})]
remove_tags = [dict(name='div', attrs={'id':['loginbox','reactiecollapsible','reactiebox']}),
dict(name='div', attrs={'class':['column_a','column_c','bannerfullsize','reactieheader','reactiecollapsible','formulier','artikel_headeroptions']}),
dict(name='ul', attrs={'class':['highlightlist']}),
dict(name='input', attrs={'type':['button']}),
dict(name='div', attrs={'style':['display:block; width:428px; height:30px; float:left;']}),
]
preprocess_regexps = [
(re.compile(r'(<h3>Reacties</h3>|<h2>Zie ook:</h2>|<div style=".*</div>|<a[^>]*>|</a>)', re.DOTALL|re.IGNORECASE),
lambda match: ''),

View File

@ -110,8 +110,10 @@ class BrandEins(BasicNewsRecipe):
selected_issue = issue_map[selected_issue_key]
url = selected_issue.get('href', False)
# Get the title for the magazine - build it out of the title of the cover - take the issue and year;
self.title = "brand eins " + selected_issue_key[4:] + "/" + selected_issue_key[0:4]
# self.title = "brand eins " + selected_issue_key[4:] + "/" + selected_issue_key[0:4]
# Get the alternative title for the magazine - build it out of the title of the cover - without the issue and year;
url = 'http://brandeins.de/'+url
self.timefmt = ' ' + selected_issue_key[4:] + '/' + selected_issue_key[:4]
# url = "http://www.brandeins.de/archiv/magazin/tierisch.html"
titles_and_articles = self.brand_eins_parse_issue(url)
@ -163,4 +165,3 @@ class BrandEins(BasicNewsRecipe):
current_articles.append({'title': title, 'url': url, 'description': description, 'date':''})
titles_and_articles.append([chapter_title, current_articles])
return titles_and_articles

View File

@ -5,8 +5,8 @@ __copyright__ = '2009, Darko Miletic <darko.miletic at gmail.com>'
Changelog:
2011-09-24
Changed cover (drMerry)
'''
'''
2011-10-13
Updated Cover (drMerry)
news.cnet.com
'''
@ -24,7 +24,7 @@ class CnetNews(BasicNewsRecipe):
encoding = 'cp1252'
use_embedded_content = False
language = 'en'
cover_url = 'http://reviews.cnet.com/i/ff/wp/logo_cnet.gif'
conversion_options = {
'comment' : description
, 'tags' : category

View File

@ -22,6 +22,14 @@ class CNN(BasicNewsRecipe):
#match_regexps = [r'http://sportsillustrated.cnn.com/.*/[1-9].html']
max_articles_per_feed = 25
extra_css = '''
h1 {font-size:xx-large; font-family:Arial,Helvetica,sans-serif;}
.cnn_story_author, .cnn_stryathrtmp {font-size:xx-small; color:#4D4D4D; font-family:Arial,Helvetica,sans-serif;}
.cnn_strycaptiontxt, .cnnArticleGalleryPhotoContainer {font-size:xx-small; color:#4D4D4D; font-family:Arial,Helvetica,sans-serif;}
.cnn_strycbftrtxt, .cnnEditorialNote {font-size:xx-small; color:#4D4D4D; font-family:Arial,Helvetica,sans-serif;}
.cnn_strycntntlft {font-size:medium; font-family:Arial,Helvetica,sans-serif;}
'''
preprocess_regexps = [
(re.compile(r'<!--\[if.*if\]-->', re.DOTALL), lambda m: ''),
(re.compile(r'<script.*?</script>', re.DOTALL), lambda m: ''),
@ -32,7 +40,12 @@ class CNN(BasicNewsRecipe):
remove_tags = [
{'class':['cnn_strybtntools', 'cnn_strylftcntnt',
'cnn_strybtntools', 'cnn_strybtntoolsbttm', 'cnn_strybtmcntnt',
'cnn_strycntntrgt', 'hed_side', 'foot']},
'cnn_strycntntrgt', 'hed_side', 'foot', 'cnn_strylftcntnt cnn_strylftcexpbx']},
{'class':['cnn_html_media_title_new', 'cnn_html_media_title_new cnn_html_media_title_none',
'cnnArticleGalleryCaptionControlText', 'articleGalleryNavContainer']},
{'id':['articleGalleryNav00JumpPrev', 'articleGalleryNav00Prev',
'articleGalleryNav00Next', 'articleGalleryNav00JumpNext']},
{'style':['display:none']},
dict(id=['ie_column']),
]
@ -58,3 +71,12 @@ class CNN(BasicNewsRecipe):
ans = BasicNewsRecipe.get_article_url(self, article)
return ans.partition('?')[0]
def get_masthead_url(self):
masthead = 'http://i.cdn.turner.com/cnn/.element/img/3.0/global/header/intl/hdr-globe-central.gif'
br = BasicNewsRecipe.get_browser()
try:
br.open(masthead)
except:
self.log("\nCover unavailable")
masthead = None
return masthead
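
The probe-and-fall-back pattern in get_masthead_url above recurs verbatim in several recipes added by this commit (korben, lepoint, lexpress, liberation, omgubuntu). A hedged sketch of a shared helper it could be factored into; masthead_if_reachable is hypothetical, not a calibre API:

from calibre.web.feeds.news import BasicNewsRecipe

def masthead_if_reachable(recipe, url):
    # Return url if the image can be fetched, else None so calibre
    # falls back to its default masthead.
    br = BasicNewsRecipe.get_browser()
    try:
        br.open(url)
    except:
        recipe.log("\nCover unavailable")
        return None
    return url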

View File

@ -8,11 +8,7 @@ class DallasNews(BasicNewsRecipe):
no_stylesheets = True
use_embedded_content = False
remove_tags_before = dict(name='h1')
keep_only_tags = {'class':lambda x: x and 'article' in x}
remove_tags = [
{'class':['DMNSocialTools', 'article ', 'article first ', 'article premium']},
]
auto_cleanup = True
feeds = [
('Local News',

View File

@ -0,0 +1,62 @@
__license__ = 'GPL v3'
__copyright__ = '2011, Darko Miletic <darko.miletic at gmail.com>'
'''
www.defensenews.com
'''
from calibre.web.feeds.news import BasicNewsRecipe
class DefenseNews(BasicNewsRecipe):
title = 'Defense News'
__author__ = 'Darko Miletic'
description = 'Find late-breaking defense news from the leading defense news weekly'
publisher = 'Gannett Government Media Corporation'
category = 'defense news, defence news, defense, defence, defence budget, defence policy'
oldest_article = 31
max_articles_per_feed = 200
no_stylesheets = True
encoding = 'utf8'
use_embedded_content = False
language = 'en'
remove_empty_feeds = True
publication_type = 'newspaper'
masthead_url = 'http://www.defensenews.com/images/logo_defensenews2.jpg'
extra_css = """
body{font-family: Arial,Helvetica,sans-serif }
img{margin-bottom: 0.4em; display:block}
.info{font-size: small; color: gray}
"""
conversion_options = {
'comment' : description
, 'tags' : category
, 'publisher' : publisher
, 'language' : language
}
remove_tags = [
dict(name=['meta','link'])
,dict(attrs={'class':['toolbar','related','left','right']})
]
remove_tags_before = dict(attrs={'class':'storyWrp'})
remove_tags_after = dict(attrs={'class':'middle'})
remove_attributes=['lang']
feeds = [
(u'Europe' , u'http://www.defensenews.com/rss/eur/' )
,(u'Americas', u'http://www.defensenews.com/rss/ame/' )
,(u'Asia & Pacific rim', u'http://www.defensenews.com/rss/asi/' )
,(u'Middle east & Africa', u'http://www.defensenews.com/rss/mid/')
,(u'Air', u'http://www.defensenews.com/rss/air/' )
,(u'Land', u'http://www.defensenews.com/rss/lan/' )
,(u'Naval', u'http://www.defensenews.com/rss/sea/' )
]
def preprocess_html(self, soup):
for item in soup.findAll(style=True):
del item['style']
for item in soup.findAll('img'):
if not item.has_key('alt'):
item['alt'] = 'image'
return soup

View File

@ -2,6 +2,7 @@ __license__ = 'GPL v3'
__copyright__ = '2009, Darko Miletic <darko.miletic at gmail.com>'
'''
http://www.dilbert.com
DrMerry added cover Image 2011-11-12
'''
from calibre.web.feeds.recipes import BasicNewsRecipe
@ -9,7 +10,7 @@ import re
class DilbertBig(BasicNewsRecipe):
title = 'Dilbert'
__author__ = 'Darko Miletic and Starson17'
__author__ = 'Darko Miletic and Starson17 contribution of DrMerry'
description = 'Dilbert'
reverse_article_order = True
oldest_article = 15
@ -20,6 +21,7 @@ class DilbertBig(BasicNewsRecipe):
publisher = 'UNITED FEATURE SYNDICATE, INC.'
category = 'comic'
language = 'en'
cover_url = 'http://dilbert.com/mobile/mobile/dilbert.app.icon.png'
conversion_options = {
'comments' : description

View File

@ -22,8 +22,6 @@ class Economist(BasicNewsRecipe):
' perspective. Best downloaded on Friday mornings (GMT)')
extra_css = '.headline {font-size: x-large;} \n h2 { font-size: small; } \n h1 { font-size: medium; }'
oldest_article = 7.0
cover_url = 'http://media.economist.com/sites/default/files/imagecache/print-cover-thumbnail/print-covers/currentcoverus_large.jpg'
#cover_url = 'http://www.economist.com/images/covers/currentcoverus_large.jpg'
remove_tags = [
dict(name=['script', 'noscript', 'title', 'iframe', 'cf_floatingcontent']),
dict(attrs={'class':['dblClkTrk', 'ec-article-info',
@ -56,6 +54,14 @@ class Economist(BasicNewsRecipe):
return br
'''
def get_cover_url(self):
br = self.browser
br.open(self.INDEX)
issue = br.geturl().split('/')[4]
self.log('Fetching cover for issue: %s'%issue)
cover_url = "http://media.economist.com/sites/default/files/imagecache/print-cover-full/print-covers/%s_CNA400.jpg" %(issue.translate(None,'-'))
return cover_url
def parse_index(self):
return self.economist_parse_index()
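
A worked illustration of the get_cover_url logic above, assuming a hypothetical index URL of the usual shape:

url = 'http://www.economist.com/printedition/2011-10-15'   # what br.geturl() might return
issue = url.split('/')[4]              # '2011-10-15'
issue = issue.translate(None, '-')     # '20111015' (Python 2 str.translate)
print 'http://media.economist.com/sites/default/files/imagecache/print-cover-full/print-covers/%s_CNA400.jpg' % issue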

View File

@ -22,8 +22,6 @@ class Economist(BasicNewsRecipe):
' perspective. Best downloaded on Friday mornings (GMT)')
extra_css = '.headline {font-size: x-large;} \n h2 { font-size: small; } \n h1 { font-size: medium; }'
oldest_article = 7.0
cover_url = 'http://media.economist.com/sites/default/files/imagecache/print-cover-thumbnail/print-covers/currentcoverus_large.jpg'
#cover_url = 'http://www.economist.com/images/covers/currentcoverus_large.jpg'
remove_tags = [
dict(name=['script', 'noscript', 'title', 'iframe', 'cf_floatingcontent']),
dict(attrs={'class':['dblClkTrk', 'ec-article-info',
@ -40,6 +38,14 @@ class Economist(BasicNewsRecipe):
# downloaded with connection reset by peer (104) errors.
delay = 1
def get_cover_url(self):
br = self.browser
br.open(self.INDEX)
issue = br.geturl().split('/')[4]
self.log('Fetching cover for issue: %s'%issue)
cover_url = "http://media.economist.com/sites/default/files/imagecache/print-cover-full/print-covers/%s_CNA400.jpg" %(issue.translate(None,'-'))
return cover_url
def parse_index(self):
try:

View File

@ -0,0 +1,58 @@
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup, BeautifulStoneSoup
class Ekathimerini(BasicNewsRecipe):
title = 'ekathimerini'
__author__ = 'Thomas Scholl'
description = 'News from Greece, English edition'
masthead_url = 'http://wwk.kathimerini.gr/webadmin/EnglishNew/gifs/logo.gif'
max_articles_per_feed = 100
oldest_article = 100
publisher = 'Kathimerini'
category = 'news, GR'
language = 'en_GR'
encoding = 'windows-1253'
conversion_options = { 'linearize_tables': True}
no_stylesheets = True
delay = 1
keep_only_tags = [dict(name='td', attrs={'class':'news'})]
rss_url = 'http://ws.kathimerini.gr/xml_files/latestnews.xml'
def find_articles(self, idx, category):
for article in idx.findAll('item'):
cat = u''
cat_elem = article.find('subcat')
if cat_elem:
cat = self.tag_to_string(cat_elem)
if cat == category:
desc_html = self.tag_to_string(article.find('description'))
description = self.tag_to_string(BeautifulSoup(desc_html))
a = {
'title': self.tag_to_string(article.find('title')),
'url': self.tag_to_string(article.find('link')),
'description': description,
'date' : self.tag_to_string(article.find('pubdate')),
}
yield a
def parse_index(self):
idx_contents = self.browser.open(self.rss_url).read()
idx = BeautifulStoneSoup(idx_contents, convertEntities=BeautifulStoneSoup.XML_ENTITIES)
cats = list(set([self.tag_to_string(subcat) for subcat in idx.findAll('subcat')]))
cats.sort()
feeds = [(u'News',list(self.find_articles(idx, u'')))]
for cat in cats:
feeds.append((cat.capitalize(), list(self.find_articles(idx, cat))))
return feeds
def print_version(self, url):
return url.replace('http://www.ekathimerini.com/4dcgi/', 'http://www.ekathimerini.com/4Dcgi/4dcgi/')
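
parse_index above derives one feed per distinct <subcat> in the RSS, collecting items with an empty subcategory under 'News'; a distilled sketch of that grouping with hypothetical category values:

subcats = [u'politics', u'', u'business', u'politics']
cats = list(set(subcats))
cats.sort()                                   # [u'', u'business', u'politics']
feeds = [(u'News', [])]                       # items with an empty subcat
feeds += [(c.capitalize(), []) for c in cats if c]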

View File

@ -33,7 +33,7 @@ class ElPais(BasicNewsRecipe):
remove_javascript = True
no_stylesheets = True
keep_only_tags = [ dict(name='div', attrs={'class':['cabecera_noticia_reportaje estirar','cabecera_noticia_opinion estirar','cabecera_noticia estirar','contenido_noticia','caja_despiece']})]
keep_only_tags = [ dict(name='div', attrs={'class':['cabecera_noticia_reportaje estirar','cabecera_noticia_opinion estirar','cabecera_noticia estirar','contenido_noticia','cuerpo_noticia','caja_despiece']})]
extra_css = ' p{text-align: justify; font-size: 100%} body{ text-align: left; font-family: serif; font-size: 100% } h1{ font-family: sans-serif; font-size:200%; font-weight: bolder; text-align: justify; } h2{ font-family: sans-serif; font-size:150%; font-weight: 500; text-align: justify } h3{ font-family: sans-serif; font-size:125%; font-weight: 500; text-align: justify } img{margin-bottom: 0.4em} '

recipes/frandroid.recipe Normal file
View File

@ -0,0 +1,8 @@
# -*- coding: utf-8 -*-
class BasicUserRecipe1318572550(AutomaticNewsRecipe):
title = u'FrAndroid'
oldest_article = 2
max_articles_per_feed = 100
auto_cleanup = True
feeds = [(u'FrAndroid', u'http://feeds.feedburner.com/Frandroid')]

View File

@ -16,7 +16,7 @@ class FTDe(BasicNewsRecipe):
use_embedded_content = False
timefmt = ' [%d %b %Y]'
language = 'de'
max_articles_per_feed = 40
max_articles_per_feed = 30
no_stylesheets = True
remove_tags = [dict(id='navi_top'),
@ -84,18 +84,18 @@ class FTDe(BasicNewsRecipe):
dict(name='div', attrs={'class':'artikelsplitfaq'})]
#remove_tags_after = [dict(name='a', attrs={'class':'more'})]
feeds = [ ('Finanzen', 'http://www.ftd.de/rss2/finanzen/maerkte'),
('Meinungshungrige', 'http://www.ftd.de/rss2/meinungshungrige'),
('Unternehmen', 'http://www.ftd.de/rss2/unternehmen'),
('Politik', 'http://www.ftd.de/rss2/politik'),
('Karriere_Management', 'http://www.ftd.de/rss2/karriere-management'),
('IT_Medien', 'http://www.ftd.de/rss2/it-medien'),
('Wissen', 'http://www.ftd.de/rss2/wissen'),
('Sport', 'http://www.ftd.de/rss2/sport'),
('Auto', 'http://www.ftd.de/rss2/auto'),
('Lifestyle', 'http://www.ftd.de/rss2/lifestyle')
]
feeds = [
('Unternehmen', 'http://www.ftd.de/rss2/unternehmen'),
('Finanzen', 'http://www.ftd.de/rss2/finanzen/maerkte'),
('Meinungen', 'http://www.ftd.de/rss2/meinungshungrige'),
('Politik', 'http://www.ftd.de/rss2/politik'),
('Management & Karriere', 'http://www.ftd.de/rss2/karriere-management'),
('IT & Medien', 'http://www.ftd.de/rss2/it-medien'),
('Wissen', 'http://www.ftd.de/rss2/wissen'),
('Sport', 'http://www.ftd.de/rss2/sport'),
('Auto', 'http://www.ftd.de/rss2/auto'),
('Lifestyle', 'http://www.ftd.de/rss2/lifestyle')
]
def print_version(self, url):

View File

@ -1,35 +1,82 @@
#!/usr/bin/python
from calibre.web.feeds.news import BasicNewsRecipe
import re
from calibre.utils.magick import Image, create_canvas
class AdvancedUserRecipe1307556816(BasicNewsRecipe):
title = u'Geek and Poke'
__author__ = u'DrMerry'
description = u'Geek and Poke Cartoons'
publisher = u'Oliver Widder'
author = u'Oliver Widder, DrMerry (calibre-code), calibre'
oldest_article = 31
max_articles_per_feed = 100
language = u'en'
simultaneous_downloads = 5
simultaneous_downloads = 1
#delay = 1
timefmt = ' [%A, %d %B, %Y]'
timefmt = ' [%a, %d %B, %Y]'
summary_length = -1
no_stylesheets = True
category = 'News.IT, Cartoon, Humor, Geek'
use_embedded_content = False
cover_url = 'http://geekandpoke.typepad.com/aboutcoders.jpeg'
remove_javascript = True
remove_empty_feeds = True
publication_type = 'blog'
masthead_url = None
conversion_options = {
'comments' : ''
,'tags' : category
,'language' : language
,'publisher' : publisher
,'author' : author
}
preprocess_regexps = [ (re.compile(r'(<p>&nbsp;</p>|<iframe.*</iframe>|<a[^>]*>Tweet</a>|<a[^>]*>|</a>)', re.DOTALL|re.IGNORECASE),lambda match: ''),
(re.compile(r'(&nbsp;| )', re.DOTALL|re.IGNORECASE),lambda match: ' '),
(re.compile(r'<br( /)?>(<br( /)?>)+', re.DOTALL|re.IGNORECASE),lambda match: '<br>')
]
remove_tags_before = dict(name='p', attrs={'class':'content-nav'})
remove_tags_after = dict(name='div', attrs={'class':'entry-content'})
remove_tags = [dict(name='div', attrs={'class':'entry-footer'}),
dict(name='div', attrs={'id':'alpha'}),
dict(name='div', attrs={'id':'gamma'}),
dict(name='iframe'),
dict(name='p', attrs={'class':'content-nav'})]
extra_css = 'body, h3, p, h2, h1, div, span{margin:0px} h2.date-header {font-size: 0.7em; color:#eee;} h3.entry-header{font-size: 1.0em} div.entry-body{font-size: 0.9em}'
filter_regexps = [(r'feedburner\.com'),
(r'pixel.quantserve\.com'),
(r'googlesyndication\.com'),
(r'yimg\.com'),
(r'scorecardresearch\.com')]
preprocess_regexps = [(re.compile(r'(<p>(&nbsp;|\s)*</p>|<a[^>]*>Tweet</a>|<a[^>]*>|</a>|<!--.*?-->|<h2[^>]*>[^<]*</h2>[^<]*)', re.DOTALL|re.IGNORECASE),lambda match: ''),
(re.compile(r'(&nbsp;|\s\s)+\s*', re.DOTALL|re.IGNORECASE),lambda match: ' '),
(re.compile(r'(<h3[^>]*>)<a[^>]*>((?:(?!</a).)*)</a></h3>', re.DOTALL|re.IGNORECASE),lambda match: match.group(1) + match.group(2) + '</h3>'),
(re.compile(r'(<img[^>]*alt="([^"]*)"[^>]*>)', re.DOTALL|re.IGNORECASE),lambda match: '<div id="merryImage"><cite>' + match.group(2) + '</cite><br>' + match.group(1) + '</div>'),
(re.compile(r'<br( /)?>(<br( /)?>)+', re.DOTALL|re.IGNORECASE),lambda match: '<br>'),
]
remove_tags_before = dict(name='h2', attrs={'class':'date-header'})
remove_tags_after = dict(name='div', attrs={'class':'entry-body'})
extra_css = 'body, h3, p, div, span{margin:0px; padding:0px} h3.entry-header{font-size: 0.8em} div.entry-body{font-size: 0.7em}'
def postprocess_html(self, soup, first):
for tag in soup.findAll(lambda tag: tag.name.lower()=='img' and tag.has_key('src')):
iurl = tag['src']
img = Image()
img.open(iurl)
#width, height = img.size
#print '***img is: ', iurl, '\n****width is: ', width, 'height is: ', height
img.trim(0)
#width, height = img.size
#print '***TRIMMED img width is: ', width, 'height is: ', height
left=0
top=0
border_color='#ffffff'
width, height = img.size
#print '***retrieved img width is: ', width, 'height is: ', height
height_correction = 1.17
canvas = create_canvas(width, height*height_correction,border_color)
canvas.compose(img, left, top)
#img = canvas
#img.save(iurl)
canvas.save(iurl)
#width, height = canvas.size
#print '***NEW img width is: ', width, 'height is: ', height
return soup
feeds = [(u'Geek and Poke', u'http://feeds.feedburner.com/GeekAndPoke?format=xml')]
feeds = ['http://feeds.feedburner.com/GeekAndPoke?format=xml']
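
A condensed, hedged restatement of the postprocess_html image handling above: trim the strip's uniform border, then recompose it at the top of a white canvas 17% taller, presumably to leave whitespace below the comic. The filename here is hypothetical:

from calibre.utils.magick import Image, create_canvas

img = Image()
img.open('strip.jpg')                  # hypothetical local image
img.trim(0)                            # drop the uniform outer border
width, height = img.size
canvas = create_canvas(width, height * 1.17, '#ffffff')
canvas.compose(img, 0, 0)              # strip at top, whitespace below
canvas.save('strip.jpg')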

View File

@ -0,0 +1,8 @@
# -*- coding: utf-8 -*-
class BasicUserRecipe1318572445(AutomaticNewsRecipe):
title = u'Google Mobile Blog'
oldest_article = 7
max_articles_per_feed = 100
auto_cleanup = True
feeds = [(u'Google Mobile Blog', u'http://googlemobile.blogspot.com/atom.xml')]

View File

@ -19,6 +19,7 @@ class GN(BasicNewsRecipe):
language = 'pl'
remove_javascript = True
temp_files = []
simultaneous_downloads = 1
articles_are_obfuscated = True
@ -94,16 +95,16 @@ class GN(BasicNewsRecipe):
def find_articles(self, main_block):
for a in main_block.findAll('div', attrs={'class':'prev_doc2'}):
art = a.find('a')
yield {
art = a.find('a')
yield {
'title' : self.tag_to_string(art),
'url' : 'http://www.gosc.pl' + art['href'].replace('/doc/','/doc_pr/'),
'date' : '',
'description' : ''
}
for a in main_block.findAll('div', attrs={'class':'sr-document'}):
art = a.find('a')
yield {
art = a.find('a')
yield {
'title' : self.tag_to_string(art),
'url' : 'http://www.gosc.pl' + art['href'].replace('/doc/','/doc_pr/'),
'date' : '',

View File

@ -119,10 +119,8 @@ class Guardian(BasicNewsRecipe):
}
def parse_index(self):
try:
feeds = []
for title, href in self.find_sections():
feeds.append((title, list(self.find_articles(href))))
return feeds
except:
raise NotImplementedError
feeds = []
for title, href in self.find_sections():
feeds.append((title, list(self.find_articles(href))))
return feeds

recipes/hankyoreh.recipe Normal file
View File

@ -0,0 +1,50 @@
__license__ = 'GPL v3'
__copyright__ = '2011, Seongkyoun Yoo <seongkyoun.yoo at gmail.com>'
'''
Profile to download The Hankyoreh
'''
import re
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup
class Hankyoreh(BasicNewsRecipe):
title = u'Hankyoreh'
language = 'ko'
description = u'The Hankyoreh News articles'
__author__ = 'Seongkyoun Yoo'
oldest_article = 5
recursions = 1
max_articles_per_feed = 5
no_stylesheets = True
keep_only_tags = [
dict(name='tr', attrs={'height':['60px']}),
dict(id=['fontSzArea'])
]
remove_tags = [
dict(target='_blank'),
dict(name='td', attrs={'style':['padding: 10px 8px 5px 8px;']}),
dict(name='iframe', attrs={'width':['590']}),
]
remove_tags_after = [
dict(target='_top')
]
feeds = [
('All News','http://www.hani.co.kr/rss/'),
('Politics','http://www.hani.co.kr/rss/politics/'),
('Economy','http://www.hani.co.kr/rss/economy/'),
('Society','http://www.hani.co.kr/rss/society/'),
('International','http://www.hani.co.kr/rss/international/'),
('Culture','http://www.hani.co.kr/rss/culture/'),
('Sports','http://www.hani.co.kr/rss/sports/'),
('Science','http://www.hani.co.kr/rss/science/'),
('Opinion','http://www.hani.co.kr/rss/opinion/'),
('Cartoon','http://www.hani.co.kr/rss/cartoon/'),
('English Edition','http://www.hani.co.kr/rss/english_edition/'),
('Specialsection','http://www.hani.co.kr/rss/specialsection/'),
('Hanionly','http://www.hani.co.kr/rss/hanionly/'),
('Hkronly','http://www.hani.co.kr/rss/hkronly/'),
('Multihani','http://www.hani.co.kr/rss/multihani/'),
('Lead','http://www.hani.co.kr/rss/lead/'),
('Newsrank','http://www.hani.co.kr/rss/newsrank/'),
]

View File

@ -0,0 +1,26 @@
__license__ = 'GPL v3'
__copyright__ = '2011, Seongkyoun Yoo <seongkyoun.yoo at gmail.com>'
'''
Profile to download The Hankyoreh
'''
import re
from calibre.web.feeds.news import BasicNewsRecipe
class Hankyoreh21(BasicNewsRecipe):
title = u'Hankyoreh21'
language = 'ko'
description = u'The Hankyoreh21 Magazine articles'
__author__ = 'Seongkyoun Yoo'
oldest_article = 20
recursions = 1
max_articles_per_feed = 120
no_stylesheets = True
remove_javascript = True
keep_only_tags = [
dict(name='font', attrs={'class':'t18bk'}),
dict(id=['fontSzArea'])
]
feeds = [
('Hani21','http://h21.hani.co.kr/rss/'),
]

View File

@ -3,7 +3,7 @@ from calibre.web.feeds.news import BasicNewsRecipe
class AdvancedUserRecipe1298137661(BasicNewsRecipe):
title = u'Helsingin Sanomat'
__author__ = 'oneillpt'
language = 'fi'
language = 'fi'
oldest_article = 7
max_articles_per_feed = 100
no_stylesheets = True
@ -11,21 +11,12 @@ class AdvancedUserRecipe1298137661(BasicNewsRecipe):
conversion_options = {
'linearize_tables' : True
}
remove_tags = [
dict(name='a', attrs={'id':'articleCommentUrl'}),
dict(name='p', attrs={'class':'newsSummary'}),
dict(name='div', attrs={'class':'headerTools'})
]
keep_only_tags = [dict(name='div', attrs={'id':'main-content'}),
dict(name='div', attrs={'class':'contentNewsArticle'})]
feeds = [(u'Uutiset - HS.fi', u'http://www.hs.fi/uutiset/rss/'), (u'Politiikka - HS.fi', u'http://www.hs.fi/politiikka/rss/'),
feeds = [(u'Uutiset - HS.fi', u'http://www.hs.fi/uutiset/rss/')
, (u'Politiikka - HS.fi', u'http://www.hs.fi/politiikka/rss/'),
(u'Ulkomaat - HS.fi', u'http://www.hs.fi/ulkomaat/rss/'), (u'Kulttuuri - HS.fi', u'http://www.hs.fi/kulttuuri/rss/'),
(u'Kirjat - HS.fi', u'http://www.hs.fi/kulttuuri/kirjat/rss/'), (u'Elokuvat - HS.fi', u'http://www.hs.fi/kulttuuri/elokuvat/rss/')
]
def print_version(self, url):
j = url.rfind("/")
s = url[j:]
i = s.rfind("?ref=rss")
if i > 0:
s = s[:i]
return "http://www.hs.fi/tulosta" + s

View File

@ -1,50 +0,0 @@
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import with_statement
__license__ = 'GPL v3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
from calibre.web.feeds.news import BasicNewsRecipe
class HunMilNews(BasicNewsRecipe):
title = u'Honvedelem.hu'
oldest_article = 3
description = u'Katonah\xedrek'
language = 'hu'
lang = 'hu'
encoding = 'windows-1250'
category = 'news, military'
no_stylesheets = True
__author__ = 'Devilinside'
max_articles_per_feed = 16
no_stylesheets = True
keep_only_tags = [dict(name='div', attrs={'class':'cikkoldal_cikk_cim'}),
dict(name='div', attrs={'class':'cikkoldal_cikk_alcim'}),
dict(name='div', attrs={'class':'cikkoldal_datum'}),
dict(name='div', attrs={'class':'cikkoldal_lead'}),
dict(name='div', attrs={'class':'cikkoldal_szoveg'}),
dict(name='img', attrs={'class':'ajanlo_kep_keretes'}),
]
feeds = [(u'Misszi\xf3k', u'http://www.honvedelem.hu/rss_b?c=22'),
(u'Aktu\xe1lis hazai h\xedrek', u'http://www.honvedelem.hu/rss_b?c=3'),
(u'K\xfclf\xf6ldi h\xedrek', u'http://www.honvedelem.hu/rss_b?c=4'),
(u'A h\xf3nap t\xe9m\xe1ja', u'http://www.honvedelem.hu/rss_b?c=6'),
(u'Riport', u'http://www.honvedelem.hu/rss_b?c=5'),
(u'Portr\xe9k', u'http://www.honvedelem.hu/rss_b?c=7'),
(u'Haditechnika', u'http://www.honvedelem.hu/rss_b?c=8'),
(u'Programok, esem\xe9nyek', u'http://www.honvedelem.hu/rss_b?c=12')
]

View File

@ -1,41 +0,0 @@
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import with_statement
__license__ = 'GPL v3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
from calibre.web.feeds.news import BasicNewsRecipe
class HunTechNet(BasicNewsRecipe):
title = u'TechNet'
oldest_article = 3
description = u'Az ut\xf3bbi 3 nap TechNet h\xedrei'
language = 'hu'
lang = 'hu'
encoding = 'utf-8'
__author__ = 'Devilinside'
max_articles_per_feed = 30
timefmt = ' [%Y, %b %d, %a]'
remove_tags_before = dict(name='div', attrs={'id':'c-main'})
remove_tags = [dict(name='div', attrs={'class':'wrp clr'}),
{'class' : ['screenrdr','forum','print','startlap','text_small','text_normal','text_big','email']},
]
keep_only_tags = [dict(name='div', attrs={'class':'cikk_head box'}),dict(name='div', attrs={'class':'cikk_txt box'})]
feeds = [(u'C\xedmlap',
u'http://www.technet.hu/rss/cimoldal/'), (u'TechTud',
u'http://www.technet.hu/rss/techtud/'), (u'PDA M\xe1nia',
u'http://www.technet.hu/rss/pdamania/'), (u'Telefon',
u'http://www.technet.hu/rss/telefon/'), (u'Sz\xe1m\xedt\xf3g\xe9p',
u'http://www.technet.hu/rss/notebook/'), (u'GPS',
u'http://www.technet.hu/rss/gps/')]

Binary file not shown. (Added image: 868 B)

recipes/korben.recipe Normal file
View File

@ -0,0 +1,18 @@
# -*- coding: utf-8 -*-
class BasicUserRecipe1318619728(AutomaticNewsRecipe):
title = u'Korben'
oldest_article = 7
max_articles_per_feed = 100
auto_cleanup = True
feeds = [(u'Korben', u'http://feeds2.feedburner.com/KorbensBlog-UpgradeYourMind')]
def get_masthead_url(self):
masthead = 'http://korben.info/wp-content/themes/korben-steaw/hab/logo.png'
br = BasicNewsRecipe.get_browser()
try:
br.open(masthead)
except:
self.log("\nCover unavailable")
masthead = None
return masthead

View File

@ -10,9 +10,9 @@ class KoreaHerald(BasicNewsRecipe):
language = 'en'
description = u'Korea Herald News articles'
__author__ = 'Seongkyoun Yoo'
oldest_article = 10
oldest_article = 15
recursions = 3
max_articles_per_feed = 10
max_articles_per_feed = 15
no_stylesheets = True
keep_only_tags = [
dict(id=['contentLeft', '_article'])
@ -25,7 +25,6 @@ class KoreaHerald(BasicNewsRecipe):
]
feeds = [
('All News','http://www.koreaherald.com/rss/020000000000.xml'),
('National','http://www.koreaherald.com/rss/020100000000.xml'),
('Business','http://www.koreaherald.com/rss/020200000000.xml'),
('Life&Style','http://www.koreaherald.com/rss/020300000000.xml'),

View File

@ -1,7 +1,7 @@
from calibre.web.feeds.news import BasicNewsRecipe
class AdvancedUserRecipe1282101454(BasicNewsRecipe):
title = 'Kansascity Star'
title = 'Kansas City Star'
language = 'en'
__author__ = 'TonytheBookworm'
description = 'www.kansascity.com feed'

recipes/kyungyhang Normal file
View File

@ -0,0 +1,37 @@
__license__ = 'GPL v3'
__copyright__ = '2011, Seongkyoun Yoo <seongkyoun.yoo at gmail.com>'
'''
Profile to download The Kyungyhang
'''
from calibre.web.feeds.news import BasicNewsRecipe
class Kyungyhang(BasicNewsRecipe):
title = u'Kyungyhang'
language = 'ko'
description = u'The Kyungyhang Shinmun articles'
__author__ = 'Seongkyoun Yoo'
oldest_article = 20
recursions = 2
max_articles_per_feed = 20
no_stylesheets = True
remove_javascript = True
keep_only_tags = [
dict(name='div', attrs ={'class':['article_title_wrap']}),
dict(name='div', attrs ={'class':['article_txt']})
]
remove_tags_after = dict(id={'sub_bottom'})
remove_tags = [
dict(name='iframe'),
dict(id={'TdHot'}),
dict(name='div', attrs={'class':['btn_list','bline','linebottom','bestArticle']}),
dict(name='dl', attrs={'class':['CL']}),
dict(name='ul', attrs={'class':['tab']}),
]
feeds = [
('All News','http://www.khan.co.kr/rss/rssdata/total_news.xml'),
]

View File

@ -1,51 +1,77 @@
#!/usr/bin/env python
__license__ = 'GPL v3'
__author__ = 'Lorenzo Vigentini, based on Darko Miletic, Gabriele Marini'
__copyright__ = '2009, Darko Miletic <darko.miletic at gmail.com>, Lorenzo Vigentini <l.vigentini at gmail.com>'
description = 'Italian daily newspaper - v1.01 (04, January 2010); 16.05.2010 new version'
__copyright__ = '2009-2011, Darko Miletic <darko.miletic at gmail.com>, Lorenzo Vigentini <l.vigentini at gmail.com>'
description = 'Italian daily newspaper - v1.01 (04, January 2010); 16.05.2010 new version; 17.10.2011 new version'
'''
http://www.repubblica.it/
'''
import re
from calibre.ptempfile import PersistentTemporaryFile
from calibre.web.feeds.news import BasicNewsRecipe
class LaRepubblica(BasicNewsRecipe):
__author__ = 'Lorenzo Vigentini, Gabriele Marini'
description = 'Italian daily newspaper'
title = 'La Repubblica'
__author__ = 'Lorenzo Vigentini, Gabriele Marini, Darko Miletic'
description = 'il quotidiano online con tutte le notizie in tempo reale. News e ultime notizie. Tutti i settori: politica, cronaca, economia, sport, esteri, scienza, tecnologia, internet, spettacoli, musica, cultura, arte, mostre, libri, dvd, vhs, concerti, cinema, attori, attrici, recensioni, chat, cucina, mappe. Le citta di Repubblica: Roma, Milano, Bologna, Firenze, Palermo, Napoli, Bari, Torino.'
masthead_url = 'http://www.repubblica.it/static/images/homepage/2010/la-repubblica-logo-home-payoff.png'
publisher = 'Gruppo editoriale L\'Espresso'
category = 'News, politics, culture, economy, general interest'
language = 'it'
timefmt = '[%a, %d %b, %Y]'
oldest_article = 5
encoding = 'utf8'
use_embedded_content = False
no_stylesheets = True
publication_type = 'newspaper'
articles_are_obfuscated = True
temp_files = []
extra_css = """
img{display: block}
"""
cover_url = 'http://www.repubblica.it/images/homepage/la_repubblica_logo.gif'
title = u'La Repubblica'
publisher = 'Gruppo editoriale L\'Espresso'
category = 'News, politics, culture, economy, general interest'
remove_attributes = ['width','height','lang','xmlns:og','xmlns:fb']
language = 'it'
timefmt = '[%a, %d %b, %Y]'
oldest_article = 5
max_articles_per_feed = 100
use_embedded_content = False
recursion = 10
remove_javascript = True
no_stylesheets = True
preprocess_regexps = [
(re.compile(r'.*?<head>', re.DOTALL|re.IGNORECASE), lambda match: '<head>'),
(re.compile(r'<head>.*?<title>', re.DOTALL|re.IGNORECASE), lambda match: '<head><title>'),
(re.compile(r'</title>.*?</head>', re.DOTALL|re.IGNORECASE), lambda match: '</title></head>')
]
def get_article_url(self, article):
link = article.get('id', article.get('guid', None))
if link is None:
return article
return link
link = BasicNewsRecipe.get_article_url(self, article)
if link and not '.repubblica.it/' in link:
link2 = article.get('id', article.get('guid', None))
if link2:
link = link2
return link.rpartition('?')[0]
keep_only_tags = [dict(name='div', attrs={'class':'articolo'}),
dict(name='div', attrs={'class':'body-text'}),
# dict(name='div', attrs={'class':'page-content'}),
def get_obfuscated_article(self, url):
count = 0
while (count < 10):
try:
response = self.browser.open(url)
html = response.read()
count = 10
except:
print "Retrying download..."
count += 1
self.temp_files.append(PersistentTemporaryFile('_fa.html'))
self.temp_files[-1].write(html)
self.temp_files[-1].close()
return self.temp_files[-1].name
keep_only_tags = [
dict(attrs={'class':'articolo'}),
dict(attrs={'class':'body-text'}),
dict(name='p', attrs={'class':'disclaimer clearfix'}),
dict(name='div', attrs={'id':'contA'})
dict(attrs={'id':'contA'})
]
remove_tags = [
dict(name=['object','link']),
dict(name=['object','link','meta','iframe','embed']),
dict(name='span',attrs={'class':'linkindice'}),
dict(name='div', attrs={'class':'bottom-mobile'}),
dict(name='div', attrs={'id':['rssdiv','blocco']}),
@ -76,3 +102,11 @@ class LaRepubblica(BasicNewsRecipe):
(u'Edizione Palermo', u'feed://palermo.repubblica.it/rss/rss2.0.xml')
]
def preprocess_html(self, soup):
for item in soup.findAll(['hgroup','deresponsabilizzazione','per']):
item.name = 'div'
item.attrs = []
for item in soup.findAll(style=True):
del item['style']
return soup
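
One hedged caveat on get_obfuscated_article above: if all ten attempts fail, html is never assigned and the write raises a NameError. A sketch of a tighter variant with the same behaviour, reusing the PersistentTemporaryFile import already in the recipe:

def get_obfuscated_article(self, url):
    html = None
    for attempt in range(10):
        try:
            html = self.browser.open(url).read()
            break
        except:
            print "Retrying download..."
    if html is None:
        raise Exception('Download failed: ' + url)
    pt = PersistentTemporaryFile('_fa.html')
    pt.write(html)
    pt.close()
    self.temp_files.append(pt)
    return pt.name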

recipes/lepoint.recipe Normal file
View File

@ -0,0 +1,76 @@
# -*- coding: utf-8 -*-
__license__ = 'GPL v3'
__copyright__ = '2011 Aurélien Chabot <contact@aurelienchabot.fr>'
'''
LePoint.fr
'''
import re
from calibre.web.feeds.recipes import BasicNewsRecipe
class lepoint(BasicNewsRecipe):
title = 'Le Point'
__author__ = 'calibre'
description = 'Actualités'
encoding = 'utf-8'
publisher = 'LePoint.fr'
category = 'news, France, world'
language = 'fr'
use_embedded_content = False
timefmt = ' [%d %b %Y]'
max_articles_per_feed = 15
no_stylesheets = True
remove_empty_feeds = True
filterDuplicates = True
extra_css = '''
h1 {font-size:xx-large; font-family:Arial,Helvetica,sans-serif;}
.chapo {font-size:xx-small; font-family:Arial,Helvetica,sans-serif;}
.info_article {font-size:xx-small; color:#4D4D4D; font-family:Arial,Helvetica,sans-serif;}
.media_article {font-size:xx-small; color:#4D4D4D; font-family:Arial,Helvetica,sans-serif;}
.article {font-size:medium; font-family:Arial,Helvetica,sans-serif;}
'''
remove_tags = [
dict(name='iframe'),
dict(name='div', attrs={'class':['entete_chroniqueur']}),
dict(name='div', attrs={'class':['col_article']}),
dict(name='div', attrs={'class':['signature_article']}),
dict(name='div', attrs={'class':['util_font util_article']}),
dict(name='div', attrs={'class':['util_article bottom']})
]
keep_only_tags = [dict(name='div', attrs={'class':['page_article']})]
remove_tags_after = dict(name='div', attrs={'class':['util_article bottom']})
feeds = [
(u'À la une', 'http://www.lepoint.fr/rss.xml'),
('International', 'http://www.lepoint.fr/monde/rss.xml'),
('Tech/Web', 'http://www.lepoint.fr/high-tech-internet/rss.xml'),
('Sciences', 'http://www.lepoint.fr/science/rss.xml'),
('Economie', 'http://www.lepoint.fr/economie/rss.xml'),
(u'Société', 'http://www.lepoint.fr/societe/rss.xml'),
('Politique', 'http://www.lepoint.fr/politique/rss.xml'),
(u'Médias', 'http://www.lepoint.fr/medias/rss.xml'),
('Culture', 'http://www.lepoint.fr/culture/rss.xml'),
(u'Santé', 'http://www.lepoint.fr/sante/rss.xml'),
('Sport', 'http://www.lepoint.fr/sport/rss.xml')
]
def preprocess_html(self, soup):
for item in soup.findAll(style=True):
del item['style']
return soup
def get_masthead_url(self):
masthead = 'http://www.lepoint.fr/images/commun/logo.png'
br = BasicNewsRecipe.get_browser()
try:
br.open(masthead)
except:
self.log("\nCover unavailable")
masthead = None
return masthead

recipes/lexpress.recipe Normal file
View File

@ -0,0 +1,74 @@
# -*- coding: utf-8 -*-
__license__ = 'GPL v3'
__copyright__ = '2011 Aurélien Chabot <contact@aurelienchabot.fr>'
'''
Lexpress.fr
'''
import re
from calibre.web.feeds.recipes import BasicNewsRecipe
class lexpress(BasicNewsRecipe):
title = 'L\'express'
__author__ = 'calibre'
description = 'Actualités'
encoding = 'cp1252'
publisher = 'LExpress.fr'
category = 'Actualité, France, Monde'
language = 'fr'
use_embedded_content = False
timefmt = ' [%d %b %Y]'
max_articles_per_feed = 15
no_stylesheets = True
remove_empty_feeds = True
filterDuplicates = True
extra_css = '''
h1 {font-size:xx-large; font-family:Arial,Helvetica,sans-serif;}
.current_parent, p.heure, .ouverture {font-size:xx-small; color:#4D4D4D; font-family:Arial,Helvetica,sans-serif;}
#contenu-article {font-size:medium; font-family:Arial,Helvetica,sans-serif;}
.entete { font-weight:bold;}
'''
remove_tags = [
dict(name='iframe'),
dict(name='div', attrs={'class':['barre-outil-fb']}),
dict(name='div', attrs={'class':['barre-outils']}),
dict(id='bloc-sommaire'),
dict(id='footer-article')
]
keep_only_tags = [dict(name='div', attrs={'class':['bloc-article']})]
remove_tags_after = dict(id='content-article')
feeds = [
(u'À la une', 'http://www.lexpress.fr/rss/alaune.xml'),
('International', 'http://www.lexpress.fr/rss/monde.xml'),
('Tech/Web', 'http://www.lexpress.fr/rss/high-tech.xml'),
(u'Sciences/Santé', 'http://www.lexpress.fr/rss/science-et-sante.xml'),
(u'Environnement', 'http://www.lexpress.fr/rss/environnement.xml'),
('Economie', 'http://www.lexpress.fr/rss/economie.xml'),
(u'Société', 'http://www.lexpress.fr/rss/societe.xml'),
('Politique', 'http://www.lexpress.fr/rss/politique.xml'),
(u'Médias', 'http://www.lexpress.fr/rss/medias.xml'),
('Culture', 'http://www.lexpress.fr/rss/culture.xml'),
('Sport', 'http://www.lexpress.fr/rss/sport.xml')
]
def preprocess_html(self, soup):
for item in soup.findAll(style=True):
del item['style']
return soup
def get_masthead_url(self):
masthead = 'http://static.lexpress.fr/imgstat/logo_lexpress.gif'
br = BasicNewsRecipe.get_browser()
try:
br.open(masthead)
except:
self.log("\nCover unavailable")
masthead = None
return masthead

View File

@ -9,39 +9,72 @@ liberation.fr
from calibre.web.feeds.news import BasicNewsRecipe
class Liberation(BasicNewsRecipe):
title = u'Liberation'
__author__ = 'Darko Miletic'
description = 'News from France'
language = 'fr'
__author__ = 'calibre'
description = 'Actualités'
category = 'Actualités, France, Monde'
language = 'fr'
oldest_article = 7
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
use_embedded_content = False
timefmt = ' [%d %b %Y]'
max_articles_per_feed = 15
no_stylesheets = True
remove_empty_feeds = True
filterDuplicates = True
html2lrf_options = ['--base-font-size', '10']
extra_css = '''
h1, h2, h3 {font-size:xx-large; font-family:Arial,Helvetica,sans-serif;}
p.subtitle {font-size:xx-small; font-family:Arial,Helvetica,sans-serif;}
h4, h5, h2.rubrique {font-size:xx-small; color:#4D4D4D; font-family:Arial,Helvetica,sans-serif;}
.ref, .date, .author, .legende {font-size:xx-small; color:#4D4D4D; font-family:Arial,Helvetica,sans-serif;}
.mna-body, entry-body {font-size:medium; font-family:Arial,Helvetica,sans-serif;}
'''
keep_only_tags = [
dict(name='h1')
#,dict(name='div', attrs={'class':'object-content text text-item'})
,dict(name='div', attrs={'class':'article'})
#,dict(name='div', attrs={'class':'articleContent'})
,dict(name='div', attrs={'class':'entry'})
]
remove_tags_after = [ dict(name='div',attrs={'class':'toolbox extra_toolbox'}) ]
dict(name='div', attrs={'class':'article'})
,dict(name='div', attrs={'class':'text-article m-bot-s1'})
,dict(name='div', attrs={'class':'entry'})
,dict(name='div', attrs={'class':'col_contenu'})
]
remove_tags_after = [
dict(name='div',attrs={'class':['object-content text text-item', 'object-content', 'entry-content', 'col01', 'bloc_article_01']})
,dict(name='p',attrs={'class':['chapo']})
,dict(id='_twitter_facebook')
]
remove_tags = [
dict(name='p', attrs={'class':'clear'})
,dict(name='ul', attrs={'class':'floatLeft clear'})
,dict(name='div', attrs={'class':'clear floatRight'})
,dict(name='object')
,dict(name='div', attrs={'class':'toolbox'})
,dict(name='div', attrs={'class':'cartridge cartridge-basic-bubble cat-zoneabo'})
#,dict(name='div', attrs={'class':'clear block block-call-items'})
,dict(name='div', attrs={'class':'block-content'})
dict(name='iframe')
,dict(name='a', attrs={'class':'lnk-comments'})
,dict(name='div', attrs={'class':'toolbox'})
,dict(name='ul', attrs={'class':'share-box'})
,dict(name='ul', attrs={'class':'tool-box'})
,dict(name='ul', attrs={'class':'rub'})
,dict(name='p',attrs={'class':['chapo']})
,dict(name='p',attrs={'class':['tag']})
,dict(name='div',attrs={'class':['blokLies']})
,dict(name='div',attrs={'class':['alire']})
,dict(id='_twitter_facebook')
]
feeds = [
(u'La une', u'http://www.liberation.fr/rss/laune')
,(u'Monde' , u'http://www.liberation.fr/rss/monde')
,(u'Sports', u'http://www.liberation.fr/rss/sports')
(u'La une', u'http://rss.liberation.fr/rss/9/')
,(u'Monde' , u'http://www.liberation.fr/rss/10/')
,(u'Économie', u'http://www.liberation.fr/rss/13/')
,(u'Politiques', u'http://www.liberation.fr/rss/11/')
,(u'Société', u'http://www.liberation.fr/rss/12/')
,(u'Cinéma', u'http://www.liberation.fr/rss/58/')
,(u'Écran', u'http://www.liberation.fr/rss/53/')
,(u'Sports', u'http://www.liberation.fr/rss/12/')
]
def get_masthead_url(self):
masthead = 'http://s0.libe.com/libe/img/common/logo-liberation-150.png'
br = BasicNewsRecipe.get_browser()
try:
br.open(masthead)
except:
self.log("\nCover unavailable")
masthead = None
return masthead

View File

@ -22,7 +22,7 @@ class LosTiempos_Bol(BasicNewsRecipe):
publication_type = 'newspaper'
delay = 1
remove_empty_feeds = True
cover_url = strftime('http://www.lostiempos.com/media_recortes/%Y/%m/%d/portada_md_1.jpg')
cover_url = strftime('http://www.lostiempos.com/media_recortes/%Y/%m/%d/portada_gd_1.jpg')
masthead_url = 'http://www.lostiempos.com/img_stat/logo_tiempos_sin_beta.jpg'
extra_css = """ body{font-family: Arial,Helvetica,sans-serif }
img{margin-bottom: 0.4em}

View File

@ -0,0 +1,27 @@
from calibre.web.feeds.news import BasicNewsRecipe
class MercoPress(BasicNewsRecipe):
title = u'Merco Press'
description = u"Read News, Stories and Insight Analysis from Latin America and Mercosur. Politics, Economy, Business and Investments in South America."
cover_url = 'http://en.mercopress.com/web/img/en/mercopress-logo.gif'
__author__ = 'Russell Phillips'
language = 'en'
oldest_article = 7
max_articles_per_feed = 100
auto_cleanup = True
extra_css = 'img{padding-bottom:1ex; display:block; text-align: center;}'
remove_tags = [dict(name='a')]
feeds = [('Antarctica', 'http://en.mercopress.com/rss/antarctica'),
('Argentina', 'http://en.mercopress.com/rss/argentina'),
('Brazil', 'http://en.mercopress.com/rss/brazil'),
('Falkland Islands', 'http://en.mercopress.com/rss/falkland-islands'),
('International News', 'http://en.mercopress.com/rss/international'),
('Latin America', 'http://en.mercopress.com/rss/latin-america'),
('Mercosur', 'http://en.mercopress.com/rss/mercosur'),
('Paraguay', 'http://en.mercopress.com/rss/paraguay'),
('United States', 'http://en.mercopress.com/rss/united-states'),
('Uruguay', 'http://en.mercopress.com/rss/uruguay')]

View File

@ -4,24 +4,25 @@ __copyright__ = '2010-2011, Eddie Lau'
# Region - Hong Kong, Vancouver, Toronto
__Region__ = 'Hong Kong'
# Users of Kindle 3 with limited system-level CJK support
# please replace the following "True" with "False".
# please replace the following "True" with "False". (Default: True)
__MakePeriodical__ = True
# Turn below to True if your device supports display of CJK titles
# Turn below to True if your device supports display of CJK titles (Default: False)
__UseChineseTitle__ = False
# Set it to False if you want to skip images
# Set it to False if you want to skip images (Default: True)
__KeepImages__ = True
# (HK only) Turn below to True if you wish to use life.mingpao.com as the main article source
# (HK only) Turn below to True if you wish to use life.mingpao.com as the main article source (Default: True)
__UseLife__ = True
# (HK only) It is to disable the column section which is now a premium content
__InclCols__ = False
# (HK only) Turn below to True if you wish to parse articles in news.mingpao.com with their printer-friendly formats
__ParsePFF__ = False
# (HK only) Turn below to True if you wish hi-res images
# (HK only) It is to disable premium content (Default: False)
__InclPremium__ = False
# (HK only) Turn below to True if you wish to parse articles in news.mingpao.com with their printer-friendly formats (Default: True)
__ParsePFF__ = True
# (HK only) Turn below to True if you wish hi-res images (Default: False)
__HiResImg__ = False
'''
Change Log:
2011/10/17: disable fetching of premium content, also improved txt source parsing
2011/10/04: option to get hi-res photos for the articles
2011/09/21: fetching "column" section is made optional.
2011/09/18: parse "column" section stuff from source text file directly.
@ -72,7 +73,7 @@ class MPRecipe(BasicNewsRecipe):
dict(attrs={'class':['content']}), # for content from txt
dict(attrs={'class':['photo']}),
dict(name='table', attrs={'width':['100%'], 'border':['0'], 'cellspacing':['5'], 'cellpadding':['0']}), # content in printed version of life.mingpao.com
dict(name='img', attrs={'width':['180'], 'alt':['按圖放大']}), # images for source from life.mingpao.com
dict(name='img', attrs={'width':['180'], 'alt':['????']}), # images for source from life.mingpao.com
dict(attrs={'class':['images']}) # for images from txt
]
if __KeepImages__:
@ -208,11 +209,14 @@ class MPRecipe(BasicNewsRecipe):
(u'\u9ad4\u80b2 Sport', 'http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr + '&Category=nalsp', 'nal'),
(u'\u5f71\u8996 Film/TV', 'http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr + '&Category=nalma', 'nal')
]:
articles = self.parse_section2(url, keystr)
if __InclPremium__ == True:
articles = self.parse_section2_txt(url, keystr)
else:
articles = self.parse_section2(url, keystr)
if articles:
feeds.append((title, articles))
if __InclCols__ == True:
if __InclPremium__ == True:
# parse column section articles directly from .txt files
for title, url, keystr in [(u'\u5c08\u6b04 Columns', 'http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr +'&Category=ncolumn', 'ncl')
]:
@ -253,7 +257,7 @@ class MPRecipe(BasicNewsRecipe):
# feeds.append((u'\u7d93\u6fdf Finance', fin_articles))
for title, url, keystr in [(u'\u7d93\u6fdf Finance', 'http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr + '&Category=nalea', 'nal')]:
articles = self.parse_section2(url, keystr)
articles = self.parse_section2_txt(url, keystr)
if articles:
feeds.append((title, articles))
@ -270,11 +274,11 @@ class MPRecipe(BasicNewsRecipe):
for title, url, keystr in [(u'\u5f71\u8996 Film/TV', 'http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr + '&Category=nalma', 'nal')
]:
articles = self.parse_section2(url, keystr)
articles = self.parse_section2_txt(url, keystr)
if articles:
feeds.append((title, articles))
if __InclCols__ == True:
if __InclPremium__ == True:
# parse column section articles directly from .txt files
for title, url, keystr in [(u'\u5c08\u6b04 Columns', 'http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr +'&Category=ncolumn', 'ncl')
]:
@ -333,7 +337,7 @@ class MPRecipe(BasicNewsRecipe):
url = 'http://news.mingpao.com/' + dateStr + '/' +url
# replace the url to the print-friendly version
if __ParsePFF__ == True:
if url.rfind('Redirect') <> -1:
if url.rfind('Redirect') <> -1 and __InclPremium__ == True:
url = re.sub(dateStr + '.*' + dateStr, dateStr, url)
url = re.sub('%2F.*%2F', '/', url)
title = title.replace(u'\u6536\u8cbb\u5167\u5bb9', '')
@ -349,6 +353,8 @@ class MPRecipe(BasicNewsRecipe):
# parse from life.mingpao.com
def parse_section2(self, url, keystr):
br = mechanize.Browser()
br.set_handle_redirect(False)
self.get_fetchdate()
soup = self.index_to_soup(url)
a = soup.findAll('a', href=True)
@ -359,9 +365,13 @@ class MPRecipe(BasicNewsRecipe):
title = self.tag_to_string(i)
url = 'http://life.mingpao.com/cfm/' + i.get('href', False)
if (url not in included_urls) and (not url.rfind('.txt') == -1) and (not url.rfind(keystr) == -1):
url = url.replace('dailynews3.cfm', 'dailynews3a.cfm') # use printed version of the article
current_articles.append({'title': title, 'url': url, 'description': ''})
included_urls.append(url)
try:
br.open_novisit(url)
url = url.replace('dailynews3.cfm', 'dailynews3a.cfm') # use printed version of the article
current_articles.append({'title': title, 'url': url, 'description': ''})
included_urls.append(url)
except:
print 'skipping a premium article'
current_articles.reverse()
return current_articles
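
A hedged reading of the probe in parse_section2 above: with redirect handling disabled, mechanize raises on the 30x response that premium articles return, so the except branch skips them. is_free below is a hypothetical distillation, not part of the recipe:

import mechanize

def is_free(url):
    br = mechanize.Browser()
    br.set_handle_redirect(False)   # a redirect now raises instead of following
    try:
        br.open_novisit(url)
        return True
    except:
        return False                # premium article, skip it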
@ -553,6 +563,7 @@ class MPRecipe(BasicNewsRecipe):
# .txt based file
splitter = re.compile(r'\n') # Match non-digits
new_raw_html = '<html><head><title>Untitled</title></head><body><div class="images">'
next_is_mov_link = False
next_is_img_txt = False
title_started = False
met_article_start_char = False
@ -561,22 +572,33 @@ class MPRecipe(BasicNewsRecipe):
met_article_start_char = True
new_raw_html = new_raw_html + '</div><div class="content"><p>' + item + '<p>\n'
else:
if next_is_img_txt == False:
if item.startswith('='):
if next_is_img_txt == False and next_is_mov_link == False:
item = item.strip()
if item.startswith("=@"):
next_is_mov_link = True
elif item.startswith("=?"):
next_is_img_txt = True
new_raw_html += '<img src="' + str(item)[2:].strip() + '.gif" /><p>\n'
elif item.startswith('='):
next_is_img_txt = True
new_raw_html += '<img src="' + str(item)[1:].strip() + '.jpg" /><p>\n'
else:
if met_article_start_char == False:
if title_started == False:
new_raw_html = new_raw_html + '</div><div class="heading">' + item + '\n'
title_started = True
if item <> '':
if next_is_img_txt == False and met_article_start_char == False:
if title_started == False:
#print 'Title started at ', item
new_raw_html = new_raw_html + '</div><div class="heading">' + item + '\n'
title_started = True
else:
new_raw_html = new_raw_html + item + '\n'
else:
new_raw_html = new_raw_html + item + '\n'
else:
new_raw_html = new_raw_html + item + '<p>\n'
new_raw_html = new_raw_html + item + '<p>\n'
else:
next_is_img_txt = False
new_raw_html = new_raw_html + item + '\n'
if next_is_mov_link == True:
next_is_mov_link = False
else:
next_is_img_txt = False
new_raw_html = new_raw_html + item + '\n'
return new_raw_html + '</div></body></html>'
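
The text-source parser above implements a small line-prefix protocol ('=@' marks a movie link, '=?' a .gif image, '=' a .jpg image, anything else is article text); classify below is a hypothetical distillation of just the prefix rules:

def classify(line):
    line = line.strip()
    if line.startswith('=@'):
        return 'movie link (dropped)'
    if line.startswith('=?'):
        return '<img src="%s.gif" /><p>' % line[2:]
    if line.startswith('='):
        return '<img src="%s.jpg" /><p>' % line[1:]
    return 'article text'

for l in ['=@v0123', '=?photo1', '=photo2', 'caption text']:
    print classify(l)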
def preprocess_html(self, soup):
@ -678,7 +700,7 @@ class MPRecipe(BasicNewsRecipe):
if po is None:
self.play_order_counter += 1
po = self.play_order_counter
parent.add_item('%sindex.html'%adir, None, a.title if a.title else ('Untitled Article'),
parent.add_item('%sindex.html'%adir, None, a.title if a.title else _('Untitled Article'),
play_order=po, author=auth, description=desc)
last = os.path.join(self.output_dir, ('%sindex.html'%adir).replace('/', os.sep))
for sp in a.sub_pages:

recipes/omgubuntu.recipe Normal file
View File

@ -0,0 +1,18 @@
# -*- coding: utf-8 -*-
class BasicUserRecipe1318619832(AutomaticNewsRecipe):
title = u'OmgUbuntu'
oldest_article = 7
max_articles_per_feed = 100
auto_cleanup = True
feeds = [(u'Omg Ubuntu', u'http://feeds.feedburner.com/d0od')]
def get_masthead_url(self):
masthead = 'http://cdn.omgubuntu.co.uk/wp-content/themes/omgubuntu/images/logo.png'
br = BasicNewsRecipe.get_browser()
try:
br.open(masthead)
except:
self.log("\nCover unavailable")
masthead = None
return masthead

View File

@ -0,0 +1,17 @@
from calibre.web.feeds.news import BasicNewsRecipe
class MercoPress(BasicNewsRecipe):
title = u'Penguin News'
description = u"Penguin News: the Falkland Islands' only newspaper."
cover_url = 'http://www.penguin-news.com/templates/rt_syndicate_j15/images/logo/light/logo1.png'
language = 'en'
__author__ = 'Russell Phillips'
oldest_article = 7
max_articles_per_feed = 100
auto_cleanup = True
extra_css = 'img{padding-bottom:1ex; display:block; text-align: center;}'
feeds = [(u'Penguin News - Falkland Islands', u'http://www.penguin-news.com/index.php?format=feed&type=rss')]

recipes/phoronix.recipe Normal file
View File

@ -0,0 +1,47 @@
# -*- coding: utf-8 -*-
__license__ = 'GPL v3'
__copyright__ = '2011 Aurélien Chabot <contact@aurelienchabot.fr>'
'''
Fetch phoronix.com
'''
from calibre.web.feeds.news import BasicNewsRecipe
class Phoronix(BasicNewsRecipe):
title = 'Phoronix'
__author__ = 'calibre'
description = 'Actualités Phoronix'
encoding = 'utf-8'
publisher = 'Phoronix.com'
category = 'news, IT, linux'
language = 'en'
use_embedded_content = False
timefmt = ' [%d %b %Y]'
max_articles_per_feed = 25
no_stylesheets = True
remove_empty_feeds = True
filterDuplicates = True
extra_css = '''
h1 {font-size:xx-large; font-family:Arial,Helvetica,sans-serif;}
h2 {font-size:xx-small; color:#4D4D4D; font-family:Arial,Helvetica,sans-serif;}
.KonaBody {font-size:medium; font-family:Arial,Helvetica,sans-serif;}
'''
remove_tags = []
remove_tags_before = dict(id='phxcms_content_phx')
remove_tags_after = dict(name='div', attrs={'class':'KonaBody'})
feeds = [('Phoronix', 'http://feeds.feedburner.com/Phoronix')]
def preprocess_html(self, soup):
for item in soup.findAll(style=True):
del item['style']
return soup

View File

@ -10,7 +10,7 @@ from calibre.web.feeds.news import BasicNewsRecipe
class Sueddeutsche(BasicNewsRecipe):
title = u'Süddeutsche'
title = u'sueddeutsche.de'
description = 'News from Germany'
__author__ = 'Oliver Niesner and Armin Geller'
use_embedded_content = False
@ -62,7 +62,7 @@ class Sueddeutsche(BasicNewsRecipe):
(u'Sport', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5ESport%24?output=rss'),
(u'Leben', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5ELeben%24?output=rss'),
(u'Karriere', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EKarriere%24?output=rss'),
(u'München&Region', u'http://www.sueddeutsche.de/app/service/rss/ressort/muenchen/rss.xml'),
(u'M&uuml;nchen & Region', u'http://www.sueddeutsche.de/app/service/rss/ressort/muenchen/rss.xml'),
(u'Bayern', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EBayern%24?output=rss'),
(u'Medien', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EMedien%24?output=rss'),
(u'Digital', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EDigital%24?output=rss'),
@ -75,7 +75,7 @@ class Sueddeutsche(BasicNewsRecipe):
(u'Job', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EJob%24?output=rss'), # sometimes only
(u'Service', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EService%24?output=rss'), # sometimes only
(u'Verlag', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EVerlag%24?output=rss'), # sometimes only
]
]
def print_version(self, url):
main, sep, id = url.rpartition('/')

View File

@ -3,7 +3,7 @@
from calibre.web.feeds.news import BasicNewsRecipe
class TelepolisNews(BasicNewsRecipe):
title = u'Telepolis (News+Artikel)'
title = u'Telepolis'
__author__ = 'syntaxis'
publisher = 'Heise Zeitschriften Verlag GmbH & Co KG'
description = 'News from Telepolis'
@ -15,11 +15,8 @@ class TelepolisNews(BasicNewsRecipe):
encoding = "utf-8"
language = 'de'
remove_empty_feeds = True
keep_only_tags = [dict(name = 'div',attrs={'class':'head'}),dict(name = 'div',attrs={'class':'leftbox'}),dict(name='td',attrs={'class':'strict'})]
remove_tags = [ dict(name='td',attrs={'class':'blogbottom'}),
dict(name='div',attrs={'class':'forum'}), dict(name='div',attrs={'class':'social'}),dict(name='div',attrs={'class':'blog-letter p-news'}),
@ -28,7 +25,6 @@ class TelepolisNews(BasicNewsRecipe):
remove_tags_after = [dict(name='span', attrs={'class':['breadcrumb']})]
feeds = [(u'News', u'http://www.heise.de/tp/news-atom.xml')]
html2lrf_options = [
@ -39,7 +35,6 @@ class TelepolisNews(BasicNewsRecipe):
html2epub_options = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"'
def preprocess_html(self, soup):
mtag = '<meta http-equiv="Content-Type" content="text/html; charset=' + self.encoding + '">'
soup.head.insert(0,mtag)

View File

@ -10,27 +10,28 @@ from calibre.web.feeds.news import BasicNewsRecipe
class USAToday(BasicNewsRecipe):
title = 'USA Today'
__author__ = 'Kovid Goyal'
oldest_article = 1
publication_type = 'newspaper'
timefmt = ''
max_articles_per_feed = 20
language = 'en'
no_stylesheets = True
extra_css = '.headline {text-align: left;}\n \
.byline {font-family: monospace; \
text-align: left; \
margin-bottom: 1em;}\n \
.image {text-align: center;}\n \
.caption {text-align: center; \
font-size: smaller; \
font-style: italic}\n \
.credit {text-align: right; \
margin-bottom: 0em; \
font-size: smaller;}\n \
.articleBody {text-align: left;}\n '
#simultaneous_downloads = 1
title = 'USA Today'
__author__ = 'calibre'
description = 'newspaper'
encoding = 'utf-8'
publisher = 'usatoday.com'
category = 'news, usa'
language = 'en'
use_embedded_content = False
timefmt = ' [%d %b %Y]'
max_articles_per_feed = 15
no_stylesheets = True
remove_empty_feeds = True
filterDuplicates = True
extra_css = '''
h1, h2 {font-size:xx-large; font-family:Arial,Helvetica,sans-serif;}
#post-attributes, .info, .clear {font-size:xx-small; color:#4D4D4D; font-family:Arial,Helvetica,sans-serif;}
#post-body, #content {font-size:medium; font-family:Arial,Helvetica,sans-serif;}
'''
feeds = [
('Top Headlines', 'http://rssfeeds.usatoday.com/usatoday-NewsTopStories'),
('Tech Headlines', 'http://rssfeeds.usatoday.com/usatoday-TechTopStories'),
@ -43,15 +44,18 @@ class USAToday(BasicNewsRecipe):
('Sport Headlines', 'http://rssfeeds.usatoday.com/UsatodaycomSports-TopStories'),
('Weather Headlines', 'http://rssfeeds.usatoday.com/usatoday-WeatherTopStories'),
('Most Popular', 'http://rssfeeds.usatoday.com/Usatoday-MostViewedArticles'),
('Offbeat News', 'http://rssfeeds.usatoday.com/UsatodaycomOffbeat-TopStories'),
('Offbeat News', 'http://rssfeeds.usatoday.com/UsatodaycomOffbeat-TopStories')
]
keep_only_tags = [dict(attrs={'class':'story'})]
remove_tags = [
dict(attrs={'class':[
'share',
'reprints',
'inline-h3',
'info-extras',
'info-extras rounded',
'inset',
'ppy-outer',
'ppy-caption',
'comments',
@ -61,9 +65,13 @@ class USAToday(BasicNewsRecipe):
'tags',
'bottom-tools',
'sponsoredlinks',
'corrections'
]}),
dict(name='ul', attrs={'class':'inside-copy'}),
dict(id=['pluck']),
]
dict(id=['updated']),
dict(id=['post-date-updated'])
]
def get_masthead_url(self):

17
recipes/wow.recipe Normal file
View File

@ -0,0 +1,17 @@
from calibre.web.feeds.news import BasicNewsRecipe
class WoW(BasicNewsRecipe):
title = u'WoW Insider'
language = 'en'
__author__ = 'Krittika Goyal'
oldest_article = 1 #days
max_articles_per_feed = 25
use_embedded_content = False
no_stylesheets = True
auto_cleanup = True
feeds = [
('WoW',
'http://wow.joystiq.com/rss.xml')
]

68
recipes/zdnet.fr.recipe Normal file
View File

@ -0,0 +1,68 @@
# -*- coding: utf-8 -*-
__license__ = 'GPL v3'
__copyright__ = '2011 Aurélien Chabot <contact@aurelienchabot.fr>'
'''
Fetch zdnet.fr
'''
from calibre.web.feeds.news import BasicNewsRecipe
class zdnet(BasicNewsRecipe):
title = 'ZDNet.fr'
__author__ = 'calibre'
description = 'Actualités'
encoding = 'utf-8'
publisher = 'ZDNet.fr'
category = 'Actualité, Informatique, IT'
language = 'fr'
use_embedded_content = False
timefmt = ' [%d %b %Y]'
max_articles_per_feed = 15
no_stylesheets = True
remove_empty_feeds = True
filterDuplicates = True
extra_css = '''
h1 {font-size:xx-large; font-family:Arial,Helvetica,sans-serif;}
.contentmetadata p {font-size:xx-small; color:#4D4D4D; font-family:Arial,Helvetica,sans-serif;}
#content {font-size:medium; font-family:Arial,Helvetica,sans-serif;}
'''
remove_tags = [
dict(name='iframe'),
dict(name='div', attrs={'class':['toolbox']}),
dict(name='div', attrs={'class':['clear clearfix']}),
dict(id='emailtoafriend'),
dict(id='storyaudio'),
dict(id='fbtwContainer'),
dict(name='h5')
]
remove_tags_before = dict(id='leftcol')
remove_tags_after = dict(id='content')
feeds = [
('Informatique', 'http://www.zdnet.fr/feeds/rss/actualites/informatique/'),
('Internet', 'http://www.zdnet.fr/feeds/rss/actualites/internet/'),
('Telecom', 'http://www.zdnet.fr/feeds/rss/actualites/telecoms/')
]
def preprocess_html(self, soup):
for item in soup.findAll(style=True):
del item['style']
return soup
def get_masthead_url(self):
masthead = 'http://www.zdnet.fr/images/base/logo.png'
br = BasicNewsRecipe.get_browser()
try:
br.open(masthead)
except:
self.log("\nCover unavailable")
masthead = None
return masthead

Binary file not shown.

After  |  Size: 3.0 KiB

View File

@ -1,7 +1,7 @@
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:html="http://www.w3.org/1999/xhtml"
xmlns="http://www.w3.org/1999/xhtml"
xmlns:rtf="http://rtf2xml.sourceforge.net/"
xmlns:c="calibre"
extension-element-prefixes="c"
@ -63,11 +63,16 @@
</xsl:template>
<xsl:template name = "para">
<xsl:if test = "normalize-space(.) or child::*">
<xsl:element name = "p">
<xsl:call-template name = "para-content"/>
</xsl:element>
</xsl:if>
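<!-- The rewritten template below preserves empty paragraphs by emitting a
non-breaking space (&#160;) instead of dropping the <p> element entirely. -->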
<xsl:element name = "p">
<xsl:choose>
<xsl:when test = "normalize-space(.) or child::*">
<xsl:call-template name = "para-content"/>
</xsl:when>
<xsl:otherwise>
<xsl:text>&#160;</xsl:text>
</xsl:otherwise>
</xsl:choose>
</xsl:element>
</xsl:template>
<xsl:template name = "para_off">
@ -149,7 +154,7 @@
<xsl:template match="rtf:doc-information" mode="header">
<link rel="stylesheet" type="text/css" href="styles.css"/>
<xsl:if test="not(rtf:title)">
<title>unamed</title>
<title>unnamed</title>
</xsl:if>
<xsl:apply-templates/>
</xsl:template>
@ -445,7 +450,10 @@
<xsl:template match = "rtf:field[@type='hyperlink']">
<xsl:element name ="a">
<xsl:attribute name = "href"><xsl:if test="not(contains(@link, '/'))">#</xsl:if><xsl:value-of select = "@link"/></xsl:attribute>
<xsl:attribute name = "href">
<xsl:if test = "not(contains(@link, '/'))">#</xsl:if>
<xsl:value-of select = "@link"/>
</xsl:attribute>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>

View File

@ -225,7 +225,10 @@ except:
try:
HOST=get_ip_address('wlan0')
except:
HOST='192.168.1.2'
try:
HOST=get_ip_address('ppp0')
except:
HOST='192.168.1.2'
PROJECT=os.path.basename(os.path.abspath('.'))

View File

@ -336,7 +336,7 @@ class Build(Command):
oinc = ['/Fo'+obj] if iswindows else ['-o', obj]
cmd = [compiler] + cflags + ext.cflags + einc + sinc + oinc
self.info(' '.join(cmd))
subprocess.check_call(cmd)
self.check_call(cmd)
dest = self.dest(ext)
elib = self.lib_dirs_to_ldflags(ext.lib_dirs)
@ -350,18 +350,32 @@ class Build(Command):
else:
cmd += objects + ext.extra_objs + ['-o', dest] + ldflags + ext.ldflags + elib + xlib
self.info('\n\n', ' '.join(cmd), '\n\n')
subprocess.check_call(cmd)
self.check_call(cmd)
if iswindows:
#manifest = dest+'.manifest'
#cmd = [MT, '-manifest', manifest, '-outputresource:%s;2'%dest]
#self.info(*cmd)
#subprocess.check_call(cmd)
#self.check_call(cmd)
#os.remove(manifest)
for x in ('.exp', '.lib'):
x = os.path.splitext(dest)[0]+x
if os.path.exists(x):
os.remove(x)
def check_call(self, *args, **kwargs):
"""print cmdline if an error occured
If something is missing (qmake e.g.) you get a non-informative error
self.check_call(qmc + [ext.name+'.pro'])
so you would have to look a the source to see the actual command.
"""
try:
subprocess.check_call(*args, **kwargs)
except:
cmdline = ' '.join(['"%s"' % (arg) if ' ' in arg else arg for arg in args[0]])
print "Error while executing: %s\n" % (cmdline)
raise
def build_qt_objects(self, ext):
obj_pat = 'release\\*.obj' if iswindows else '*.o'
objects = glob.glob(obj_pat)
@ -380,8 +394,8 @@ class Build(Command):
qmc = [QMAKE, '-o', 'Makefile']
if iswindows:
qmc += ['-spec', 'win32-msvc2008']
subprocess.check_call(qmc + [ext.name+'.pro'])
subprocess.check_call([make, '-f', 'Makefile'])
self.check_call(qmc + [ext.name+'.pro'])
self.check_call([make, '-f', 'Makefile'])
objects = glob.glob(obj_pat)
return list(map(self.a, objects))
@ -407,7 +421,7 @@ class Build(Command):
cmd = [pyqt.sip_bin+exe, '-w', '-c', src_dir, '-b', sbf, '-I'+\
pyqt.pyqt_sip_dir] + shlex.split(pyqt.pyqt_sip_flags) + [sipf]
self.info(' '.join(cmd))
subprocess.check_call(cmd)
self.check_call(cmd)
module = self.j(src_dir, self.b(dest))
if self.newer(dest, [sbf]+qt_objects):
mf = self.j(src_dir, 'Makefile')
@ -417,7 +431,7 @@ class Build(Command):
makefile.extra_include_dirs = ext.inc_dirs
makefile.generate()
subprocess.check_call([make, '-f', mf], cwd=src_dir)
self.check_call([make, '-f', mf], cwd=src_dir)
shutil.copy2(module, dest)
def clean(self):
@ -457,7 +471,7 @@ class BuildPDF2XML(Command):
cmd += ['-I'+x for x in poppler_inc_dirs+magick_inc_dirs]
cmd += ['/Fo'+obj, src]
self.info(*cmd)
subprocess.check_call(cmd)
self.check_call(cmd)
objects.append(obj)
if self.newer(dest, objects):
@ -470,7 +484,7 @@ class BuildPDF2XML(Command):
png_libs+magick_libs+poppler_libs+ft_libs+jpg_libs+pdfreflow_libs]
cmd += ['/OUT:'+dest] + objects
self.info(*cmd)
subprocess.check_call(cmd)
self.check_call(cmd)
self.info('Binary installed as', dest)

View File

@ -20,17 +20,23 @@ for x in [
EXCLUDES.extend(['--exclude', x])
SAFE_EXCLUDES = ['"%s"'%x if '*' in x else x for x in EXCLUDES]
def get_rsync_pw():
return open('/home/kovid/work/kde/conf/buildbot').read().partition(
':')[-1].strip()
class Rsync(Command):
description = 'Sync source tree from development machine'
SYNC_CMD = ' '.join(BASE_RSYNC+SAFE_EXCLUDES+
['rsync://{host}/work/{project}', '..'])
['rsync://buildbot@{host}/work/{project}', '..'])
def run(self, opts):
cmd = self.SYNC_CMD.format(host=HOST, project=PROJECT)
env = dict(os.environ)
env['RSYNC_PASSWORD'] = get_rsync_pw()
self.info(cmd)
subprocess.check_call(cmd, shell=True)
subprocess.check_call(cmd, shell=True, env=env)
class Push(Command):
@ -81,7 +87,8 @@ class VMInstaller(Command):
def get_build_script(self):
ans = '\n'.join(self.BUILD_PREFIX)+'\n\n'
rs = ['export RSYNC_PASSWORD=%s'%get_rsync_pw()]
ans = '\n'.join(self.BUILD_PREFIX + rs)+'\n\n'
ans += ' && \\\n'.join(self.BUILD_RSYNC) + ' && \\\n'
ans += ' && \\\n'.join(self.BUILD_CLEAN) + ' && \\\n'
ans += ' && \\\n'.join(self.BUILD_BUILD) + ' && \\\n'

File diff suppressed because it is too large.

View File

@ -206,7 +206,7 @@ class Resources(Command):
function_dict = {}
import inspect
from calibre.utils.formatter_functions import formatter_functions
for obj in formatter_functions.get_builtins().values():
for obj in formatter_functions().get_builtins().values():
eval_func = inspect.getmembers(obj,
lambda x: inspect.ismethod(x) and x.__name__ == 'evaluate')
try:

View File

@ -4,7 +4,7 @@ __license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal kovid@kovidgoyal.net'
__docformat__ = 'restructuredtext en'
__appname__ = u'calibre'
numeric_version = (0, 8, 21)
numeric_version = (0, 8, 22)
__version__ = u'.'.join(map(unicode, numeric_version))
__author__ = u"Kovid Goyal <kovid@kovidgoyal.net>"

View File

@ -502,6 +502,7 @@ class TXTZMetadataWriter(MetadataWriterPlugin):
# }}}
from calibre.ebooks.comic.input import ComicInput
from calibre.ebooks.djvu.input import DJVUInput
from calibre.ebooks.epub.input import EPUBInput
from calibre.ebooks.fb2.input import FB2Input
from calibre.ebooks.html.input import HTMLInput
@ -555,7 +556,8 @@ from calibre.devices.irexdr.driver import IREXDR1000, IREXDR800
from calibre.devices.jetbook.driver import JETBOOK, MIBUK, JETBOOK_MINI
from calibre.devices.kindle.driver import KINDLE, KINDLE2, KINDLE_DX
from calibre.devices.nook.driver import NOOK, NOOK_COLOR
from calibre.devices.prs505.driver import PRS505, PRST1
from calibre.devices.prs505.driver import PRS505
from calibre.devices.prst1.driver import PRST1
from calibre.devices.user_defined.driver import USER_DEFINED
from calibre.devices.android.driver import ANDROID, S60, WEBOS
from calibre.devices.nokia.driver import N770, N810, E71X, E52
@ -599,6 +601,7 @@ plugins += [GoogleBooks, Amazon, OpenLibrary, ISBNDB, OverDrive, Douban, Ozon]
plugins += [
ComicInput,
DJVUInput,
EPUBInput,
FB2Input,
HTMLInput,

View File

@ -4,7 +4,6 @@ __license__ = 'GPL 3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
import sys
from itertools import izip
from xml.sax.saxutils import escape

View File

@ -49,6 +49,15 @@ class ANDROID(USBMS):
0x7086 : [0x0226], 0x70a8: [0x9999], 0x42c4 : [0x216],
0x70c6 : [0x226]
},
# Freescale
0x15a2 : {
0x0c01 : [0x226]
},
# Alcatel
0x05c6 : {
0x9018 : [0x0226],
},
# Sony Ericsson
0xfce : {
@ -62,7 +71,8 @@ class ANDROID(USBMS):
0x4e11 : [0x0100, 0x226, 0x227],
0x4e12 : [0x0100, 0x226, 0x227],
0x4e21 : [0x0100, 0x226, 0x227],
0xb058 : [0x0222, 0x226, 0x227]
0xb058 : [0x0222, 0x226, 0x227],
0x0ff9 : [0x0226],
},
# Samsung
@ -138,7 +148,8 @@ class ANDROID(USBMS):
VENDOR_NAME = ['HTC', 'MOTOROLA', 'GOOGLE_', 'ANDROID', 'ACER',
'GT-I5700', 'SAMSUNG', 'DELL', 'LINUX', 'GOOGLE', 'ARCHOS',
'TELECHIP', 'HUAWEI', 'T-MOBILE', 'SEMC', 'LGE', 'NVIDIA',
'GENERIC-', 'ZTE', 'MID', 'QUALCOMM', 'PANDIGIT', 'HYSTON', 'VIZIO']
'GENERIC-', 'ZTE', 'MID', 'QUALCOMM', 'PANDIGIT', 'HYSTON',
'VIZIO', 'GOOGLE', 'FREESCAL']
WINDOWS_MAIN_MEM = ['ANDROID_PHONE', 'A855', 'A853', 'INC.NEXUS_ONE',
'__UMS_COMPOSITE', '_MB200', 'MASS_STORAGE', '_-_CARD', 'SGH-I897',
'GT-I9000', 'FILE-STOR_GADGET', 'SGH-T959', 'SAMSUNG_ANDROID',
@ -149,7 +160,7 @@ class ANDROID(USBMS):
'MB860', 'MULTI-CARD', 'MID7015A', 'INCREDIBLE', 'A7EB', 'STREAK',
'MB525', 'ANDROID2.3', 'SGH-I997', 'GT-I5800_CARD', 'MB612',
'GT-S5830_CARD', 'GT-S5570_CARD', 'MB870', 'MID7015A',
'ALPANDIGITAL', 'ANDROID_MID', 'VTAB1008']
'ALPANDIGITAL', 'ANDROID_MID', 'VTAB1008', 'EMX51_BBG_ANDROI']
WINDOWS_CARD_A_MEM = ['ANDROID_PHONE', 'GT-I9000_CARD', 'SGH-I897',
'FILE-STOR_GADGET', 'SGH-T959', 'SAMSUNG_ANDROID', 'GT-P1000_CARD',
'A70S', 'A101IT', '7', 'INCREDIBLE', 'A7EB', 'SGH-T849_CARD',

View File

@ -116,6 +116,7 @@ class BOOX(HANLINV3):
supported_platforms = ['windows', 'osx', 'linux']
METADATA_CACHE = '.metadata.calibre'
DRIVEINFO = '.driveinfo.calibre'
icon = I('devices/boox.jpg')
# Ordered list of supported formats
FORMATS = ['epub', 'fb2', 'djvu', 'pdf', 'html', 'txt', 'rtf', 'mobi',
@ -123,7 +124,7 @@ class BOOX(HANLINV3):
VENDOR_ID = [0x0525]
PRODUCT_ID = [0xa4a5]
BCD = [0x322]
BCD = [0x322, 0x323]
MAIN_MEMORY_VOLUME_LABEL = 'BOOX Internal Memory'
STORAGE_CARD_VOLUME_LABEL = 'BOOX Storage Card'

View File

@ -62,7 +62,7 @@ class DevicePlugin(Plugin):
#: Icon for this device
icon = I('reader.png')
# Used by gui2.ui:annotations_fetched() and devices.kindle.driver:get_annotations()
# Encapsulates an annotation fetched from the device
UserAnnotation = namedtuple('Annotation','type, value')
#: GUI displays this as a message if not None. Useful if opening can take a
@ -217,7 +217,7 @@ class DevicePlugin(Plugin):
'''
Unix version of :meth:`can_handle_windows`
:param device_info: Is a tupe of (vid, pid, bcd, manufacturer, product,
:param device_info: Is a tuple of (vid, pid, bcd, manufacturer, product,
serial number)
'''
@ -464,6 +464,13 @@ class DevicePlugin(Plugin):
'''
pass
def prepare_addable_books(self, paths):
'''
Given a list of paths, returns another list of paths. These paths
point to addable versions of the books.
'''
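# Default is a no-op; drivers whose on-device files are not directly
# addable (e.g. the Kobo's extensionless kepubs, handled later in this
# commit) override this to return temporary copies calibre can read.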
return paths
class BookList(list):
'''
A list of books. Each Book object must have the fields

View File

@ -13,6 +13,8 @@ import datetime, os, re, sys, json, hashlib
from calibre.devices.kindle.apnx import APNXBuilder
from calibre.devices.kindle.bookmark import Bookmark
from calibre.devices.usbms.driver import USBMS
from calibre.ebooks.metadata import MetaInformation
from calibre import strftime
'''
Notes on collections:
@ -164,6 +166,121 @@ class KINDLE(USBMS):
# This returns as job.result in gui2.ui.annotations_fetched(self,job)
return bookmarked_books
def generate_annotation_html(self, bookmark):
from calibre.ebooks.BeautifulSoup import BeautifulSoup, Tag, NavigableString
# Returns <div class="user_annotations"> ... </div>
last_read_location = bookmark.last_read_location
timestamp = datetime.datetime.utcfromtimestamp(bookmark.timestamp)
percent_read = bookmark.percent_read
ka_soup = BeautifulSoup()
dtc = 0
divTag = Tag(ka_soup,'div')
divTag['class'] = 'user_annotations'
# Add the last-read location
spanTag = Tag(ka_soup, 'span')
spanTag['style'] = 'font-weight:bold'
if bookmark.book_format == 'pdf':
spanTag.insert(0,NavigableString(
_("%(time)s<br />Last Page Read: %(loc)d (%(pr)d%%)") % \
dict(time=strftime(u'%x', timestamp.timetuple()),
loc=last_read_location,
pr=percent_read)))
else:
spanTag.insert(0,NavigableString(
_("%(time)s<br />Last Page Read: Location %(loc)d (%(pr)d%%)") % \
dict(time=strftime(u'%x', timestamp.timetuple()),
loc=last_read_location,
pr=percent_read)))
divTag.insert(dtc, spanTag)
dtc += 1
divTag.insert(dtc, Tag(ka_soup,'br'))
dtc += 1
if bookmark.user_notes:
user_notes = bookmark.user_notes
annotations = []
# Add the annotations sorted by location
# Italicize highlighted text
for location in sorted(user_notes):
if user_notes[location]['text']:
annotations.append(
_('<b>Location %(dl)d &bull; %(typ)s</b><br />%(text)s<br />') % \
dict(dl=user_notes[location]['displayed_location'],
typ=user_notes[location]['type'],
text=(user_notes[location]['text'] if \
user_notes[location]['type'] == 'Note' else \
'<i>%s</i>' % user_notes[location]['text'])))
else:
if bookmark.book_format == 'pdf':
annotations.append(
_('<b>Page %(dl)d &bull; %(typ)s</b><br />') % \
dict(dl=user_notes[location]['displayed_location'],
typ=user_notes[location]['type']))
else:
annotations.append(
_('<b>Location %(dl)d &bull; %(typ)s</b><br />') % \
dict(dl=user_notes[location]['displayed_location'],
typ=user_notes[location]['type']))
for annotation in annotations:
divTag.insert(dtc, annotation)
dtc += 1
ka_soup.insert(0,divTag)
return ka_soup
def add_annotation_to_library(self, db, db_id, annotation):
from calibre.ebooks.BeautifulSoup import Tag
bm = annotation
ignore_tags = set(['Catalog', 'Clippings'])
if bm.type == 'kindle_bookmark':
mi = db.get_metadata(db_id, index_is_id=True)
user_notes_soup = self.generate_annotation_html(bm.value)
if mi.comments:
a_offset = mi.comments.find('<div class="user_annotations">')
ad_offset = mi.comments.find('<hr class="annotations_divider" />')
if a_offset >= 0:
mi.comments = mi.comments[:a_offset]
if ad_offset >= 0:
mi.comments = mi.comments[:ad_offset]
if set(mi.tags).intersection(ignore_tags):
return
if mi.comments:
hrTag = Tag(user_notes_soup,'hr')
hrTag['class'] = 'annotations_divider'
user_notes_soup.insert(0, hrTag)
mi.comments += unicode(user_notes_soup.prettify())
else:
mi.comments = unicode(user_notes_soup.prettify())
# Update library comments
db.set_comment(db_id, mi.comments)
# Add bookmark file to db_id
db.add_format_with_hooks(db_id, bm.value.bookmark_extension,
bm.value.path, index_is_id=True)
elif bm.type == 'kindle_clippings':
# Find 'My Clippings' author=Kindle in database, or add
last_update = 'Last modified %s' % strftime(u'%x %X',bm.value['timestamp'].timetuple())
mc_id = list(db.data.search_getting_ids('title:"My Clippings"', ''))
if mc_id:
db.add_format_with_hooks(mc_id[0], 'TXT', bm.value['path'],
index_is_id=True)
mi = db.get_metadata(mc_id[0], index_is_id=True)
mi.comments = last_update
db.set_metadata(mc_id[0], mi)
else:
mi = MetaInformation('My Clippings', authors = ['Kindle'])
mi.tags = ['Clippings']
mi.comments = last_update
db.add_books([bm.value['path']], ['txt'], [mi])
class KINDLE2(KINDLE):

View File

@ -16,6 +16,7 @@ from calibre.devices.usbms.driver import USBMS, debug_print
from calibre import prints
from calibre.devices.usbms.books import CollectionsBookList
from calibre.utils.magick.draw import save_cover_data_to
from calibre.ptempfile import PersistentTemporaryFile
class KOBO(USBMS):
@ -76,6 +77,11 @@ class KOBO(USBMS):
self.book_class = Book
self.dbversion = 7
def create_annotations_path(self, mdata, device_path=None):
if device_path:
return device_path
return USBMS.create_annotations_path(self, mdata)
def books(self, oncard=None, end_session=True):
from calibre.ebooks.metadata.meta import path_to_ext
@ -750,9 +756,12 @@ class KOBO(USBMS):
blists = {}
for i in paths:
if booklists[i] is not None:
#debug_print('Booklist: ', i)
blists[i] = booklists[i]
try:
if booklists[i] is not None:
#debug_print('Booklist: ', i)
blists[i] = booklists[i]
except IndexError:
pass
opts = self.settings()
if opts.extra_customization:
collections = [x.lower().strip() for x in
@ -865,3 +874,21 @@ class KOBO(USBMS):
else:
debug_print("ImageID could not be retreived from the database")
def prepare_addable_books(self, paths):
'''
The Kobo supports an encrypted epub referred to as a kepub.
Unfortunately Kobo decided to put the files on the device
with no file extension. I just hope that decision causes
them as much grief as it does me :-)
This has to make a temporary copy of the book files with an
epub extension to allow calibre's normal processing to
deal with the file appropriately.
'''
for idx, path in enumerate(paths):
if path.find('kepub') >= 0:
with closing(open(path)) as r:
tf = PersistentTemporaryFile(suffix='.epub')
tf.write(r.read())
paths[idx] = tf.name
return paths

View File

@ -84,7 +84,7 @@ class PDNOVEL(USBMS):
FORMATS = ['epub', 'pdf']
VENDOR_ID = [0x18d1]
PRODUCT_ID = [0xb004]
PRODUCT_ID = [0xb004, 0xa004]
BCD = [0x224]
VENDOR_NAME = 'ANDROID'

View File

@ -207,8 +207,11 @@ class PRS505(USBMS):
c = self.initialize_XML_cache()
blists = {}
for i in c.paths:
if booklists[i] is not None:
blists[i] = booklists[i]
try:
if booklists[i] is not None:
blists[i] = booklists[i]
except IndexError:
pass
opts = self.settings()
if opts.extra_customization:
collections = [x.strip() for x in
@ -299,40 +302,3 @@ class PRS505(USBMS):
f.write(metadata.thumbnail[-1])
debug_print('Cover uploaded to: %r'%cpath)
class PRST1(USBMS):
name = 'SONY PRST1 and newer Device Interface'
gui_name = 'SONY Reader'
description = _('Communicate with Sony PRST1 and newer eBook readers')
author = 'Kovid Goyal'
supported_platforms = ['windows', 'osx', 'linux']
FORMATS = ['epub', 'lrf', 'lrx', 'rtf', 'pdf', 'txt']
VENDOR_ID = [0x054c] #: SONY Vendor Id
PRODUCT_ID = [0x05c2]
BCD = [0x226]
VENDOR_NAME = 'SONY'
WINDOWS_MAIN_MEM = re.compile(
r'(PRS-T1&)'
)
THUMBNAIL_HEIGHT = 217
SCAN_FROM_ROOT = True
EBOOK_DIR_MAIN = __appname__
SUPPORTS_SUB_DIRS = True
def windows_filter_pnp_id(self, pnp_id):
return '_LAUNCHER' in pnp_id or '_SETTING' in pnp_id
def get_carda_ebook_dir(self, for_upload=False):
if for_upload:
return __appname__
return self.EBOOK_DIR_CARD_A
def get_main_ebook_dir(self, for_upload=False):
if for_upload:
return __appname__
return ''

View File

@ -0,0 +1,7 @@
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
__license__ = 'GPL v3'
__copyright__ = '2011, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

View File

@ -0,0 +1,573 @@
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import (unicode_literals, division, absolute_import,
print_function)
__license__ = 'GPL v3'
__copyright__ = '2011, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
'''
Device driver for the SONY T1 devices
'''
import os, time, re
import sqlite3 as sqlite
from contextlib import closing
from datetime import date
from calibre.devices.usbms.driver import USBMS, debug_print
from calibre.devices.usbms.device import USBDevice
from calibre.devices.usbms.books import CollectionsBookList, BookList
from calibre.ebooks.metadata import authors_to_string, authors_to_sort_string
from calibre.constants import islinux
DBPATH = 'Sony_Reader/database/books.db'
THUMBPATH = 'Sony_Reader/database/cache/books/%s/thumbnail/main_thumbnail.jpg'
class ImageWrapper(object):
def __init__(self, image_path):
self.image_path = image_path
class PRST1(USBMS):
name = 'SONY PRST1 and newer Device Interface'
gui_name = 'SONY Reader'
description = _('Communicate with the PRST1 and newer SONY eBook readers')
author = 'Kovid Goyal'
supported_platforms = ['windows', 'osx', 'linux']
path_sep = '/'
booklist_class = CollectionsBookList
FORMATS = ['epub', 'pdf', 'txt']
CAN_SET_METADATA = ['collections']
CAN_DO_DEVICE_DB_PLUGBOARD = True
VENDOR_ID = [0x054c] #: SONY Vendor Id
PRODUCT_ID = [0x05c2]
BCD = [0x226]
VENDOR_NAME = 'SONY'
WINDOWS_MAIN_MEM = re.compile(
r'(PRS-T1&)'
)
WINDOWS_CARD_A_MEM = re.compile(
r'(PRS-T1__SD&)'
)
MAIN_MEMORY_VOLUME_LABEL = 'SONY Reader Main Memory'
STORAGE_CARD_VOLUME_LABEL = 'SONY Reader Storage Card'
THUMBNAIL_HEIGHT = 144
SUPPORTS_SUB_DIRS = True
SUPPORTS_USE_AUTHOR_SORT = True
MUST_READ_METADATA = True
EBOOK_DIR_MAIN = 'Sony_Reader/media/books'
EXTRA_CUSTOMIZATION_MESSAGE = [
_('Comma separated list of metadata fields '
'to turn into collections on the device. Possibilities include: ')+\
'series, tags, authors',
_('Upload separate cover thumbnails for books') +
':::'+_('Normally, the SONY readers get the cover image from the'
' ebook file itself. With this option, calibre will send a '
'separate cover image to the reader, useful if you are '
'sending DRMed books in which you cannot change the cover.'),
_('Refresh separate covers when using automatic management') +
':::' +
_('Set this option to have separate book covers uploaded '
'every time you connect your device. Unset this option if '
'you have so many books on the reader that performance is '
'unacceptable.'),
_('Preserve cover aspect ratio when building thumbnails') +
':::' +
_('Set this option if you want the cover thumbnails to have '
'the same aspect ratio (width to height) as the cover. '
'Unset it if you want the thumbnail to be the maximum size, '
'ignoring aspect ratio.'),
_('Use SONY Author Format (First Author Only)') +
':::' +
_('Set this option if you want the author on the Sony to '
'appear the same way the T1 sets it. This means it will '
'only show the first author for books with multiple authors. '
'Leave this disabled if you use Metadata Plugboards.')
]
EXTRA_CUSTOMIZATION_DEFAULT = [
', '.join(['series', 'tags']),
True,
False,
True,
False,
]
OPT_COLLECTIONS = 0
OPT_UPLOAD_COVERS = 1
OPT_REFRESH_COVERS = 2
OPT_PRESERVE_ASPECT_RATIO = 3
OPT_USE_SONY_AUTHORS = 4
plugboards = None
plugboard_func = None
def post_open_callback(self):
# Set the thumbnail width to the theoretical max if the user has asked
# that we do not preserve aspect ratio
if not self.settings().extra_customization[self.OPT_PRESERVE_ASPECT_RATIO]:
self.THUMBNAIL_WIDTH = 108
# Make sure the date offset is set to None; we'll calculate it in books().
self.device_offset = None
def windows_filter_pnp_id(self, pnp_id):
return '_LAUNCHER' in pnp_id or '_SETTING' in pnp_id
def get_carda_ebook_dir(self, for_upload=False):
if for_upload:
return self.EBOOK_DIR_MAIN
return self.EBOOK_DIR_CARD_A
def get_main_ebook_dir(self, for_upload=False):
if for_upload:
return self.EBOOK_DIR_MAIN
return ''
def can_handle(self, devinfo, debug=False):
if islinux:
dev = USBDevice(devinfo)
main, carda, cardb = self.find_device_nodes(detected_device=dev)
if main is None and carda is None and cardb is None:
if debug:
print ('\tPRS-T1: Appears to be in non data mode'
' or was ejected, ignoring')
return False
return True
def books(self, oncard=None, end_session=True):
dummy_bl = BookList(None, None, None)
if (
(oncard == 'carda' and not self._card_a_prefix) or
(oncard and oncard != 'carda')
):
self.report_progress(1.0, _('Getting list of books on device...'))
return dummy_bl
prefix = self._card_a_prefix if oncard == 'carda' else self._main_prefix
# Let parent driver get the books
self.booklist_class.rebuild_collections = self.rebuild_collections
bl = USBMS.books(self, oncard=oncard, end_session=end_session)
dbpath = self.normalize_path(prefix + DBPATH)
debug_print("SQLite DB Path: " + dbpath)
with closing(sqlite.connect(dbpath)) as connection:
# Replace undecodable characters in the db instead of erroring out
connection.text_factory = lambda x: unicode(x, "utf-8", "replace")
cursor = connection.cursor()
# Query collections
query = '''
SELECT books._id, collection.title
FROM collections
LEFT OUTER JOIN books
LEFT OUTER JOIN collection
WHERE collections.content_id = books._id AND
collections.collection_id = collection._id
'''
cursor.execute(query)
bl_collections = {}
for i, row in enumerate(cursor):
bl_collections.setdefault(row[0], [])
bl_collections[row[0]].append(row[1])
# collect information on offsets, but assume any
# offset we already calculated is correct
if self.device_offset is None:
query = 'SELECT file_path, modified_date FROM books'
cursor.execute(query)
time_offsets = {}
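# Build a histogram of (device timestamp - filesystem mtime) per book;
# the most frequent difference is taken as the device clock offset.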
for i, row in enumerate(cursor):
comp_date = int(os.path.getmtime(self.normalize_path(prefix + row[0])) * 1000)
device_date = int(row[1])
offset = device_date - comp_date
time_offsets.setdefault(offset, 0)
time_offsets[offset] = time_offsets[offset] + 1
try:
device_offset = max(time_offsets, key=time_offsets.get)
debug_print("Device Offset: %d ms"%device_offset)
self.device_offset = device_offset
except ValueError:
debug_print("No Books To Detect Device Offset.")
for idx, book in enumerate(bl):
query = 'SELECT _id, thumbnail FROM books WHERE file_path = ?'
t = (book.lpath,)
cursor.execute(query, t)
for i, row in enumerate(cursor):
book.device_collections = bl_collections.get(row[0], None)
thumbnail = row[1]
if thumbnail is not None:
thumbnail = self.normalize_path(prefix + thumbnail)
book.thumbnail = ImageWrapper(thumbnail)
cursor.close()
return bl
def set_plugboards(self, plugboards, pb_func):
self.plugboards = plugboards
self.plugboard_func = pb_func
def sync_booklists(self, booklists, end_session=True):
debug_print('PRST1: starting sync_booklists')
opts = self.settings()
if opts.extra_customization:
collections = [x.strip() for x in
opts.extra_customization[self.OPT_COLLECTIONS].split(',')]
else:
collections = []
debug_print('PRST1: collection fields:', collections)
if booklists[0] is not None:
self.update_device_database(booklists[0], collections, None)
if booklists[1] is not None:
self.update_device_database(booklists[1], collections, 'carda')
USBMS.sync_booklists(self, booklists, end_session=end_session)
debug_print('PRST1: finished sync_booklists')
def update_device_database(self, booklist, collections_attributes, oncard):
debug_print('PRST1: starting update_device_database')
plugboard = None
if self.plugboard_func:
plugboard = self.plugboard_func(self.__class__.__name__,
'device_db', self.plugboards)
debug_print("PRST1: Using Plugboard", plugboard)
prefix = self._card_a_prefix if oncard == 'carda' else self._main_prefix
if prefix is None:
# Reader has no sd card inserted
return
source_id = 1 if oncard == 'carda' else 0
dbpath = self.normalize_path(prefix + DBPATH)
debug_print("SQLite DB Path: " + dbpath)
collections = booklist.get_collections(collections_attributes)
with closing(sqlite.connect(dbpath)) as connection:
self.update_device_books(connection, booklist, source_id, plugboard)
self.update_device_collections(connection, booklist, collections, source_id)
debug_print('PRST1: finished update_device_database')
def update_device_books(self, connection, booklist, source_id, plugboard):
opts = self.settings()
upload_covers = opts.extra_customization[self.OPT_UPLOAD_COVERS]
refresh_covers = opts.extra_customization[self.OPT_REFRESH_COVERS]
use_sony_authors = opts.extra_customization[self.OPT_USE_SONY_AUTHORS]
cursor = connection.cursor()
# Get existing books
query = 'SELECT file_path, _id FROM books'
cursor.execute(query)
db_books = {}
for i, row in enumerate(cursor):
lpath = row[0].replace('\\', '/')
db_books[lpath] = row[1]
for book in booklist:
# Run through plugboard if needed
if plugboard is not None:
newmi = book.deepcopy_metadata()
newmi.template_to_attribute(book, plugboard)
else:
newmi = book
# Get Metadata We Want
lpath = book.lpath
try:
if opts.use_author_sort:
if newmi.author_sort:
author = newmi.author_sort
else:
author = authors_to_sort_string(newmi.authors)
else:
if use_sony_authors:
author = newmi.authors[0]
else:
author = authors_to_string(newmi.authors)
except:
author = _('Unknown')
title = newmi.title or _('Unknown')
# Get modified date
modified_date = os.path.getmtime(book.path) * 1000
if self.device_offset is not None:
modified_date = modified_date + self.device_offset
else:
time_offset = -time.altzone if time.daylight else -time.timezone
modified_date = modified_date + (time_offset * 1000)
if lpath not in db_books:
query = '''
INSERT INTO books
(title, author, source_id, added_date, modified_date,
file_path, file_name, file_size, mime_type, corrupted,
prevent_delete)
values (?,?,?,?,?,?,?,?,?,0,0)
'''
t = (title, author, source_id, int(time.time() * 1000),
modified_date, lpath,
os.path.basename(lpath), book.size, book.mime)
cursor.execute(query, t)
book.bookId = cursor.lastrowid
if upload_covers:
self.upload_book_cover(connection, book, source_id)
debug_print('Inserted New Book: ' + book.title)
else:
query = '''
UPDATE books
SET title = ?, author = ?, modified_date = ?, file_size = ?
WHERE file_path = ?
'''
t = (title, author, modified_date, book.size, lpath)
cursor.execute(query, t)
book.bookId = db_books[lpath]
if refresh_covers:
self.upload_book_cover(connection, book, source_id)
db_books[lpath] = None
if self.is_sony_periodical(book):
self.periodicalize_book(connection, book)
for book, bookId in db_books.items():
if bookId is not None:
# Remove From Collections
query = 'DELETE FROM collections WHERE content_id = ?'
t = (bookId,)
cursor.execute(query, t)
# Remove from Books
query = 'DELETE FROM books where _id = ?'
t = (bookId,)
cursor.execute(query, t)
debug_print('Deleted Book: ' + book)
connection.commit()
cursor.close()
def update_device_collections(self, connection, booklist, collections,
source_id):
cursor = connection.cursor()
if collections:
# Get existing collections
query = 'SELECT _id, title FROM collection'
cursor.execute(query)
db_collections = {}
for i, row in enumerate(cursor):
db_collections[row[1]] = row[0]
for collection, books in collections.items():
if collection not in db_collections:
query = 'INSERT INTO collection (title, source_id) VALUES (?,?)'
t = (collection, source_id)
cursor.execute(query, t)
db_collections[collection] = cursor.lastrowid
debug_print('Inserted New Collection: ' + collection)
# Get existing books in collection
query = '''
SELECT books.file_path, content_id
FROM collections
LEFT OUTER JOIN books
WHERE collection_id = ? AND books._id = collections.content_id
'''
t = (db_collections[collection],)
cursor.execute(query, t)
db_books = {}
for i, row in enumerate(cursor):
db_books[row[0]] = row[1]
for idx, book in enumerate(books):
if collection not in book.device_collections:
book.device_collections.append(collection)
if db_books.get(book.lpath, None) is None:
query = '''
INSERT INTO collections (collection_id, content_id,
added_order) values (?,?,?)
'''
t = (db_collections[collection], book.bookId, idx)
cursor.execute(query, t)
debug_print('Inserted Book Into Collection: ' +
book.title + ' -> ' + collection)
else:
query = '''
UPDATE collections
SET added_order = ?
WHERE content_id = ? AND collection_id = ?
'''
t = (idx, book.bookId, db_collections[collection])
cursor.execute(query, t)
db_books[book.lpath] = None
for bookPath, bookId in db_books.items():
if bookId is not None:
query = ('DELETE FROM collections '
'WHERE content_id = ? AND collection_id = ? ')
t = (bookId, db_collections[collection],)
cursor.execute(query, t)
debug_print('Deleted Book From Collection: ' + bookPath
+ ' -> ' + collection)
db_collections[collection] = None
for collection, collectionId in db_collections.items():
if collectionId is not None:
# Remove Books from Collection
query = ('DELETE FROM collections '
'WHERE collection_id = ?')
t = (collectionId,)
cursor.execute(query, t)
# Remove Collection
query = ('DELETE FROM collection '
'WHERE _id = ?')
t = (collectionId,)
cursor.execute(query, t)
debug_print('Deleted Collection: ' + collection)
connection.commit()
cursor.close()
def rebuild_collections(self, booklist, oncard):
debug_print('PRST1: starting rebuild_collections')
opts = self.settings()
if opts.extra_customization:
collections = [x.strip() for x in
opts.extra_customization[self.OPT_COLLECTIONS].split(',')]
else:
collections = []
debug_print('PRST1: collection fields:', collections)
self.update_device_database(booklist, collections, oncard)
debug_print('PRST1: finished rebuild_collections')
def upload_cover(self, path, filename, metadata, filepath):
debug_print('PRS-T1: uploading cover')
if filepath.startswith(self._main_prefix):
prefix = self._main_prefix
source_id = 0
else:
prefix = self._card_a_prefix
source_id = 1
metadata.lpath = filepath.partition(prefix)[2]
metadata.lpath = metadata.lpath.replace('\\', '/')
dbpath = self.normalize_path(prefix + DBPATH)
debug_print("SQLite DB Path: " + dbpath)
with closing(sqlite.connect(dbpath)) as connection:
cursor = connection.cursor()
query = 'SELECT _id FROM books WHERE file_path = ?'
t = (metadata.lpath,)
cursor.execute(query, t)
for i, row in enumerate(cursor):
metadata.bookId = row[0]
cursor.close()
if getattr(metadata, 'bookId', None) is not None:
debug_print('PRS-T1: refreshing cover for book being sent')
self.upload_book_cover(connection, metadata, source_id)
debug_print('PRS-T1: done uploading cover')
def upload_book_cover(self, connection, book, source_id):
debug_print('PRST1: Uploading/Refreshing Cover for ' + book.title)
if not book.thumbnail or not book.thumbnail[-1]:
return
cursor = connection.cursor()
thumbnail_path = THUMBPATH%book.bookId
prefix = self._main_prefix if source_id == 0 else self._card_a_prefix
thumbnail_file_path = os.path.join(prefix, *thumbnail_path.split('/'))
thumbnail_dir_path = os.path.dirname(thumbnail_file_path)
if not os.path.exists(thumbnail_dir_path):
os.makedirs(thumbnail_dir_path)
with open(thumbnail_file_path, 'wb') as f:
f.write(book.thumbnail[-1])
query = 'UPDATE books SET thumbnail = ? WHERE _id = ?'
t = (thumbnail_path, book.bookId,)
cursor.execute(query, t)
connection.commit()
cursor.close()
def is_sony_periodical(self, book):
if _('News') not in book.tags:
return False
if not book.lpath.lower().endswith('.epub'):
return False
if book.pubdate.date() < date(2010, 10, 17):
return False
return True
def periodicalize_book(self, connection, book):
if not self.is_sony_periodical(book):
return
name = None
if '[' in book.title:
name = book.title.split('[')[0].strip()
if len(name) < 4:
name = None
if not name:
try:
name = [t for t in book.tags if t != _('News')][0]
except:
name = None
if not name:
name = book.title
pubdate = None
try:
pubdate = int(time.mktime(book.pubdate.timetuple()) * 1000)
except:
pass
cursor = connection.cursor()
query = '''
UPDATE books
SET conforms_to = 'http://xmlns.sony.net/e-book/prs/periodicals/1.0/newspaper/1.0',
periodical_name = ?,
description = ?,
publication_date = ?
WHERE _id = ?
'''
t = (name, None, pubdate, book.bookId,)
cursor.execute(query, t)
connection.commit()
cursor.close()

View File

@ -483,7 +483,7 @@ class Device(DeviceConfig, DevicePlugin):
self._card_a_prefix = get_card_prefix('carda')
self._card_b_prefix = get_card_prefix('cardb')
def find_device_nodes(self):
def find_device_nodes(self, detected_device=None):
def walk(base):
base = os.path.abspath(os.path.realpath(base))
@ -507,8 +507,11 @@ class Device(DeviceConfig, DevicePlugin):
d, j = os.path.dirname, os.path.join
usb_dir = None
if detected_device is None:
detected_device = self.detected_device
def test(val, attr):
q = getattr(self.detected_device, attr)
q = getattr(detected_device, attr)
return q == val
for x, isfile in walk('/sys/devices'):
@ -596,6 +599,8 @@ class Device(DeviceConfig, DevicePlugin):
label = self.STORAGE_CARD2_VOLUME_LABEL
if not label:
label = self.STORAGE_CARD_VOLUME_LABEL + ' 2'
if not label:
label = 'E-book Reader (%s)'%type
extra = 0
while True:
q = ' (%d)'%extra if extra else ''
@ -1063,6 +1068,12 @@ class Device(DeviceConfig, DevicePlugin):
'''
return {}
def add_annotation_to_library(self, db, db_id, annotation):
'''
Add an annotation to the calibre library
'''
pass
def create_upload_path(self, path, mdata, fname, create_dirs=True):
path = os.path.abspath(path)
maxlen = self.MAX_PATH_LEN
@ -1142,3 +1153,6 @@ class Device(DeviceConfig, DevicePlugin):
os.makedirs(filedir)
return filepath
def create_annotations_path(self, mdata, device_path=None):
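# Reuse the book upload path logic with a placeholder '<storage>' root
# and a dummy 'x.bookmark' filename to decide where annotations live.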
return self.create_upload_path(os.path.abspath('/<storage>'), mdata, 'x.bookmark', create_dirs=False)

View File

@ -22,7 +22,7 @@ class CHMInput(InputFormatPlugin):
def _chmtohtml(self, output_dir, chm_path, no_images, log, debug_dump=False):
from calibre.ebooks.chm.reader import CHMReader
log.debug('Opening CHM file')
rdr = CHMReader(chm_path, log, self.opts)
rdr = CHMReader(chm_path, log, input_encoding=self.opts.input_encoding)
log.debug('Extracting CHM to %s' % output_dir)
rdr.extract_content(output_dir, debug_dump=debug_dump)
self._chm_reader = rdr

View File

@ -40,14 +40,14 @@ class CHMError(Exception):
pass
class CHMReader(CHMFile):
def __init__(self, input, log, opts):
def __init__(self, input, log, input_encoding=None):
CHMFile.__init__(self)
if isinstance(input, unicode):
input = input.encode(filesystem_encoding)
if not self.LoadCHM(input):
raise CHMError("Unable to open CHM file '%s'"%(input,))
self.log = log
self.opts = opts
self.input_encoding = input_encoding
self._sourcechm = input
self._contents = None
self._playorder = 0
@ -156,8 +156,8 @@ class CHMReader(CHMFile):
break
def _reformat(self, data, htmlpath):
if self.opts.input_encoding:
data = data.decode(self.opts.input_encoding)
if self.input_encoding:
data = data.decode(self.input_encoding)
try:
data = xml_to_unicode(data, strip_encoding_pats=True)[0]
soup = BeautifulSoup(data)

View File

@ -693,6 +693,8 @@ OptionRecommendation(name='sr3_replace',
def unarchive(self, path, tdir):
extract(path, tdir)
files = list(walk(tdir))
files = [f if isinstance(f, unicode) else f.decode(filesystem_encoding)
for f in files]
from calibre.customize.ui import available_input_formats
fmts = available_input_formats()
for x in ('htm', 'html', 'xhtm', 'xhtml'): fmts.remove(x)

View File

@ -0,0 +1,12 @@
#!/usr/bin/env python
from __future__ import (unicode_literals, division, absolute_import,
print_function)
__license__ = 'GPL v3'
__copyright__ = '2011, Anthon van der Neut <anthon@mnt.org>'
__docformat__ = 'restructuredtext en'
'''
Used for DJVU input
'''

View File

@ -0,0 +1,146 @@
#! /usr/bin/env python
# coding: utf-8
from __future__ import (unicode_literals, division, absolute_import,
print_function)
__license__ = 'GPL v3'
__copyright__ = '2011, Anthon van der Neut <A.van.der.Neut@ruamel.eu>'
# this code is based on:
# Lizardtech DjVu Reference
# DjVu v3
# November 2005
import sys
import struct
from cStringIO import StringIO
from .djvubzzdec import BZZDecoder
class DjvuChunk(object):
def __init__(self, buf, start, end, align=True, bigendian=True,
inclheader=False, verbose=0):
self.subtype = None
self._subchunks = []
self.buf = buf
pos = start + 4
self.type = buf[start:pos]
self.align = align # whether to align to word (2-byte) boundaries
self.headersize = 0 if inclheader else 8
if bigendian:
self.strflag = b'>'
else:
self.strflag = b'<'
oldpos, pos = pos, pos+4
self.size = struct.unpack(self.strflag+b'L', buf[oldpos:pos])[0]
self.dataend = pos + self.size - (8 if inclheader else 0)
if self.type == b'FORM':
oldpos, pos = pos, pos+4
#print oldpos, pos
self.subtype = buf[oldpos:pos]
#self.headersize += 4
self.datastart = pos
if verbose > 0:
print ('found', self.type, self.subtype, pos, self.size)
if self.type in b'FORM'.split():
if verbose > 0:
print ('processing substuff %d %d (%x)' % (pos, self.dataend,
self.dataend))
numchunks = 0
while pos < self.dataend:
x = DjvuChunk(buf, pos, start+self.size, verbose=verbose)
numchunks += 1
self._subchunks.append(x)
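# Chunks are aligned to even (2-byte) boundaries, so skip one pad byte
# after any odd-sized chunk.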
newpos = pos + x.size + x.headersize + (1 if (x.size % 2) else 0)
if verbose > 0:
print ('newpos %d %d (%x, %x) %d' % (newpos, self.dataend,
newpos, self.dataend, x.headersize))
pos = newpos
if verbose > 0:
print (' end of chunk %d (%x)' % (pos, pos))
def dump(self, verbose=0, indent=1, out=None, txtout=None, maxlevel=100):
if out:
out.write(b' ' * indent)
out.write(b'%s%s [%d]\n' % (self.type,
b':' + self.subtype if self.subtype else b'', self.size))
if txtout and self.type == b'TXTz':
inbuf = StringIO(self.buf[self.datastart: self.dataend])
outbuf = StringIO()
decoder = BZZDecoder(inbuf, outbuf)
while True:
xxres = decoder.convert(1024 * 1024)
if not xxres:
break
res = outbuf.getvalue()
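# The first three bytes are a 24-bit big-endian length of the text that
# follows; only that many bytes are written out.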
l = 0
for x in res[:3]:
l <<= 8
l += ord(x)
if verbose > 0 and out:
print(l, file=out)
txtout.write(res[3:3+l])
txtout.write(b'\n\f')
if txtout and self.type == b'TXTa':
res = self.buf[self.datastart: self.dataend]
l = 0
for x in res[:3]:
l <<= 8
l += ord(x)
if verbose > 0 and out:
print(l, file=out)
txtout.write(res[3:3+l])
txtout.write(b'\n\f')
if indent >= maxlevel:
return
for schunk in self._subchunks:
schunk.dump(verbose=verbose, indent=indent+1, out=out, txtout=txtout)
class DJVUFile(object):
def __init__(self, instream, verbose=0):
self.instream = instream
buf = self.instream.read(4)
assert(buf == b'AT&T')
buf = self.instream.read()
self.dc = DjvuChunk(buf, 0, len(buf), verbose=verbose)
def get_text(self, outfile=None):
self.dc.dump(txtout=outfile)
def dump(self, outfile=None, maxlevel=0):
self.dc.dump(out=outfile, maxlevel=maxlevel)
def main():
from ruamel.util.program import Program
class DJVUDecoder(Program):
def __init__(self):
Program.__init__(self)
def parser_setup(self):
Program.parser_setup(self)
#self._argparser.add_argument('--combine', '-c', action=CountAction, const=1, nargs=0)
#self._argparser.add_argument('--combine', '-c', type=int, default=1)
#self._argparser.add_argument('--segments', '-s', action='append', nargs='+')
#self._argparser.add_argument('--force', '-f', action='store_true')
#self._argparser.add_argument('classname')
self._argparser.add_argument('--text', '-t', action='store_true')
self._argparser.add_argument('--dump', type=int, default=0)
self._argparser.add_argument('file', nargs='+')
def run(self):
if self._args.verbose > 1: # can be negative with --quiet
print (self._args.file)
x = DJVUFile(file(self._args.file[0], 'rb'), verbose=self._args.verbose)
if self._args.text:
x.get_text(sys.stdout)  # writes the text to stdout; returns None
if self._args.dump:
x.dump(sys.stdout, maxlevel=self._args.dump)
return 0
tt = DJVUDecoder()
res = tt.result
if res != 0:
print (res)
if __name__ == '__main__':
main()

View File

@ -0,0 +1,746 @@
#! /usr/bin/env python
# coding: utf-8
from __future__ import (unicode_literals, division, absolute_import,
print_function)
__license__ = 'GPL v3'
__copyright__ = '2011, Anthon van der Neut <A.van.der.Neut@ruamel.eu>'
#__docformat__ = 'restructuredtext en'
# Copyright (C) 2011 Anthon van der Neut, Ruamel bvba
# Adapted from Leon Bottou's djvulibre C++ code,
# ( ZPCodec.{cpp,h} and BSByteStream.{cpp,h} )
# that code was first converted to C removing any dependencies on the DJVU libre
# framework for ByteStream, making it into a ctypes callable shared object
# then to python, and remade into a class
original_copyright_notice = '''
//C- -------------------------------------------------------------------
//C- DjVuLibre-3.5
//C- Copyright (c) 2002 Leon Bottou and Yann Le Cun.
//C- Copyright (c) 2001 AT&T
//C-
//C- This software is subject to, and may be distributed under, the
//C- GNU General Public License, either Version 2 of the license,
//C- or (at your option) any later version. The license should have
//C- accompanied the software or you may obtain a copy of the license
//C- from the Free Software Foundation at http://www.fsf.org .
//C-
//C- This program is distributed in the hope that it will be useful,
//C- but WITHOUT ANY WARRANTY; without even the implied warranty of
//C- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
//C- GNU General Public License for more details.
//C-
//C- DjVuLibre-3.5 is derived from the DjVu(r) Reference Library from
//C- Lizardtech Software. Lizardtech Software has authorized us to
//C- replace the original DjVu(r) Reference Library notice by the following
//C- text (see doc/lizard2002.djvu and doc/lizardtech2007.djvu):
//C-
//C- ------------------------------------------------------------------
//C- | DjVu (r) Reference Library (v. 3.5)
//C- | Copyright (c) 1999-2001 LizardTech, Inc. All Rights Reserved.
//C- | The DjVu Reference Library is protected by U.S. Pat. No.
//C- | 6,058,214 and patents pending.
//C- |
//C- | This software is subject to, and may be distributed under, the
//C- | GNU General Public License, either Version 2 of the license,
//C- | or (at your option) any later version. The license should have
//C- | accompanied the software or you may obtain a copy of the license
//C- | from the Free Software Foundation at http://www.fsf.org .
//C- |
//C- | The computer code originally released by LizardTech under this
//C- | license and unmodified by other parties is deemed "the LIZARDTECH
//C- | ORIGINAL CODE." Subject to any third party intellectual property
//C- | claims, LizardTech grants recipient a worldwide, royalty-free,
//C- | non-exclusive license to make, use, sell, or otherwise dispose of
//C- | the LIZARDTECH ORIGINAL CODE or of programs derived from the
//C- | LIZARDTECH ORIGINAL CODE in compliance with the terms of the GNU
//C- | General Public License. This grant only confers the right to
//C- | infringe patent claims underlying the LIZARDTECH ORIGINAL CODE to
//C- | the extent such infringement is reasonably necessary to enable
//C- | recipient to make, have made, practice, sell, or otherwise dispose
//C- | of the LIZARDTECH ORIGINAL CODE (or portions thereof) and not to
//C- | any greater extent that may be necessary to utilize further
//C- | modifications or combinations.
//C- |
//C- | The LIZARDTECH ORIGINAL CODE is provided "AS IS" WITHOUT WARRANTY
//C- | OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED
//C- | TO ANY WARRANTY OF NON-INFRINGEMENT, OR ANY IMPLIED WARRANTY OF
//C- | MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
//C- +------------------------------------------------------------------
//
// $Id: BSByteStream.cpp,v 1.9 2007/03/25 20:48:29 leonb Exp $
// $Name: release_3_5_23 $
'''
MAXBLOCK = 4096
FREQMAX = 4
CTXIDS = 3
MAXLEN = 1024 ** 2
# Exception classes used by this module.
class BZZDecoderError(Exception):
"""This exception is raised when BZZDecode runs into trouble
"""
def __init__(self, msg):
self.msg = msg
def __str__(self):
return "BZZDecoderError: %s" % (self.msg)
# This table has been designed for the ZPCoder
# by running the following command in file 'zptable.sn':
# (fast-crude (steady-mat 0.0035 0.0002) 260)))
default_ztable = [ # {{{
(0x8000, 0x0000, 84, 145), # 000: p=0.500000 ( 0, 0)
(0x8000, 0x0000, 3, 4), # 001: p=0.500000 ( 0, 0)
(0x8000, 0x0000, 4, 3), # 002: p=0.500000 ( 0, 0)
(0x6bbd, 0x10a5, 5, 1), # 003: p=0.465226 ( 0, 0)
(0x6bbd, 0x10a5, 6, 2), # 004: p=0.465226 ( 0, 0)
(0x5d45, 0x1f28, 7, 3), # 005: p=0.430708 ( 0, 0)
(0x5d45, 0x1f28, 8, 4), # 006: p=0.430708 ( 0, 0)
(0x51b9, 0x2bd3, 9, 5), # 007: p=0.396718 ( 0, 0)
(0x51b9, 0x2bd3, 10, 6), # 008: p=0.396718 ( 0, 0)
(0x4813, 0x36e3, 11, 7), # 009: p=0.363535 ( 0, 0)
(0x4813, 0x36e3, 12, 8), # 010: p=0.363535 ( 0, 0)
(0x3fd5, 0x408c, 13, 9), # 011: p=0.331418 ( 0, 0)
(0x3fd5, 0x408c, 14, 10), # 012: p=0.331418 ( 0, 0)
(0x38b1, 0x48fd, 15, 11), # 013: p=0.300585 ( 0, 0)
(0x38b1, 0x48fd, 16, 12), # 014: p=0.300585 ( 0, 0)
(0x3275, 0x505d, 17, 13), # 015: p=0.271213 ( 0, 0)
(0x3275, 0x505d, 18, 14), # 016: p=0.271213 ( 0, 0)
(0x2cfd, 0x56d0, 19, 15), # 017: p=0.243438 ( 0, 0)
(0x2cfd, 0x56d0, 20, 16), # 018: p=0.243438 ( 0, 0)
(0x2825, 0x5c71, 21, 17), # 019: p=0.217391 ( 0, 0)
(0x2825, 0x5c71, 22, 18), # 020: p=0.217391 ( 0, 0)
(0x23ab, 0x615b, 23, 19), # 021: p=0.193150 ( 0, 0)
(0x23ab, 0x615b, 24, 20), # 022: p=0.193150 ( 0, 0)
(0x1f87, 0x65a5, 25, 21), # 023: p=0.170728 ( 0, 0)
(0x1f87, 0x65a5, 26, 22), # 024: p=0.170728 ( 0, 0)
(0x1bbb, 0x6962, 27, 23), # 025: p=0.150158 ( 0, 0)
(0x1bbb, 0x6962, 28, 24), # 026: p=0.150158 ( 0, 0)
(0x1845, 0x6ca2, 29, 25), # 027: p=0.131418 ( 0, 0)
(0x1845, 0x6ca2, 30, 26), # 028: p=0.131418 ( 0, 0)
(0x1523, 0x6f74, 31, 27), # 029: p=0.114460 ( 0, 0)
(0x1523, 0x6f74, 32, 28), # 030: p=0.114460 ( 0, 0)
(0x1253, 0x71e6, 33, 29), # 031: p=0.099230 ( 0, 0)
(0x1253, 0x71e6, 34, 30), # 032: p=0.099230 ( 0, 0)
(0x0fcf, 0x7404, 35, 31), # 033: p=0.085611 ( 0, 0)
(0x0fcf, 0x7404, 36, 32), # 034: p=0.085611 ( 0, 0)
(0x0d95, 0x75d6, 37, 33), # 035: p=0.073550 ( 0, 0)
(0x0d95, 0x75d6, 38, 34), # 036: p=0.073550 ( 0, 0)
(0x0b9d, 0x7768, 39, 35), # 037: p=0.062888 ( 0, 0)
(0x0b9d, 0x7768, 40, 36), # 038: p=0.062888 ( 0, 0)
(0x09e3, 0x78c2, 41, 37), # 039: p=0.053539 ( 0, 0)
(0x09e3, 0x78c2, 42, 38), # 040: p=0.053539 ( 0, 0)
(0x0861, 0x79ea, 43, 39), # 041: p=0.045365 ( 0, 0)
(0x0861, 0x79ea, 44, 40), # 042: p=0.045365 ( 0, 0)
(0x0711, 0x7ae7, 45, 41), # 043: p=0.038272 ( 0, 0)
(0x0711, 0x7ae7, 46, 42), # 044: p=0.038272 ( 0, 0)
(0x05f1, 0x7bbe, 47, 43), # 045: p=0.032174 ( 0, 0)
(0x05f1, 0x7bbe, 48, 44), # 046: p=0.032174 ( 0, 0)
(0x04f9, 0x7c75, 49, 45), # 047: p=0.026928 ( 0, 0)
(0x04f9, 0x7c75, 50, 46), # 048: p=0.026928 ( 0, 0)
(0x0425, 0x7d0f, 51, 47), # 049: p=0.022444 ( 0, 0)
(0x0425, 0x7d0f, 52, 48), # 050: p=0.022444 ( 0, 0)
(0x0371, 0x7d91, 53, 49), # 051: p=0.018636 ( 0, 0)
(0x0371, 0x7d91, 54, 50), # 052: p=0.018636 ( 0, 0)
(0x02d9, 0x7dfe, 55, 51), # 053: p=0.015421 ( 0, 0)
(0x02d9, 0x7dfe, 56, 52), # 054: p=0.015421 ( 0, 0)
(0x0259, 0x7e5a, 57, 53), # 055: p=0.012713 ( 0, 0)
(0x0259, 0x7e5a, 58, 54), # 056: p=0.012713 ( 0, 0)
(0x01ed, 0x7ea6, 59, 55), # 057: p=0.010419 ( 0, 0)
(0x01ed, 0x7ea6, 60, 56), # 058: p=0.010419 ( 0, 0)
(0x0193, 0x7ee6, 61, 57), # 059: p=0.008525 ( 0, 0)
(0x0193, 0x7ee6, 62, 58), # 060: p=0.008525 ( 0, 0)
(0x0149, 0x7f1a, 63, 59), # 061: p=0.006959 ( 0, 0)
(0x0149, 0x7f1a, 64, 60), # 062: p=0.006959 ( 0, 0)
(0x010b, 0x7f45, 65, 61), # 063: p=0.005648 ( 0, 0)
(0x010b, 0x7f45, 66, 62), # 064: p=0.005648 ( 0, 0)
(0x00d5, 0x7f6b, 67, 63), # 065: p=0.004506 ( 0, 0)
(0x00d5, 0x7f6b, 68, 64), # 066: p=0.004506 ( 0, 0)
(0x00a5, 0x7f8d, 69, 65), # 067: p=0.003480 ( 0, 0)
(0x00a5, 0x7f8d, 70, 66), # 068: p=0.003480 ( 0, 0)
(0x007b, 0x7faa, 71, 67), # 069: p=0.002602 ( 0, 0)
(0x007b, 0x7faa, 72, 68), # 070: p=0.002602 ( 0, 0)
(0x0057, 0x7fc3, 73, 69), # 071: p=0.001843 ( 0, 0)
(0x0057, 0x7fc3, 74, 70), # 072: p=0.001843 ( 0, 0)
(0x003b, 0x7fd7, 75, 71), # 073: p=0.001248 ( 0, 0)
(0x003b, 0x7fd7, 76, 72), # 074: p=0.001248 ( 0, 0)
(0x0023, 0x7fe7, 77, 73), # 075: p=0.000749 ( 0, 0)
(0x0023, 0x7fe7, 78, 74), # 076: p=0.000749 ( 0, 0)
(0x0013, 0x7ff2, 79, 75), # 077: p=0.000402 ( 0, 0)
(0x0013, 0x7ff2, 80, 76), # 078: p=0.000402 ( 0, 0)
(0x0007, 0x7ffa, 81, 77), # 079: p=0.000153 ( 0, 0)
(0x0007, 0x7ffa, 82, 78), # 080: p=0.000153 ( 0, 0)
(0x0001, 0x7fff, 81, 79), # 081: p=0.000027 ( 0, 0)
(0x0001, 0x7fff, 82, 80), # 082: p=0.000027 ( 0, 0)
(0x5695, 0x0000, 9, 85), # 083: p=0.411764 ( 2, 3)
(0x24ee, 0x0000, 86, 226), # 084: p=0.199988 ( 1, 0)
(0x8000, 0x0000, 5, 6), # 085: p=0.500000 ( 3, 3)
(0x0d30, 0x0000, 88, 176), # 086: p=0.071422 ( 4, 0)
(0x481a, 0x0000, 89, 143), # 087: p=0.363634 ( 1, 2)
(0x0481, 0x0000, 90, 138), # 088: p=0.024388 ( 13, 0)
(0x3579, 0x0000, 91, 141), # 089: p=0.285711 ( 1, 3)
(0x017a, 0x0000, 92, 112), # 090: p=0.007999 ( 41, 0)
(0x24ef, 0x0000, 93, 135), # 091: p=0.199997 ( 1, 5)
(0x007b, 0x0000, 94, 104), # 092: p=0.002611 ( 127, 0)
(0x1978, 0x0000, 95, 133), # 093: p=0.137929 ( 1, 8)
(0x0028, 0x0000, 96, 100), # 094: p=0.000849 ( 392, 0)
(0x10ca, 0x0000, 97, 129), # 095: p=0.090907 ( 1, 13)
(0x000d, 0x0000, 82, 98), # 096: p=0.000276 ( 1208, 0)
(0x0b5d, 0x0000, 99, 127), # 097: p=0.061537 ( 1, 20)
(0x0034, 0x0000, 76, 72), # 098: p=0.001102 ( 1208, 1)
(0x078a, 0x0000, 101, 125), # 099: p=0.040815 ( 1, 31)
(0x00a0, 0x0000, 70, 102), # 100: p=0.003387 ( 392, 1)
(0x050f, 0x0000, 103, 123), # 101: p=0.027397 ( 1, 47)
(0x0117, 0x0000, 66, 60), # 102: p=0.005912 ( 392, 2)
(0x0358, 0x0000, 105, 121), # 103: p=0.018099 ( 1, 72)
(0x01ea, 0x0000, 106, 110), # 104: p=0.010362 ( 127, 1)
(0x0234, 0x0000, 107, 119), # 105: p=0.011940 ( 1, 110)
(0x0144, 0x0000, 66, 108), # 106: p=0.006849 ( 193, 1)
(0x0173, 0x0000, 109, 117), # 107: p=0.007858 ( 1, 168)
(0x0234, 0x0000, 60, 54), # 108: p=0.011925 ( 193, 2)
(0x00f5, 0x0000, 111, 115), # 109: p=0.005175 ( 1, 256)
(0x0353, 0x0000, 56, 48), # 110: p=0.017995 ( 127, 2)
(0x00a1, 0x0000, 69, 113), # 111: p=0.003413 ( 1, 389)
(0x05c5, 0x0000, 114, 134), # 112: p=0.031249 ( 41, 1)
(0x011a, 0x0000, 65, 59), # 113: p=0.005957 ( 2, 389)
(0x03cf, 0x0000, 116, 132), # 114: p=0.020618 ( 63, 1)
(0x01aa, 0x0000, 61, 55), # 115: p=0.009020 ( 2, 256)
(0x0285, 0x0000, 118, 130), # 116: p=0.013652 ( 96, 1)
(0x0286, 0x0000, 57, 51), # 117: p=0.013672 ( 2, 168)
(0x01ab, 0x0000, 120, 128), # 118: p=0.009029 ( 146, 1)
(0x03d3, 0x0000, 53, 47), # 119: p=0.020710 ( 2, 110)
(0x011a, 0x0000, 122, 126), # 120: p=0.005961 ( 222, 1)
(0x05c5, 0x0000, 49, 41), # 121: p=0.031250 ( 2, 72)
(0x00ba, 0x0000, 124, 62), # 122: p=0.003925 ( 338, 1)
(0x08ad, 0x0000, 43, 37), # 123: p=0.046979 ( 2, 47)
(0x007a, 0x0000, 72, 66), # 124: p=0.002586 ( 514, 1)
(0x0ccc, 0x0000, 39, 31), # 125: p=0.069306 ( 2, 31)
(0x01eb, 0x0000, 60, 54), # 126: p=0.010386 ( 222, 2)
(0x1302, 0x0000, 33, 25), # 127: p=0.102940 ( 2, 20)
(0x02e6, 0x0000, 56, 50), # 128: p=0.015695 ( 146, 2)
(0x1b81, 0x0000, 29, 131), # 129: p=0.148935 ( 2, 13)
(0x045e, 0x0000, 52, 46), # 130: p=0.023648 ( 96, 2)
(0x24ef, 0x0000, 23, 17), # 131: p=0.199999 ( 3, 13)
(0x0690, 0x0000, 48, 40), # 132: p=0.035533 ( 63, 2)
(0x2865, 0x0000, 23, 15), # 133: p=0.218748 ( 2, 8)
(0x09de, 0x0000, 42, 136), # 134: p=0.053434 ( 41, 2)
(0x3987, 0x0000, 137, 7), # 135: p=0.304346 ( 2, 5)
(0x0dc8, 0x0000, 38, 32), # 136: p=0.074626 ( 41, 3)
(0x2c99, 0x0000, 21, 139), # 137: p=0.241378 ( 2, 7)
(0x10ca, 0x0000, 140, 172), # 138: p=0.090907 ( 13, 1)
(0x3b5f, 0x0000, 15, 9), # 139: p=0.312499 ( 3, 7)
(0x0b5d, 0x0000, 142, 170), # 140: p=0.061537 ( 20, 1)
(0x5695, 0x0000, 9, 85), # 141: p=0.411764 ( 2, 3)
(0x078a, 0x0000, 144, 168), # 142: p=0.040815 ( 31, 1)
(0x8000, 0x0000, 141, 248), # 143: p=0.500000 ( 2, 2)
(0x050f, 0x0000, 146, 166), # 144: p=0.027397 ( 47, 1)
(0x24ee, 0x0000, 147, 247), # 145: p=0.199988 ( 0, 1)
(0x0358, 0x0000, 148, 164), # 146: p=0.018099 ( 72, 1)
(0x0d30, 0x0000, 149, 197), # 147: p=0.071422 ( 0, 4)
(0x0234, 0x0000, 150, 162), # 148: p=0.011940 ( 110, 1)
(0x0481, 0x0000, 151, 95), # 149: p=0.024388 ( 0, 13)
(0x0173, 0x0000, 152, 160), # 150: p=0.007858 ( 168, 1)
(0x017a, 0x0000, 153, 173), # 151: p=0.007999 ( 0, 41)
(0x00f5, 0x0000, 154, 158), # 152: p=0.005175 ( 256, 1)
(0x007b, 0x0000, 155, 165), # 153: p=0.002611 ( 0, 127)
(0x00a1, 0x0000, 70, 156), # 154: p=0.003413 ( 389, 1)
(0x0028, 0x0000, 157, 161), # 155: p=0.000849 ( 0, 392)
(0x011a, 0x0000, 66, 60), # 156: p=0.005957 ( 389, 2)
(0x000d, 0x0000, 81, 159), # 157: p=0.000276 ( 0, 1208)
(0x01aa, 0x0000, 62, 56), # 158: p=0.009020 ( 256, 2)
(0x0034, 0x0000, 75, 71), # 159: p=0.001102 ( 1, 1208)
(0x0286, 0x0000, 58, 52), # 160: p=0.013672 ( 168, 2)
(0x00a0, 0x0000, 69, 163), # 161: p=0.003387 ( 1, 392)
(0x03d3, 0x0000, 54, 48), # 162: p=0.020710 ( 110, 2)
(0x0117, 0x0000, 65, 59), # 163: p=0.005912 ( 2, 392)
(0x05c5, 0x0000, 50, 42), # 164: p=0.031250 ( 72, 2)
(0x01ea, 0x0000, 167, 171), # 165: p=0.010362 ( 1, 127)
(0x08ad, 0x0000, 44, 38), # 166: p=0.046979 ( 47, 2)
(0x0144, 0x0000, 65, 169), # 167: p=0.006849 ( 1, 193)
(0x0ccc, 0x0000, 40, 32), # 168: p=0.069306 ( 31, 2)
(0x0234, 0x0000, 59, 53), # 169: p=0.011925 ( 2, 193)
(0x1302, 0x0000, 34, 26), # 170: p=0.102940 ( 20, 2)
(0x0353, 0x0000, 55, 47), # 171: p=0.017995 ( 2, 127)
(0x1b81, 0x0000, 30, 174), # 172: p=0.148935 ( 13, 2)
(0x05c5, 0x0000, 175, 193), # 173: p=0.031249 ( 1, 41)
(0x24ef, 0x0000, 24, 18), # 174: p=0.199999 ( 13, 3)
(0x03cf, 0x0000, 177, 191), # 175: p=0.020618 ( 1, 63)
(0x2b74, 0x0000, 178, 222), # 176: p=0.235291 ( 4, 1)
(0x0285, 0x0000, 179, 189), # 177: p=0.013652 ( 1, 96)
(0x201d, 0x0000, 180, 218), # 178: p=0.173910 ( 6, 1)
(0x01ab, 0x0000, 181, 187), # 179: p=0.009029 ( 1, 146)
(0x1715, 0x0000, 182, 216), # 180: p=0.124998 ( 9, 1)
(0x011a, 0x0000, 183, 185), # 181: p=0.005961 ( 1, 222)
(0x0fb7, 0x0000, 184, 214), # 182: p=0.085105 ( 14, 1)
(0x00ba, 0x0000, 69, 61), # 183: p=0.003925 ( 1, 338)
(0x0a67, 0x0000, 186, 212), # 184: p=0.056337 ( 22, 1)
(0x01eb, 0x0000, 59, 53), # 185: p=0.010386 ( 2, 222)
(0x06e7, 0x0000, 188, 210), # 186: p=0.037382 ( 34, 1)
(0x02e6, 0x0000, 55, 49), # 187: p=0.015695 ( 2, 146)
(0x0496, 0x0000, 190, 208), # 188: p=0.024844 ( 52, 1)
(0x045e, 0x0000, 51, 45), # 189: p=0.023648 ( 2, 96)
(0x030d, 0x0000, 192, 206), # 190: p=0.016529 ( 79, 1)
(0x0690, 0x0000, 47, 39), # 191: p=0.035533 ( 2, 63)
(0x0206, 0x0000, 194, 204), # 192: p=0.010959 ( 120, 1)
(0x09de, 0x0000, 41, 195), # 193: p=0.053434 ( 2, 41)
(0x0155, 0x0000, 196, 202), # 194: p=0.007220 ( 183, 1)
(0x0dc8, 0x0000, 37, 31), # 195: p=0.074626 ( 3, 41)
(0x00e1, 0x0000, 198, 200), # 196: p=0.004750 ( 279, 1)
(0x2b74, 0x0000, 199, 243), # 197: p=0.235291 ( 1, 4)
(0x0094, 0x0000, 72, 64), # 198: p=0.003132 ( 424, 1)
(0x201d, 0x0000, 201, 239), # 199: p=0.173910 ( 1, 6)
(0x0188, 0x0000, 62, 56), # 200: p=0.008284 ( 279, 2)
(0x1715, 0x0000, 203, 237), # 201: p=0.124998 ( 1, 9)
(0x0252, 0x0000, 58, 52), # 202: p=0.012567 ( 183, 2)
(0x0fb7, 0x0000, 205, 235), # 203: p=0.085105 ( 1, 14)
(0x0383, 0x0000, 54, 48), # 204: p=0.019021 ( 120, 2)
(0x0a67, 0x0000, 207, 233), # 205: p=0.056337 ( 1, 22)
(0x0547, 0x0000, 50, 44), # 206: p=0.028571 ( 79, 2)
(0x06e7, 0x0000, 209, 231), # 207: p=0.037382 ( 1, 34)
(0x07e2, 0x0000, 46, 38), # 208: p=0.042682 ( 52, 2)
(0x0496, 0x0000, 211, 229), # 209: p=0.024844 ( 1, 52)
(0x0bc0, 0x0000, 40, 34), # 210: p=0.063636 ( 34, 2)
(0x030d, 0x0000, 213, 227), # 211: p=0.016529 ( 1, 79)
(0x1178, 0x0000, 36, 28), # 212: p=0.094593 ( 22, 2)
(0x0206, 0x0000, 215, 225), # 213: p=0.010959 ( 1, 120)
(0x19da, 0x0000, 30, 22), # 214: p=0.139999 ( 14, 2)
(0x0155, 0x0000, 217, 223), # 215: p=0.007220 ( 1, 183)
(0x24ef, 0x0000, 26, 16), # 216: p=0.199998 ( 9, 2)
(0x00e1, 0x0000, 219, 221), # 217: p=0.004750 ( 1, 279)
(0x320e, 0x0000, 20, 220), # 218: p=0.269229 ( 6, 2)
(0x0094, 0x0000, 71, 63), # 219: p=0.003132 ( 1, 424)
(0x432a, 0x0000, 14, 8), # 220: p=0.344827 ( 6, 3)
(0x0188, 0x0000, 61, 55), # 221: p=0.008284 ( 2, 279)
(0x447d, 0x0000, 14, 224), # 222: p=0.349998 ( 4, 2)
(0x0252, 0x0000, 57, 51), # 223: p=0.012567 ( 2, 183)
(0x5ece, 0x0000, 8, 2), # 224: p=0.434782 ( 4, 3)
(0x0383, 0x0000, 53, 47), # 225: p=0.019021 ( 2, 120)
(0x8000, 0x0000, 228, 87), # 226: p=0.500000 ( 1, 1)
(0x0547, 0x0000, 49, 43), # 227: p=0.028571 ( 2, 79)
(0x481a, 0x0000, 230, 246), # 228: p=0.363634 ( 2, 1)
(0x07e2, 0x0000, 45, 37), # 229: p=0.042682 ( 2, 52)
(0x3579, 0x0000, 232, 244), # 230: p=0.285711 ( 3, 1)
(0x0bc0, 0x0000, 39, 33), # 231: p=0.063636 ( 2, 34)
(0x24ef, 0x0000, 234, 238), # 232: p=0.199997 ( 5, 1)
(0x1178, 0x0000, 35, 27), # 233: p=0.094593 ( 2, 22)
(0x1978, 0x0000, 138, 236), # 234: p=0.137929 ( 8, 1)
(0x19da, 0x0000, 29, 21), # 235: p=0.139999 ( 2, 14)
(0x2865, 0x0000, 24, 16), # 236: p=0.218748 ( 8, 2)
(0x24ef, 0x0000, 25, 15), # 237: p=0.199998 ( 2, 9)
(0x3987, 0x0000, 240, 8), # 238: p=0.304346 ( 5, 2)
(0x320e, 0x0000, 19, 241), # 239: p=0.269229 ( 2, 6)
(0x2c99, 0x0000, 22, 242), # 240: p=0.241378 ( 7, 2)
(0x432a, 0x0000, 13, 7), # 241: p=0.344827 ( 3, 6)
(0x3b5f, 0x0000, 16, 10), # 242: p=0.312499 ( 7, 3)
(0x447d, 0x0000, 13, 245), # 243: p=0.349998 ( 2, 4)
(0x5695, 0x0000, 10, 2), # 244: p=0.411764 ( 3, 2)
(0x5ece, 0x0000, 7, 1), # 245: p=0.434782 ( 3, 4)
(0x8000, 0x0000, 244, 83), # 246: p=0.500000 ( 2, 2)
(0x8000, 0x0000, 249, 250), # 247: p=0.500000 ( 1, 1)
(0x5695, 0x0000, 10, 2), # 248: p=0.411764 ( 3, 2)
(0x481a, 0x0000, 89, 143), # 249: p=0.363634 ( 1, 2)
(0x481a, 0x0000, 230, 246), # 250: p=0.363634 ( 2, 1)
(0, 0, 0, 0),
(0, 0, 0, 0),
(0, 0, 0, 0),
(0, 0, 0, 0),
(0, 0, 0, 0),
]
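# Each row above is unpacked by newtable() as (p, m, up, dn): the interval
# increment for the coded bit, the adaptation threshold tested in
# decode_sub(), and the next-state indices used after an MPS/LPS event.
# The trailing comments give the fitted probability and (MPS, LPS) counts.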
xmtf = (
0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
0x18, 0x19, 0x1A, 0x1B, 0x1C, 0x1D, 0x1E, 0x1F,
0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27,
0x28, 0x29, 0x2A, 0x2B, 0x2C, 0x2D, 0x2E, 0x2F,
0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37,
0x38, 0x39, 0x3A, 0x3B, 0x3C, 0x3D, 0x3E, 0x3F,
0x40, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47,
0x48, 0x49, 0x4A, 0x4B, 0x4C, 0x4D, 0x4E, 0x4F,
0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57,
0x58, 0x59, 0x5A, 0x5B, 0x5C, 0x5D, 0x5E, 0x5F,
0x60, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67,
0x68, 0x69, 0x6A, 0x6B, 0x6C, 0x6D, 0x6E, 0x6F,
0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76, 0x77,
0x78, 0x79, 0x7A, 0x7B, 0x7C, 0x7D, 0x7E, 0x7F,
0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
0x88, 0x89, 0x8A, 0x8B, 0x8C, 0x8D, 0x8E, 0x8F,
0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
0x98, 0x99, 0x9A, 0x9B, 0x9C, 0x9D, 0x9E, 0x9F,
0xA0, 0xA1, 0xA2, 0xA3, 0xA4, 0xA5, 0xA6, 0xA7,
0xA8, 0xA9, 0xAA, 0xAB, 0xAC, 0xAD, 0xAE, 0xAF,
0xB0, 0xB1, 0xB2, 0xB3, 0xB4, 0xB5, 0xB6, 0xB7,
0xB8, 0xB9, 0xBA, 0xBB, 0xBC, 0xBD, 0xBE, 0xBF,
0xC0, 0xC1, 0xC2, 0xC3, 0xC4, 0xC5, 0xC6, 0xC7,
0xC8, 0xC9, 0xCA, 0xCB, 0xCC, 0xCD, 0xCE, 0xCF,
0xD0, 0xD1, 0xD2, 0xD3, 0xD4, 0xD5, 0xD6, 0xD7,
0xD8, 0xD9, 0xDA, 0xDB, 0xDC, 0xDD, 0xDE, 0xDF,
0xE0, 0xE1, 0xE2, 0xE3, 0xE4, 0xE5, 0xE6, 0xE7,
0xE8, 0xE9, 0xEA, 0xEB, 0xEC, 0xED, 0xEE, 0xEF,
0xF0, 0xF1, 0xF2, 0xF3, 0xF4, 0xF5, 0xF6, 0xF7,
0xF8, 0xF9, 0xFA, 0xFB, 0xFC, 0xFD, 0xFE, 0xFF
)
# }}}
def chr3(l):
return bytes(bytearray(l))
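# chr3 packs a sequence of small ints into a byte string, e.g.
# chr3([0x41, 0x42]) == b'AB'; convert() uses it to emit decoded output.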
class BZZDecoder():
def __init__(self, infile, outfile):
self.instream = infile
self.outf = outfile
self.ieof = False
self.bptr = None
self.xsize = None
self.outbuf = [0] * (MAXBLOCK * 1024)
self.byte = None
self.scount = 0
self.delay = 25
self.a = 0
self.code = 0
self.bufint = 0
self.ctx = [0] * 300
# table
self.p = [0] * 256
self.m = [0] * 256
self.up = [0] * 256
self.dn = [0] * 256
# machine independent ffz
self.ffzt = [0] * 256
# Create machine independent ffz table
for i in range(256):
j = i
while(j & 0x80):
self.ffzt[i] += 1
j <<= 1
# Initialize table
self.newtable(default_ztable)
# Codebit counter
# Read first 16 bits of code
if not self.read_byte():
self.byte = 0xff
self.code = (self.byte << 8)
if not self.read_byte():
self.byte = 0xff
self.code = self.code | self.byte
# Preload buffer
self.preload()
# Compute initial fence
self.fence = self.code
if self.code >= 0x8000:
self.fence = 0x7fff
def convert(self, sz):
if self.ieof:
return 0
copied = 0
while sz > 0 and not (self.ieof):
# Decode if needed
if not self.xsize:
self.bptr = 0
if not self.decode(): # input block size set in decode
self.xsize = 1
self.ieof = True
self.xsize -= 1
# Compute remaining
nbytes = self.xsize
if nbytes > sz:
nbytes = sz
# Transfer (chr3 expects a sequence of byte values, so write the whole
# slice in one call; this also avoids shadowing the bytes builtin)
if nbytes:
self.outf.write(chr3(self.outbuf[self.bptr:self.bptr + nbytes]))
self.xsize -= nbytes
self.bptr += nbytes
sz -= nbytes
copied += nbytes
# offset += bytes; // for tell()
return copied
def preload(self):
while self.scount <= 24:
if self.read_byte() < 1:
self.byte = 0xff
self.delay -= 1
if self.delay < 1:
raise BZZDecoderError("ByteStream EOF")
self.bufint = (self.bufint << 8) | self.byte
self.scount += 8
def newtable(self, table):
for i in range(256):
self.p[i] = table[i][0]
self.m[i] = table[i][1]
self.up[i] = table[i][2]
self.dn[i] = table[i][3]
def decode(self):
outbuf = self.outbuf
# Decode block size
self.xsize = self.decode_raw(24)
if not self.xsize:
return 0
if self.xsize > MAXBLOCK * 1024: # 4MB (4096 * 1024) is max block
raise BZZDecoderError("ByteStream.corrupt")
# Decode Estimation Speed
fshift = 0
if self.zpcodec_decoder():
fshift += 1
if self.zpcodec_decoder():
fshift += 1
# Prepare Quasi MTF
mtf = list(xmtf) # unsigned chars
freq = [0] * FREQMAX
fadd = 4
# Decode
mtfno = 3
markerpos = -1
for i in range(self.xsize):
ctxid = CTXIDS - 1
if ctxid > mtfno:
ctxid = mtfno
cx = self.ctx
if self.zpcodec_decode(cx, ctxid):
mtfno = 0
outbuf[i] = mtf[mtfno]
elif self.zpcodec_decode(cx, ctxid + CTXIDS):
mtfno = 1
outbuf[i] = mtf[mtfno]
elif self.zpcodec_decode(cx, 2*CTXIDS):
mtfno = 2 + self.decode_binary(cx, 2*CTXIDS + 1, 1)
outbuf[i] = mtf[mtfno]
elif self.zpcodec_decode(cx, 2*CTXIDS+2):
mtfno = 4 + self.decode_binary(cx, 2*CTXIDS+2 + 1, 2)
outbuf[i] = mtf[mtfno]
elif self.zpcodec_decode(cx, 2*CTXIDS + 6):
mtfno = 8 + self.decode_binary(cx, 2*CTXIDS + 6 + 1, 3)
outbuf[i] = mtf[mtfno]
elif self.zpcodec_decode(cx, 2*CTXIDS + 14):
mtfno = 16 + self.decode_binary(cx, 2*CTXIDS + 14 + 1, 4)
outbuf[i] = mtf[mtfno]
elif self.zpcodec_decode(cx, 2*CTXIDS + 30):
mtfno = 32 + self.decode_binary(cx, 2*CTXIDS + 30 + 1, 5)
outbuf[i] = mtf[mtfno]
elif self.zpcodec_decode(cx, 2*CTXIDS + 62):
mtfno = 64 + self.decode_binary(cx, 2*CTXIDS + 62 + 1, 6)
outbuf[i] = mtf[mtfno]
elif self.zpcodec_decode(cx, 2*CTXIDS + 126):
mtfno = 128 + self.decode_binary(cx, 2*CTXIDS + 126 + 1, 7)
outbuf[i] = mtf[mtfno]
else:
mtfno = 256 # EOB
outbuf[i] = 0
markerpos = i
continue
# Rotate mtf according to empirical frequencies (new!)
# :rotate label
# Adjust frequencies for overflow
fadd = fadd + (fadd >> fshift)
if fadd > 0x10000000:
fadd >>= 24
freq[0] >>= 24
freq[1] >>= 24
freq[2] >>= 24
freq[3] >>= 24
for k in range(4, FREQMAX):
freq[k] = freq[k] >> 24
# Relocate new char according to new freq
fc = fadd
if mtfno < FREQMAX:
fc += freq[mtfno]
k = mtfno
while (k >= FREQMAX):
mtf[k] = mtf[k - 1]
k -= 1
while (k > 0 and fc >= freq[k - 1]):
mtf[k] = mtf[k - 1]
freq[k] = freq[k - 1]
k -= 1
mtf[k] = outbuf[i]
freq[k] = fc
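# The decoded byte thus moves toward the front of mtf, ordered by its
# running frequency estimate fc: frequent symbols keep small mtf indices
# and so stay cheap to code on later occurrences.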
#///////////////////////////////
#//////// Reconstruct the string
if markerpos < 1 or markerpos >= self.xsize:
raise BZZDecoderError("BiteStream.corrupt")
# Allocate pointers
posn = [0] * self.xsize
# Prepare count buffer
count = [0] * 256
# Fill count buffer
for i in range(markerpos):
c = outbuf[i]
posn[i] = (c << 24) | (count[c] & 0xffffff)
count[c] += 1
for i in range(markerpos + 1, self.xsize):
c = outbuf[i]
posn[i] = (c << 24) | (count[c] & 0xffffff)
count[c] += 1
# Compute sorted char positions
last = 1
for i in range(256):
tmp = count[i]
count[i] = last
last += tmp
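# count[c] now holds the first position of symbol c in sorted order (a
# cumulative histogram starting at 1, leaving slot 0 for the marker),
# ready for walking the inverse Burrows-Wheeler permutation below.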
# Undo the sort transform
i = 0
last = self.xsize - 1
while last > 0:
n = posn[i]
c = (posn[i] >> 24)
last -= 1
outbuf[last] = c
i = count[c] + (n & 0xffffff)
# Free and check
if i != markerpos:
raise BZZDecoderError("BiteStream.corrupt")
return self.xsize
def decode_raw(self, bits):
n = 1
m = (1 << bits)
while n < m:
b = self.zpcodec_decoder()
n = (n << 1) | b
return n - m
def decode_binary(self, ctx, index, bits):
n = 1
m = (1 << bits)
while n < m:
b = self.zpcodec_decode(ctx, index + n - 1)
n = (n << 1) | b
return n - m
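# Both helpers above decode an unsigned 'bits'-wide integer one bit at a
# time, MSB first: n starts at 1 as a sentinel and the final n - m strips
# it, e.g. for bits=2 and decoded bits (1, 0): n = 0b110, n - m = 0b10.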
def zpcodec_decoder(self):
return self.decode_sub_simple(0, 0x8000 + (self.a >> 1))
def decode_sub_simple(self, mps, z):
# Test MPS/LPS
if z > self.code:
# LPS branch
z = 0x10000 - z
self.a += z
self.code = self.code + z
# LPS renormalization
shift = self.ffz()
self.scount -= shift
self.a = self.a << shift
self.a &= 0xffff
self.code = (self.code << shift) | ((self.bufint >> self.scount) & ((1 << shift) - 1))
self.code &= 0xffff
if self.scount < 16:
self.preload()
# Adjust fence
self.fence = self.code
if self.code >= 0x8000:
self.fence = 0x7fff
result = mps ^ 1
else:
# MPS renormalization
self.scount -= 1
self.a = (z << 1) & 0xffff
self.code = ((self.code << 1) | ((self.bufint >> self.scount) & 1))
self.code &= 0xffff
if self.scount < 16:
self.preload()
# Adjust fence
self.fence = self.code
if self.code >= 0x8000:
self.fence = 0x7fff
result = mps
return result
def decode_sub(self, ctx, index, z):
# Save bit
bit = (ctx[index] & 1)
# Avoid interval reversion
d = 0x6000 + ((z + self.a) >> 2)
if z > d:
z = d
# Test MPS/LPS
if z > self.code:
# LPS branch
z = 0x10000 - z
self.a += z
self.code = self.code + z
# LPS adaptation
ctx[index] = self.dn[ctx[index]]
# LPS renormalization
shift = self.ffz()
self.scount -= shift
self.a = (self.a << shift) & 0xffff
self.code = ((self.code << shift) | ((self.bufint >> self.scount) & ((1 << shift) - 1))) & 0xffff
if self.scount < 16:
self.preload()
# Adjust fence
self.fence = self.code
if self.code >= 0x8000:
self.fence = 0x7fff
return bit ^ 1
else:
# MPS adaptation
if self.a >= self.m[ctx[index]]:
ctx[index] = self.up[ctx[index]]
# MPS renormalization
self.scount -= 1
self.a = z << 1 & 0xffff
self.code = ((self.code << 1) | ((self.bufint >> self.scount) & 1)) & 0xffff
if self.scount < 16:
self.preload()
# Adjust fence
self.fence = self.code
if self.code >= 0x8000:
self.fence = 0x7fff
return bit
def zpcodec_decode(self, ctx, index):
z = self.a + self.p[ctx[index]]
if z <= self.fence:
self.a = z
res = (ctx[index] & 1)
else:
res = self.decode_sub(ctx, index, z)
return res
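# Fast path: while the widened interval stays at or below the fence no
# renormalization is needed, so the stored MPS bit is returned directly
# and the full decode_sub() is skipped.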
def read_byte(self):
res = 0
if self.instream:
ires = self.instream.read(1)
res = len(ires)
if res:
self.byte = ord(ires[0])
else:
raise NotImplementedError
return res
def ffz(self):
x = self.a
if (x >= 0xff00):
return (self.ffzt[x & 0xff] + 8)
else:
return (self.ffzt[(x >> 8) & 0xff])
### for testing
def main():
import sys
infile = open(sys.argv[1], "rb")
outfile = open(sys.argv[2], "wb")
dec = BZZDecoder(infile, outfile)
while True:
res = dec.convert(1024 * 1024)
if not res:
break
if __name__ == "__main__":
main()
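# Minimal usage sketch for the test harness above (hypothetical file names):
#   python djvubzzdec.py input.bzz output.txt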

View File

@ -0,0 +1,87 @@
# -*- coding: utf-8 -*-
from __future__ import (unicode_literals, division, absolute_import,
print_function)
__license__ = 'GPL 3'
__copyright__ = '2011, Anthon van der Neut <anthon@mnt.org>'
__docformat__ = 'restructuredtext en'
import os
from subprocess import Popen, PIPE
from cStringIO import StringIO
from calibre.customize.conversion import InputFormatPlugin, OptionRecommendation
from calibre.ebooks.txt.processor import convert_basic
class DJVUInput(InputFormatPlugin):
name = 'DJVU Input'
author = 'Anthon van der Neut'
description = 'Convert OCR-ed DJVU files (.djvu) to HTML'
file_types = set(['djvu', 'djv'])
options = set([
OptionRecommendation(name='use_djvutxt', recommended_value=True,
help=_('Try to use the djvutxt program and fall back to pure '
'python implementation if it fails or is not available')),
])
def convert(self, stream, options, file_ext, log, accelerators):
stdout = StringIO()
ppdjvu = True
# using djvutxt is MUCH faster; it is controlled by the use_djvutxt option
if options.use_djvutxt and os.path.exists('/usr/bin/djvutxt'):
from calibre.ptempfile import PersistentTemporaryFile
try:
fp = PersistentTemporaryFile(suffix='.djvu', prefix='djv_input')
filename = fp._name
fp.write(stream.read())
fp.close()
cmd = ['djvutxt', filename]
stdout.write(Popen(cmd, stdout=PIPE, close_fds=True).communicate()[0])
os.remove(filename)
ppdjvu = False
except:
stream.seek(0) # retry with the pure python converter
if ppdjvu:
from .djvu import DJVUFile
x = DJVUFile(stream)
x.get_text(stdout)
html = convert_basic(stdout.getvalue().replace(b"\n", b' ').replace(
b'\037', b'\n\n'))
# Run the HTMLized text through the html processing plugin.
from calibre.customize.ui import plugin_for_input_format
html_input = plugin_for_input_format('html')
for opt in html_input.options:
setattr(options, opt.option.name, opt.recommended_value)
options.input_encoding = 'utf-8'
base = os.getcwdu()
if file_ext != 'txtz' and hasattr(stream, 'name'):
base = os.path.dirname(stream.name)
fname = os.path.join(base, 'index.html')
c = 0
while os.path.exists(fname):
c += 1
fname = os.path.join(base, 'index%d.html' % c)
htmlfile = open(fname, 'wb')
with htmlfile:
htmlfile.write(html.encode('utf-8'))
odi = options.debug_pipeline
options.debug_pipeline = None
# Generate oeb from html conversion.
with open(htmlfile.name, 'rb') as f:
oeb = html_input.convert(f, options, 'html', log,
{})
options.debug_pipeline = odi
os.remove(htmlfile.name)
# Set metadata from file.
from calibre.customize.ui import get_file_type_metadata
from calibre.ebooks.oeb.transforms.metadata import meta_info_to_oeb_metadata
mi = get_file_type_metadata(stream, file_ext)
meta_info_to_oeb_metadata(mi, oeb.metadata, log)
return oeb
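# A sketch of exercising this plugin end to end via calibre's standard
# conversion entry point (hypothetical file names):
#   ebook-convert scanned.djvu scanned.epub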

View File

@ -325,6 +325,10 @@ class MobiReader(object):
self.processed_html = self.processed_html.replace('</</', '</')
self.processed_html = re.sub(r'</([a-zA-Z]+)<', r'</\1><',
self.processed_html)
# Remove tags of the form <xyz: ...> as they can cause issues further
# along the pipeline
self.processed_html = re.sub(r'</{0,1}[a-zA-Z]+:\s+[^>]*>', '',
self.processed_html)
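# Illustration of what the pattern strips (hypothetical input):
#   re.sub(r'</{0,1}[a-zA-Z]+:\s+[^>]*>', '', '<pre: >text</app: x>')
#   -> 'text'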
for pat in ENCODING_PATS:
self.processed_html = pat.sub('', self.processed_html)

View File

@ -212,7 +212,11 @@ class Serializer(object):
if tocref.klass == "periodical":
buf.write('<div> <div height="1em"></div>')
else:
buf.write('<div></div> <div> <h2 height="1em"><font size="+2"><b>'+tocref.title+'</b></font></h2> <div height="1em"></div>')
t = tocref.title
if isinstance(t, unicode):
t = t.encode('utf-8')
buf.write('<div></div> <div> <h2 height="1em"><font size="+2"><b>'
+t+'</b></font></h2> <div height="1em"></div>')
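# The explicit encode matters because buf collects bytes; writing a
# unicode title containing non-ASCII (e.g. u'Caf\xe9') would otherwise
# fail with a UnicodeError, while u'Caf\xe9'.encode('utf-8') mixes in
# cleanly.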
buf.write('<ul>')
@ -221,14 +225,17 @@ class Serializer(object):
itemhref = tocitem.href
if tocref.klass == 'periodical':
# This is a section node.
# For periodical toca, the section urls are like r'feed_\d+/index.html'
# For periodical tocs, the section urls are like r'feed_\d+/index.html'
# We don't want to point to the start of the first article
# so we change the href.
itemhref = re.sub(r'article_\d+/', '', itemhref)
self.href_offsets[itemhref].append(buf.tell())
buf.write('0000000000')
buf.write(' ><font size="+1" color="blue"><b><u>')
buf.write(tocitem.title)
t = tocitem.title
if isinstance(t, unicode):
t = t.encode('utf-8')
buf.write(t)
buf.write('</u></b></font></a></li>')
buf.write('</ul><div height="1em"></div></div><mbp:pagebreak />')

View File

@ -246,6 +246,7 @@ class CSSFlattener(object):
cssdict['font-size'] = '%.1fpt'%font_size
del node.attrib['size']
if 'face' in node.attrib:
cssdict['font-family'] = node.attrib['face']
del node.attrib['face']
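# E.g. <font face="Verdana" size="3"> now contributes both a font-size
# and font-family: Verdana to cssdict before the attributes are dropped.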
if 'color' in node.attrib:
cssdict['color'] = node.attrib['color']

View File

@ -305,11 +305,13 @@ class RTFInput(InputFormatPlugin):
html = 'index.xhtml'
with open(html, 'wb') as f:
res = transform.tostring(result)
res = res[:100].replace('xmlns:html', 'xmlns') + res[100:]
# res = res[:100].replace('xmlns:html', 'xmlns') + res[100:]
#clean multiple \n
res = re.sub('\n+', '\n', res)
# Replace newlines inserted by the 'empty_paragraphs' option in rtf2xml with html blank lines
res = re.sub('\s*<body>', '<body>', res)
res = re.sub('(?<=\n)\n{2}',
u'<p>\u00a0</p>\n'.encode('utf-8'), res)
# res = re.sub('\s*<body>', '<body>', res)
# res = re.sub('(?<=\n)\n{2}',
# u'<p>\u00a0</p>\n'.encode('utf-8'), res)
f.write(res)
self.write_inline_css(inline_class, border_styles)
stream.seek(0)

View File

@ -376,13 +376,13 @@ class ParseRtf:
msg += 'self.__run_level is "%s"\n' % self.__run_level
raise RtfInvalidCodeException, msg
if self.__run_level > 1:
sys.stderr.write(_('File could be older RTF...\n'))
sys.stderr.write('File could be older RTF...\n')
if found_destination:
if self.__run_level > 1:
sys.stderr.write(_(
sys.stderr.write(
'File also has newer RTF.\n'
'Will do the best to convert.\n'
))
)
add_brackets_obj = add_brackets.AddBrackets(
in_file = self.__temp_file,
bug_handler = RtfInvalidCodeException,

View File

@ -11,11 +11,11 @@
# #
# #
#########################################################################
import sys, os, tempfile
import sys, os, tempfile
from calibre.ebooks.rtf2xml import copy, check_brackets
# note to self. This is the first module in which I use tempfile. A good idea?
"""
"""
class AddBrackets:
"""
Add brackets for old RTF.
@ -41,6 +41,7 @@ class AddBrackets:
self.__copy = copy
self.__write_to = tempfile.mktemp()
self.__run_level = run_level
def __initiate_values(self):
"""
"""
@ -82,14 +83,16 @@ class AddBrackets:
'cw<ci<subscript_' ,
'cw<ci<superscrip',
'cw<ci<underlined' ,
'cw<ul<underlined' ,
# 'cw<ul<underlined' ,
]
def __before_body_func(self, line):
"""
"""
if self.__token_info == 'mi<mk<body-open_':
self.__state = 'in_body'
self.__write_obj.write(line)
def __in_body_func(self, line):
"""
"""
@ -108,6 +111,7 @@ class AddBrackets:
self.__state = 'after_control_word'
else:
self.__write_obj.write(line)
def __after_control_word_func(self, line):
"""
"""
@ -122,6 +126,7 @@ class AddBrackets:
self.__ignore_count = self.__ob_count
else:
self.__state = 'in_body'
def __write_group(self):
"""
"""
@ -141,6 +146,7 @@ class AddBrackets:
self.__write_obj.write(inline_string)
self.__open_bracket = 1
self.__temp_group = []
def __change_permanent_group(self):
"""
use temp group to change permanent group
@ -150,6 +156,7 @@ class AddBrackets:
if token_info in self.__accept:
att = line[20:-1]
self.__inline[token_info] = att
def __ignore_func(self, line):
"""
Don't add any brackets while inside brackets RTF has already supplied.
@ -159,12 +166,14 @@ class AddBrackets:
if self.__token_info == 'cb<nu<clos-brack'and\
self.__cb_count == self.__ignore_count:
self.__state = 'in_body'
def __check_brackets(self, in_file):
self.__check_brack_obj = check_brackets.CheckBrackets\
(file = in_file)
good_br = self.__check_brack_obj.check_brackets()[0]
if not good_br:
return 1
def add_brackets(self):
"""
"""

View File

@ -397,6 +397,7 @@ class AddAction(InterfaceAction):
d = error_dialog(self.gui, _('Add to library'), _('No book files found'))
d.exec_()
return
paths = self.gui.device_manager.device.prepare_addable_books(paths)
from calibre.gui2.add import Adder
self.__adder_func = partial(self._add_from_device_adder, on_card=None,
model=view.model())

View File

@ -5,14 +5,57 @@ __license__ = 'GPL v3'
__copyright__ = '2010, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
import os, datetime
from PyQt4.Qt import pyqtSignal, QModelIndex, QThread, Qt
from calibre.gui2 import error_dialog
from calibre.ebooks.BeautifulSoup import BeautifulSoup, Tag, NavigableString
from calibre import strftime
from calibre.gui2.actions import InterfaceAction
from calibre.devices.usbms.device import Device
from calibre.gui2.dialogs.progress import ProgressDialog
class Updater(QThread): # {{{
update_progress = pyqtSignal(int)
update_done = pyqtSignal()
def __init__(self, parent, db, device, annotation_map, done_callback):
QThread.__init__(self, parent)
self.errors = {}
self.db = db
self.keep_going = True
self.pd = ProgressDialog(_('Merging user annotations into database'), '',
0, len(annotation_map), parent=parent)
self.device = device
self.annotation_map = annotation_map
self.done_callback = done_callback
self.pd.canceled_signal.connect(self.canceled)
self.pd.setModal(True)
self.pd.show()
self.update_progress.connect(self.pd.set_value,
type=Qt.QueuedConnection)
self.update_done.connect(self.pd.hide, type=Qt.QueuedConnection)
def canceled(self):
self.keep_going = False
self.pd.hide()
def run(self):
for i, id_ in enumerate(self.annotation_map):
if not self.keep_going:
break
bm = Device.UserAnnotation(self.annotation_map[id_][0],
self.annotation_map[id_][1])
try:
self.device.add_annotation_to_library(self.db, id_, bm)
except:
import traceback
self.errors[id_] = traceback.format_exc()
self.update_progress.emit(i)
self.update_done.emit()
self.done_callback(self.annotation_map.keys(), self.errors)
# }}}
class FetchAnnotationsAction(InterfaceAction):
@ -41,13 +84,21 @@ class FetchAnnotationsAction(InterfaceAction):
fmts.append(format.lower())
return fmts
def get_device_path_from_id(id_):
paths = []
for x in ('memory', 'card_a', 'card_b'):
x = getattr(self.gui, x+'_view').model()
paths += x.paths_for_db_ids(set([id_]), as_map=True)[id_]
return paths[0].path if paths else None
def generate_annotation_paths(ids, db, device):
# Generate path templates
# Individual storage mount points scanned/resolved in driver.get_annotations()
path_map = {}
for id in ids:
path = get_device_path_from_id(id)
mi = db.get_metadata(id, index_is_id=True)
a_path = device.create_upload_path(os.path.abspath('/<storage>'), mi, 'x.bookmark', create_dirs=False)
a_path = device.create_annotations_path(mi, device_path=path)
path_map[id] = dict(path=a_path, fmts=get_formats(id))
return path_map
@ -78,166 +129,6 @@ class FetchAnnotationsAction(InterfaceAction):
path_map)
def annotations_fetched(self, job):
from calibre.devices.usbms.device import Device
from calibre.ebooks.metadata import MetaInformation
from calibre.gui2.dialogs.progress import ProgressDialog
from calibre.library.cli import do_add_format
class Updater(QThread): # {{{
update_progress = pyqtSignal(int)
update_done = pyqtSignal()
FINISHED_READING_PCT_THRESHOLD = 96
def __init__(self, parent, db, annotation_map, done_callback):
QThread.__init__(self, parent)
self.db = db
self.pd = ProgressDialog(_('Merging user annotations into database'), '',
0, len(job.result), parent=parent)
self.am = annotation_map
self.done_callback = done_callback
self.pd.canceled_signal.connect(self.canceled)
self.pd.setModal(True)
self.pd.show()
self.update_progress.connect(self.pd.set_value,
type=Qt.QueuedConnection)
self.update_done.connect(self.pd.hide, type=Qt.QueuedConnection)
def generate_annotation_html(self, bookmark):
# Returns <div class="user_annotations"> ... </div>
last_read_location = bookmark.last_read_location
timestamp = datetime.datetime.utcfromtimestamp(bookmark.timestamp)
percent_read = bookmark.percent_read
ka_soup = BeautifulSoup()
dtc = 0
divTag = Tag(ka_soup,'div')
divTag['class'] = 'user_annotations'
# Add the last-read location
spanTag = Tag(ka_soup, 'span')
spanTag['style'] = 'font-weight:bold'
if bookmark.book_format == 'pdf':
spanTag.insert(0,NavigableString(
_("%(time)s<br />Last Page Read: %(loc)d (%(pr)d%%)") % \
dict(time=strftime(u'%x', timestamp.timetuple()),
loc=last_read_location,
pr=percent_read)))
else:
spanTag.insert(0,NavigableString(
_("%(time)s<br />Last Page Read: Location %(loc)d (%(pr)d%%)") % \
dict(time=strftime(u'%x', timestamp.timetuple()),
loc=last_read_location,
pr=percent_read)))
divTag.insert(dtc, spanTag)
dtc += 1
divTag.insert(dtc, Tag(ka_soup,'br'))
dtc += 1
if bookmark.user_notes:
user_notes = bookmark.user_notes
annotations = []
# Add the annotations sorted by location
# Italicize highlighted text
for location in sorted(user_notes):
if user_notes[location]['text']:
annotations.append(
_('<b>Location %(dl)d &bull; %(typ)s</b><br />%(text)s<br />') % \
dict(dl=user_notes[location]['displayed_location'],
typ=user_notes[location]['type'],
text=(user_notes[location]['text'] if \
user_notes[location]['type'] == 'Note' else \
'<i>%s</i>' % user_notes[location]['text'])))
else:
if bookmark.book_format == 'pdf':
annotations.append(
_('<b>Page %(dl)d &bull; %(typ)s</b><br />') % \
dict(dl=user_notes[location]['displayed_location'],
typ=user_notes[location]['type']))
else:
annotations.append(
_('<b>Location %(dl)d &bull; %(typ)s</b><br />') % \
dict(dl=user_notes[location]['displayed_location'],
typ=user_notes[location]['type']))
for annotation in annotations:
divTag.insert(dtc, annotation)
dtc += 1
ka_soup.insert(0,divTag)
return ka_soup
'''
def mark_book_as_read(self,id):
read_tag = gprefs.get('catalog_epub_mobi_read_tag')
if read_tag:
self.db.set_tags(id, [read_tag], append=True)
'''
def canceled(self):
self.pd.hide()
def run(self):
ignore_tags = set(['Catalog','Clippings'])
for (i, id) in enumerate(self.am):
bm = Device.UserAnnotation(self.am[id][0],self.am[id][1])
if bm.type == 'kindle_bookmark':
mi = self.db.get_metadata(id, index_is_id=True)
user_notes_soup = self.generate_annotation_html(bm.value)
if mi.comments:
a_offset = mi.comments.find('<div class="user_annotations">')
ad_offset = mi.comments.find('<hr class="annotations_divider" />')
if a_offset >= 0:
mi.comments = mi.comments[:a_offset]
if ad_offset >= 0:
mi.comments = mi.comments[:ad_offset]
if set(mi.tags).intersection(ignore_tags):
continue
if mi.comments:
hrTag = Tag(user_notes_soup,'hr')
hrTag['class'] = 'annotations_divider'
user_notes_soup.insert(0,hrTag)
mi.comments += user_notes_soup.prettify()
else:
mi.comments = unicode(user_notes_soup.prettify())
# Update library comments
self.db.set_comment(id, mi.comments)
'''
# Update 'read' tag except for Catalogs/Clippings
if bm.value.percent_read >= self.FINISHED_READING_PCT_THRESHOLD:
if not set(mi.tags).intersection(ignore_tags):
self.mark_book_as_read(id)
'''
# Add bookmark file to id
self.db.add_format_with_hooks(id, bm.value.bookmark_extension,
bm.value.path, index_is_id=True)
self.update_progress.emit(i)
elif bm.type == 'kindle_clippings':
# Find 'My Clippings' author=Kindle in database, or add
last_update = 'Last modified %s' % strftime(u'%x %X',bm.value['timestamp'].timetuple())
mc_id = list(db.data.parse('title:"My Clippings"'))
if mc_id:
do_add_format(self.db, mc_id[0], 'TXT', bm.value['path'])
mi = self.db.get_metadata(mc_id[0], index_is_id=True)
mi.comments = last_update
self.db.set_metadata(mc_id[0], mi)
else:
mi = MetaInformation('My Clippings', authors = ['Kindle'])
mi.tags = ['Clippings']
mi.comments = last_update
self.db.add_books([bm.value['path']], ['txt'], [mi])
self.update_done.emit()
self.done_callback(self.am.keys())
# }}}
if not job.result: return
@ -246,9 +137,25 @@ class FetchAnnotationsAction(InterfaceAction):
_('User annotations generated from main library only'),
show=True)
db = self.gui.library_view.model().db
device = self.gui.device_manager.device
self.__annotation_updater = Updater(self.gui, db, job.result,
self.Dispatcher(self.gui.library_view.model().refresh_ids))
self.__annotation_updater = Updater(self.gui, db, device, job.result,
self.Dispatcher(self.annotations_updated))
self.__annotation_updater.start()
def annotations_updated(self, ids, errors):
self.gui.library_view.model().refresh_ids(ids)
if errors:
db = self.gui.library_view.model().db
entries = []
for id_, tb in errors.iteritems():
title = id_
if isinstance(id_, int):
title = db.title(id_, index_is_id=True)
entries.extend([title, tb, ''])
error_dialog(self.gui, _('Some errors'),
_('Could not fetch annotations for some books. Click '
'show details to see which ones.'),
det_msg='\n'.join(entries), show=True)

View File

@ -204,6 +204,7 @@ def render_data(mi, use_roman_numbers=True, all_fields=False):
class CoverView(QWidget): # {{{
cover_changed = pyqtSignal(object, object)
cover_removed = pyqtSignal(object)
def __init__(self, vertical, parent=None):
QWidget.__init__(self, parent)
@ -289,10 +290,12 @@ class CoverView(QWidget): # {{{
cm = QMenu(self)
paste = cm.addAction(_('Paste Cover'))
copy = cm.addAction(_('Copy Cover'))
remove = cm.addAction(_('Remove Cover'))
if not QApplication.instance().clipboard().mimeData().hasImage():
paste.setEnabled(False)
copy.triggered.connect(self.copy_to_clipboard)
paste.triggered.connect(self.paste_from_clipboard)
remove.triggered.connect(self.remove_cover)
cm.exec_(ev.globalPos())
def copy_to_clipboard(self):
@ -315,6 +318,13 @@ class CoverView(QWidget): # {{{
self.cover_changed.emit(id_,
pixmap_to_data(pmap))
def remove_cover(self):
id_ = self.data.get('id', None)
self.pixmap = self.default_pixmap
self.do_layout()
self.update()
if id_ is not None:
self.cover_removed.emit(id_)
# }}}
@ -457,6 +467,7 @@ class BookDetails(QWidget): # {{{
remote_file_dropped = pyqtSignal(object, object)
files_dropped = pyqtSignal(object, object)
cover_changed = pyqtSignal(object, object)
cover_removed = pyqtSignal(object)
# Drag 'n drop {{{
DROPABBLE_EXTENSIONS = IMAGE_EXTENSIONS+BOOK_EXTENSIONS
@ -514,6 +525,7 @@ class BookDetails(QWidget): # {{{
self.cover_view = CoverView(vertical, self)
self.cover_view.cover_changed.connect(self.cover_changed.emit)
self.cover_view.cover_removed.connect(self.cover_removed.emit)
self._layout.addWidget(self.cover_view)
self.book_info = BookInfo(vertical, self)
self._layout.addWidget(self.book_info)

View File

@ -0,0 +1,24 @@
# coding: utf-8
from __future__ import (unicode_literals, division, absolute_import,
print_function)
__license__ = 'GPL v3'
__copyright__ = '2011, Anthon van der Neut <A.van.der.Neut@ruamel.eu>'
from calibre.gui2.convert.djvu_input_ui import Ui_Form
from calibre.gui2.convert import Widget
class PluginWidget(Widget, Ui_Form):
TITLE = _('DJVU Input')
HELP = _('Options specific to')+' DJVU '+_('input')
COMMIT_NAME = 'djvu_input'
ICON = I('mimetypes/djvu.png')
def __init__(self, parent, get_option, get_help, db=None, book_id=None):
Widget.__init__(self, parent,
['use_djvutxt', ])
self.db, self.book_id = db, book_id
self.initialize_options(get_option, get_help, db, book_id)

View File

@ -0,0 +1,28 @@
<?xml version="1.0" encoding="UTF-8"?>
<ui version="4.0">
<class>Form</class>
<widget class="QWidget" name="Form">
<property name="geometry">
<rect>
<x>0</x>
<y>0</y>
<width>400</width>
<height>300</height>
</rect>
</property>
<property name="windowTitle">
<string>Form</string>
</property>
<layout class="QVBoxLayout" name="verticalLayout">
<item>
<widget class="QCheckBox" name="opt_use_djvutxt">
<property name="text">
<string>Use &amp;djvutxt, if available, for faster processing</string>
</property>
</widget>
</item>
</layout>
</widget>
<resources/>
<connections/>
</ui>

View File

@ -45,7 +45,7 @@ class TemplateHighlighter(QSyntaxHighlighter):
"keyword"))
TemplateHighlighter.Rules.append((QRegExp(
"|".join([r"\b%s\b" % builtin for builtin in
formatter_functions.get_builtins()])),
formatter_functions().get_builtins()])),
"builtin"))
TemplateHighlighter.Rules.append((QRegExp(
@ -248,8 +248,8 @@ class TemplateDialog(QDialog, Ui_TemplateDialog):
except:
self.builtin_source_dict = {}
self.funcs = formatter_functions.get_functions()
self.builtins = formatter_functions.get_builtins()
self.funcs = formatter_functions().get_functions()
self.builtins = formatter_functions().get_builtins()
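# formatter_functions is now called as a function that returns the
# registry object, so every former attribute access gains parentheses,
# e.g. formatter_functions().get_builtins()['lowercase'].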
func_names = sorted(self.funcs)
self.function.clear()

View File

@ -261,6 +261,8 @@ class LayoutMixin(object): # {{{
self.book_details.files_dropped.connect(self.iactions['Add Books'].files_dropped_on_book)
self.book_details.cover_changed.connect(self.bd_cover_changed,
type=Qt.QueuedConnection)
self.book_details.cover_removed.connect(self.bd_cover_removed,
type=Qt.QueuedConnection)
self.book_details.remote_file_dropped.connect(
self.iactions['Add Books'].remote_file_dropped_on_book,
type=Qt.QueuedConnection)
@ -279,6 +281,12 @@ class LayoutMixin(object): # {{{
if self.cover_flow:
self.cover_flow.dataChanged()
def bd_cover_removed(self, id_):
self.library_view.model().db.remove_cover(id_, commit=True,
notify=False)
if self.cover_flow:
self.cover_flow.dataChanged()
def save_layout_state(self):
for x in ('library', 'memory', 'card_a', 'card_b'):
getattr(self, x+'_view').save_state()

View File

@ -500,7 +500,8 @@ class JobsDialog(QDialog, Ui_JobsDialog):
def kill_job(self, *args):
rows = [index.row() for index in
self.jobs_view.selectionModel().selectedRows()]
return error_dialog(self, _('No job'),
if not rows:
return error_dialog(self, _('No job'),
_('No job selected'), show=True)
if question_dialog(self, _('Are you sure?'),
ngettext('Do you really want to stop the selected job?',

View File

@ -1239,11 +1239,14 @@ class DeviceBooksModel(BooksModel): # {{{
def paths(self, rows):
return [self.db[self.map[r.row()]].path for r in rows ]
def paths_for_db_ids(self, db_ids):
res = []
def paths_for_db_ids(self, db_ids, as_map=False):
res = defaultdict(list) if as_map else []
for r,b in enumerate(self.db):
if b.application_id in db_ids:
res.append((r,b))
if as_map:
res[b.application_id].append(b)
else:
res.append((r,b))
return res
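# E.g. (hypothetical ids): paths_for_db_ids({3}) still returns
# [(row, book), ...], while paths_for_db_ids({3}, as_map=True) returns
# {3: [book, ...]}.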
def get_collections_with_ids(self):

View File

@ -127,7 +127,7 @@ class CreateCustomColumn(QDialog, Ui_QCreateCustomColumn):
self.composite_sort_by.setCurrentIndex(sb)
self.composite_make_category.setChecked(
c['display'].get('make_category', False))
self.composite_make_category.setChecked(
self.composite_contains_html.setChecked(
c['display'].get('contains_html', False))
elif ct == 'enumeration':
self.enum_box.setText(','.join(c['display'].get('enum_values', [])))

View File

@ -206,7 +206,7 @@
<item>
<widget class="QCheckBox" name="opt_autolaunch_server">
<property name="text">
<string>Run server &amp;automatically on startup</string>
<string>Run server &amp;automatically when calibre starts</string>
</property>
</widget>
</item>

View File

@ -82,8 +82,8 @@ class ConfigWidget(ConfigWidgetBase, Ui_Form):
traceback.print_exc()
self.builtin_source_dict = {}
self.funcs = formatter_functions.get_functions()
self.builtins = formatter_functions.get_builtins_and_aliases()
self.funcs = formatter_functions().get_functions()
self.builtins = formatter_functions().get_builtins_and_aliases()
self.build_function_names_box()
self.function_name.currentIndexChanged[str].connect(self.function_index_changed)
@ -217,13 +217,13 @@ class ConfigWidget(ConfigWidgetBase, Ui_Form):
pass
def commit(self):
formatter_functions.reset_to_builtins()
formatter_functions().reset_to_builtins()
pref_value = []
for f in self.funcs:
if f in self.builtins:
continue
func = self.funcs[f]
formatter_functions.register_function(func)
formatter_functions().register_function(func)
pref_value.append((func.name, func.doc, func.arg_count, func.program_text))
self.db.prefs.set('user_template_functions', pref_value)

View File

@ -37,6 +37,7 @@ class SearchRestrictionMixin(object):
search = unicode(search)
if not search:
self.search_restriction.setCurrentIndex(0)
self._apply_search_restriction('')
else:
s = '*' + search
if self.search_restriction.count() > 1:

View File

@ -47,6 +47,9 @@ def get_parser(usage):
def get_db(dbpath, options):
if options.library_path is not None:
dbpath = options.library_path
if dbpath is None:
raise ValueError('No saved library path, either run the GUI or use the'
' --with-library option')
dbpath = os.path.abspath(dbpath)
return LibraryDatabase2(dbpath)
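# With this guard, running e.g. 'calibredb list' before any library path
# has been saved fails with a clear message instead of an
# os.path.abspath(None) traceback.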
@ -365,6 +368,7 @@ def command_remove(args, dbpath):
def do_add_format(db, id, fmt, path):
db.add_format_with_hooks(id, fmt.upper(), path, index_is_id=True)
send_message()
def add_format_option_parser():
return get_parser(_(
@ -393,6 +397,7 @@ def command_add_format(args, dbpath):
def do_remove_format(db, id, fmt):
db.remove_format(id, fmt, index_is_id=True)
send_message()
def remove_format_option_parser():
return get_parser(_(

View File

@ -133,7 +133,7 @@ class Rule(object): # {{{
'lt': ('1', '', ''),
'gt': ('', '', '1')
}[action]
return "cmp(format_date(raw_field('%s'), 'yyyy-MM-dd'), %s, '%s', '%s', '%s')" % (col,
return "strcmp(format_date(raw_field('%s'), 'yyyy-MM-dd'), '%s', '%s', '%s', '%s')" % (col,
val, lt, eq, gt)
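# strcmp compares as strings, which suits the 'yyyy-MM-dd' encoding; the
# old cmp() treated its arguments as numbers and so could not order
# formatted dates. E.g. a 'pubdate' rule (hypothetical column) with
# action 'lt' and value '2011-10-14' now emits:
#   strcmp(format_date(raw_field('pubdate'), 'yyyy-MM-dd'), '2011-10-14', '1', '', '')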
def multiple_condition(self, col, action, val, sep):

View File

@ -302,7 +302,8 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
if cats_changed:
self.prefs.set('user_categories', user_cats)
load_user_template_functions(self.prefs.get('user_template_functions', []))
if not self.is_second_db:
load_user_template_functions(self.prefs.get('user_template_functions', []))
self.conn.executescript('''
DROP TRIGGER IF EXISTS author_insert_trg;
@ -2103,7 +2104,9 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
user_mi = mi.get_all_user_metadata(make_copy=False)
for key in user_mi.iterkeys():
if key in self.field_metadata and \
user_mi[key]['datatype'] == self.field_metadata[key]['datatype']:
user_mi[key]['datatype'] == self.field_metadata[key]['datatype'] and \
(user_mi[key]['datatype'] != 'text' or
user_mi[key]['is_multiple'] == self.field_metadata[key]['is_multiple']):
val = mi.get(key, None)
if force_changes or val is not None:
doit(self.set_custom, id, val=val, extra=mi.get_extra(key),

View File

@ -150,21 +150,12 @@ class Formatter(TemplateFormatter):
traceback.print_exc()
b = None
if b is not None and b['datatype'] == 'composite':
val = b.get('#value#', None)
if val is not None:
return val.replace('/', '_').replace('\\', '_')
if key in self.composite_values:
self.composite_values[key] = val
return self.composite_values[key]
try:
# We really should not get here, but it is safer to try
self.composite_values[key] = 'RECURSIVE_COMPOSITE FIELD (S2D) ' + key
self.composite_values[key] = \
self.vformat(b['display']['composite_template'],
[], kwargs).replace('/', '_').replace('\\', '_')
return self.composite_values[key]
except Exception, e:
return unicode(e)
self.composite_values[key] = 'RECURSIVE_COMPOSITE FIELD (S2D) ' + key
self.composite_values[key] = \
self.vformat(b['display']['composite_template'], [], kwargs)
return self.composite_values[key]
if key in kwargs:
val = kwargs[key]
if isinstance(val, list):
@ -179,13 +170,6 @@ def get_components(template, mi, id, timefmt='%b %Y', length=250,
sanitize_func=ascii_filename, replace_whitespace=False,
to_lowercase=False, safe_format=True):
# Note: the mi argument is assumed to be an instance of Metadata returned
# by db.get_metadata(). Reason: the composite columns should have already
# been evaluated, which get_metadata does. If the mi is something else and
# if the template uses composite columns, then a best-efforts attempt is
# made to evaluate them. This will fail if the template uses a user-defined
# template function.
tsorder = tweaks['save_template_title_series_sorting']
format_args = FORMAT_ARGS.copy()
format_args.update(mi.all_non_none_fields())
@ -374,6 +358,8 @@ def do_save_book_to_disk(id_, mi, cover, plugboards,
newmi.template_to_attribute(mi, cpb)
else:
newmi = mi
if cover:
newmi.cover_data = ('jpg', cover)
set_metadata(stream, newmi, fmt)
except:
if DEBUG:

View File

@ -20,7 +20,7 @@ What formats does |app| support conversion to/from?
|app| supports the conversion of many input formats to many output formats.
It can convert every input format in the following list, to every output format.
*Input Formats:* CBZ, CBR, CBC, CHM, EPUB, FB2, HTML, HTMLZ, LIT, LRF, MOBI, ODT, PDF, PRC, PDB, PML, RB, RTF, SNB, TCR, TXT, TXTZ
*Input Formats:* CBZ, CBR, CBC, CHM, DJVU, EPUB, FB2, HTML, HTMLZ, LIT, LRF, MOBI, ODT, PDF, PRC, PDB, PML, RB, RTF, SNB, TCR, TXT, TXTZ
*Output Formats:* EPUB, FB2, OEB, LIT, LRF, MOBI, HTMLZ, PDB, PML, RB, PDF, RTF, SNB, TCR, TXT, TXTZ
@ -28,6 +28,7 @@ It can convert every input format in the following list, to every output format.
PRC is a generic format; |app| supports PRC files with TextRead and MOBIBook headers.
PDB is also a generic format. |app| supports eReader, Plucker, PML and zTxt PDB files.
DJVU support is only for converting DJVU files that contain embedded text. These are typically generated by OCR software.
.. _best-source-formats:
@ -241,6 +242,10 @@ Replace ``192.168.1.2`` with the local IP address of the computer running |app|.
If you get timeout errors while browsing the calibre catalog in Stanza, try increasing the connection timeout value in the stanza settings. Go to Info->Settings and increase the value of Download Timeout.
.. note::
As of iOS version 5 Stanza no longer works on Apple devices. Alternatives to Stanza are discussed `here <http://www.mobileread.com/forums/showthread.php?t=152789>`_.
Using iBooks
**************
@ -250,7 +255,7 @@ Start the Safari browser and type in the IP address and port of the computer run
Replace ``192.168.1.2`` with the local IP address of the computer running |app|. If you have changed the port the |app| content server is running on, you will have to change ``8080`` as well to the new port. The local IP address is the IP address your computer is assigned on your home network. A quick Google search will tell you how to find out your local IP address.
You wills ee a list of books in Safari, just click on the epub link for whichever book you want to read, Safari will then prompt you to open it with iBooks.
You will see a list of books in Safari, just click on the epub link for whichever book you want to read, Safari will then prompt you to open it with iBooks.
With the USB cable + iTunes

View File

@ -266,8 +266,9 @@ The following functions are available in addition to those described in single-f
* ``has_cover()`` -- return ``Yes`` if the book has a cover, otherwise return the empty string
* ``not(value)`` -- returns the string "1" if the value is empty, otherwise returns the empty string. This function works well with test or first_non_empty. You can have as many values as you want.
* ``list_difference(list1, list2, separator)`` -- return a list made by removing from `list1` any item found in `list2`, using a case-insensitive compare. The items in `list1` and `list2` are separated by separator, as are the items in the returned list.
* ``list_equals(list1, sep1, list2, sep2, yes_val, no_val) -- return `yes_val` if list1 and list2 contain the same items, otherwise return `no_val`. The items are determined by splitting each list using the appropriate separator character (`sep1` or `sep2`). The order of items in the lists is not relevant. The compare is case insensitive.
* ``list_equals(list1, sep1, list2, sep2, yes_val, no_val)`` -- return `yes_val` if `list1` and `list2` contain the same items, otherwise return `no_val`. The items are determined by splitting each list using the appropriate separator character (`sep1` or `sep2`). The order of items in the lists is not relevant. The compare is case insensitive.
* ``list_intersection(list1, list2, separator)`` -- return a list made by removing from `list1` any item not found in `list2`, using a case-insensitive compare. The items in `list1` and `list2` are separated by separator, as are the items in the returned list.
* ``list_re(src_list, separator, search_re, opt_replace)`` -- Construct a list by first separating `src_list` into items using the `separator` character. For each item in the list, check if it matches `search_re`. If it does, then add it to the list to be returned. If `opt_replace` is not the empty string, then apply the replacement before adding the item to the returned list.
* ``list_sort(list, direction, separator)`` -- return list sorted using a case-insensitive sort. If `direction` is zero, the list is sorted ascending, otherwise descending. The list items are separated by separator, as are the items in the returned list.
* ``list_union(list1, list2, separator)`` -- return a list made by merging the items in list1 and list2, removing duplicate items using a case-insensitive compare. If items differ in case, the one in list1 is used. The items in list1 and list2 are separated by separator, as are the items in the returned list.
* ``multiply(x, y)`` -- returns x * y. Throws an exception if either x or y are not numbers.

View File

@ -65,7 +65,7 @@ def generate_template_language_help():
funcs = defaultdict(dict)
for func in formatter_functions.get_builtins().values():
for func in formatter_functions().get_builtins().values():
class_name = func.__class__.__name__
func_sig = getattr(func, 'doc')
x = func_sig.find(' -- ')

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

Some files were not shown because too many files have changed in this diff