Sync to trunk.
@@ -2,6 +2,7 @@
 .check-cache.pickle
 src/calibre/plugins
 resources/images.qrc
+src/calibre/ebooks/oeb/display/test/*.js
 src/calibre/manual/.build/
 src/calibre/manual/cli/
 src/calibre/manual/template_ref.rst
@@ -15,6 +16,7 @@ resources/ebook-convert-complete.pickle
 resources/builtin_recipes.xml
 resources/builtin_recipes.zip
 resources/template-functions.json
+resources/display/*.js
 setup/installer/windows/calibre/build.log
 src/calibre/translations/.errors
 src/cssutils/.svn/
Changelog.yaml
@@ -19,6 +19,125 @@
 # new recipes:
 # - title:
+
+- version: 0.8.31
+  date: 2011-12-16
+
+  new features:
+    - title: "Conversion engine: When parsing invalid XHTML use the HTML 5 algorithm, for greater robustness."
+      tickets: [901466]
+
+    - title: "Driver for PocketBook 611 and Lenovo IdeaPad"
+
+    - title: "Allow customization of the order in which custom column editing is performed in the edit metadata dialog. Setting is available via Preferences->Tweaks."
+      tickets: [902731]
+
+    - title: "MOBI news download: Allow recipes to set a thumbnail for entries in the periodical table of contents. Currently used by the NYTimes, WSJ, Independent, Guardian and Globe and Mail recipes"
+      tickets: [900130]
+
+    - title: "E-book viewer: Add an option to the right click menu to search for the currently selected word"
+
+    - title: "Automatically hide the no internet connection available error message if the connection is restored before the user clicks OK"
+
+  bug fixes:
+    - title: "Fix comments not hidden in Book details panel when they are turned off via Preferences->Look & Feel->Book Details"
+
+    - title: "E-book viewer: Do not pop up an error message if the user tries to use the mouse wheel to scroll before a document is loaded."
+      tickets: [903449]
+
+    - title: "Add docx to the list of ebook extensions."
+      tickets: [903452]
+
+    - title: "When downloading metadata from non-English Amazon websites, do not correct the case of book titles."
+
+    - title: "Fix regression in 0.8.30 that broke bulk conversion of a single book."
+      tickets: [902506]
+
+    - title: "When minimized to system tray do not display the no internet connection error as a dialog box, instead use a system tray notification"
+
+    - title: "Catalog generation: Include the series_index field for custom series columns as well"
+
+    - title: "Comic Input: Do not rescale images when using the Tablet output profile (or any output profile with a screen size larger than 3000x3000)"
+
+    - title: "HTML Input: Ignore unparseable URLs instead of crashing on them."
+      tickets: [902372]
+
+
+  improved recipes:
+    - La Republica
+    - CND
+    - Berliner Zeitung
+    - Zaman Gazetesi
+
+  new recipes:
+    - title: CND Weekly
+      author: Derek Liang
+
+    - title: descopera.org
+      author: Marius Ignatescu
+
+    - title: Rynek Zdrowia
+      author: spi630
+
+- version: 0.8.30
+  date: 2011-12-09
+
+  new features:
+    - title: "Get Books: Add amazon.es and amazon.it"
+
+    - title: "Bulk convert dialog: Disable the Use saved conversion settings checkbox when none of the books being converted has saved conversion settings"
+
+    - title: "ebook-viewer: Add a command line switch to specify the position at which the file should be opened."
+      tickets: [899325]
+
+    - title: "Distribute calibre source code compressed with xz instead of gzip for a 40% reduction in size"
+
+  bug fixes:
+    - title: "Get Books: Fix ebooks.com and amazon.fr. Fix cover display in Diesel ebooks store."
+
+    - title: "HTML Input: Fix regression that broke processing of a small fraction of HTML files encoded in a multi-byte character encoding."
+      tickets: [899691]
+
+    - title: "Greatly reduce the delay at the end of a bulk metadata edit operation that operates on a very large number (thousands) of books"
+
+    - title: "Template language: Fix the subitems formatter function to split only when the period is surrounded by non-white space and not another period"
+
+    - title: "Fix ampersands in titles not displaying in the Cover Browser"
+
+    - title: "MOBI Output: Do not ignore an empty anchor at the end of a block element."
+
+    - title: "MOBI Output: Handle links to inline anchors placed inside large blocks of text correctly, i.e. the link should not point to the start of the block."
+      tickets: [899831]
+
+    - title: "E-book viewer: Fix searching for text that is represented as entities in the underlying HTML."
+      tickets: [899573]
+
+    - title: "Have the Esc shortcut perform exactly the same set of actions as clicking the clear button."
+      tickets: [900048]
+
+    - title: "Prevent the adding books dialog from becoming too wide"
+
+    - title: "Fix custom column editing not behaving correctly with the Previous button in the edit metadata dialog."
+      tickets: [899836]
+
+    - title: "T1 driver. More fixes to datetime handling to try to convince the T1's buggy firmware to not rescan metadata."
+      tickets: [899514]
+
+    - title: "Only allow searching via non accented author names if the user interface language in calibre is set to English."
+      tickets: [899227]
+
+  improved recipes:
+    - Die Zeit subscription
+    - Metro UK
+    - suedeutsche.de
+
+  new recipes:
+    - title: Blues News
+      author: Oskar Kunicki
+
+    - title: "TVXS"
+      author: Hargikas
+
+
 - version: 0.8.29
   date: 2011-12-02
 
@@ -1,19 +1,38 @@
 from calibre.web.feeds.news import BasicNewsRecipe
+import re
 class Adventure_zone(BasicNewsRecipe):
     title = u'Adventure Zone'
     __author__ = 'fenuks'
     description = 'Adventure zone - adventure games from A to Z'
     category = 'games'
     language = 'pl'
-    oldest_article = 15
-    max_articles_per_feed = 100
     no_stylesheets = True
+    oldest_article = 20
+    max_articles_per_feed = 100
+    use_embedded_content=False
+    preprocess_regexps = [(re.compile(r"<td class='capmain'>Komentarze</td>", re.IGNORECASE), lambda m: '')]
     remove_tags_before= dict(name='td', attrs={'class':'main-bg'})
-    remove_tags_after= dict(name='td', attrs={'class':'main-body middle-border'})
+    remove_tags= [dict(name='img', attrs={'alt':'Drukuj'})]
+    remove_tags_after= dict(id='comments')
     extra_css = '.main-bg{text-align: left;} td.capmain{ font-size: 22px; }'
     feeds = [(u'Nowinki', u'http://www.adventure-zone.info/fusion/feeds/news.php')]
+
+    def parse_feeds (self):
+        feeds = BasicNewsRecipe.parse_feeds(self)
+        soup=self.index_to_soup(u'http://www.adventure-zone.info/fusion/feeds/news.php')
+        tag=soup.find(name='channel')
+        titles=[]
+        for r in tag.findAll(name='image'):
+            r.extract()
+        art=tag.findAll(name='item')
+        for i in art:
+            titles.append(i.title.string)
+        for feed in feeds:
+            for article in feed.articles[:]:
+                article.title=titles[feed.articles.index(article)]
+        return feeds
+
+
     def get_cover_url(self):
         soup = self.index_to_soup('http://www.adventure-zone.info/fusion/news.php')
         cover=soup.find(id='box_OstatninumerAZ')
@@ -22,17 +41,10 @@ class Adventure_zone(BasicNewsRecipe):
 
 
     def skip_ad_pages(self, soup):
-        skip_tag = soup.body.findAll(name='a')
-        if skip_tag is not None:
-            for r in skip_tag:
-                if 'articles.php?' in r['href']:
-                    if r.strong is not None:
-                        word=r.strong.string
-                        if ('zapowied' or 'recenzj') in word:
-                            return self.index_to_soup('http://www.adventure-zone.info/fusion/print.php?type=A&item_id'+r['href'][r['href'].find('_id')+3:], raw=True)
-                        else:
-                            None
-
-    def print_version(self, url):
-        return url.replace('news.php?readmore', 'print.php?type=N&item_id')
+        skip_tag = soup.body.find(name='td', attrs={'class':'main-bg'})
+        skip_tag = skip_tag.findAll(name='a')
+        for r in skip_tag:
+            if r.strong:
+                word=r.strong.string
+                if word and (('zapowied' in word) or ('recenzj' in word) or ('solucj' in word)):
+                    return self.index_to_soup('http://www.adventure-zone.info/fusion/print.php?type=A&item'+r['href'][r['href'].find('article_id')+7:], raw=True)
@@ -1,5 +1,4 @@
 from calibre.web.feeds.news import BasicNewsRecipe
-
 class AstroNEWS(BasicNewsRecipe):
     title = u'AstroNEWS'
     __author__ = 'fenuks'
@@ -8,11 +7,16 @@ class AstroNEWS(BasicNewsRecipe):
     language = 'pl'
     oldest_article = 8
     max_articles_per_feed = 100
-    auto_cleanup = True
+    #extra_css= 'table {text-align: left;}'
+    no_stylesheets=True
     cover_url='http://news.astronet.pl/img/logo_news.jpg'
-    # no_stylesheets= True
+    remove_tags=[dict(name='hr')]
     feeds = [(u'Wiadomości', u'http://news.astronet.pl/rss.cgi')]
 
     def print_version(self, url):
         return url.replace('astronet.pl/', 'astronet.pl/print.cgi?')
+
+    def preprocess_html(self, soup):
+        for item in soup.findAll(align=True):
+            del item['align']
+        return soup
@@ -1,61 +1,44 @@
 from calibre.web.feeds.recipes import BasicNewsRecipe
-import re
+'''Calibre recipe to convert the RSS feeds of the Berliner Zeitung to an ebook.'''
+
 class SportsIllustratedRecipe(BasicNewsRecipe) :
-    __author__ = 'ape'
-    __copyright__ = 'ape'
+    __author__ = 'a.peter'
+    __copyright__ = 'a.peter'
     __license__ = 'GPL v3'
     language = 'de'
-    description = 'Berliner Zeitung'
-    version = 2
+    description = 'Berliner Zeitung RSS'
+    version = 4
     title = u'Berliner Zeitung'
     timefmt = ' [%d.%m.%Y]'
+
+    #oldest_article = 7.0
     no_stylesheets = True
     remove_javascript = True
     use_embedded_content = False
     publication_type = 'newspaper'
 
-    keep_only_tags = [dict(name='div', attrs={'class':'teaser t_split t_artikel'})]
+    remove_tags_before = dict(name='div', attrs={'class':'newstype'})
+    remove_tags_after = [dict(id='article_text')]
 
-    INDEX = 'http://www.berlinonline.de/berliner-zeitung/'
-
-    def parse_index(self):
-        base = 'http://www.berlinonline.de'
-        answer = []
-        articles = {}
-        more = 1
-
-        soup = self.index_to_soup(self.INDEX)
-
-        # Get list of links to ressorts from index page
-        ressort_list = soup.findAll('ul', attrs={'class': re.compile('ressortlist')})
-        for ressort in ressort_list[0].findAll('a'):
-            feed_title = ressort.string
-            print 'Analyzing', feed_title
-            if not articles.has_key(feed_title):
-                articles[feed_title] = []
-                answer.append(feed_title)
-            # Load ressort page.
-            feed = self.index_to_soup('http://www.berlinonline.de' + ressort['href'])
-            # find mainbar div which contains the list of all articles
-            for article_container in feed.findAll('div', attrs={'class': re.compile('mainbar')}):
-                # iterate over all articles
-                for article_teaser in article_container.findAll('div', attrs={'class': re.compile('teaser')}):
-                    # extract title of article
-                    if article_teaser.h3 != None:
-                        article = {'title' : article_teaser.h3.a.string, 'date' : u'', 'url' : base + article_teaser.h3.a['href'], 'description' : u''}
-                        articles[feed_title].append(article)
-                    else:
-                        # Skip teasers for missing photos
-                        if article_teaser.div.p.contents[0].find('Foto:') > -1:
-                            continue
-                        article = {'title': 'Weitere Artikel ' + str(more), 'date': u'', 'url': base + article_teaser.div.p.a['href'], 'description': u''}
-                        articles[feed_title].append(article)
-                        more += 1
-        answer = [[key, articles[key]] for key in answer if articles.has_key(key)]
-        return answer
+    feeds = [(u'Startseite', u'http://www.berliner-zeitung.de/home/10808950,10808950,view,asFeed.xml'),
+             (u'Politik', u'http://www.berliner-zeitung.de/home/10808018,10808018,view,asFeed.xml'),
+             (u'Wirtschaft', u'http://www.berliner-zeitung.de/home/10808230,10808230,view,asFeed.xml'),
+             (u'Berlin', u'http://www.berliner-zeitung.de/home/10809148,10809148,view,asFeed.xml'),
+             (u'Brandenburg', u'http://www.berliner-zeitung.de/home/10809312,10809312,view,asFeed.xml'),
+             (u'Wissenschaft', u'http://www.berliner-zeitung.de/home/10808894,10808894,view,asFeed.xml'),
+             (u'Digital', u'http://www.berliner-zeitung.de/home/10808718,10808718,view,asFeed.xml'),
+             (u'Kultur', u'http://www.berliner-zeitung.de/home/10809150,10809150,view,asFeed.xml'),
+             (u'Panorama', u'http://www.berliner-zeitung.de/home/10808334,10808334,view,asFeed.xml'),
+             (u'Sport', u'http://www.berliner-zeitung.de/home/10808794,10808794,view,asFeed.xml'),
+             (u'Hertha', u'http://www.berliner-zeitung.de/home/10808800,10808800,view,asFeed.xml'),
+             (u'Union', u'http://www.berliner-zeitung.de/home/10808802,10808802,view,asFeed.xml'),
+             (u'Verkehr', u'http://www.berliner-zeitung.de/home/10809298,10809298,view,asFeed.xml'),
+             (u'Polizei', u'http://www.berliner-zeitung.de/home/10809296,10809296,view,asFeed.xml'),
+             (u'Meinung', u'http://www.berliner-zeitung.de/home/10808020,10808020,view,asFeed.xml')]
 
     def get_masthead_url(self):
-        return 'http://www.berlinonline.de/.img/berliner-zeitung/blz_logo.gif'
+        return 'http://www.berliner-zeitung.de/image/view/10810244,7040611,data,logo.png'
+
+    def print_version(self, url):
+        return url.replace('.html', ',view,printVersion.html')
recipes/biolog_pl.recipe (new file)
@@ -0,0 +1,19 @@
+# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
+
+from calibre.web.feeds.news import BasicNewsRecipe
+class Biolog_pl(BasicNewsRecipe):
+    title = u'Biolog.pl'
+    oldest_article = 7
+    max_articles_per_feed = 100
+    remove_empty_feeds=True
+    __author__ = 'fenuks'
+    description = u'Przyrodnicze aktualności ze świata nauki (codziennie aktualizowane), kurs biologii, testy i sprawdziany, forum dyskusyjne.'
+    category = 'biology'
+    language = 'pl'
+    cover_url='http://www.biolog.pl/naukowy,portal,biolog.png'
+    no_stylesheets = True
+    #keeps_only_tags=[dict(id='main')]
+    remove_tags_before=dict(id='main')
+    remove_tags_after=dict(name='a', attrs={'name':'komentarze'})
+    remove_tags=[dict(name='img', attrs={'alt':'Komentarze'})]
+    feeds = [(u'Wszystkie', u'http://www.biolog.pl/backend.php'), (u'Medycyna', u'http://www.biolog.pl/medycyna-rss.php'), (u'Ekologia', u'http://www.biolog.pl/rss-ekologia.php'), (u'Genetyka i biotechnologia', u'http://www.biolog.pl/rss-biotechnologia.php'), (u'Botanika', u'http://www.biolog.pl/rss-botanika.php'), (u'Le\u015bnictwo', u'http://www.biolog.pl/rss-lesnictwo.php'), (u'Zoologia', u'http://www.biolog.pl/rss-zoologia.php')]
recipes/blues.recipe (new file)
@@ -0,0 +1,26 @@
+__license__ = 'GPL v3'
+__copyright__ = '2011, Oskar Kunicki <rakso at interia.pl>'
+'''
+Changelog:
+2011-11-27
+News from BluesRSS.info
+'''
+
+from calibre.web.feeds.news import BasicNewsRecipe
+
+class BluesRSS(BasicNewsRecipe):
+    title = 'Blues News'
+    __author__ = 'Oskar Kunicki'
+    description ='Blues news from around the world'
+    publisher = 'BluesRSS.info'
+    category = 'news, blues, USA,UK'
+    oldest_article = 5
+    max_articles_per_feed = 100
+    language = 'en'
+    cover_url = 'http://bluesrss.info/cover.jpg'
+    masthead_url = 'http://bluesrss.info/cover.jpg'
+    no_stylesheets = True
+
+    remove_tags = [dict(name='div', attrs={'class':'wp-pagenavi'})]
+
+    feeds = [(u'News', u'http://bluesrss.info/feed/')]
@@ -23,7 +23,9 @@ class TheCND(BasicNewsRecipe):
     remove_tags = [dict(name='table', attrs={'align':'right'}), dict(name='img', attrs={'src':'http://my.cnd.org/images/logo.gif'}), dict(name='hr', attrs={}), dict(name='small', attrs={})]
     no_stylesheets = True
-    preprocess_regexps = [(re.compile(r'<!--.*?-->', re.DOTALL), lambda m: '')]
+
+    preprocess_regexps = [ (re.compile(r'<!--.*?-->', re.DOTALL), lambda m: ''),
+                           (re.compile('<table width.*?</table>', re.DOTALL), lambda m: ''),
+                         ]
 
     def print_version(self, url):
         if url.find('news/article.php') >= 0:
@@ -46,16 +48,18 @@ class TheCND(BasicNewsRecipe):
             title = self.tag_to_string(a)
             self.log('\tFound article: ', title, 'at', url)
             date = a.nextSibling
+            if re.search('cm', date):
+                continue
             if (date is not None) and len(date)>2:
                 if not articles.has_key(date):
                     articles[date] = []
                 articles[date].append({'title':title, 'url':url, 'description': '', 'date':''})
                 self.log('\t\tAppend to : ', date)
 
-        self.log('log articles', articles)
+        #self.log('log articles', articles)
         mostCurrent = sorted(articles).pop()
         self.title = 'CND ' + mostCurrent
 
         feeds.append((self.title, articles[mostCurrent]))
 
         return feeds
recipes/cnd_weekly.recipe (new file)
@@ -0,0 +1,72 @@
+#!/usr/bin/env python
+
+__license__ = 'GPL v3'
+__copyright__ = '2010, Derek Liang <Derek.liang.ca @@@at@@@ gmail.com>'
+'''
+cnd.org
+'''
+import re
+
+from calibre.web.feeds.news import BasicNewsRecipe
+
+class TheCND(BasicNewsRecipe):
+
+    title = 'CND Weekly'
+    __author__ = 'Derek Liang'
+    description = ''
+    INDEX = 'http://cnd.org'
+    language = 'zh'
+    conversion_options = {'linearize_tables':True}
+
+    remove_tags_before = dict(name='div', id='articleHead')
+    remove_tags_after = dict(id='copyright')
+    remove_tags = [dict(name='table', attrs={'align':'right'}), dict(name='img', attrs={'src':'http://my.cnd.org/images/logo.gif'}), dict(name='hr', attrs={}), dict(name='small', attrs={})]
+    no_stylesheets = True
+
+    preprocess_regexps = [ (re.compile(r'<!--.*?-->', re.DOTALL), lambda m: ''),
+                           (re.compile('<table width.*?</table>', re.DOTALL), lambda m: ''),
+                         ]
+
+    def print_version(self, url):
+        if url.find('news/article.php') >= 0:
+            return re.sub("^[^=]*", "http://my.cnd.org/modules/news/print.php?storyid", url)
+        else:
+            return re.sub("^[^=]*", "http://my.cnd.org/modules/wfsection/print.php?articleid", url)
+
+    def parse_index(self):
+        soup = self.index_to_soup(self.INDEX)
+
+        feeds = []
+        articles = {}
+
+        for a in soup.findAll('a', attrs={'target':'_cnd'}):
+            url = a['href']
+            if url.find('article.php') < 0 :
+                continue
+            if url.startswith('/'):
+                url = 'http://cnd.org'+url
+            title = self.tag_to_string(a)
+            date = a.nextSibling
+            if not re.search('cm', date):
+                continue
+            self.log('\tFound article: ', title, 'at', url, '@', date)
+            if (date is not None) and len(date)>2:
+                if not articles.has_key(date):
+                    articles[date] = []
+                articles[date].append({'title':title, 'url':url, 'description': '', 'date':''})
+                self.log('\t\tAppend to : ', date)
+
+
+        sorted_articles = sorted(articles)
+        while sorted_articles:
+            mostCurrent = sorted_articles.pop()
+            self.title = 'CND ' + mostCurrent
+            feeds.append((self.title, articles[mostCurrent]))
+
+        return feeds
+
+    def populate_article_metadata(self, article, soup, first):
+        header = soup.find('h3')
+        self.log('header: ' + self.tag_to_string(header))
+        pass
recipes/computerworld_pl.recipe (new file)
@@ -0,0 +1,22 @@
+# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
+
+from calibre.web.feeds.news import BasicNewsRecipe
+class Computerworld_pl(BasicNewsRecipe):
+    title = u'Computerworld.pl'
+    __author__ = 'fenuks'
+    description = u'Serwis o IT w przemyśle, finansach, handlu, administracji oraz rynku IT i telekomunikacyjnym - wiadomości, opinie, analizy, porady prawne'
+    category = 'IT'
+    language = 'pl'
+    no_stylesheets=True
+    oldest_article = 7
+    max_articles_per_feed = 100
+    keep_only_tags=[dict(name='div', attrs={'id':'s'})]
+    remove_tags_after=dict(name='div', attrs={'class':'rMobi'})
+    remove_tags=[dict(name='div', attrs={'class':['nnav', 'rMobi']}), dict(name='table', attrs={'class':'ramka_slx'})]
+    feeds = [(u'Wiadomo\u015bci', u'http://rssout.idg.pl/cw/news_iso.xml')]
+
+    def get_cover_url(self):
+        soup = self.index_to_soup('http://www.computerworld.pl/')
+        cover=soup.find(name='img', attrs={'class':'prawo'})
+        self.cover_url=cover['src']
+        return getattr(self, 'cover_url', self.cover_url)
recipes/datasport.recipe (new file)
@@ -0,0 +1,15 @@
+__license__ = 'GPL v3'
+__author__ = 'faber1971'
+description = 'Italian soccer news website - v1.00 (17, December 2011)'
+
+from calibre.web.feeds.news import BasicNewsRecipe
+
+class AdvancedUserRecipe1324114272(BasicNewsRecipe):
+    title = u'Datasport'
+    language = 'it'
+    __author__ = 'faber1971'
+    oldest_article = 1
+    max_articles_per_feed = 100
+    auto_cleanup = True
+
+    feeds = [(u'Datasport', u'http://www.datasport.it/calcio/rss.xml')]
recipes/descopera_org.recipe (new file)
@@ -0,0 +1,27 @@
+# -*- coding: utf-8 -*-
+'''
+descopera.org
+'''
+
+from calibre.web.feeds.news import BasicNewsRecipe
+
+class Descopera(BasicNewsRecipe):
+    title = u'Descoperă.org'
+    __author__ = 'Marius Ignătescu'
+    description = 'Descoperă. Placerea de a cunoaște'
+    publisher = 'descopera.org'
+    category = 'science, technology, culture, history, earth'
+    language = 'ro'
+    oldest_article = 14
+    max_articles_per_feed = 100
+    encoding = 'utf8'
+    no_stylesheets = True
+    extra_css = ' body{ font-family: Verdana,Helvetica,Arial,sans-serif } .introduction{font-weight: bold} .story-feature{display: block; padding: 0; border: 1px solid; width: 40%; font-size: small} .story-feature h2{text-align: center; text-transform: uppercase} '
+    keep_only_tags = [dict(name='div', attrs={'class':['post']})]
+    remove_tags = [dict(name='div', attrs={'class':['topnav', 'box_a', 'shr-bookmarks shr-bookmarks-expand shr-bookmarks-center shr-bookmarks-bg-knowledge']})]
+    remove_attributes = ['width','height']
+    cover_url = 'http://www.descopera.org/wp-content/themes/dorg/styles/default/img/b_top.png?width=400'
+    feeds = [(u'Articles', u'http://www.descopera.org/feed/')]
+
+    def preprocess_html(self, soup):
+        return self.adeify_images(soup)
recipes/dziennik_pl.recipe (new file)
@@ -0,0 +1,58 @@
+# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
+
+from calibre.web.feeds.news import BasicNewsRecipe
+import re
+class Dziennik_pl(BasicNewsRecipe):
+    title = u'Dziennik.pl'
+    __author__ = 'fenuks'
+    description = u'Wiadomości z kraju i ze świata. Wiadomości gospodarcze. Znajdziesz u nas informacje, wydarzenia, komentarze, opinie.'
+    category = 'newspaper'
+    language = 'pl'
+    cover_url='http://6.s.dziennik.pl/images/og_dziennik.jpg'
+    no_stylesheets = True
+    oldest_article = 7
+    max_articles_per_feed = 100
+    remove_javascript=True
+    remove_empty_feeds=True
+    preprocess_regexps = [(re.compile("Komentarze:"), lambda m: '')]
+    keep_only_tags=[dict(id='article')]
+    remove_tags=[dict(name='div', attrs={'class':['art_box_dodatki', 'new_facebook_icons2', 'leftArt', 'article_print', 'quiz-widget']}), dict(name='a', attrs={'class':'komentarz'})]
+    feeds = [(u'Wszystko', u'http://rss.dziennik.pl/Dziennik-PL/'),
+             (u'Wiadomości', u'http://rss.dziennik.pl/Dziennik-Wiadomosci'),
+             (u'Gospodarka', u'http://rss.dziennik.pl/Dziennik-Gospodarka'),
+             (u'Kobieta', u'http://rss.dziennik.pl/Dziennik-Kobieta'),
+             (u'Auto', u'http://rss.dziennik.pl/Dziennik-Auto'),
+             (u'Rozrywka', u'http://rss.dziennik.pl/Dziennik-Rozrywka'),
+             (u'Film', u'http://rss.dziennik.pl/Dziennik-Film'),
+             (u'Muzyka' , u'http://rss.dziennik.pl/Dziennik-Muzyka'),
+             (u'Kultura', u'http://rss.dziennik.pl/Dziennik-Kultura'),
+             (u'Nauka', u'http://rss.dziennik.pl/Dziennik-Nauka'),
+             (u'Podróże', u'http://rss.dziennik.pl/Dziennik-Podroze/'),
+             (u'Nieruchomości', u'http://rss.dziennik.pl/Dziennik-Nieruchomosci')]
+
+    def append_page(self, soup, appendtag):
+        tag=soup.find('a', attrs={'class':'page_next'})
+        if tag:
+            appendtag.find('div', attrs={'class':'article_paginator'}).extract()
+            while tag:
+                soup2= self.index_to_soup(tag['href'])
+                tag=soup2.find('a', attrs={'class':'page_next'})
+                if not tag:
+                    for r in appendtag.findAll('div', attrs={'class':'art_src'}):
+                        r.extract()
+                pagetext = soup2.find(name='div', attrs={'class':'article_body'})
+                for dictionary in self.remove_tags:
+                    v=pagetext.findAll(name=dictionary['name'], attrs=dictionary['attrs'])
+                    for delete in v:
+                        delete.extract()
+                pos = len(appendtag.contents)
+                appendtag.insert(pos, pagetext)
+            if appendtag.find('div', attrs={'class':'article_paginator'}):
+                appendtag.find('div', attrs={'class':'article_paginator'}).extract()
+
+
+
+
+    def preprocess_html(self, soup):
+        self.append_page(soup, soup.body)
+        return soup
recipes/emuzica_pl.recipe (new file)
@@ -0,0 +1,16 @@
+# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
+
+from calibre.web.feeds.news import BasicNewsRecipe
+class eMuzyka(BasicNewsRecipe):
+    title = u'eMuzyka'
+    __author__ = 'fenuks'
+    description = u'Emuzyka to największa i najpopularniejsza strona o muzyce w Polsce'
+    category = 'music'
+    language = 'pl'
+    cover_url='http://s.emuzyka.pl/img/emuzyka_invert_small.jpg'
+    no_stylesheets = True
+    oldest_article = 7
+    max_articles_per_feed = 100
+    keep_only_tags=[dict(name='div', attrs={'id':'news_container'}), dict(name='h3'), dict(name='div', attrs={'class':'review_text'})]
+    remove_tags=[dict(name='span', attrs={'id':'date'})]
+    feeds = [(u'Aktualno\u015bci', u'http://www.emuzyka.pl/rss.php?f=1'), (u'Recenzje', u'http://www.emuzyka.pl/rss.php?f=2')]
recipes/fisco_oggi.recipe (new file)
@@ -0,0 +1,18 @@
+__license__ = 'GPL v3'
+__author__ = 'faber1971'
+description = 'Website of Italian Government Income Agency (about revenue, taxation, taxes)- v1.00 (17, December 2011)'
+
+from calibre.web.feeds.news import BasicNewsRecipe
+
+class AdvancedUserRecipe1324112023(BasicNewsRecipe):
+    title = u'Fisco Oggi'
+    language = 'it'
+    __author__ = 'faber1971'
+    oldest_article = 7
+    max_articles_per_feed = 100
+    auto_cleanup = True
+    remove_javascript = True
+    no_stylesheets = True
+
+    feeds = [(u'Attualit\xe0', u'http://www.fiscooggi.it/taxonomy/term/1/feed'), (u'Normativa', u'http://www.fiscooggi.it/taxonomy/term/5/feed'), (u'Giurisprudenza', u'http://www.fiscooggi.it/taxonomy/term/8/feed'), (u'Dati e statistiche', u'http://www.fiscooggi.it/taxonomy/term/12/feed'), (u'Analisi e commenti', u'http://www.fiscooggi.it/taxonomy/term/13/feed'), (u'Bilancio e contabilit\xe0', u'http://www.fiscooggi.it/taxonomy/term/576/feed'), (u'Dalle regioni', u'http://www.fiscooggi.it/taxonomy/term/16/feed'), (u'Dal mondo', u'http://www.fiscooggi.it/taxonomy/term/17/feed')]
@@ -1,57 +1,68 @@
-# -*- coding: utf-8 -*-
+import re
+
 from calibre.web.feeds.news import BasicNewsRecipe
 
-class Focus_pl(BasicNewsRecipe):
-    title = u'Focus.pl'
-    oldest_article = 15
-    max_articles_per_feed = 100
-    __author__ = 'fenuks'
-    language = 'pl'
-    description ='polish scientific monthly magazine'
+class FocusRecipe(BasicNewsRecipe):
+    __license__ = 'GPL v3'
+    __author__ = u'intromatyk <intromatyk@gmail.com>'
+    language = 'pl'
+    version = 1
+
+    title = u'Focus'
+    publisher = u'Gruner + Jahr Polska'
+    category = u'News'
+    description = u'Newspaper'
     category='magazine'
     cover_url=''
     remove_empty_feeds= True
     no_stylesheets=True
-    remove_tags_before=dict(name='div', attrs={'class':'h2 h2f'})
-    remove_tags_after=dict(name='div', attrs={'class':'clear'})
-    feeds = [(u'Wszystkie kategorie', u'http://focus.pl.feedsportal.com/c/32992/f/532692/index.rss'),
-             (u'Nauka', u'http://focus.pl.feedsportal.com/c/32992/f/532693/index.rss'),
-             (u'Historia', u'http://focus.pl.feedsportal.com/c/32992/f/532694/index.rss'),
-             (u'Cywilizacja', u'http://focus.pl.feedsportal.com/c/32992/f/532695/index.rss'),
-             (u'Sport', u'http://focus.pl.feedsportal.com/c/32992/f/532696/index.rss'),
-             (u'Technika', u'http://focus.pl.feedsportal.com/c/32992/f/532697/index.rss'),
-             (u'Przyroda', u'http://focus.pl.feedsportal.com/c/32992/f/532698/index.rss'),
-             (u'Technologie', u'http://focus.pl.feedsportal.com/c/32992/f/532699/index.rss'),
-             (u'Warto wiedzieć', u'http://focus.pl.feedsportal.com/c/32992/f/532700/index.rss'),
-             ]
+    oldest_article = 7
+    max_articles_per_feed = 100000
+    recursions = 0
+
+    no_stylesheets = True
+    remove_javascript = True
+    encoding = 'utf-8'
+    # Seems to work best, but YMMV
+    simultaneous_downloads = 5
+
+    r = re.compile('.*(?P<url>http:\/\/(www.focus.pl)|(rss.feedsportal.com\/c)\/.*\.html?).*')
+    keep_only_tags =[]
+    keep_only_tags.append(dict(name = 'div', attrs = {'id' : 'cll'}))
+
+    remove_tags =[]
+    remove_tags.append(dict(name = 'div', attrs = {'class' : 'ulm noprint'}))
+    remove_tags.append(dict(name = 'div', attrs = {'class' : 'txb'}))
+    remove_tags.append(dict(name = 'div', attrs = {'class' : 'h2'}))
+    remove_tags.append(dict(name = 'ul', attrs = {'class' : 'txu'}))
+    remove_tags.append(dict(name = 'div', attrs = {'class' : 'ulc'}))
+
+    extra_css = '''
+        body {font-family: verdana, arial, helvetica, geneva, sans-serif ;}
+        h1{text-align: left;}
+        h2{font-size: medium; font-weight: bold;}
+        p.lead {font-weight: bold; text-align: left;}
+        .authordate {font-size: small; color: #696969;}
+        .fot{font-size: x-small; color: #666666;}
+        '''
+
+    feeds = [
+        ('Nauka', 'http://focus.pl.feedsportal.com/c/32992/f/532693/index.rss'),
+        ('Historia', 'http://focus.pl.feedsportal.com/c/32992/f/532694/index.rss'),
+        ('Cywilizacja', 'http://focus.pl.feedsportal.com/c/32992/f/532695/index.rss'),
+        ('Sport', 'http://focus.pl.feedsportal.com/c/32992/f/532696/index.rss'),
+        ('Technika', 'http://focus.pl.feedsportal.com/c/32992/f/532697/index.rss'),
+        ('Przyroda', 'http://focus.pl.feedsportal.com/c/32992/f/532698/index.rss'),
+        ('Technologie', 'http://focus.pl.feedsportal.com/c/32992/f/532699/index.rss'),
+        ]
 
     def skip_ad_pages(self, soup):
-        tag=soup.find(name='a')
-        if tag:
-            new_soup=self.index_to_soup(tag['href']+ 'do-druku/1/', raw=True)
-            return new_soup
-
-    def append_page(self, appendtag):
-        tag=appendtag.find(name='div', attrs={'class':'arrows'})
-        if tag:
-            nexturl='http://www.focus.pl/'+tag.a['href']
-            for rem in appendtag.findAll(name='div', attrs={'class':'klik-nav'}):
-                rem.extract()
-            while nexturl:
-                soup2=self.index_to_soup(nexturl)
-                nexturl=None
-                pagetext=soup2.find(name='div', attrs={'class':'txt'})
-                tag=pagetext.find(name='div', attrs={'class':'arrows'})
-                for r in tag.findAll(name='a'):
-                    if u'Następne' in r.string:
-                        nexturl='http://www.focus.pl/'+r['href']
-                for rem in pagetext.findAll(name='div', attrs={'class':'klik-nav'}):
-                    rem.extract()
-                pos = len(appendtag.contents)
-                appendtag.insert(pos, pagetext)
+        if ('advertisement' in soup.find('title').string.lower()):
+            href = soup.find('a').get('href')
+            return self.index_to_soup(href, raw=True)
+        else:
+            return None
 
     def get_cover_url(self):
         soup=self.index_to_soup('http://www.focus.pl/magazyn/')
@@ -60,7 +71,14 @@ class Focus_pl(BasicNewsRecipe):
         self.cover_url='http://www.focus.pl/' + tag.a['href']
         return getattr(self, 'cover_url', self.cover_url)
 
-    def preprocess_html(self, soup):
-        self.append_page(soup.body)
-        return soup
+    def print_version(self, url):
+        if url.count ('focus.pl.feedsportal.com'):
+            u = url.find('focus0Bpl')
+            u = 'http://www.focus.pl/' + url[u + 11:]
+            u = u.replace('0C', '/')
+            u = u.replace('A', '')
+            u = u.replace ('0E','-')
+            u = u.replace('/nc/1//story01.htm', '/do-druku/1')
+        else:
+            u = url.replace('/nc/1','/do-druku/1')
+        return u
@@ -51,6 +51,13 @@ class AdvancedUserRecipe1287083651(BasicNewsRecipe):
                   {'class':['articleTools', 'pagination', 'Ads', 'topad',
                             'breadcrumbs', 'footerNav', 'footerUtil', 'downloadlinks']}]
 
+    def populate_article_metadata(self, article, soup, first):
+        if first and hasattr(self, 'add_toc_thumbnail'):
+            picdiv = soup.find('img')
+            if picdiv is not None:
+                self.add_toc_thumbnail(article,picdiv['src'])
+
+
     #Use the mobile version rather than the web version
     def print_version(self, url):
         return url.rpartition('?')[0] + '?service=mobile'
@@ -79,6 +79,12 @@ class Guardian(BasicNewsRecipe):
                 url = None
         return url
 
+    def populate_article_metadata(self, article, soup, first):
+        if first and hasattr(self, 'add_toc_thumbnail'):
+            picdiv = soup.find('img')
+            if picdiv is not None:
+                self.add_toc_thumbnail(article,picdiv['src'])
+
     def preprocess_html(self, soup):
 
         # multiple html sections in soup, useful stuff in the first
@@ -9,9 +9,9 @@ from calibre.ptempfile import PersistentTemporaryFile
 from urlparse import urlparse
 import re
 
-class HackerNews(BasicNewsRecipe):
-    title = 'Hacker News'
-    __author__ = 'Tom Scholl'
+class HNWithCommentsLink(BasicNewsRecipe):
+    title = 'HN With Comments Link'
+    __author__ = 'Tom Scholl & David Kerschner'
     description = u'Hacker News, run by Y Combinator. Anything that good hackers would find interesting, with a focus on programming and startups.'
     publisher = 'Y Combinator'
     category = 'news, programming, it, technology'
@@ -80,6 +80,11 @@ class HackerNews(BasicNewsRecipe):
         body = body + comments
         return u'<html><title>' + title + u'</title><body>' + body + '</body></html>'
 
+    def parse_feeds(self):
+        a = super(HNWithCommentsLink, self).parse_feeds()
+        self.hn_articles = a[0].articles
+        return a
+
     def get_obfuscated_article(self, url):
         if url.startswith('http://news.ycombinator.com'):
             content = self.get_hn_content(url)
@@ -97,6 +102,13 @@ class HackerNews(BasicNewsRecipe):
         else:
             content = self.get_readable_content(url)
 
+        article = 0
+        for a in self.hn_articles:
+            if a.url == url:
+                article = a
+
+        content = re.sub(r'</body>\s*</html>\s*$', '', content) + article.summary + '</body></html>'
+
         self.temp_files.append(PersistentTemporaryFile('_fa.html'))
         self.temp_files[-1].write(content)
         self.temp_files[-1].close()
recipes/icons/biolog_pl.png (new binary file, 1.2 KiB)
recipes/icons/blues.png (new binary file, 910 B)
recipes/icons/computerworld_pl.png (new binary file, 373 B)
recipes/icons/descopera_org.png (new binary file, 9.3 KiB)
recipes/icons/dziennik_pl.png (new binary file, 481 B)
recipes/icons/kosmonauta_pl.png (new binary file, 1.2 KiB)
recipes/icons/mlody_technik_pl.recipe (new binary file, 15 KiB)
recipes/icons/zaman.png (new binary file, 999 B)
@@ -104,6 +104,12 @@ class TheIndependentNew(BasicNewsRecipe):
                 url = None
         return url
 
+    def populate_article_metadata(self, article, soup, first):
+        if first and hasattr(self, 'add_toc_thumbnail'):
+            picdiv = soup.find('img')
+            if picdiv is not None:
+                self.add_toc_thumbnail(article,picdiv['src'])
+
     def preprocess_html(self, soup):
 
         #remove 'advertorial articles'
@@ -266,12 +272,15 @@ class TheIndependentNew(BasicNewsRecipe):
 
 
     def _insertRatingStars(self,soup,item):
-        if item.contents is None:
+        if item.contents is None or len(item.contents) < 1:
             return
         rating = item.contents[0]
-        if not rating.isdigit():
-            return None
-        rating = int(item.contents[0])
+        try:
+            rating = float(item.contents[0])
+        except:
+            print 'Could not convert decimal rating to star: malformatted float.'
+            return
         for i in range(1,6):
             star = Tag(soup,'img')
             if i <= rating:
14
recipes/kosmonauta_pl.recipe
Normal file
@ -0,0 +1,14 @@
+# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
+
+from calibre.web.feeds.news import BasicNewsRecipe
+class Kosmonauta(BasicNewsRecipe):
+    title = u'Kosmonauta.net'
+    __author__ = 'fenuks'
+    description = u'polskojęzyczny portal w całości dedykowany misjom kosmicznym i badaniom kosmosu.'
+    category = 'astronomy'
+    language = 'pl'
+    cover_url='http://bi.gazeta.pl/im/4/10393/z10393414X,Kosmonauta-net.jpg'
+    no_stylesheets = True
+    oldest_article = 7
+    max_articles_per_feed = 100
+    feeds = [(u'Kosmonauta.net', u'http://www.kosmonauta.net/index.php/feed/rss.html')]
@@ -1,13 +1,12 @@
 __license__ = 'GPL v3'
 __author__ = 'Lorenzo Vigentini, based on Darko Miletic, Gabriele Marini'
 __copyright__ = '2009-2011, Darko Miletic <darko.miletic at gmail.com>, Lorenzo Vigentini <l.vigentini at gmail.com>'
-description = 'Italian daily newspaper - v1.01 (04, January 2010); 16.05.2010 new version; 17.10.2011 new version'
+description = 'Italian daily newspaper - v1.01 (04, January 2010); 16.05.2010 new version; 17.10.2011 new version; 14.12.2011 new version'
+
 '''
 http://www.repubblica.it/
 '''
-
-import re
 from calibre.ptempfile import PersistentTemporaryFile
 from calibre.web.feeds.news import BasicNewsRecipe
 
@@ -32,12 +31,6 @@ class LaRepubblica(BasicNewsRecipe):
     """
 
     remove_attributes = ['width','height','lang','xmlns:og','xmlns:fb']
-
-    preprocess_regexps = [
-        (re.compile(r'.*?<head>', re.DOTALL|re.IGNORECASE), lambda match: '<head>'),
-        (re.compile(r'<head>.*?<title>', re.DOTALL|re.IGNORECASE), lambda match: '<head><title>'),
-        (re.compile(r'</title>.*?</head>', re.DOTALL|re.IGNORECASE), lambda match: '</title></head>')
-    ]
 
     def get_article_url(self, article):
         link = BasicNewsRecipe.get_article_url(self, article)
@@ -73,15 +66,15 @@ class LaRepubblica(BasicNewsRecipe):
     remove_tags = [
         dict(name=['object','link','meta','iframe','embed']),
         dict(name='span',attrs={'class':'linkindice'}),
-        dict(name='div', attrs={'class':'bottom-mobile'}),
-        dict(name='div', attrs={'id':['rssdiv','blocco']}),
-        dict(name='div', attrs={'class':'utility'}),
+        dict(name='div', attrs={'class':['bottom-mobile','adv adv-middle-inline']}),
+        dict(name='div', attrs={'id':['rssdiv','blocco','fb-like-head']}),
+        dict(name='div', attrs={'class':['utility','fb-like-button','archive-button']}),
         dict(name='div', attrs={'class':'generalbox'}),
         dict(name='ul', attrs={'id':'hystory'})
     ]
 
     feeds = [
-        (u'Rilievo', u'http://www.repubblica.it/rss/homepage/rss2.0.xml'),
+        (u'Homepage', u'http://www.repubblica.it/rss/homepage/rss2.0.xml'),
         (u'Cronaca', u'http://www.repubblica.it/rss/cronaca/rss2.0.xml'),
         (u'Esteri', u'http://www.repubblica.it/rss/esteri/rss2.0.xml'),
         (u'Economia', u'http://www.repubblica.it/rss/economia/rss2.0.xml'),
@@ -110,3 +103,5 @@ class LaRepubblica(BasicNewsRecipe):
                 del item['style']
         return soup
+
+    def preprocess_raw_html(self, raw, url):
+        return '<html><head>'+raw[raw.find('</head>'):]
@@ -15,13 +15,13 @@ try:
     SHOWDEBUG1 = mlog.showdebuglevel(1)
     SHOWDEBUG2 = mlog.showdebuglevel(2)
 except:
-    print 'drMerry debuglogger not found, skipping debug options'
+    #print 'drMerry debuglogger not found, skipping debug options'
     SHOWDEBUG0 = False
     SHOWDEBUG1 = False
     SHOWDEBUG2 = False
     KEEPSTATS = False
 
-print ('level0: %s\nlevel1: %s\nlevel2: %s' % (SHOWDEBUG0,SHOWDEBUG1,SHOWDEBUG2))
+#print ('level0: %s\nlevel1: %s\nlevel2: %s' % (SHOWDEBUG0,SHOWDEBUG1,SHOWDEBUG2))
 
 ''' Version 1.2, updated cover image to match the changed website.
         added info date on title
recipes/mlody_technik_pl.recipe (new file)
@@ -0,0 +1,15 @@
+# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
+
+from calibre.web.feeds.news import BasicNewsRecipe
+class Mlody_technik(BasicNewsRecipe):
+    title = u'Mlody technik'
+    __author__ = 'fenuks'
+    description = u'Młody technik'
+    category = 'science'
+    language = 'pl'
+    cover_url='http://science-everywhere.pl/wp-content/uploads/2011/10/mt12.jpg'
+    no_stylesheets = True
+    oldest_article = 7
+    max_articles_per_feed = 100
+    #keep_only_tags=[dict(id='container')]
+    feeds = [(u'Artyku\u0142y', u'http://www.mt.com.pl/feed')]
@@ -7,6 +7,7 @@ class naczytniki(BasicNewsRecipe):
     language = 'pl'
     description ='everything about e-readers'
     category='readers'
+    no_stylesheets=True
     oldest_article = 7
     max_articles_per_feed = 100
     remove_tags_after= dict(name='div', attrs={'class':'sociable'})
@@ -1,20 +1,21 @@
 # -*- coding: utf-8 -*-
 from calibre.web.feeds.news import BasicNewsRecipe
 
 class Nowa_Fantastyka(BasicNewsRecipe):
     title = u'Nowa Fantastyka'
     oldest_article = 7
     __author__ = 'fenuks'
     language = 'pl'
+    encoding='latin2'
     description ='site for fantasy readers'
     category='fantasy'
     max_articles_per_feed = 100
     INDEX='http://www.fantastyka.pl/'
+    no_stylesheets=True
+    needs_subscription = 'optional'
     remove_tags_before=dict(attrs={'class':'belka1-tlo-md'})
     #remove_tags_after=dict(name='span', attrs={'class':'naglowek-oceny'})
     remove_tags_after=dict(name='td', attrs={'class':'belka1-bot'})
-    remove_tags=[dict(attrs={'class':'avatar2'})]
-    feeds = []
+    remove_tags=[dict(attrs={'class':'avatar2'}), dict(name='span', attrs={'class':'alert-oceny'}), dict(name='img', attrs={'src':['obrazki/sledz1.png', 'obrazki/print.gif', 'obrazki/mlnf.gif']}), dict(name='b', text='Dodaj komentarz'),dict(name='a', attrs={'href':'http://www.fantastyka.pl/10,1727.html'})]
 
     def find_articles(self, url):
         articles = []
@@ -45,3 +46,13 @@ class Nowa_Fantastyka(BasicNewsRecipe):
         cover=soup.find(name='img', attrs={'class':'okladka'})
         self.cover_url=self.INDEX+ cover['src']
         return getattr(self, 'cover_url', self.cover_url)
+
+    def get_browser(self):
+        br = BasicNewsRecipe.get_browser()
+        if self.username is not None and self.password is not None:
+            br.open('http://www.fantastyka.pl/')
+            br.select_form(nr=0)
+            br['login'] = self.username
+            br['pass'] = self.password
+            br.submit()
+        return br
@@ -1,5 +1,5 @@
 #!/usr/bin/env python
+# -*- coding: utf-8 -*-
 __license__ = 'GPL v3'
 __copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'
 '''
@@ -707,6 +707,16 @@ class NYTimes(BasicNewsRecipe):
         return soup
 
     def populate_article_metadata(self, article, soup, first):
+        if first and hasattr(self, 'add_toc_thumbnail'):
+            idxdiv = soup.find('div',attrs={'class':'articleSpanImage'})
+            if idxdiv is not None:
+                if idxdiv.img:
+                    self.add_toc_thumbnail(article, idxdiv.img['src'])
+            else:
+                img = soup.find('img')
+                if img is not None:
+                    self.add_toc_thumbnail(article, img['src'])
+
         shortparagraph = ""
         try:
             if len(article.text_summary.strip()) == 0:
@@ -855,6 +855,16 @@ class NYTimes(BasicNewsRecipe):
 
         return soup
     def populate_article_metadata(self, article, soup, first):
+        if first and hasattr(self, 'add_toc_thumbnail'):
+            idxdiv = soup.find('div',attrs={'class':'articleSpanImage'})
+            if idxdiv is not None:
+                if idxdiv.img:
+                    self.add_toc_thumbnail(article, idxdiv.img['src'])
+            else:
+                img = soup.find('img')
+                if img is not None:
+                    self.add_toc_thumbnail(article, img['src'])
+
         shortparagraph = ""
         try:
             if len(article.text_summary.strip()) == 0:
recipes/rynek_zdrowia.recipe (new file)
@@ -0,0 +1,21 @@
+from calibre.web.feeds.news import BasicNewsRecipe
+
+class rynekzdrowia(BasicNewsRecipe):
+    title = u'Rynek Zdrowia'
+    __author__ = u'spi630'
+    language = 'pl'
+    masthead_url = 'http://k.rynekzdrowia.pl/images/headerLogo.png'
+    cover_url = 'http://k.rynekzdrowia.pl/images/headerLogo.png'
+    oldest_article = 3
+    max_articles_per_feed = 25
+    no_stylesheets = True
+    auto_cleanup = True
+    remove_empty_feeds=True
+
+    remove_tags_before = dict(name='h3')
+
+    feeds = [(u'Finanse i Zarz\u0105dzanie', u'http://www.rynekzdrowia.pl/Kanal/finanse.html'), (u'Inwestycje', u'http://www.rynekzdrowia.pl/Kanal/inwestycje.html'), (u'Aparatura i wyposa\u017cenie', u'http://www.rynekzdrowia.pl/Kanal/aparatura.html'), (u'Informatyka', u'http://www.rynekzdrowia.pl/Kanal/informatyka.html'), (u'Prawo', u'http://www.rynekzdrowia.pl/Kanal/prawo.html'), (u'Polityka zdrowotna', u'http://www.rynekzdrowia.pl/Kanal/polityka_zdrowotna.html'), (u'Ubezpieczenia Zdrowotne', u'http://www.rynekzdrowia.pl/Kanal/ubezpieczenia.html'), (u'Farmacja', u'http://www.rynekzdrowia.pl/Kanal/farmacja.html'), (u'Badania i rozw\xf3j', u'http://www.rynekzdrowia.pl/Kanal/badania.html'), (u'Nauka', u'http://www.rynekzdrowia.pl/Kanal/nauka.html'), (u'Po godzinach', u'http://www.rynekzdrowia.pl/Kanal/godziny.html'), (u'Us\u0142ugi medyczne', u'http://www.rynekzdrowia.pl/Kanal/uslugi.html')]
+
+    def print_version(self, url):
+        url = url.replace('.html', ',drukuj.html')
+        return url
@@ -8,8 +8,8 @@ class SpidersWeb(BasicNewsRecipe):
     cover_url = 'http://www.spidersweb.pl/wp-content/themes/spiderweb/img/Logo.jpg'
     category = 'IT, WEB'
     language = 'pl'
+    no_stylesheets = True
     max_articles_per_feed = 100
-    remove_tags_before=dict(name="h1", attrs={'class':'Title'})
-    remove_tags_after=dict(name="div", attrs={'class':'Text'})
-    remove_tags=[dict(name='div', attrs={'class':['Tags', 'CommentCount FloatL', 'Show FloatL']})]
+    keep_only_tags=[dict(id='Post')]
+    remove_tags=[dict(name='div', attrs={'class':['Comments', 'Shows', 'Post-Tags']})]
     feeds = [(u'Wpisy', u'http://www.spidersweb.pl/feed')]
@@ -6,54 +6,21 @@ __copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'
 Fetch sueddeutsche.de
 '''
 from calibre.web.feeds.news import BasicNewsRecipe

 class Sueddeutsche(BasicNewsRecipe):

     title = u'sueddeutsche.de'
     description = 'News from Germany'
-    __author__ = 'Oliver Niesner and Armin Geller' #AGe 2011-11-25
+    __author__ = 'Oliver Niesner and Armin Geller' #Update AGe 2011-12-16
     use_embedded_content = False
     timefmt = ' [%d %b %Y]'
     oldest_article = 7
     max_articles_per_feed = 50
     no_stylesheets = True
     language = 'de'

     encoding = 'utf-8'
     remove_javascript = True
-    cover_url = 'http://polpix.sueddeutsche.com/polopoly_fs/1.1219199.1322239289!/image/image.jpg_gen/derivatives/860x860/image.jpg' # 2011-11-25 AGe
-    remove_tags = [ dict(name='link'), dict(name='iframe'),
-                    dict(name='div', attrs={'id':["bookmarking","themenbox","artikelfoot","CAD_AD",
-                        "SKY_AD","NT1_AD","navbar1","sdesiteheader"]}),
-
-                    dict(name='div', attrs={'class':["similar-article-box","artikelliste","nteaser301bg",
-                        "pages closed","basebox right narrow","headslot galleried"]}),
-
-                    dict(name='div', attrs={'class':["articleDistractor","listHeader","listHeader2","hr2",
-                        "item","videoBigButton","articlefooter full-column",
-                        "bildbanderolle full-column","footerCopy padleft5"]}),
-
-                    dict(name='p', attrs={'class':["ressortartikeln","artikelFliestext","entry-summary"]}),
-                    dict(name='div', attrs={'style':["position:relative;"]}),
-                    dict(name='span', attrs={'class':["nlinkheaderteaserschwarz","artikelLink","r10000000"]}),
-                    dict(name='table', attrs={'class':["stoerBS","kommentare","footer","pageBoxBot","pageAktiv","bgcontent"]}),
-                    dict(name='ul', attrs={'class':["breadcrumb","articles","activities","sitenav","actions"]}),
-                    dict(name='td', attrs={'class':["artikelDruckenRight"]}),
-                    dict(name='p', text = "ANZEIGE")
-                  ]
-    remove_tags_after = [dict(name='div', attrs={'class':["themenbox full-column"]})]
-
-    extra_css = '''
-        h2{font-family:Arial,Helvetica,sans-serif; font-size: x-small; color: #003399;}
-        a{font-family:Arial,Helvetica,sans-serif; font-style:italic;}
-        .dachzeile p{font-family:Arial,Helvetica,sans-serif; font-size: x-small; }
-        h1{ font-family:Arial,Helvetica,sans-serif; font-size:x-large; font-weight:bold;}
-        .artikelTeaser{font-family:Arial,Helvetica,sans-serif; font-size: x-small; font-weight:bold; }
-        body{font-family:Arial,Helvetica,sans-serif; }
-        .photo {font-family:Arial,Helvetica,sans-serif; font-size: x-small; color: #666666;} '''
+    auto_cleanup = True
+    cover_url = 'http://polpix.sueddeutsche.com/polopoly_fs/1.1237395.1324054345!/image/image.jpg_gen/derivatives/860x860/image.jpg' # 2011-12-16 AGe

     feeds = [
         (u'Politik', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EPolitik%24?output=rss'),
         (u'Wirtschaft', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EWirtschaft%24?output=rss'),
@@ -62,7 +29,7 @@ class Sueddeutsche(BasicNewsRecipe):
         (u'Sport', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5ESport%24?output=rss'),
         (u'Leben', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5ELeben%24?output=rss'),
         (u'Karriere', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EKarriere%24?output=rss'),
-        (u'München & Region', u'http://www.sueddeutsche.de/app/service/rss/ressort/muenchen/rss.xml'), # AGe 2011-11-13
+        (u'München & Region', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EMünchen&Region%24?output=rss'),
         (u'Bayern', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EBayern%24?output=rss'),
         (u'Medien', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EMedien%24?output=rss'),
         (u'Digital', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EDigital%24?output=rss'),
@@ -76,7 +43,12 @@ class Sueddeutsche(BasicNewsRecipe):
         (u'Service', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EService%24?output=rss'), # sometimes only
         (u'Verlag', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EVerlag%24?output=rss'), # sometimes only
     ]
+    # AGe 2011-12-16: handling of redirections solved with re-usable recipe code from kiklop74.
+    # Feed is: http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5ESport%24?output=rss
+    # Article download source is: http://sz.de/1.1237295 (Ski Alpin: Der Erfolg kommt, der Trainer geht)
+    # Article source is: http://www.sueddeutsche.de/sport/ski-alpin-der-erfolg-kommt-der-trainer-geht-1.1237295
+    # Article printversion is: http://www.sueddeutsche.de/sport/2.220/ski-alpin-der-erfolg-kommt-der-trainer-geht-1.1237295
     def print_version(self, url):
-        main, sep, id = url.rpartition('/')
+        n_url=self.browser.open_novisit(url).geturl()
+        main, sep, id = n_url.rpartition('/')
         return main + '/2.220/' + id
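Note: the rewritten print_version above has to resolve the sz.de short link first, because the print URL must be built from the final sueddeutsche.de address. A sketch of the same resolve-then-rewrite idiom inside a recipe, assuming the mechanize-style browser that BasicNewsRecipe.get_browser() returns:

    def print_version(self, url):
        # Follow redirects without recording a visit, then read the
        # final URL the server reported.
        resolved = self.browser.open_novisit(url).geturl()
        main, sep, article_id = resolved.rpartition('/')
        # '2.220' is the print-version path segment used by the site.
        return main + '/2.220/' + article_id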
@@ -59,6 +59,11 @@ class TelegraphUK(BasicNewsRecipe):
         ,(u'Travel' , u'http://www.telegraph.co.uk/travel/rss' )
         ,(u'How about that?', u'http://www.telegraph.co.uk/news/newstopics/howaboutthat/rss' )
     ]
+    def populate_article_metadata(self, article, soup, first):
+        if first and hasattr(self, 'add_toc_thumbnail'):
+            picdiv = soup.find('img')
+            if picdiv is not None:
+                self.add_toc_thumbnail(article,picdiv['src'])

     def get_article_url(self, article):
         url = article.get('link', None)
=== added file 'recipes/tuttojove.recipe' (17 lines)
@@ -0,0 +1,17 @@
+__license__ = 'GPL v3'
+__author__ = 'faber1971'
+description = 'Italian website on Juventus F.C. - v1.00 (17, December 2011)'
+
+from calibre.web.feeds.news import BasicNewsRecipe
+
+class AdvancedUserRecipe1305984536(BasicNewsRecipe):
+    title = u'tuttojuve'
+    description = 'Juventus'
+    language = 'it'
+    __author__ = 'faber1971'
+    oldest_article = 1
+    max_articles_per_feed = 100
+
+    feeds = [(u'notizie', u'http://feeds.tuttojuve.com/rss/'), (u'da vinovo', u'http://feeds.tuttojuve.com/rss/?c=10'), (u'primo piano', u'http://feeds.tuttojuve.com/rss/?c=16'), (u'editoriale', u'http://feeds.tuttojuve.com/rss/?c=3'), (u'il punto', u'http://feeds.tuttojuve.com/rss/?c=8'), (u'pagelle', u'http://feeds.tuttojuve.com/rss/?c=9'), (u'avversario', u'http://feeds.tuttojuve.com/rss/?c=11')]
+    def print_version(self, url):
+        return self.browser.open_novisit(url).geturl()
@@ -57,6 +57,12 @@ class WallStreetJournal(BasicNewsRecipe):
                 'username and password')
         return br

+    def populate_article_metadata(self, article, soup, first):
+        if first and hasattr(self, 'add_toc_thumbnail'):
+            picdiv = soup.find('img')
+            if picdiv is not None:
+                self.add_toc_thumbnail(article,picdiv['src'])
+
     def postprocess_html(self, soup, first):
         for tag in soup.findAll(name=['table', 'tr', 'td']):
             tag.name = 'div'
@@ -44,6 +44,12 @@ class WallStreetJournal(BasicNewsRecipe):
     ]
     remove_tags_after = [dict(id="article_story_body"), {'class':"article story"},]

+    def populate_article_metadata(self, article, soup, first):
+        if first and hasattr(self, 'add_toc_thumbnail'):
+            picdiv = soup.find('img')
+            if picdiv is not None:
+                self.add_toc_thumbnail(article,picdiv['src'])
+
     def postprocess_html(self, soup, first):
         for tag in soup.findAll(name=['table', 'tr', 'td']):
             tag.name = 'div'
@@ -5,9 +5,10 @@ from calibre.web.feeds.news import BasicNewsRecipe
 class Zaman (BasicNewsRecipe):

     title = u'ZAMAN Gazetesi'
+    description = ' Zaman Gazetesi''nin internet sitesinden günlük haberler'
     __author__ = u'thomass'
     oldest_article = 2
-    max_articles_per_feed =100
+    max_articles_per_feed =50
     # no_stylesheets = True
     #delay = 1
     #use_embedded_content = False
@@ -16,19 +17,19 @@ class Zaman (BasicNewsRecipe):
     category = 'news, haberler,TR,gazete'
     language = 'tr'
     publication_type = 'newspaper '
-    extra_css = ' body{ font-family: Verdana,Helvetica,Arial,sans-serif } .introduction{font-weight: bold} .story-feature{display: block; padding: 0; border: 1px solid; width: 40%; font-size: small} .story-feature h2{text-align: center; text-transform: uppercase} '
+    extra_css = '.buyukbaslik{font-weight: bold; font-size: 18px;color:#0000FF}'#body{ font-family: Verdana,Helvetica,Arial,sans-serif } .introduction{font-weight: bold} .story-feature{display: block; padding: 0; border: 1px solid; width: 40%; font-size: small} .story-feature h2{text-align: center; text-transform: uppercase} '
     conversion_options = {
         'tags' : category
         ,'language' : language
         ,'publisher' : publisher
-        ,'linearize_tables': False
+        ,'linearize_tables': True
     }
     cover_img_url = 'https://fbcdn-profile-a.akamaihd.net/hprofile-ak-snc4/188140_81722291869_2111820_n.jpg'
     masthead_url = 'http://medya.zaman.com.tr/extentions/zaman.com.tr/img/section/logo-section.png'

-    keep_only_tags = [dict(name='div', attrs={'id':[ 'news-detail-content']}), dict(name='td', attrs={'class':['columnist-detail','columnist_head']}) ]
-    remove_tags = [ dict(name='div', attrs={'id':['news-detail-news-text-font-size','news-detail-gallery','news-detail-news-bottom-social']}),dict(name='div', attrs={'class':['radioEmbedBg','radyoProgramAdi']}),dict(name='a', attrs={'class':['webkit-html-attribute-value webkit-html-external-link']}),dict(name='table', attrs={'id':['yaziYorumTablosu']}),dict(name='img', attrs={'src':['http://medya.zaman.com.tr/pics/paylas.gif','http://medya.zaman.com.tr/extentions/zaman.com.tr/img/columnist/ma-16.png']})]
+    #keep_only_tags = [dict(name='div', attrs={'id':[ 'news-detail-content']}), dict(name='td', attrs={'class':['columnist-detail','columnist_head']}) ]
+    remove_tags = [ dict(name='img', attrs={'src':['http://medya.zaman.com.tr/zamantryeni/pics/zamanonline.gif']})]#,dict(name='div', attrs={'class':['radioEmbedBg','radyoProgramAdi']}),dict(name='a', attrs={'class':['webkit-html-attribute-value webkit-html-external-link']}),dict(name='table', attrs={'id':['yaziYorumTablosu']}),dict(name='img', attrs={'src':['http://medya.zaman.com.tr/pics/paylas.gif','http://medya.zaman.com.tr/extentions/zaman.com.tr/img/columnist/ma-16.png']})

     #remove_attributes = ['width','height']
@@ -37,7 +38,8 @@ class Zaman (BasicNewsRecipe):
     feeds = [
         ( u'Anasayfa', u'http://www.zaman.com.tr/anasayfa.rss'),
         ( u'Son Dakika', u'http://www.zaman.com.tr/sondakika.rss'),
-        ( u'En çok Okunanlar', u'http://www.zaman.com.tr/max_all.rss'),
+        #( u'En çok Okunanlar', u'http://www.zaman.com.tr/max_all.rss'),
+        #( u'Manşet', u'http://www.zaman.com.tr/manset.rss'),
         ( u'Gündem', u'http://www.zaman.com.tr/gundem.rss'),
         ( u'Yazarlar', u'http://www.zaman.com.tr/yazarlar.rss'),
         ( u'Politika', u'http://www.zaman.com.tr/politika.rss'),
@@ -45,11 +47,20 @@ class Zaman (BasicNewsRecipe):
         ( u'Dış Haberler', u'http://www.zaman.com.tr/dishaberler.rss'),
         ( u'Yorumlar', u'http://www.zaman.com.tr/yorumlar.rss'),
         ( u'Röportaj', u'http://www.zaman.com.tr/roportaj.rss'),
+        ( u'Dizi Yazı', u'http://www.zaman.com.tr/dizi.rss'),
+        ( u'Bilişim', u'http://www.zaman.com.tr/bilisim.rss'),
+        ( u'Otomotiv', u'http://www.zaman.com.tr/otomobil.rss'),
         ( u'Spor', u'http://www.zaman.com.tr/spor.rss'),
         ( u'Kürsü', u'http://www.zaman.com.tr/kursu.rss'),
+        ( u'Eğitim', u'http://www.zaman.com.tr/egitim.rss'),
         ( u'Kültür Sanat', u'http://www.zaman.com.tr/kultursanat.rss'),
         ( u'Televizyon', u'http://www.zaman.com.tr/televizyon.rss'),
-        ( u'Manşet', u'http://www.zaman.com.tr/manset.rss'),
+        ( u'Aile', u'http://www.zaman.com.tr/aile.rss'),
+        ( u'Cuma Eki', u'http://www.zaman.com.tr/cuma.rss'),
+        ( u'Cumaertesi Eki', u'http://www.zaman.com.tr/cumaertesi.rss'),
+        ( u'Pazar Eki', u'http://www.zaman.com.tr/pazar.rss'),

     ]
+
+    def print_version(self, url):
+        return url.replace('http://www.zaman.com.tr/haber.do?haberno=', 'http://www.zaman.com.tr/yazdir.do?haberno=')
@@ -409,6 +409,17 @@ locale_for_sorting = ''
 # columns. If False, one column is used.
 metadata_single_use_2_cols_for_custom_fields = True

+#: Order of custom column(s) in edit metadata
+# Controls the order that custom columns are listed in edit metadata single
+# and bulk. The columns listed in the tweak are displayed first and in the
+# order provided. Any columns not listed are displayed after the listed ones,
+# in alphabetical order. Do note that this tweak does not change the size of
+# the edit widgets. Putting comments widgets in this list may result in some
+# odd widget spacing when using two-column mode.
+# Enter a comma-separated list of custom field lookup names, as in
+# metadata_edit_custom_column_order = ['#genre', '#mytags', '#etc']
+metadata_edit_custom_column_order = []
+
 #: The number of seconds to wait before sending emails
 # The number of seconds to wait before sending emails when using a
 # public email server like gmail or hotmail. Default is: 5 minutes
@@ -1,5 +1,5 @@
 " Project wide builtins
-let g:pyflakes_builtins = ["_", "dynamic_property", "__", "P", "I", "lopen", "icu_lower", "icu_upper", "icu_title", "ngettext"]
+let $PYFLAKES_BUILTINS = "_,dynamic_property,__,P,I,lopen,icu_lower,icu_upper,icu_title,ngettext"

 python << EOFPY
 import os, sys
@@ -11,7 +11,7 @@ __all__ = [
         'build', 'build_pdf2xml', 'server',
         'gui',
         'develop', 'install',
-        'kakasi', 'resources',
+        'kakasi', 'coffee', 'resources',
         'check',
         'sdist',
         'manual', 'tag_release',
@@ -49,9 +49,10 @@ gui = GUI()
 from setup.check import Check
 check = Check()

-from setup.resources import Resources, Kakasi
+from setup.resources import Resources, Kakasi, Coffee
 resources = Resources()
 kakasi = Kakasi()
+coffee = Coffee()

 from setup.publish import Manual, TagRelease, Stage1, Stage2, \
         Stage3, Stage4, Stage5, Publish
@@ -6,7 +6,7 @@ __license__ = 'GPL v3'
 __copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
 __docformat__ = 'restructuredtext en'

-import sys, os, textwrap, subprocess, shutil, tempfile, atexit, shlex
+import sys, os, textwrap, subprocess, shutil, tempfile, atexit, shlex, glob

 from setup import (Command, islinux, isbsd, basenames, modules, functions,
         __appname__, __version__)
@@ -296,13 +296,14 @@ class Sdist(Command):
         for x in open('.bzrignore').readlines():
             if not x.startswith('resources/'): continue
             p = x.strip().replace('/', os.sep)
-            d = self.j(tdir, os.path.dirname(p))
-            if not self.e(d):
-                os.makedirs(d)
-            if os.path.isdir(p):
-                shutil.copytree(p, self.j(tdir, p))
-            else:
-                shutil.copy2(p, d)
+            for p in glob.glob(p):
+                d = self.j(tdir, os.path.dirname(p))
+                if not self.e(d):
+                    os.makedirs(d)
+                if os.path.isdir(p):
+                    shutil.copytree(p, self.j(tdir, p))
+                else:
+                    shutil.copy2(p, d)
         for x in os.walk(os.path.join(self.SRC, 'calibre')):
             for f in x[-1]:
                 if not f.endswith('_ui.py'): continue
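Note: the glob.glob() loop is needed because the ignore file can now contain wildcard entries (the compiled coffeescript output), so one ignore line may name several files that all belong in the sdist tree. A self-contained sketch of the expansion step (the pattern and directory names are invented for illustration):

    import glob, os, shutil

    def copy_ignored(patterns, tdir):
        for pattern in patterns:
            # glob.glob() expands wildcards; a literal path comes back
            # unchanged, provided it exists on disk.
            for p in glob.glob(pattern):
                d = os.path.join(tdir, os.path.dirname(p))
                if not os.path.exists(d):
                    os.makedirs(d)
                if os.path.isdir(p):
                    shutil.copytree(p, os.path.join(tdir, p))
                else:
                    shutil.copy2(p, d)

    copy_ignored(['resources/foo/*.js'], '/tmp/sdist')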
@@ -12,14 +12,14 @@ msgstr ""
 "Report-Msgid-Bugs-To: Debian iso-codes team <pkg-isocodes-"
 "devel@lists.alioth.debian.org>\n"
 "POT-Creation-Date: 2011-11-25 14:01+0000\n"
-"PO-Revision-Date: 2011-11-22 16:45+0000\n"
+"PO-Revision-Date: 2011-12-14 19:48+0000\n"
 "Last-Translator: Ferran Rius <frius64@hotmail.com>\n"
 "Language-Team: Catalan <linux@softcatala.org>\n"
 "MIME-Version: 1.0\n"
 "Content-Type: text/plain; charset=UTF-8\n"
 "Content-Transfer-Encoding: 8bit\n"
-"X-Launchpad-Export-Date: 2011-11-26 05:10+0000\n"
-"X-Generator: Launchpad (build 14381)\n"
+"X-Launchpad-Export-Date: 2011-12-15 05:18+0000\n"
+"X-Generator: Launchpad (build 14487)\n"
 "Language: ca\n"

 #. name for aaa
@@ -9348,7 +9348,7 @@ msgstr "Seit-Kaitetu"

 #. name for hil
 msgid "Hiligaynon"
-msgstr ""
+msgstr "Hiligainon"

 #. name for hin
 msgid "Hindi"
@@ -9356,39 +9356,39 @@ msgstr "Hindi"

 #. name for hio
 msgid "Tsoa"
-msgstr ""
+msgstr "Tsoa"

 #. name for hir
 msgid "Himarimã"
-msgstr ""
+msgstr "Himarimà"

 #. name for hit
 msgid "Hittite"
-msgstr ""
+msgstr "Hittita"

 #. name for hiw
 msgid "Hiw"
-msgstr ""
+msgstr "Hiw"

 #. name for hix
 msgid "Hixkaryána"
-msgstr ""
+msgstr "Hishkaryana"

 #. name for hji
 msgid "Haji"
-msgstr ""
+msgstr "Aji"

 #. name for hka
 msgid "Kahe"
-msgstr ""
+msgstr "Kahe"

 #. name for hke
 msgid "Hunde"
-msgstr ""
+msgstr "Hunde"

 #. name for hkk
 msgid "Hunjara-Kaina Ke"
-msgstr ""
+msgstr "Hunjara"

 #. name for hks
 msgid "Hong Kong Sign Language"
@@ -9396,27 +9396,27 @@ msgstr "Llenguatge de signes de Hong Kong"

 #. name for hla
 msgid "Halia"
-msgstr ""
+msgstr "Halia"

 #. name for hlb
 msgid "Halbi"
-msgstr ""
+msgstr "Halbi"

 #. name for hld
 msgid "Halang Doan"
-msgstr ""
+msgstr "Halang Doan"

 #. name for hle
 msgid "Hlersu"
-msgstr ""
+msgstr "Sansu"

 #. name for hlt
 msgid "Nga La"
-msgstr ""
+msgstr "Nga La"

 #. name for hlu
 msgid "Luwian; Hieroglyphic"
-msgstr ""
+msgstr "Luvi; jeroglífic"

 #. name for hma
 msgid "Miao; Southern Mashan"
@@ -9424,7 +9424,7 @@ msgstr "Miao; Mashan meridional"

 #. name for hmb
 msgid "Songhay; Humburi Senni"
-msgstr ""
+msgstr "Songhai; central"

 #. name for hmc
 msgid "Miao; Central Huishui"
@@ -9440,11 +9440,11 @@ msgstr "Miao; Huishui oriental"

 #. name for hmf
 msgid "Hmong Don"
-msgstr ""
+msgstr "Miao; Don"

 #. name for hmg
 msgid "Hmong; Southwestern Guiyang"
-msgstr ""
+msgstr "Miao; Guiyang sudoccidental"

 #. name for hmh
 msgid "Miao; Southwestern Huishui"
@@ -9456,11 +9456,11 @@ msgstr "Miao; Huishui septentrional"

 #. name for hmj
 msgid "Ge"
-msgstr ""
+msgstr "Ge"

 #. name for hmk
 msgid "Maek"
-msgstr ""
+msgstr "Maek"

 #. name for hml
 msgid "Miao; Luopohe"
@@ -9472,11 +9472,11 @@ msgstr "Miao; Mashan central"

 #. name for hmn
 msgid "Hmong"
-msgstr ""
+msgstr "Hmong (macrollengua)"

 #. name for hmo
 msgid "Hiri Motu"
-msgstr ""
+msgstr "Hiri Motu"

 #. name for hmp
 msgid "Miao; Northern Mashan"
@@ -9488,7 +9488,7 @@ msgstr "Miao; Qiandong oriental"

 #. name for hmr
 msgid "Hmar"
-msgstr ""
+msgstr "Hmar"

 #. name for hms
 msgid "Miao; Southern Qiandong"
@@ -9496,15 +9496,15 @@ msgstr "Miao; Qiandong meridional"

 #. name for hmt
 msgid "Hamtai"
-msgstr ""
+msgstr "Hamtai"

 #. name for hmu
 msgid "Hamap"
-msgstr ""
+msgstr "Hamap"

 #. name for hmv
 msgid "Hmong Dô"
-msgstr ""
+msgstr "Miao; Do"

 #. name for hmw
 msgid "Miao; Western Mashan"
@@ -9520,19 +9520,19 @@ msgstr "Miao; Shua"

 #. name for hna
 msgid "Mina (Cameroon)"
-msgstr ""
+msgstr "Mina (Camerun)"

 #. name for hnd
 msgid "Hindko; Southern"
-msgstr ""
+msgstr "Hindko; meridional"

 #. name for hne
 msgid "Chhattisgarhi"
-msgstr ""
+msgstr "Chattisgarbi"

 #. name for hnh
 msgid "//Ani"
-msgstr ""
+msgstr "Ani"

 #. name for hni
 msgid "Hani"
@@ -9540,7 +9540,7 @@ msgstr ""

 #. name for hnj
 msgid "Hmong Njua"
-msgstr ""
+msgstr "Miao; Hmong Njua"

 #. name for hnn
 msgid "Hanunoo"
@@ -9548,7 +9548,7 @@ msgstr ""

 #. name for hno
 msgid "Hindko; Northern"
-msgstr ""
+msgstr "Hindko; septentrional"

 #. name for hns
 msgid "Hindustani; Caribbean"
@@ -11800,7 +11800,7 @@ msgstr ""

 #. name for khq
 msgid "Songhay; Koyra Chiini"
-msgstr ""
+msgstr "Songhai; Koyra"

 #. name for khr
 msgid "Kharia"
@@ -17288,7 +17288,7 @@ msgstr ""

 #. name for mww
 msgid "Hmong Daw"
-msgstr ""
+msgstr "Miao; blanc"

 #. name for mwx
 msgid "Mediak"
@@ -28680,7 +28680,7 @@ msgstr ""

 #. name for xlu
 msgid "Luwian; Cuneiform"
-msgstr ""
+msgstr "Luvi; cuneïforme"

 #. name for xly
 msgid "Elymian"
@@ -12,14 +12,14 @@ msgstr ""
 "Report-Msgid-Bugs-To: Debian iso-codes team <pkg-isocodes-"
 "devel@lists.alioth.debian.org>\n"
 "POT-Creation-Date: 2011-11-25 14:01+0000\n"
-"PO-Revision-Date: 2011-09-27 15:33+0000\n"
-"Last-Translator: Kovid Goyal <Unknown>\n"
+"PO-Revision-Date: 2011-12-03 15:11+0000\n"
+"Last-Translator: Yuri Chornoivan <yurchor@gmail.com>\n"
 "Language-Team: Ukrainian <translation-team-uk@lists.sourceforge.net>\n"
 "MIME-Version: 1.0\n"
 "Content-Type: text/plain; charset=UTF-8\n"
 "Content-Transfer-Encoding: 8bit\n"
-"X-Launchpad-Export-Date: 2011-11-26 05:43+0000\n"
-"X-Generator: Launchpad (build 14381)\n"
+"X-Launchpad-Export-Date: 2011-12-04 04:43+0000\n"
+"X-Generator: Launchpad (build 14418)\n"
 "Language: uk\n"

 #. name for aaa
@@ -17956,7 +17956,7 @@ msgstr "ндоола"

 #. name for nds
 msgid "German; Low"
-msgstr ""
+msgstr "нижньонімецька"

 #. name for ndt
 msgid "Ndunga"
@@ -6,7 +6,7 @@ __license__ = 'GPL v3'
 __copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
 __docformat__ = 'restructuredtext en'

-import os, cPickle, re, shutil, marshal, zipfile, glob
+import os, cPickle, re, shutil, marshal, zipfile, glob, subprocess, time
 from zlib import compress

 from setup import Command, basenames, __appname__
@@ -23,7 +23,70 @@ def get_opts_from_parser(parser):
         for o in g.option_list:
             for x in do_opt(o): yield x

-class Kakasi(Command):
+class Coffee(Command): # {{{
+
+    description = 'Compile coffeescript files into javascript'
+    COFFEE_DIRS = {'ebooks/oeb/display': 'display'}
+
+    def add_options(self, parser):
+        parser.add_option('--watch', '-w', action='store_true', default=False,
+                help='Autocompile when .coffee files are changed')
+        parser.add_option('--show-js', action='store_true', default=False,
+                help='Display the generated javascript')
+
+    def run(self, opts):
+        self.do_coffee_compile(opts)
+        if opts.watch:
+            try:
+                while True:
+                    time.sleep(0.5)
+                    self.do_coffee_compile(opts, timestamp=True,
+                            ignore_errors=True)
+            except KeyboardInterrupt:
+                pass
+
+    def show_js(self, jsfile):
+        from pygments.lexers import JavascriptLexer
+        from pygments.formatters import TerminalFormatter
+        from pygments import highlight
+        with open(jsfile, 'rb') as f:
+            raw = f.read()
+        print highlight(raw, JavascriptLexer(), TerminalFormatter())
+
+    def do_coffee_compile(self, opts, timestamp=False, ignore_errors=False):
+        for toplevel, dest in self.COFFEE_DIRS.iteritems():
+            dest = self.j(self.RESOURCES, dest)
+            for x in glob.glob(self.j(self.SRC, __appname__, toplevel, '*.coffee')):
+                js = self.j(dest, os.path.basename(x.rpartition('.')[0]+'.js'))
+                if self.newer(js, x):
+                    print ('\t%sCompiling %s'%(time.strftime('[%H:%M:%S] ') if
+                        timestamp else '', os.path.basename(x)))
+                    try:
+                        subprocess.check_call(['coffee', '-c', '-o', dest, x])
+                    except:
+                        print ('\n\tCompilation of %s failed'%os.path.basename(x))
+                        if ignore_errors:
+                            with open(js, 'wb') as f:
+                                f.write('# Compilation from coffeescript failed')
+                        else:
+                            raise SystemExit(1)
+                    else:
+                        if opts.show_js:
+                            self.show_js(js)
+                            print ('#'*80)
+                            print ('#'*80)
+
+    def clean(self):
+        for toplevel, dest in self.COFFEE_DIRS.iteritems():
+            dest = self.j(self.RESOURCES, dest)
+            for x in glob.glob(self.j(self.SRC, __appname__, toplevel, '*.coffee')):
+                x = x.rpartition('.')[0] + '.js'
+                x = self.j(dest, os.path.basename(x))
+                if os.path.exists(x):
+                    os.remove(x)
+# }}}
+
+class Kakasi(Command): # {{{

     description = 'Compile resources for unihandecode'

@@ -62,9 +125,6 @@ class Kakasi(Command):
         self.info('\tGenerating kanadict')
         self.mkkanadict(src, dest)

-        return
-
-
     def mkitaiji(self, src, dst):
         dic = {}
         for line in open(src, "r"):
@@ -125,11 +185,12 @@ class Kakasi(Command):
         kakasi = self.j(self.RESOURCES, 'localization', 'pykakasi')
         if os.path.exists(kakasi):
             shutil.rmtree(kakasi)
+# }}}

-class Resources(Command):
+class Resources(Command): # {{{

     description = 'Compile various needed calibre resources'
-    sub_commands = ['kakasi']
+    sub_commands = ['kakasi', 'coffee']

     def run(self, opts):
         scripts = {}
@@ -223,13 +284,13 @@ class Resources(Command):
             x = self.j(self.RESOURCES, x+'.pickle')
             if os.path.exists(x):
                 os.remove(x)
-        from setup.commands import kakasi
+        from setup.commands import kakasi, coffee
         kakasi.clean()
+        coffee.clean()
         for x in ('builtin_recipes.xml', 'builtin_recipes.zip',
                 'template-functions.json'):
             x = self.j(self.RESOURCES, x)
             if os.path.exists(x):
                 os.remove(x)
+# }}}
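Note: because Coffee is exported from setup/commands.py (earlier hunk), it is invoked like any other calibre build command. Assuming the standard setup.py command dispatch and a coffee binary on the PATH, usage would look like:

    python setup.py coffee             # compile changed .coffee files
    python setup.py coffee --watch     # recompile whenever a file changes
    python setup.py coffee --show-js   # also print the generated javascript

The Resources command now lists 'coffee' in its sub_commands, so a plain resources build should pick the compilation up automatically.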
@@ -215,32 +215,34 @@ class GetTranslations(Translations): # {{{
     description = 'Get updated translations from Launchpad'
     BRANCH = 'lp:~kovid/calibre/translations'

-    @classmethod
-    def modified_translations(cls):
-        raw = subprocess.Popen(['bzr', 'status'],
+    @property
+    def modified_translations(self):
+        raw = subprocess.Popen(['bzr', 'status', '-S', self.PATH],
                 stdout=subprocess.PIPE).stdout.read().strip()
+        ans = []
         for line in raw.splitlines():
             line = line.strip()
-            if line.startswith(cls.PATH) and line.endswith('.po'):
-                yield line
+            if line.startswith('M') and line.endswith('.po'):
+                ans.append(line.split()[-1])
+        return ans

     def run(self, opts):
-        if len(list(self.modified_translations())) == 0:
+        if not self.modified_translations:
             subprocess.check_call(['bzr', 'merge', self.BRANCH])
-        if len(list(self.modified_translations())) == 0:
-            print 'No updated translations available'
-        else:
-            subprocess.check_call(['bzr', 'commit', '-m',
-                'IGN:Updated translations', self.PATH])
         self.check_for_errors()

-    @classmethod
-    def check_for_errors(cls):
+        if self.modified_translations:
+            subprocess.check_call(['bzr', 'commit', '-m',
+                'IGN:Updated translations', self.PATH])
+        else:
+            print('No updated translations available')
+
+    def check_for_errors(self):
         errors = os.path.join(tempfile.gettempdir(), 'calibre-translation-errors')
         if os.path.exists(errors):
             shutil.rmtree(errors)
         os.mkdir(errors)
-        pofilter = ('pofilter', '-i', cls.PATH, '-o', errors,
+        pofilter = ('pofilter', '-i', self.PATH, '-o', errors,
                 '-t', 'accelerators', '-t', 'escapes', '-t', 'variables',
                 #'-t', 'xmltags',
                 #'-t', 'brackets',
@@ -253,23 +255,20 @@ class GetTranslations(Translations): # {{{
                 '-t', 'printf')
         subprocess.check_call(pofilter)
         errfiles = glob.glob(errors+os.sep+'*.po')
-        subprocess.check_call(['gvim', '-f', '-p', '--']+errfiles)
-        for f in errfiles:
-            with open(f, 'r+b') as f:
-                raw = f.read()
-                raw = re.sub(r'# \(pofilter\).*', '', raw)
-                f.seek(0)
-                f.truncate()
-                f.write(raw)
-
-        subprocess.check_call(['pomerge', '-t', cls.PATH, '-i', errors, '-o',
-            cls.PATH])
-        if len(list(cls.modified_translations())) > 0:
-            subprocess.call(['bzr', 'diff', cls.PATH])
-            yes = raw_input('Merge corrections? [y/n]: ').strip()
-            if yes in ['', 'y']:
-                subprocess.check_call(['bzr', 'commit', '-m',
-                    'IGN:Translation corrections', cls.PATH])
+        if errfiles:
+            subprocess.check_call(['gvim', '-f', '-p', '--']+errfiles)
+            for f in errfiles:
+                with open(f, 'r+b') as f:
+                    raw = f.read()
+                    raw = re.sub(r'# \(pofilter\).*', '', raw)
+                    f.seek(0)
+                    f.truncate()
+                    f.write(raw)
+
+            subprocess.check_call(['pomerge', '-t', self.PATH, '-i', errors, '-o',
+                self.PATH])
+            return True
+        return False

 # }}}
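Note: switching modified_translations from a @classmethod generator to a @property returning a list is what lets the call sites above test it for truthiness directly. A toy illustration of the difference (class and helper names invented):

    class Repo(object):
        def _status_lines(self):
            return ['M  translations/de.po', '?  build/log.txt']

        @property
        def modified(self):
            # An empty list is falsy, so callers can write
            # 'if not repo.modified:' instead of len(list(gen())) == 0.
            return [l.split()[-1] for l in self._status_lines()
                    if l.startswith('M') and l.endswith('.po')]

    repo = Repo()
    if repo.modified:
        print(repo.modified)   # ['translations/de.po']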
@@ -558,11 +558,11 @@ xml_entity_to_unicode = partial(entity_to_unicode, result_exceptions = {
     '>' : '&gt;',
     '&' : '&amp;'})

-def replace_entities(raw):
-    return _ent_pat.sub(entity_to_unicode, raw)
+def replace_entities(raw, encoding='cp1252'):
+    return _ent_pat.sub(partial(entity_to_unicode, encoding=encoding), raw)

-def xml_replace_entities(raw):
-    return _ent_pat.sub(xml_entity_to_unicode, raw)
+def xml_replace_entities(raw, encoding='cp1252'):
+    return _ent_pat.sub(partial(xml_entity_to_unicode, encoding=encoding), raw)

 def prepare_string_for_xml(raw, attribute=False):
     raw = _ent_pat.sub(entity_to_unicode, raw)
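Note: the new encoding parameter matters mainly for numeric character references in the 0x80-0x9F range, which pages authored with Windows codepages often emit; entity_to_unicode() can then interpret those code points through the declared encoding instead of treating them as C1 control characters. A small usage sketch, relying on the cp1252 default from the hunk above:

    from calibre import replace_entities

    # &#147; and &#148; are cp1252 curly quotes; with the default
    # encoding they decode to the intended Unicode characters.
    print(replace_entities('&#147;quoted&#148; &amp; done'))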
@@ -4,7 +4,7 @@ __license__ = 'GPL v3'
 __copyright__ = '2008, Kovid Goyal kovid@kovidgoyal.net'
 __docformat__ = 'restructuredtext en'
 __appname__   = u'calibre'
-numeric_version = (0, 8, 29)
+numeric_version = (0, 8, 31)
 __version__   = u'.'.join(map(unicode, numeric_version))
 __author__    = u"Kovid Goyal <kovid@kovidgoyal.net>"
@@ -451,6 +451,10 @@ class CatalogPlugin(Plugin): # {{{
             'series_index','series','size','tags','timestamp',
             'title_sort','title','uuid','languages'])
         all_custom_fields = set(db.custom_field_keys())
+        for field in list(all_custom_fields):
+            fm = db.field_metadata[field]
+            if fm['datatype'] == 'series':
+                all_custom_fields.add(field+'_index')
         all_fields = all_std_fields.union(all_custom_fields)

         if opts.fields != 'all':
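Note: the new loop mirrors the built-in series behaviour for custom columns: every custom column of datatype 'series' gains a companion '_index' field that catalogs can select. A toy run of the same logic (column names invented):

    all_custom_fields = set(['#myseries', '#genre'])
    field_metadata = {'#myseries': {'datatype': 'series'},
                      '#genre': {'datatype': 'text'}}

    for field in list(all_custom_fields):
        if field_metadata[field]['datatype'] == 'series':
            all_custom_fields.add(field + '_index')

    print(sorted(all_custom_fields))
    # ['#genre', '#myseries', '#myseries_index']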
@@ -143,6 +143,9 @@ class ANDROID(USBMS):
             # Kobo
             0x2237: { 0x2208 : [0x0226] },

+            # Lenovo
+            0x17ef : { 0x7421 : [0x0216] },
+
             }
     EBOOK_DIR_MAIN = ['eBooks/import', 'wordplayer/calibretransfer', 'Books',
             'sdcard/ebooks']
@@ -155,7 +158,7 @@ class ANDROID(USBMS):
             'GT-I5700', 'SAMSUNG', 'DELL', 'LINUX', 'GOOGLE', 'ARCHOS',
             'TELECHIP', 'HUAWEI', 'T-MOBILE', 'SEMC', 'LGE', 'NVIDIA',
             'GENERIC-', 'ZTE', 'MID', 'QUALCOMM', 'PANDIGIT', 'HYSTON',
-            'VIZIO', 'GOOGLE', 'FREESCAL', 'KOBO_INC']
+            'VIZIO', 'GOOGLE', 'FREESCAL', 'KOBO_INC', 'LENOVO']
     WINDOWS_MAIN_MEM = ['ANDROID_PHONE', 'A855', 'A853', 'INC.NEXUS_ONE',
             '__UMS_COMPOSITE', '_MB200', 'MASS_STORAGE', '_-_CARD', 'SGH-I897',
             'GT-I9000', 'FILE-STOR_GADGET', 'SGH-T959', 'SAMSUNG_ANDROID',
@@ -167,12 +170,13 @@ class ANDROID(USBMS):
             'MB525', 'ANDROID2.3', 'SGH-I997', 'GT-I5800_CARD', 'MB612',
             'GT-S5830_CARD', 'GT-S5570_CARD', 'MB870', 'MID7015A',
             'ALPANDIGITAL', 'ANDROID_MID', 'VTAB1008', 'EMX51_BBG_ANDROI',
-            'UMS', '.K080', 'P990', 'LTE', 'MB853', 'GT-S5660_CARD']
+            'UMS', '.K080', 'P990', 'LTE', 'MB853', 'GT-S5660_CARD', 'A107']
     WINDOWS_CARD_A_MEM = ['ANDROID_PHONE', 'GT-I9000_CARD', 'SGH-I897',
             'FILE-STOR_GADGET', 'SGH-T959', 'SAMSUNG_ANDROID', 'GT-P1000_CARD',
             'A70S', 'A101IT', '7', 'INCREDIBLE', 'A7EB', 'SGH-T849_CARD',
             '__UMS_COMPOSITE', 'SGH-I997_CARD', 'MB870', 'ALPANDIGITAL',
-            'ANDROID_MID', 'P990_SD_CARD', '.K080', 'LTE_CARD', 'MB853']
+            'ANDROID_MID', 'P990_SD_CARD', '.K080', 'LTE_CARD', 'MB853',
+            'A1-07___C0541A4F']

     OSX_MAIN_MEM = 'Android Device Main Memory'
@@ -173,8 +173,9 @@ class INVESBOOK(EB600):
     FORMATS = ['epub', 'mobi', 'prc', 'fb2', 'html', 'pdf', 'rtf', 'txt']
     BCD     = [0x110, 0x323]

-    VENDOR_NAME      = ['INVES_E6', 'INVES-WI']
-    WINDOWS_MAIN_MEM = WINDOWS_CARD_A_MEM = ['00INVES_E600', 'INVES-WIBOOK']
+    VENDOR_NAME      = ['INVES_E6', 'INVES-WI', 'POCKETBO']
+    WINDOWS_MAIN_MEM = WINDOWS_CARD_A_MEM = ['00INVES_E600', 'INVES-WIBOOK',
+            'OK_POCKET_611_61']

 class BOOQ(EB600):
     name = 'Booq Device Interface'
|
|||||||
'html', 'htmlz', 'xhtml', 'pdf', 'pdb', 'pdr', 'prc', 'mobi', 'azw', 'doc',
|
'html', 'htmlz', 'xhtml', 'pdf', 'pdb', 'pdr', 'prc', 'mobi', 'azw', 'doc',
|
||||||
'epub', 'fb2', 'djv', 'djvu', 'lrx', 'cbr', 'cbz', 'cbc', 'oebzip',
|
'epub', 'fb2', 'djv', 'djvu', 'lrx', 'cbr', 'cbz', 'cbc', 'oebzip',
|
||||||
'rb', 'imp', 'odt', 'chm', 'tpz', 'azw1', 'pml', 'pmlz', 'mbp', 'tan', 'snb',
|
'rb', 'imp', 'odt', 'chm', 'tpz', 'azw1', 'pml', 'pmlz', 'mbp', 'tan', 'snb',
|
||||||
'xps', 'oxps', 'azw4', 'book', 'zbf', 'pobi']
|
'xps', 'oxps', 'azw4', 'book', 'zbf', 'pobi', 'docx']
|
||||||
|
|
||||||
class HTMLRenderer(object):
|
class HTMLRenderer(object):
|
||||||
|
|
||||||
|
@ -17,6 +17,10 @@ from calibre.ptempfile import PersistentTemporaryDirectory
|
|||||||
from calibre.utils.ipc.server import Server
|
from calibre.utils.ipc.server import Server
|
||||||
from calibre.utils.ipc.job import ParallelJob
|
from calibre.utils.ipc.job import ParallelJob
|
||||||
|
|
||||||
|
# If the specified screen has either dimension larger than this value, no image
|
||||||
|
# rescaling is done (we assume that it is a tablet output profile)
|
||||||
|
MAX_SCREEN_SIZE = 3000
|
||||||
|
|
||||||
def extract_comic(path_to_comic_file):
|
def extract_comic(path_to_comic_file):
|
||||||
'''
|
'''
|
||||||
Un-archive the comic file.
|
Un-archive the comic file.
|
||||||
@ -141,7 +145,7 @@ class PageProcessor(list): # {{{
|
|||||||
newsizey = int(newsizex / aspect)
|
newsizey = int(newsizex / aspect)
|
||||||
deltax = 0
|
deltax = 0
|
||||||
deltay = (SCRHEIGHT - newsizey) / 2
|
deltay = (SCRHEIGHT - newsizey) / 2
|
||||||
if newsizex < 20000 and newsizey < 20000:
|
if newsizex < MAX_SCREEN_SIZE and newsizey < MAX_SCREEN_SIZE:
|
||||||
# Too large and resizing fails, so better
|
# Too large and resizing fails, so better
|
||||||
# to leave it as original size
|
# to leave it as original size
|
||||||
wand.size = (newsizex, newsizey)
|
wand.size = (newsizex, newsizey)
|
||||||
@ -165,14 +169,14 @@ class PageProcessor(list): # {{{
|
|||||||
newsizey = int(newsizex / aspect)
|
newsizey = int(newsizex / aspect)
|
||||||
deltax = 0
|
deltax = 0
|
||||||
deltay = (wscreeny - newsizey) / 2
|
deltay = (wscreeny - newsizey) / 2
|
||||||
if newsizex < 20000 and newsizey < 20000:
|
if newsizex < MAX_SCREEN_SIZE and newsizey < MAX_SCREEN_SIZE:
|
||||||
# Too large and resizing fails, so better
|
# Too large and resizing fails, so better
|
||||||
# to leave it as original size
|
# to leave it as original size
|
||||||
wand.size = (newsizex, newsizey)
|
wand.size = (newsizex, newsizey)
|
||||||
wand.set_border_color(pw)
|
wand.set_border_color(pw)
|
||||||
wand.add_border(pw, deltax, deltay)
|
wand.add_border(pw, deltax, deltay)
|
||||||
else:
|
else:
|
||||||
if SCRWIDTH < 20000 and SCRHEIGHT < 20000:
|
if SCRWIDTH < MAX_SCREEN_SIZE and SCRHEIGHT < MAX_SCREEN_SIZE:
|
||||||
wand.size = (SCRWIDTH, SCRHEIGHT)
|
wand.size = (SCRWIDTH, SCRHEIGHT)
|
||||||
|
|
||||||
if not self.opts.dont_sharpen:
|
if not self.opts.dont_sharpen:
|
||||||
|
@@ -229,7 +229,10 @@ class EPUBOutput(OutputFormatPlugin):
         if opts.extract_to is not None:
             from calibre.utils.zipfile import ZipFile
             if os.path.exists(opts.extract_to):
-                shutil.rmtree(opts.extract_to)
+                if os.path.isdir(opts.extract_to):
+                    shutil.rmtree(opts.extract_to)
+                else:
+                    os.remove(opts.extract_to)
             os.mkdir(opts.extract_to)
             with ZipFile(output_path) as zf:
                 zf.extractall(path=opts.extract_to)
@@ -148,7 +148,11 @@ class HTMLFile(object):
                 url = match.group(i)
                 if url:
                     break
-            link = self.resolve(url)
+            try:
+                link = self.resolve(url)
+            except ValueError:
+                # Unparseable URL, ignore
+                continue
             if link not in self.links:
                 self.links.append(link)
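Note: self.resolve() goes through Python's urlparse machinery, which raises ValueError on some malformed links (an unclosed IPv6 bracket, for instance), so the new try/except simply drops such links instead of aborting the whole conversion. A minimal reproduction of the failure mode in plain Python 2:

    import urlparse

    for url in ('good/page.html', 'http://['):
        try:
            print(urlparse.urljoin('http://example.com/', url))
        except ValueError:
            print('skipping unparseable url: %r' % url)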
@@ -16,7 +16,8 @@ from lxml.html import tostring

 from calibre import as_unicode
 from calibre.ebooks.metadata import check_isbn
-from calibre.ebooks.metadata.sources.base import Source, Option
+from calibre.ebooks.metadata.sources.base import (Source, Option, fixcase,
+        fixauthors)
 from calibre.utils.cleantext import clean_ascii_chars
 from calibre.ebooks.chardet import xml_to_unicode
 from calibre.ebooks.metadata.book.base import Metadata
@@ -509,6 +510,15 @@ class Amazon(Source):

         return domain

+    def clean_downloaded_metadata(self, mi):
+        if mi.title and self.domain in ('com', 'uk'):
+            mi.title = fixcase(mi.title)
+        mi.authors = fixauthors(mi.authors)
+        if self.domain in ('com', 'uk'):
+            mi.tags = list(map(fixcase, mi.tags))
+        mi.isbn = check_isbn(mi.isbn)
+
     def create_query(self, log, title=None, authors=None, identifiers={}, # {{{
             domain=None):
         if domain is None:
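Note: fixcase() title-cases strings, which is safe for the English Amazon sites but wrong for other languages, hence the domain check; this is the code side of the "do not correct the case of book titles" fix in the changelog. A rough sketch of the intent using calibre's titlecase helper (not the exact fixcase implementation):

    from calibre.utils.titlecase import titlecase

    def fixcase_sketch(x):
        # Normalize shouting-case strings; small words stay lowercase.
        return titlecase(x) if x else x

    print(fixcase_sketch('A GAME OF THRONES'))  # -> A Game of Thrones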
@@ -31,7 +31,7 @@ class TOC(list):
 
     def __init__(self, href=None, fragment=None, text=None, parent=None, play_order=0,
                  base_path=os.getcwd(), type='unknown', author=None,
-                 description=None):
+                 description=None, toc_thumbnail=None):
        self.href = href
        self.fragment = fragment
        if not self.fragment:
@@ -43,6 +43,7 @@ class TOC(list):
        self.type = type
        self.author = author
        self.description = description
+       self.toc_thumbnail = toc_thumbnail
 
    def __str__(self):
        lines = ['TOC: %s#%s'%(self.href, self.fragment)]
@@ -72,12 +73,12 @@
            entry.parent = None
 
    def add_item(self, href, fragment, text, play_order=None, type='unknown',
-               author=None, description=None):
+               author=None, description=None, toc_thumbnail=None):
        if play_order is None:
            play_order = (self[-1].play_order if len(self) else self.play_order) + 1
        self.append(TOC(href=href, fragment=fragment, text=text, parent=self,
                        base_path=self.base_path, play_order=play_order,
-                       type=type, author=author, description=description))
+                       type=type, author=author, description=description, toc_thumbnail=toc_thumbnail))
        return self[-1]
 
    def top_level_items(self):
@@ -269,6 +270,9 @@
            if desc:
                desc = re.sub(r'\s+', ' ', desc)
                elem.append(C.meta(desc, name='description'))
+           idx = getattr(np, 'toc_thumbnail', None)
+           if idx:
+               elem.append(C.meta(idx, name='toc_thumbnail'))
            parent.append(elem)
            for np2 in np:
                navpoint(elem, np2)
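Taken together, the four hunks above thread an optional toc_thumbnail through the TOC node's constructor, add_item() and the NCX navpoint writer. A usage sketch (the hrefs are made up; a real thumbnail href must resolve to an image inside the book):

    from calibre.ebooks.metadata.toc import TOC

    toc = TOC(base_path='.')
    art = toc.add_item('feed_0/article_0/index.html', None, 'First article',
            toc_thumbnail='images/thumb_0.jpg')
    # The navpoint writer above emits this as <meta name="toc_thumbnail" .../>
    print(art.toc_thumbnail)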
@@ -656,11 +656,11 @@ class Tag(object): # {{{
                 ' image record associated with this article',
                 'image_index'),
         70 : ('Description offset in cncx', 'desc_offset'),
-        71 : ('Image attribution offset in cncx',
-                'image_attr_offset'),
+        71 : ('Author offset in cncx', 'author_offset'),
         72 : ('Image caption offset in cncx',
                 'image_caption_offset'),
-        73 : ('Author offset in cncx', 'author_offset'),
+        73 : ('Image attribution offset in cncx',
+                'image_attr_offset'),
         },
 
         'chapter_with_subchapters' : {
@@ -973,7 +973,8 @@ class MobiReader(object):
                 continue
             processed_records.append(i)
             data = self.sections[i][0]
-            if data[:4] in (b'FLIS', b'FCIS', b'SRCS', b'\xe9\x8e\r\n'):
+            if data[:4] in {b'FLIS', b'FCIS', b'SRCS', b'\xe9\x8e\r\n',
+                    b'RESC', b'BOUN', b'FDST', b'DATP'}:
                 # A FLIS, FCIS, SRCS or EOF record, ignore
                 continue
             buf = cStringIO.StringIO(data)
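The hunk above widens the set of MOBI record signatures that are skipped rather than parsed as text. A self-contained sketch of the same four-byte check (the sample records are made up):

    KNOWN_NON_TEXT = {b'FLIS', b'FCIS', b'SRCS', b'\xe9\x8e\r\n',
                      b'RESC', b'BOUN', b'FDST', b'DATP'}

    def is_non_text_record(data):
        # A record is identified by its first four bytes
        return data[:4] in KNOWN_NON_TEXT

    print(is_non_text_record(b'FDST\x00\x08'))  # True -- newly skipped
    print(is_non_text_record(b'<html>...'))     # False -- parsed as text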
@@ -136,7 +136,8 @@ class IndexEntry(object):
             'last_child_index': 23,
             'image_index': 69,
             'desc_offset': 70,
-            'author_offset': 73,
+            'author_offset': 71,
+
     }
 
     RTAG_MAP = {v:k for k, v in TAG_VALUES.iteritems()}
@@ -754,6 +755,13 @@ class Indexer(object): # {{{
                 normalized_articles.append(article)
                 article.author_offset = self.cncx[art.author]
                 article.desc_offset = self.cncx[art.description]
+                if getattr(art, 'toc_thumbnail', None) is not None:
+                    try:
+                        ii = self.serializer.images[art.toc_thumbnail] - 1
+                        if ii > -1:
+                            article.image_index = ii
+                    except KeyError:
+                        pass # Image not found in serializer
 
             if normalized_articles:
                 normalized_articles.sort(key=lambda x:x.offset)
@@ -161,7 +161,7 @@ class MobiWriter(object):
         index = 1
 
         mh_href = None
-        if 'masthead' in oeb.guide:
+        if 'masthead' in oeb.guide and oeb.guide['masthead'].href:
             mh_href = oeb.guide['masthead'].href
             self.image_records.append(None)
             index += 1
@@ -178,7 +178,11 @@ class Serializer(object):
         at the end.
         '''
         hrefs = self.oeb.manifest.hrefs
-        path, frag = urldefrag(urlnormalize(href))
+        try:
+            path, frag = urldefrag(urlnormalize(href))
+        except ValueError:
+            # Unparseable URL
+            return False
         if path and base:
             path = base.abshref(path)
         if path and path not in hrefs:
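This guard (and the matching ones in the HTMLFile and Split hunks) exists because hrefs found in real books can be rejected outright by the stdlib URL parser that urlnormalize() sits on. One concrete failure mode, sketched with Python 2's urlparse to match the codebase:

    from urlparse import urlparse

    def safe_path(href):
        try:
            return urlparse(href).path
        except ValueError:
            # Unparseable URL: report failure instead of aborting the conversion
            return None

    print(safe_path('chapter1.html#s2'))  # chapter1.html
    print(safe_path('http://[oops/x'))    # None (unclosed bracket raises ValueError)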
@@ -16,15 +16,13 @@ from urllib import unquote as urlunquote
 from lxml import etree, html
 from calibre.constants import filesystem_encoding, __version__
 from calibre.translations.dynamic import translate
-from calibre.ebooks.chardet import xml_to_unicode, strip_encoding_declarations
-from calibre.ebooks.oeb.entitydefs import ENTITYDEFS
+from calibre.ebooks.chardet import xml_to_unicode
 from calibre.ebooks.conversion.preprocess import CSSPreProcessor
-from calibre import isbytestring, as_unicode, get_types_map
-
-RECOVER_PARSER = etree.XMLParser(recover=True, no_network=True)
+from calibre import (isbytestring, as_unicode, get_types_map)
+from calibre.ebooks.oeb.parse_utils import (barename, XHTML_NS, RECOVER_PARSER,
+        namespace, XHTML, parse_html, NotHTML)
 
 XML_NS = 'http://www.w3.org/XML/1998/namespace'
-XHTML_NS = 'http://www.w3.org/1999/xhtml'
 OEB_DOC_NS = 'http://openebook.org/namespaces/oeb-document/1.0/'
 OPF1_NS = 'http://openebook.org/namespaces/oeb-package/1.0/'
 OPF2_NS = 'http://www.idpf.org/2007/opf'
@@ -55,9 +53,6 @@ OPF2_NSMAP = {'opf': OPF2_NS, 'dc': DC11_NS, 'dcterms': DCTERMS_NS,
 def XML(name):
     return '{%s}%s' % (XML_NS, name)
 
-def XHTML(name):
-    return '{%s}%s' % (XHTML_NS, name)
-
 def OPF(name):
     return '{%s}%s' % (OPF2_NS, name)
 
@@ -279,22 +274,11 @@ PREFIXNAME_RE = re.compile(r'^[^:]+[:][^:]+')
 XMLDECL_RE = re.compile(r'^\s*<[?]xml.*?[?]>')
 CSSURL_RE = re.compile(r'''url[(](?P<q>["']?)(?P<url>[^)]+)(?P=q)[)]''')
 
 def element(parent, *args, **kwargs):
     if parent is not None:
         return etree.SubElement(parent, *args, **kwargs)
     return etree.Element(*args, **kwargs)
 
-def namespace(name):
-    if '}' in name:
-        return name.split('}', 1)[0][1:]
-    return ''
-
-def barename(name):
-    if '}' in name:
-        return name.split('}', 1)[1]
-    return name
-
-
 def prefixname(name, nsrmap):
     if not isqname(name):
         return name
@@ -373,25 +357,6 @@ def urlnormalize(href):
     parts = (urlquote(part) for part in parts)
     return urlunparse(parts)
 
-def merge_multiple_html_heads_and_bodies(root, log=None):
-    heads, bodies = xpath(root, '//h:head'), xpath(root, '//h:body')
-    if not (len(heads) > 1 or len(bodies) > 1): return root
-    for child in root: root.remove(child)
-    head = root.makeelement(XHTML('head'))
-    body = root.makeelement(XHTML('body'))
-    for h in heads:
-        for x in h:
-            head.append(x)
-    for b in bodies:
-        for x in b:
-            body.append(x)
-    map(root.append, (head, body))
-    if log is not None:
-        log.warn('Merging multiple <head> and <body> sections')
-    return root
-
-
-
 
 
 class DummyHandler(logging.Handler):
@@ -418,10 +383,6 @@ class OEBError(Exception):
     """Generic OEB-processing error."""
     pass
 
-class NotHTML(OEBError):
-    '''Raised when a file that should be HTML (as per manifest) is not'''
-    pass
-
 class NullContainer(object):
     """An empty container.
 
@@ -801,7 +762,6 @@ class Manifest(object):
         """
 
         NUM_RE = re.compile('^(.*)([0-9][0-9.]*)(?=[.]|$)')
-        META_XP = XPath('/h:html/h:head/h:meta[@http-equiv="Content-Type"]')
 
         def __init__(self, oeb, id, href, media_type,
                      fallback=None, loader=str, data=None):
@@ -830,244 +790,17 @@ class Manifest(object):
                 return None
             return etree.fromstring(data, parser=RECOVER_PARSER)
 
-        def clean_word_doc(self, data):
-            prefixes = []
-            for match in re.finditer(r'xmlns:(\S+?)=".*?microsoft.*?"', data):
-                prefixes.append(match.group(1))
-            if prefixes:
-                self.oeb.log.warn('Found microsoft markup, cleaning...')
-                # Remove empty tags as they are not rendered by browsers
-                # but can become renderable HTML tags like <p/> if the
-                # document is parsed by an HTML parser
-                pat = re.compile(
-                        r'<(%s):([a-zA-Z0-9]+)[^>/]*?></\1:\2>'%('|'.join(prefixes)),
-                        re.DOTALL)
-                data = pat.sub('', data)
-                pat = re.compile(
-                        r'<(%s):([a-zA-Z0-9]+)[^>/]*?/>'%('|'.join(prefixes)))
-                data = pat.sub('', data)
-            return data
-
         def _parse_xhtml(self, data):
             orig_data = data
-            self.oeb.log.debug('Parsing', self.href, '...')
-            # Convert to Unicode and normalize line endings
-            data = self.oeb.decode(data)
-            data = strip_encoding_declarations(data)
-            data = self.oeb.html_preprocessor(data)
-            # There could be null bytes in data if it had &#0; entities in it
-            data = data.replace('\0', '')
-
-            # Remove DOCTYPE declaration as it messes up parsing
-            # In particular, it causes tostring to insert xmlns
-            # declarations, which messes up the coercing logic
-            idx = data.find('<html')
-            if idx == -1:
-                idx = data.find('<HTML')
-            if idx > -1:
-                pre = data[:idx]
-                data = data[idx:]
-                if '<!DOCTYPE' in pre:
-                    user_entities = {}
-                    for match in re.finditer(r'<!ENTITY\s+(\S+)\s+([^>]+)', pre):
-                        val = match.group(2)
-                        if val.startswith('"') and val.endswith('"'):
-                            val = val[1:-1]
-                        user_entities[match.group(1)] = val
-                    if user_entities:
-                        pat = re.compile(r'&(%s);'%('|'.join(user_entities.keys())))
-                        data = pat.sub(lambda m:user_entities[m.group(1)], data)
-
-            # Setting huge_tree=True causes crashes in windows with large files
-            parser = etree.XMLParser(no_network=True)
-            # Try with more & more drastic measures to parse
-            def first_pass(data):
-                try:
-                    data = etree.fromstring(data, parser=parser)
-                except etree.XMLSyntaxError as err:
-                    self.oeb.log.debug('Initial parse failed, using more'
-                            ' forgiving parsers')
-                    repl = lambda m: ENTITYDEFS.get(m.group(1), m.group(0))
-                    data = ENTITY_RE.sub(repl, data)
-                    try:
-                        data = etree.fromstring(data, parser=parser)
-                    except etree.XMLSyntaxError as err:
-                        self.oeb.logger.warn('Parsing file %r as HTML' % self.href)
-                        if err.args and err.args[0].startswith('Excessive depth'):
-                            from calibre.utils.soupparser import fromstring
-                            data = fromstring(data)
-                        else:
-                            data = html.fromstring(data)
-                        data.attrib.pop('xmlns', None)
-                        for elem in data.iter(tag=etree.Comment):
-                            if elem.text:
-                                elem.text = elem.text.strip('-')
-                        data = etree.tostring(data, encoding=unicode)
-                        try:
-                            data = etree.fromstring(data, parser=parser)
-                        except etree.XMLSyntaxError:
-                            data = etree.fromstring(data, parser=RECOVER_PARSER)
-                return data
+            fname = urlunquote(self.href)
+            self.oeb.log.debug('Parsing', fname, '...')
             try:
-                data = self.clean_word_doc(data)
-            except:
-                pass
-            data = first_pass(data)
-
-            if data.tag == 'HTML':
-                # Lower case all tag and attribute names
-                data.tag = data.tag.lower()
-                for x in data.iterdescendants():
-                    try:
-                        x.tag = x.tag.lower()
-                        for key, val in list(x.attrib.iteritems()):
-                            del x.attrib[key]
-                            key = key.lower()
-                            x.attrib[key] = val
-                    except:
-                        pass
-
-            # Handle weird (non-HTML/fragment) files
-            if barename(data.tag) != 'html':
-                if barename(data.tag) == 'ncx':
-                    return self._parse_xml(orig_data)
-                self.oeb.log.warn('File %r does not appear to be (X)HTML'%self.href)
-                nroot = etree.fromstring('<html></html>')
-                has_body = False
-                for child in list(data):
-                    if isinstance(child.tag, (unicode, str)) and barename(child.tag) == 'body':
-                        has_body = True
-                        break
-                parent = nroot
-                if not has_body:
-                    self.oeb.log.warn('File %r appears to be a HTML fragment'%self.href)
-                    nroot = etree.fromstring('<html><body/></html>')
-                    parent = nroot[0]
-                for child in list(data.iter()):
-                    oparent = child.getparent()
-                    if oparent is not None:
-                        oparent.remove(child)
-                    parent.append(child)
-                data = nroot
-
-            # Force into the XHTML namespace
-            if not namespace(data.tag):
-                self.oeb.log.warn('Forcing', self.href, 'into XHTML namespace')
-                data.attrib['xmlns'] = XHTML_NS
-                data = etree.tostring(data, encoding=unicode)
-
-                try:
-                    data = etree.fromstring(data, parser=parser)
-                except:
-                    data = data.replace(':=', '=').replace(':>', '>')
-                    data = data.replace('<http:/>', '')
-                    try:
-                        data = etree.fromstring(data, parser=parser)
-                    except etree.XMLSyntaxError:
-                        self.oeb.logger.warn('Stripping comments from %s'%
-                                self.href)
-                        data = re.compile(r'<!--.*?-->', re.DOTALL).sub('',
-                                data)
-                        data = data.replace(
-                            "<?xml version='1.0' encoding='utf-8'?><o:p></o:p>",
-                            '')
-                        data = data.replace("<?xml version='1.0' encoding='utf-8'??>", '')
-                        try:
-                            data = etree.fromstring(data,
-                                    parser=RECOVER_PARSER)
-                        except etree.XMLSyntaxError:
-                            self.oeb.logger.warn('Stripping meta tags from %s'%
-                                    self.href)
-                            data = re.sub(r'<meta\s+[^>]+?>', '', data)
-                            data = etree.fromstring(data, parser=RECOVER_PARSER)
-            elif namespace(data.tag) != XHTML_NS:
-                # OEB_DOC_NS, but possibly others
-                ns = namespace(data.tag)
-                attrib = dict(data.attrib)
-                nroot = etree.Element(XHTML('html'),
-                    nsmap={None: XHTML_NS}, attrib=attrib)
-                for elem in data.iterdescendants():
-                    if isinstance(elem.tag, basestring) and \
-                        namespace(elem.tag) == ns:
-                        elem.tag = XHTML(barename(elem.tag))
-                for elem in data:
-                    nroot.append(elem)
-                data = nroot
-
-            data = merge_multiple_html_heads_and_bodies(data, self.oeb.logger)
-            # Ensure has a <head/>
-            head = xpath(data, '/h:html/h:head')
-            head = head[0] if head else None
-            if head is None:
-                self.oeb.logger.warn(
-                    'File %r missing <head/> element' % self.href)
-                head = etree.Element(XHTML('head'))
-                data.insert(0, head)
-                title = etree.SubElement(head, XHTML('title'))
-                title.text = self.oeb.translate(__('Unknown'))
-            elif not xpath(data, '/h:html/h:head/h:title'):
-                self.oeb.logger.warn(
-                    'File %r missing <title/> element' % self.href)
-                title = etree.SubElement(head, XHTML('title'))
-                title.text = self.oeb.translate(__('Unknown'))
-            # Remove any encoding-specifying <meta/> elements
-            for meta in self.META_XP(data):
-                meta.getparent().remove(meta)
-            etree.SubElement(head, XHTML('meta'),
-                attrib={'http-equiv': 'Content-Type',
-                        'content': '%s; charset=utf-8' % XHTML_NS})
-            # Ensure has a <body/>
-            if not xpath(data, '/h:html/h:body'):
-                body = xpath(data, '//h:body')
-                if body:
-                    body = body[0]
-                    body.getparent().remove(body)
-                    data.append(body)
-                else:
-                    self.oeb.logger.warn(
-                        'File %r missing <body/> element' % self.href)
-                    etree.SubElement(data, XHTML('body'))
-
-            # Remove microsoft office markup
-            r = [x for x in data.iterdescendants(etree.Element) if 'microsoft-com' in x.tag]
-            for x in r:
-                x.tag = XHTML('span')
-
-            # Remove lang redefinition inserted by the amazing Microsoft Word!
-            body = xpath(data, '/h:html/h:body')[0]
-            for key in list(body.attrib.keys()):
-                if key == 'lang' or key.endswith('}lang'):
-                    body.attrib.pop(key)
-
-            def remove_elem(a):
-                p = a.getparent()
-                idx = p.index(a) -1
-                p.remove(a)
-                if a.tail:
-                    if idx <= 0:
-                        if p.text is None:
-                            p.text = ''
-                        p.text += a.tail
-                    else:
-                        if p[idx].tail is None:
-                            p[idx].tail = ''
-                        p[idx].tail += a.tail
-
-            # Remove hyperlinks with no content as they cause rendering
-            # artifacts in browser based renderers
-            # Also remove empty <b>, <u> and <i> tags
-            for a in xpath(data, '//h:a[@href]|//h:i|//h:b|//h:u'):
-                if a.get('id', None) is None and a.get('name', None) is None \
-                        and len(a) == 0 and not a.text:
-                    remove_elem(a)
-
-            # Convert <br>s with content into paragraphs as ADE can't handle
-            # them
-            for br in xpath(data, '//h:br'):
-                if len(br) > 0 or br.text:
-                    br.tag = XHTML('div')
-
+                data = parse_html(data, log=self.oeb.log,
+                        decoder=self.oeb.decode,
+                        preprocessor=self.oeb.html_preprocessor,
+                        filename=fname, non_html_file_tags={'ncx'})
+            except NotHTML:
+                return self._parse_xml(orig_data)
             return data
 
         def _parse_txt(self, data):
@@ -1629,9 +1362,10 @@ class TOC(object):
     :attr:`id`: Option unique identifier for this node.
     :attr:`author`: Optional author attribution for periodicals <mbp:>
     :attr:`description`: Optional description attribute for periodicals <mbp:>
+    :attr:`toc_thumbnail`: Optional toc thumbnail image
     """
     def __init__(self, title=None, href=None, klass=None, id=None,
-            play_order=None, author=None, description=None):
+            play_order=None, author=None, description=None, toc_thumbnail=None):
         self.title = title
         self.href = urlnormalize(href) if href else href
         self.klass = klass
@@ -1643,10 +1377,11 @@
         self.play_order = play_order
         self.author = author
         self.description = description
+        self.toc_thumbnail = toc_thumbnail
 
-    def add(self, title, href, klass=None, id=None, play_order=0, author=None, description=None):
+    def add(self, title, href, klass=None, id=None, play_order=0, author=None, description=None, toc_thumbnail=None):
         """Create and return a new sub-node of this node."""
-        node = TOC(title, href, klass, id, play_order, author, description)
+        node = TOC(title, href, klass, id, play_order, author, description, toc_thumbnail)
         self.nodes.append(node)
         return node
 
src/calibre/ebooks/oeb/display/cfi.coffee (new file, 225 lines)
@@ -0,0 +1,225 @@
+#!/usr/bin/env coffee
+# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
+
+###
+ Copyright 2011, Kovid Goyal <kovid@kovidgoyal.net>
+ Released under the GPLv3 License
+###
+#
+log = (error) ->
+    if error
+        if window?.console?.log
+            window.console.log(error)
+        else if process?.stdout?.write
+            process.stdout.write(error + '\n')
+
+# CFI escaping {{{
+escape_for_cfi = (raw) ->
+    if raw
+        for c in ['^', '[', ']', ',', '(', ')', ';', '~', '@', '-', '!']
+            raw = raw.replace(c, '^'+c)
+    raw
+
+unescape_from_cfi = (raw) ->
+    ans = raw
+    if raw
+        dropped = false
+        ans = []
+        for c in raw
+            if not dropped and c == '^'
+                dropped = true
+                continue
+            dropped = false
+            ans.push(c)
+        ans = ans.join('')
+    ans
+# }}}
+
+fstr = (d) -> # {{{
+    # Convert a timestamp floating point number to a string
+    ans = ""
+    if ( d < 0 )
+        ans = "-"
+        d = -d
+    n = Math.floor(d)
+    ans += n
+    n = Math.round((d-n)*100)
+    if( n != 0 )
+        ans += "."
+        ans += if (n % 10 == 0) then (n/10) else n
+    ans
+# }}}
+
+class CanonicalFragmentIdentifier
+
+    # This class is a namespace to expose CFI functions via the window.cfi
+    # object
+
+    constructor: () ->
+
+    encode: (doc, node, offset, tail) -> # {{{
+        cfi = tail or ""
+
+        # Handle the offset, if any
+        switch node.nodeType
+            when 1 # Element node
+                if typeof(offset) == 'number'
+                    node = node.childNodes.item(offset)
+            when 3, 4, 5, 6 # Text/entity/CDATA node
+                offset or= 0
+                while true
+                    p = node.previousSibling
+                    if (p?.nodeType not in [3, 4, 5, 6])
+                        break
+                    offset += p.nodeValue.length
+                    node = p
+                cfi = ":" + offset + cfi
+            else # Not handled
+                log("Offsets for nodes of type #{ node.nodeType } are not handled")
+
+        # Construct the path to node from root
+        until node == doc
+            p = node.parentNode
+            if not p
+                if node.nodeType == 9 # Document node (iframe)
+                    win = node.defaultView
+                    if win.frameElement
+                        node = win.frameElement
+                        cfi = "!" + cfi
+                        continue
+                break
+            # Increase index by the length of all previous sibling text nodes
+            index = 0
+            child = p.firstChild
+            while true
+                index |= 1
+                if child.nodeType in [1, 7]
+                    index++
+                if child == node
+                    break
+                child = child.nextSibling
+
+            # Add id assertions for robustness where possible
+            id = node.getAttribute?('id')
+            idspec = if id then "[#{ escape_for_cfi(id) }]" else ''
+            cfi = '/' + index + idspec + cfi
+            node = p
+
+        cfi
+    # }}}
+
+    decode: (cfi, doc=window?.document) -> # {{{
+        simple_node_regex = ///
+            ^/(\d+)          # The node count
+              (\[[^\]]*\])?  # The optional id assertion
+        ///
+        error = null
+        node = doc
+
+        until cfi.length <= 0 or error
+            if ( (r = cfi.match(simple_node_regex)) is not null ) # Path step
+                target = parseInt(r[1])
+                assertion = r[2]
+                if assertion
+                    assertion = unescape_from_cfi(assertion.slice(1, assertion.length-1))
+                index = 0
+                child = node.firstChild
+
+                while true
+                    if not child
+                        if assertion # Try to use the assertion to find the node
+                            child = doc.getElementById(assertion)
+                            if child
+                                node = child
+                        if not child
+                            error = "No matching child found for CFI: " + cfi
+                        break
+                    index |= 1 # Increment index by 1 if it is even
+                    if child.nodeType in [1, 7] # We have an element or a PI
+                        index++
+                    if ( index == target )
+                        cfi = cfi.substr(r[0].length)
+                        node = child
+                        break
+                    child = child.nextSibling
+
+            else if cfi[0] == '!' # Indirection
+                if node.contentDocument
+                    node = node.contentDocument
+                    cfi = cfi.substr(1)
+                else
+                    error = "Cannot reference #{ node.nodeName }'s content:" + cfi
+
+            else
+                break
+
+        if error
+            log(error)
+            return null
+
+        point = {}
+        error = null
+
+        point
+    # }}}
+
+    at: (x, y, doc=window?.document) -> # {{{
+        cdoc = doc
+        target = null
+        cwin = cdoc.defaultView
+        tail = ''
+        offset = null
+        name = null
+
+        # Drill down into iframes, etc.
+        while true
+            target = cdoc.elementFromPoint x, y
+            if not target or target.localName == 'html'
+                log("No element at (#{ x }, #{ y })")
+                return null
+
+            name = target.localName
+            if name not in ['iframe', 'embed', 'object']
+                break
+
+            cd = target.contentDocument
+            if not cd
+                break
+
+            x = x + cwin.pageXOffset - target.offsetLeft
+            y = y + cwin.pageYOffset - target.offsetTop
+            cdoc = cd
+            cwin = cdoc.defaultView
+
+        target.normalize()
+
+        if name in ['audio', 'video']
+            tail = "~" + fstr target.currentTime
+
+        if name in ['img', 'video']
+            px = ((x + cwin.scrollX - target.offsetLeft)*100)/target.offsetWidth
+            py = ((y + cwin.scrollY - target.offsetTop)*100)/target.offsetHeight
+            tail = "#{ tail }@#{ fstr px },#{ fstr py }"
+        else if name != 'audio'
+            if cdoc.caretRangeFromPoint # WebKit
+                range = cdoc.caretRangeFromPoint(x, y)
+                if range
+                    target = range.startContainer
+                    offset = range.startOffset
+            else
+                # TODO: implement a span bisection algorithm for UAs
+                # without caretRangeFromPoint (Gecko, IE)
+
+        this.encode(doc, target, offset, tail)
+    # }}}
+
+if window?
+    window.cfi = new CanonicalFragmentIdentifier()
+else if process?
+    # Some debugging code goes here to be run with the coffee interpreter
+    cfi = new CanonicalFragmentIdentifier()
+    t = 'a^!,1'
+    log(t)
+    log(escape_for_cfi(t))
+    log(unescape_from_cfi(escape_for_cfi(t)))
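For readers who want to generate or check CFIs outside the browser, a behavioral sketch of the escaping scheme from cfi.coffee in Python (a per-character port, not a line-for-line one; the special characters are copied from escape_for_cfi above):

    SPECIAL = set('^[],();~@-!')

    def escape_for_cfi(raw):
        return ''.join(('^' + c) if c in SPECIAL else c for c in raw)

    def unescape_from_cfi(raw):
        out, dropped = [], False
        for c in raw:
            if not dropped and c == '^':
                dropped = True  # drop the escape, keep the next char verbatim
                continue
            dropped = False
            out.append(c)
        return ''.join(out)

    t = 'a^!,1'
    assert unescape_from_cfi(escape_for_cfi(t)) == t
    print(escape_for_cfi(t))  # a^^^!^,1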
src/calibre/ebooks/oeb/display/test/cfi-test.coffee (new file, 24 lines)
@@ -0,0 +1,24 @@
+#!/usr/bin/env coffee
+# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
+
+###
+ Copyright 2011, Kovid Goyal <kovid@kovidgoyal.net>
+ Released under the GPLv3 License
+###
+
+viewport_top = (node) ->
+    $(node).offset().top - window.pageYOffset
+
+viewport_left = (node) ->
+    $(node).offset().left - window.pageXOffset
+
+window.onload = ->
+    h1 = document.getElementsByTagName('h1')[0]
+    x = h1.scrollLeft + 150
+    y = viewport_top(h1) + h1.offsetHeight/2
+    e = document.elementFromPoint x, y
+    if e.getAttribute('id') != 'first-h1'
+        alert 'Failed to find top h1'
+        return
+    alert window.cfi.at x, y
+
src/calibre/ebooks/oeb/display/test/test.html (new file, 14 lines)
@@ -0,0 +1,14 @@
+<!DOCTYPE html>
+<html>
+    <head>
+        <title>Testing CFI functionality</title>
+        <script type="text/javascript" src="cfi.js"></script>
+        <script type="text/javascript" src="jquery.js"></script>
+        <script type="text/javascript" src="cfi-test.js"></script>
+    </head>
+    <body>
+        <h1 id="first-h1" style="border: solid 1px red">Testing CFI functionality</h1>
+    </body>
+</html>
+
+
src/calibre/ebooks/oeb/display/test/test.py (new file, 26 lines)
@@ -0,0 +1,26 @@
+#!/usr/bin/env python
+# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
+from __future__ import (unicode_literals, division, absolute_import,
+                        print_function)
+
+__license__ = 'GPL v3'
+__copyright__ = '2011, Kovid Goyal <kovid@kovidgoyal.net>'
+__docformat__ = 'restructuredtext en'
+
+import os
+
+try:
+    from calibre.utils.coffeescript import serve
+except ImportError:
+    import init_calibre
+    if False: init_calibre, serve
+    from calibre.utils.coffeescript import serve
+
+
+def run_devel_server():
+    os.chdir(os.path.dirname(__file__))
+    serve(['../cfi.coffee', 'cfi-test.coffee'])
+
+if __name__ == '__main__':
+    run_devel_server()
+
src/calibre/ebooks/oeb/entitydefs.py (deleted file; its role is taken over by xml_replace_entities, see the import changes above)
@@ -1,256 +0,0 @@
-"""
-Replacement for htmlentitydefs which uses purely numeric entities.
-"""
-
-__license__ = 'GPL v3'
-__copyright__ = '2008, Marshall T. Vandegrift <llasram@gmail.com>'
-
-ENTITYDEFS = \
-    {'AElig': 'Æ',
-     'Aacute': 'Á',
-     'Acirc': 'Â',
-     'Agrave': 'À',
-     'Alpha': 'Α',
-     'Aring': 'Å',
-     'Atilde': 'Ã',
-     'Auml': 'Ä',
-     'Beta': 'Β',
-     'Ccedil': 'Ç',
-     'Chi': 'Χ',
-     'Dagger': '‡',
-     'Delta': 'Δ',
-     'ETH': 'Ð',
-     'Eacute': 'É',
-     'Ecirc': 'Ê',
-     'Egrave': 'È',
-     'Epsilon': 'Ε',
-     'Eta': 'Η',
-     'Euml': 'Ë',
-     'Gamma': 'Γ',
-     'Iacute': 'Í',
-     'Icirc': 'Î',
-     'Igrave': 'Ì',
-     'Iota': 'Ι',
-     'Iuml': 'Ï',
-     'Kappa': 'Κ',
-     'Lambda': 'Λ',
-     'Mu': 'Μ',
-     'Ntilde': 'Ñ',
-     'Nu': 'Ν',
-     'OElig': 'Œ',
-     'Oacute': 'Ó',
-     'Ocirc': 'Ô',
-     'Ograve': 'Ò',
-     'Omega': 'Ω',
-     'Omicron': 'Ο',
-     'Oslash': 'Ø',
-     'Otilde': 'Õ',
-     'Ouml': 'Ö',
-     'Phi': 'Φ',
-     'Pi': 'Π',
-     'Prime': '″',
-     'Psi': 'Ψ',
-     'Rho': 'Ρ',
-     'Scaron': 'Š',
-     'Sigma': 'Σ',
-     'THORN': 'Þ',
-     'Tau': 'Τ',
-     'Theta': 'Θ',
-     'Uacute': 'Ú',
-     'Ucirc': 'Û',
-     'Ugrave': 'Ù',
-     'Upsilon': 'Υ',
-     'Uuml': 'Ü',
-     'Xi': 'Ξ',
-     'Yacute': 'Ý',
-     'Yuml': 'Ÿ',
-     'Zeta': 'Ζ',
-     'aacute': 'á',
-     'acirc': 'â',
-     'acute': '´',
-     'aelig': 'æ',
-     'agrave': 'à',
-     'alefsym': 'ℵ',
-     'alpha': 'α',
-     'and': '∧',
-     'ang': '∠',
-     'aring': 'å',
-     'asymp': '≈',
-     'atilde': 'ã',
-     'auml': 'ä',
-     'bdquo': '„',
-     'beta': 'β',
-     'brvbar': '¦',
-     'bull': '•',
-     'cap': '∩',
-     'ccedil': 'ç',
-     'cedil': '¸',
-     'cent': '¢',
-     'chi': 'χ',
-     'circ': 'ˆ',
-     'clubs': '♣',
-     'cong': '≅',
-     'copy': '©',
-     'crarr': '↵',
-     'cup': '∪',
-     'curren': '¤',
-     'dArr': '⇓',
-     'dagger': '†',
-     'darr': '↓',
-     'deg': '°',
-     'delta': 'δ',
-     'diams': '♦',
-     'divide': '÷',
-     'eacute': 'é',
-     'ecirc': 'ê',
-     'egrave': 'è',
-     'empty': '∅',
-     'emsp': '&#8195;',
-     'ensp': '&#8194;',
-     'epsilon': 'ε',
-     'equiv': '≡',
-     'eta': 'η',
-     'eth': 'ð',
-     'euml': 'ë',
-     'euro': '€',
-     'exist': '∃',
-     'fnof': 'ƒ',
-     'forall': '∀',
-     'frac12': '½',
-     'frac14': '¼',
-     'frac34': '¾',
-     'frasl': '⁄',
-     'gamma': 'γ',
-     'ge': '≥',
-     'hArr': '⇔',
-     'harr': '↔',
-     'hearts': '♥',
-     'hellip': '…',
-     'iacute': 'í',
-     'icirc': 'î',
-     'iexcl': '¡',
-     'igrave': 'ì',
-     'image': 'ℑ',
-     'infin': '∞',
-     'int': '∫',
-     'iota': 'ι',
-     'iquest': '¿',
-     'isin': '∈',
-     'iuml': 'ï',
-     'kappa': 'κ',
-     'lArr': '⇐',
-     'lambda': 'λ',
-     'lang': '〈',
-     'laquo': '«',
-     'larr': '←',
-     'lceil': '⌈',
-     'ldquo': '“',
-     'le': '≤',
-     'lfloor': '⌊',
-     'lowast': '∗',
-     'loz': '◊',
-     'lrm': '&#8206;',
-     'lsaquo': '‹',
-     'lsquo': '‘',
-     'macr': '¯',
-     'mdash': '—',
-     'micro': 'µ',
-     'middot': '·',
-     'minus': '−',
-     'mu': 'μ',
-     'nabla': '∇',
-     'nbsp': '&#160;',
-     'ndash': '–',
-     'ne': '≠',
-     'ni': '∋',
-     'not': '¬',
-     'notin': '∉',
-     'nsub': '⊄',
-     'ntilde': 'ñ',
-     'nu': 'ν',
-     'oacute': 'ó',
-     'ocirc': 'ô',
-     'oelig': 'œ',
-     'ograve': 'ò',
-     'oline': '‾',
-     'omega': 'ω',
-     'omicron': 'ο',
-     'oplus': '⊕',
-     'or': '∨',
-     'ordf': 'ª',
-     'ordm': 'º',
-     'oslash': 'ø',
-     'otilde': 'õ',
-     'otimes': '⊗',
-     'ouml': 'ö',
-     'para': '¶',
-     'part': '∂',
-     'permil': '‰',
-     'perp': '⊥',
-     'phi': 'φ',
-     'pi': 'π',
-     'piv': 'ϖ',
-     'plusmn': '±',
-     'pound': '£',
-     'prime': '′',
-     'prod': '∏',
-     'prop': '∝',
-     'psi': 'ψ',
-     'rArr': '⇒',
-     'radic': '√',
-     'rang': '〉',
-     'raquo': '»',
-     'rarr': '→',
-     'rceil': '⌉',
-     'rdquo': '”',
-     'real': 'ℜ',
-     'reg': '®',
-     'rfloor': '⌋',
-     'rho': 'ρ',
-     'rlm': '&#8207;',
-     'rsaquo': '›',
-     'rsquo': '’',
-     'sbquo': '‚',
-     'scaron': 'š',
-     'sdot': '⋅',
-     'sect': '§',
-     'shy': '&#173;',
-     'sigma': 'σ',
-     'sigmaf': 'ς',
-     'sim': '∼',
-     'spades': '♠',
-     'sub': '⊂',
-     'sube': '⊆',
-     'sum': '∑',
-     'sup': '⊃',
-     'sup1': '¹',
-     'sup2': '²',
-     'sup3': '³',
-     'supe': '⊇',
-     'szlig': 'ß',
-     'tau': 'τ',
-     'there4': '∴',
-     'theta': 'θ',
-     'thetasym': 'ϑ',
-     'thinsp': '&#8201;',
-     'thorn': 'þ',
-     'tilde': '˜',
-     'times': '×',
-     'trade': '™',
-     'uArr': '⇑',
-     'uacute': 'ú',
-     'uarr': '↑',
-     'ucirc': 'û',
-     'ugrave': 'ù',
-     'uml': '¨',
-     'upsih': 'ϒ',
-     'upsilon': 'υ',
-     'uuml': 'ü',
-     'weierp': '℘',
-     'xi': 'ξ',
-     'yacute': 'ý',
-     'yen': '¥',
-     'yuml': 'ÿ',
-     'zeta': 'ζ',
-     'zwj': '&#8205;',
-     'zwnj': '&#8204;'}
src/calibre/ebooks/oeb/parse_utils.py (new file, 347 lines)
@@ -0,0 +1,347 @@
+#!/usr/bin/env python
+# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
+from __future__ import (unicode_literals, division, absolute_import,
+                        print_function)
+
+__license__ = 'GPL v3'
+__copyright__ = '2011, Kovid Goyal <kovid@kovidgoyal.net>'
+__docformat__ = 'restructuredtext en'
+
+import re
+
+from lxml import etree, html
+
+from calibre import xml_replace_entities, force_unicode
+from calibre.constants import filesystem_encoding
+from calibre.ebooks.chardet import xml_to_unicode, strip_encoding_declarations
+
+RECOVER_PARSER = etree.XMLParser(recover=True, no_network=True)
+XHTML_NS = 'http://www.w3.org/1999/xhtml'
+
+class NotHTML(Exception):
+
+    def __init__(self, root_tag):
+        Exception.__init__(self, 'Data is not HTML')
+        self.root_tag = root_tag
+
+def barename(name):
+    return name.rpartition('}')[-1]
+
+def namespace(name):
+    if '}' in name:
+        return name.split('}', 1)[0][1:]
+    return ''
+
+def XHTML(name):
+    return '{%s}%s' % (XHTML_NS, name)
+
+def xpath(elem, expr):
+    return elem.xpath(expr, namespaces={'h':XHTML_NS})
+
+def XPath(expr):
+    return etree.XPath(expr, namespaces={'h':XHTML_NS})
+
+META_XP = XPath('/h:html/h:head/h:meta[@http-equiv="Content-Type"]')
+
+def merge_multiple_html_heads_and_bodies(root, log=None):
+    heads, bodies = xpath(root, '//h:head'), xpath(root, '//h:body')
+    if not (len(heads) > 1 or len(bodies) > 1): return root
+    for child in root: root.remove(child)
+    head = root.makeelement(XHTML('head'))
+    body = root.makeelement(XHTML('body'))
+    for h in heads:
+        for x in h:
+            head.append(x)
+    for b in bodies:
+        for x in b:
+            body.append(x)
+    map(root.append, (head, body))
+    if log is not None:
+        log.warn('Merging multiple <head> and <body> sections')
+    return root
+
+def _html5_parse(data):
+    import html5lib
+    data = html5lib.parse(data, treebuilder='lxml').getroot()
+    html_ns = [ns for ns, val in data.nsmap.iteritems() if (val == XHTML_NS and
+        ns is not None)]
+    if html_ns:
+        # html5lib causes the XHTML namespace to not
+        # be set as the default namespace
+        nsmap = dict(data.nsmap)
+        nsmap[None] = XHTML_NS
+        for x in html_ns:
+            nsmap.pop(x)
+        nroot = etree.Element(data.tag, nsmap=nsmap,
+                attrib=dict(data.attrib))
+        nroot.text = data.text
+        nroot.tail = data.tail
+        for child in data:
+            nroot.append(child)
+        data = nroot
+    return data
+
+def _html4_parse(data, prefer_soup=False):
+    if prefer_soup:
+        from calibre.utils.soupparser import fromstring
+        data = fromstring(data)
+    else:
+        data = html.fromstring(data)
+    data.attrib.pop('xmlns', None)
+    for elem in data.iter(tag=etree.Comment):
+        if elem.text:
+            elem.text = elem.text.strip('-')
+    data = etree.tostring(data, encoding=unicode)
+
+    # Setting huge_tree=True causes crashes in windows with large files
+    parser = etree.XMLParser(no_network=True)
+    try:
+        data = etree.fromstring(data, parser=parser)
+    except etree.XMLSyntaxError:
+        data = etree.fromstring(data, parser=RECOVER_PARSER)
+    return data
+
+def clean_word_doc(data, log):
+    prefixes = []
+    for match in re.finditer(r'xmlns:(\S+?)=".*?microsoft.*?"', data):
+        prefixes.append(match.group(1))
+    if prefixes:
+        log.warn('Found microsoft markup, cleaning...')
+        # Remove empty tags as they are not rendered by browsers
+        # but can become renderable HTML tags like <p/> if the
+        # document is parsed by an HTML parser
+        pat = re.compile(
+                r'<(%s):([a-zA-Z0-9]+)[^>/]*?></\1:\2>'%('|'.join(prefixes)),
+                re.DOTALL)
+        data = pat.sub('', data)
+        pat = re.compile(
+                r'<(%s):([a-zA-Z0-9]+)[^>/]*?/>'%('|'.join(prefixes)))
+        data = pat.sub('', data)
+    return data
+
+def parse_html(data, log=None, decoder=None, preprocessor=None,
+        filename='<string>', non_html_file_tags=frozenset()):
+    if log is None:
+        from calibre.utils.logging import default_log
+        log = default_log
+
+    filename = force_unicode(filename, enc=filesystem_encoding)
+
+    if not isinstance(data, unicode):
+        if decoder is not None:
+            data = decoder(data)
+        else:
+            data = xml_to_unicode(data)[0]
+
+    data = strip_encoding_declarations(data)
+    if preprocessor is not None:
+        data = preprocessor(data)
+
+    # There could be null bytes in data if it had &#0; entities in it
+    data = data.replace('\0', '')
+
+    # Remove DOCTYPE declaration as it messes up parsing
+    # In particular, it causes tostring to insert xmlns
+    # declarations, which messes up the coercing logic
+    idx = data.find('<html')
+    if idx == -1:
+        idx = data.find('<HTML')
+    if idx > -1:
+        pre = data[:idx]
+        data = data[idx:]
+        if '<!DOCTYPE' in pre: # Handle user defined entities
+            user_entities = {}
+            for match in re.finditer(r'<!ENTITY\s+(\S+)\s+([^>]+)', pre):
+                val = match.group(2)
+                if val.startswith('"') and val.endswith('"'):
+                    val = val[1:-1]
+                user_entities[match.group(1)] = val
+            if user_entities:
+                pat = re.compile(r'&(%s);'%('|'.join(user_entities.keys())))
+                data = pat.sub(lambda m:user_entities[m.group(1)], data)
+
+    data = clean_word_doc(data, log)
+
+    # Setting huge_tree=True causes crashes in windows with large files
+    parser = etree.XMLParser(no_network=True)
+
+    # Try with more & more drastic measures to parse
+    try:
+        data = etree.fromstring(data, parser=parser)
+    except etree.XMLSyntaxError:
+        log.debug('Initial parse failed, using more'
+                ' forgiving parsers')
+        data = xml_replace_entities(data)
+        try:
+            data = etree.fromstring(data, parser=parser)
+        except etree.XMLSyntaxError:
+            log.debug('Parsing %s as HTML' % filename)
+            try:
+                data = _html5_parse(data)
+            except:
+                log.exception(
+                    'HTML 5 parsing failed, falling back to older parsers')
+                data = _html4_parse(data)
+
+    if data.tag == 'HTML':
+        # Lower case all tag and attribute names
+        data.tag = data.tag.lower()
+        for x in data.iterdescendants():
+            try:
+                x.tag = x.tag.lower()
+                for key, val in list(x.attrib.iteritems()):
+                    del x.attrib[key]
+                    key = key.lower()
+                    x.attrib[key] = val
+            except:
+                pass
+
+    if barename(data.tag) != 'html':
+        if barename(data.tag) in non_html_file_tags:
+            raise NotHTML(data.tag)
+        log.warn('File %r does not appear to be (X)HTML'%filename)
+        nroot = etree.fromstring('<html></html>')
+        has_body = False
+        for child in list(data):
+            if isinstance(child.tag, (unicode, str)) and barename(child.tag) == 'body':
+                has_body = True
+                break
+        parent = nroot
+        if not has_body:
+            log.warn('File %r appears to be a HTML fragment'%filename)
+            nroot = etree.fromstring('<html><body/></html>')
+            parent = nroot[0]
+        for child in list(data.iter()):
+            oparent = child.getparent()
+            if oparent is not None:
+                oparent.remove(child)
+            parent.append(child)
+        data = nroot
+
+    # Force into the XHTML namespace
+    if not namespace(data.tag):
+        log.warn('Forcing', filename, 'into XHTML namespace')
+        data.attrib['xmlns'] = XHTML_NS
+        data = etree.tostring(data, encoding=unicode)
+
+        try:
+            data = etree.fromstring(data, parser=parser)
+        except:
+            data = data.replace(':=', '=').replace(':>', '>')
+            data = data.replace('<http:/>', '')
+            try:
+                data = etree.fromstring(data, parser=parser)
+            except etree.XMLSyntaxError:
+                log.warn('Stripping comments from %s'% filename)
+                data = re.compile(r'<!--.*?-->', re.DOTALL).sub('',
+                        data)
+                data = data.replace(
+                    "<?xml version='1.0' encoding='utf-8'?><o:p></o:p>",
+                    '')
+                data = data.replace("<?xml version='1.0' encoding='utf-8'??>", '')
+                try:
+                    data = etree.fromstring(data,
+                            parser=RECOVER_PARSER)
+                except etree.XMLSyntaxError:
+                    log.warn('Stripping meta tags from %s'% filename)
+                    data = re.sub(r'<meta\s+[^>]+?>', '', data)
+                    data = etree.fromstring(data, parser=RECOVER_PARSER)
+    elif namespace(data.tag) != XHTML_NS:
+        # OEB_DOC_NS, but possibly others
+        ns = namespace(data.tag)
+        attrib = dict(data.attrib)
+        nroot = etree.Element(XHTML('html'),
+            nsmap={None: XHTML_NS}, attrib=attrib)
+        for elem in data.iterdescendants():
+            if isinstance(elem.tag, basestring) and \
+                namespace(elem.tag) == ns:
+                elem.tag = XHTML(barename(elem.tag))
+        for elem in data:
+            nroot.append(elem)
+        data = nroot
+
+    data = merge_multiple_html_heads_and_bodies(data, log)
+    # Ensure has a <head/>
+    head = xpath(data, '/h:html/h:head')
+    head = head[0] if head else None
+    if head is None:
+        log.warn('File %s missing <head/> element' % filename)
+        head = etree.Element(XHTML('head'))
+        data.insert(0, head)
+        title = etree.SubElement(head, XHTML('title'))
+        title.text = _('Unknown')
+    elif not xpath(data, '/h:html/h:head/h:title'):
+        log.warn('File %s missing <title/> element' % filename)
+        title = etree.SubElement(head, XHTML('title'))
+        title.text = _('Unknown')
+    # Remove any encoding-specifying <meta/> elements
+    for meta in META_XP(data):
+        meta.getparent().remove(meta)
+    etree.SubElement(head, XHTML('meta'),
+        attrib={'http-equiv': 'Content-Type',
+                'content': '%s; charset=utf-8' % XHTML_NS})
+    # Ensure has a <body/>
+    if not xpath(data, '/h:html/h:body'):
+        body = xpath(data, '//h:body')
+        if body:
+            body = body[0]
+            body.getparent().remove(body)
+            data.append(body)
+        else:
+            log.warn('File %s missing <body/> element' % filename)
+            etree.SubElement(data, XHTML('body'))
+
+    # Remove microsoft office markup
+    r = [x for x in data.iterdescendants(etree.Element) if 'microsoft-com' in x.tag]
+    for x in r:
+        x.tag = XHTML('span')
+
+    # Remove lang redefinition inserted by the amazing Microsoft Word!
+    body = xpath(data, '/h:html/h:body')[0]
+    for key in list(body.attrib.keys()):
+        if key == 'lang' or key.endswith('}lang'):
+            body.attrib.pop(key)
+
+    def remove_elem(a):
+        p = a.getparent()
+        idx = p.index(a) -1
+        p.remove(a)
+        if a.tail:
+            if idx <= 0:
+                if p.text is None:
+                    p.text = ''
+                p.text += a.tail
+            else:
+                if p[idx].tail is None:
+                    p[idx].tail = ''
+                p[idx].tail += a.tail
+
+    # Remove hyperlinks with no content as they cause rendering
+    # artifacts in browser based renderers
+    # Also remove empty <b>, <u> and <i> tags
+    for a in xpath(data, '//h:a[@href]|//h:i|//h:b|//h:u'):
+        if a.get('id', None) is None and a.get('name', None) is None \
+                and len(a) == 0 and not a.text:
+            remove_elem(a)
+
+    # Convert <br>s with content into paragraphs as ADE can't handle
+    # them
+    for br in xpath(data, '//h:br'):
+        if len(br) > 0 or br.text:
+            br.tag = XHTML('div')
+
+    # Remove any stray text in the <head> section and format it nicely
+    data.text = '\n  '
+    head = xpath(data, '//h:head')
+    if head:
+        head = head[0]
+        head.text = '\n    '
+        head.tail = '\n  '
+        for child in head:
+            child.tail = '\n    '
+        child.tail = '\n  '
+
+    return data
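A minimal usage sketch for the new parse_html() entry point, using only the keyword arguments visible in its signature above (the markup is made up):

    from calibre.ebooks.oeb.parse_utils import parse_html, NotHTML

    raw = b'<html><body><p>Hello<br>world'  # deliberately sloppy HTML
    try:
        root = parse_html(raw, filename='example.html',
                non_html_file_tags={'ncx'})
    except NotHTML as e:
        root = None  # e.root_tag names the unexpected root element
    else:
        print(root.tag)  # {http://www.w3.org/1999/xhtml}html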
@@ -19,16 +19,15 @@ from calibre.ebooks.oeb.base import OPF1_NS, OPF2_NS, OPF2_NSMAP, DC11_NS, \
 from calibre.ebooks.oeb.base import OEB_DOCS, OEB_STYLES, OEB_IMAGES, \
     PAGE_MAP_MIME, JPEG_MIME, NCX_MIME, SVG_MIME
 from calibre.ebooks.oeb.base import XMLDECL_RE, COLLAPSE_RE, \
-    ENTITY_RE, MS_COVER_TYPE, iterlinks
+    MS_COVER_TYPE, iterlinks
 from calibre.ebooks.oeb.base import namespace, barename, XPath, xpath, \
                                     urlnormalize, BINARY_MIME, \
                                     OEBError, OEBBook, DirContainer
 from calibre.ebooks.oeb.writer import OEBWriter
-from calibre.ebooks.oeb.entitydefs import ENTITYDEFS
 from calibre.utils.localization import get_lang
 from calibre.ptempfile import TemporaryDirectory
 from calibre.constants import __appname__, __version__
-from calibre import guess_type
+from calibre import guess_type, xml_replace_entities
 
 __all__ = ['OEBReader']
 
@@ -107,8 +106,7 @@ class OEBReader(object):
         try:
             opf = etree.fromstring(data)
         except etree.XMLSyntaxError:
-            repl = lambda m: ENTITYDEFS.get(m.group(1), m.group(0))
-            data = ENTITY_RE.sub(repl, data)
+            data = xml_replace_entities(data, encoding=None)
             try:
                 opf = etree.fromstring(data)
                 self.logger.warn('OPF contains invalid HTML named entities')
@@ -371,8 +369,15 @@ class OEBReader(object):
             else :
                 description = None
 
+            index_image = xpath(child,
+                    'descendant::calibre:meta[@name = "toc_thumbnail"]')
+            toc_thumbnail = (index_image[0].text if index_image else None)
+            if not toc_thumbnail or not toc_thumbnail.strip():
+                toc_thumbnail = None
+
             node = toc.add(title, href, id=id, klass=klass,
-                    play_order=po, description=description, author=author)
+                    play_order=po, description=description, author=author,
+                    toc_thumbnail=toc_thumbnail)
 
             self._toc_from_navpoint(item, node, child)
 
@@ -159,15 +159,18 @@ class FlatFilenames(object): # {{{
                 continue
 
             data = item.data
+            isp = item.spine_position
             nhref = oeb.manifest.generate(href=nhref)[1]
+            if isp is not None:
+                oeb.spine.remove(item)
+            oeb.manifest.remove(item)
+
             nitem = oeb.manifest.add(item.id, nhref, item.media_type, data=data,
                     fallback=item.fallback)
             self.rename_map[item.href] = nhref
             self.renamed_items_map[nhref] = item
-            if item.spine_position is not None:
-                oeb.spine.insert(item.spine_position, nitem, item.linear)
-                oeb.spine.remove(item)
-            oeb.manifest.remove(item)
+            if isp is not None:
+                oeb.spine.insert(isp, nitem, item.linear)
 
         if self.rename_map:
             self.log('Found non-flat filenames, renaming to support broken'
@@ -154,7 +154,11 @@ class Split(object):
 
     def rewrite_links(self, url):
        href, frag = urldefrag(url)
-       href = self.current_item.abshref(href)
+       try:
+           href = self.current_item.abshref(href)
+       except ValueError:
+           # Unparseable URL
+           return url
        if href in self.map:
            anchor_map = self.map[href]
            nhref = anchor_map[frag if frag else None]
@@ -16,7 +16,7 @@ class UnsmartenPunctuation(object):
 
     def unsmarten(self, root):
         for x in self.html_tags(root):
-            if not barename(x) == 'pre':
+            if not barename(x.tag) == 'pre':
                 if getattr(x, 'text', None):
                     x.text = unsmarten_text(x.text)
                 if getattr(x, 'tail', None) and x.tail:
@@ -56,8 +56,11 @@ def render_html(mi, css, vertical, widget, all_fields=False): # {{{
     </body>
     <html>
     '''%(f, c, css)
+    fm = getattr(mi, 'field_metadata', field_metadata)
+    fl = dict(get_field_list(fm))
+    show_comments = (all_fields or fl.get('comments', True))
     comments = u''
-    if mi.comments:
+    if mi.comments and show_comments:
         comments = comments_to_html(force_unicode(mi.comments))
     right_pane = u'<div id="comments" class="comments">%s</div>'%comments
 
@@ -35,7 +35,10 @@ class PluginWidget(QWidget, Ui_Form):
 
         self.all_fields = [x for x in FIELDS if x != 'all']
         #add custom columns
-        self.all_fields.extend([x for x in sorted(db.custom_field_keys())])
+        for x in sorted(db.custom_field_keys()):
+            self.all_fields.append(x)
+            if db.field_metadata[x]['datatype'] == 'series':
+                self.all_fields.append(x+'_index')
         #populate
         for x in self.all_fields:
             QListWidgetItem(x, self.db_fields)
@@ -33,6 +33,9 @@ class PluginWidget(QWidget, Ui_Form):
             self.all_fields.append(x)
             QListWidgetItem(x, self.db_fields)
 
+            fm = db.field_metadata[x]
+            if fm['datatype'] == 'series':
+                QListWidgetItem(x+'_index', self.db_fields)
 
     def initialize(self, name, db):
         self.name = name
@@ -70,7 +70,7 @@ if pictureflow is not None:
                 ans = ''
             except:
                 ans = ''
-            return ans
+            return ans.replace('&', '&&')
 
         def subtitle(self, index):
             try:
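
Note: in Qt, a single '&' in display text marks an accelerator (the next
character is underlined); doubling it renders a literal ampersand, hence the
replace above. For example:

    def escape_qt_accelerators(text):
        # 'Smith & Jones' would underline the 'J'; 'Smith && Jones' shows '&'
        return text.replace('&', '&&')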
@@ -8,7 +8,7 @@ __docformat__ = 'restructuredtext en'
 from functools import partial
 
 from PyQt4.Qt import QComboBox, QLabel, QSpinBox, QDoubleSpinBox, QDateTimeEdit, \
-        QDateTime, QGroupBox, QVBoxLayout, QSizePolicy, \
+        QDateTime, QGroupBox, QVBoxLayout, QSizePolicy, QGridLayout, \
         QSpacerItem, QIcon, QCheckBox, QWidget, QHBoxLayout, SIGNAL, \
         QPushButton
 
@@ -401,70 +401,106 @@ widgets = {
     'enumeration': Enumeration
 }
 
-def field_sort_key(y, x=None):
-    m1 = x[y]
-    n1 = 'zzzzz' if m1['datatype'] == 'comments' else m1['name']
+def field_sort_key(y, fm=None):
+    m1 = fm[y]
+    name = icu_lower(m1['name'])
+    n1 = 'zzzzz' + name if m1['datatype'] == 'comments' else name
     return sort_key(n1)
 
 def populate_metadata_page(layout, db, book_id, bulk=False, two_column=False, parent=None):
-    def widget_factory(type, col):
+    def widget_factory(typ, key):
         if bulk:
-            w = bulk_widgets[type](db, col, parent)
+            w = bulk_widgets[typ](db, key, parent)
         else:
-            w = widgets[type](db, col, parent)
+            w = widgets[typ](db, key, parent)
         if book_id is not None:
             w.initialize(book_id)
         return w
-    x = db.custom_column_num_map
-    cols = list(x)
-    cols.sort(key=partial(field_sort_key, x=x))
-    count_non_comment = len([c for c in cols if x[c]['datatype'] != 'comments'])
+    fm = db.field_metadata
 
-    layout.setColumnStretch(1, 10)
+    # Get list of all non-composite custom fields. We must make widgets for these
+    fields = fm.custom_field_keys(include_composites=False)
+    cols_to_display = fields
+    cols_to_display.sort(key=partial(field_sort_key, fm=fm))
+
+    # This will contain the fields in the order to display them
+    cols = []
+
+    # The fields named here must be first in the widget list
+    tweak_cols = tweaks['metadata_edit_custom_column_order']
+    comments_in_tweak = 0
+    for key in (tweak_cols or ()):
+        # Add the key if it really exists in the database
+        if key in cols_to_display:
+            cols.append(key)
+            if fm[key]['datatype'] == 'comments':
+                comments_in_tweak += 1
+
+    # Add all the remaining fields
+    comments_not_in_tweak = 0
+    for key in cols_to_display:
+        if key not in cols:
+            cols.append(key)
+            if fm[key]['datatype'] == 'comments':
+                comments_not_in_tweak += 1
+
+    count = len(cols)
+    layout_rows_for_comments = 9
     if two_column:
-        turnover_point = (count_non_comment+1)/2
-        layout.setColumnStretch(3, 10)
+        turnover_point = ((count-comments_not_in_tweak+1) +
+                          comments_in_tweak*(layout_rows_for_comments-1))/2
     else:
         # Avoid problems with multi-line widgets
-        turnover_point = count_non_comment + 1000
+        turnover_point = count + 1000
     ans = []
-    column = row = comments_row = 0
-    for col in cols:
-        if not x[col]['editable']:
+    column = row = base_row = max_row = 0
+    for key in cols:
+        if not fm[key]['is_editable']:
+            continue # this almost never happens
+        dt = fm[key]['datatype']
+        if dt == 'composite' or (bulk and dt == 'comments'):
             continue
-        dt = x[col]['datatype']
-        if dt == 'composite':
-            continue
-        if dt == 'comments':
-            continue
-        w = widget_factory(dt, col)
+        w = widget_factory(dt, fm[key]['colnum'])
         ans.append(w)
+        if two_column and dt == 'comments':
+            # Here for compatibility with old layout. Comments always started
+            # in the left column
+            comments_in_tweak -= 1
+            # no special processing if the comment field was named in the tweak
+            if comments_in_tweak < 0 and comments_not_in_tweak > 0:
+                # Force a turnover, adding comments widgets below max_row.
+                # Save the row to return to if we turn over again
+                column = 0
+                row = max_row
+                base_row = row
+                turnover_point = row + (comments_not_in_tweak * layout_rows_for_comments)/2
+                comments_not_in_tweak = 0
+
+        l = QGridLayout()
+        if dt == 'comments':
+            layout.addLayout(l, row, column, layout_rows_for_comments, 1)
+            layout.setColumnStretch(column, 100)
+            row += layout_rows_for_comments
+        else:
+            layout.addLayout(l, row, column, 1, 1)
+            layout.setColumnStretch(column, 100)
+            row += 1
         for c in range(0, len(w.widgets), 2):
-            w.widgets[c].setWordWrap(True)
-            w.widgets[c].setBuddy(w.widgets[c+1])
-            layout.addWidget(w.widgets[c], row, column)
-            layout.addWidget(w.widgets[c+1], row, column+1)
-            row += 1
-        comments_row = max(comments_row, row)
-        if row >= turnover_point:
-            column += 2
-            turnover_point = count_non_comment + 1000
-            row = 0
-    if not bulk: # Add the comments fields
-        row = comments_row
-        column = 0
-        for col in cols:
-            dt = x[col]['datatype']
             if dt != 'comments':
-                continue
-            w = widget_factory(dt, col)
-            ans.append(w)
-            layout.addWidget(w.widgets[0], row, column, 1, 2)
-            if two_column and column == 0:
-                column = 2
-                continue
-            column = 0
-            row += 1
+                w.widgets[c].setWordWrap(True)
+                w.widgets[c].setBuddy(w.widgets[c+1])
+                l.addWidget(w.widgets[c], c, 0)
+                l.addWidget(w.widgets[c+1], c, 1)
+                l.setColumnStretch(1, 10000)
+            else:
+                l.addWidget(w.widgets[0], 0, 0, 1, 2)
+        l.addItem(QSpacerItem(0, 0, vPolicy=QSizePolicy.Expanding), c, 0, 1, 1)
+        max_row = max(max_row, row)
+        if row >= turnover_point:
+            column = 1
+            turnover_point = count + 1000
+            row = base_row
 
     items = []
     if len(ans) > 0:
         items.append(QSpacerItem(10, 10, QSizePolicy.Minimum,
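
Note: the turnover arithmetic above decides where the left column ends when
laying out widgets in two columns: a comments widget spans
layout_rows_for_comments (9) grid rows, every other widget one row. A worked
example with made-up counts:

    layout_rows_for_comments = 9
    count = 12                  # widgets to place
    comments_in_tweak = 1       # comments fields pinned first by the tweak
    comments_not_in_tweak = 1   # comments fields added at the end

    turnover_point = ((count - comments_not_in_tweak + 1) +
                      comments_in_tweak * (layout_rows_for_comments - 1)) / 2
    print turnover_point        # -> 10: move to the right column after row 10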
@@ -12,7 +12,7 @@ from PyQt4.Qt import QDialog, QApplication
 from calibre.gui2.dialogs.add_from_isbn_ui import Ui_Dialog
 from calibre.ebooks.metadata import check_isbn
 from calibre.constants import iswindows
-from calibre.gui2 import gprefs
+from calibre.gui2 import gprefs, question_dialog, error_dialog
 
 class AddFromISBN(QDialog, Ui_Dialog):
 
@@ -44,6 +44,7 @@ class AddFromISBN(QDialog, Ui_Dialog):
         tags = list(filter(None, [x.strip() for x in tags]))
         gprefs['add from ISBN tags'] = tags
         self.set_tags = tags
+        bad = set()
         for line in unicode(self.isbn_box.toPlainText()).strip().splitlines():
             line = line.strip()
             if not line:
@@ -64,5 +65,19 @@ class AddFromISBN(QDialog, Ui_Dialog):
                         os.access(parts[1], os.R_OK) and os.path.isfile(parts[1]):
                     book['path'] = parts[1]
                 self.books.append(book)
+            else:
+                bad.add(parts[0])
+        if bad:
+            if self.books:
+                if not question_dialog(self, _('Some invalid ISBNs'),
+                        _('Some of the ISBNs you entered were invalid. They will'
+                            ' be ignored. Click Show Details to see which ones.'
+                            ' Do you want to proceed?'), det_msg='\n'.join(bad),
+                        show_copy_button=True):
+                    return
+            else:
+                return error_dialog(self, _('All invalid ISBNs'),
+                        _('All the ISBNs you entered were invalid. No books'
+                            ' can be added.'), show=True)
         QDialog.accept(self, *args)
 
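
Note: validity is decided by check_isbn, imported above. For readers
unfamiliar with the rule, a self-contained ISBN-13 check-digit sketch (not
calibre's implementation):

    def is_valid_isbn13(isbn):
        # The digits, weighted 1,3,1,3,..., must sum to a multiple of 10.
        digits = [int(c) for c in isbn if c.isdigit()]
        if len(digits) != 13:
            return False
        return sum(d * (3 if i % 2 else 1)
                   for i, d in enumerate(digits)) % 10 == 0

    print is_valid_isbn13('978-0-306-40615-7')  # -> True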
@@ -419,6 +419,13 @@ class Scheduler(QObject):
         QObject.__init__(self, parent)
         self.internet_connection_failed = False
         self._parent = parent
+        self.no_internet_msg = _('Cannot download news as no internet connection '
+                'is active')
+        self.no_internet_dialog = d = error_dialog(self._parent,
+                self.no_internet_msg, _('No internet connection'),
+                show_copy_button=False)
+        d.setModal(False)
+
         self.recipe_model = RecipeModel()
         self.db = db
         self.lock = QMutex(QMutex.Recursive)
@@ -434,7 +441,7 @@ class Scheduler(QObject):
         self.news_menu.addAction(self.cac)
         self.news_menu.addSeparator()
         self.all_action = self.news_menu.addAction(
-                _('Download all scheduled new sources'),
+                _('Download all scheduled news sources'),
                 self.download_all_scheduled)
 
         self.timer = QTimer(self)
|
|||||||
finally:
|
finally:
|
||||||
self.lock.unlock()
|
self.lock.unlock()
|
||||||
|
|
||||||
|
|
||||||
def download_clicked(self, urn):
|
def download_clicked(self, urn):
|
||||||
if urn is not None:
|
if urn is not None:
|
||||||
return self.download(urn)
|
return self.download(urn)
|
||||||
@@ -534,18 +540,25 @@ class Scheduler(QObject):
     def download_all_scheduled(self):
         self.download_clicked(None)
 
-    def download(self, urn):
-        self.lock.lock()
+    def has_internet_connection(self):
         if not internet_connected():
             if not self.internet_connection_failed:
                 self.internet_connection_failed = True
-                d = error_dialog(self._parent, _('No internet connection'),
-                        _('Cannot download news as no internet connection '
-                            'is active'))
-                d.setModal(False)
-                d.show()
+                if self._parent.is_minimized_to_tray:
+                    self._parent.status_bar.show_message(self.no_internet_msg,
+                            5000)
+                elif not self.no_internet_dialog.isVisible():
+                    self.no_internet_dialog.show()
             return False
         self.internet_connection_failed = False
+        if self.no_internet_dialog.isVisible():
+            self.no_internet_dialog.hide()
+        return True
+
+    def download(self, urn):
+        self.lock.lock()
+        if not self.has_internet_connection():
+            return False
         doit = urn not in self.download_queue
         self.lock.unlock()
         if doit:
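
Note: creating the error dialog once (in the hunk at -419 above) and toggling
show()/hide() here yields a single reusable non-modal window instead of a new
dialog per failed poll. The pattern in isolation (PyQt4; class and attribute
names are invented for the sketch):

    from PyQt4.Qt import QMessageBox

    class Poller(object):
        def __init__(self, parent):
            self.dlg = QMessageBox(QMessageBox.Critical,
                    'No internet connection',
                    'Cannot download news as no internet connection is active',
                    QMessageBox.Ok, parent)
            self.dlg.setModal(False)   # show() will not block the event loop

        def report(self, connected):
            if not connected and not self.dlg.isVisible():
                self.dlg.show()        # non-blocking, unlike exec_()
            elif connected and self.dlg.isVisible():
                self.dlg.hide()        # auto-dismiss once the network returns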
@@ -555,7 +568,9 @@ class Scheduler(QObject):
     def check(self):
         recipes = self.recipe_model.get_to_be_downloaded_recipes()
         for urn in recipes:
-            self.download(urn)
+            if not self.download(urn):
+                # No internet connection, we will try again in a minute
+                break
 
 if __name__ == '__main__':
     from calibre.gui2 import is_ok_to_use_qt
@@ -28,11 +28,11 @@ class BaseModel(QAbstractListModel):
 
     def name_to_action(self, name, gui):
         if name == 'Donate':
-            return FakeAction(name, 'donate.png',
+            return FakeAction(_('Donate'), 'donate.png',
                     dont_add_to=frozenset(['context-menu',
                         'context-menu-device']))
         if name == 'Location Manager':
-            return FakeAction(name, None,
+            return FakeAction(_('Location Manager'), None,
                     _('Switch between library and device views'),
                     dont_add_to=frozenset(['menubar', 'toolbar',
                         'toolbar-child', 'context-menu',
@@ -723,10 +723,10 @@ class Main(MainWindow, MainWindowMixin, DeviceMixin, EmailMixin, # {{{
         self.write_settings()
         if self.system_tray_icon.isVisible():
             if not dynamic['systray_msg'] and not isosx:
-                info_dialog(self, 'calibre', 'calibre '+\
+                info_dialog(self, 'calibre', 'calibre '+ \
                         _('will keep running in the system tray. To close it, '
                         'choose <b>Quit</b> in the context menu of the '
-                        'system tray.')).exec_()
+                        'system tray.'), show_copy_button=False).exec_()
                 dynamic['systray_msg'] = True
             self.hide_windows()
             e.ignore()
@@ -537,6 +537,12 @@ class DocumentView(QWebView): # {{{
         self.dictionary_action.setShortcut(Qt.CTRL+Qt.Key_L)
         self.dictionary_action.triggered.connect(self.lookup)
         self.addAction(self.dictionary_action)
+        self.search_action = QAction(QIcon(I('dictionary.png')),
+                _('&Search for next occurrence'), self)
+        self.search_action.setShortcut(Qt.CTRL+Qt.Key_S)
+        self.search_action.triggered.connect(self.search_next)
+        self.addAction(self.search_action)
+
         self.goto_location_action = QAction(_('Go to...'), self)
         self.goto_location_menu = m = QMenu(self)
         self.goto_location_actions = a = {
@@ -620,6 +626,7 @@ class DocumentView(QWebView): # {{{
         text = unicode(self.selectedText())
         if text:
             menu.insertAction(list(menu.actions())[0], self.dictionary_action)
+            menu.insertAction(list(menu.actions())[0], self.search_action)
         menu.addSeparator()
         menu.addAction(self.goto_location_action)
         menu.exec_(ev.globalPos())
@@ -630,6 +637,12 @@ class DocumentView(QWebView): # {{{
             if t:
                 self.manager.lookup(t.split()[0])
 
+    def search_next(self):
+        if self.manager is not None:
+            t = unicode(self.selectedText()).strip()
+            if t:
+                self.manager.search.set_search_string(t)
+
     def set_manager(self, manager):
         self.manager = manager
         self.scrollbar = manager.horizontal_scrollbar
@@ -758,11 +758,12 @@ class EbookViewer(MainWindow, Ui_EbookViewer):
         self.set_page_number(frac)
 
     def next_document(self):
-        if self.current_index < len(self.iterator.spine) - 1:
+        if (hasattr(self, 'current_index') and self.current_index <
+                len(self.iterator.spine) - 1):
             self.load_path(self.iterator.spine[self.current_index+1])
 
     def previous_document(self):
-        if self.current_index > 0:
+        if hasattr(self, 'current_index') and self.current_index > 0:
             self.load_path(self.iterator.spine[self.current_index-1], pos=1.0)
 
     def keyPressEvent(self, event):
@@ -347,7 +347,9 @@ class BIBTEX(CatalogPlugin): # {{{
 
         for field in fields:
             if field.startswith('#'):
                 item = db.get_field(entry['id'],field,index_is_id=True)
+                if isinstance(item, (bool, float, int)):
+                    item = repr(item)
             elif field == 'title_sort':
                 item = entry['sort']
             else:
@@ -391,7 +393,7 @@ class BIBTEX(CatalogPlugin): # {{{
 
             elif field == 'isbn' :
                 # Could be 9, 10 or 13 digits
-                bibtex_entry.append(u'isbn = "%s"' % re.sub(u'[\D]', u'', item))
+                bibtex_entry.append(u'isbn = "%s"' % re.sub(u'[^0-9xX]', u'', item))
 
             elif field == 'formats' :
                 #Add file path if format is selected
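
Note: the right-hand side of this hunk reads '[0-9xX]' in the garbled source;
a negated class '[^0-9xX]' is assumed above, since the point of the change is
to keep an ISBN-10 'X' check digit while stripping separators:

    import re
    print re.sub(u'[\\D]', u'', u'0-8044-2957-X')     # -> 080442957 (X lost)
    print re.sub(u'[^0-9xX]', u'', u'0-8044-2957-X')  # -> 080442957X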
@@ -413,7 +415,8 @@ class BIBTEX(CatalogPlugin): # {{{
                 bibtex_entry.append(u'month = "%s"' % bibtexdict.utf8ToBibtex(strftime("%b", item)))
 
             elif field.startswith('#') :
-                bibtex_entry.append(u'%s = "%s"' % (field[1:], bibtexdict.utf8ToBibtex(item)))
+                bibtex_entry.append(u'custom_%s = "%s"' % (field[1:],
+                    bibtexdict.utf8ToBibtex(item)))
 
             else:
                 # elif field in ['title', 'publisher', 'cover', 'uuid', 'ondevice',
@@ -64,8 +64,17 @@ def do_list(db, fields, afields, sort_by, ascending, search_text, line_width, se
     data = db.get_data_as_dict(prefix, authors_as_string=True)
     fields = ['id'] + fields
     title_fields = fields
-    fields = [db.custom_column_label_map[x[1:]]['num'] if x[0]=='*'
-            else x for x in fields]
+    def field_name(f):
+        ans = f
+        if f[0] == '*':
+            if f.endswith('_index'):
+                fkey = f[1:-len('_index')]
+                num = db.custom_column_label_map[fkey]['num']
+                ans = '%d_index'%num
+            else:
+                ans = db.custom_column_label_map[f[1:]]['num']
+        return ans
+    fields = list(map(field_name, fields))
 
     for f in data:
         fmts = [x for x in f['formats'] if x is not None]
@@ -121,8 +130,10 @@ def do_list(db, fields, afields, sort_by, ascending, search_text, line_width, se
 def list_option_parser(db=None):
     fields = set(FIELDS)
     if db is not None:
-        for f in db.custom_column_label_map:
+        for f, data in db.custom_column_label_map.iteritems():
             fields.add('*'+f)
+            if data['datatype'] == 'series':
+                fields.add('*'+f+'_index')
 
     parser = get_parser(_(
 '''\
@@ -161,8 +172,10 @@ def command_list(args, dbpath):
     opts, args = parser.parse_args(sys.argv[:1] + args)
     afields = set(FIELDS)
     if db is not None:
-        for f in db.custom_column_label_map:
+        for f, data in db.custom_column_label_map.iteritems():
             afields.add('*'+f)
+            if data['datatype'] == 'series':
+                afields.add('*'+f+'_index')
     fields = [str(f.strip().lower()) for f in opts.fields.split(',')]
     if 'all' in fields:
         fields = sorted(list(afields))
@@ -1089,8 +1089,12 @@ ALTER TABLE books ADD COLUMN isbn TEXT DEFAULT "" COLLATE NOCASE;
         ids = tuple(ids)
         if len(ids) > 50000:
             return True
+        if len(ids) == 1:
+            ids = '(%d)'%ids[0]
+        else:
+            ids = repr(ids)
         return self.conn.get('''
-            SELECT data FROM conversion_options WHERE book IN %r AND
+            SELECT data FROM conversion_options WHERE book IN %s AND
             format=? LIMIT 1'''%(ids,), (format,), all=False) is not None
 
     def delete_conversion_options(self, id, format, commit=True):
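
Note: repr() of a one-element Python tuple carries a trailing comma, which is
not valid SQL inside an IN (...) list; hence the special case above:

    ids = (42,)
    print repr(ids)        # -> '(42,)'  : "book IN (42,)" is a syntax error
    print '(%d)' % ids[0]  # -> '(42)'   : valid for the single-id case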
@@ -3376,11 +3376,15 @@ class LibraryDatabase2(LibraryDatabase, SchemaUpgrade, CustomColumns):
         '''
         if prefix is None:
             prefix = self.library_path
-        FIELDS = set(['title', 'sort', 'authors', 'author_sort', 'publisher', 'rating',
-            'timestamp', 'size', 'tags', 'comments', 'series', 'series_index',
-            'uuid', 'pubdate', 'last_modified', 'identifiers', 'languages'])
-        for x in self.custom_column_num_map:
-            FIELDS.add(x)
+        fdata = self.custom_column_num_map
+
+        FIELDS = set(['title', 'sort', 'authors', 'author_sort', 'publisher',
+            'rating', 'timestamp', 'size', 'tags', 'comments', 'series',
+            'series_index', 'uuid', 'pubdate', 'last_modified', 'identifiers',
+            'languages']).union(set(fdata))
+        for x, data in fdata.iteritems():
+            if data['datatype'] == 'series':
+                FIELDS.add('%d_index'%x)
         data = []
         for record in self.data:
             if record is None: continue
@@ -154,7 +154,7 @@ class Formatter(TemplateFormatter):
                 return self.composite_values[key]
             self.composite_values[key] = 'RECURSIVE_COMPOSITE FIELD (S2D) ' + key
             self.composite_values[key] = \
-                self.vformat(b['display']['composite_template'], [], kwargs)
+                self.evaluate(b['display']['composite_template'], [], kwargs)
             return self.composite_values[key]
         if key in kwargs:
             val = kwargs[key]
@@ -47,7 +47,7 @@ Overriding icons, templates, etcetera
 
 |app| allows you to override the static resources, like icons, templates, javascript, etc. with customized versions that you like.
 All static resources are stored in the resources sub-folder of the calibre install location. On Windows, this is usually
-:file:`C:\Program Files\Calibre2\resources`. On OS X, :file:`/Applications/calibre.app/Contents/Resources/resources/`. On linux, if you are using the binary installer
+:file:`C:/Program Files/Calibre2/resources`. On OS X, :file:`/Applications/calibre.app/Contents/Resources/resources/`. On linux, if you are using the binary installer
 from the calibre website it will be :file:`/opt/calibre/resources`. These paths can change depending on where you choose to install |app|.
 
 You should not change the files in this resources folder, as your changes will get overwritten the next time you update |app|. Instead, go to
@@ -112,7 +112,7 @@ Functions are always applied before format specifications. See further down for
 
 The syntax for using functions is ``{field:function(arguments)}``, or ``{field:function(arguments)|prefix|suffix}``. Arguments are separated by commas. Commas inside arguments must be preceded by a backslash ( '\\' ). The last (or only) argument cannot contain a closing parenthesis ( ')' ). Functions return the value of the field used in the template, suitably modified.
 
-If you have programming experience, please note that the syntax in this mode (single function) is not what you might expect. Strings are not quoted. Spaces are significant. All arguments must be constants; there is no sub-evaluation. Use :ref:`template program mode <template_mode>` and :ref:`general program mode <general_mode>` to avoid these differences.
+If you have programming experience, please note that the syntax in this mode (single function) is not what you might expect. Strings are not quoted. Spaces are significant. All arguments must be constants; there is no sub-evaluation. **Do not use subtemplates (`{ ... }`) as function arguments.** Instead, use :ref:`template program mode <template_mode>` and :ref:`general program mode <general_mode>`.
 
 Many functions use regular expressions. In all cases, regular expression matching is case-insensitive.
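
Note: as a concrete illustration of the single-function syntax described
above (assuming the built-in uppercase() and test() functions documented
elsewhere in this manual):

    {title:uppercase()}                    -- the title in upper case
    {series:test(in a series,standalone)}  -- first text if series is set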