Sync to trunk.

This commit is contained in:
John Schember 2012-12-30 20:04:06 -05:00
commit bbb240d7cd
262 changed files with 102981 additions and 85914 deletions

View File

@@ -19,6 +19,162 @@
# new recipes:
#  - title:
- version: 0.9.12
date: 2012-12-28
new features:
- title: "Drivers for Kibano e-reader and Slick ER-700-2"
tickets: [1093570, 1093732]
- title: "Add support for downloading metadata from Amazon Brazil."
tickets: [1092594]
- title: "Copy to library: Allow specifying the destination library by path."
tickets: [1093231]
- title: "When adding empty books, allow setting of the series for the new books. Also select the newly added book records after adding."
- title: "PDF Output: Add a checkbox to override the page size defined by the output profile. This allows you to specify a custom page size even if the output profile is not set to default."
- title: "Add usb ids for newer kindle fire to the linux mtp driver"
bug fixes:
- title: "Linux: Temporarily redirect stdout to get rid of the annoying and pointless message about mtpz during libmtp initialization"
- title: "Fix multiple 'All column' coloring rules not being applied"
tickets: [1093574]
- title: "Use custom icons in the content server as well."
tickets: [1092098]
improved recipes:
- La Voce
- Harpers Magazine (printed edition)
- Pajamas Media
- NSFW corp
- The Hindu
- Nikkei News
new recipes:
- title: Various Ukrainian news sources
author: rpalyvoda
- version: 0.9.11
date: 2012-12-21
new features:
- title: "Merry Christmas and Happy Holidays to all ☺"
- title: "When connecting to MTP devices such as the Kindle Fire HD or the Nook HD, speed up the process by ignoring some folders."
description: "calibre will now ignore folders for music, video, pictures, etc. when scanning the device. This can substantially speed up the connection process if you have thousands of non-ebook files on the device. The list of folders to be ignored can be customized by right clicking on the device icon in calibre and selecting 'Configure this device'."
- title: "Allow changing the icons for categories in the Tag Browser. Right click on a category and choose 'Change category icon'."
tickets: [1092098]
- title: "Allow setting the color of all columns with a single rule in Preferences->Look & Feel->Column Coloring"
- title: "MOBI: When reading metadata from mobi files, put the contents of the ASIN field into an identifier named mobi-asin. Note that this value is not used when downloading metadata as it is not possible to know which (country specific) amazon website the ASIN comes from."
tickets: [1090394]
bug fixes:
- title: "Windows build: Fix a regression in 0.9.9 that caused calibre to not start on some windows system that were missing the VC.90 dlls (some older XP systems)"
- title: "Kobo driver: Workaround for invalid shelves created by bugs in the Kobo server"
tickets: [1091932]
- title: "Metadata download: Fix cover downloading from non-US amazon sites broken by a website change."
tickets: [1090765]
improved recipes:
- Le Devoir
- Nin online
- countryfile
- Birmingham Post
- The Independent
- Various Polish news sources
new recipes:
- title: MobileBulgaria
author: Martin Tsanchev
- title: Various Polish news sources
author: fenuks
- version: 0.9.10
date: 2012-12-14
new features:
- title: "Drivers for Nextbook Premium 8 se, HTC Desire X and Emerson EM 543"
tickets: [1088149, 1088112, 1087978]
bug fixes:
- title: "Fix rich text delegate not working with Qt compiled in debug mode."
tickets: [1089011]
- title: "When deleting all books in the library, blank the book details panel"
- title: "Conversion: Fix malformed values in the bgcolor attribute causing conversion to abort"
- title: "Conversion: Fix heuristics applying incorrect style in some circumstances"
tickets: [1066507]
- title: "Possible fix for 64bit calibre not starting up on some Windows systems"
tickets: [1087816]
improved recipes:
- Sivil Dusunce
- Anchorage Daily News
- Le Monde
- Harpers
new recipes:
- title: Titanic
author: Krittika Goyal
- version: 0.9.9
date: 2012-12-07
new features:
- title: "64 bit build for windows"
type: major
description: "calibre now has a 64 bit version for windows, available at: http://calibre-ebook.com/download_windows64 The 64bit build is not limited to using only 3GB of RAM when converting large/complex documents. It may also be slightly faster for some tasks. You can have both the 32 bit and the 64 bit build installed at the same time, they will use the same libraries, plugins and settings."
- title: "Content server: Make the identifiers in each books metadata clickable."
tickets: [1085726]
bug fixes:
- title: "EPUB Input: Fix an infinite loop while trying to recover a damaged EPUB file."
tickets: [1086917]
- title: "KF8 Input: Fix handling of links in files that link to the obsolete <a name> tags instead of tags with an id attribute."
tickets: [1086705]
- title: "Conversion: Fix a bug in removal of invalid entries from the spine, where not all invalid entries were removed, causing conversion to fail."
tickets: [1086054]
- title: "KF8 Input: Ignore invalid flow references in the KF8 document instead of erroring out on them."
tickets: [1085306]
- title: "Fix command line output on linux systems with incorrect LANG/LC_TYPE env vars."
tickets: [1085103]
- title: "KF8 Input: Fix page breaks specified using the data-AmznPageBreak attribute being ignored by calibre."
- title: "PDF Output: Fix custom size field not accepting fractional numbers as sizes"
- title: "Get Books: Update libre.de and publio for website changes"
- title: "Wireless driver: Increase timeout interval, and when allocating a random port try 9090 first"
improved recipes:
- New York Times
- Weblogs SL
- Zaman Gazetesi
- Aksiyon Dergisi
- Engadget
- Metro UK
- Heise Online
- version: 0.9.8
date: 2012-11-30

View File

@@ -49,7 +49,7 @@ All the |app| python code is in the ``calibre`` package. This package contains t
* Metadata reading, writing, and downloading is all in ebooks.metadata
* Conversion happens in a pipeline, for the structure of the pipeline,
  see :ref:`conversion-introduction`. The pipeline consists of an input
  plugin, various transforms and an output plugin. The code constructs
  plugin, various transforms and an output plugin. The code that constructs
  and drives the pipeline is in plumber.py. The pipeline works on a
  representation of an ebook that is like an unzipped epub, with
  manifest, spine, toc, guide, html content, etc. The
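For orientation, here is a minimal sketch of driving that pipeline from Python. It assumes the Plumber class in calibre.ebooks.conversion.plumber mentioned above; the exact constructor arguments can vary between calibre releases::

    from calibre.ebooks.conversion.plumber import Plumber
    from calibre.utils.logging import Log

    # Build the pipeline for one input/output pair and run it:
    # input plugin -> transforms -> output plugin.
    plumber = Plumber('book.epub', 'book.mobi', Log())
    plumber.run()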
@@ -74,10 +74,6 @@ After installing Bazaar, you can get the |app| source code with the command::
On Windows you will need the complete path name, that will be something like :file:`C:\\Program Files\\Bazaar\\bzr.exe`.
To update a branch to the latest code, use the command::

    bzr merge

|app| is a very large project with a very long source control history, so the
above can take a while (10mins to an hour depending on your internet speed).
@@ -88,6 +84,11 @@ using::

    bzr branch --stacked lp:calibre
To update a branch to the latest code, use the command::

    bzr merge
Submitting your changes to be included
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

View File

@@ -162,6 +162,8 @@ Follow these steps to find the problem:
* If you are connecting an Apple iDevice (iPad, iPod Touch, iPhone), use the 'Connect to iTunes' method in the 'Getting started' instructions in `Calibre + Apple iDevices: Start here <http://www.mobileread.com/forums/showthread.php?t=118559>`_.
* Make sure you are running the latest version of |app|. The latest version can always be downloaded from `the calibre website <http://calibre-ebook.com/download>`_.
* Ensure your operating system is seeing the device. That is, the device should show up in Windows Explorer (in Windows) or Finder (in OS X).
* In |app|, go to Preferences->Ignored Devices and check that your device
is not being ignored
* In |app|, go to Preferences->Plugins->Device Interface plugin and make sure the plugin for your device is enabled; the plugin icon next to it should be green when it is enabled.
* If all the above steps fail, go to Preferences->Miscellaneous and click debug device detection with your device attached and post the output as a ticket on `the calibre bug tracker <http://bugs.calibre-ebook.com>`_ (a command-line alternative is shown below).
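If you prefer the command line, roughly the same diagnostic output can be captured by starting |app| in debug mode (assuming the |app| command-line tools are installed)::

    calibre-debug -g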
@@ -668,6 +670,9 @@ There are three possible things I know of, that can cause this:
  the blacklist of programs inside RoboForm to fix this. Or uninstall
  RoboForm.
* The Logitech SetPoint Settings application causes random crashes in
|app| when it is open. Close it before starting |app|.
|app| is not starting on OS X?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

View File

@@ -9,11 +9,12 @@ class Adventure_zone(BasicNewsRecipe):
    no_stylesheets = True
    oldest_article = 20
    max_articles_per_feed = 100
    cover_url = 'http://www.adventure-zone.info/inne/logoaz_2012.png'
    index='http://www.adventure-zone.info/fusion/'
    use_embedded_content=False
    preprocess_regexps = [(re.compile(r"<td class='capmain'>Komentarze</td>", re.IGNORECASE), lambda m: ''),
                          (re.compile(r'\<table .*?\>'), lambda match: ''),
                          (re.compile(r'</?table.*?>'), lambda match: ''),
                          (re.compile(r'\<tbody\>'), lambda match: '')]
                          (re.compile(r'</?tbody.*?>'), lambda match: '')]
    remove_tags_before= dict(name='td', attrs={'class':'main-bg'})
    remove_tags= [dict(name='img', attrs={'alt':'Drukuj'})]
    remove_tags_after= dict(id='comments')
@@ -36,11 +37,11 @@ class Adventure_zone(BasicNewsRecipe):
        return feeds

    def get_cover_url(self):
    '''def get_cover_url(self):
        soup = self.index_to_soup('http://www.adventure-zone.info/fusion/news.php')
        cover=soup.find(id='box_OstatninumerAZ')
        self.cover_url='http://www.adventure-zone.info/fusion/'+ cover.center.a.img['src']
        return getattr(self, 'cover_url', self.cover_url)
        return getattr(self, 'cover_url', self.cover_url)'''

    def skip_ad_pages(self, soup):

View File

@@ -5,14 +5,16 @@ class AdvancedUserRecipe1278347258(BasicNewsRecipe):
    __author__ = 'rty'
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup = True
    feeds = [(u'Alaska News', u'http://www.adn.com/news/alaska/index.xml'),
             (u'Business', u'http://www.adn.com/money/index.xml'),
             (u'Sports', u'http://www.adn.com/sports/index.xml'),
             (u'Politics', u'http://www.adn.com/politics/index.xml'),
             (u'Lifestyles', u'http://www.adn.com/life/index.xml'),
             (u'Iditarod', u'http://www.adn.com/iditarod/index.xml')
            ]
    description = ''''Alaska's Newspaper'''
    publisher = 'http://www.adn.com'
    category = 'news, Alaska, Anchorage'
@@ -28,13 +30,13 @@ class AdvancedUserRecipe1278347258(BasicNewsRecipe):
    conversion_options = {'linearize_tables':True}
    masthead_url = 'http://media.adn.com/includes/assets/images/adn_logo.2.gif'

    keep_only_tags = [
        dict(name='div', attrs={'class':'left_col story_mainbar'}),
    ]
    remove_tags = [
        dict(name='div', attrs={'class':'story_tools'}),
        dict(name='p', attrs={'class':'ad_label'}),
    ]
    remove_tags_after = [
        dict(name='div', attrs={'class':'advertisement'}),
    ]
    #keep_only_tags = [
        #dict(name='div', attrs={'class':'left_col story_mainbar'}),
    #]
    #remove_tags = [
        #dict(name='div', attrs={'class':'story_tools'}),
        #dict(name='p', attrs={'class':'ad_label'}),
    #]
    #remove_tags_after = [
        #dict(name='div', attrs={'class':'advertisement'}),
    #]

View File

@@ -3,11 +3,11 @@ from calibre.web.feeds.news import BasicNewsRecipe
class Android_com_pl(BasicNewsRecipe):
    title = u'Android.com.pl'
    __author__ = 'fenuks'
    description = 'Android.com.pl - biggest polish Android site'
    description = u'Android.com.pl - to największe w Polsce centrum Android OS. Znajdziesz tu: nowości, forum, pomoc, recenzje, gry, aplikacje.'
    category = 'Android, mobile'
    language = 'pl'
    use_embedded_content=True
    cover_url =u'http://upload.wikimedia.org/wikipedia/commons/thumb/d/d7/Android_robot.svg/220px-Android_robot.svg.png'
    cover_url =u'http://android.com.pl/wp-content/themes/android/images/logo.png'
    oldest_article = 8
    max_articles_per_feed = 100
    feeds = [(u'Android', u'http://android.com.pl/component/content/frontpage/frontpage.feed?type=rss')]
    feeds = [(u'Android', u'http://android.com.pl/feed/')]

recipes/astroflesz.recipe Normal file
View File

@@ -0,0 +1,19 @@
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:fdm=marker:ai
from calibre.web.feeds.news import BasicNewsRecipe
class Astroflesz(BasicNewsRecipe):
title = u'Astroflesz'
oldest_article = 7
__author__ = 'fenuks'
description = u'astroflesz.pl - to portal poświęcony astronomii. Informuje zarówno o aktualnych wydarzeniach i odkryciach naukowych, jak również zapowiada ciekawe zjawiska astronomiczne'
category = 'astronomy'
language = 'pl'
cover_url = 'http://www.astroflesz.pl/templates/astroflesz/images/logo/logo.png'
ignore_duplicate_articles = {'title', 'url'}
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
keep_only_tags = [dict(id="k2Container")]
remove_tags_after = dict(name='div', attrs={'class':'itemLinks'})
remove_tags = [dict(name='div', attrs={'class':['itemLinks', 'itemToolbar', 'itemRatingBlock']})]
feeds = [(u'Wszystkie', u'http://astroflesz.pl/?format=feed')]
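A new recipe file like the one above can be test-built without installing it, using calibre's conversion tool; this is the usual recipe-development workflow (the output name is illustrative, and --test fetches only a couple of articles per feed)::

    ebook-convert recipes/astroflesz.recipe .epub --test -vv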

View File

@@ -1,9 +1,11 @@
from calibre.web.feeds.news import BasicNewsRecipe
import re
import mechanize

class AdvancedUserRecipe1306097511(BasicNewsRecipe):
    title = u'Birmingham post'
    description = 'Author D.Asbury. News for Birmingham UK'
    #timefmt = ''
    # last update 8/9/12
    __author__ = 'Dave Asbury'
    cover_url = 'http://profile.ak.fbcdn.net/hprofile-ak-snc4/161987_9010212100_2035706408_n.jpg'
    oldest_article = 2
@@ -15,8 +17,30 @@ class AdvancedUserRecipe1306097511(BasicNewsRecipe):
    #auto_cleanup = True
    language = 'en_GB'
    cover_url = 'http://profile.ak.fbcdn.net/hprofile-ak-snc4/161987_9010212100_2035706408_n.jpg'
    masthead_url = 'http://www.pressgazette.co.uk/Pictures/web/t/c/g/birmingham_post.jpg'
    masthead_url = 'http://www.trinitymirror.com/images/birminghampost-logo.gif'
    def get_cover_url(self):
        soup = self.index_to_soup('http://www.birminghampost.net')
        # look for the block containing the sun button and url
        cov = soup.find(attrs={'height' : re.compile('3'), 'alt' : re.compile('Birmingham Post')})
        print
        print '%%%%%%%%%%%%%%%',cov
        print
        cov2 = str(cov['src'])
        # cov2=cov2[7:]
        print '88888888 ',cov2,' 888888888888'
        #cover_url=cov2
        #return cover_url
        br = mechanize.Browser()
        br.set_handle_redirect(False)
        try:
            br.open_novisit(cov2)
            cover_url = cov2
        except:
            cover_url = 'http://profile.ak.fbcdn.net/hprofile-ak-snc4/161987_9010212100_2035706408_n.jpg'
        return cover_url
    keep_only_tags = [

View File

@@ -7,25 +7,30 @@ class AdvancedUserRecipe1325006965(BasicNewsRecipe):
    #cover_url = 'http://www.countryfile.com/sites/default/files/imagecache/160px_wide/cover/2_1.jpg'
    __author__ = 'Dave Asbury'
    description = 'The official website of Countryfile Magazine'
    # last updated 7/10/12
    # last updated 8/12/12
    language = 'en_GB'
    oldest_article = 30
    max_articles_per_feed = 25
    remove_empty_feeds = True
    no_stylesheets = True
    auto_cleanup = True
    ignore_duplicate_articles = {'title', 'url'}
    #articles_are_obfuscated = True
    ignore_duplicate_articles = {'title'}
    #article_already_exists = False
    #feed_hash = ''

    def get_cover_url(self):
        soup = self.index_to_soup('http://www.countryfile.com/')
        soup = self.index_to_soup('http://www.countryfile.com/magazine')
        cov = soup.find(attrs={'class' : re.compile('imagecache imagecache-250px_wide')})#'width' : '160',
        print '&&&&&&&& ',cov,' ***'
        cov=str(cov)
        #cov2 = re.findall('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', cov)
        cov2 = re.findall('/(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', cov)
        cov2 = str(cov2)
        cov2= "http://www.countryfile.com"+cov2[2:len(cov2)-8]
        cov = soup.find(attrs={'width' : '160', 'class' : re.compile('imagecache imagecache-160px_wide')})
        print '******** ',cov,' ***'
        cov2 = str(cov)
        cov2=cov2[10:101]
        print '******** ',cov2,' ***'
        #cov2='http://www.countryfile.com/sites/default/files/imagecache/160px_wide/cover/1b_0.jpg'
        # try to get cover - if can't get known cover

        br = browser()
        br.set_handle_redirect(False)
@@ -45,5 +50,3 @@ class AdvancedUserRecipe1325006965(BasicNewsRecipe):
        (u'Countryside', u'http://www.countryfile.com/rss/countryside'),
    ]

View File

@@ -0,0 +1,20 @@
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:fdm=marker:ai
from calibre.web.feeds.news import BasicNewsRecipe
class CzasGentlemanow(BasicNewsRecipe):
title = u'Czas Gentlemanów'
__author__ = 'fenuks'
description = u'Historia mężczyzn z dala od wielkiej polityki'
category = 'blog'
language = 'pl'
cover_url = 'http://czasgentlemanow.pl/wp-content/uploads/2012/10/logo-Czas-Gentlemanow1.jpg'
ignore_duplicate_articles = {'title', 'url'}
oldest_article = 7
max_articles_per_feed = 100
no_stylesheets = True
remove_empty_feeds = True
use_embedded_content = False
keep_only_tags = [dict(name='div', attrs={'class':'content'})]
remove_tags = [dict(attrs={'class':'meta_comments'})]
remove_tags_after = dict(name='div', attrs={'class':'fblikebutton_button'})
feeds = [(u'M\u0119ski \u015awiat', u'http://czasgentlemanow.pl/category/meski-swiat/feed/'), (u'Styl', u'http://czasgentlemanow.pl/category/styl/feed/'), (u'Vademecum Gentlemana', u'http://czasgentlemanow.pl/category/vademecum/feed/'), (u'Dom i rodzina', u'http://czasgentlemanow.pl/category/dom-i-rodzina/feed/'), (u'Honor', u'http://czasgentlemanow.pl/category/honor/feed/'), (u'Gad\u017cety Gentlemana', u'http://czasgentlemanow.pl/category/gadzety-gentlemana/feed/')]

View File

@@ -7,18 +7,64 @@ class Dzieje(BasicNewsRecipe):
    cover_url = 'http://www.dzieje.pl/sites/default/files/dzieje_logo.png'
    category = 'history'
    language = 'pl'
    index='http://dzieje.pl'
    ignore_duplicate_articles = {'title', 'url'}
    index = 'http://dzieje.pl'
    oldest_article = 8
    max_articles_per_feed = 100
    remove_javascript=True
    no_stylesheets= True
    keep_only_tags = [dict(name='h1', attrs={'class':'title'}), dict(id='content-area')]
    remove_tags = [dict(attrs={'class':'field field-type-computed field-field-tagi'}), dict(id='dogory')]
    feeds = [(u'Dzieje', u'http://dzieje.pl/rss.xml')]
    #feeds = [(u'Dzieje', u'http://dzieje.pl/rss.xml')]
    def append_page(self, soup, appendtag):
        tag = appendtag.find('li', attrs={'class':'pager-next'})
        if tag:
            while tag:
                url = tag.a['href']
                if not url.startswith('http'):
                    url = 'http://dzieje.pl'+tag.a['href']
                soup2 = self.index_to_soup(url)
                pagetext = soup2.find(id='content-area').find(attrs={'class':'content'})
                for r in pagetext.findAll(attrs={'class':['fieldgroup group-groupkul', 'fieldgroup group-zdjeciekult', 'fieldgroup group-zdjecieciekaw', 'fieldgroup group-zdjecieksiazka', 'fieldgroup group-zdjeciedu', 'field field-type-filefield field-field-zdjecieglownawyd']}):
                    r.extract()
                pos = len(appendtag.contents)
                appendtag.insert(pos, pagetext)
                tag = soup2.find('li', attrs={'class':'pager-next'})
            for r in appendtag.findAll(attrs={'class':['item-list', 'field field-type-computed field-field-tagi', ]}):
                r.extract()

    def find_articles(self, url):
        articles = []
        soup=self.index_to_soup(url)
        tag=soup.find(id='content-area').div.div
        for i in tag.findAll('div', recursive=False):
            temp = i.find(attrs={'class':'views-field-title'}).span.a
            title = temp.string
            url = self.index + temp['href']
            date = '' #i.find(attrs={'class':'views-field-created'}).span.string
            articles.append({'title' : title,
                             'url' : url,
                             'date' : date,
                             'description' : ''
                            })
        return articles

    def parse_index(self):
        feeds = []
        feeds.append((u"Wiadomości", self.find_articles('http://dzieje.pl/wiadomosci')))
        feeds.append((u"Kultura i sztuka", self.find_articles('http://dzieje.pl/kulturaisztuka')))
        feeds.append((u"Film", self.find_articles('http://dzieje.pl/kino')))
        feeds.append((u"Rozmaitości historyczne", self.find_articles('http://dzieje.pl/rozmaitości')))
        feeds.append((u"Książka", self.find_articles('http://dzieje.pl/ksiazka')))
        feeds.append((u"Wystawa", self.find_articles('http://dzieje.pl/wystawa')))
        feeds.append((u"Edukacja", self.find_articles('http://dzieje.pl/edukacja')))
        feeds.append((u"Dzieje się", self.find_articles('http://dzieje.pl/wydarzenia')))
        return feeds
    def preprocess_html(self, soup):
        for a in soup('a'):
            if a.has_key('href') and 'http://' not in a['href'] and 'https://' not in a['href']:
                a['href']=self.index + a['href']
        self.append_page(soup, soup.body)
        return soup

View File

@@ -0,0 +1,24 @@
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:fdm=marker:ai
from calibre.web.feeds.news import BasicNewsRecipe
import re
class EkologiaPl(BasicNewsRecipe):
title = u'Ekologia.pl'
__author__ = 'fenuks'
description = u'Portal ekologiczny - eko, ekologia, ochrona przyrody, ochrona środowiska, przyroda, środowisko online. Ekologia i ochrona środowiska. Ekologia dla dzieci.'
category = 'ecology'
language = 'pl'
cover_url = 'http://www.ekologia.pl/assets/images/logo/ekologia_pl_223x69.png'
ignore_duplicate_articles = {'title', 'url'}
extra_css = '.title {font-size: 200%;}'
oldest_article = 7
max_articles_per_feed = 100
no_stylesheets = True
remove_empty_feeds = True
use_embedded_content = False
remove_tags = [dict(attrs={'class':['ekoLogo', 'powrocArt', 'butonDrukuj']})]
feeds = [(u'Wiadomo\u015bci', u'http://www.ekologia.pl/rss/20,53,0'), (u'\u015arodowisko', u'http://www.ekologia.pl/rss/20,56,0'), (u'Styl \u017cycia', u'http://www.ekologia.pl/rss/20,55,0')]
    def print_version(self, url):
        id = re.search(r',(?P<id>\d+)\.html', url).group('id')
        return 'http://drukuj.ekologia.pl/artykul/' + id
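The print_version hook above maps each article URL to the site's print endpoint by pulling the numeric id out of the URL. A standalone check of the same regex (the sample URL is invented for illustration):

    import re

    url = 'http://www.ekologia.pl/srodowisko/przykladowy-artykul,12345.html'  # hypothetical URL
    id = re.search(r',(?P<id>\d+)\.html', url).group('id')
    print 'http://drukuj.ekologia.pl/artykul/' + id  # -> http://drukuj.ekologia.pl/artykul/12345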

View File

@@ -5,6 +5,7 @@ class AdvancedUserRecipe1341650280(BasicNewsRecipe):
    title = u'Empire Magazine'
    description = 'Author D.Asbury. Film articles from Empire Mag. '
    language = 'en'
    __author__ = 'Dave Asbury'
    # last updated 7/7/12
    remove_empty_feeds = True
@@ -15,7 +16,7 @@ class AdvancedUserRecipe1341650280(BasicNewsRecipe):
    cover_url = 'http://www.empireonline.com/images/magazine/cover.jpg'
    conversion_options = {
        'linearize_tables' : True,
    }
    #auto_cleanup = True
    preprocess_regexps = [
        (re.compile(r'<a href="http://twitter.com/share.*?</a>', re.IGNORECASE | re.DOTALL), lambda match: ''),
@@ -32,20 +33,20 @@ class AdvancedUserRecipe1341650280(BasicNewsRecipe):
        (re.compile(r'<!-- USER REVIEWS: START -->.*?<!-- USER REVIEWS: END -->', re.IGNORECASE | re.DOTALL), lambda match: '<!-- USER REVIEWS: START --><!-- USER REVIEWS: END -->'),
        (re.compile(r'Advertisement', re.IGNORECASE | re.DOTALL), lambda match: ''),
        (re.compile(r'<a name="haveyoursay".*?now to have your say.', re.IGNORECASE | re.DOTALL), lambda match: ''),
    ]
    keep_only_tags = [
        # dict(name='h1'),
        # dict(attrs={'class' : 'mediumblack'}),
    ]
    remove_tags = [dict(name='td', attrs={'width':'200', 'valign' : 'top'}),
                   dict(name='b'),
                   dict(name='a',attrs={'name' : 'haveyoursay'}),
                   dict(attrs={'class' : 'newslink'}),
                  ]
    feeds = [(u'News', u'http://feed43.com/7338478755673147.xml'),
             (u'Recent Features',u'http://feed43.com/4346347750304760.xml'),
             (u'Interviews',u'http://feed43.com/3418350077724081.xml'),
             (u'Film Reviews',u'http://feed43.com/2643703076510627.xml'),
            ]

View File

@@ -0,0 +1,19 @@
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:fdm=marker:ai
from calibre.web.feeds.news import BasicNewsRecipe
import re
class FilmOrgPl(BasicNewsRecipe):
title = u'Film.org.pl'
__author__ = 'fenuks'
description = u"Recenzje, analizy, artykuły, rankingi - wszystko o filmie dla miłośników kina. Opisy efektów specjalnych, wersji reżyserskich, remake'ów, sequeli. No i forum filmowe. Jedne z największych w Polsce."
category = 'film'
language = 'pl'
cover_url = 'http://film.org.pl/wp-content/themes/KMF/images/logo_kmf10.png'
ignore_duplicate_articles = {'title', 'url'}
oldest_article = 7
max_articles_per_feed = 100
no_stylesheets = True
remove_empty_feeds = True
use_embedded_content = True
preprocess_regexps = [(re.compile(ur'<h3>Przeczytaj także:</h3>.*', re.IGNORECASE|re.DOTALL), lambda m: '</body>'), (re.compile(ur'<div>Artykuł</div>', re.IGNORECASE), lambda m: ''), (re.compile(ur'<div>Ludzie filmu</div>', re.IGNORECASE), lambda m: '')]
remove_tags = [dict(name='img', attrs={'alt':['Ludzie filmu', u'Artykuł']})]
feeds = [(u'Recenzje', u'http://film.org.pl/r/recenzje/feed/'), (u'Artyku\u0142', u'http://film.org.pl/a/artykul/feed/'), (u'Analiza', u'http://film.org.pl/a/analiza/feed/'), (u'Ranking', u'http://film.org.pl/a/ranking/feed/'), (u'Blog', u'http://film.org.pl/kmf/blog/feed/'), (u'Ludzie', u'http://film.org.pl/a/ludzie/feed/'), (u'Seriale', u'http://film.org.pl/a/seriale/feed/'), (u'Oceanarium', u'http://film.org.pl/a/ocenarium/feed/'), (u'VHS', u'http://film.org.pl/a/vhs-a/feed/')]

View File

@@ -17,6 +17,7 @@ class FilmWebPl(BasicNewsRecipe):
    preprocess_regexps = [(re.compile(u'\(kliknij\,\ aby powiększyć\)', re.IGNORECASE), lambda m: ''), ]#(re.compile(ur' | ', re.IGNORECASE), lambda m: '')]
    extra_css = '.hdrBig {font-size:22px;} ul {list-style-type:none; padding: 0; margin: 0;}'
    remove_tags= [dict(name='div', attrs={'class':['recommendOthers']}), dict(name='ul', attrs={'class':'fontSizeSet'}), dict(attrs={'class':'userSurname anno'})]
    remove_attributes = ['style',]
    keep_only_tags= [dict(name='h1', attrs={'class':['hdrBig', 'hdrEntity']}), dict(name='div', attrs={'class':['newsInfo', 'newsInfoSmall', 'reviewContent description']})]
    feeds = [(u'News / Filmy w produkcji', 'http://www.filmweb.pl/feed/news/category/filminproduction'),
             (u'News / Festiwale, nagrody i przeglądy', u'http://www.filmweb.pl/feed/news/category/festival'),
@@ -50,4 +51,9 @@ class FilmWebPl(BasicNewsRecipe):
        for i in soup.findAll('sup'):
            if not i.string or i.string.startswith('(kliknij'):
                i.extract()
        tag = soup.find(name='ul', attrs={'class':'inline sep-line'})
        if tag:
            tag.name = 'div'
            for t in tag.findAll('li'):
                t.name = 'div'
        return soup
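The tag renaming at the end of preprocess_html keeps the list's content while dropping its list semantics. A standalone illustration of the same idiom, with invented markup (calibre bundles a BeautifulSoup 3 style API as calibre.ebooks.BeautifulSoup):

    from calibre.ebooks.BeautifulSoup import BeautifulSoup

    soup = BeautifulSoup('<ul class="inline sep-line"><li>dramat</li><li>komedia</li></ul>')
    tag = soup.find(name='ul', attrs={'class':'inline sep-line'})
    tag.name = 'div'              # the <ul> becomes a <div>
    for t in tag.findAll('li'):
        t.name = 'div'            # each <li> becomes a <div>
    print soup  # <div class="inline sep-line"><div>dramat</div><div>komedia</div></div>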

View File

@@ -4,9 +4,10 @@ import re
class Gildia(BasicNewsRecipe):
    title = u'Gildia.pl'
    __author__ = 'fenuks'
    description = 'Gildia - cultural site'
    description = u'Fantastyczny Portal Kulturalny - newsy, recenzje, galerie, wywiady. Literatura, film, gry komputerowe i planszowe, komiks, RPG, sklep. Nie lekceważ potęgi wyobraźni!'
    cover_url = 'http://www.film.gildia.pl/_n_/portal/redakcja/logo/logo-gildia.pl-500.jpg'
    category = 'culture'
    cover_url = 'http://gildia.pl/images/logo-main.png'
    language = 'pl'
    oldest_article = 8
    max_articles_per_feed = 100
@@ -23,10 +24,13 @@ class Gildia(BasicNewsRecipe):
        content = soup.find('div', attrs={'class':'news'})
        if 'recenzj' in soup.title.string.lower():
            for link in content.findAll(name='a'):
                if 'recenzj' in link['href']:
                if 'recenzj' in link['href'] or 'muzyka/plyty' in link['href']:
                    self.log.warn('odnosnik')
                    self.log.warn(link['href'])
                    return self.index_to_soup(link['href'], raw=True)
        if 'fragmen' in soup.title.string.lower():
            for link in content.findAll(name='a'):
                if 'fragment' in link['href']:
                    return self.index_to_soup(link['href'], raw=True)

    def preprocess_html(self, soup):
        for a in soup('a'):

View File

@@ -1,19 +1,20 @@
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class Gram_pl(BasicNewsRecipe):
    title = u'Gram.pl'
    __author__ = 'fenuks'
    description = 'Gram.pl - site about computer games'
    description = u'Serwis społecznościowy o grach: recenzje, newsy, zapowiedzi, encyklopedia gier, forum. Gry PC, PS3, X360, PS Vita, sprzęt dla graczy.'
    category = 'games'
    language = 'pl'
    oldest_article = 8
    index='http://www.gram.pl'
    max_articles_per_feed = 100
    ignore_duplicate_articles = {'title', 'url'}
    no_stylesheets= True
    extra_css = 'h2 {font-style: italic; font-size:20px;} .picbox div {float: left;}'
    #extra_css = 'h2 {font-style: italic; font-size:20px;} .picbox div {float: left;}'
    cover_url=u'http://www.gram.pl/www/01/img/grampl_zima.png'
    remove_tags= [dict(name='p', attrs={'class':['extraText', 'must-log-in']}), dict(attrs={'class':['el', 'headline', 'post-info', 'entry-footer clearfix']}), dict(name='div', attrs={'class':['twojaOcena', 'comment-body', 'comment-author vcard', 'comment-meta commentmetadata', 'tw_button', 'entry-comment-counter', 'snap_nopreview sharing robots-nocontent', 'sharedaddy sd-sharing-enabled']}), dict(id=['igit_rpwt_css', 'comments', 'reply-title', 'igit_title'])]
    keep_only_tags= [dict(id='articleModule')]
    keep_only_tags= [dict(name='div', attrs={'class':['main', 'arkh-postmetadataheader', 'arkh-postcontent', 'post', 'content', 'news_header', 'news_subheader', 'news_text']}), dict(attrs={'class':['contentheading', 'contentpaneopen']}), dict(name='article')]
    remove_tags = [dict(attrs={'class':['breadCrump', 'dymek', 'articleFooter']})]
    feeds = [(u'Informacje', u'http://www.gram.pl/feed_news.asp'),
             (u'Publikacje', u'http://www.gram.pl/feed_news.asp?type=articles'),
             (u'Kolektyw- Indie Games', u'http://indie.gram.pl/feed/'),
@@ -28,35 +29,21 @@ class Gram_pl(BasicNewsRecipe):
                feed.articles.remove(article)
        return feeds

    def append_page(self, soup, appendtag):
        nexturl = appendtag.find('a', attrs={'class':'cpn'})
        while nexturl:
            soup2 = self.index_to_soup('http://www.gram.pl'+ nexturl['href'])
            r=appendtag.find(id='pgbox')
            if r:
                r.extract()
            pagetext = soup2.find(attrs={'class':'main'})
            r=pagetext.find('h1')
            if r:
                r.extract()
            r=pagetext.find('h2')
            if r:
                r.extract()
            for r in pagetext.findAll('script'):
                r.extract()
            pos = len(appendtag.contents)
            appendtag.insert(pos, pagetext)
            nexturl = appendtag.find('a', attrs={'class':'cpn'})
        r=appendtag.find(id='pgbox')
        if r:
            r.extract()

    def preprocess_html(self, soup):
        self.append_page(soup, soup.body)
        tag=soup.findAll(name='div', attrs={'class':'picbox'})
        for t in tag:
            t['style']='float: left;'
        tag=soup.find(name='div', attrs={'class':'summary'})
        if tag:
            tag.find(attrs={'class':'pros'}).insert(0, BeautifulSoup('<h2>Plusy:</h2>').h2)
            tag.find(attrs={'class':'cons'}).insert(0, BeautifulSoup('<h2>Minusy:</h2>').h2)
        tag = soup.find(name='section', attrs={'class':'cenzurka'})
        if tag:
            rate = tag.p.img['data-ocena']
            tag.p.img.extract()
            tag.p.insert(len(tag.p.contents)-2, BeautifulSoup('<h2>Ocena: {0}</h2>'.format(rate)).h2)
        for a in soup('a'):
            if a.has_key('href') and 'http://' not in a['href'] and 'https://' not in a['href']:
                a['href']=self.index + a['href']
        tag=soup.find(name='span', attrs={'class':'platforma'})
        if tag:
            tag.name = 'p'
        return soup
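The new preprocess_html builds the 'Plusy:'/'Minusy:' headings by parsing a small HTML fragment and grafting the resulting tag into the page. A minimal demo of that insert idiom, with invented markup (calibre's bundled BeautifulSoup, as imported above):

    from calibre.ebooks.BeautifulSoup import BeautifulSoup

    soup = BeautifulSoup('<div class="pros">szybka akcja</div>')
    tag = soup.find(attrs={'class':'pros'})
    tag.insert(0, BeautifulSoup('<h2>Plusy:</h2>').h2)  # graft the parsed <h2> in as first child
    print soup  # <div class="pros"><h2>Plusy:</h2>szybka akcja</div>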

View File

@@ -1,5 +1,5 @@
__license__ = 'GPL v3'
__copyright__ = '2008-2010, Darko Miletic <darko.miletic at gmail.com>'
__copyright__ = '2008-2012, Darko Miletic <darko.miletic at gmail.com>'
'''
harpers.org
'''
@@ -16,6 +16,7 @@ class Harpers(BasicNewsRecipe):
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False
    masthead_url = 'http://harpers.org/wp-content/themes/harpers/images/pheader.gif'

    conversion_options = {
        'comment' : description
@@ -31,27 +32,9 @@ class Harpers(BasicNewsRecipe):
    .caption{font-family:Verdana,sans-serif;font-size:x-small;color:#666666;}
    '''

    keep_only_tags = [ dict(name='div', attrs={'id':'cached'}) ]
    keep_only_tags = [ dict(name='div', attrs={'class':['postdetailFull', 'articlePost']}) ]
    remove_tags = [
        dict(name='table', attrs={'class':['rcnt','rcnt topline']})
        ,dict(name=['link','object','embed','meta','base'])
    ]
    remove_tags = [dict(name=['link','object','embed','meta','base'])]
    remove_attributes = ['width','height']

    feeds = [(u"Harper's Magazine", u'http://www.harpers.org/rss/frontpage-rss20.xml')]
    feeds = [(u"Harper's Magazine", u'http://harpers.org/feed/')]

    def get_cover_url(self):
        cover_url = None
        index = 'http://harpers.org/'
        soup = self.index_to_soup(index)
        link_item = soup.find(name = 'img',attrs= {'class':"cover"})
        if link_item:
            cover_url = 'http://harpers.org' + link_item['src']
        return cover_url

    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll(xmlns=True):
            del item['xmlns']
        return soup

View File

@@ -1,18 +1,22 @@
__license__ = 'GPL v3'
__copyright__ = '2008-2010, Darko Miletic <darko.miletic at gmail.com>'
__copyright__ = '2008-2012, Darko Miletic <darko.miletic at gmail.com>'
'''
harpers.org - paid subscription/ printed issue articles
This recipe only gets articles published in text format;
images and pdf's are ignored.
If you have an institutional subscription based on access IP you do not need to enter
anything in the username/password fields
'''
import time, re
import urllib
from calibre import strftime
from calibre.web.feeds.news import BasicNewsRecipe

class Harpers_full(BasicNewsRecipe):
    title = "Harper's Magazine - articles from printed edition"
    __author__ = 'Darko Miletic'
    description = "Harper's Magazine: Founded June 1850."
    description = "Harper's Magazine, the oldest general-interest monthly in America, explores the issues that drive our national conversation, through long-form narrative journalism and essays, and such celebrated features as the iconic Harper's Index."
    publisher = "Harpers's"
    category = 'news, politics, USA'
    oldest_article = 30
@@ -21,52 +25,86 @@ class Harpers_full(BasicNewsRecipe):
    use_embedded_content = False
    delay = 1
    language = 'en'
    needs_subscription = True
    masthead_url = 'http://www.harpers.org/media/image/Harpers_305x100.gif'
    publication_type = 'magazine'
    INDEX = strftime('http://www.harpers.org/archive/%Y/%m')
    LOGIN = 'http://www.harpers.org'
    cover_url = strftime('http://www.harpers.org/media/pages/%Y/%m/gif/0001.gif')
    extra_css = ' body{font-family: "Georgia",serif} '
    encoding = 'utf8'
    needs_subscription = 'optional'
    masthead_url = 'http://harpers.org/wp-content/themes/harpers/images/pheader.gif'
    publication_type = 'magazine'
    LOGIN = 'http://harpers.org/wp-content/themes/harpers/ajax_login.php'
    extra_css = """
                body{font-family: adobe-caslon-pro,serif}
                .category{font-size: small}
                .articlePost p:first-letter{display: inline; font-size: xx-large; font-weight: bold}
                """

    conversion_options = {
        'comment' : description
        , 'tags' : category
        , 'publisher' : publisher
        , 'language' : language
    }

    keep_only_tags = [ dict(name='div', attrs={'id':'cached'}) ]
    keep_only_tags = [ dict(name='div', attrs={'class':['postdetailFull','articlePost']}) ]
    remove_tags = [
        dict(name='table', attrs={'class':['rcnt','rcnt topline']})
        ,dict(name='link')
        dict(name='div', attrs={'class':'fRight rightDivPad'})
        ,dict(name=['link','meta','object','embed','iframe'])
    ]
    remove_attributes=['xmlns']

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        br.open('http://harpers.org/')
        if self.username is not None and self.password is not None:
            br.open(self.LOGIN)
            br.select_form(nr=1)
            br['handle' ] = self.username
            br['password'] = self.password
            br.submit()
            tt = time.localtime()*1000
            data = urllib.urlencode({ 'm':self.username
                                     ,'p':self.password
                                     ,'rt':'http://harpers.org/'
                                     ,'tt':tt
                                    })
            br.open(self.LOGIN, data)
        return br

    def parse_index(self):
        #find current issue
        soup = self.index_to_soup('http://harpers.org/')
        currentIssue=soup.find('div',attrs={'class':'mainNavi'}).find('li',attrs={'class':'curentIssue'})
        currentIssue_url=self.tag_to_string(currentIssue.a['href'])
        self.log(currentIssue_url)

        #go to the current issue
        soup1 = self.index_to_soup(currentIssue_url)
        date = re.split('\s\|\s',self.tag_to_string(soup1.head.title.string))[0]
        self.timefmt = u' [%s]'%date

        #get cover
        coverurl='http://harpers.org/wp-content/themes/harpers/ajax_microfiche.php?img=harpers-'+re.split('harpers.org/',currentIssue_url)[1]+'gif/0001.gif'
        soup2 = self.index_to_soup(coverurl)
        self.cover_url = self.tag_to_string(soup2.find('img')['src'])
        self.log(self.cover_url)

        articles = []
        print 'Processing ' + self.INDEX
        soup = self.index_to_soup(self.INDEX)
        for item in soup.findAll('div', attrs={'class':'title'}):
            text_link = item.parent.find('img',attrs={'alt':'Text'})
            if text_link:
                url = self.LOGIN + item.a['href']
                title = item.a.contents[0]
                date = strftime(' %B %Y')
                articles.append({
                    'title' :title
                    ,'date' :date
                    ,'url' :url
                    ,'description':''
                })
        return [(soup.head.title.string, articles)]
        count = 0
        for item in soup1.findAll('div', attrs={'class':'articleData'}):
            text_links = item.findAll('h2')
            for text_link in text_links:
                if count == 0:
                    count = 1
                else:
                    url = text_link.a['href']
                    title = text_link.a.contents[0]
                    date = strftime(' %B %Y')
                    articles.append({
                        'title' :title
                        ,'date' :date
                        ,'url' :url
                        ,'description':''
                    })
        return [(soup1.head.title.string, articles)]

    def print_version(self, url):
        return url + '?single=1'

    def cleanup(self):
        soup = self.index_to_soup('http://harpers.org/')
        signouturl=self.tag_to_string(soup.find('li', attrs={'class':'subLogOut'}).findNext('li').a['href'])
        self.log(signouturl)
        self.browser.open(signouturl)

View File

@@ -15,23 +15,12 @@ class AdvancedUserRecipe(BasicNewsRecipe):
    timeout = 5
    no_stylesheets = True

    keep_only_tags = [dict(name='div', attrs={'id':'mitte_news'}),
                      dict(name='h1', attrs={'class':'clear'}),
                      dict(name='div', attrs={'class':'meldung_wrapper'})]
    remove_tags_after = dict(name ='p', attrs={'class':'editor'})
    remove_tags = [dict(id='navi_top_container'),
                   dict(id='navi_bottom'),
                   dict(id='mitte_rechts'),
                   dict(id='navigation'),
                   dict(id='subnavi'),
                   dict(id='social_bookmarks'),
                   dict(id='permalink'),
                   dict(id='content_foren'),
                   dict(id='seiten_navi'),
                   dict(id='adbottom'),
                   dict(id='sitemap'),
                   dict(name='div', attrs={'id':'sitemap'}),
                   dict(name='ul', attrs={'class':'erste_zeile'}),
                   dict(name='ul', attrs={'class':'zweite_zeile'}),
                   dict(name='div', attrs={'class':'navi_top_container'})]
                   dict(name='p', attrs={'class':'size80'})]

    feeds = [
        ('Newsticker', 'http://www.heise.de/newsticker/heise.rdf'),
@@ -54,5 +43,3 @@ class AdvancedUserRecipe(BasicNewsRecipe):
    def print_version(self, url):
        return url + '?view=print'

View File

@@ -16,10 +16,14 @@ class TheHindu(BasicNewsRecipe):
    keep_only_tags = [dict(id='content')]
    remove_tags = [dict(attrs={'class':['article-links', 'breadcr']}),
                   dict(id=['email-section', 'right-column', 'printfooter'])]
                   dict(id=['email-section', 'right-column', 'printfooter', 'topover',
                            'slidebox', 'th_footer'])]

    extra_css = '.photo-caption { font-size: smaller }'

    def preprocess_raw_html(self, raw, url):
        return raw.replace('<body><p>', '<p>').replace('</p></body>', '</p>')

    def postprocess_html(self, soup, first_fetch):
        for t in soup.findAll(['table', 'tr', 'td','center']):
            t.name = 'div'

View File

@@ -3,7 +3,7 @@ from calibre.web.feeds.news import BasicNewsRecipe
class Historia_org_pl(BasicNewsRecipe):
    title = u'Historia.org.pl'
    __author__ = 'fenuks'
    description = u'history site'
    description = u'Artykuły dotyczące historii w układzie epok i tematów, forum. Najlepsza strona historii. Matura z historii i egzamin gimnazjalny z historii.'
    cover_url = 'http://lh3.googleusercontent.com/_QeRQus12wGg/TOvHsZ2GN7I/AAAAAAAAD_o/LY1JZDnq7ro/logo5.jpg'
    category = 'history'
    language = 'pl'
@@ -12,16 +12,15 @@ class Historia_org_pl(BasicNewsRecipe):
    no_stylesheets = True
    use_embedded_content = True
    max_articles_per_feed = 100
    ignore_duplicate_articles = {'title', 'url'}

    feeds = [(u'Wszystkie', u'http://www.historia.org.pl/index.php?format=feed&type=atom'),
             (u'Wiadomości', u'http://www.historia.org.pl/index.php/wiadomosci.feed?type=atom'),
             (u'Publikacje', u'http://www.historia.org.pl/index.php/publikacje.feed?type=atom'),
             (u'Publicystyka', u'http://www.historia.org.pl/index.php/publicystyka.feed?type=atom'),
             (u'Recenzje', u'http://historia.org.pl/index.php/recenzje.feed?type=atom'),
             (u'Kultura i sztuka', u'http://www.historia.org.pl/index.php/kultura-i-sztuka.feed?type=atom'),
             (u'Rekonstykcje', u'http://www.historia.org.pl/index.php/rekonstrukcje.feed?type=atom'),
             (u'Projekty', u'http://www.historia.org.pl/index.php/projekty.feed?type=atom'),
             (u'Konkursy'), (u'http://www.historia.org.pl/index.php/konkursy.feed?type=atom')]
    feeds = [(u'Wszystkie', u'http://historia.org.pl/feed/'),
             (u'Wiadomości', u'http://historia.org.pl/Kategoria/wiadomosci/feed/'),
             (u'Publikacje', u'http://historia.org.pl/Kategoria/artykuly/feed/'),
             (u'Publicystyka', u'http://historia.org.pl/Kategoria/publicystyka/feed/'),
             (u'Recenzje', u'http://historia.org.pl/Kategoria/recenzje/feed/'),
             (u'Projekty', u'http://historia.org.pl/Kategoria/projekty/feed/'),]

    def print_version(self, url):

(binary file added: 1.1 KiB)
(binary file added: 24 KiB)
(binary file added: 702 B)
(binary file added: 350 B)
BIN recipes/icons/tvp_info.png Normal file (329 B)
(binary file added: 412 B)
View File

@@ -47,9 +47,10 @@ class TheIndependentNew(BasicNewsRecipe):
        dict(name='img',attrs={'alt' : ['Get Adobe Flash player']}),
        dict(name='img',attrs={'alt' : ['view gallery']}),
        dict(attrs={'style' : re.compile('.*')}),
        dict(attrs={'class':lambda x: x and 'voicesRelatedTopics' in x.split()}),
    ]
    keep_only_tags =[dict(attrs={'id':'main'})]
    keep_only_tags =[dict(attrs={'id':['main','top']})]
    recursions = 0

    # fixes non compliant html nesting and 'marks' article graphics links
@@ -69,7 +70,7 @@ class TheIndependentNew(BasicNewsRecipe):
    }

    extra_css = """
        h1{font-family: Georgia,serif }
        h1{font-family: Georgia,serif ; font-size: x-large; }
        body{font-family: Verdana,Arial,Helvetica,sans-serif}
        img{margin-bottom: 0.4em; display:block}
        .starRating img {float: left}
@@ -77,16 +78,21 @@ class TheIndependentNew(BasicNewsRecipe):
        .image {clear:left; font-size: x-small; color:#888888;}
        .articleByTimeLocation {font-size: x-small; color:#888888;
            margin-bottom:0.2em ; margin-top:0.2em ; display:block}
        .subtitle {clear:left}
        .subtitle {clear:left ;}
        .column-1 h1 { color: #191919}
        .column-1 h2 { color: #333333}
        .column-1 h3 { color: #444444}
        .column-1 p { color: #777777}
        .subtitle { color: #777777; font-size: medium;}
        .column-1 p,a,h1,h2,h3 { margin: 0; }
        .column-1 a,h1,h2,h3 { margin: 0; }
        .column-1 div{color:#888888; margin: 0;}
        .column-1 div{margin: 0;}
        .articleContent {display: block; clear:left;}
        .articleContent {color: #000000; font-size: medium;}
        .ivDrip-section {color: #000000; font-size: medium;}
        .datetime {color: #888888}
        .title {font-weight:bold;}
        .storyTop{}
        .pictureContainer img { max-width: 400px; max-height: 400px;}
        .image img { max-width: 400px; max-height: 400px;}
        """

    oldest_article = 1
@@ -325,6 +331,20 @@ class TheIndependentNew(BasicNewsRecipe):
                item.contents[0] = ''

    def postprocess_html(self,soup, first_fetch):
        #mark subtitle parent, as non-compliant nesting causes
        # p's to be 'popped out' of the h3 tag they are nested in.
        subtitle = soup.find('h3', attrs={'class' : 'subtitle'})
        subtitle_div = None
        if subtitle:
            subtitle_div = subtitle.parent
        if subtitle_div:
            clazz = ''
            if 'class' in subtitle_div:
                clazz = subtitle_div['class'] + ' '
            clazz = clazz + 'subtitle'
            subtitle_div['class'] = clazz

        #find broken images and remove captions
        items_to_extract = []
        for item in soup.findAll('div', attrs={'class' : 'image'}):
@@ -501,6 +521,9 @@ class TheIndependentNew(BasicNewsRecipe):
        ),
        (u'Opinion',
            u'http://www.independent.co.uk/opinion/?service=rss'),
        (u'Voices',
            u'http://www.independent.co.uk/voices/?service=rss'
        ),
        (u'Environment',
            u'http://www.independent.co.uk/environment/?service=rss'),
        (u'Sport - Athletics',

View File

@@ -9,6 +9,21 @@ class Kosmonauta(BasicNewsRecipe):
    language = 'pl'
    cover_url='http://bi.gazeta.pl/im/4/10393/z10393414X,Kosmonauta-net.jpg'
    no_stylesheets = True
    INDEX = 'http://www.kosmonauta.net'
    oldest_article = 7
    no_stylesheets = True
    max_articles_per_feed = 100
    feeds = [(u'Kosmonauta.net', u'http://www.kosmonauta.net/index.php/feed/rss.html')]
    keep_only_tags = [dict(name='div', attrs={'class':'item-page'})]
    remove_tags = [dict(attrs={'class':['article-tools clearfix', 'cedtag', 'nav clearfix', 'jwDisqusForm']})]
    remove_tags_after = dict(name='div', attrs={'class':'cedtag'})
    feeds = [(u'Kosmonauta.net', u'http://www.kosmonauta.net/?format=feed&type=atom')]

    def preprocess_html(self, soup):
        for a in soup.findAll(name='a'):
            if a.has_key('href'):
                href = a['href']
                if not href.startswith('http'):
                    a['href'] = self.INDEX + href
                    print '%%%%%%%%%%%%%%%%%%%%%%%%%', a['href']
        return soup
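The preprocess_html above absolutizes relative links against INDEX before conversion. The same rewrite can be sanity-checked outside a full fetch with calibre's bundled BeautifulSoup (sample markup invented for illustration):

    from calibre.ebooks.BeautifulSoup import BeautifulSoup

    soup = BeautifulSoup('<a href="/index.php/news/artykul">news</a>')
    for a in soup.findAll(name='a'):
        if a.has_key('href') and not a['href'].startswith('http'):
            a['href'] = 'http://www.kosmonauta.net' + a['href']
    print soup  # <a href="http://www.kosmonauta.net/index.php/news/artykul">news</a>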

View File

@@ -1,15 +1,16 @@
from calibre.web.feeds.news import BasicNewsRecipe
import re

class Ksiazka_net_pl(BasicNewsRecipe):
    title = u'ksiazka.net.pl'
    title = u'książka.net.pl'
    __author__ = 'fenuks'
    description = u'Ksiazka.net.pl - book vortal'
    description = u'Portal Księgarski - tematyczny serwis o książkach. Wydarzenia z rynku księgarsko-wydawniczego, nowości, zapowiedzi, bestsellery, setki recenzji. Niezbędne informacje dla każdego miłośnika książek, księgarza, bibliotekarza i wydawcy.'
    cover_url = 'http://www.ksiazka.net.pl/fileadmin/templates/ksiazka.net.pl/images/1PortalKsiegarski-logo.jpg'
    category = 'books'
    language = 'pl'
    oldest_article = 8
    max_articles_per_feed = 100
    no_stylesheets= True
    remove_empty_feeds = True
    #extra_css = 'img {float: right;}'
    preprocess_regexps = [(re.compile(ur'Podoba mi się, kupuję:'), lambda match: '<br />')]
    remove_tags_before= dict(name='div', attrs={'class':'m-body'})

View File

@@ -2,7 +2,7 @@
__license__ = 'GPL v3'
__author__ = 'Gabriele Marini, based on Darko Miletic'
__copyright__ = '2009, Darko Miletic <darko.miletic at gmail.com>'
__description__ = 'La Stampa 05/05/2010'
__description__ = 'La Stampa 28/12/2012'
'''
http://www.lastampa.it/
@ -14,10 +14,11 @@ class LaStampa(BasicNewsRecipe):
title = u'La Stampa' title = u'La Stampa'
language = 'it' language = 'it'
__author__ = 'Gabriele Marini' __author__ = 'Gabriele Marini'
oldest_article = 15 #oldest_article = 15
oldest_articlce = 7 #for daily schedule
max_articles_per_feed = 50 max_articles_per_feed = 50
recursion = 100 recursion = 100
cover_url = 'http://www.lastampa.it/edicola/PDF/1.pdf' cover_url = 'http://www1.lastampa.it/edicola/PDF/1.pdf'
use_embedded_content = False use_embedded_content = False
remove_javascript = True remove_javascript = True
no_stylesheets = True no_stylesheets = True
@ -33,35 +34,41 @@ class LaStampa(BasicNewsRecipe):
if link: if link:
return link[0]['href'] return link[0]['href']
keep_only_tags = [dict(attrs={'class':['boxocchiello2','titoloRub','titologir','catenaccio','sezione','articologirata']}), keep_only_tags = [dict(attrs={'class':['boxocchiello2','titoloRub','titologir','autore-girata','luogo-girata','catenaccio','sezione','articologirata','bodytext','news-single-img','ls-articoloCorpo','ls-blog-list-1col']}),
dict(name='div', attrs={'id':'corpoarticolo'}) dict(name='div', attrs={'id':'corpoarticolo'})
] ]
remove_tags = [dict(name='div', attrs={'id':'menutop'}),
dict(name='div', attrs={'id':'fwnetblocco'}), remove_tags = [dict(name='div', attrs={'id':['menutop','fwnetblocco']}),
dict(name='table', attrs={'id':'strumenti'}), dict(attrs={'class':['ls-toolbarCommenti','ls-boxCommentsBlog']}),
dict(name='table', attrs={'id':'imgesterna'}), dict(name='table', attrs={'id':['strumenti','imgesterna']}),
dict(name='a', attrs={'class':'linkblu'}), dict(name='a', attrs={'class':['linkblu','link']}),
dict(name='a', attrs={'class':'link'}),
dict(name='span', attrs={'class':['boxocchiello','boxocchiello2','sezione']}) dict(name='span', attrs={'class':['boxocchiello','boxocchiello2','sezione']})
] ]
feeds = [(u'BuonGiorno',u'http://www.lastampa.it/cultura/opinioni/buongiorno/rss.xml'),
feeds = [ (u'Jena', u'http://www.lastampa.it/cultura/opinioni/jena/rss.xml'),
(u'Home', u'http://www.lastampa.it/redazione/rss_home.xml'), (u'Editoriali', u'http://www.lastampa.it/cultura/opinioni/editoriali'),
(u'Editoriali', u'http://www.lastampa.it/cmstp/rubriche/oggetti/rss.asp?ID_blog=25'), (u'Finestra sull America', u'http://lastampa.feedsportal.com/c/32418/f/625713/index.rss'),
(u'Politica', u'http://www.lastampa.it/redazione/cmssezioni/politica/rss_politica.xml'), (u'HomePage', u'http://www.lastampa.it/rss.xml'),
(u'ArciItaliana', u'http://www.lastampa.it/cmstp/rubriche/oggetti/rss.asp?ID_blog=14'), (u'Politica Italia', u'http://www.lastampa.it/italia/politica/rss.xml'),
(u'Cronache', u'http://www.lastampa.it/redazione/cmssezioni/cronache/rss_cronache.xml'), (u'ArciItaliana', u'http://www.lastampa.it/rss/blog/arcitaliana'),
(u'Esteri', u'http://www.lastampa.it/redazione/cmssezioni/esteri/rss_esteri.xml'), (u'Cronache', u'http://www.lastampa.it/italia/cronache/rss.xml'),
(u'Danni Collaterali', u'http://www.lastampa.it/cmstp/rubriche/oggetti/rss.asp?ID_blog=90'), (u'Esteri', u'http://www.lastampa.it/esteri/rss.xml'),
(u'Economia', u'http://www.lastampa.it/redazione/cmssezioni/economia/rss_economia.xml'), (u'Danni Collaterali', u'http://www.lastampa.it/rss/blog/danni-collaterali'),
(u'Tecnologia ', u'http://www.lastampa.it/cmstp/rubriche/oggetti/rss.asp?ID_blog=30'), (u'Economia', u'http://www.lastampa.it/economia/rss.xml'),
(u'Spettacoli', u'http://www.lastampa.it/redazione/cmssezioni/spettacoli/rss_spettacoli.xml'), (u'Tecnologia ', u'http://www.lastampa.it/tecnologia/rss.xml'),
(u'Sport', u'http://www.lastampa.it/sport/rss_home.xml'), (u'Spettacoli', u'http://www.lastampa.it/spettacoli/rss.xml'),
(u'Torino', u'http://rss.feedsportal.com/c/32418/f/466938/index.rss'), (u'Sport', u'http://www.lastampa.it/sport/rss.xml'),
(u'Motori', u'http://www.lastampa.it/cmstp/rubriche/oggetti/rss.asp?ID_blog=57'), (u'Torino', u'http://www.lastampa.it/cronaca/rss.xml'),
(u'Scienza', u'http://www.lastampa.it/cmstp/rubriche/oggetti/rss.asp?ID_blog=38'), (u'Motori', u'http://www.lastampa.it/motori/rss.xml'),
(u'Fotografia', u'http://rss.feedsportal.com/c/32418/f/478449/index.rss'), (u'Scienza', u'http://www.lastampa.it/scienza/rss.xml'),
(u'Scuola', u'http://www.lastampa.it/cmstp/rubriche/oggetti/rss.asp?ID_blog=60'), (u'Cultura', u'http://www.lastampa.it/cultura/rss.xml'),
(u'Tempo Libero', u'http://www.lastampa.it/tempolibero/rss_home.xml') (u'Scuola', u'http://www.lastampa.it/cultura/scuola/rss.xml'),
(u'Benessere', u'http://www.lastampa.it/scienza/benessere/rss.xml'),
(u'Cucina', u'http://www.lastampa.it/societa/cucina/rss.xml'),
(u'Casa', u'http://www.lastampa.it/societa/casa/rss.xml'),
(u'Moda',u'http://www.lastampa.it/societa/moda/rss.xml'),
(u'Giochi',u'http://www.lastampa.it/tecnologia/giochi/rss.xml'),
(u'Viaggi',u'http://www.lastampa.it/societa/viaggi/rss.xml'),
(u'Ambiente', u'http://www.lastampa.it/scienza/ambiente/rss.xml')
] ]
@@ -7,9 +7,9 @@ class AdvancedUserRecipe1324114228(BasicNewsRecipe):
     max_articles_per_feed = 100
     auto_cleanup = True
     masthead_url = 'http://www.lavoce.info/binary/la_voce/testata/lavoce.1184661635.gif'
-    feeds = [(u'La Voce', u'http://www.lavoce.info/feed_rss.php?id_feed=1')]
+    feeds = [(u'La Voce', u'http://www.lavoce.info/feed/')]
     __author__ = 'faber1971'
-    description = 'Italian website on Economy - v1.01 (17, December 2011)'
+    description = 'Italian website on Economy - v1.02 (27, December 2012)'
     language = 'it'
@@ -22,13 +22,15 @@ class LeMonde(BasicNewsRecipe):
     #publication_type = 'newsportal'
     extra_css = '''
                    h1{font-size:130%;}
+                   h2{font-size:100%;}
+                   blockquote.aside {background-color: #DDD; padding: 0.5em;}
                    .ariane{font-size:xx-small;}
                    .source{font-size:xx-small;}
-                   #.href{font-size:xx-small;}
-                   #.figcaption style{color:#666666; font-size:x-small;}
-                   #.main-article-info{font-family:Arial,Helvetica,sans-serif;}
-                   #full-contents{font-size:small; font-family:Arial,Helvetica,sans-serif;font-weight:normal;}
-                   #match-stats-summary{font-size:small; font-family:Arial,Helvetica,sans-serif;font-weight:normal;}
+                   /*.href{font-size:xx-small;}*/
+                   /*.figcaption style{color:#666666; font-size:x-small;}*/
+                   /*.main-article-info{font-family:Arial,Helvetica,sans-serif;}*/
+                   /*full-contents{font-size:small; font-family:Arial,Helvetica,sans-serif;font-weight:normal;}*/
+                   /*match-stats-summary{font-size:small; font-family:Arial,Helvetica,sans-serif;font-weight:normal;}*/
                 '''
     #preprocess_regexps = [(re.compile(r'<!--.*?-->', re.DOTALL), lambda m: '')]
     conversion_options = {
@@ -44,6 +46,9 @@ class LeMonde(BasicNewsRecipe):
     filterDuplicates = True

     def preprocess_html(self, soup):
+        for aside in soup.findAll('aside'):
+            aside.name='blockquote'
+            aside['class'] = "aside"
         for alink in soup.findAll('a'):
             if alink.string is not None:
                 tstr = alink.string
@@ -107,7 +112,9 @@ class LeMonde(BasicNewsRecipe):
     ]

     remove_tags = [
-        dict(name='div', attrs={'class':['bloc_base meme_sujet']}),
+        dict(attrs={'class':['rubriques_liees']}),
+        dict(attrs={'class':['sociaux']}),
+        dict(attrs={'class':['bloc_base meme_sujet']}),
         dict(name='p', attrs={'class':['lire']})
     ]
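Note that the two hunks work together: the aside elements renamed to blockquote in preprocess_html pick up the new blockquote.aside rule added to extra_css, so asides render as shaded quote boxes in the output.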
@@ -32,26 +32,28 @@ class ledevoir(BasicNewsRecipe):
     recursion = 10
     needs_subscription = 'optional'

-    filterDuplicates = False
     url_list = []

     remove_javascript = True
     no_stylesheets = True
+    auto_cleanup = True

     preprocess_regexps = [(re.compile(r'(title|alt)=".*?>.*?"', re.DOTALL), lambda m: '')]

-    keep_only_tags = [
-        dict(name='div', attrs={'id':'article'}),
-        dict(name='div', attrs={'id':'colonne_principale'})
-    ]
+    #keep_only_tags = [
+        #dict(name='div', attrs={'id':'article_detail'}),
+        #dict(name='div', attrs={'id':'colonne_principale'})
+    #]

-    remove_tags = [
-        dict(name='div', attrs={'id':'dialog'}),
-        dict(name='div', attrs={'class':['interesse_actions','reactions']}),
-        dict(name='ul', attrs={'class':'mots_cles'}),
-        dict(name='a', attrs={'class':'haut'}),
-        dict(name='h5', attrs={'class':'interesse_actions'})
-    ]
+    #remove_tags = [
+        #dict(name='div', attrs={'id':'dialog'}),
+        #dict(name='div', attrs={'class':['interesse_actions','reactions','taille_du_texte right clearfix','partage_sociaux clearfix']}),
+        #dict(name='aside', attrs={'class':['article_actions clearfix','reactions','partage_sociaux_wrapper']}),
+        #dict(name='ul', attrs={'class':'mots_cles'}),
+        #dict(name='ul', attrs={'id':'commentaires'}),
+        #dict(name='a', attrs={'class':'haut'}),
+        #dict(name='h5', attrs={'class':'interesse_actions'})
+    #]

     feeds = [
         (u'A la une', 'http://www.ledevoir.com/rss/manchettes.xml'),
@@ -95,10 +97,4 @@ class ledevoir(BasicNewsRecipe):
         br.submit()
         return br

-    def print_version(self, url):
-        if self.filterDuplicates:
-            if url in self.url_list:
-                return
-            self.url_list.append(url)
-        return url
@@ -0,0 +1,12 @@
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1356270446(BasicNewsRecipe):
    title = u'\u041b\u044c\u0432\u0456\u0432\u0441\u044c\u043a\u0430 \u0433\u0430\u0437\u0435\u0442\u0430'
    __author__ = 'rpalyvoda'
    oldest_article = 7
    max_articles_per_feed = 100
    language = 'uk'
    cover_url = 'http://lvivska.com/sites/all/themes/biblos/images/logo.png'
    masthead_url = 'http://lvivska.com/sites/all/themes/biblos/images/logo.png'
    auto_cleanup = True
    feeds = [(u'\u041d\u043e\u0432\u0438\u043d\u0438', u'http://lvivska.com/rss/news.xml'), (u'\u041f\u043e\u043b\u0456\u0442\u0438\u043a\u0430', u'http://lvivska.com/rss/politic.xml'), (u'\u0415\u043a\u043e\u043d\u043e\u043c\u0456\u043a\u0430', u'http://lvivska.com/rss/economic.xml'), (u'\u041f\u0440\u0430\u0432\u043e', u'http://lvivska.com/rss/law.xml'), (u'\u0421\u0432\u0456\u0442', u'http://lvivska.com/rss/world.xml'), (u'\u0416\u0438\u0442\u0442\u044f', u'http://lvivska.com/rss/life.xml'), (u'\u041a\u0443\u043b\u044c\u0442\u0443\u0440\u0430', u'http://lvivska.com/rss/culture.xml'), (u'\u041b\u0430\u0441\u0443\u043d', u'http://lvivska.com/rss/cooking.xml'), (u'\u0421\u0442\u0438\u043b\u044c', u'http://lvivska.com/rss/style.xml'), (u'Galicia Incognita', u'http://lvivska.com/rss/galiciaincognita.xml'), (u'\u0421\u043f\u043e\u0440\u0442', u'http://lvivska.com/rss/sport.xml'), (u'\u0415\u043a\u043e\u043b\u043e\u0433\u0456\u044f', u'http://lvivska.com/rss/ecology.xml'), (u"\u0417\u0434\u043e\u0440\u043e\u0432'\u044f", u'http://lvivska.com/rss/health.xml'), (u'\u0410\u0432\u0442\u043e', u'http://lvivska.com/rss/auto.xml'), (u'\u0411\u043b\u043e\u0433\u0438', u'http://lvivska.com/rss/blog.xml')]
@@ -1,43 +1,74 @@
 from calibre.web.feeds.news import BasicNewsRecipe
+from calibre import strftime
+import re
+import datetime
+import time

 class AdvancedUserRecipe1306097511(BasicNewsRecipe):
     title = u'Metro UK'
-    description = 'Author Dave Asbury : News from The Metro - UK'
+    description = 'News as provided by The Metro -UK'
     #timefmt = ''
     __author__ = 'Dave Asbury'
-    #last update 9/9/12
+    #last update 9/6/12
     cover_url = 'http://profile.ak.fbcdn.net/hprofile-ak-snc4/276636_117118184990145_2132092232_n.jpg'
-    no_stylesheets = True
     oldest_article = 1
-    max_articles_per_feed = 12
     remove_empty_feeds = True
     remove_javascript = True
-    #auto_cleanup = True
+    auto_cleanup = True
     encoding = 'UTF-8'
-    cover_url ='http://profile.ak.fbcdn.net/hprofile-ak-snc4/157897_117118184990145_840702264_n.jpg'
     language = 'en_GB'
     masthead_url = 'http://e-edition.metro.co.uk/images/metro_logo.gif'
-    extra_css = '''
-        h1{font-family:Arial,Helvetica,sans-serif; font-weight:900;font-size:1.6em;}
-        h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:1.2em;}
-        p{font-family:Arial,Helvetica,sans-serif;font-size:1.0em;}
-        body{font-family:Helvetica,Arial,sans-serif;font-size:1.0em;}
-    '''
-    keep_only_tags = [
-        #dict(name='h1'),
-        #dict(name='h2'),
-        #dict(name='div', attrs={'class' : ['row','article','img-cnt figure','clrd']})
-        #dict(name='h3'),
-        #dict(attrs={'class' : 'BText'}),
-    ]
-    remove_tags = [
-        dict(name='div',attrs={'class' : 'art-fd fd-gr1-b clrd'}),
-        dict(name='span',attrs={'class' : 'share'}),
-        dict(name='li'),
-        dict(attrs={'class' : ['twitter-share-button','header-forms','hdr-lnks','close','art-rgt','fd-gr1-b clrd google-article','news m12 clrd clr-b p5t shareBtm','item-ds csl-3-img news','c-1of3 c-last','c-1of1','pd','item-ds csl-3-img sport']}),
-        dict(attrs={'id' : ['','sky-left','sky-right','ftr-nav','and-ftr','notificationList','logo','miniLogo','comments-news','metro_extras']})
-    ]
-    remove_tags_before = dict(name='h1')
-    #remove_tags_after = dict(attrs={'id':['topic-buttons']})

-    feeds = [
-        (u'News', u'http://www.metro.co.uk/rss/news/'), (u'Money', u'http://www.metro.co.uk/rss/money/'), (u'Sport', u'http://www.metro.co.uk/rss/sport/'), (u'Film', u'http://www.metro.co.uk/rss/metrolife/film/'), (u'Music', u'http://www.metro.co.uk/rss/metrolife/music/'), (u'TV', u'http://www.metro.co.uk/rss/tv/'), (u'Showbiz', u'http://www.metro.co.uk/rss/showbiz/'), (u'Weird News', u'http://www.metro.co.uk/rss/weird/'), (u'Travel', u'http://www.metro.co.uk/rss/travel/'), (u'Lifestyle', u'http://www.metro.co.uk/rss/lifestyle/'), (u'Books', u'http://www.metro.co.uk/rss/lifestyle/books/'), (u'Food', u'http://www.metro.co.uk/rss/lifestyle/restaurants/')]
+    def parse_index(self):
+        articles = {}
+        key = None
+        ans = []
+        feeds = [ ('UK', 'http://metro.co.uk/news/uk/'),
+                  ('World', 'http://metro.co.uk/news/world/'),
+                  ('Weird', 'http://metro.co.uk/news/weird/'),
+                  ('Money', 'http://metro.co.uk/news/money/'),
+                  ('Sport', 'http://metro.co.uk/sport/'),
+                  ('Guilty Pleasures', 'http://metro.co.uk/guilty-pleasures/')
+                ]
+        for key, feed in feeds:
+            soup = self.index_to_soup(feed)
+            articles[key] = []
+            ans.append(key)
+
+            today = datetime.date.today()
+            today = time.mktime(today.timetuple())-60*60*24
+
+            for a in soup.findAll('a'):
+                for name, value in a.attrs:
+                    if name == "class" and value=="post":
+                        url = a['href']
+                        title = a['title']
+                        print title
+                        description = ''
+                        m = re.search('^.*uk/([^/]*)/([^/]*)/([^/]*)/', url)
+                        skip = 1
+                        if len(m.groups()) == 3:
+                            g = m.groups()
+                            dt = datetime.datetime.strptime(''+g[0]+'-'+g[1]+'-'+g[2], '%Y-%m-%d')
+                            pubdate = time.strftime('%a, %d %b', dt.timetuple())
+
+                            dt = time.mktime(dt.timetuple())
+                            if dt >= today:
+                                print pubdate
+                                skip = 0
+                        else:
+                            pubdate = strftime('%a, %d %b')
+
+                        summary = a.find(True, attrs={'class':'excerpt'})
+                        if summary:
+                            description = self.tag_to_string(summary, use_alt=False)
+
+                        if skip == 0:
+                            articles[key].append(
+                                dict(title=title, url=url, date=pubdate,
+                                     description=description,
+                                     content=''))
+        #ans = self.sort_index_by(ans, {'The Front Page':-1, 'Dining In, Dining Out':1, 'Obituaries':2})
+        ans = [(key, articles[key]) for key in ans if articles.has_key(key)]
+        return ans
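For orientation (my sketch, not part of the commit): calibre expects parse_index to hand back a list of (section_title, article_list) pairs, where each article is a dict with at least 'title' and 'url' set:

```python
# Hypothetical return value, matching the shape built above;
# 'title' and 'url' are required, the other keys may be empty.
[('UK', [
    {'title': 'Example headline',                    # hypothetical article
     'url': 'http://metro.co.uk/news/uk/example/',   # hypothetical URL
     'date': 'Fri, 28 Dec',
     'description': '',
     'content': ''},
])]
```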
@@ -2,7 +2,7 @@
 from calibre.web.feeds.news import BasicNewsRecipe

 class Mlody_technik(BasicNewsRecipe):
-    title = u'Mlody technik'
+    title = u'Młody technik'
     __author__ = 'fenuks'
     description = u'Młody technik'
     category = 'science'
@@ -0,0 +1,27 @@
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1329123365(BasicNewsRecipe):
    title = u'Mobilebulgaria.com'
    __author__ = 'M3 Web'
    description = 'The biggest Bulgarian site covering mobile consumer electronics. Offers detailed reviews, popular discussion forum, shop and platform for selling new and second hand phones and gadgets.'
    category = 'News, Reviews, Offers, Forum'
    oldest_article = 45
    max_articles_per_feed = 10
    language = 'bg'
    encoding = 'windows-1251'
    no_stylesheets = False
    remove_javascript = True
    keep_only_tags = [dict(name='div', attrs={'class':'bigblock'}),
                      dict(name='div', attrs={'class':'verybigblock'}),
                      dict(name='table', attrs={'class':'obiaviresults'}),
                      dict(name='div', attrs={'class':'forumblock'}),
                      dict(name='div', attrs={'class':'forumblock_b1'}),
                      dict(name='div', attrs={'class':'block2_2colswrap'})]
    feeds = [(u'News', u'http://www.mobilebulgaria.com/rss_full.php'),
             (u'Reviews', u'http://www.mobilebulgaria.com/rss_reviews.php'),
             (u'Offers', u'http://www.mobilebulgaria.com/obiavi/rss.php'),
             (u'Forum', u'http://www.mobilebulgaria.com/rss_forum_last10.php')]
    extra_css = '''
        #gallery1 div{display: block; float: left; margin: 0 10px 10px 0;} '''
@@ -13,8 +13,11 @@ class NikkeiNet_paper_subscription(BasicNewsRecipe):
     max_articles_per_feed = 30
     language = 'ja'
     no_stylesheets = True
-    cover_url = 'http://parts.nikkei.com/parts/ds/images/common/logo_r1.svg'
-    masthead_url = 'http://parts.nikkei.com/parts/ds/images/common/logo_r1.svg'
+    #cover_url = 'http://parts.nikkei.com/parts/ds/images/common/logo_r1.svg'
+    cover_url = 'http://cdn.nikkei.co.jp/parts/ds/images/common/st_nikkei_r1_20101003_1.gif'
+    #masthead_url = 'http://parts.nikkei.com/parts/ds/images/common/logo_r1.svg'
+    masthead_url = 'http://cdn.nikkei.co.jp/parts/ds/images/common/st_nikkei_r1_20101003_1.gif'
+    cover_margins = (10, 188, '#ffffff')

     remove_tags_before = {'class':"cmn-indent"}
     remove_tags = [
@@ -40,8 +43,11 @@ class NikkeiNet_paper_subscription(BasicNewsRecipe):
         print "-------------------------open top page-------------------------------------"
         br.open('http://www.nikkei.com/')
         print "-------------------------open first login form-----------------------------"
-        link = br.links(url_regex="www.nikkei.com/etc/accounts/login").next()
-        br.follow_link(link)
+        try:
+            url = br.links(url_regex="www.nikkei.com/etc/accounts/login").next().url
+        except StopIteration:
+            url = 'http://www.nikkei.com/etc/accounts/login?dps=3&pageflag=top&url=http%3A%2F%2Fwww.nikkei.com%2F'
+        br.open(url) #br.follow_link(link)
         #response = br.response()
         #print response.get_data()
         print "-------------------------JS redirect(send autoPostForm)--------------------"
@@ -15,7 +15,7 @@ class Nin(BasicNewsRecipe):
     publisher = 'NIN d.o.o. - Ringier d.o.o.'
     category = 'news, politics, Serbia'
     no_stylesheets = True
-    oldest_article = 15
+    oldest_article = 180
     encoding = 'utf-8'
     needs_subscription = True
     remove_empty_feeds = True
@@ -25,7 +25,7 @@ class Nin(BasicNewsRecipe):
     use_embedded_content = False
     language = 'sr'
     publication_type = 'magazine'
-    masthead_url = 'http://www.nin.co.rs/img/head/logo.jpg'
+    masthead_url = 'http://www.nin.co.rs/img/logo_print.jpg'
     extra_css = """
         @font-face {font-family: "sans1";src:url(res:///opt/sony/ebook/FONT/tt0003m_.ttf)}
         body{font-family: Verdana, Lucida, sans1, sans-serif}
@@ -42,11 +42,11 @@ class Nin(BasicNewsRecipe):
         , 'tags'      : category
         , 'publisher' : publisher
         , 'language'  : language
-        , 'linearize_tables': True
     }

     preprocess_regexps = [
-        (re.compile(r'</body>.*?<html>', re.DOTALL|re.IGNORECASE),lambda match: '</body>')
-        ,(re.compile(r'</html>.*?</html>', re.DOTALL|re.IGNORECASE),lambda match: '</html>')
+        (re.compile(r'<div class="standardFont">.*', re.DOTALL|re.IGNORECASE),lambda match: '')
         ,(re.compile(u'\u0110'), lambda match: u'\u00D0')
     ]
@@ -60,42 +60,21 @@ class Nin(BasicNewsRecipe):
         br.submit()
         return br

-    keep_only_tags =[dict(name='td', attrs={'width':'520'})]
-    remove_tags_before =dict(name='span', attrs={'class':'izjava'})
-    remove_tags_after =dict(name='html')
-    remove_tags = [
-        dict(name=['object','link','iframe','meta','base'])
-        ,dict(attrs={'class':['fb-like','twitter-share-button']})
-        ,dict(attrs={'rel':'nofollow'})
-    ]
-    remove_attributes=['border','background','height','width','align','valign']
+    remove_tags_before = dict(name='div', attrs={'class':'titleFont'})
+    remove_tags_after = dict(name='div', attrs={'class':'standardFont'})
+    remove_tags = [dict(name=['object','link','iframe','meta','base'])]
+    remove_attributes = ['border','background','height','width','align','valign']

     def get_cover_url(self):
         cover_url = None
         soup = self.index_to_soup(self.INDEX)
-        for item in soup.findAll('a', href=True):
-            if item['href'].startswith('/pages/issue.php?id='):
-                simg = item.find('img')
-                if simg:
-                    return self.PREFIX + item.img['src']
+        cover = soup.find('img', attrs={'class':'issueImg'})
+        if cover:
+            return self.PREFIX + cover['src']
         return cover_url

     feeds = [(u'NIN Online', u'http://www.nin.co.rs/misc/rss.php?feed=RSS2.0')]

-    def preprocess_html(self, soup):
-        for item in soup.findAll(style=True):
-            del item['style']
-        for item in soup.findAll('div'):
-            if len(item.contents) == 0:
-                item.extract()
-        for item in soup.findAll(['td','tr']):
-            item.name='div'
-        for item in soup.findAll('img'):
-            if not item.has_key('alt'):
-                item['alt'] = 'image'
-        for tbl in soup.findAll('table'):
-            img = tbl.find('img')
-            if img:
-                img.extract()
-                tbl.replaceWith(img)
-        return soup
+    def print_version(self, url):
+        return url + '&pf=1'
@@ -6,7 +6,6 @@ www.nsfwcorp.com
 '''

 import urllib
-from calibre import strftime
 from calibre.web.feeds.news import BasicNewsRecipe

 class NotSafeForWork(BasicNewsRecipe):
@@ -21,8 +20,9 @@ class NotSafeForWork(BasicNewsRecipe):
     needs_subscription = True
     auto_cleanup = False
     INDEX = 'https://www.nsfwcorp.com'
-    LOGIN = INDEX + '/login'
-    use_embedded_content = False
+    LOGIN = INDEX + '/login/target/'
+    SETTINGS = INDEX + '/settings/'
+    use_embedded_content = True
     language = 'en'
     publication_type = 'magazine'
     masthead_url = 'http://assets.nsfwcorp.com/media/headers/nsfw_banner.jpg'
@@ -46,15 +46,6 @@ class NotSafeForWork(BasicNewsRecipe):
         , 'language' : language
     }

-    remove_tags_before = dict(attrs={'id':'fromToLine'})
-    remove_tags_after = dict(attrs={'id':'unlockButtonDiv'})
-    remove_tags=[
-        dict(name=['meta', 'link', 'iframe', 'embed', 'object'])
-        ,dict(name='a', attrs={'class':'switchToDeskNotes'})
-        ,dict(attrs={'id':'unlockButtonDiv'})
-    ]
-    remove_attributes = ['lang']
-
     def get_browser(self):
         br = BasicNewsRecipe.get_browser()
         br.open(self.LOGIN)
@@ -65,30 +56,12 @@ class NotSafeForWork(BasicNewsRecipe):
         br.open(self.LOGIN, data)
         return br

-    def parse_index(self):
-        articles = []
-        soup = self.index_to_soup(self.INDEX)
-        dispatches = soup.find(attrs={'id':'dispatches'})
-        if dispatches:
-            for item in dispatches.findAll('h3'):
-                description = u''
-                title_link = item.find('span', attrs={'class':'dispatchTitle'})
-                description_link = item.find('span', attrs={'class':'dispatchSubtitle'})
-                feed_link = item.find('a', href=True)
-                if feed_link:
-                    url = self.INDEX + feed_link['href']
-                    title = self.tag_to_string(title_link)
-                    description = self.tag_to_string(description_link)
-                    date = strftime(self.timefmt)
-                    articles.append({
-                        'title'       :title
-                        ,'date'       :date
-                        ,'url'        :url
-                        ,'description':description
-                    })
-        return [('Dispatches', articles)]
-
-    def preprocess_html(self, soup):
-        for item in soup.findAll(style=True):
-            del item['style']
-        return soup
+    def get_feeds(self):
+        self.feeds = []
+        soup = self.index_to_soup(self.SETTINGS)
+        for item in soup.findAll('input', attrs={'type':'text'}):
+            if item.has_key('value') and item['value'].startswith('http://www.nsfwcorp.com/feed/'):
+                self.feeds.append(item['value'])
+        return self.feeds
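For orientation (my note, not the commit's): BasicNewsRecipe accepts a feed list of either bare URL strings or (title, url) tuples, so the new get_feeds can return the per-subscriber feed URLs it scrapes from the settings page directly:

```python
# Minimal sketch with a hypothetical feed URL; the real URLs are
# per-subscriber and read from the /settings/ page as shown above.
def get_feeds(self):
    return ['http://www.nsfwcorp.com/feed/XXXX']  # hypothetical token
```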
File diff suppressed because it is too large

File diff suppressed because it is too large
@@ -1,27 +1,27 @@
 from calibre.web.feeds.news import BasicNewsRecipe
-from calibre.ebooks.BeautifulSoup import BeautifulSoup

 class PajamasMedia(BasicNewsRecipe):
     title = u'Pajamas Media'
     description = u'Provides exclusive news and opinion for forty countries.'
     language = 'en'
     __author__ = 'Krittika Goyal'
-    oldest_article = 1 #days
+    oldest_article = 2 #days
     max_articles_per_feed = 25
     recursions = 1
     match_regexps = [r'http://pajamasmedia.com/blog/.*/2/$']
     #encoding = 'latin1'

     remove_stylesheets = True
-    #remove_tags_before = dict(name='h1', attrs={'class':'heading'})
-    remove_tags_after  = dict(name='div', attrs={'class':'paged-nav'})
-    remove_tags = [
-        dict(name='iframe'),
-        dict(name='div', attrs={'class':['pages']}),
-        #dict(name='div', attrs={'id':['bookmark']}),
-        #dict(name='span', attrs={'class':['related_link', 'slideshowcontrols']}),
-        #dict(name='ul', attrs={'class':'articleTools'}),
-    ]
+    auto_cleanup = True
+    ##remove_tags_before = dict(name='h1', attrs={'class':'heading'})
+    #remove_tags_after  = dict(name='div', attrs={'class':'paged-nav'})
+    #remove_tags = [
+        #dict(name='iframe'),
+        #dict(name='div', attrs={'class':['pages']}),
+        ##dict(name='div', attrs={'id':['bookmark']}),
+        ##dict(name='span', attrs={'class':['related_link', 'slideshowcontrols']}),
+        ##dict(name='ul', attrs={'class':'articleTools'}),
+    #]

     feeds = [
        ('pajamas Media',
@@ -29,20 +29,20 @@ class PajamasMedia(BasicNewsRecipe):
     ]

-    def preprocess_html(self, soup):
-        story = soup.find(name='div', attrs={'id':'innerpage-content'})
-        #td = heading.findParent(name='td')
-        #td.extract()
-
-        soup = BeautifulSoup('<html><head><title>t</title></head><body></body></html>')
-        body = soup.find(name='body')
-        body.insert(0, story)
-        return soup
+    #def preprocess_html(self, soup):
+        #story = soup.find(name='div', attrs={'id':'innerpage-content'})
+        ##td = heading.findParent(name='td')
+        ##td.extract()
+
+        #soup = BeautifulSoup('<html><head><title>t</title></head><body></body></html>')
+        #body = soup.find(name='body')
+        #body.insert(0, story)
+        #return soup

-    def postprocess_html(self, soup, first):
-        if not first:
-            h = soup.find(attrs={'class':'innerpage-header'})
-            if h: h.extract()
-            auth = soup.find(attrs={'class':'author'})
-            if auth: auth.extract()
-        return soup
+    #def postprocess_html(self, soup, first):
+        #if not first:
+            #h = soup.find(attrs={'class':'innerpage-header'})
+            #if h: h.extract()
+            #auth = soup.find(attrs={'class':'author'})
+            #if auth: auth.extract()
+        #return soup
@ -0,0 +1,63 @@
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:fdm=marker:ai
from calibre.web.feeds.news import BasicNewsRecipe
class PoradniaPWN(BasicNewsRecipe):
title = u'Poradnia Językowa PWN'
__author__ = 'fenuks'
description = u'Internetowa poradnia językowa Wydawnictwa Naukowego PWN. Poradnię prowadzi Redaktor Naczelny Słowników Języka Polskiego, prof. Mirosław Bańko. Pomagają mu eksperci - znani polscy językoznawcy. Współpracuje z nami m.in. prof. Jerzy Bralczyk oraz dr Jan Grzenia.'
category = 'language'
language = 'pl'
#cover_url = ''
oldest_article = 14
max_articles_per_feed = 100000
INDEX = "http://poradnia.pwn.pl/"
no_stylesheets = True
remove_attributes = ['style']
remove_javascript = True
use_embedded_content = False
#preprocess_regexps = [(re.compile('<li|ul', re.IGNORECASE), lambda m: '<div'),(re.compile('</li>', re.IGNORECASE), lambda m: '</div>'), (re.compile('</ul>', re.IGNORECASE), lambda m: '</div>')]
keep_only_tags = [dict(name="div", attrs={"class":"searchhi"})]
feeds = [(u'Poradnia', u'http://rss.pwn.pl/poradnia.rss')]
'''def find_articles(self, url):
articles = []
soup=self.index_to_soup(url)
counter = int(soup.find(name='p', attrs={'class':'count'}).findAll('b')[-1].string)
counter = 500
pos = 0
next = url
while next:
soup=self.index_to_soup(next)
tag=soup.find(id="listapytan")
art=tag.findAll(name='li')
for i in art:
if i.h4:
title=i.h4.a.string
url=self.INDEX+i.h4.a['href']
#date=soup.find(id='footer').ul.li.string[41:-1]
articles.append({'title' : title,
'url' : url,
'date' : '',
'description' : ''
})
pos += 10
if not pos >=counter:
next = 'http://poradnia.pwn.pl/lista.php?kat=18&od=' + str(pos)
print u'Tworzenie listy artykułów dla', next
else:
next = None
print articles
return articles
def parse_index(self):
feeds = []
feeds.append((u"Poradnia", self.find_articles('http://poradnia.pwn.pl/lista.php')))
return feeds'''
def preprocess_html(self, soup):
for i in soup.findAll(name=['ul', 'li']):
i.name="div"
for z in soup.findAll(name='a'):
if not z['href'].startswith('http'):
z['href'] = 'http://poradnia.pwn.pl/' + z['href']
return soup
@@ -1,12 +1,13 @@
+# -*- coding: utf-8 -*-
 from calibre.web.feeds.news import BasicNewsRecipe

-class BasicUserRecipe1324913680(BasicNewsRecipe):
+class AdvancedUserRecipe1355341662(BasicNewsRecipe):
     title = u'Sivil Dusunce'
     language = 'tr'
     __author__ = 'asalet_r'

     oldest_article = 7
-    max_articles_per_feed = 20
+    max_articles_per_feed = 50
     auto_cleanup = True

-    feeds = [(u'Sivil Dusunce', u'http://www.sivildusunce.com/feed/')]
+    feeds = [(u'Sivil Dusunce', u'http://www.sivildusunce.com/?t=rss&xml=1')]
@@ -8,19 +8,19 @@ Fetch sueddeutsche.de

 from calibre.web.feeds.news import BasicNewsRecipe

 class Sueddeutsche(BasicNewsRecipe):
-    title = u'Süddeutsche.de' # 2012-01-26 AGe Correct Title
-    description = 'News from Germany, Access to online content' # 2012-01-26 AGe
-    __author__ = 'Oliver Niesner and Armin Geller' #Update AGe 2012-01-26
-    publisher = u'Süddeutsche Zeitung' # 2012-01-26 AGe add
-    category = 'news, politics, Germany' # 2012-01-26 AGe add
-    timefmt = ' [%a, %d %b %Y]' # 2012-01-26 AGe add %a
+    title = u'Süddeutsche.de'
+    description = 'News from Germany, Access to online content'
+    __author__ = 'Oliver Niesner and Armin Geller' #Update AGe 2012-12-05
+    publisher = u'Süddeutsche Zeitung'
+    category = 'news, politics, Germany'
+    timefmt = ' [%a, %d %b %Y]'
     oldest_article = 7
     max_articles_per_feed = 100
     language = 'de'
     encoding = 'utf-8'
-    publication_type = 'newspaper' # 2012-01-26 add
+    publication_type = 'newspaper'
     cover_source = 'http://www.sueddeutsche.de/verlag' # 2012-01-26 AGe add from Darko Miletic paid content source
-    masthead_url = 'http://www.sueddeutsche.de/static_assets/build/img/sdesiteheader/logo_homepage.441d531c.png' # 2012-01-26 AGe add
+    masthead_url = 'http://www.sueddeutsche.de/static_assets/img/sdesiteheader/logo_standard.a152b0df.png' # 2012-12-05 AGe add
     use_embedded_content = False
     no_stylesheets = True
@@ -40,9 +40,9 @@ class Sueddeutsche(BasicNewsRecipe):
     (u'Sport', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5ESport%24?output=rss'),
     (u'Leben', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5ELeben%24?output=rss'),
     (u'Karriere', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EKarriere%24?output=rss'),
-    (u'Bildung', u'http://rss.sueddeutsche.de/rss/bildung'), #2012-01-26 AGe New
-    (u'Gesundheit', u'http://rss.sueddeutsche.de/rss/gesundheit'), #2012-01-26 AGe New
-    (u'Stil', u'http://rss.sueddeutsche.de/rss/stil'), #2012-01-26 AGe New
+    (u'Bildung', u'http://rss.sueddeutsche.de/rss/bildung'),
+    (u'Gesundheit', u'http://rss.sueddeutsche.de/rss/gesundheit'),
+    (u'Stil', u'http://rss.sueddeutsche.de/rss/stil'),
     (u'München & Region', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EMünchen&Region%24?output=rss'),
     (u'Bayern', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EBayern%24?output=rss'),
     (u'Medien', u'http://suche.sueddeutsche.de/query/%23/sort/-docdatetime/drilldown/%C2%A7ressort%3A%5EMedien%24?output=rss'),
recipes/titanic_de.recipe (new file)
@@ -0,0 +1,20 @@
from calibre.web.feeds.news import BasicNewsRecipe

class Titanic(BasicNewsRecipe):
    title = u'Titanic'
    language = 'de'
    __author__ = 'Krittika Goyal'
    oldest_article = 14 #days
    max_articles_per_feed = 25
    #encoding = 'cp1252'

    use_embedded_content = False

    no_stylesheets = True
    auto_cleanup = True

    feeds = [
        ('News',
         'http://www.titanic-magazin.de/ich.war.bei.der.waffen.rss'),
    ]
recipes/tvp_info.recipe (new file)
@@ -0,0 +1,20 @@
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:fdm=marker:ai
from calibre.web.feeds.news import BasicNewsRecipe

class TVPINFO(BasicNewsRecipe):
    title = u'TVP.INFO'
    __author__ = 'fenuks'
    description = u'Serwis informacyjny TVP.INFO'
    category = 'news'
    language = 'pl'
    cover_url = 'http://s.v3.tvp.pl/files/tvp-info/gfx/logo.png'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    remove_empty_feeds = True
    remove_javascript = True
    use_embedded_content = False
    ignore_duplicate_articles = {'title', 'url'}
    keep_only_tags = [dict(id='contentNews')]
    remove_tags = [dict(attrs={'class':['toolbox', 'modulBox read', 'modulBox social', 'videoPlayerBox']}), dict(id='belka')]
    feeds = [(u'Wiadomo\u015bci', u'http://tvp.info/informacje?xslt=tvp-info/news/rss.xslt&src_id=191865'),
             (u'\u015awiat', u'http://tvp.info/informacje/swiat?xslt=tvp-info/news/rss.xslt&src_id=191867'), (u'Biznes', u'http://tvp.info/informacje/biznes?xslt=tvp-info/news/rss.xslt&src_id=191868'), (u'Nauka', u'http://tvp.info/informacje/nauka?xslt=tvp-info/news/rss.xslt&src_id=191870'), (u'Kultura', u'http://tvp.info/informacje/kultura?xslt=tvp-info/news/rss.xslt&src_id=191869'), (u'Rozmaito\u015bci', u'http://tvp.info/informacje/rozmaitosci?xslt=tvp-info/news/rss.xslt&src_id=191872'), (u'Opinie', u'http://tvp.info/opinie?xslt=tvp-info/news/rss.xslt&src_id=191875'), (u'Komentarze', u'http://tvp.info/opinie/komentarze?xslt=tvp-info/news/rss.xslt&src_id=238200'), (u'Wywiady', u'http://tvp.info/opinie/wywiady?xslt=tvp-info/news/rss.xslt&src_id=236644')]
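For context (my note, not the recipe's): ignore_duplicate_articles = {'title', 'url'} tells calibre to drop an article when either its title or its URL repeats across the fetched feeds, which matters here because the same story often appears in several TVP.INFO sections.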
@@ -0,0 +1,13 @@
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1356283265(BasicNewsRecipe):
    title = u'\u0423\u043a\u0440\u0430\u0457\u043d\u0441\u044c\u043a\u0438\u0439 \u0422\u0438\u0436\u0434\u0435\u043d\u044c'
    __author__ = 'rpalyvoda'
    oldest_article = 7
    max_articles_per_feed = 100
    language = 'uk'
    cover_url = 'http://tyzhden.ua/Images/Style1/tyzhden.ua-logo2.gif'
    masthead_url = 'http://tyzhden.ua/Images/Style1/tyzhden.ua-logo2.gif'
    auto_cleanup = True
    feeds = [(u'\u041d\u043e\u0432\u0438\u043d\u0438', u'http://tyzhden.ua/RSS/News/'), (u'\u041e\u0440\u0438\u0433\u0456\u043d\u0430\u043b\u044c\u043d\u0456 \u043d\u043e\u0432\u0438\u043d\u0438', u'http://tyzhden.ua/RSS/News.Original/'), (u'\u041f\u0443\u0431\u043b\u0456\u043a\u0430\u0446\u0456\u0457', u'http://tyzhden.ua/RSS/Publications/')]
@@ -2,8 +2,8 @@
 __license__ = 'GPL v3'
 __copyright__ = '4 February 2011, desUBIKado'
 __author__ = 'desUBIKado'
-__version__ = 'v0.08'
-__date__ = '30, June 2012'
+__version__ = 'v0.09'
+__date__ = '02, December 2012'
 '''
 http://www.weblogssl.com/
 '''
@@ -37,6 +37,7 @@ class weblogssl(BasicNewsRecipe):
     ,(u'Xataka Mexico', u'http://feeds.weblogssl.com/xatakamx')
     ,(u'Xataka M\xf3vil', u'http://feeds.weblogssl.com/xatakamovil')
     ,(u'Xataka Android', u'http://feeds.weblogssl.com/xatakandroid')
+    ,(u'Xataka Windows', u'http://feeds.weblogssl.com/xatakawindows')
     ,(u'Xataka Foto', u'http://feeds.weblogssl.com/xatakafoto')
     ,(u'Xataka ON', u'http://feeds.weblogssl.com/xatakaon')
     ,(u'Xataka Ciencia', u'http://feeds.weblogssl.com/xatakaciencia')
@@ -80,19 +81,31 @@ class weblogssl(BasicNewsRecipe):
     keep_only_tags = [dict(name='div', attrs={'id':'infoblock'}),
                       dict(name='div', attrs={'class':'post'}),
-                      dict(name='div', attrs={'id':'blog-comments'})
+                      dict(name='div', attrs={'id':'blog-comments'}),
+                      dict(name='div', attrs={'class':'container'}) #m.xataka.com
                      ]

-    remove_tags = [dict(name='div', attrs={'id':'comment-nav'})]
+    remove_tags = [dict(name='div', attrs={'id':'comment-nav'}),
+                   dict(name='menu', attrs={'class':'social-sharing'}), #m.xataka.com
+                   dict(name='section' , attrs={'class':'comments'}), #m.xataka.com
+                   dict(name='div' , attrs={'class':'article-comments'}), #m.xataka.com
+                   dict(name='nav' , attrs={'class':'article-taxonomy'}) #m.xataka.com
+                  ]
+
+    remove_tags_after = dict(name='section' , attrs={'class':'comments'})

     def print_version(self, url):
         return url.replace('http://www.', 'http://m.')

     preprocess_regexps = [
        # To put a blank line between one comment and the next
-        (re.compile(r'<li id="c', re.DOTALL|re.IGNORECASE), lambda match: '<br><br><li id="c')
+        (re.compile(r'<li id="c', re.DOTALL|re.IGNORECASE), lambda match: '<br><br><li id="c'),
+       # To make the images in m.xataka.com stories visible
+        (re.compile(r'<noscript>', re.DOTALL|re.IGNORECASE), lambda m: ''),
+        (re.compile(r'</noscript>', re.DOTALL|re.IGNORECASE), lambda m: '')
     ]

     # To replace the embedded YouTube video with an image
     def preprocess_html(self, soup):
@@ -108,14 +121,16 @@ class weblogssl(BasicNewsRecipe):
     # To recover the original article url from the "feedsportal" one
     # The following code is thanks to user "bosplans" of www.mobileread.com
-    # http://www.mobileread.com/forums/sho...d.php?t=130297
+    # http://www.mobileread.com/forums/showthread.php?t=130297

     def get_article_url(self, article):
         link = article.get('link', None)
         if link is None:
             return article
+#        if link.split('/')[-4]=="xataka2":
+#            return article.get('feedburner_origlink', article.get('link', article.get('guid')))
         if link.split('/')[-4]=="xataka2":
-            return article.get('feedburner_origlink', article.get('link', article.get('guid')))
+            return article.get('guid', None)
         if link.split('/')[-1]=="story01.htm":
             link=link.split('/')[-2]
             a=['0B','0C','0D','0E','0F','0G','0N' ,'0L0S','0A']
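The hunk ends mid-function, just after the token list. For orientation (my sketch, not the recipe's actual table, which the diff cuts off): feedsportal URLs encode characters like '/' and '.' as two-character tokens, and the recipe rebuilds the real URL by replacing each token with its counterpart:

```python
# Illustrative only; this pairing is a hypothetical example of the
# technique, not the recipe's real mapping.
def decode_feedsportal(slug):
    for token, repl in [('0B', '.'), ('0C', '/'), ('0N', '.com')]:
        slug = slug.replace(token, repl)
    return slug
```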
@@ -9,15 +9,15 @@ class Zaman (BasicNewsRecipe):
     __author__ = u'thomass'
     oldest_article = 2
     max_articles_per_feed =50
-    # no_stylesheets = True
+    no_stylesheets = True
     #delay = 1
-    #use_embedded_content = False
+    use_embedded_content = False
-    encoding = 'ISO 8859-9'
+    encoding = 'utf-8'
-    publisher = 'Zaman'
+    publisher = 'Feza Gazetecilik'
     category = 'news, haberler,TR,gazete'
     language = 'tr'
     publication_type = 'newspaper '
-    extra_css = '.buyukbaslik{font-weight: bold; font-size: 18px;color:#0000FF}'#body{ font-family: Verdana,Helvetica,Arial,sans-serif } .introduction{font-weight: bold} .story-feature{display: block; padding: 0; border: 1px solid; width: 40%; font-size: small} .story-feature h2{text-align: center; text-transform: uppercase} '
+    extra_css = 'h1{text-transform: capitalize; font-weight: bold; font-size: 22px;color:#0000FF} p{text-align:justify} ' #.introduction{font-weight: bold} .story-feature{display: block; padding: 0; border: 1px solid; width: 40%; font-size: small} .story-feature h2{text-align: center; text-transform: uppercase} '
     conversion_options = {
         'tags' : category
         ,'language' : language
@@ -26,25 +26,26 @@ class Zaman (BasicNewsRecipe):
     }
     cover_img_url = 'https://fbcdn-profile-a.akamaihd.net/hprofile-ak-snc4/188140_81722291869_2111820_n.jpg'
     masthead_url = 'http://medya.zaman.com.tr/extentions/zaman.com.tr/img/section/logo-section.png'
+    ignore_duplicate_articles = { 'title', 'url' }
+    auto_cleanup = False
+    remove_empty_feeds= True

-    #keep_only_tags = [dict(name='div', attrs={'id':[ 'news-detail-content']}), dict(name='td', attrs={'class':['columnist-detail','columnist_head']}) ]
+    #keep_only_tags = [dict(name='div', attrs={'id':[ 'contentposition19']})]#,dict(name='div', attrs={'id':[ 'xxx']}),dict(name='div', attrs={'id':[ 'xxx']}),dict(name='div', attrs={'id':[ 'xxx']}),dict(name='div', attrs={'id':[ 'xxx']}),dict(name='div', attrs={'id':[ 'xxx']}),dict(name='div', attrs={'id':[ 'xxx']}),dict(name='div', attrs={'id':[ 'news-detail-content']}), dict(name='td', attrs={'class':['columnist-detail','columnist_head']}), ]
-    remove_tags = [ dict(name='img', attrs={'src':['http://medya.zaman.com.tr/zamantryeni/pics/zamanonline.gif']})]#,dict(name='div', attrs={'class':['radioEmbedBg','radyoProgramAdi']}),dict(name='a', attrs={'class':['webkit-html-attribute-value webkit-html-external-link']}),dict(name='table', attrs={'id':['yaziYorumTablosu']}),dict(name='img', attrs={'src':['http://medya.zaman.com.tr/pics/paylas.gif','http://medya.zaman.com.tr/extentions/zaman.com.tr/img/columnist/ma-16.png']})
+    remove_tags = [ dict(name='img', attrs={'src':['http://cmsmedya.zaman.com.tr/images/logo/logo.bmp']}),dict(name='hr', attrs={'class':['interactive-hr']})]# remove_tags = [ dict(name='div', attrs={'class':[ 'detayUyari']}),dict(name='div', attrs={'class':[ 'detayYorum']}),dict(name='div', attrs={'class':[ 'addthis_toolbox addthis_default_style ']}),dict(name='div', attrs={'id':[ 'tumYazi']})]#,dict(name='div', attrs={'id':[ 'xxx']}),dict(name='div', attrs={'id':[ 'xxx']}),dict(name='div', attrs={'id':[ 'xxx']}),dict(name='div', attrs={'id':[ 'xxx']}),dict(name='div', attrs={'id':[ 'xxx']}),dict(name='div', attrs={'id':[ 'xxx']}),dict(name='img', attrs={'src':['http://medya.zaman.com.tr/zamantryeni/pics/zamanonline.gif']}),dict(name='div', attrs={'class':['radioEmbedBg','radyoProgramAdi']}),dict(name='a', attrs={'class':['webkit-html-attribute-value webkit-html-external-link']}),dict(name='table', attrs={'id':['yaziYorumTablosu']}),dict(name='img', attrs={'src':['http://medya.zaman.com.tr/pics/paylas.gif','http://medya.zaman.com.tr/extentions/zaman.com.tr/img/columnist/ma-16.png']}),dict(name='div', attrs={'id':[ 'news-detail-gallery']}),dict(name='div', attrs={'id':[ 'news-detail-title-bottom-part']}),dict(name='div', attrs={'id':[ 'news-detail-news-paging-main']})]#
     #remove_attributes = ['width','height']
     remove_empty_feeds= True

     feeds = [
-        ( u'Anasayfa', u'http://www.zaman.com.tr/anasayfa.rss'),
-        ( u'Son Dakika', u'http://www.zaman.com.tr/sondakika.rss'),
-        #( u'En çok Okunanlar', u'http://www.zaman.com.tr/max_all.rss'),
-        #( u'Manşet', u'http://www.zaman.com.tr/manset.rss'),
-        ( u'Gündem', u'http://www.zaman.com.tr/gundem.rss'),
+        ( u'Manşet', u'http://www.zaman.com.tr/manset.rss'),
         ( u'Yazarlar', u'http://www.zaman.com.tr/yazarlar.rss'),
         ( u'Politika', u'http://www.zaman.com.tr/politika.rss'),
         ( u'Ekonomi', u'http://www.zaman.com.tr/ekonomi.rss'),
         ( u'Dış Haberler', u'http://www.zaman.com.tr/dishaberler.rss'),
+        ( u'Son Dakika', u'http://www.zaman.com.tr/sondakika.rss'),
+        ( u'Gündem', u'http://www.zaman.com.tr/gundem.rss'),
         ( u'Yorumlar', u'http://www.zaman.com.tr/yorumlar.rss'),
         ( u'Röportaj', u'http://www.zaman.com.tr/roportaj.rss'),
         ( u'Dizi Yazı', u'http://www.zaman.com.tr/dizi.rss'),
@@ -59,8 +60,9 @@ class Zaman (BasicNewsRecipe):
         ( u'Cuma Eki', u'http://www.zaman.com.tr/cuma.rss'),
         ( u'Cumaertesi Eki', u'http://www.zaman.com.tr/cumaertesi.rss'),
         ( u'Pazar Eki', u'http://www.zaman.com.tr/pazar.rss'),
+        ( u'En çok Okunanlar', u'http://www.zaman.com.tr/max_all.rss'),
+        ( u'Anasayfa', u'http://www.zaman.com.tr/anasayfa.rss'),
     ]

     def print_version(self, url):
-        return url.replace('http://www.zaman.com.tr/haber.do?haberno=', 'http://www.zaman.com.tr/yazdir.do?haberno=')
+        return url.replace('http://www.zaman.com.tr/newsDetail_getNewsById.action?newsId=', 'http://www.zaman.com.tr/newsDetail_openPrintPage.action?newsId=')
@@ -0,0 +1,16 @@
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:fdm=marker:ai
from calibre.web.feeds.news import BasicNewsRecipe

class ZTS(BasicNewsRecipe):
    title = u'Zaufana Trzecia Strona'
    __author__ = 'fenuks'
    description = u'Niezależne źródło wiadomości o świecie bezpieczeństwa IT'
    category = 'IT, security'
    language = 'pl'
    cover_url = 'http://www.zaufanatrzeciastrona.pl/wp-content/uploads/2012/08/z3s_h100.png'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    remove_empty_feeds = True
    keep_only_tags = [dict(name='div', attrs={'class':'post postcontent'})]
    remove_tags = [dict(name='div', attrs={'class':'dolna-ramka'})]
    feeds = [(u'Strona g\u0142\xf3wna', u'http://feeds.feedburner.com/ZaufanaTrzeciaStronaGlowna'), (u'Drobiazgi', u'http://feeds.feedburner.com/ZaufanaTrzeciaStronaDrobiazgi')]
recipes/zaxid_net.recipe (new file)
@@ -0,0 +1,13 @@
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1356281741(BasicNewsRecipe):
    title = u'Zaxid.net'
    __author__ = 'rpalyvoda'
    oldest_article = 7
    max_articles_per_feed = 100
    language = 'uk'
    cover_url = 'http://upload.wikimedia.org/wikipedia/uk/b/bc/Zaxid-net.jpg'
    masthead_url = 'http://upload.wikimedia.org/wikipedia/uk/b/bc/Zaxid-net.jpg'
    auto_cleanup = True
    feeds = [(u'\u0422\u043e\u043f \u043d\u043e\u0432\u0438\u043d\u0438', u'http://feeds.feedburner.com/zaxid/topNews'), (u'\u0421\u0442\u0440\u0456\u0447\u043a\u0430 \u043d\u043e\u0432\u0438\u043d', u'http://feeds.feedburner.com/zaxid/AllNews'), (u'\u041d\u043e\u0432\u0438\u043d\u0438 \u041b\u044c\u0432\u043e\u0432\u0430', u'http://feeds.feedburner.com/zaxid/Lviv'), (u'\u041d\u043e\u0432\u0438\u043d\u0438 \u0423\u043a\u0440\u0430\u0457\u043d\u0438', u'http://feeds.feedburner.com/zaxid/Ukraine'), (u'\u041d\u043e\u0432\u0438\u043d\u0438 \u0441\u0432\u0456\u0442\u0443', u'http://feeds.feedburner.com/zaxid/World'), (u'\u041d\u043e\u0432\u0438\u043d\u0438 - \u0420\u0430\u0434\u0456\u043e 24', u'\u0420\u0430\u0434\u0456\u043e 24'), (u'\u0411\u043b\u043e\u0433\u0438', u'http://feeds.feedburner.com/zaxid/Blogs'), (u"\u041f\u0443\u0431\u043b\u0456\u043a\u0430\u0446\u0456\u0457 - \u0406\u043d\u0442\u0435\u0440\u0432'\u044e", u'http://feeds.feedburner.com/zaxid/Interview'), (u'\u041f\u0443\u0431\u043b\u0456\u043a\u0430\u0446\u0456\u0457 - \u0421\u0442\u0430\u0442\u0442\u0456', u'http://feeds.feedburner.com/zaxid/Articles'), (u'\u0410\u0444\u0456\u0448\u0430', u'http://zaxid.net/rss/subcategory/140.xml'), (u'\u0413\u0430\u043b\u0438\u0447\u0438\u043d\u0430', u'http://feeds.feedburner.com/zaxid/Galicia'), (u'\u041a\u0443\u043b\u044c\u0442\u0443\u0440\u0430.NET', u'http://feeds.feedburner.com/zaxid/KulturaNET'), (u"\u043d\u0435\u0412\u0456\u0434\u043e\u043c\u0456 \u043b\u044c\u0432\u0456\u0432'\u044f\u043d\u0438", u'http://feeds.feedburner.com/zaxid/UnknownLviv'), (u'\u041b\u0435\u043e\u043f\u043e\u043b\u0456\u0441 MULTIPLEX', u'http://feeds.feedburner.com/zaxid/LeopolisMULTIPLEX'), (u'\u0411\u0438\u0442\u0432\u0430 \u0437\u0430 \u043c\u043e\u0432\u0443', u'http://zaxid.net/rss/subcategory/138.xml'), (u'\u0422\u0440\u0430\u043d\u0441\u043f\u043e\u0440\u0442\u043d\u0430 \u0441\u0445\u0435\u043c\u0430 \u041b\u044c\u0432\u043e\u0432\u0430', u'http://zaxid.net/rss/subcategory/132.xml'), (u'\u0414\u0435\u043c\u0456\u0444\u043e\u043b\u043e\u0433\u0456\u0437\u0430\u0446\u0456\u044f', u'http://zaxid.net/rss/subcategory/130.xml'), (u"\u041c\u0438 \u043f\u0430\u043c'\u044f\u0442\u0430\u0454\u043c\u043e", u'http://feeds.feedburner.com/zaxid/WeRemember'), (u'20 \u0440\u043e\u043a\u0456\u0432 \u041d\u0435\u0437\u0430\u043b\u0435\u0436\u043d\u043e\u0441\u0442\u0456', u'http://zaxid.net/rss/subcategory/129.xml'), (u'\u041f\u0440\u0430\u0432\u043e \u043d\u0430 \u0434\u0438\u0442\u0438\u043d\u0441\u0442\u0432\u043e', u'http://feeds.feedburner.com/zaxid/Childhood'), (u'\u0410\u043d\u043e\u043d\u0441\u0438', u'http://feeds.feedburner.com/zaxid/Announcements')]
Binary file not shown.
@@ -81,6 +81,7 @@ body {
     background-color: #39a9cf;
     -moz-border-radius: 5px;
     -webkit-border-radius: 5px;
+    border-radius: 5px;
     text-shadow: #27211b 1px 1px 1px;
     -moz-box-shadow: 5px 5px 5px #222;
     -webkit-box-shadow: 5px 5px 5px #222;
Binary image changed (was 17 KiB, now 62 KiB); not shown.
@@ -12,6 +12,7 @@ let g:syntastic_cpp_include_dirs = [
     \'/usr/include/fontconfig',
     \'src/qtcurve/common', 'src/qtcurve',
     \'src/unrar',
+    \'src/qt-harfbuzz/src',
     \'/usr/include/ImageMagick',
     \]
 let g:syntastic_c_include_dirs = g:syntastic_cpp_include_dirs
@@ -6,12 +6,13 @@ __license__ = 'GPL v3'
 __copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
 __docformat__ = 'restructuredtext en'

-import os, socket, struct, subprocess, sys, glob
+import os, socket, struct, subprocess, glob
 from distutils.spawn import find_executable
 from PyQt4 import pyqtconfig

 from setup import isosx, iswindows, islinux, is64bit
+is64bit

 OSX_SDK = '/Developer/SDKs/MacOSX10.5.sdk'
@@ -81,6 +82,7 @@ def consolidate(envvar, default):
 pyqt = pyqtconfig.Configuration()

 qt_inc = pyqt.qt_inc_dir
+qt_private_inc = []
 qt_lib = pyqt.qt_lib_dir
 ft_lib_dirs = []
 ft_libs = []
@@ -140,6 +142,8 @@ elif isosx:
     png_libs = ['png12']
     ft_libs = ['freetype']
     ft_inc_dirs = ['/sw/include/freetype2']
+    bq = glob.glob('/sw/build/qt-*/include')[-1]
+    qt_private_inc = ['%s/%s'%(bq, m) for m in ('QtGui', 'QtCore')]
 else:
     # Include directories
     png_inc_dirs = pkgconfig_include_dirs('libpng', 'PNG_INC_DIR',
@@ -102,7 +102,8 @@ class Check(Command):
                     errors = True
             if errors:
                 cPickle.dump(cache, open(self.CACHE, 'wb'), -1)
-                subprocess.call(['gvim', '-f', f])
+                subprocess.call(['gvim', '-S',
+                    self.j(self.SRC, '../session.vim'), '-f', f])
                 raise SystemExit(1)
             cache[f] = mtime
         for x in builtins:
@@ -18,7 +18,7 @@ from setup.build_environment import (chmlib_inc_dirs,
     msvc, MT, win_inc, win_lib, win_ddk, magick_inc_dirs, magick_lib_dirs,
     magick_libs, chmlib_lib_dirs, sqlite_inc_dirs, icu_inc_dirs,
     icu_lib_dirs, win_ddk_lib_dirs, ft_libs, ft_lib_dirs, ft_inc_dirs,
-    zlib_libs, zlib_lib_dirs, zlib_inc_dirs, is64bit)
+    zlib_libs, zlib_lib_dirs, zlib_inc_dirs, is64bit, qt_private_inc)
 MT
 isunix = islinux or isosx or isbsd
@@ -183,6 +183,13 @@ extensions = [
         sip_files = ['calibre/gui2/progress_indicator/QProgressIndicator.sip']
         ),
+    Extension('qt_hack',
+        ['calibre/ebooks/pdf/render/qt_hack.cpp'],
+        inc_dirs = qt_private_inc + ['calibre/ebooks/pdf/render', 'qt-harfbuzz/src'],
+        headers = ['calibre/ebooks/pdf/render/qt_hack.h'],
+        sip_files = ['calibre/ebooks/pdf/render/qt_hack.sip']
+        ),
     Extension('unrar',
         ['unrar/%s.cpp'%(x.partition('.')[0]) for x in '''
         rar.o strlist.o strfn.o pathfn.o savepos.o smallfn.o global.o file.o
@@ -545,6 +552,9 @@ class Build(Command):
         VERSION = 1.0.0
         CONFIG += %s
         ''')%(ext.name, ' '.join(ext.headers), ' '.join(ext.sources), archs)
+        if ext.inc_dirs:
+            idir = ' '.join(ext.inc_dirs)
+            pro += 'INCLUDEPATH = %s\n'%idir
         pro = pro.replace('\\', '\\\\')
         open(ext.name+'.pro', 'wb').write(pro)
         qmc = [QMAKE, '-o', 'Makefile']
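
For illustration only (not part of the diff): the new INCLUDEPATH handling simply joins an extension's include directories into one qmake variable. A minimal sketch, reusing the qt_hack values from the hunk above; the .pro preamble here is hypothetical:

    # Sketch of the INCLUDEPATH emission added to Build above.
    inc_dirs = ['calibre/ebooks/pdf/render', 'qt-harfbuzz/src']
    pro = 'TARGET = qt_hack\n'  # hypothetical preamble
    if inc_dirs:
        pro += 'INCLUDEPATH = %s\n' % ' '.join(inc_dirs)
    print(pro)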


@@ -39,18 +39,6 @@ class Win32(WinBase):
     def msi64(self):
         return installer_name('msi', is64bit=True)
-    def sign_msi(self):
-        import xattr
-        print ('Signing installers ...')
-        sign64 = False
-        msi64 = self.msi64
-        if os.path.exists(msi64) and 'user.signed' not in xattr.list(msi64):
-            subprocess.check_call(['scp', msi64, self.VM_NAME +
-                ':build/%s/%s'%(__appname__, msi64)])
-            sign64 = True
-        subprocess.check_call(['ssh', self.VM_NAME, '~/sign.sh'], shell=False)
-        return sign64
     def do_dl(self, installer, errmsg):
         subprocess.check_call(('scp',
             '%s:build/%s/%s'%(self.VM_NAME, __appname__, installer), 'dist'))
@@ -62,14 +50,8 @@ class Win32(WinBase):
         installer = self.installer()
         if os.path.exists('build/winfrozen'):
             shutil.rmtree('build/winfrozen')
-        sign64 = self.sign_msi()
-        if sign64:
-            self.do_dl(self.msi64, 'Failed to d/l signed 64 bit installer')
-            import xattr
-            xattr.set(self.msi64, 'user.signed', 'true')
         self.do_dl(installer, 'Failed to freeze')
         installer = 'dist/%s-portable-installer-%s.exe'%(__appname__, __version__)
         self.do_dl(installer, 'Failed to get portable installer')


@@ -91,6 +91,7 @@ class Win32Freeze(Command, WixMixIn):
         if not is64bit:
             self.build_portable()
             self.build_portable_installer()
+        self.sign_installers()
     def remove_CRT_from_manifests(self):
         '''
@@ -101,7 +102,8 @@ class Win32Freeze(Command, WixMixIn):
         repl_pat = re.compile(
             r'(?is)<dependency>.*?Microsoft\.VC\d+\.CRT.*?</dependency>')
-        for dll in glob.glob(self.j(self.dll_dir, '*.dll')):
+        for dll in (glob.glob(self.j(self.dll_dir, '*.dll')) +
+                glob.glob(self.j(self.plugins_dir, '*.pyd'))):
             bn = self.b(dll)
             with open(dll, 'rb') as f:
                 raw = f.read()
@@ -488,6 +490,17 @@ class Win32Freeze(Command, WixMixIn):
         subprocess.check_call([LZMA + r'\bin\elzma.exe', '-9', '--lzip', name])
+    def sign_installers(self):
+        self.info('Signing installers...')
+        files = glob.glob(self.j('dist', '*.msi')) + glob.glob(self.j('dist',
+            '*.exe'))
+        if not files:
+            raise ValueError('No installers found')
+        subprocess.check_call(['signtool.exe', 'sign', '/a', '/d',
+            'calibre - E-book management', '/du',
+            'http://calibre-ebook.com', '/t',
+            'http://timestamp.verisign.com/scripts/timstamp.dll'] + files)
     def add_dir_to_zip(self, zf, path, prefix=''):
         '''
         Add a directory recursively to the zip file with an optional prefix.
@@ -586,6 +599,10 @@ class Win32Freeze(Command, WixMixIn):
                 # from files
                 'unrar.pyd', 'wpd.pyd', 'podofo.pyd',
                 'progress_indicator.pyd',
+                # As per this https://bugs.launchpad.net/bugs/1087816
+                # on some systems magick.pyd fails to load from memory
+                # on 64 bit
+                'magick.pyd',
                 }:
             self.add_to_zipfile(zf, pyd, x)
             os.remove(self.j(x, pyd))
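
A possible follow-up check, assumed rather than taken from this commit: signtool's verify mode can confirm the signatures that sign_installers() just applied. An illustrative sketch in the same style:

    # Hypothetical verification pass over the same installer set.
    import glob, subprocess
    for installer in glob.glob('dist/*.msi') + glob.glob('dist/*.exe'):
        # /pa selects the default Authenticode verification policy
        subprocess.check_call(['signtool.exe', 'verify', '/pa', installer])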

File diff suppressed because it is too large


@@ -9,14 +9,14 @@ msgstr ""
 "Project-Id-Version: calibre\n"
 "Report-Msgid-Bugs-To: FULL NAME <EMAIL@ADDRESS>\n"
 "POT-Creation-Date: 2011-11-25 14:01+0000\n"
-"PO-Revision-Date: 2012-08-15 10:30+0000\n"
-"Last-Translator: Jellby <Unknown>\n"
+"PO-Revision-Date: 2012-12-24 08:05+0000\n"
+"Last-Translator: Adolfo Jayme Barrientos <fitoschido@gmail.com>\n"
 "Language-Team: Español; Castellano <>\n"
 "MIME-Version: 1.0\n"
 "Content-Type: text/plain; charset=UTF-8\n"
 "Content-Transfer-Encoding: 8bit\n"
-"X-Launchpad-Export-Date: 2012-08-16 04:40+0000\n"
-"X-Generator: Launchpad (build 15810)\n"
+"X-Launchpad-Export-Date: 2012-12-25 04:46+0000\n"
+"X-Generator: Launchpad (build 16378)\n"

 #. name for aaa
 msgid "Ghotuo"
@@ -9584,27 +9584,27 @@ msgstr "Holikachuk"

 #. name for hoj
 msgid "Hadothi"
-msgstr ""
+msgstr "Hadothi"

 #. name for hol
 msgid "Holu"
-msgstr ""
+msgstr "Holu"

 #. name for hom
 msgid "Homa"
-msgstr ""
+msgstr "Homa"

 #. name for hoo
 msgid "Holoholo"
-msgstr ""
+msgstr "Holoholo"

 #. name for hop
 msgid "Hopi"
-msgstr ""
+msgstr "Hopi"

 #. name for hor
 msgid "Horo"
-msgstr ""
+msgstr "Horo"

 #. name for hos
 msgid "Ho Chi Minh City Sign Language"
@@ -9612,27 +9612,27 @@ msgstr "Lengua de signos de Ho Chi Minh"

 #. name for hot
 msgid "Hote"
-msgstr ""
+msgstr "Hote"

 #. name for hov
 msgid "Hovongan"
-msgstr ""
+msgstr "Hovongan"

 #. name for how
 msgid "Honi"
-msgstr ""
+msgstr "Honi"

 #. name for hoy
 msgid "Holiya"
-msgstr ""
+msgstr "Holiya"

 #. name for hoz
 msgid "Hozo"
-msgstr ""
+msgstr "Hozo"

 #. name for hpo
 msgid "Hpon"
-msgstr ""
+msgstr "Hpon"

 #. name for hps
 msgid "Hawai'i Pidgin Sign Language"
@@ -9640,15 +9640,15 @@ msgstr "Lengua de signos pidyin hawaiana"

 #. name for hra
 msgid "Hrangkhol"
-msgstr ""
+msgstr "Hrangkhol"

 #. name for hre
 msgid "Hre"
-msgstr ""
+msgstr "Hre"

 #. name for hrk
 msgid "Haruku"
-msgstr ""
+msgstr "Haruku"

 #. name for hrm
 msgid "Miao; Horned"
@@ -9656,19 +9656,19 @@ msgstr ""

 #. name for hro
 msgid "Haroi"
-msgstr ""
+msgstr "Haroi"

 #. name for hrr
 msgid "Horuru"
-msgstr ""
+msgstr "Horuru"

 #. name for hrt
 msgid "Hértevin"
-msgstr ""
+msgstr "Hértevin"

 #. name for hru
 msgid "Hruso"
-msgstr ""
+msgstr "Hruso"

 #. name for hrv
 msgid "Croatian"


@@ -12,14 +12,14 @@ msgstr ""
 "Report-Msgid-Bugs-To: Debian iso-codes team <pkg-isocodes-"
 "devel@lists.alioth.debian.org>\n"
 "POT-Creation-Date: 2011-11-25 14:01+0000\n"
-"PO-Revision-Date: 2011-09-27 15:44+0000\n"
-"Last-Translator: IIDA Yosiaki <iida@gnu.org>\n"
+"PO-Revision-Date: 2012-12-13 13:56+0000\n"
+"Last-Translator: Shushi Kurose <md81bird@hitaki.net>\n"
 "Language-Team: Japanese <translation-team-ja@lists.sourceforge.net>\n"
 "MIME-Version: 1.0\n"
 "Content-Type: text/plain; charset=UTF-8\n"
 "Content-Transfer-Encoding: 8bit\n"
-"X-Launchpad-Export-Date: 2011-11-26 05:21+0000\n"
-"X-Generator: Launchpad (build 14381)\n"
+"X-Launchpad-Export-Date: 2012-12-14 05:34+0000\n"
+"X-Generator: Launchpad (build 16369)\n"
 "Language: ja\n"

 #. name for aaa
@@ -86,12 +86,9 @@ msgstr ""
 msgid "Abnaki; Eastern"
 msgstr ""

-# 以下「国国」は、国立国会図書館のサイト。
-# ジブチ
-# マイペディア「ジブチ」の項に「アファル語」
 #. name for aar
 msgid "Afar"
 msgstr "アファル語"

 #. name for aas
 msgid "Aasáx"

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -4,7 +4,7 @@ __license__ = 'GPL v3'
 __copyright__ = '2008, Kovid Goyal kovid@kovidgoyal.net'
 __docformat__ = 'restructuredtext en'
 __appname__ = u'calibre'
-numeric_version = (0, 9, 8)
+numeric_version = (0, 9, 12)
 __version__ = u'.'.join(map(unicode, numeric_version))
 __author__ = u"Kovid Goyal <kovid@kovidgoyal.net>"
@@ -100,6 +100,7 @@ class Plugins(collections.Mapping):
         'freetype',
         'woff',
         'unrar',
+        'qt_hack',
         ]
     if iswindows:
         plugins.extend(['winutil', 'wpd', 'winfonts'])


@@ -661,7 +661,7 @@ from calibre.devices.nuut2.driver import NUUT2
 from calibre.devices.iriver.driver import IRIVER_STORY
 from calibre.devices.binatone.driver import README
 from calibre.devices.hanvon.driver import (N516, EB511, ALEX, AZBOOKA, THEBOOK,
-        LIBREAIR, ODYSSEY)
+        LIBREAIR, ODYSSEY, KIBANO)
 from calibre.devices.edge.driver import EDGE
 from calibre.devices.teclast.driver import (TECLAST_K3, NEWSMY, IPAPYRUS,
     SOVOS, PICO, SUNSTECH_EB700, ARCHOS7O, STASH, WEXLER)
@@ -712,7 +712,7 @@ plugins += [
     BOOQ,
     EB600,
     README,
-    N516,
+    N516, KIBANO,
     THEBOOK, LIBREAIR,
     EB511,
     ELONEX,


@@ -121,6 +121,8 @@ def debug(ioreg_to_tmp=False, buf=None, plugins=None,
         out('\nDisabled plugins:', textwrap.fill(' '.join([x.__class__.__name__ for x in
             disabled_plugins])))
         out(' ')
+    else:
+        out('\nNo disabled plugins')
     found_dev = False
     for dev in devplugins:
         if not dev.MANAGES_DEVICE_PRESENCE: continue


@@ -10,7 +10,7 @@ import cStringIO
 from calibre.devices.usbms.driver import USBMS
-HTC_BCDS = [0x100, 0x0222, 0x0226, 0x227, 0x228, 0x229, 0x9999]
+HTC_BCDS = [0x100, 0x0222, 0x0226, 0x227, 0x228, 0x229, 0x0231, 0x9999]
 class ANDROID(USBMS):
@@ -48,6 +48,7 @@ class ANDROID(USBMS):
             0x2910 : HTC_BCDS,
             0xe77 : HTC_BCDS,
             0xff9 : HTC_BCDS,
+            0x0001 : [0x255],
             },
         # Eken
@@ -92,7 +93,7 @@ class ANDROID(USBMS):
         # Google
         0x18d1 : {
             0x0001 : [0x0223, 0x230, 0x9999],
-            0x0003 : [0x0230],
+            0x0003 : [0x0230, 0x9999],
             0x4e11 : [0x0100, 0x226, 0x227],
             0x4e12 : [0x0100, 0x226, 0x227],
             0x4e21 : [0x0100, 0x226, 0x227, 0x231],
@@ -212,7 +213,8 @@ class ANDROID(USBMS):
             'VIZIO', 'GOOGLE', 'FREESCAL', 'KOBO_INC', 'LENOVO', 'ROCKCHIP',
             'POCKET', 'ONDA_MID', 'ZENITHIN', 'INGENIC', 'PMID701C', 'PD',
             'PMP5097C', 'MASS', 'NOVO7', 'ZEKI', 'COBY', 'SXZ', 'USB_2.0',
-            'COBY_MID', 'VS', 'AINOL', 'TOPWISE', 'PAD703']
+            'COBY_MID', 'VS', 'AINOL', 'TOPWISE', 'PAD703', 'NEXT8D12',
+            'MEDIATEK']
     WINDOWS_MAIN_MEM = ['ANDROID_PHONE', 'A855', 'A853', 'INC.NEXUS_ONE',
             '__UMS_COMPOSITE', '_MB200', 'MASS_STORAGE', '_-_CARD', 'SGH-I897',
             'GT-I9000', 'FILE-STOR_GADGET', 'SGH-T959_CARD', 'SGH-T959', 'SAMSUNG_ANDROID',
@@ -232,7 +234,7 @@ class ANDROID(USBMS):
             'THINKPAD_TABLET', 'SGH-T989', 'YP-G70', 'STORAGE_DEVICE',
             'ADVANCED', 'SGH-I727', 'USB_FLASH_DRIVER', 'ANDROID',
             'S5830I_CARD', 'MID7042', 'LINK-CREATE', '7035', 'VIEWPAD_7E',
-            'NOVO7', 'MB526', '_USB#WYK7MSF8KE', 'TABLET_PC']
+            'NOVO7', 'MB526', '_USB#WYK7MSF8KE', 'TABLET_PC', 'F', 'MT65XX_MS']
     WINDOWS_CARD_A_MEM = ['ANDROID_PHONE', 'GT-I9000_CARD', 'SGH-I897',
             'FILE-STOR_GADGET', 'SGH-T959_CARD', 'SGH-T959', 'SAMSUNG_ANDROID', 'GT-P1000_CARD',
             'A70S', 'A101IT', '7', 'INCREDIBLE', 'A7EB', 'SGH-T849_CARD',
@@ -243,7 +245,7 @@ class ANDROID(USBMS):
             'FILE-CD_GADGET', 'GT-I9001_CARD', 'USB_2.0', 'XT875',
             'UMS_COMPOSITE', 'PRO', '.KOBO_VOX', 'SGH-T989_CARD', 'SGH-I727',
             'USB_FLASH_DRIVER', 'ANDROID', 'MID7042', '7035', 'VIEWPAD_7E',
-            'NOVO7', 'ADVANCED', 'TABLET_PC']
+            'NOVO7', 'ADVANCED', 'TABLET_PC', 'F']
     OSX_MAIN_MEM = 'Android Device Main Memory'
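
To make the id tables above concrete, a hypothetical sketch of how a USB id match works: a device is claimed when its (vendor, product, bcd) triple appears in the table. The lookup helper is illustrative, not calibre's actual matcher; 0x0bb4 is used here on the assumption that it is the HTC vendor id:

    HTC_BCDS = [0x100, 0x0222, 0x0226, 0x227, 0x228, 0x229, 0x0231, 0x9999]
    VENDOR_ID = {0x0bb4: {0x2910: HTC_BCDS, 0x0001: [0x255]}}

    def matches(vendor, product, bcd):
        # A device matches when its bcd is listed for its vendor/product pair
        return bcd in VENDOR_ID.get(vendor, {}).get(product, [])

    assert matches(0x0bb4, 0x2910, 0x0231)   # new HTC bcd added above
    assert not matches(0x0bb4, 0x2910, 0x123)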

File diff suppressed because it is too large


@@ -41,6 +41,20 @@ class N516(USBMS):
     def can_handle(self, device_info, debug=False):
         return not is_alex(device_info)
+class KIBANO(N516):
+    name = 'Kibano driver'
+    gui_name = 'Kibano'
+    description = _('Communicate with the Kibano eBook reader.')
+    FORMATS = ['epub', 'pdf', 'txt']
+    BCD = [0x323]
+    VENDOR_NAME = 'EBOOK'
+    # We use EXTERNAL_SD_CARD for main mem as some devices have non-working
+    # main memories
+    WINDOWS_MAIN_MEM = WINDOWS_CARD_A_MEM = ['INTERNAL_SD_CARD',
+            'EXTERNAL_SD_CARD']
+
 class THEBOOK(N516):
     name = 'The Book driver'
     gui_name = 'The Book'


@@ -199,6 +199,11 @@ class KTCollectionsBookList(CollectionsBookList):
                     ('series' in collection_attributes and
                      book.get('series', None) == category):
                     is_series = True
+
+                # The category should not be None, but it has happened.
+                if not category:
+                    continue
+
                 cat_name = category.strip(' ,')
                 if cat_name not in collections:


@@ -1537,7 +1537,11 @@ class KOBOTOUCH(KOBO):
             return bookshelves
         cursor = connection.cursor()
-        query = "select ShelfName from ShelfContent where ContentId = ? and _IsDeleted = 'false'"
+        query = "select ShelfName " \
+                "from ShelfContent " \
+                "where ContentId = ? " \
+                "and _IsDeleted = 'false' " \
+                "and ShelfName is not null" # This should never be null, but it is protection against an error caused by a sync to the Kobo server
         values = (ContentID, )
         cursor.execute(query, values)
         for i, row in enumerate(cursor):
@@ -2357,6 +2361,8 @@ class KOBOTOUCH(KOBO):
             update_query = 'UPDATE content SET Series=?, SeriesNumber==? where BookID is Null and ContentID = ?'
             if book.series is None:
                 update_values = (None, None, book.contentID, )
+            elif book.series_index is None: # This should never happen, but...
+                update_values = (book.series, None, book.contentID, )
             else:
                 update_values = (book.series, "%g"%book.series_index, book.contentID, )
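
A minimal sketch of running the hardened shelf query with Python's sqlite3 module; the database path and content id here are made up for illustration:

    import sqlite3
    conn = sqlite3.connect('KoboReader.sqlite')  # hypothetical path
    query = ("select ShelfName from ShelfContent where ContentId = ? "
             "and _IsDeleted = 'false' and ShelfName is not null")
    for (shelf_name,) in conn.execute(query, ('some-content-id',)):
        print(shelf_name)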


@@ -13,6 +13,7 @@ from itertools import izip
 from calibre import prints
 from calibre.constants import iswindows, numeric_version
+from calibre.devices.errors import PathError
 from calibre.devices.mtp.base import debug
 from calibre.devices.mtp.defaults import DeviceDefaults
 from calibre.ptempfile import SpooledTemporaryFile, PersistentTemporaryDirectory
@@ -23,6 +24,12 @@ from calibre.utils.filenames import shorten_components_to
 BASE = importlib.import_module('calibre.devices.mtp.%s.driver'%(
         'windows' if iswindows else 'unix')).MTP_DEVICE
+class MTPInvalidSendPathError(PathError):
+
+    def __init__(self, folder):
+        PathError.__init__(self, 'Trying to send to ignored folder: %s'%folder)
+        self.folder = folder
+
 class MTP_DEVICE(BASE):
     METADATA_CACHE = 'metadata.calibre'
@@ -46,6 +53,7 @@ class MTP_DEVICE(BASE):
         self._prefs = None
         self.device_defaults = DeviceDefaults()
         self.current_device_defaults = {}
+        self.highlight_ignored_folders = False
     @property
     def prefs(self):
@@ -59,9 +67,25 @@ class MTP_DEVICE(BASE):
             p.defaults['blacklist'] = []
             p.defaults['history'] = {}
             p.defaults['rules'] = []
+            p.defaults['ignored_folders'] = {}
         return self._prefs
+    def is_folder_ignored(self, storage_or_storage_id, name,
+                          ignored_folders=None):
+        storage_id = unicode(getattr(storage_or_storage_id, 'object_id',
+            storage_or_storage_id))
+        name = icu_lower(name)
+        if ignored_folders is None:
+            ignored_folders = self.get_pref('ignored_folders')
+        if storage_id in ignored_folders:
+            return name in {icu_lower(x) for x in ignored_folders[storage_id]}
+
+        return name in {
+            'alarms', 'android', 'dcim', 'movies', 'music', 'notifications',
+            'pictures', 'ringtones', 'samsung', 'sony', 'htc', 'bluetooth',
+            'games', 'lost.dir', 'video', 'whatsapp', 'image'}
+
     def configure_for_kindle_app(self):
         proxy = self.prefs
         with proxy:
@@ -371,6 +395,8 @@ class MTP_DEVICE(BASE):
         for infile, fname, mi in izip(files, names, metadata):
             path = self.create_upload_path(prefix, mi, fname, routing)
+            if path and self.is_folder_ignored(storage, path[0]):
+                raise MTPInvalidSendPathError(path[0])
             parent = self.ensure_parent(storage, path)
             if hasattr(infile, 'read'):
                 pos = infile.tell()
@@ -472,7 +498,7 @@ class MTP_DEVICE(BASE):
     def config_widget(self):
         from calibre.gui2.device_drivers.mtp_config import MTPConfig
-        return MTPConfig(self)
+        return MTPConfig(self, highlight_ignored_folders=self.highlight_ignored_folders)
     def save_settings(self, cw):
         cw.commit()
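
For reference, a self-contained sketch of the folder-ignoring logic added above. The default blacklist is copied from the diff; plain str.lower() stands in for calibre's icu_lower(), and the storage ids are made up:

    DEFAULT_IGNORED = {
        'alarms', 'android', 'dcim', 'movies', 'music', 'notifications',
        'pictures', 'ringtones', 'samsung', 'sony', 'htc', 'bluetooth',
        'games', 'lost.dir', 'video', 'whatsapp', 'image'}

    def is_folder_ignored(storage_id, name, ignored_folders=None):
        # Per-storage user overrides win; otherwise use the default blacklist
        name = name.lower()
        if ignored_folders and storage_id in ignored_folders:
            return name in {x.lower() for x in ignored_folders[storage_id]}
        return name in DEFAULT_IGNORED

    assert is_folder_ignored('65537', 'DCIM')        # blacklisted by default
    assert not is_folder_ignored('65537', 'eBooks')  # still scanned
    assert is_folder_ignored('65537', 'eBooks', {'65537': ['eBooks']})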


@@ -239,12 +239,12 @@ class TestDeviceInteraction(unittest.TestCase):
         # Test get_filesystem
         used_by_one = self.measure_memory_usage(1,
-                self.dev.dev.get_filesystem, self.storage.object_id, lambda x:
-                x)
+                self.dev.dev.get_filesystem, self.storage.object_id,
+                lambda x, l: True)
         used_by_many = self.measure_memory_usage(5,
-                self.dev.dev.get_filesystem, self.storage.object_id, lambda x:
-                x)
+                self.dev.dev.get_filesystem, self.storage.object_id,
+                lambda x, l: True)
         self.check_memory(used_by_one, used_by_many,
                 'Memory consumption during get_filesystem')


@@ -13,6 +13,8 @@ const calibre_device_entry_t calibre_mtp_device_table[] = {
     // Amazon Kindle Fire HD
     , { "Amazon", 0x1949, "Fire HD", 0x0007, DEVICE_FLAGS_ANDROID_BUGS}
+    , { "Amazon", 0x1949, "Fire HD", 0x0008, DEVICE_FLAGS_ANDROID_BUGS}
+    , { "Amazon", 0x1949, "Fire HD", 0x000a, DEVICE_FLAGS_ANDROID_BUGS}
     // Nexus 10
     , { "Google", 0x18d1, "Nexus 10", 0x4ee2, DEVICE_FLAGS_ANDROID_BUGS}


@@ -212,8 +212,13 @@ class MTP_DEVICE(MTPDeviceBase):
             ans += pprint.pformat(storage)
         return ans
-    def _filesystem_callback(self, entry):
-        self.filesystem_callback(_('Found object: %s')%entry.get('name', ''))
+    def _filesystem_callback(self, entry, level):
+        name = entry.get('name', '')
+        self.filesystem_callback(_('Found object: %s')%name)
+        if (level == 0 and
+                self.is_folder_ignored(self._currently_getting_sid, name)):
+            return False
+        return True
     @property
     def filesystem_cache(self):
@@ -234,6 +239,7 @@ class MTP_DEVICE(MTPDeviceBase):
                 storage.append({'id':sid, 'size':capacity,
                     'is_folder':True, 'name':name, 'can_delete':False,
                     'is_system':True})
+            self._currently_getting_sid = unicode(sid)
             items, errs = self.dev.get_filesystem(sid,
                     self._filesystem_callback)
             all_items.extend(items), all_errs.extend(errs)
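
The new callback contract, illustrated with a hypothetical callback (the real one is _filesystem_callback above): the scanner invokes it as callback(entry, level) for every object found, and only recurses into a folder when it returns True.

    def fs_callback(entry, level):
        name = entry.get('name', '')
        print('Found object:', name)
        # Prune ignored top-level folders; scan everything else
        return not (level == 0 and name.lower() in {'music', 'dcim'})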


@@ -8,7 +8,9 @@
 #define UNICODE
 #include <Python.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
 #include <stdlib.h>
 #include <libmtp.h>
@@ -122,7 +124,7 @@ static PyObject* build_file_metadata(LIBMTP_file_t *nf, uint32_t storage_id) {
     PyObject *ans = NULL;
     ans = Py_BuildValue("{s:s, s:k, s:k, s:k, s:K, s:L, s:O}",
-            "name", (unsigned long)nf->filename,
+            "name", nf->filename,
             "id", (unsigned long)nf->item_id,
             "parent_id", (unsigned long)nf->parent_id,
             "storage_id", (unsigned long)storage_id,
@@ -357,10 +359,10 @@ Device_storage_info(Device *self, void *closure) {
 // Device.get_filesystem {{{
-static int recursive_get_files(LIBMTP_mtpdevice_t *dev, uint32_t storage_id, uint32_t parent_id, PyObject *ans, PyObject *errs, PyObject *callback) {
+static int recursive_get_files(LIBMTP_mtpdevice_t *dev, uint32_t storage_id, uint32_t parent_id, PyObject *ans, PyObject *errs, PyObject *callback, unsigned int level) {
     LIBMTP_file_t *f, *files;
-    PyObject *entry;
-    int ok = 1;
+    PyObject *entry, *r;
+    int ok = 1, recurse;
     Py_BEGIN_ALLOW_THREADS;
     files = LIBMTP_Get_Files_And_Folders(dev, storage_id, parent_id);
@@ -372,13 +374,15 @@ static int recursive_get_files(LIBMTP_mtpdevice_t *dev, uint32_t storage_id, uin
             entry = build_file_metadata(f, storage_id);
             if (entry == NULL) { ok = 0; }
             else {
-                Py_XDECREF(PyObject_CallFunctionObjArgs(callback, entry, NULL));
+                r = PyObject_CallFunction(callback, "OI", entry, level);
+                recurse = (r != NULL && PyObject_IsTrue(r)) ? 1 : 0;
+                Py_XDECREF(r);
                 if (PyList_Append(ans, entry) != 0) { ok = 0; }
                 Py_DECREF(entry);
             }
-            if (ok && f->filetype == LIBMTP_FILETYPE_FOLDER) {
-                if (!recursive_get_files(dev, storage_id, f->item_id, ans, errs, callback)) {
+            if (ok && recurse && f->filetype == LIBMTP_FILETYPE_FOLDER) {
+                if (!recursive_get_files(dev, storage_id, f->item_id, ans, errs, callback, level+1)) {
                     ok = 0;
                 }
             }
@@ -408,7 +412,7 @@ Device_get_filesystem(Device *self, PyObject *args) {
     if (errs == NULL || ans == NULL) { PyErr_NoMemory(); return NULL; }
     LIBMTP_Clear_Errorstack(self->device);
-    ok = recursive_get_files(self->device, (uint32_t)storage_id, 0, ans, errs, callback);
+    ok = recursive_get_files(self->device, (uint32_t)storage_id, 0xFFFFFFFF, ans, errs, callback, 0);
     dump_errorstack(self->device, errs);
     if (!ok) {
         Py_DECREF(ans);
@@ -537,7 +541,7 @@ static PyMethodDef Device_methods[] = {
     },
     {"get_filesystem", (PyCFunction)Device_get_filesystem, METH_VARARGS,
-        "get_filesystem(storage_id, callback) -> Get the list of files and folders on the device in storage_id. Returns files, errors. callback must be a callable that accepts a single argument. It is called with every found object."
+        "get_filesystem(storage_id, callback) -> Get the list of files and folders on the device in storage_id. Returns files, errors. callback must be a callable that is called as callback(entry, level). It is called with every found object. If callback returns False and the object is a folder, it is not recursed into."
     },
     {"get_file", (PyCFunction)Device_get_file, METH_VARARGS,
@@ -726,7 +730,20 @@ initlibmtp(void) {
     if (MTPError == NULL) return;
     PyModule_AddObject(m, "MTPError", MTPError);
+    // Redirect stdout to get rid of the annoying message about mtpz. Really,
+    // who designs a library without any way to control/redirect the debugging
+    // output, and hardcoded paths that cannot be changed?
+    int bak, new;
+    fflush(stdout);
+    bak = dup(STDOUT_FILENO);
+    new = open("/dev/null", O_WRONLY);
+    dup2(new, STDOUT_FILENO);
+    close(new);
     LIBMTP_Init();
+    fflush(stdout);
+    dup2(bak, STDOUT_FILENO);
+    close(bak);
     LIBMTP_Set_Debug(LIBMTP_DEBUG_NONE);
     Py_INCREF(&DeviceType);
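
The same fd-juggling trick, sketched in Python for clarity (POSIX only; illustrative, not calibre code): point file descriptor 1 at /dev/null around the noisy call, then restore it.

    import os, sys

    def silence_stdout_during(func):
        sys.stdout.flush()
        bak = os.dup(1)                        # save the real stdout
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, 1)                    # fd 1 now goes to /dev/null
        os.close(devnull)
        try:
            return func()                      # library chatter is discarded
        finally:
            sys.stdout.flush()
            os.dup2(bak, 1)                    # restore the real stdout
            os.close(bak)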


@@ -133,12 +133,14 @@ class GetBulkCallback : public IPortableDevicePropertiesBulkCallback {
 public:
     PyObject *items;
+    PyObject *subfolders;
+    unsigned int level;
     HANDLE complete;
     ULONG self_ref;
     PyThreadState *thread_state;
     PyObject *callback;
-    GetBulkCallback(PyObject *items_dict, HANDLE ev, PyObject* pycallback) : items(items_dict), complete(ev), self_ref(1), thread_state(NULL), callback(pycallback) {}
+    GetBulkCallback(PyObject *items_dict, PyObject *subfolders, unsigned int level, HANDLE ev, PyObject* pycallback) : items(items_dict), subfolders(subfolders), level(level), complete(ev), self_ref(1), thread_state(NULL), callback(pycallback) {}
     ~GetBulkCallback() {}
     HRESULT __stdcall OnStart(REFGUID Context) { return S_OK; }
@@ -172,7 +174,7 @@ public:
         DWORD num = 0, i;
         wchar_t *property = NULL;
         IPortableDeviceValues *properties = NULL;
-        PyObject *temp, *obj;
+        PyObject *temp, *obj, *r;
         HRESULT hr;
         if (SUCCEEDED(values->GetCount(&num))) {
@@ -196,7 +198,11 @@ public:
                 Py_DECREF(temp);
                 set_properties(obj, properties);
-                Py_XDECREF(PyObject_CallFunctionObjArgs(callback, obj, NULL));
+                r = PyObject_CallFunction(callback, "OI", obj, this->level);
+                if (r != NULL && PyObject_IsTrue(r)) {
+                    PyList_Append(this->subfolders, PyDict_GetItemString(obj, "id"));
+                }
+                Py_XDECREF(r);
                 properties->Release(); properties = NULL;
             }
@@ -209,8 +215,7 @@ public:
 };
-static PyObject* bulk_get_filesystem(IPortableDevice *device, IPortableDevicePropertiesBulk *bulk_properties, const wchar_t *storage_id, IPortableDevicePropVariantCollection *object_ids, PyObject *pycallback) {
-    PyObject *folders = NULL;
+static bool bulk_get_filesystem(unsigned int level, IPortableDevice *device, IPortableDevicePropertiesBulk *bulk_properties, IPortableDevicePropVariantCollection *object_ids, PyObject *pycallback, PyObject *ans, PyObject *subfolders) {
     GUID guid_context = GUID_NULL;
     HANDLE ev = NULL;
     IPortableDeviceKeyCollection *properties;
@@ -218,18 +223,15 @@ static PyObject* bulk_get_filesystem(IPortableDevice *device, IPortableDevicePro
     HRESULT hr;
     DWORD wait_result;
     int pump_result;
-    BOOL ok = TRUE;
+    bool ok = true;
     ev = CreateEvent(NULL, FALSE, FALSE, NULL);
-    if (ev == NULL) return PyErr_NoMemory();
-    folders = PyDict_New();
-    if (folders == NULL) {PyErr_NoMemory(); goto end;}
+    if (ev == NULL) {PyErr_NoMemory(); return false; }
     properties = create_filesystem_properties_collection();
     if (properties == NULL) goto end;
-    callback = new (std::nothrow) GetBulkCallback(folders, ev, pycallback);
+    callback = new (std::nothrow) GetBulkCallback(ans, subfolders, level, ev, pycallback);
     if (callback == NULL) { PyErr_NoMemory(); goto end; }
     hr = bulk_properties->QueueGetValuesByObjectList(object_ids, properties, callback, &guid_context);
@@ -245,13 +247,13 @@ static PyObject* bulk_get_filesystem(IPortableDevice *device, IPortableDevicePro
             break; // Event was signalled, bulk operation complete
         } else if (wait_result == WAIT_OBJECT_0 + 1) { // Messages need to be dispatched
             pump_result = pump_waiting_messages();
-            if (pump_result == 1) { PyErr_SetString(PyExc_RuntimeError, "Application has been asked to quit."); ok = FALSE; break;}
+            if (pump_result == 1) { PyErr_SetString(PyExc_RuntimeError, "Application has been asked to quit."); ok = false; break;}
         } else if (wait_result == WAIT_TIMEOUT) {
             // 60 seconds with no updates, looks bad
-            PyErr_SetString(WPDError, "The device seems to have hung."); ok = FALSE; break;
+            PyErr_SetString(WPDError, "The device seems to have hung."); ok = false; break;
         } else if (wait_result == WAIT_ABANDONED_0) {
             // This should never happen
-            PyErr_SetString(WPDError, "An unknown error occurred (mutex abandoned)"); ok = FALSE; break;
+            PyErr_SetString(WPDError, "An unknown error occurred (mutex abandoned)"); ok = false; break;
         } else {
             // The wait failed for some reason
             PyErr_SetFromWindowsErr(0); ok = FALSE; break;
@@ -261,22 +263,21 @@ static PyObject* bulk_get_filesystem(IPortableDevice *device, IPortableDevicePro
     if (!ok) {
         bulk_properties->Cancel(guid_context);
         pump_waiting_messages();
-        Py_DECREF(folders); folders = NULL;
     }
 end:
     if (ev != NULL) CloseHandle(ev);
     if (properties != NULL) properties->Release();
     if (callback != NULL) callback->Release();
-    return folders;
+    return ok;
 }
 // }}}
-// find_all_objects_in() {{{
-static BOOL find_all_objects_in(IPortableDeviceContent *content, IPortableDevicePropVariantCollection *object_ids, const wchar_t *parent_id, PyObject *callback) {
+// find_objects_in() {{{
+static bool find_objects_in(IPortableDeviceContent *content, IPortableDevicePropVariantCollection *object_ids, const wchar_t *parent_id) {
     /*
-     * Find all children of the object identified by parent_id, recursively.
+     * Find all children of the object identified by parent_id.
      * The child ids are put into object_ids. Returns False if any errors
      * occurred (also sets the python exception).
      */
@@ -285,8 +286,7 @@ static BOOL find_all_objects_in(IPortableDeviceContent *content, IPortableDevice
     PWSTR child_ids[10];
     DWORD fetched, i;
     PROPVARIANT pv;
-    BOOL ok = 1;
-    PyObject *id;
+    bool ok = true;
     PropVariantInit(&pv);
     pv.vt = VT_LPWSTR;
@@ -295,7 +295,7 @@ static BOOL find_all_objects_in(IPortableDeviceContent *content, IPortableDevice
     hr = content->EnumObjects(0, parent_id, NULL, &children);
     Py_END_ALLOW_THREADS;
-    if (FAILED(hr)) {hresult_set_exc("Failed to get children from device", hr); ok = 0; goto end;}
+    if (FAILED(hr)) {hresult_set_exc("Failed to get children from device", hr); ok = false; goto end;}
     hr = S_OK;
@@ -306,19 +306,12 @@ static BOOL find_all_objects_in(IPortableDeviceContent *content, IPortableDevice
         if (SUCCEEDED(hr)) {
             for(i = 0; i < fetched; i++) {
                 pv.pwszVal = child_ids[i];
-                id = wchar_to_unicode(pv.pwszVal);
-                if (id != NULL) {
-                    Py_XDECREF(PyObject_CallFunctionObjArgs(callback, id, NULL));
-                    Py_DECREF(id);
-                }
                 hr2 = object_ids->Add(&pv);
                 pv.pwszVal = NULL;
                 if (FAILED(hr2)) { hresult_set_exc("Failed to add child ids to propvariantcollection", hr2); break; }
-                ok = find_all_objects_in(content, object_ids, child_ids[i], callback);
-                if (!ok) break;
             }
             for (i = 0; i < fetched; i++) { CoTaskMemFree(child_ids[i]); child_ids[i] = NULL; }
-            if (FAILED(hr2) || !ok) { ok = 0; goto end; }
+            if (FAILED(hr2) || !ok) { ok = false; goto end; }
         }
     }
@@ -340,13 +333,8 @@ static PyObject* get_object_properties(IPortableDeviceProperties *devprops, IPor
     Py_END_ALLOW_THREADS;
     if (FAILED(hr)) { hresult_set_exc("Failed to get properties for object", hr); goto end; }
-    temp = wchar_to_unicode(object_id);
-    if (temp == NULL) goto end;
-    ans = PyDict_New();
-    if (ans == NULL) { PyErr_NoMemory(); goto end; }
-    if (PyDict_SetItemString(ans, "id", temp) != 0) { Py_DECREF(ans); ans = NULL; PyErr_NoMemory(); goto end; }
+    ans = Py_BuildValue("{s:N}", "id", wchar_to_unicode(object_id));
+    if (ans == NULL) goto end;
     set_properties(ans, values);
 end:
@@ -355,12 +343,12 @@ end:
     return ans;
 }
-static PyObject* single_get_filesystem(IPortableDeviceContent *content, const wchar_t *storage_id, IPortableDevicePropVariantCollection *object_ids, PyObject *callback) {
+static bool single_get_filesystem(unsigned int level, IPortableDeviceContent *content, IPortableDevicePropVariantCollection *object_ids, PyObject *callback, PyObject *ans, PyObject *subfolders) {
     DWORD num, i;
     PROPVARIANT pv;
     HRESULT hr;
-    BOOL ok = 1;
-    PyObject *ans = NULL, *item = NULL;
+    bool ok = true;
+    PyObject *item = NULL, *r = NULL, *recurse = NULL;
     IPortableDeviceProperties *devprops = NULL;
     IPortableDeviceKeyCollection *properties = NULL;
@@ -373,32 +361,36 @@ static PyObject* single_get_filesystem(IPortableDeviceContent *content, const wc
     hr = object_ids->GetCount(&num);
     if (FAILED(hr)) { hresult_set_exc("Failed to get object id count", hr); goto end; }
-    ans = PyDict_New();
-    if (ans == NULL) goto end;
     for (i = 0; i < num; i++) {
-        ok = 0;
+        ok = false;
+        recurse = NULL;
         PropVariantInit(&pv);
         hr = object_ids->GetAt(i, &pv);
         if (SUCCEEDED(hr) && pv.pwszVal != NULL) {
             item = get_object_properties(devprops, properties, pv.pwszVal);
             if (item != NULL) {
-                Py_XDECREF(PyObject_CallFunctionObjArgs(callback, item, NULL));
+                r = PyObject_CallFunction(callback, "OI", item, level);
+                if (r != NULL && PyObject_IsTrue(r)) recurse = item;
+                Py_XDECREF(r);
                 PyDict_SetItem(ans, PyDict_GetItemString(item, "id"), item);
                 Py_DECREF(item); item = NULL;
-                ok = 1;
+                ok = true;
             }
         } else hresult_set_exc("Failed to get item from IPortableDevicePropVariantCollection", hr);
         PropVariantClear(&pv);
-        if (!ok) { Py_DECREF(ans); ans = NULL; break; }
+        if (!ok) break;
+        if (recurse != NULL) {
+            if (PyList_Append(subfolders, PyDict_GetItemString(recurse, "id")) == -1) ok = false;
+        }
+        if (!ok) break;
     }
 end:
     if (devprops != NULL) devprops->Release();
     if (properties != NULL) properties->Release();
-    return ans;
+    return ok;
 }
 // }}}
@@ -438,35 +430,60 @@ end:
     return values;
 } // }}}
-PyObject* wpd::get_filesystem(IPortableDevice *device, const wchar_t *storage_id, IPortableDevicePropertiesBulk *bulk_properties, PyObject *callback) { // {{{
-    PyObject *folders = NULL;
+static bool get_files_and_folders(unsigned int level, IPortableDevice *device, IPortableDeviceContent *content, IPortableDevicePropertiesBulk *bulk_properties, const wchar_t *parent_id, PyObject *callback, PyObject *ans) { // {{{
+    bool ok = true;
     IPortableDevicePropVariantCollection *object_ids = NULL;
+    PyObject *subfolders = NULL;
+    HRESULT hr;
+
+    subfolders = PyList_New(0);
+    if (subfolders == NULL) { ok = false; goto end; }
+
+    Py_BEGIN_ALLOW_THREADS;
+    hr = CoCreateInstance(CLSID_PortableDevicePropVariantCollection, NULL,
+            CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&object_ids));
+    Py_END_ALLOW_THREADS;
+    if (FAILED(hr)) { hresult_set_exc("Failed to create propvariantcollection", hr); ok = false; goto end; }
+
+    ok = find_objects_in(content, object_ids, parent_id);
+    if (!ok) goto end;
+
+    if (bulk_properties != NULL) ok = bulk_get_filesystem(level, device, bulk_properties, object_ids, callback, ans, subfolders);
+    else ok = single_get_filesystem(level, content, object_ids, callback, ans, subfolders);
+    if (!ok) goto end;
+
+    for (Py_ssize_t i = 0; i < PyList_GET_SIZE(subfolders); i++) {
+        const wchar_t *child_id = unicode_to_wchar(PyList_GET_ITEM(subfolders, i));
+        if (child_id == NULL) { ok = false; break; }
+        ok = get_files_and_folders(level+1, device, content, bulk_properties, child_id, callback, ans);
+        if (!ok) break;
+    }
+
+end:
+    if (object_ids != NULL) object_ids->Release();
+    Py_XDECREF(subfolders);
+    return ok;
+} // }}}
+
+PyObject* wpd::get_filesystem(IPortableDevice *device, const wchar_t *storage_id, IPortableDevicePropertiesBulk *bulk_properties, PyObject *callback) { // {{{
+    PyObject *ans = NULL;
     IPortableDeviceContent *content = NULL;
     HRESULT hr;
-    BOOL ok;
+
+    ans = PyDict_New();
+    if (ans == NULL) return PyErr_NoMemory();
     Py_BEGIN_ALLOW_THREADS;
     hr = device->Content(&content);
     Py_END_ALLOW_THREADS;
     if (FAILED(hr)) { hresult_set_exc("Failed to create content interface", hr); goto end; }
-    Py_BEGIN_ALLOW_THREADS;
-    hr = CoCreateInstance(CLSID_PortableDevicePropVariantCollection, NULL,
-            CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&object_ids));
-    Py_END_ALLOW_THREADS;
-    if (FAILED(hr)) { hresult_set_exc("Failed to create propvariantcollection", hr); goto end; }
-    ok = find_all_objects_in(content, object_ids, storage_id, callback);
-    if (!ok) goto end;
-    if (bulk_properties != NULL) folders = bulk_get_filesystem(device, bulk_properties, storage_id, object_ids, callback);
-    else folders = single_get_filesystem(content, storage_id, object_ids, callback);
+    if (!get_files_and_folders(0, device, content, bulk_properties, storage_id, callback, ans)) {
+        Py_DECREF(ans); ans = NULL;
+    }
 end:
     if (content != NULL) content->Release();
-    if (object_ids != NULL) object_ids->Release();
-    return folders;
+    return ans;
 } // }}}
 PyObject* wpd::get_file(IPortableDevice *device, const wchar_t *object_id, PyObject *dest, PyObject *callback) { // {{{
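
The restructured WPD traversal, summarized as a Python sketch for clarity (list_children and get_properties are hypothetical stand-ins for the enumeration and property-fetch calls): ids are gathered one level at a time, properties are fetched in bulk or singly, and the callback decides which folders are descended into.

    def get_files_and_folders(level, parent_id, callback, ans,
                              list_children, get_properties):
        subfolders = []
        for child_id in list_children(parent_id):   # find_objects_in()
            entry = get_properties(child_id)        # bulk or single fetch
            ans[entry['id']] = entry
            if callback(entry, level):              # True => descend later
                subfolders.append(entry['id'])
        for child_id in subfolders:
            get_files_and_folders(level + 1, child_id, callback, ans,
                                  list_children, get_properties)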


@@ -164,7 +164,7 @@ static PyMethodDef Device_methods[] = {
     },
     {"get_filesystem", (PyCFunction)py_get_filesystem, METH_VARARGS,
-        "get_filesystem(storage_id, callback) -> Get all files/folders on the storage identified by storage_id. Tries to use bulk operations when possible. callback must be a callable that accepts a single argument. It is called with every found id and then with the metadata for every id."
+        "get_filesystem(storage_id, callback) -> Get all files/folders on the storage identified by storage_id. Tries to use bulk operations when possible. callback must be a callable that is called as callback(object, level). It is called with every found object. If the callback returns False and the object is a folder, it is not recursed into."
     },
     {"get_file", (PyCFunction)py_get_file, METH_VARARGS,


@@ -214,13 +214,14 @@ class MTP_DEVICE(MTPDeviceBase):
         return True
-    def _filesystem_callback(self, obj):
-        if isinstance(obj, dict):
-            n = obj.get('name', '')
-            msg = _('Found object: %s')%n
-        else:
-            msg = _('Found id: %s')%obj
-        self.filesystem_callback(msg)
+    def _filesystem_callback(self, obj, level):
+        n = obj.get('name', '')
+        msg = _('Found object: %s')%n
+        if (level == 0 and
+                self.is_folder_ignored(self._currently_getting_sid, n)):
+            return False
+        self.filesystem_callback(msg)
+        return obj.get('is_folder', False)
     @property
     def filesystem_cache(self):
@@ -241,6 +242,7 @@ class MTP_DEVICE(MTPDeviceBase):
                 break
             storage = {'id':storage_id, 'size':capacity, 'name':name,
                 'is_folder':True, 'can_delete':False, 'is_system':True}
+            self._currently_getting_sid = unicode(storage_id)
             id_map = self.dev.get_filesystem(storage_id,
                     self._filesystem_callback)
             for x in id_map.itervalues(): x['storage_id'] = storage_id


@@ -12,24 +12,24 @@ pprint, io
 def build(mod='wpd'):
     master = subprocess.Popen('ssh -MN getafix'.split())
-    master2 = subprocess.Popen('ssh -MN xp_build'.split())
+    master2 = subprocess.Popen('ssh -MN win64'.split())
     try:
-        while not glob.glob(os.path.expanduser('~/.ssh/*kovid@xp_build*')):
+        while not glob.glob(os.path.expanduser('~/.ssh/*kovid@win64*')):
             time.sleep(0.05)
-        builder = subprocess.Popen('ssh xp_build ~/build-wpd'.split())
+        builder = subprocess.Popen('ssh win64 ~/build-wpd'.split())
         if builder.wait() != 0:
             raise Exception('Failed to build plugin')
         while not glob.glob(os.path.expanduser('~/.ssh/*kovid@getafix*')):
             time.sleep(0.05)
-        syncer = subprocess.Popen('ssh getafix ~/test-wpd'.split())
+        syncer = subprocess.Popen('ssh getafix ~/update-calibre'.split())
         if syncer.wait() != 0:
             raise Exception('Failed to rsync to getafix')
         subprocess.check_call(
-            ('scp xp_build:build/calibre/src/calibre/plugins/%s.pyd /tmp'%mod).split())
+            ('scp win64:build/calibre/src/calibre/plugins/%s.pyd /tmp'%mod).split())
         subprocess.check_call(
-            ('scp /tmp/%s.pyd getafix:calibre/src/calibre/devices/mtp/windows'%mod).split())
+            ('scp /tmp/%s.pyd getafix:calibre-src/src/calibre/devices/mtp/windows'%mod).split())
         p = subprocess.Popen(
-            'ssh getafix calibre-debug -e calibre/src/calibre/devices/mtp/windows/remote.py'.split())
+            'ssh getafix calibre-debug -e calibre-src/src/calibre/devices/mtp/windows/remote.py'.split())
         p.wait()
         print()
     finally:
@@ -59,7 +59,7 @@ def main():
     # return
     from calibre.devices.scanner import win_scanner
-    from calibre.devices.mtp.windows.driver import MTP_DEVICE
+    from calibre.devices.mtp.driver import MTP_DEVICE
     dev = MTP_DEVICE(None)
     dev.startup()
     print (dev.wpd, dev.wpd_error)


@@ -54,6 +54,8 @@ def synchronous(tlockname):
 class ConnectionListener (Thread):
+    NOT_SERVICED_COUNT = 6
+
     def __init__(self, driver):
         Thread.__init__(self)
         self.daemon = True
@@ -78,8 +80,8 @@ class ConnectionListener (Thread):
                 if not self.driver.connection_queue.empty():
                     queue_not_serviced_count += 1
-                    if queue_not_serviced_count >= 3:
-                        self.driver._debug('queue not serviced')
+                    if queue_not_serviced_count >= self.NOT_SERVICED_COUNT:
+                        self.driver._debug('queue not serviced', queue_not_serviced_count)
                         try:
                             sock = self.driver.connection_queue.get_nowait()
                             s = self.driver._json_encode(
@@ -1281,10 +1283,10 @@ class SMART_DEVICE_APP(DeviceConfig, DevicePlugin):
                     self._close_listen_socket()
                     return message
             else:
-                while i < 100:  # try up to 100 random port numbers
+                while i < 100:  # try 9090 then up to 99 random port numbers
                     i += 1
                     port = self._attach_to_port(self.listen_socket,
-                            random.randint(8192, 32000))
+                            9090 if i == 1 else random.randint(8192, 32000))
                     if port != 0:
                         break
                 if port == 0:
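
The port-selection strategy above, as a standalone sketch (the socket setup is illustrative): prefer the fixed port 9090, then fall back to up to 99 random ports.

    import random, socket

    def attach_to_port(sock, port):
        # Mirrors _attach_to_port's contract: the port on success, 0 on failure
        try:
            sock.bind(('', port))
            return port
        except socket.error:
            return 0

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    port = 0
    for i in range(1, 101):
        port = attach_to_port(sock, 9090 if i == 1 else
                              random.randint(8192, 32000))
        if port != 0:
            break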


@@ -19,9 +19,10 @@ class TECLAST_K3(USBMS):
     PRODUCT_ID = [0x3203]
     BCD = [0x0000, 0x0100]
-    VENDOR_NAME = ['TECLAST', 'IMAGIN', 'RK28XX', 'PER3274B', 'BEBOOK']
+    VENDOR_NAME = ['TECLAST', 'IMAGIN', 'RK28XX', 'PER3274B', 'BEBOOK',
+            'RK2728', 'MR700']
     WINDOWS_MAIN_MEM = WINDOWS_CARD_A_MEM = ['DIGITAL_PLAYER', 'TL-K5',
-            'EREADER', 'USB-MSC', 'PER3274B', 'BEBOOK']
+            'EREADER', 'USB-MSC', 'PER3274B', 'BEBOOK', 'USER']
     MAIN_MEMORY_VOLUME_LABEL = 'K3 Main Memory'
     STORAGE_CARD_VOLUME_LABEL = 'K3 Storage Card'


@@ -14,50 +14,32 @@ import os
 from calibre.customize.conversion import OutputFormatPlugin, \
     OptionRecommendation
 from calibre.ptempfile import TemporaryDirectory
-from calibre.constants import iswindows
-UNITS = [
-    'millimeter',
-    'point',
-    'inch' ,
-    'pica' ,
-    'didot',
-    'cicero',
-    'devicepixel',
-    ]
+UNITS = ['millimeter', 'centimeter', 'point', 'inch' , 'pica' , 'didot',
+    'cicero', 'devicepixel']
-PAPER_SIZES = ['b2',
-    'a9',
-    'executive',
-    'tabloid',
-    'b4',
-    'b5',
-    'b6',
-    'b7',
-    'b0',
-    'b1',
-    'letter',
-    'b3',
-    'a7',
-    'a8',
-    'b8',
-    'b9',
-    'a3',
-    'a1',
-    'folio',
-    'c5e',
-    'dle',
-    'a0',
-    'ledger',
-    'legal',
-    'a6',
-    'a2',
-    'b10',
-    'a5',
-    'comm10e',
-    'a4']
+PAPER_SIZES = [u'a0', u'a1', u'a2', u'a3', u'a4', u'a5', u'a6', u'b0', u'b1',
+    u'b2', u'b3', u'b4', u'b5', u'b6', u'legal', u'letter']
-ORIENTATIONS = ['portrait', 'landscape']
+class PDFMetadata(object): # {{{
+    def __init__(self, oeb_metadata=None):
+        from calibre import force_unicode
+        from calibre.ebooks.metadata import authors_to_string
+        self.title = _(u'Unknown')
+        self.author = _(u'Unknown')
+        self.tags = u''
+
+        if oeb_metadata != None:
+            if len(oeb_metadata.title) >= 1:
+                self.title = oeb_metadata.title[0].value
+            if len(oeb_metadata.creator) >= 1:
+                self.author = authors_to_string([x.value for x in oeb_metadata.creator])
+            if oeb_metadata.subject:
+                self.tags = u', '.join(map(unicode, oeb_metadata.subject))
+
+        self.title = force_unicode(self.title)
+        self.author = force_unicode(self.author)
+# }}}
@@ -66,9 +48,14 @@ class PDFOutput(OutputFormatPlugin):
     file_type = 'pdf'
     options = set([
+        OptionRecommendation(name='override_profile_size', recommended_value=False,
+            help=_('Normally, the PDF page size is set by the output profile'
+                ' chosen under page options. This option will cause the '
+                ' page size settings under PDF Output to override the '
+                ' size specified by the output profile.')),
         OptionRecommendation(name='unit', recommended_value='inch',
             level=OptionRecommendation.LOW, short_switch='u', choices=UNITS,
-            help=_('The unit of measure. Default is inch. Choices '
+            help=_('The unit of measure for page sizes. Default is inch. Choices '
                 'are %s '
                 'Note: This does not override the unit for margins!') % UNITS),
         OptionRecommendation(name='paper_size', recommended_value='letter',
@@ -80,10 +67,6 @@ class PDFOutput(OutputFormatPlugin):
             help=_('Custom size of the document. Use the form widthxheight '
                 'EG. `123x321` to specify the width and height. '
                 'This overrides any specified paper-size.')),
-        OptionRecommendation(name='orientation', recommended_value='portrait',
-            level=OptionRecommendation.LOW, choices=ORIENTATIONS,
-            help=_('The orientation of the page. Default is portrait. Choices '
-                'are %s') % ORIENTATIONS),
         OptionRecommendation(name='preserve_cover_aspect_ratio',
             recommended_value=False,
             help=_('Preserve the aspect ratio of the cover, instead'
@@ -108,6 +91,14 @@ class PDFOutput(OutputFormatPlugin):
         OptionRecommendation(name='pdf_mono_font_size',
             recommended_value=16, help=_(
                 'The default font size for monospaced text')),
+        OptionRecommendation(name='pdf_mark_links', recommended_value=False,
+            help=_('Surround all links with a red box, useful for debugging.')),
+        OptionRecommendation(name='old_pdf_engine', recommended_value=False,
+            help=_('Use the old, less capable engine to generate the PDF')),
+        OptionRecommendation(name='uncompressed_pdf',
+            recommended_value=False, help=_(
+                'Generate an uncompressed PDF, useful for debugging, '
+                'only works with the new PDF engine.')),
         ])
     def convert(self, oeb_book, output_path, input_plugin, opts, log):
@@ -200,33 +191,18 @@ class PDFOutput(OutputFormatPlugin):
                 if k in family_map:
                     val[i].value = family_map[k]
-    def remove_font_specification(self):
-        # Qt produces image based pdfs on windows when non-generic fonts are specified
-        # This might change in Qt WebKit 2.3+ you will have to test.
-        for item in self.oeb.manifest:
-            if not hasattr(item.data, 'cssRules'): continue
-            for i, rule in enumerate(item.data.cssRules):
-                if rule.type != rule.STYLE_RULE: continue
-                ff = rule.style.getProperty('font-family')
-                if ff is None: continue
-                val = ff.propertyValue
-                for i in xrange(val.length):
-                    k = icu_lower(val[i].value)
-                    if k not in {'serif', 'sans', 'sans-serif', 'sansserif',
-                            'monospace', 'cursive', 'fantasy'}:
-                        val[i].value = ''
-
     def convert_text(self, oeb_book):
-        from calibre.ebooks.pdf.writer import PDFWriter
         from calibre.ebooks.metadata.opf2 import OPF
+        if self.opts.old_pdf_engine:
+            from calibre.ebooks.pdf.writer import PDFWriter
+            PDFWriter
+        else:
+            from calibre.ebooks.pdf.render.from_html import PDFWriter
         self.log.debug('Serializing oeb input to disk for processing...')
         self.get_cover_data()
-        if iswindows:
-            self.remove_font_specification()
-        else:
-            self.handle_embedded_fonts()
+        self.handle_embedded_fonts()
         with TemporaryDirectory('_pdf_out') as oeb_dir:
             from calibre.customize.ui import plugin_for_output_format
@@ -240,9 +216,9 @@ class PDFOutput(OutputFormatPlugin):
                 'toc', None))
     def write(self, Writer, items, toc):
-        from calibre.ebooks.pdf.writer import PDFMetadata
         writer = Writer(self.opts, self.log, cover_data=self.cover_data,
                 toc=toc)
+        writer.report_progress = self.report_progress
         close = False
         if not hasattr(self.output_path, 'write'):


@@ -1125,7 +1125,7 @@ OptionRecommendation(name='search_replace',
         RemoveFakeMargins()(self.oeb, self.log, self.opts)
         RemoveAdobeMargins()(self.oeb, self.log, self.opts)
-        if self.opts.subset_embedded_fonts:
+        if self.opts.subset_embedded_fonts and self.output_plugin.file_type != 'pdf':
             from calibre.ebooks.oeb.transforms.subset import SubsetFonts
             SubsetFonts()(self.oeb, self.log, self.opts)


@@ -335,32 +335,50 @@ class HeuristicProcessor(object):
        This function intentionally leaves hyphenated content alone as that is handled by the
        dehyphenate routine in a separate step
        '''

+        def style_unwrap(match):
+            style_close = match.group('style_close')
+            style_open = match.group('style_open')
+            if style_open and style_close:
+                return style_close+' '+style_open
+            elif style_open and not style_close:
+                return ' '+style_open
+            elif not style_open and style_close:
+                return style_close+' '
+            else:
+                return ' '
+
        # define the pieces of the regex
        lookahead = "(?<=.{"+str(length)+u"}([a-zäëïöüàèìòùáćéíĺóŕńśúýâêîôûçąężıãõñæøþðßěľščťžňďřů,:)\IA\u00DF]|(?<!\&\w{4});))" # (?<!\&\w{4});) is a semicolon not part of an entity
        em_en_lookahead = "(?<=.{"+str(length)+u"}[\u2013\u2014])"
        soft_hyphen = u"\xad"
-        line_ending = "\s*</(span|[iubp]|div)>\s*(</(span|[iubp]|div)>)?"
+        line_ending = "\s*(?P<style_close></(span|[iub])>)?\s*(</(p|div)>)?"
        blanklines = "\s*(?P<up2threeblanks><(p|span|div)[^>]*>\s*(<(p|span|div)[^>]*>\s*</(span|p|div)>\s*)</(span|p|div)>\s*){0,3}\s*"
-        line_opening = "<(span|[iubp]|div)[^>]*>\s*(<(span|[iubp]|div)[^>]*>)?\s*"
+        line_opening = "<(p|div)[^>]*>\s*(?P<style_open><(span|[iub])[^>]*>)?\s*"
        txt_line_wrap = u"((\u0020|\u0009)*\n){1,4}"

-        unwrap_regex = lookahead+line_ending+blanklines+line_opening
-        em_en_unwrap_regex = em_en_lookahead+line_ending+blanklines+line_opening
-        shy_unwrap_regex = soft_hyphen+line_ending+blanklines+line_opening
-
        if format == 'txt':
            unwrap_regex = lookahead+txt_line_wrap
            em_en_unwrap_regex = em_en_lookahead+txt_line_wrap
            shy_unwrap_regex = soft_hyphen+txt_line_wrap
+        else:
+            unwrap_regex = lookahead+line_ending+blanklines+line_opening
+            em_en_unwrap_regex = em_en_lookahead+line_ending+blanklines+line_opening
+            shy_unwrap_regex = soft_hyphen+line_ending+blanklines+line_opening

        unwrap = re.compile(u"%s" % unwrap_regex, re.UNICODE)
        em_en_unwrap = re.compile(u"%s" % em_en_unwrap_regex, re.UNICODE)
        shy_unwrap = re.compile(u"%s" % shy_unwrap_regex, re.UNICODE)

-        content = unwrap.sub(' ', content)
-        content = em_en_unwrap.sub('', content)
-        content = shy_unwrap.sub('', content)
+        if format == 'txt':
+            content = unwrap.sub(' ', content)
+            content = em_en_unwrap.sub('', content)
+            content = shy_unwrap.sub('', content)
+        else:
+            content = unwrap.sub(style_unwrap, content)
+            content = em_en_unwrap.sub(style_unwrap, content)
+            content = shy_unwrap.sub(style_unwrap, content)

        return content

    def txt_process(self, match):
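
The point of the named groups added above: when a wrapped line is joined, style_unwrap re-emits any captured closing and opening inline tags around a single space instead of discarding markup. A runnable sketch of the HTML branch, with a fixed 20-character lookbehind standing in for the length-based one and the blanklines piece omitted for brevity:

    # Runnable sketch of the HTML unwrap branch above.
    import re

    line_ending = r"\s*(?P<style_close></(span|[iub])>)?\s*(</(p|div)>)?"
    line_opening = r"<(p|div)[^>]*>\s*(?P<style_open><(span|[iub])[^>]*>)?\s*"

    def style_unwrap(match):
        style_close = match.group('style_close')
        style_open = match.group('style_open')
        if style_open and style_close:
            return style_close + ' ' + style_open
        elif style_open:
            return ' ' + style_open
        elif style_close:
            return style_close + ' '
        return ' '

    unwrap = re.compile(r"(?<=.{20}[a-z])" + line_ending + line_opening,
                        re.UNICODE)
    html = ('<p><i>A line hard-wrapped by the publisher</i></p>'
            '<p><i>continues here.</i></p>')
    print(unwrap.sub(style_unwrap, html))
    # -> <p><i>A line hard-wrapped by the publisher</i> <i>continues here.</i></p>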

View File

@@ -17,7 +17,7 @@ from urllib import unquote
from calibre.ebooks.chardet import detect_xml_encoding
from calibre.constants import iswindows
-from calibre import unicode_path, as_unicode
+from calibre import unicode_path, as_unicode, replace_entities

class Link(object):
    '''
@@ -147,6 +147,7 @@ class HTMLFile(object):
                url = match.group(i)
                if url:
                    break
+            url = replace_entities(url)
            try:
                link = self.resolve(url)
            except ValueError:
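
The added replace_entities() call matters because hrefs are pulled out of raw markup by regex, so entities such as &amp; are still encoded when the link target is resolved. A minimal sketch; the tiny entity table is a stand-in for calibre's replace_entities():

    # Sketch: decode HTML entities in regex-extracted hrefs before
    # resolving them. ENTITIES is a minimal stand-in table.
    import re

    ENTITIES = {'&amp;': '&', '&lt;': '<', '&gt;': '>', '&quot;': '"'}

    def replace_entities(url):
        return re.sub('|'.join(ENTITIES), lambda m: ENTITIES[m.group()], url)

    m = re.search(r'href\s*=\s*["\']([^"\']+)', '<a href="notes&amp;refs.html">x</a>')
    print(replace_entities(m.group(1)))  # -> notes&refs.html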

View File

@@ -75,6 +75,20 @@ class Worker(Thread): # Get details {{{
                9: ['sept'],
                12: ['déc'],
            },
+            'br': {
+                1: ['janeiro'],
+                2: ['fevereiro'],
+                3: ['março'],
+                4: ['abril'],
+                5: ['maio'],
+                6: ['junho'],
+                7: ['julho'],
+                8: ['agosto'],
+                9: ['setembro'],
+                10: ['outubro'],
+                11: ['novembro'],
+                12: ['dezembro'],
+            },
            'es': {
                1: ['enero'],
                2: ['febrero'],
@@ -89,7 +103,7 @@ class Worker(Thread): # Get details {{{
                11: ['noviembre'],
                12: ['diciembre'],
            },
            'jp': {
                1: [u'1月'],
                2: [u'2月'],
                3: [u'3月'],
@@ -117,6 +131,7 @@ class Worker(Thread): # Get details {{{
                text()="Product details" or \
                text()="Détails sur le produit" or \
                text()="Detalles del producto" or \
+               text()="Detalhes do produto" or \
                text()="登録情報"]/../div[@class="content"]
            '''
        # Editor: is for Spanish
@@ -126,6 +141,7 @@ class Worker(Thread): # Get details {{{
                starts-with(text(), "Editore:") or \
                starts-with(text(), "Editeur") or \
                starts-with(text(), "Editor:") or \
+               starts-with(text(), "Editora:") or \
                starts-with(text(), "出版社:")]
            '''
        self.language_xpath = '''
@@ -141,7 +157,7 @@ class Worker(Thread): # Get details {{{
            '''

        self.ratings_pat = re.compile(
-            r'([0-9.]+) ?(out of|von|su|étoiles sur|つ星のうち|de un máximo de) ([\d\.]+)( (stars|Sternen|stelle|estrellas)){0,1}')
+            r'([0-9.]+) ?(out of|von|su|étoiles sur|つ星のうち|de un máximo de|de) ([\d\.]+)( (stars|Sternen|stelle|estrellas|estrelas)){0,1}')

        lm = {
            'eng': ('English', 'Englisch'),
@@ -150,6 +166,7 @@ class Worker(Thread): # Get details {{{
            'deu': ('German', 'Deutsch'),
            'spa': ('Spanish', 'Espa\xf1ol', 'Espaniol'),
            'jpn': ('Japanese', u'日本語'),
+            'por': ('Portuguese', 'Português'),
        }
        self.lang_map = {}
        for code, names in lm.iteritems():
@@ -435,7 +452,7 @@ class Worker(Thread): # Get details {{{

    def parse_cover(self, root):
-        imgs = root.xpath('//img[@id="prodImage" and @src]')
+        imgs = root.xpath('//img[(@id="prodImage" or @id="original-main-image") and @src]')
        if imgs:
            src = imgs[0].get('src')
            if '/no-image-avail' not in src:
@@ -505,6 +522,7 @@ class Amazon(Source):
            'it' : _('Italy'),
            'jp' : _('Japan'),
            'es' : _('Spain'),
+            'br' : _('Brazil'),
            }

    options = (
@@ -570,6 +588,8 @@ class Amazon(Source):
                url = 'http://amzn.com/'+asin
            elif domain == 'uk':
                url = 'http://www.amazon.co.uk/dp/'+asin
+            elif domain == 'br':
+                url = 'http://www.amazon.com.br/dp/'+asin
            else:
                url = 'http://www.amazon.%s/dp/%s'%(domain, asin)
            if url:
@@ -629,7 +649,7 @@ class Amazon(Source):
            q['field-isbn'] = isbn
        else:
            # Only return book results
-            q['search-alias'] = 'stripbooks'
+            q['search-alias'] = 'digital-text' if domain == 'br' else 'stripbooks'
            if title:
                title_tokens = list(self.get_title_tokens(title))
                if title_tokens:
@@ -661,6 +681,8 @@ class Amazon(Source):
            udomain = 'co.uk'
        elif domain == 'jp':
            udomain = 'co.jp'
+        elif domain == 'br':
+            udomain = 'com.br'
        url = 'http://www.amazon.%s/s/?'%udomain + urlencode(encoded_q)
        return url, domain
@@ -978,6 +1000,16 @@ if __name__ == '__main__': # tests {{{
            ),
    ] # }}}

+    br_tests = [ # {{{
+        (
+            {'title':'Guerra dos Tronos'},
+            [title_test('A Guerra dos Tronos - As Crônicas de Gelo e Fogo',
+                exact=True), authors_test(['George R. R. Martin'])
+            ]
+        ),
+    ] # }}}
+
    def do_test(domain, start=0, stop=None):
        tests = globals().get(domain+'_tests')
        if stop is None:
@@ -988,7 +1020,7 @@ if __name__ == '__main__': # tests {{{
    do_test('com')

-    #do_test('de')
+    # do_test('de')

    # }}}
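
All the Brazil support above keys off the same two-letter store code: month names and XPath variants for parsing, plus a com.br entry in the domain mapping used to build URLs. A condensed sketch of the URL side (asin_url is a hypothetical helper, not calibre API, and the ASIN is made up):

    # Sketch of the store-code -> product URL mapping in the hunks above.
    def asin_url(domain, asin):
        if domain == 'com':
            return 'http://amzn.com/' + asin
        udomain = {'uk': 'co.uk', 'jp': 'co.jp', 'br': 'com.br'}.get(domain, domain)
        return 'http://www.amazon.%s/dp/%s' % (udomain, asin)

    print(asin_url('br', 'B00EXAMPLE'))  # made-up ASIN
    # -> http://www.amazon.com.br/dp/B00EXAMPLE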

View File

@@ -483,8 +483,8 @@ def identify(log, abort, # {{{
    log('The identify phase took %.2f seconds'%(time.time() - start_time))
    log('The longest time (%f) was taken by:'%longest, lp)
-    log('Merging results from different sources and finding earliest',
-            'publication dates from the xisbn service')
+    log('Merging results from different sources and finding earliest ',
+            'publication dates from the worldcat.org service')
    start_time = time.time()
    results = merge_identify_results(results, log)

View File

@@ -126,6 +126,7 @@ class EXTHHeader(object): # {{{
            elif idx == 113: # ASIN or other id
                try:
                    self.uuid = content.decode('ascii')
+                    self.mi.set_identifier('mobi-asin', self.uuid)
                except:
                    self.uuid = None
            elif idx == 116:

View File

@@ -74,11 +74,12 @@ def remove_kindlegen_markup(parts):
        part = "".join(srcpieces)
        parts[i] = part

-    # we can safely remove all of the Kindlegen generated data-AmznPageBreak tags
+    # we can safely remove all of the Kindlegen generated data-AmznPageBreak
+    # attributes
    find_tag_with_AmznPageBreak_pattern = re.compile(
        r'''(<[^>]*\sdata-AmznPageBreak=[^>]*>)''', re.IGNORECASE)
    within_tag_AmznPageBreak_position_pattern = re.compile(
-        r'''\sdata-AmznPageBreak=['"][^'"]*['"]''')
+        r'''\sdata-AmznPageBreak=['"]([^'"]*)['"]''')

    for i in xrange(len(parts)):
        part = parts[i]
@@ -86,10 +87,8 @@ def remove_kindlegen_markup(parts):
        for j in range(len(srcpieces)):
            tag = srcpieces[j]
            if tag.startswith('<'):
-                for m in within_tag_AmznPageBreak_position_pattern.finditer(tag):
-                    replacement = ''
-                    tag = within_tag_AmznPageBreak_position_pattern.sub(replacement, tag, 1)
-                srcpieces[j] = tag
+                srcpieces[j] = within_tag_AmznPageBreak_position_pattern.sub(
+                    lambda m:' style="page-break-after:%s"'%m.group(1), tag)
        part = "".join(srcpieces)
        parts[i] = part
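
The behavioural change in this file: the value of data-AmznPageBreak is no longer discarded but rewritten into an equivalent CSS declaration, so Kindlegen page-break hints survive conversion. Applying the new pattern and substitution to a sample tag:

    # The capture group added to the pattern feeds the CSS replacement.
    import re

    pat = re.compile(r'''\sdata-AmznPageBreak=['"]([^'"]*)['"]''')
    tag = '<div data-AmznPageBreak="avoid">'
    print(pat.sub(lambda m: ' style="page-break-after:%s"' % m.group(1), tag))
    # -> <div style="page-break-after:avoid">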

View File

@@ -44,6 +44,18 @@ def locate_beg_end_of_tag(ml, aid):
            return plt, pgt
    return 0, 0

+def reverse_tag_iter(block):
+    ''' Iterate over all tags in block in reverse order, i.e. last tag
+    to first tag. '''
+    end = len(block)
+    while True:
+        pgt = block.rfind(b'>', 0, end)
+        if pgt == -1: break
+        plt = block.rfind(b'<', 0, pgt)
+        if plt == -1: break
+        yield block[plt:pgt+1]
+        end = plt
+
class Mobi8Reader(object):

    def __init__(self, mobi6_reader, log):
@@ -275,13 +287,12 @@ class Mobi8Reader(object):
        return '%s/%s'%(fi.type, fi.filename), idtext

    def get_id_tag(self, pos):
-        # find the correct tag by actually searching in the destination
-        # textblock at position
+        # Find the first tag with a named anchor (name or id attribute) before
+        # pos
        fi = self.get_file_info(pos)
        if fi.num is None and fi.start is None:
            raise ValueError('No file contains pos: %d'%pos)
        textblock = self.parts[fi.num]
-        id_map = []
        npos = pos - fi.start
        pgt = textblock.find(b'>', npos)
        plt = textblock.find(b'<', npos)
@@ -290,28 +301,15 @@ class Mobi8Reader(object):
        if plt == npos or pgt < plt:
            npos = pgt + 1
        textblock = textblock[0:npos]
-        # find id links only inside of tags
-        # inside any < > pair find all "id=' and return whatever is inside
-        # the quotes
-        id_pattern = re.compile(br'''<[^>]*\sid\s*=\s*['"]([^'"]*)['"][^>]*>''',
-                re.IGNORECASE)
-        for m in re.finditer(id_pattern, textblock):
-            id_map.append((m.start(), m.group(1)))
-        if not id_map:
-            # Found no id in the textblock, link must be to top of file
-            return b''
-        # if npos is before first id= inside a tag, return the first
-        if npos < id_map[0][0]:
-            return id_map[0][1]
-        # if npos is after the last id= inside a tag, return the last
-        if npos > id_map[-1][0]:
-            return id_map[-1][1]
-        # otherwise find last id before npos
-        for i, item in enumerate(id_map):
-            if npos < item[0]:
-                return id_map[i-1][1]
-        return id_map[0][1]
+        id_re = re.compile(br'''<[^>]+\sid\s*=\s*['"]([^'"]+)['"]''')
+        name_re = re.compile(br'''<\s*a\s*\sname\s*=\s*['"]([^'"]+)['"]''')
+        for tag in reverse_tag_iter(textblock):
+            m = id_re.match(tag) or name_re.match(tag)
+            if m is not None:
+                return m.group(1)
+        # No tag found, link to start of file
+        return b''

    def create_guide(self):
        guide = Guide()
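
reverse_tag_iter() is what lets the rewritten get_id_tag() stay linear and simple: walking tags backwards from the link position, the first id or name hit is by construction the nearest preceding anchor, so the old id_map bookkeeping disappears. A self-contained demonstration:

    # Demo of the reverse scan used by the new get_id_tag().
    import re

    def reverse_tag_iter(block):  # as added above
        end = len(block)
        while True:
            pgt = block.rfind(b'>', 0, end)
            if pgt == -1: break
            plt = block.rfind(b'<', 0, pgt)
            if plt == -1: break
            yield block[plt:pgt+1]
            end = plt

    id_re = re.compile(br'''<[^>]+\sid\s*=\s*['"]([^'"]+)['"]''')
    textblock = b'<p id="one">first</p><p id="two">second</p><p>third'
    for tag in reverse_tag_iter(textblock):
        m = id_re.match(tag)
        if m is not None:
            print(m.group(1))  # nearest anchor before the end: 'two'
            break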

View File

@@ -92,6 +92,31 @@ class BookIndexing
        this.last_check = [body.scrollWidth, body.scrollHeight]
        return ans

+    all_links_and_anchors: () ->
+        body = document.body
+        links = []
+        anchors = {}
+        for a in document.querySelectorAll("body a[href], body [id], body a[name]")
+            if window.paged_display?.in_paged_mode
+                geom = window.paged_display.column_location(a)
+            else
+                br = a.getBoundingClientRect()
+                [left, top] = viewport_to_document(br.left, br.top, a.ownerDocument)
+                geom = {'left':left, 'top':top, 'width':br.right-br.left, 'height':br.bottom-br.top}
+            href = a.getAttribute('href')
+            if href
+                links.push([href, geom])
+            id = a.getAttribute("id")
+            if id and id not in anchors
+                anchors[id] = geom
+            if a.tagName in ['A', "a"]
+                name = a.getAttribute("name")
+                if name and name not in anchors
+                    anchors[name] = geom
+        return {'links':links, 'anchors':anchors}
+
if window?
    window.book_indexing = new BookIndexing()

View File

@@ -242,6 +242,18 @@ class PagedDisplay
        # Return the number of the column that contains xpos
        return Math.floor(xpos/this.page_width)

+    column_location: (elem) ->
+        # Return the location of elem relative to its containing column
+        br = elem.getBoundingClientRect()
+        [left, top] = calibre_utils.viewport_to_document(br.left, br.top, elem.ownerDocument)
+        c = this.column_at(left)
+        width = Math.min(br.right, (c+1)*this.page_width) - br.left
+        if br.bottom < br.top
+            br.bottom = window.innerHeight
+        height = Math.min(br.bottom, window.innerHeight) - br.top
+        left -= c*this.page_width
+        return {'column':c, 'left':left, 'top':top, 'width':width, 'height':height}
+
    column_boundaries: () ->
        # Return the column numbers at the left edge and after the right edge
        # of the viewport
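
column_location() reduces an element's viewport rectangle to column-relative coordinates: integer-divide the document x by page_width to find the column, clip the rectangle at that column's right edge, then subtract the column origin. The same arithmetic transcribed to Python, with illustrative names and sample numbers:

    # Column arithmetic from column_location(), transcribed for clarity.
    def column_location(left, top, right, bottom, page_width, viewport_height):
        c = int(left // page_width)                      # containing column
        width = min(right, (c + 1) * page_width) - left  # clip at column edge
        if bottom < top:                                 # degenerate rect guard
            bottom = viewport_height
        height = min(bottom, viewport_height) - top
        return {'column': c, 'left': left - c * page_width,
                'top': top, 'width': width, 'height': height}

    print(column_location(1250, 80, 1400, 200, page_width=600,
                          viewport_height=800))
    # -> {'column': 2, 'left': 50, 'top': 80, 'width': 150, 'height': 120}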

View File

@@ -320,13 +320,11 @@ class OEBReader(object):
                self.logger.warn(u'Spine item %r not found' % idref)
                continue
            item = manifest.ids[idref]
-            spine.add(item, elem.get('linear'))
-        for item in spine:
-            if item.media_type.lower() not in OEB_DOCS:
-                if not hasattr(item.data, 'xpath'):
-                    self.oeb.log.warn('The item %s is not a XML document.'
-                            ' Removing it from spine.'%item.href)
-                    spine.remove(item)
+            if item.media_type.lower() in OEB_DOCS and hasattr(item.data, 'xpath'):
+                spine.add(item, elem.get('linear'))
+            else:
+                self.oeb.log.warn('The item %s is not a XML document.'
+                        ' Removing it from spine.'%item.href)
        if len(spine) == 0:
            raise OEBError("Spine is empty")
        self._spine_add_extra()
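
The spine rewrite validates candidates up front instead of adding everything and pruning in a second pass, which also avoids removing items from the spine while iterating over it. A runnable sketch with stand-in objects (Item and FakeTree are not calibre classes, and the OEB_DOCS set is simplified):

    # Validate-before-add, with stand-in objects for illustration.
    OEB_DOCS = {'application/xhtml+xml'}  # simplified; calibre's set is larger

    class FakeTree(object):
        def xpath(self, expr):  # the API surface the hasattr() check probes
            return []

    class Item(object):
        def __init__(self, href, media_type, data):
            self.href, self.media_type, self.data = href, media_type, data

    spine = []
    for item in (Item('a.html', 'application/xhtml+xml', FakeTree()),
                 Item('logo.png', 'image/png', b'...')):
        if item.media_type.lower() in OEB_DOCS and hasattr(item.data, 'xpath'):
            spine.append(item)
        else:
            print('The item %s is not a XML document.'
                  ' Removing it from spine.' % item.href)
    print([i.href for i in spine])  # -> ['a.html']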

Some files were not shown because too many files have changed in this diff.