Sync to trunk.

John Schember 2011-03-06 08:04:03 -05:00
commit 5064222036
292 changed files with 51669 additions and 36067 deletions

@ -19,6 +19,169 @@
# new recipes:
# - title:
- version: 0.7.48
date: 2011-03-04
new features:
- title: "Changes to the internal database structure used by calibre"
description: >
"These changes will allow calibre, in the future, to support book language, arbitrary book identifiers and keep track of when the metadata for a book was last modified. WARNING: Because of these changes, if you downgrade calibre versions after upgrading to 0.7.48, you will lose any changes you make to the ISBN of book entries in your calibre database, so do not downgrade unless you really have to. Also note that the first time you start calibre after this update, the startup will be slow as the database structure is being changed."
- title: "Launch of a new website that catalogues DRM free ebooks. http://drmfree.calibre-ebook.com"
description: "A growing catalogue of DRM free ebooks. Ebooks that you actually own after paying, instead of just renting."
type: major
- title: "News download: Add an option to keep at most x issues of a particular periodical in the calibre library. Use the Advanced tab in the Fetch news dialog for your news source to set this option."
tickets: [9168]
- title: "You can now right click on the cover in the book details panel to copy/paste a new cover."
tickets: [9255]
- title: "Add an entry to the add books drop down menu to easily add formats to an existing book record"
- title: "Tag browser: Clicking on a nested category now searches for the category alone. Clicking twice searches for the category and all its descendants and so on."
tickets: [9166, 9169]
- title: "Add a button to the Manage authors dialog to copy author sort values to author"
- title: "Decrease startup times on large libraries by using a faster algorithm to parse stored dates"
- title: "Add quick create links to easily create custom columns of commonly used types to the add custom column dialog"
- title: "Allow drag drop of images to change cover in book details window."
tickets: [9226]
- title: "Device subsystem: Create a drive info file named driveinfo.calibre in the root of each device drive for USB connected devices. This file contains various useful data. API Change: The open method of the device plugins now accepts an extra parameter library_uuid which is the id of the calibre library connected to the device"
bug fixes:
- title: "Conversion pipeline: Fix regression in 0.7.46 that caused loss of some CSS information when converting HTML produced by Microsoft Word. Also remove empty tags from Microsoft namespaces when parsing HTML"
- title: "Try harder to ensure that the worker log temporary files are deleted on Windows"
- title: "CHM Input: Handle CHM files that don't specify a topics file."
tickets: [9253]
- title: "Fix regression that caused memory leak in Tag Browser. This would show up as the memory usage of calibre increasing when switching libraries."
tickets: [9246]
- title: "Fix bug that caused preferences->behavior to not show the output format set by the welcome wizard, and instead default to showing EPUB"
- title: "Fix bug that caused wrong books to be deleted from library if you choose 'delete from library and device' while the library is sorted by the On device column"
- title: "MOBI Input: Ignore all ASCII control codes except CR, NL and Tab."
tickets: [9219]
improved recipes:
- Credit Slips
- Seattle Times
- MacWorld
- Austin Statesman
- EPL Talk
- Gawker
- Deadspin
new recipes:
- title: "Thai Post Today and Daily Post"
author: "Chotechai P."
- title: "RBC.ru"
author: Chewi
- title: Helsingin Sanomat
author: oneillpt
- title: "LWN Weekly"
author: David Cavalca
- title: "New York Times Sports and Technology Blogs"
author: rylsfan
- title: "Historia and Bucataras"
author: Silviu Cotoara
- title: "Buffalo News"
author: ChappyOnIce
- title: "Dotpod"
author: Federico Escalada
- version: 0.7.47
date: 2011-02-25
new features:
- title: "Tag Browser: Support the creation of nested User Categories"
description: "See http://calibre-ebook.com/user_manual/gui.html#tag-browser for details"
type: major
- title: "Disable Kent District Library plugin to download series information. The website could not handle the load calibre's 2 million users put on it. You can manually re-enable it if you really want series information, but it is very slow"
- title: "Drivers for the Wexler T7001, Archos 7, Wink and Xperia X10"
- title: "Comic Input: Add option to not add links to individual pages to the Table of Contents when converting CBC files"
- title: "EPUB Output: Try to ensure that the cover image always has an id='cover' to work around a Nook cover reading bug."
tickets: [8182]
- title: "ODT input: Update odfpy library to latest version, adds support for bookmarks"
- title: "EPUB Output: Remove unnecessary CSS page breaks as they confuse the latest release of iBooks"
bug fixes:
- title: "Fix regression in 0.7.46 that broke creating date and composite custom columns"
- title: "Linux binary build: Fix ImageMagick trying to load system modules instead of bundled modules"
- title: "Kobo driver: Handle missing firmware version file"
- title: "ODT Input: Do not force the background color to white."
tickets: [9118]
- title: "MOBI Input: Do not specify text-align for every paragraph. Fixes text-align inheritance issues for newer MOBIs with nested divs."
tickets: [9098]
- title: "EPUB Output: Do not set the file-as attribute on title elements in the OPF as the current OPF spec does not support file-as. Instead use a calibre extension to OPF."
tickets: [9109]
- title: "Content server: Fix regression that broke browsing User Categories via OPDS"
tickets: [9090]
- title: "Update the book details panel after adding books in case automerge is turned on and the current book is affected"
tickets: [9073]
- title: "FB2 Output: Fix paragraph spacing sometimes being incorrect."
tickets: [8927]
- title: "Tag Browser: Fix generation of search query for authors with quote characters in their names"
tickets: [9071]
- title: "Fix bug that could cause download of cover/social metadata from Amazon to sometimes fail"
- title: "LRF Input: Workaround for broken LRF files from BookDesigner that have incomplete TextStyle elements"
improved recipes:
- Le Monde
- Gizmodo
- Lifehacker
- ESPN
- Adevarul
- gsp.ro
- Ming Pao
new recipes:
- title: "Flickr Blog"
author: Ricardo Jurado
- title: "Various Romanian news sources"
author: Silviu Cotoara
- title: "Osnews.pl and SwiatCzytnikow"
author: Tomasz Dlugosz
- title: "Roger Ebert Journal"
author: Shane Erstad
- version: 0.7.46
date: 2011-02-18
@ -60,6 +223,8 @@
- title: "TXT Input: New paragraph-type option (off) to disable modifying the paragraph structure."
- title: "Device driver for the Kendo/Yifang M7 and the Wolder Mibuk Life"
- title: "For people building calibre from source, note that calibre now requires SIP >= 4.12 to build"
bug fixes:
- title: "Fix main memory and storage card for Cybook Orizon being swapped with some firmwares"

@ -108,8 +108,10 @@ function init() {
function toplevel_layout() {
var last = $(".toplevel li").last();
var title = $('.toplevel h3').first();
var bottom = last.position().top + last.height() - title.position().top;
$("#main").height(Math.max(200, bottom+75));
if (title && title.position()) {
var bottom = last.position().top + last.height() - title.position().top;
$("#main").height(Math.max(200, bottom+75));
}
}
function toplevel() {

@ -83,7 +83,7 @@ categories_use_field_for_author_name = 'author'
# Note that the "r'" in front of the { is necessary if there are backslashes
# (\ characters) in the template. It doesn't hurt anything to leave it there
# even if there aren't any backslashes.
categories_collapsed_name_template = r'{first.sort:shorten(4,"",0)} - {last.sort:shorten(4,"",0)}'
categories_collapsed_name_template = r'{first.sort:shorten(4,,0)} - {last.sort:shorten(4,,0)}'
categories_collapsed_rating_template = r'{first.avg_rating:4.2f:ifempty(0)} - {last.avg_rating:4.2f:ifempty(0)}'
categories_collapsed_popularity_template = r'{first.count:d} - {last.count:d}'
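To make the effect of the `shorten(4,,0)` call in the collapsed-name template easier to see, here is a rough pure-Python approximation. It assumes the behaviour described in the calibre template language documentation (keep some leading characters, then a middle string, then some trailing characters, and leave short values untouched). The function name and signature are illustrative only and are not calibre's own code.

```python
def shorten(value, left_chars, middle_text, right_chars):
    """Illustrative stand-in for the template function shorten().

    Keeps `left_chars` characters from the start, then `middle_text`,
    then `right_chars` characters from the end. Values that are already
    short enough are returned unchanged.
    """
    left = max(0, int(left_chars))
    right = max(0, int(right_chars))
    if len(value) > left + len(middle_text) + right:
        tail = value[-right:] if right else ''
        return value[:left] + middle_text + tail
    return value


# With (4, '', 0), as in the tweak above, only the first four characters
# of each sort value survive.
first, last = 'Asimov, Isaac', 'Zelazny, Roger'
label = '%s - %s' % (shorten(first, 4, '', 0), shorten(last, 4, '', 0))
print(label)
```

Under these assumptions, a collapsed group running from "Asimov, Isaac" to "Zelazny, Roger" would be labelled "Asim - Zela".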
@ -349,3 +349,9 @@ public_smtp_relay_delay = 301
# after a restart of calibre.
draw_hidden_section_indicators = True
#: The maximum width and height for covers saved in the calibre library
# All covers in the calibre library will be resized, preserving aspect ratio,
# to fit within this size. This is to prevent slowdowns caused by extremely
# large covers
maximum_cover_size = (1200, 1600)
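The comment above says covers are resized to fit within `maximum_cover_size` while preserving aspect ratio. A minimal sketch of that size calculation follows, assuming covers that already fit are left alone. The helper name `fit_within` is hypothetical, and this is not calibre's actual resizing code.

```python
def fit_within(width, height, max_size=(1200, 1600)):
    """Return (new_width, new_height) that fits inside max_size.

    The aspect ratio is preserved. Integer arithmetic is used so the
    result does not depend on floating-point rounding.
    """
    max_w, max_h = max_size
    if width <= max_w and height <= max_h:
        return width, height  # already small enough
    if width * max_h >= height * max_w:
        # Width is the binding constraint.
        return max_w, max(1, height * max_w // width)
    # Height is the binding constraint.
    return max(1, width * max_h // height), max_h


print(fit_within(2400, 3200))
print(fit_within(3000, 1600))
print(fit_within(600, 800))
```

In this sketch a 2400x3200 cover is halved to exactly the limit, a wide 3000x1600 cover is limited by its width, and a 600x800 cover is returned unchanged.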

Binary file not shown.

After

Width:  |  Height:  |  Size: 6.3 KiB


@ -0,0 +1,68 @@
__license__ = 'GPL v3'
__copyright__ = '2011, Darko Miletic <darko.miletic at gmail.com>'
'''
www.20minutos.es
'''

from calibre.web.feeds.news import BasicNewsRecipe

class t20Minutos(BasicNewsRecipe):
    title = '20 Minutos'
    __author__ = 'Darko Miletic'
    description = 'Diario de informacion general y local mas leido de Espania, noticias de ultima hora de Espania, el mundo, local, deportes, noticias curiosas y mas'
    publisher = '20 Minutos Online SL'
    category = 'news, politics, Spain'
    oldest_article = 2
    max_articles_per_feed = 200
    no_stylesheets = True
    encoding = 'utf8'
    use_embedded_content = True
    language = 'es'
    remove_empty_feeds = True
    publication_type = 'newspaper'
    masthead_url = 'http://estaticos.20minutos.es/css4/img/ui/logo-301x54.png'
    extra_css = """
        body{font-family: Arial,Helvetica,sans-serif }
        img{margin-bottom: 0.4em; display:block}
    """

    conversion_options = {
          'comment' : description
        , 'tags' : category
        , 'publisher' : publisher
        , 'language' : language
    }

    remove_tags = [dict(attrs={'class':'mf-viral'})]
    remove_attributes=['border']

    feeds = [
          (u'Principal' , u'http://20minutos.feedsportal.com/c/32489/f/478284/index.rss')
        ,(u'Cine' , u'http://20minutos.feedsportal.com/c/32489/f/478285/index.rss')
        ,(u'Internacional' , u'http://20minutos.feedsportal.com/c/32489/f/492689/index.rss')
        ,(u'Deportes' , u'http://20minutos.feedsportal.com/c/32489/f/478286/index.rss')
        ,(u'Nacional' , u'http://20minutos.feedsportal.com/c/32489/f/492688/index.rss')
        ,(u'Economia' , u'http://20minutos.feedsportal.com/c/32489/f/492690/index.rss')
        ,(u'Tecnologia' , u'http://20minutos.feedsportal.com/c/32489/f/478292/index.rss')
    ]

    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll('a'):
            limg = item.find('img')
            if item.string is not None:
                str = item.string
                item.replaceWith(str)
            else:
                if limg:
                    item.name = 'div'
                    item.attrs = []
                else:
                    str = self.tag_to_string(item)
                    item.replaceWith(str)
        for item in soup.findAll('img'):
            if not item.has_key('alt'):
                item['alt'] = 'image'
        return soup

@ -0,0 +1,51 @@
# -*- coding: utf-8 -*-
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = u'2011, Silviu Cotoar\u0103'
'''
sapteseri.ro
'''
from calibre.web.feeds.news import BasicNewsRecipe
class SapteSeri(BasicNewsRecipe):
title = u'Sapte Seri'
__author__ = u'Silviu Cotoar\u0103'
description = u'Sapte Seri'
publisher = u'Sapte Seri'
oldest_article = 5
language = 'ro'
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
category = 'Ziare,Oras,Distractie,Fun'
encoding = 'utf-8'
remove_empty_feeds = True
remove_javascript = True
cover_url = 'http://www.sapteseri.ro/Images/logo.jpg'
conversion_options = {
'comments' : description
,'tags' : category
,'language' : language
,'publisher' : publisher
}
keep_only_tags = [
dict(name='h1', attrs={'id':'title'})
, dict(name='div', attrs={'class':'mt10 mb10'})
, dict(name='div', attrs={'class':'mb20 mt10'})
, dict(name='div', attrs={'class':'mt5 mb20'})
]
remove_tags = [
dict(name='div', attrs={'id':['entityimgworking']})
]
feeds = [
(u'Ce se intampla azi in Bucuresti', u'http://www.sapteseri.ro/ro/feed/ce-se-intampla-azi/bucuresti/')
]
def preprocess_html(self, soup):
return self.adeify_images(soup)

@ -32,16 +32,25 @@ class Adevarul(BasicNewsRecipe):
}
keep_only_tags = [ dict(name='div', attrs={'class':'article_header'})
,dict(name='div', attrs={'class':'bd'})
,dict(name='div', attrs={'class':'bb-tu first-t bb-article-body'})
]
remove_tags = [ dict(name='div', attrs={'class':'bb-wg-article_related_attachements'})
remove_tags = [
dict(name='li', attrs={'class':'author'})
,dict(name='li', attrs={'class':'date'})
,dict(name='li', attrs={'class':'comments'})
,dict(name='div', attrs={'class':'bb-wg-article_related_attachements'})
,dict(name='div', attrs={'class':'bb-md bb-md-article_comments'})
,dict(name='form', attrs={'id':'bb-comment-create-form'})
]
,dict(name='form', attrs={'id':'bb-comment-create-form'})
,dict(name='div', attrs={'id':'mediatag'})
,dict(name='div', attrs={'id':'ft'})
,dict(name='div', attrs={'id':'comment_wrapper'})
]
remove_tags_after = [ dict(name='form', attrs={'id':'bb-comment-create-form'}) ]
remove_tags_after = [
dict(name='div', attrs={'id':'comment_wrapper'}),
]
feeds = [ (u'\u0218tiri', u'http://www.adevarul.ro/rss/latest') ]

@ -0,0 +1,51 @@
# -*- coding: utf-8 -*-
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = u'2011, Silviu Cotoar\u0103'
'''
aventurilapescuit.ro
'''
from calibre.web.feeds.news import BasicNewsRecipe
class AventuriLaPescuit(BasicNewsRecipe):
title = u'Aventuri La Pescuit'
__author__ = u'Silviu Cotoar\u0103'
description = 'Aventuri La Pescuit'
publisher = 'Aventuri La Pescuit'
oldest_article = 5
language = 'ro'
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
category = 'Ziare,Pescuit,Hobby'
encoding = 'utf-8'
cover_url = 'http://www.aventurilapescuit.ro/images/logo.gif'
conversion_options = {
'comments' : description
,'tags' : category
,'language' : language
,'publisher' : publisher
}
keep_only_tags = [
dict(name='div', attrs={'id':'Article'})
]
remove_tags = [
dict(name='div', attrs={'class':['right option']})
, dict(name='iframe', attrs={'scrolling':['no']})
]
remove_tags_after = [
dict(name='iframe', attrs={'scrolling':['no']})
]
feeds = [
(u'Feeds', u'http://www.aventurilapescuit.ro/sections/rssread/1')
]
def preprocess_html(self, soup):
return self.adeify_images(soup)

@ -0,0 +1,56 @@
# -*- coding: utf-8 -*-
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = u'2011, Silviu Cotoar\u0103'
'''
bucataras.ro
'''
from calibre.web.feeds.news import BasicNewsRecipe
class Bucataras(BasicNewsRecipe):
title = u'Bucataras'
__author__ = u'Silviu Cotoar\u0103'
description = ''
publisher = 'Bucataras'
oldest_article = 5
language = 'ro'
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
category = 'Ziare,Bucatarie,Retete'
encoding = 'utf-8'
cover_url = 'http://www.bucataras.ro/templates/default/images/pink/logo.jpg'
conversion_options = {
'comments' : description
,'tags' : category
,'language' : language
,'publisher' : publisher
}
keep_only_tags = [
dict(name='h1', attrs={'class':'titlu'})
, dict(name='div', attrs={'class':'contentL'})
, dict(name='div', attrs={'class':'contentBottom'})
]
remove_tags = [
dict(name='div', attrs={'class':['sociale']})
, dict(name='div', attrs={'class':['contentR']})
, dict(name='a', attrs={'target':['_self']})
, dict(name='div', attrs={'class':['comentarii']})
]
remove_tags_after = [
dict(name='div', attrs={'class':['comentarii']})
]
feeds = [
(u'Feeds', u'http://www.bucataras.ro/rss/retete/')
]
def preprocess_html(self, soup):
return self.adeify_images(soup)

@ -0,0 +1,58 @@
__license__ = 'GPL v3'
__author__ = 'Todd Chapman'
__copyright__ = 'Todd Chapman'
__version__ = 'v0.2'
__date__ = '2 March 2011'
'''
http://www.buffalonews.com/RSS/
'''
from calibre.web.feeds.news import BasicNewsRecipe
class AdvancedUserRecipe1298680852(BasicNewsRecipe):
title = u'Buffalo News'
oldest_article = 2
language = 'en'
__author__ = 'ChappyOnIce'
max_articles_per_feed = 20
encoding = 'utf-8'
masthead_url = 'http://www.buffalonews.com/buffalonews/skins/buffalonews/images/masthead/the_buffalo_news_logo.png'
remove_javascript = True
extra_css = 'body {text-align: justify;}\n \
p {text-indent: 20px;}'
keep_only_tags = [
dict(name='div', attrs={'class':['main-content-left']})
]
remove_tags = [
dict(name='div', attrs={'id':['commentCount']}),
dict(name='div', attrs={'class':['story-list-links']})
]
remove_tags_after = dict(name='div', attrs={'class':['body storyContent']})
feeds = [(u'City of Buffalo', u'http://www.buffalonews.com/city/communities/buffalo/?widget=rssfeed&view=feed&contentId=77944'),
(u'Southern Erie County', u'http://www.buffalonews.com/city/communities/southern-erie/?widget=rssfeed&view=feed&contentId=77944'),
(u'Eastern Erie County', u'http://www.buffalonews.com/city/communities/eastern-erie/?widget=rssfeed&view=feed&contentId=77944'),
(u'Southern Tier', u'http://www.buffalonews.com/city/communities/southern-tier/?widget=rssfeed&view=feed&contentId=77944'),
(u'Niagara County', u'http://www.buffalonews.com/city/communities/niagara-county/?widget=rssfeed&view=feed&contentId=77944'),
(u'Business', u'http://www.buffalonews.com/business/?widget=rssfeed&view=feed&contentId=77944'),
(u'MoneySmart', u'http://www.buffalonews.com/business/moneysmart/?widget=rssfeed&view=feed&contentId=77944'),
(u'Bills & NFL', u'http://www.buffalonews.com/sports/bills-nfl/?widget=rssfeed&view=feed&contentId=77944'),
(u'Sabres & NHL', u'http://www.buffalonews.com/sports/sabres-nhl/?widget=rssfeed&view=feed&contentId=77944'),
(u'Bob DiCesare', u'http://www.buffalonews.com/sports/columns/bob-dicesare/?widget=rssfeed&view=feed&contentId=77944'),
(u'Bucky Gleason', u'http://www.buffalonews.com/sports/columns/bucky-gleason/?widget=rssfeed&view=feed&contentId=77944'),
(u'Mark Gaughan', u'http://www.buffalonews.com/sports/bills-nfl/inside-the-nfl/?widget=rssfeed&view=feed&contentId=77944'),
(u'Mike Harrington', u'http://www.buffalonews.com/sports/columns/mike-harrington/?widget=rssfeed&view=feed&contentId=77944'),
(u'Jerry Sullivan', u'http://www.buffalonews.com/sports/columns/jerry-sullivan/?widget=rssfeed&view=feed&contentId=77944'),
(u'Other Sports Columns', u'http://www.buffalonews.com/sports/columns/other-sports-columns/?widget=rssfeed&view=feed&contentId=77944'),
(u'Life', u'http://www.buffalonews.com/life/?widget=rssfeed&view=feed&contentId=77944'),
(u'Bruce Andriatch', u'http://www.buffalonews.com/city/columns/bruce-andriatch/?widget=rssfeed&view=feed&contentId=77944'),
(u'Donn Esmonde', u'http://www.buffalonews.com/city/columns/donn-esmonde/?widget=rssfeed&view=feed&contentId=77944'),
(u'Rod Watson', u'http://www.buffalonews.com/city/columns/rod-watson/?widget=rssfeed&view=feed&contentId=77944'),
(u'Entertainment', u'http://www.buffalonews.com/entertainment/?widget=rssfeed&view=feed&contentId=77944'),
(u'Off Main Street', u'http://www.buffalonews.com/city/columns/off-main-street/?widget=rssfeed&view=feed&contentId=77944'),
(u'Editorials', u'http://www.buffalonews.com/editorial-page/buffalo-news-editorials/?widget=rssfeed&view=feed&contentId=77944')
]

@ -0,0 +1,52 @@
# -*- coding: utf-8 -*-
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = u'2011, Silviu Cotoar\u0103'
'''
chip.ro
'''
from calibre.web.feeds.news import BasicNewsRecipe
class ChipRo(BasicNewsRecipe):
title = u'Chip Online'
__author__ = u'Silviu Cotoar\u0103'
description = 'Chip Online'
publisher = 'Chip Online'
oldest_article = 5
language = 'ro'
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
category = 'Ziare,Reviste,IT'
encoding = 'utf-8'
cover_url = 'http://www.chip.ro/images/logo.png'
conversion_options = {
'comments' : description
,'tags' : category
,'language' : language
,'publisher' : publisher
}
keep_only_tags = [
dict(name='h2', attrs={'class':'contentheading clearfix'})
, dict(name='span', attrs={'class':'createby'})
, dict(name='div', attrs={'class':'article-content'})
]
remove_tags = [
dict(name='div', attrs={'class':['sharemecompactbutton']})
,dict(name='div', attrs={'align':['left']})
,dict(name='div', attrs={'align':['center']})
,dict(name='th', attrs={'class':['pagenav_prev']})
,dict(name='table', attrs={'class':['pagenav']})
]
feeds = [
(u'Feeds', u'http://www.chip.ro/index.php?option=com_ninjarsssyndicator&feed_id=9&format=raw')
]
def preprocess_html(self, soup):
return self.adeify_images(soup)

@ -1,35 +1,44 @@
#!/usr/bin/env python
__license__ = 'GPL 3'
__copyright__ = 'zotzot'
__copyright__ = 'zotzo'
__docformat__ = 'restructuredtext en'
from calibre.web.feeds.news import BasicNewsRecipe
class CreditSlips(BasicNewsRecipe):
__license__ = 'GPL v3'
__author__ = 'zotzot'
language = 'en'
version = 1
__author__ = 'zotzot'
version = 2
title = u'Credit Slips.org'
publisher = u'Bankr-L'
category = u'Economic blog'
description = u'All things about credit.'
cover_url = 'http://bit.ly/hyZSTr'
oldest_article = 50
description = u'A discussion on credit and bankruptcy'
cover_url = 'http://bit.ly/eAKNCB'
oldest_article = 15
max_articles_per_feed = 100
use_embedded_content = True
no_stylesheets = True
remove_javascript = True
conversion_options = {
'comments': description,
'tags': category,
'language': 'en',
'publisher': publisher,
}
feeds = [
(u'Credit Slips', u'http://www.creditslips.org/creditslips/atom.xml')
]
conversion_options = {
'comments': description,
'tags': category,
'language': 'en',
'publisher': publisher
}
extra_css = '''
body{font-family:verdana,arial,helvetica,geneva,sans-serif;}
img {float: left; margin-right: 0.5em;}
'''
(u'Credit Slips', u'http://www.creditslips.org/creditslips/atom.xml')
]
extra_css = '''
.author {font-family:Helvetica,sans-serif; font-weight:normal;font-size:small;}
h1 {font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
p {font-family:Helvetica,Arial,sans-serif;font-size:small;}
body {font-family:Helvetica,Arial,sans-serif;font-size:small;}
'''
def populate_article_metadata(self, article, soup, first):
h2 = soup.find('h2')
h2.replaceWith(h2.prettify() + '<p><em>Posted by ' + article.author + '</em></p>')

@ -0,0 +1,52 @@
# -*- coding: utf-8 -*-
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = u'2011, Silviu Cotoar\u0103'
'''
csid.ro
'''
from calibre.web.feeds.news import BasicNewsRecipe
class CSID(BasicNewsRecipe):
title = u'Ce se \u00eent\u00e2mpl\u0103 doctore?'
__author__ = u'Silviu Cotoar\u0103'
description = u'Ce se \u00eent\u00e2mpl\u0103 doctore?'
publisher = 'CSID'
oldest_article = 5
language = 'ro'
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
category = 'Ziare,Reviste,Femei,Health,Beauty'
encoding = 'utf-8'
cover_url = 'http://www.csid.ro/images/default/csid.gif'
conversion_options = {
'comments' : description
,'tags' : category
,'language' : language
,'publisher' : publisher
}
keep_only_tags = [
dict(name='div', attrs={'class':'content floatleft'})
]
remove_tags = [
dict(name='div', attrs={'id':['article_links']})
, dict(name='div', attrs={'id':['tags']})
, dict(name='p', attrs={'id':['tags']})
]
remove_tags_after = [
dict(name='p', attrs={'id':['tags']})
]
feeds = [
(u'Feeds', u'http://www.csid.ro/rss/')
]
def preprocess_html(self, soup):
return self.adeify_images(soup)

@ -0,0 +1,54 @@
# -*- coding: utf-8 -*-
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = u'2011, Silviu Cotoar\u0103'
'''
curierulnational.ro
'''
from calibre.web.feeds.news import BasicNewsRecipe
class CurierulNal(BasicNewsRecipe):
title = u'Curierul Na\u0163ional'
__author__ = u'Silviu Cotoar\u0103'
description = ''
publisher = 'Curierul Na\u0163ional'
oldest_article = 5
language = 'ro'
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
category = 'Ziare,Stiri'
encoding = 'utf-8'
cover_url = 'http://www.curierulnational.ro/logo.gif'
conversion_options = {
'comments' : description
,'tags' : category
,'language' : language
,'publisher' : publisher
}
keep_only_tags = [
dict(name='div', attrs={'id':'col1'})
, dict(name='img', attrs={'id':'placeholder'})
]
remove_tags = [
dict(name='p', attrs={'id':['alteArticole']})
, dict(name='div', attrs={'id':['textSize']})
, dict(name='ul', attrs={'class':['unit-rating']})
, dict(name='div', attrs={'id':['comments']})
]
remove_tags_after = [
dict(name='ul', attrs={'class':'unit-rating'})
]
feeds = [
(u'Feeds', u'http://www.curierulnational.ro/feed.xml')
]
def preprocess_html(self, soup):
return self.adeify_images(soup)

@ -16,14 +16,9 @@ class Deadspin(BasicNewsRecipe):
max_articles_per_feed = 100
no_stylesheets = True
encoding = 'utf-8'
use_embedded_content = False
use_embedded_content = True
language = 'en'
masthead_url = 'http://cache.gawkerassets.com/assets/deadspin.com/img/logo.png'
extra_css = '''
body{font-family: "Lucida Grande",Helvetica,Arial,sans-serif}
img{margin-bottom: 1em}
h1{font-family :Arial,Helvetica,sans-serif; font-size:large}
'''
conversion_options = {
'comment' : description
, 'tags' : category
@ -31,13 +26,11 @@ class Deadspin(BasicNewsRecipe):
, 'language' : language
}
remove_attributes = ['width','height']
keep_only_tags = [dict(attrs={'class':'content permalink'})]
remove_tags_before = dict(name='h1')
remove_tags = [dict(attrs={'class':'contactinfo'})]
remove_tags_after = dict(attrs={'class':'contactinfo'})
remove_tags = [
{'class': 'feedflare'},
]
feeds = [(u'Articles', u'http://feeds.gawker.com/deadspin/full')]
feeds = [(u'Articles', u'http://feeds.gawker.com/deadspin/vip?format=xml')]
def preprocess_html(self, soup):
return self.adeify_images(soup)

@ -0,0 +1,57 @@
# -*- coding: utf-8 -*-
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = u'2011, Silviu Cotoar\u0103'
'''
descopera.ro
'''
from calibre.web.feeds.news import BasicNewsRecipe
class Descopera(BasicNewsRecipe):
title = u'Descoper\u0103'
__author__ = u'Silviu Cotoar\u0103'
description = 'E lumea ta'
publisher = 'Descopera'
oldest_article = 5
language = 'ro'
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
category = 'Ziare,Reviste,Descopera'
encoding = 'utf-8'
cover_url = 'http://www.descopera.ro/images/header_images/logo.gif'
conversion_options = {
'comments' : description
,'tags' : category
,'language' : language
,'publisher' : publisher
}
keep_only_tags = [
dict(name='h1', attrs={'style':'font-family: Arial,Helvetica,sans-serif; font-size: 18px; color: rgb(51, 51, 51); font-weight: bold; margin: 10px 0pt; clear: both; float: left;width: 610px;'})
,dict(name='div', attrs={'style':'margin-right: 15px; margin-bottom: 15px; float: left;'})
, dict(name='p', attrs={'id':'itemDescription'})
,dict(name='div', attrs={'id':'itemBody'})
]
remove_tags = [
dict(name='div', attrs={'class':['tools']})
, dict(name='div', attrs={'class':['share']})
, dict(name='div', attrs={'class':['category']})
, dict(name='div', attrs={'id':['comments']})
]
remove_tags_after = [
dict(name='div', attrs={'id':'comments'})
]
feeds = [
(u'Feeds', u'http://www.descopera.ro/rss')
]
def preprocess_html(self, soup):
return self.adeify_images(soup)

@ -0,0 +1,27 @@
__license__ = 'GPL v3'
__copyright__ = '2011-2011, Federico Escalada <fedeescalada at gmail.com>'

from calibre.web.feeds.news import BasicNewsRecipe

class Dotpod(BasicNewsRecipe):
    __author__ = 'Federico Escalada'
    description = 'Tecnologia y Comunicacion Audiovisual'
    encoding = 'utf-8'
    language = 'es'
    max_articles_per_feed = 100
    no_stylesheets = True
    oldest_article = 7
    publication_type = 'blog'
    title = 'Dotpod'
    authors = 'Federico Picone'

    conversion_options = {
          'authors' : authors
        ,'comments' : description
        ,'language' : language
    }

    feeds = [('Dotpod', 'http://www.dotpod.com.ar/feed/')]

    remove_tags = [dict(name='div', attrs={'class':'feedflare'})]

@ -0,0 +1,55 @@
# -*- coding: utf-8 -*-
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = u'2011, Silviu Cotoar\u0103'
'''
ecuisine.ro
'''
from calibre.web.feeds.news import BasicNewsRecipe
class EcuisineRo(BasicNewsRecipe):
title = u'eCuisine'
__author__ = u'Silviu Cotoar\u0103'
description = u'Reinventeaz\u0103 pl\u0103cerea de a g\u0103ti'
publisher = 'eCuisine'
oldest_article = 5
language = 'ro'
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
category = 'Ziare,Retete,Bucatarie'
encoding = 'utf-8'
cover_url = ''
conversion_options = {
'comments' : description
,'tags' : category
,'language' : language
,'publisher' : publisher
}
keep_only_tags = [
dict(name='div', attrs={'class':'page-title'})
, dict(name='div', attrs={'class':'content clearfix'})
]
remove_tags = [
dict(name='ul', attrs={'id':['recipe-tabs']})
, dict(name='div', attrs={'class':['recipe-body-rating clearfix']})
, dict(name='div', attrs={'class':['recipe-body-flags']})
, dict(name='div', attrs={'id':['tweetmeme_button']})
, dict(name='div', attrs={'class':['fbshare']})
, dict(name='a', attrs={'class':['button-rounded']})
, dict(name='div', attrs={'class':['recipe-body-related']})
, dict(name='div', attrs={'class':['fbshare']})
, dict(name='div', attrs={'class':['link-wrapper']})
]
feeds = [
(u'Feeds', u'http://www.ecuisine.ro/rss')
]
def preprocess_html(self, soup):
return self.adeify_images(soup)

@ -0,0 +1,43 @@
# -*- coding: utf-8 -*-
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = u'2011, Silviu Cotoar\u0103'
'''
egirl.ro
'''
from calibre.web.feeds.news import BasicNewsRecipe
class EgirlRo(BasicNewsRecipe):
title = u'egirl'
__author__ = u'Silviu Cotoar\u0103'
description = u'Necesar pentru tine'
publisher = u'egirl'
oldest_article = 5
language = 'ro'
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
category = 'Ziare,Reviste,Femei'
encoding = 'utf-8'
cover_url = 'http://www.egirl.ro/images/egirlNou/logo_egirl.gif'
conversion_options = {
'comments' : description
,'tags' : category
,'language' : language
,'publisher' : publisher
}
keep_only_tags = [
dict(name='div', attrs={'id':'title_art'})
, dict(name='div', attrs={'class':'content_style'})
]
feeds = [
(u'Feeds', u'http://www.egirl.ro/rss/egirl.xml')
]
def preprocess_html(self, soup):
return self.adeify_images(soup)


@ -1,6 +1,6 @@
#!/usr/bin/env python
__license__ = 'GPL 3'
__copyright__ = 'zotzot'
__copyright__ = 'zotzo'
__docformat__ = 'restructuredtext en'
'''
http://www.epltalk.com
@ -9,10 +9,9 @@ from calibre.web.feeds.news import BasicNewsRecipe
class EPLTalkRecipe(BasicNewsRecipe):
__license__ = 'GPL v3'
__author__ = u'The Gaffer'
language = 'en'
version = 1
version = 2
__author__ = 'rylsfan'
title = u'EPL Talk'
publisher = u'The Gaffer'
@ -21,17 +20,40 @@ class EPLTalkRecipe(BasicNewsRecipe):
description = u'News and Analysis from the English Premier League'
cover_url = 'http://bit.ly/hJxZPu'
oldest_article = 45
max_articles_per_feed = 150
oldest_article = 3
max_articles_per_feed = 100
use_embedded_content = True
remove_javascript = True
encoding = 'utf8'
remove_tags_after = [dict(name='div', attrs={'class':'pd-rating'})]
conversion_options = {
'comments' : description
, 'tags' : category
, 'publisher' : publisher
, 'language' : language
}
feeds = [(u'EPL Talk', u'http://feeds.feedburner.com/EPLTalk')]
remove_tags = [
{'class': 'feedflare'},
{'class': 'tweetmeme_button'},
{'class': 'eplrelated'},
{'p': 'Related posts:<ol>'},
]
def preprocess_html(self, soup):
return self.adeify_images(soup)
feeds =[
(u'EPL Talk', u'http://feeds.feedburner.com/EPLTalk'),
(u'MLS Talk', u'http://feeds.feedburner.com/majorleaguesoccertalksite'),
#(),
#(),
#(),
]
extra_css = '''
body{font-family:verdana,arial,helvetica,geneva,sans-serif;}
img {float: left; margin-right: 0.5em;}
'''
h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
'''


@ -41,7 +41,8 @@ class ESPN(BasicNewsRecipe):
'''
feeds = [('Top Headlines', 'http://sports.espn.go.com/espn/rss/news'),
feeds = [
('Top Headlines', 'http://sports.espn.go.com/espn/rss/news'),
'http://sports.espn.go.com/espn/rss/nfl/news',
'http://sports.espn.go.com/espn/rss/nba/news',
'http://sports.espn.go.com/espn/rss/mlb/news',
@ -107,10 +108,11 @@ class ESPN(BasicNewsRecipe):
if match and 'soccernet' not in url and 'bassmaster' not in url:
return 'http://sports.espn.go.com/espn/print?'+match.group(1)+'&type=story'
else:
if match and 'soccernet' in url:
splitlist = url.split("&", 5)
newurl = 'http://soccernet.espn.go.com/print?'+match.group(1)+'&type=story' + '&' + str(splitlist[2] )
return newurl
if 'soccernet' in url:
match = re.search(r'/id/(\d+)/', url)
if match:
return \
'http://soccernet.espn.go.com/print?id=%s&type=story' % match.group(1)
#else:
# if 'bassmaster' in url:
# return url
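
Review note: a minimal standalone sketch of the soccernet rewrite in this hunk, handy for checking the pattern outside calibre. The function name `soccernet_print_url` and the sample URL are assumptions made for illustration; only the `/id/(\d+)/` pattern and the print URL template are taken from the diff.

```python
import re


def soccernet_print_url(url):
    """Return the soccernet print URL for an article URL, or None.

    Hypothetical helper: it mirrors the new branch in the hunk, which
    extracts the numeric article id from the path instead of splitting
    the query string on '&'.
    """
    if 'soccernet' not in url:
        return None
    match = re.search(r'/id/(\d+)/', url)
    if match is None:
        return None
    return 'http://soccernet.espn.go.com/print?id=%s&type=story' % match.group(1)


# Sample URL is made up; only its shape (".../id/<digits>/...") matters.
print(soccernet_print_url(
    'http://soccernet.espn.go.com/news/story/_/id/123456/sample-story'))
```

Extracting the id with a regular expression does not depend on the number or order of query parameters, which the old `split("&", 5)` approach did.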


@ -0,0 +1,53 @@
# -*- coding: utf-8 -*-
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = u'2011, Silviu Cotoar\u0103'
'''
fhm.ro
'''
from calibre.web.feeds.news import BasicNewsRecipe
class FHMro(BasicNewsRecipe):
title = u'FHM Ro'
__author__ = u'Silviu Cotoar\u0103'
description = u'Pentru c\u0103 noi putem'
publisher = 'FHM'
oldest_article = 5
language = 'ro'
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
category = 'Reviste'
encoding = 'utf-8'
cover_url = 'http://www.fhm.com/App_Resources/Images/Site/re-design/logo.gif'
conversion_options = {
'comments' : description
,'tags' : category
,'language' : language
,'publisher' : publisher
}
keep_only_tags = [
dict(name='div', attrs={'class':'contentMainTitle'})
, dict(name='div', attrs={'class':'entry'})
]
remove_tags_after = [
dict(name='div', attrs={'class':['ratingblock ']})
, dict(name='a', attrs={'rel':['tag']})
]
remove_tags = [
dict(name='div', attrs={'class':['ratingblock ']})
, dict(name='div', attrs={'class':['socialize-containter']})
]
feeds = [
(u'Feeds', u'http://www.fhm.ro/feed')
]
def preprocess_html(self, soup):
return self.adeify_images(soup)


@ -0,0 +1,48 @@
__license__ = 'GPL v3'
__author__ = 'Ricardo Jurado'
__copyright__ = 'Ricardo Jurado'
__version__ = 'v0.1'
__date__ = '22 February 2011'
'''
http://blog.flickr.net/
'''
from calibre.web.feeds.news import BasicNewsRecipe
class AdvancedUserRecipe1297031650(BasicNewsRecipe):
title = u'Flickr Blog'
masthead_url = 'http://flickrtheblog.files.wordpress.com/2008/11/flickblog_logo.gif'
cover_url = 'http://flickrtheblog.files.wordpress.com/2008/11/flickblog_logo.gif'
publisher = u''
__author__ = 'Ricardo Jurado'
description = 'Pictures Blog'
category = 'Blog,Pictures'
oldest_article = 120
max_articles_per_feed = 10
no_stylesheets = True
use_embedded_content = False
encoding = 'UTF-8'
remove_javascript = True
language = 'en'
extra_css = """
p{text-align: justify; font-size: 100%}
body{ text-align: left; font-size:100% }
h2{font-family: sans-serif; font-size:130%; font-weight:bold; text-align: justify; }
.published{font-family:Arial,Helvetica,sans-serif; font-size:80%; }
.posted{font-family:Arial,Helvetica,sans-serif; font-size:80%; }
"""
keep_only_tags = [
dict(name='div', attrs={'class':'entry'})
]
feeds = [
(u'BLOG', u'http://feeds.feedburner.com/Flickrblog'),
#(u'BLOG', u'http://blog.flickr.net/es/feed/atom/')
]


@ -0,0 +1,47 @@
__license__ = 'GPL v3'
__author__ = 'Ricardo Jurado'
__copyright__ = 'Ricardo Jurado'
__version__ = 'v0.1'
__date__ = '22 February 2011'
'''
http://blog.flickr.net/
'''
from calibre.web.feeds.news import BasicNewsRecipe
class AdvancedUserRecipe1297031650(BasicNewsRecipe):
title = u'Flickr Blog'
masthead_url = 'http://flickrtheblog.files.wordpress.com/2008/11/flickblog_logo.gif'
cover_url = 'http://flickrtheblog.files.wordpress.com/2008/11/flickblog_logo.gif'
publisher = u''
__author__ = 'Ricardo Jurado'
description = 'Pictures Blog'
category = 'Blog,Pictures'
oldest_article = 120
max_articles_per_feed = 10
no_stylesheets = True
use_embedded_content = False
encoding = 'UTF-8'
remove_javascript = True
language = 'es'
extra_css = """
p{text-align: justify; font-size: 100%}
body{ text-align: left; font-size:100% }
h2{font-family: sans-serif; font-size:130%; font-weight:bold; text-align: justify; }
.published{font-family:Arial,Helvetica,sans-serif; font-size:80%; }
.posted{font-family:Arial,Helvetica,sans-serif; font-size:80%; }
"""
keep_only_tags = [
dict(name='div', attrs={'class':'entry'})
]
feeds = [
(u'BLOG', u'http://blog.flickr.net/es/feed/atom/')
]


@ -16,14 +16,10 @@ class Gawker(BasicNewsRecipe):
max_articles_per_feed = 100
no_stylesheets = True
encoding = 'utf-8'
use_embedded_content = False
use_embedded_content = True
language = 'en'
masthead_url = 'http://cache.gawkerassets.com/assets/gawker.com/img/logo.png'
extra_css = '''
body{font-family: "Lucida Grande",Helvetica,Arial,sans-serif}
img{margin-bottom: 1em}
h1{font-family :Arial,Helvetica,sans-serif; font-size:large}
'''
conversion_options = {
'comments' : description
, 'tags' : category
@ -31,13 +27,11 @@ class Gawker(BasicNewsRecipe):
, 'language' : language
}
remove_attributes = ['width','height']
keep_only_tags = [dict(attrs={'class':'content permalink'})]
remove_tags_before = dict(name='h1')
remove_tags = [dict(attrs={'class':'contactinfo'})]
remove_tags_after = dict(attrs={'class':'contactinfo'})
remove_tags = [
{'class': 'feedflare'},
]
feeds = [(u'Articles', u'http://feeds.gawker.com/gawker/full')]
feeds = [(u'Articles', u'http://feeds.gawker.com/gawker/vip?format=xml')]
def preprocess_html(self, soup):
return self.adeify_images(soup)


@ -17,10 +17,9 @@ class Gizmodo(BasicNewsRecipe):
max_articles_per_feed = 100
no_stylesheets = True
encoding = 'utf-8'
use_embedded_content = False
use_embedded_content = True
language = 'en'
masthead_url = 'http://cache.gawkerassets.com/assets/gizmodo.com/img/logo.png'
extra_css = ' body{font-family: "Lucida Grande",Helvetica,Arial,sans-serif} img{margin-bottom: 1em} '
conversion_options = {
'comments' : description
@ -29,13 +28,12 @@ class Gizmodo(BasicNewsRecipe):
, 'language' : language
}
remove_attributes = ['width','height']
keep_only_tags = [dict(attrs={'class':'content permalink'})]
remove_tags_before = dict(name='h1')
remove_tags = [dict(attrs={'class':'contactinfo'})]
remove_tags_after = dict(attrs={'class':'contactinfo'})
feeds = [(u'Articles', u'http://feeds.gawker.com/gizmodo/vip?format=xml')]
remove_tags = [
{'class': 'feedflare'},
]
feeds = [(u'Articles', u'http://feeds.gawker.com/gizmodo/full')]
def preprocess_html(self, soup):
return self.adeify_images(soup)


@ -0,0 +1,48 @@
# -*- coding: utf-8 -*-
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = u'2011, Silviu Cotoar\u0103'
'''
go4it.ro
'''
from calibre.web.feeds.news import BasicNewsRecipe
class Go4ITro(BasicNewsRecipe):
title = u'go4it'
__author__ = u'Silviu Cotoar\u0103'
description = 'Gadgeturi, Lifestyle, Tehnologie'
publisher = 'go4it'
oldest_article = 5
language = 'ro'
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
category = 'Reviste,Ziare,IT'
encoding = 'utf-8'
cover_url = 'http://www.go4it.ro/images/logo.png'
conversion_options = {
'comments' : description
,'tags' : category
,'language' : language
,'publisher' : publisher
}
keep_only_tags = [
dict(name='div', attrs={'class':'subTitle clearfix'})
, dict(name='div', attrs={'class':'story'})
]
remove_tags = [
dict(name='span', attrs={'class':['data']})
, dict(name='a', attrs={'class':['comments']})
]
feeds = [
(u'Feeds', u'http://feeds2.feedburner.com/Go4itro-Stiri')
]
def preprocess_html(self, soup):
return self.adeify_images(soup)


@ -1,20 +1,43 @@
# -*- coding: utf-8 -*-
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = u'2011, Silviu Cotoar\u0103'
'''
gsp.ro
'''
from calibre.web.feeds.news import BasicNewsRecipe
class AdvancedUserRecipe1286351181(BasicNewsRecipe):
title = u'gsp.ro'
__author__ = 'bucsie'
oldest_article = 2
class GSP(BasicNewsRecipe):
title = u'Gazeta Sporturilor'
language = 'ro'
__author__ = u'Silviu Cotoar\u0103'
description = u'Gazeta Sporturilor'
publisher = u'Gazeta Sporturilor'
category = 'Ziare,Sport,Stiri,Romania'
oldest_article = 5
max_articles_per_feed = 100
language='ro'
cover_url ='http://www.gsp.ro/images/sigla_rosu.jpg'
no_stylesheets = True
use_embedded_content = False
encoding = 'utf-8'
remove_javascript = True
cover_url = 'http://www.gsp.ro/images/logo.jpg'
remove_tags = [
dict(name='div', attrs={'class':['related_articles', 'articol_noteaza straight_line dotted_line_top', 'comentarii','mai_multe_articole']}),
dict(name='div', attrs={'id':'icons'})
]
remove_tags_after = dict(name='div', attrs={'id':'adoceanintactrovccmgpmnyt'})
conversion_options = {
'comments' : description
,'tags' : category
,'language' : language
,'publisher' : publisher
}
feeds = [(u'toate stirile', u'http://www.gsp.ro/index.php?section=section&screen=rss')]
keep_only_tags = [ dict(name='h1', attrs={'class':'serif title_2'})
,dict(name='div', attrs={'id':'only_text'})
,dict(name='span', attrs={'class':'block poza_principala'})
]
feeds = [ (u'\u0218tiri', u'http://www.gsp.ro/rss.xml') ]
def preprocess_html(self, soup):
return self.adeify_images(soup)
def print_version(self, url):
return 'http://www1.gsp.ro/print/' + url[(url.rindex('/')+1):]
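
Review note: the `print_version` above keeps only the last path segment of the article URL and appends it to the print host. A small sketch of that slicing, assuming a hypothetical helper name `gsp_print_url` and a made-up article URL:

```python
def gsp_print_url(url):
    """Map an article URL to its print URL by reusing the last path segment.

    Hypothetical standalone version of the recipe's print_version.
    str.rindex raises ValueError if the URL contains no '/' at all.
    """
    return 'http://www1.gsp.ro/print/' + url[url.rindex('/') + 1:]


# Made-up URL; the real site layout is not confirmed by the diff.
print(gsp_print_url('http://www.gsp.ro/fotbal/liga-1/sample-article-12345.html'))
```

A URL that ends in a trailing slash yields an empty last segment, so the result would be the bare print prefix; whether the feed ever produces such URLs is not visible from the diff.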


@ -0,0 +1,31 @@
from calibre.web.feeds.news import BasicNewsRecipe
class AdvancedUserRecipe1298137661(BasicNewsRecipe):
title = u'Helsingin Sanomat'
__author__ = 'oneillpt'
language = 'fi'
oldest_article = 7
max_articles_per_feed = 100
no_stylesheets = True
remove_javascript = True
conversion_options = {
'linearize_tables' : True
}
remove_tags = [
dict(name='a', attrs={'id':'articleCommentUrl'}),
dict(name='p', attrs={'class':'newsSummary'}),
dict(name='div', attrs={'class':'headerTools'})
]
feeds = [(u'Uutiset - HS.fi', u'http://www.hs.fi/uutiset/rss/'), (u'Politiikka - HS.fi', u'http://www.hs.fi/politiikka/rss/'),
(u'Ulkomaat - HS.fi', u'http://www.hs.fi/ulkomaat/rss/'), (u'Kulttuuri - HS.fi', u'http://www.hs.fi/kulttuuri/rss/'),
(u'Kirjat - HS.fi', u'http://www.hs.fi/kulttuuri/kirjat/rss/'), (u'Elokuvat - HS.fi', u'http://www.hs.fi/kulttuuri/elokuvat/rss/')
]
def print_version(self, url):
j = url.rfind("/")
s = url[j:]
i = s.rfind("?ref=rss")
if i > 0:
s = s[:i]
return "http://www.hs.fi/tulosta" + s
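
Review note: a standalone sketch of the `print_version` logic above, which takes the last path segment, drops a trailing `?ref=rss` marker if present, and appends the rest to the print endpoint. The helper name `hs_print_url` and the sample URL are assumptions for illustration.

```python
def hs_print_url(url):
    """Build the hs.fi print URL from an article URL.

    Hypothetical standalone version of the recipe's print_version.
    """
    tail = url[url.rfind('/'):]       # '/<last segment>' including any query
    cut = tail.rfind('?ref=rss')      # position of the RSS tracking marker
    if cut > 0:
        tail = tail[:cut]             # keep only '/<last segment>'
    return 'http://www.hs.fi/tulosta' + tail


# Made-up article URL with the RSS marker attached.
print(hs_print_url('http://www.hs.fi/kulttuuri/artikkeli/Sample/1135264312345?ref=rss'))
```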


@ -0,0 +1,51 @@
# -*- coding: utf-8 -*-
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = u'2011, Silviu Cotoar\u0103'
'''
historia.ro
'''
from calibre.web.feeds.news import BasicNewsRecipe
class HistoriaRo(BasicNewsRecipe):
title = u'Historia'
__author__ = u'Silviu Cotoar\u0103'
description = ''
publisher = 'Historia'
oldest_article = 5
language = 'ro'
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
category = 'Ziare,Reviste,Istorie'
encoding = 'utf-8'
cover_url = 'http://www.historia.ro/sites/all/themes/historia/images/historia.png'
conversion_options = {
'comments' : description
,'tags' : category
,'language' : language
,'publisher' : publisher
}
keep_only_tags = [
dict(name='div', attrs={'class':'c_antet_title'})
, dict(name='a', attrs={'class':'overlaybox'})
, dict(name='div', attrs={'class':'art_content'})
]
remove_tags = [
dict(name='div', attrs={'class':['fl_left']})
, dict(name='div', attrs={'id':['article_toolbar']})
, dict(name='div', attrs={'class':['zoom_cont']})
]
feeds = [
(u'Feeds', u'http://www.historia.ro/rss.xml')
]
def preprocess_html(self, soup):
return self.adeify_images(soup)


@ -0,0 +1,43 @@
# -*- coding: utf-8 -*-
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = u'2011, Silviu Cotoar\u0103'
'''
hotcity.ro
'''
from calibre.web.feeds.news import BasicNewsRecipe
class HotcityRo(BasicNewsRecipe):
title = u'Hotcity'
__author__ = u'Silviu Cotoar\u0103'
description = u'Cultura urban\u0103 feminin\u0103'
publisher = 'Hotcity'
oldest_article = 5
language = 'ro'
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
category = 'Ziare,Reviste'
encoding = 'utf-8'
cover_url = 'http://www.hotcity.ro/i/bg_header.gif'
conversion_options = {
'comments' : description
,'tags' : category
,'language' : language
,'publisher' : publisher
}
keep_only_tags = [
dict(name='div', attrs={'class':'articol_title'})
, dict(name='div', attrs={'class':'text'})
]
feeds = [
(u'Feeds', u'http://www.hotcity.ro/rss')
]
def preprocess_html(self, soup):
return self.adeify_images(soup)


@ -0,0 +1,52 @@
# -*- coding: utf-8 -*-
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = u'2011, Silviu Cotoar\u0103'
'''
intrefete.ro
'''
from calibre.web.feeds.news import BasicNewsRecipe
class Intrefete(BasicNewsRecipe):
title = u'\u00centre fete'
__author__ = u'Silviu Cotoar\u0103'
description = u'Petrece ziua cu stil, afl\u0103 ce e nou \u00eentre fete'
publisher = u'Intre fete'
oldest_article = 5
language = 'ro'
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
category = 'Ziare,Reviste,Femei'
encoding = 'utf-8'
cover_url = 'http://storage0.dms.mpinteractiv.ro/media/2/1401/16788/5878693/5/logo.jpg?width=300'
conversion_options = {
'comments' : description
,'tags' : category
,'language' : language
,'publisher' : publisher
}
keep_only_tags = [
dict(name='div', attrs={'class':'article'})
]
remove_tags = [
dict(name='div', attrs={'class':['author']})
, dict(name='div', attrs={'class':['tags']})
, dict(name='iframe', attrs={'scrolling':['no']})
]
remove_tags_after = [
dict(name='iframe', attrs={'scrolling':['no']})
]
feeds = [
(u'Feeds', u'http://www.intrefete.ro/rss/')
]
def preprocess_html(self, soup):
return self.adeify_images(soup)


@ -0,0 +1,47 @@
# -*- coding: utf-8 -*-
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = u'2011, Silviu Cotoar\u0103'
'''
kudika.ro
'''
from calibre.web.feeds.news import BasicNewsRecipe
class Kudika(BasicNewsRecipe):
title = u'Kudika'
__author__ = u'Silviu Cotoar\u0103'
description = u'Revist\u0103 pentru femei'
publisher = 'Kudika'
oldest_article = 5
language = 'ro'
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
category = 'Ziare,Reviste,Femei'
encoding = 'utf-8'
cover_url = 'http://img.kudika.ro/images/template/page-logo.png'
conversion_options = {
'comments' : description
,'tags' : category
,'language' : language
,'publisher' : publisher
}
keep_only_tags = [
dict(name='div', attrs={'class':'header_recommend_article'}),
dict(name='div', attrs={'id':'intertext_women'})
]
remove_tags = [
dict(name='p', attrs={'class':['page_breadcrumbs']})
, dict(name='div', attrs={'class':['standard']})
, dict(name='div', attrs={'id':['recommend_allover']})
]
feeds = [ (u'Feeds', u'http://www.kudika.ro/feed.xml') ]
def preprocess_html(self, soup):
return self.adeify_images(soup)


@ -1,10 +1,15 @@
__license__ = 'GPL v3'
__copyright__ = '2011'
'''
lemonde.fr
'''
import re
from calibre.web.feeds.recipes import BasicNewsRecipe
class LeMonde(BasicNewsRecipe):
title = 'Le Monde'
__author__ = 'veezh'
description = u'Actualit\xe9s'
description = 'Actualités'
oldest_article = 1
max_articles_per_feed = 100
no_stylesheets = True
@ -12,13 +17,27 @@ class LeMonde(BasicNewsRecipe):
use_embedded_content = False
encoding = 'cp1252'
publisher = 'lemonde.fr'
category = 'news, France, world'
language = 'fr'
#publication_type = 'newsportal'
extra_css = '''
h1{font-size:130%;}
.ariane{font-size:xx-small;}
.source{font-size:xx-small;}
#.href{font-size:xx-small;}
.LM_caption{color:#666666; font-size:x-small;}
#.main-article-info{font-family:Arial,Helvetica,sans-serif;}
#full-contents{font-size:small; font-family:Arial,Helvetica,sans-serif;font-weight:normal;}
#match-stats-summary{font-size:small; font-family:Arial,Helvetica,sans-serif;font-weight:normal;}
'''
#preprocess_regexps = [(re.compile(r'<!--.*?-->', re.DOTALL), lambda m: '')]
conversion_options = {
'comments' : description
,'language' : language
,'publisher' : publisher
,'linearize_tables': True
}
'comments' : description
,'tags' : category
,'language' : language
,'publisher' : publisher
,'linearize_tables': True
}
remove_empty_feeds = True
@ -32,15 +51,28 @@ class LeMonde(BasicNewsRecipe):
return soup
preprocess_regexps = [
(re.compile(r'([0-9])%'), lambda m: m.group(1) + '&nbsp;%'),
(re.compile(r'([0-9])([0-9])([0-9]) ([0-9])([0-9])([0-9])'), lambda m: m.group(1) + m.group(2) + m.group(3) + '&nbsp;' + m.group(4) + m.group(5) + m.group(6)),
(re.compile(r'([0-9]) ([0-9])([0-9])([0-9])'), lambda m: m.group(1) + '&nbsp;' + m.group(2) + m.group(3) + m.group(4)),
(re.compile(r'<span>'), lambda match: ' <span>'),
(re.compile(r'\("'), lambda match: '(&laquo;&nbsp;'),
(re.compile(r'"\)'), lambda match: '&nbsp;&raquo;)'),
(re.compile(r'&ldquo;'), lambda match: '(&laquo;&nbsp;'),
(re.compile(r'&rdquo;'), lambda match: '&nbsp;&raquo;)'),
(re.compile(r'>\''), lambda match: '>&lsquo;'),
(re.compile(r' \''), lambda match: ' &lsquo;'),
(re.compile(r'\''), lambda match: '&rsquo;'),
(re.compile(r'"<'), lambda match: '&nbsp;&raquo;<'),
(re.compile(r'"<em>'), lambda match: '<em>&laquo;&nbsp;'),
(re.compile(r'"<em>"</em><em>'), lambda match: '<em>&laquo;&nbsp;'),
(re.compile(r'"<a href='), lambda match: '&laquo;&nbsp;<a href='),
(re.compile(r'</em>"'), lambda match: '&nbsp;&raquo;</em>'),
(re.compile(r'</a>"'), lambda match: '&nbsp;&raquo;</a>'),
(re.compile(r'"</'), lambda match: '&nbsp;&raquo;</'),
(re.compile(r'>"'), lambda match: '>&laquo;&nbsp;'),
(re.compile(r'"<'), lambda match: '&nbsp;&raquo;<'),
(re.compile(r'&rsquo;"'), lambda match: '&rsquo;«&nbsp;'),
(re.compile(r' "'), lambda match: ' &laquo;&nbsp;'),
(re.compile(r'" '), lambda match: '&nbsp;&raquo; '),
(re.compile(r'\("'), lambda match: '(&laquo;&nbsp;'),
(re.compile(r'"\)'), lambda match: '&nbsp;&raquo;)'),
(re.compile(r'"\.'), lambda match: '&nbsp;&raquo;.'),
(re.compile(r'",'), lambda match: '&nbsp;&raquo;,'),
(re.compile(r'"\?'), lambda match: '&nbsp;&raquo;?'),
@ -56,8 +88,14 @@ class LeMonde(BasicNewsRecipe):
(re.compile(r' %'), lambda match: '&nbsp;%'),
(re.compile(r'\.jpg&nbsp;&raquo; border='), lambda match: '.jpg'),
(re.compile(r'\.png&nbsp;&raquo; border='), lambda match: '.png'),
(re.compile(r' &ndash; '), lambda match: '&nbsp;&ndash; '),
(re.compile(r' – '), lambda match: '&nbsp;&ndash; '),
(re.compile(r' - '), lambda match: '&nbsp;&ndash; '),
(re.compile(r' -,'), lambda match: '&nbsp;&ndash;,'),
(re.compile(r'&raquo;:'), lambda match: '&raquo;&nbsp;:'),
]
keep_only_tags = [
dict(name='div', attrs={'class':['contenu']})
]
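
Review note: the `preprocess_regexps` entries in this file are (compiled pattern, replacement function) pairs. The sketch below applies three rules copied from the list in order, to show how they compose. The `apply_rules` helper and the sample sentence are assumptions for illustration; this is not calibre's own code path.

```python
import re

# Three rules copied from the recipe's preprocess_regexps list.
RULES = [
    (re.compile(r'([0-9])%'), lambda m: m.group(1) + '&nbsp;%'),
    (re.compile(r' - '), lambda m: '&nbsp;&ndash; '),
    (re.compile(r'&raquo;:'), lambda m: '&raquo;&nbsp;:'),
]


def apply_rules(html, rules):
    """Apply each (pattern, function) pair in order to the HTML string."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


print(apply_rules('Hausse de 5% - selon lui', RULES))
# -> Hausse de 5&nbsp;%&nbsp;&ndash; selon lui
```

Because every rule sees the output of the rules before it, the order of the list matters: a broad pattern placed early can consume text that a narrower pattern further down was meant to match.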
@ -65,11 +103,15 @@ class LeMonde(BasicNewsRecipe):
remove_tags_after = [dict(id='appel_temoignage')]
def get_article_url(self, article):
link = article.get('link')
if 'blog' not in link:
return link
url = article.get('guid', None)
if '/chat/' in url or '.blog' in url or '/video/' in url or '/sport/' in url or '/portfolio/' in url or '/visuel/' in url :
url = None
return url
# def get_article_url(self, article):
# link = article.get('link')
# if 'blog' not in link and ('chat' not in link):
# return link
feeds = [
('A la une', 'http://www.lemonde.fr/rss/une.xml'),
@ -94,3 +136,4 @@ class LeMonde(BasicNewsRecipe):
cover_url = link_item.img['src']
return cover_url
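
Review note: the new `get_article_url` in this file drops feed entries whose `guid` points at chats, blogs, videos, sport pages, portfolios or visuals. A standalone sketch of that filter follows. The names `SKIPPED_MARKERS` and `article_url` and the sample entries are assumptions; unlike the code in the diff, the sketch also guards against a missing `guid`, which would otherwise raise a `TypeError` on the `in` test.

```python
# URL fragments taken from the condition in the diff.
SKIPPED_MARKERS = ('/chat/', '.blog', '/video/', '/sport/', '/portfolio/', '/visuel/')


def article_url(article):
    """Return the article's guid, or None if the entry should be skipped."""
    url = article.get('guid')
    if url is None:
        return None  # extra guard, not present in the recipe
    if any(marker in url for marker in SKIPPED_MARKERS):
        return None
    return url


# Made-up feed entries for illustration.
print(article_url({'guid': 'http://www.lemonde.fr/politique/article/2011/03/05/sample.html'}))
print(article_url({'guid': 'http://www.lemonde.fr/sport/article/2011/03/05/sample.html'}))
```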


@ -1,40 +1,52 @@
from calibre.web.feeds.news import BasicNewsRecipe
class AdvancedUserRecipe1292550626(BasicNewsRecipe):
title = 'Leduc - Wetaskiwin Pipestone Flyer'
__author__ = 'Brian Hahn'
description = 'News from Alberta, Canada'
oldest_article = 56
max_articles_per_feed = 100
no_stylesheets = True
#delay = 1
use_embedded_content = False
publisher = 'Pipestone Publishing'
category = 'News, Alberta, Canada'
language = 'en_CA'
encoding = 'iso-8859-1'
cover_url = 'http://www.pipestoneflyer.ca/images/calibre-cover.jpg'
remove_tags_before = dict(id='ContentPanel')
remove_tags_after = dict(id='ContentPanel')
remove_tags = [dict(name='div', attrs={'id':'StoryNav'}),dict(name='div', attrs={'id':'BottomAds'}),dict(name='div', attrs={'id':'MoreStoryLinks'})]
extra_css = 'img { margin:5px }'
feeds = [
('Feature', 'http://www.pipestoneflyer.ca/Feature.rss'),
('Editors Desk', 'http://www.pipestoneflyer.ca/Editor%27s%20Desk.rss'),
('Letters', 'http://www.pipestoneflyer.ca/Letters.rss'),
('A Loco Viewpoint', 'http://www.pipestoneflyer.ca/A%20Loco%20Viewpoint.rss'),
('Lifes Doorway', 'http://www.pipestoneflyer.ca/Life%27s%20Doorway.rss'),
('From the Otherside', 'http://www.pipestoneflyer.ca/From%20the%20Otherside.rss'),
('Opinion', 'http://www.pipestoneflyer.ca/Opinion.rss'),
('Community', 'http://www.pipestoneflyer.ca/Community.rss'),
('Sports', 'http://www.pipestoneflyer.ca/Sports.rss'),
('Chambers', 'http://www.pipestoneflyer.ca/Chambers.rss'),
('Government', 'http://www.pipestoneflyer.ca/Government.rss'),
('Environment', 'http://www.pipestoneflyer.ca/Environment.rss'),
('Health', 'http://www.pipestoneflyer.ca/Health.rss'),
('Funnies', 'http://www.pipestoneflyer.ca/Funnies.rss'),
('Faith', 'http://www.pipestoneflyer.ca/Faith.rss'),
('News and Views', 'http://www.pipestoneflyer.ca/News%20and%20Views.rss'),
('Obituaries', 'http://www.pipestoneflyer.ca/Obituaries.rss'),
('Police Blotter', 'http://www.pipestoneflyer.ca/Police%20Blotter.rss'),
]
title = 'Leduc - Wetaskiwin Pipestone Flyer'
__author__ = 'Brian Hahn'
description = '''Provides news from central Alberta, Canada. This is a
weekly publication that provides coverage from the Cities of Leduc and
Wetaskiwin, including news from two complete counties, plus the towns and
villages within. The counties of Leduc and Wetaskiwin provide news
coverage of agriculture, sports, government, family, events and opinion.
This publication is updated every Thursday.'''
oldest_article = 13
max_articles_per_feed = 100
no_stylesheets = True
#delay = 1
use_embedded_content = False
publisher = 'Pipestone Publishing'
category = 'News, Alberta, Canada'
language = 'en_CA'
encoding = 'iso-8859-1'
cover_url = 'http://www.pipestoneflyer.ca/images/calibre-cover.jpg'
remove_tags_before = dict(id='ContentPanel')
remove_tags_after = dict(id='ContentPanel')
remove_tags = [dict(name='div',
attrs={'id':'StoryNav'}),dict(name='div',
attrs={'id':'BottomAds'}),dict(name='div', attrs={'id':'MoreStoryLinks'})]
extra_css = 'img { margin:5px }'
feeds = [
('Feature', 'http://www.pipestoneflyer.ca/Feature.rss'),
('Editors Desk', 'http://www.pipestoneflyer.ca/Editor%27s%20Desk.rss'),
('Letters', 'http://www.pipestoneflyer.ca/Letters.rss'),
('A Loco Viewpoint',
'http://www.pipestoneflyer.ca/A%20Loco%20Viewpoint.rss'),
('Lifes Doorway', 'http://www.pipestoneflyer.ca/Life%27s%20Doorway.rss'),
('From the Otherside',
'http://www.pipestoneflyer.ca/From%20the%20Otherside.rss'),
('Opinion', 'http://www.pipestoneflyer.ca/Opinion.rss'),
('Community', 'http://www.pipestoneflyer.ca/Community.rss'),
('Sports', 'http://www.pipestoneflyer.ca/Sports.rss'),
('Chambers', 'http://www.pipestoneflyer.ca/Chambers.rss'),
('Government', 'http://www.pipestoneflyer.ca/Government.rss'),
('Travel ', 'http://www.pipestoneflyer.ca/Travel%20.rss'),
('Environment', 'http://www.pipestoneflyer.ca/Environment.rss'),
('Health', 'http://www.pipestoneflyer.ca/Health.rss'),
('Funnies', 'http://www.pipestoneflyer.ca/Funnies.rss'),
('Events', 'http://www.pipestoneflyer.ca/Events.rss'),
('Faith', 'http://www.pipestoneflyer.ca/Faith.rss'),
('News and Views', 'http://www.pipestoneflyer.ca/News%20and%20Views.rss'),
('Obituaries', 'http://www.pipestoneflyer.ca/Obituaries.rss'),
('Police Blotter', 'http://www.pipestoneflyer.ca/Police%20Blotter.rss'),
('Careers', 'http://www.pipestoneflyer.ca/Careers.rss'),
]


@ -16,15 +16,9 @@ class Lifehacker(BasicNewsRecipe):
max_articles_per_feed = 100
no_stylesheets = True
encoding = 'utf-8'
use_embedded_content = False
use_embedded_content = True
language = 'en'
masthead_url = 'http://cache.gawkerassets.com/assets/lifehacker.com/img/logo.png'
extra_css = '''
body{font-family: "Lucida Grande",Helvetica,Arial,sans-serif}
img{margin-bottom: 1em}
h1{font-family :Arial,Helvetica,sans-serif; font-size:large}
h2{font-family :Arial,Helvetica,sans-serif; font-size:x-small}
'''
conversion_options = {
'comments' : description
, 'tags' : category
@ -32,20 +26,12 @@ class Lifehacker(BasicNewsRecipe):
, 'language' : language
}
remove_attributes = ['width', 'height', 'style']
remove_tags_before = dict(name='h1')
keep_only_tags = [dict(id='container')]
remove_tags_after = dict(attrs={'class':'post-body'})
remove_tags = [
dict(id="sharemenu"),
{'class': 'related'},
{'class': 'feedflare'},
]
feeds = [(u'Articles', u'http://feeds.gawker.com/lifehacker/full')]
feeds = [(u'Articles', u'http://feeds.gawker.com/lifehacker/vip?format=xml')]
def preprocess_html(self, soup):
return self.adeify_images(soup)
def print_version(self, url):
return url.replace('#!', '?_escaped_fragment_=')


@ -0,0 +1,104 @@
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = '2011, Davide Cavalca <davide125 at tiscali.it>'
'''
lwn.net
'''
from calibre.web.feeds.news import BasicNewsRecipe
import re
class WeeklyLWN(BasicNewsRecipe):
title = 'LWN.net Weekly Edition'
description = 'Weekly summary of what has happened in the free software world.'
__author__ = 'Davide Cavalca'
language = 'en'
cover_url = 'http://lwn.net/images/lcorner.png'
#masthead_url = 'http://lwn.net/images/lcorner.png'
publication_type = 'magazine'
remove_tags_before = dict(attrs={'class':'PageHeadline'})
remove_tags_after = dict(attrs={'class':'ArticleText'})
remove_tags = [dict(name=['h2', 'form'])]
conversion_options = { 'linearize_tables' : True }
oldest_article = 7.0
needs_subscription = 'optional'
def get_browser(self):
br = BasicNewsRecipe.get_browser(self)
if self.username is not None and self.password is not None:
br.open('https://lwn.net/login')
br.select_form(name='loginform')
br['Username'] = self.username
br['Password'] = self.password
br.submit()
return br
def parse_index(self):
if self.username is not None and self.password is not None:
index_url = 'http://lwn.net/current/bigpage'
else:
index_url = 'http://lwn.net/free/bigpage'
soup = self.index_to_soup(index_url)
body = soup.body
articles = {}
ans = []
url_re = re.compile('^http://lwn.net/Articles/')
while True:
tag_title = body.findNext(name='p', attrs={'class':'SummaryHL'})
if tag_title is None:
break
tag_section = tag_title.findPrevious(name='p', attrs={'class':'Cat1HL'})
if tag_section is None:
section = 'Front Page'
else:
section = tag_section.string
tag_section2 = tag_title.findPrevious(name='p', attrs={'class':'Cat2HL'})
if tag_section2 is not None:
if tag_section2.findPrevious(name='p', attrs={'class':'Cat1HL'}) == tag_section:
section = "%s: %s" %(section, tag_section2.string)
if section not in articles.keys():
articles[section] = []
if section not in ans:
ans.append(section)
body = tag_title
while True:
tag_url = body.findNext(name='a', attrs={'href':url_re})
if tag_url is None:
break
body = tag_url
if tag_url.string is None:
continue
elif tag_url.string == 'Full Story':
break
elif tag_url.string.startswith('Comments ('):
break
else:
continue
if tag_url is None:
break
article = dict(
title=tag_title.string,
url=tag_url['href'].split('#')[0],
description='', content='', date='')
articles[section].append(article)
ans = [(key, articles[key]) for key in ans if articles.has_key(key)]
if not ans:
raise Exception('Could not find any articles.')
return ans
# vim: expandtab:ts=4:sw=4
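
Review note: `parse_index` above collects articles per section in a dict and keeps a separate list so that sections come out in the order they first appear on the page. A minimal sketch of that grouping step, detached from the HTML scraping. The helper names `group_by_section` and `make_article` and the sample data are assumptions for illustration.

```python
def make_article(title, url):
    """Build an article dict with the same keys the recipe fills in."""
    return dict(title=title, url=url, description='', content='', date='')


def group_by_section(pairs):
    """Group (section, article) pairs, preserving first-seen section order.

    Returns a list of (section title, list of article dicts) tuples,
    the same shape the recipe's parse_index returns.
    """
    articles = {}
    order = []
    for section, article in pairs:
        if section not in articles:
            articles[section] = []
            order.append(section)
        articles[section].append(article)
    return [(section, articles[section]) for section in order]


# Made-up sections and article URLs.
index = group_by_section([
    ('Front Page', make_article('First story', 'http://lwn.net/Articles/1/')),
    ('Kernel', make_article('Second story', 'http://lwn.net/Articles/2/')),
    ('Front Page', make_article('Third story', 'http://lwn.net/Articles/3/')),
])
for section, items in index:
    print(section, [item['title'] for item in items])
```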


@ -11,7 +11,6 @@ http://www.macworld.co.uk/
'''
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile
temp_files = []
articles_are_obfuscated = True
@ -36,26 +35,17 @@ class macWorld(BasicNewsRecipe):
remove_javascript = True
no_stylesheets = True
def get_obfuscated_article(self, url):
br = self.get_browser()
br.open(url+'&print')
response = br.follow_link(url, nr = 0)
html = response.read()
self.temp_files.append(PersistentTemporaryFile('_fa.html'))
self.temp_files[-1].write(html)
self.temp_files[-1].close()
return self.temp_files[-1].name
keep_only_tags = [
dict(name='div', attrs={'id':'article'})
dict(name='div', attrs={'id':'content'})
]
remove_tags = [
dict(name='div', attrs={'class':['toolBar','mac_tags','toolBar btmTools','textAds']}),
{'class':['toolBar','mac_tags','toolBar btmTools','textAds']},
dict(name='p', attrs={'class':'breadcrumbs'}),
dict(name='div', attrs={'id':['breadcrumb','sidebar','comments']})
dict(id=['breadcrumb','sidebar','comments','topContentWrapper',
'rightColumn', 'aboveFootPromo', 'storyCarousel']),
{'class':lambda x: x and ('tools' in x or 'toolBar'
in x)}
]
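
Review note: the last `remove_tags` entry above passes a callable as the `class` matcher, so a tag is removed when its class attribute contains `tools` or `toolBar` as a substring. The predicate can be checked in isolation; the name `is_toolbar_class` and the sample class strings are assumptions for illustration.

```python
def is_toolbar_class(value):
    """Return True if a class attribute value marks a toolbar-like element.

    `value` is None or empty when the tag has no class attribute; the
    substring checks are case-sensitive, as in the recipe's lambda.
    """
    return bool(value) and ('tools' in value or 'toolBar' in value)


for sample in (None, '', 'toolBar btmTools', 'article-tools', 'articleTools', 'story'):
    print(repr(sample), is_toolbar_class(sample))
```

Note the case sensitivity: a class such as `articleTools` contains neither `tools` nor `toolBar`, so it would not match.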


@ -1,7 +1,9 @@
__license__ = 'GPL v3'
__copyright__ = '2010, Eddie Lau'
__copyright__ = '2010-2011, Eddie Lau'
'''
Change Log:
2011/02/20: skip duplicated links in finance section, put photos which may extend a whole page to the back of the articles
clean up the indentation
2010/12/07: add entertainment section, use newspaper front page as ebook cover, suppress date display in section list
(to avoid wrong date display in case the user generates the ebook in a time zone different from HKT)
2010/11/22: add English section, remove eco-news section which is not updated daily, correct
@ -18,21 +20,21 @@ from calibre.web.feeds.recipes import BasicNewsRecipe
from contextlib import nested
from calibre import __appname__
from calibre.ebooks.BeautifulSoup import BeautifulSoup
from calibre.ebooks.metadata.opf2 import OPFCreator
from calibre.ebooks.metadata.toc import TOC
from calibre.ebooks.metadata import MetaInformation
class MPHKRecipe(BasicNewsRecipe):
IsKindleUsed = True # to avoid generating periodical in which CJK characters can't be displayed in section/article view
IsCJKWellSupported = True # Set to False to avoid generating periodical in which CJK characters can't be displayed in section/article view
title = 'Ming Pao - Hong Kong'
oldest_article = 1
max_articles_per_feed = 100
__author__ = 'Eddie Lau'
description = 'Hong Kong Chinese Newspaper'
publisher = 'news.mingpao.com'
description = ('Hong Kong Chinese Newspaper (http://news.mingpao.com). If '
'you are using a Kindle with firmware < 3.1, customize the '
'recipe')
publisher = 'MingPao'
category = 'Chinese, News, Hong Kong'
remove_javascript = True
use_embedded_content = False
@ -46,19 +48,20 @@ class MPHKRecipe(BasicNewsRecipe):
masthead_url = 'http://news.mingpao.com/image/portals_top_logo_news.gif'
keep_only_tags = [dict(name='h1'),
dict(name='font', attrs={'style':['font-size:14pt; line-height:160%;']}), # for entertainment page title
dict(attrs={'class':['photo']}),
dict(attrs={'id':['newscontent']}), # entertainment page content
dict(attrs={'id':['newscontent01','newscontent02']})]
dict(attrs={'id':['newscontent01','newscontent02']}),
dict(attrs={'class':['photo']})
]
remove_tags = [dict(name='style'),
dict(attrs={'id':['newscontent135']})] # for the finance page
remove_attributes = ['width']
preprocess_regexps = [
(re.compile(r'<h5>', re.DOTALL|re.IGNORECASE),
lambda match: '<h1>'),
(re.compile(r'</h5>', re.DOTALL|re.IGNORECASE),
lambda match: '</h1>'),
(re.compile(r'<p><a href=.+?</a></p>', re.DOTALL|re.IGNORECASE), # for entertainment page
lambda match: '')
(re.compile(r'<h5>', re.DOTALL|re.IGNORECASE),
lambda match: '<h1>'),
(re.compile(r'</h5>', re.DOTALL|re.IGNORECASE),
lambda match: '</h1>'),
(re.compile(r'<p><a href=.+?</a></p>', re.DOTALL|re.IGNORECASE), # for entertainment page
lambda match: '')
]
def image_url_processor(cls, baseurl, url):
@ -107,6 +110,9 @@ class MPHKRecipe(BasicNewsRecipe):
def get_fetchdate(self):
return self.get_dtlocal().strftime("%Y%m%d")
def get_fetchformatteddate(self):
return self.get_dtlocal().strftime("%Y-%m-%d")
def get_fetchday(self):
# convert UTC to local HK time - by around 6:00 am HKT, all news is available
return self.get_dtlocal().strftime("%d")
@ -121,84 +127,66 @@ class MPHKRecipe(BasicNewsRecipe):
return cover
def parse_index(self):
feeds = []
dateStr = self.get_fetchdate()
for title, url in [(u'\u8981\u805e Headline', 'http://news.mingpao.com/' + dateStr + '/gaindex.htm'),
(u'\u6559\u80b2 Education', 'http://news.mingpao.com/' + dateStr + '/gfindex.htm'),
(u'\u6e2f\u805e Local', 'http://news.mingpao.com/' + dateStr + '/gbindex.htm'),
(u'\u793e\u8a55\u2027\u7b46\u9663 Editorial', 'http://news.mingpao.com/' + dateStr + '/mrindex.htm'),
(u'\u8ad6\u58c7 Forum', 'http://news.mingpao.com/' + dateStr + '/faindex.htm'),
(u'\u4e2d\u570b China', 'http://news.mingpao.com/' + dateStr + '/caindex.htm'),
(u'\u570b\u969b World', 'http://news.mingpao.com/' + dateStr + '/taindex.htm'),
('Tech News', 'http://news.mingpao.com/' + dateStr + '/naindex.htm'),
(u'\u9ad4\u80b2 Sport', 'http://news.mingpao.com/' + dateStr + '/spindex.htm'),
(u'\u526f\u520a Supplement', 'http://news.mingpao.com/' + dateStr + '/jaindex.htm'),
(u'\u82f1\u6587 English', 'http://news.mingpao.com/' + dateStr + '/emindex.htm')]:
articles = self.parse_section(url)
if articles:
feeds.append((title, articles))
# special - finance
fin_articles = self.parse_fin_section('http://www.mpfinance.com/htm/Finance/' + dateStr + '/News/ea,eb,ecindex.htm')
if fin_articles:
feeds.append((u'\u7d93\u6fdf Finance', fin_articles))
# special - eco-friendly
# eco_articles = self.parse_eco_section('http://tssl.mingpao.com/htm/marketing/eco/cfm/Eco1.cfm')
# if eco_articles:
# feeds.append((u'\u74b0\u4fdd Eco News', eco_articles))
# special - entertainment
ent_articles = self.parse_ent_section('http://ol.mingpao.com/cfm/star1.cfm')
if ent_articles:
feeds.append((u'\u5f71\u8996 Entertainment', ent_articles))
return feeds
feeds = []
dateStr = self.get_fetchdate()
for title, url in [(u'\u8981\u805e Headline', 'http://news.mingpao.com/' + dateStr + '/gaindex.htm'),
(u'\u6e2f\u805e Local', 'http://news.mingpao.com/' + dateStr + '/gbindex.htm'),
(u'\u793e\u8a55/\u7b46\u9663 Editorial', 'http://news.mingpao.com/' + dateStr + '/mrindex.htm'),
(u'\u8ad6\u58c7 Forum', 'http://news.mingpao.com/' + dateStr + '/faindex.htm'),
(u'\u4e2d\u570b China', 'http://news.mingpao.com/' + dateStr + '/caindex.htm'),
(u'\u570b\u969b World', 'http://news.mingpao.com/' + dateStr + '/taindex.htm'),
('Tech News', 'http://news.mingpao.com/' + dateStr + '/naindex.htm'),
(u'\u6559\u80b2 Education', 'http://news.mingpao.com/' + dateStr + '/gfindex.htm'),
(u'\u9ad4\u80b2 Sport', 'http://news.mingpao.com/' + dateStr + '/spindex.htm'),
(u'\u526f\u520a Supplement', 'http://news.mingpao.com/' + dateStr + '/jaindex.htm'),
(u'\u82f1\u6587 English', 'http://news.mingpao.com/' + dateStr + '/emindex.htm')]:
articles = self.parse_section(url)
if articles:
feeds.append((title, articles))
# special - finance
fin_articles = self.parse_fin_section('http://www.mpfinance.com/htm/Finance/' + dateStr + '/News/ea,eb,ecindex.htm')
if fin_articles:
feeds.append((u'\u7d93\u6fdf Finance', fin_articles))
# special - entertainment
ent_articles = self.parse_ent_section('http://ol.mingpao.com/cfm/star1.cfm')
if ent_articles:
feeds.append((u'\u5f71\u8996 Film/TV', ent_articles))
return feeds
def parse_section(self, url):
dateStr = self.get_fetchdate()
soup = self.index_to_soup(url)
divs = soup.findAll(attrs={'class': ['bullet','bullet_grey']})
current_articles = []
included_urls = []
divs.reverse()
for i in divs:
a = i.find('a', href = True)
title = self.tag_to_string(a)
url = a.get('href', False)
url = 'http://news.mingpao.com/' + dateStr + '/' +url
if url not in included_urls and url.rfind('Redirect') == -1:
current_articles.append({'title': title, 'url': url, 'description':'', 'date':''})
included_urls.append(url)
current_articles.reverse()
return current_articles
dateStr = self.get_fetchdate()
soup = self.index_to_soup(url)
divs = soup.findAll(attrs={'class': ['bullet','bullet_grey']})
current_articles = []
included_urls = []
divs.reverse()
for i in divs:
a = i.find('a', href = True)
title = self.tag_to_string(a)
url = a.get('href', False)
url = 'http://news.mingpao.com/' + dateStr + '/' +url
if url not in included_urls and url.rfind('Redirect') == -1:
current_articles.append({'title': title, 'url': url, 'description':'', 'date':''})
included_urls.append(url)
current_articles.reverse()
return current_articles
def parse_fin_section(self, url):
dateStr = self.get_fetchdate()
soup = self.index_to_soup(url)
a = soup.findAll('a', href= True)
current_articles = []
for i in a:
url = i.get('href', False)
if not url.rfind(dateStr) == -1 and url.rfind('index') == -1:
title = self.tag_to_string(i)
url = 'http://www.mpfinance.com/cfm/' +url
current_articles.append({'title': title, 'url': url, 'description':''})
return current_articles
def parse_eco_section(self, url):
dateStr = self.get_fetchdate()
soup = self.index_to_soup(url)
divs = soup.findAll(attrs={'class': ['bullet']})
current_articles = []
included_urls = []
for i in divs:
a = i.find('a', href = True)
title = self.tag_to_string(a)
url = a.get('href', False)
url = 'http://tssl.mingpao.com/htm/marketing/eco/cfm/' +url
if url not in included_urls and url.rfind('Redirect') == -1 and not url.rfind('.txt') == -1 and not url.rfind(dateStr) == -1:
for i in a:
url = 'http://www.mpfinance.com/cfm/' + i.get('href', False)
if url not in included_urls and not url.rfind(dateStr) == -1 and url.rfind('index') == -1:
title = self.tag_to_string(i)
current_articles.append({'title': title, 'url': url, 'description':''})
included_urls.append(url)
return current_articles
def parse_ent_section(self, url):
self.get_fetchdate()
soup = self.index_to_soup(url)
a = soup.findAll('a', href=True)
a.reverse()
@ -223,67 +211,71 @@ class MPHKRecipe(BasicNewsRecipe):
return soup
def create_opf(self, feeds, dir=None):
if self.IsKindleUsed == False:
super(MPHKRecipe,self).create_opf(feeds, dir)
return
if dir is None:
dir = self.output_dir
title = self.short_title()
title += ' ' + self.get_fetchdate()
#if self.output_profile.periodical_date_in_title:
# title += strftime(self.timefmt)
mi = MetaInformation(title, [__appname__])
mi.publisher = __appname__
mi.author_sort = __appname__
mi.publication_type = self.publication_type+':'+self.short_title()
#mi.timestamp = nowf()
mi.timestamp = self.get_dtlocal()
mi.comments = self.description
if not isinstance(mi.comments, unicode):
mi.comments = mi.comments.decode('utf-8', 'replace')
#mi.pubdate = nowf()
mi.pubdate = self.get_dtlocal()
opf_path = os.path.join(dir, 'index.opf')
ncx_path = os.path.join(dir, 'index.ncx')
opf = OPFCreator(dir, mi)
# Add mastheadImage entry to <guide> section
mp = getattr(self, 'masthead_path', None)
if mp is not None and os.access(mp, os.R_OK):
from calibre.ebooks.metadata.opf2 import Guide
ref = Guide.Reference(os.path.basename(self.masthead_path), os.getcwdu())
ref.type = 'masthead'
ref.title = 'Masthead Image'
opf.guide.append(ref)
if self.IsCJKWellSupported == True:
# use Chinese title
title = u'\u660e\u5831 (\u9999\u6e2f) ' + self.get_fetchformatteddate()
else:
# use English title
title = self.short_title() + ' ' + self.get_fetchformatteddate()
if True: # force date in title
# title += strftime(self.timefmt)
mi = MetaInformation(title, [self.publisher])
mi.publisher = self.publisher
mi.author_sort = self.publisher
if self.IsCJKWellSupported == True:
mi.publication_type = 'periodical:'+self.publication_type+':'+self.short_title()
else:
mi.publication_type = self.publication_type+':'+self.short_title()
#mi.timestamp = nowf()
mi.timestamp = self.get_dtlocal()
mi.comments = self.description
if not isinstance(mi.comments, unicode):
mi.comments = mi.comments.decode('utf-8', 'replace')
#mi.pubdate = nowf()
mi.pubdate = self.get_dtlocal()
opf_path = os.path.join(dir, 'index.opf')
ncx_path = os.path.join(dir, 'index.ncx')
opf = OPFCreator(dir, mi)
# Add mastheadImage entry to <guide> section
mp = getattr(self, 'masthead_path', None)
if mp is not None and os.access(mp, os.R_OK):
from calibre.ebooks.metadata.opf2 import Guide
ref = Guide.Reference(os.path.basename(self.masthead_path), os.getcwdu())
ref.type = 'masthead'
ref.title = 'Masthead Image'
opf.guide.append(ref)
manifest = [os.path.join(dir, 'feed_%d'%i) for i in range(len(feeds))]
manifest.append(os.path.join(dir, 'index.html'))
manifest.append(os.path.join(dir, 'index.ncx'))
manifest = [os.path.join(dir, 'feed_%d'%i) for i in range(len(feeds))]
manifest.append(os.path.join(dir, 'index.html'))
manifest.append(os.path.join(dir, 'index.ncx'))
# Get cover
cpath = getattr(self, 'cover_path', None)
if cpath is None:
pf = open(os.path.join(dir, 'cover.jpg'), 'wb')
if self.default_cover(pf):
cpath = pf.name
if cpath is not None and os.access(cpath, os.R_OK):
opf.cover = cpath
manifest.append(cpath)
# Get cover
cpath = getattr(self, 'cover_path', None)
if cpath is None:
pf = open(os.path.join(dir, 'cover.jpg'), 'wb')
if self.default_cover(pf):
cpath = pf.name
if cpath is not None and os.access(cpath, os.R_OK):
opf.cover = cpath
manifest.append(cpath)
# Get masthead
mpath = getattr(self, 'masthead_path', None)
if mpath is not None and os.access(mpath, os.R_OK):
manifest.append(mpath)
# Get masthead
mpath = getattr(self, 'masthead_path', None)
if mpath is not None and os.access(mpath, os.R_OK):
manifest.append(mpath)
opf.create_manifest_from_files_in(manifest)
for mani in opf.manifest:
if mani.path.endswith('.ncx'):
mani.id = 'ncx'
if mani.path.endswith('mastheadImage.jpg'):
mani.id = 'masthead-image'
entries = ['index.html']
toc = TOC(base_path=dir)
self.play_order_counter = 0
self.play_order_map = {}
opf.create_manifest_from_files_in(manifest)
for mani in opf.manifest:
if mani.path.endswith('.ncx'):
mani.id = 'ncx'
if mani.path.endswith('mastheadImage.jpg'):
mani.id = 'masthead-image'
entries = ['index.html']
toc = TOC(base_path=dir)
self.play_order_counter = 0
self.play_order_map = {}
def feed_index(num, parent):
f = feeds[num]
@ -321,7 +313,7 @@ class MPHKRecipe(BasicNewsRecipe):
prefix = '/'.join('..'for i in range(2*len(re.findall(r'link\d+', last))))
templ = self.navbar.generate(True, num, j, len(f),
not self.has_single_feed,
a.orig_url, __appname__, prefix=prefix,
a.orig_url, self.publisher, prefix=prefix,
center=self.center_navbar)
elem = BeautifulSoup(templ.render(doctype='xhtml').decode('utf-8')).find('div')
body.insert(len(body.contents), elem)
@ -344,7 +336,7 @@ class MPHKRecipe(BasicNewsRecipe):
if not desc:
desc = None
feed_index(i, toc.add_item('feed_%d/index.html'%i, None,
f.title, play_order=po, description=desc, author=auth))
f.title, play_order=po, description=desc, author=auth))
else:
entries.append('feed_%d/index.html'%0)
@ -357,4 +349,3 @@ class MPHKRecipe(BasicNewsRecipe):
with nested(open(opf_path, 'wb'), open(ncx_path, 'wb')) as (opf_file, ncx_file):
opf.render(opf_file, ncx_file)
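Review note on the duplicate handling in parse_section earlier in this file: the links are walked last-to-first, the first sighting of each URL is kept, and the result is reversed again, so when a link appears twice on the index page the later occurrence decides its position. The following is a rough sketch of just that logic, assuming the hrefs have already been pulled out of the page; `dedupe_links` and the `example.invalid` base URL are hypothetical names used for illustration only.

```python
def dedupe_links(hrefs, base):
    """Mimic parse_section: skip 'Redirect' links and repeated URLs,
    keeping the last occurrence of each URL in page order."""
    seen = []
    kept = []
    for href in reversed(hrefs):
        url = base + href
        if url not in seen and url.rfind('Redirect') == -1:
            kept.append(url)
            seen.append(url)
    kept.reverse()  # restore the original top-to-bottom order
    return kept


if __name__ == '__main__':
    links = ['a.htm', 'b.htm', 'a.htm', 'Redirect.cfm', 'c.htm']
    for url in dedupe_links(links, 'http://example.invalid/'):
        print(url)
```

A list is used for `seen` to mirror the recipe; a set would give faster membership tests on long index pages without changing the result.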

View File

@ -0,0 +1,48 @@
# -*- coding: utf-8 -*-
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = u'2011, Silviu Cotoar\u0103'
'''
natgeo.ro
'''
from calibre.web.feeds.news import BasicNewsRecipe
class NationalGeoRo(BasicNewsRecipe):
title = u'National Geographic RO'
__author__ = u'Silviu Cotoar\u0103'
description = u'S\u0103 avem grij\u0103 de planet\u0103'
publisher = 'National Geographic'
oldest_article = 5
language = 'ro'
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
category = 'Reviste'
encoding = 'utf-8'
cover_url = 'http://wiki.benecke.com/images/c/c4/NatGeographic_Logo.jpg'
conversion_options = {
'comments' : description
,'tags' : category
,'language' : language
,'publisher' : publisher
}
keep_only_tags = [
dict(name='h2', attrs={'class':'contentheading clearfix'})
, dict(name='div', attrs={'class':'article-content'})
]
remove_tags = [
dict(name='div', attrs={'class':['phocagallery']})
]
feeds = [
(u'Feeds', u'http://www.natgeo.ro/index.php?format=feed&type=rss')
]
def preprocess_html(self, soup):
return self.adeify_images(soup)

View File

@ -88,8 +88,8 @@ class NYTimes(BasicNewsRecipe):
if headlinesOnly:
title='New York Times Headlines'
description = 'Headlines from the New York Times'
needs_subscription = False
description = 'Headlines from the New York Times. Needs a subscription from http://www.nytimes.com'
needs_subscription = 'optional'
elif webEdition:
title='New York Times (Web)'
description = 'New York Times on the Web'

View File

@ -0,0 +1,55 @@
#!/usr/bin/env python
# encoding: utf-8
from __future__ import with_statement
__license__ = 'GPL 3'
__copyright__ = 'zotzo'
__docformat__ = 'restructuredtext en'
"""
http://fifthdown.blogs.nytimes.com/
http://offthedribble.blogs.nytimes.com/
http://thequad.blogs.nytimes.com/
http://slapshot.blogs.nytimes.com/
http://goal.blogs.nytimes.com/
http://bats.blogs.nytimes.com/
http://straightsets.blogs.nytimes.com/
http://formulaone.blogs.nytimes.com/
http://onpar.blogs.nytimes.com/
"""
from calibre.web.feeds.news import BasicNewsRecipe
class NYTimesSports(BasicNewsRecipe):
title = 'New York Times Sports Beat'
language = 'en'
__author__ = 'rylsfan'
description = 'In-depth sports from the New York Times'
publisher = 'The New York Times'
category = 'Sports'
oldest_article = 3
max_articles_per_feed = 25
no_stylesheets = True
language = 'en'
#cover_url ='http://bit.ly/h8F4DO'
feeds = [
(u'The Fifth Down', u'http://fifthdown.blogs.nytimes.com/feed/'),
(u'Off The Dribble', u'http://offthedribble.blogs.nytimes.com/feed/'),
(u'The Quad', u'http://thequad.blogs.nytimes.com/feed/'),
(u'Slap Shot', u'http://slapshot.blogs.nytimes.com/feed/'),
(u'Goal', u'http://goal.blogs.nytimes.com/feed/'),
(u'Bats', u'http://bats.blogs.nytimes.com/feed/'),
(u'Straight Sets', u'http://straightsets.blogs.nytimes.com/feed/'),
(u'Formula One', u'http://formulaone.blogs.nytimes.com/feed/'),
(u'On Par', u'http://onpar.blogs.nytimes.com/feed/'),
]
keep_only_tags = [dict(name='div', attrs={'id':'header'}),
dict(name='h1'),
dict(name='h2'),
dict(name='div', attrs={'class':'entry-content'})]
extra_css = '''
h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
'''

View File

@ -96,18 +96,18 @@ class NYTimes(BasicNewsRecipe):
if headlinesOnly:
title='New York Times Headlines'
description = 'Headlines from the New York Times'
needs_subscription = False
needs_subscription = True
elif webEdition:
title='New York Times (Web)'
description = 'New York Times on the Web'
needs_subscription = True
elif replaceKindleVersion:
title='The New York Times'
title='The New York Times'
description = 'Today\'s New York Times'
needs_subscription = True
else:
title='New York Times'
description = 'Today\'s New York Times'
description = 'Today\'s New York Times. Needs subscription from http://www.nytimes.com'
needs_subscription = True
@ -676,7 +676,7 @@ class NYTimes(BasicNewsRecipe):
if hlines:
for hline in hlines:
hline.extract()
#find all section headers
hlines = runAround.findAll('h6')
if hlines:

View File

@ -0,0 +1,46 @@
#!/usr/bin/env python
# encoding: utf-8
from __future__ import with_statement
__license__ = 'GPL 3'
__copyright__ = 'zotzo'
__docformat__ = 'restructuredtext en'
"""
http://pogue.blogs.nytimes.com/
"""
from calibre.web.feeds.news import BasicNewsRecipe
class NYTimesTechnology(BasicNewsRecipe):
title = 'New York Times Technology Beat'
language = 'en'
__author__ = 'David Pogue'
description = 'The latest in technology from David Pogue'
publisher = 'The New York Times'
category = 'Technology'
oldest_article = 14
max_articles_per_feed = 25
no_stylesheets = True
language = 'en'
cover_url ='http://bit.ly/g0SKJT'
feeds = [
(u"Pogue's Posts", u'http://pogue.blogs.nytimes.com/feed/'),
(u'Bits', u'http://bits.blogs.nytimes.com/feed/'),
(u'Gadgetwise', u'http://gadgetwise.blogs.nytimes.com/feed/'),
(u'Open', u'http://open.blogs.nytimes.com/feed/')
]
keep_only_tags = [dict(name='div', attrs={'id':'header'}),
dict(name='h1'),
dict(name='h2'),
dict(name='div', attrs={'class':'entry-content'})]
extra_css = '''
h1{font-family:Arial,Helvetica,sans-serif;
font-weight:bold;font-size:large;}
h2{font-family:Arial,Helvetica,sans-serif;
font-weight:normal;font-size:small;}
p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
'''

View File

@ -0,0 +1,50 @@
#!/usr/bin/env python
__license__ = 'GPL v3'
'''
OSNews.pl
'''
from calibre.web.feeds.news import BasicNewsRecipe
import re
class OSNewsRecipe(BasicNewsRecipe):
__author__ = u'Mori & Tomasz D\u0142ugosz'
language = 'pl'
title = u'OSnews.pl'
publisher = u'OSnews.pl'
description = u'OSnews.pl jest spo\u0142eczno\u015bciowym serwisem informacyjnym po\u015bwi\u0119conym oprogramowaniu, systemom operacyjnym i \u015bwiatowi IT'
no_stylesheets = True
remove_javascript = True
encoding = 'utf-8'
use_embedded_content = False
oldest_article = 7
max_articles_per_feed = 100
extra_css = '''
.news-heading {font-size:150%}
.newsinformations li {display:inline;}
blockquote {border:2px solid #000; padding:5px;}
'''
feeds = [
(u'OSNews.pl', u'http://feeds.feedburner.com/OSnewspl')
]
keep_only_tags = [
dict(name = 'a', attrs = {'class' : 'news-heading'}),
dict(name = 'div', attrs = {'class' : 'newsinformations'}),
dict(name = 'div', attrs = {'id' : 'news-content'})
]
remove_tags = [
dict(name = 'div', attrs = {'class' : 'sociable'}),
dict(name = 'div', attrs = {'class' : 'post_prev'}),
dict(name = 'div', attrs = {'class' : 'post_next'}),
dict(name = 'div', attrs = {'class' : 'clr'})
]
preprocess_regexps = [(re.compile(u'</span>Komentarze: \(?[0-9]+\)? ?<span'), lambda match: '</span><span')]
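Review note on the preprocess_regexps entry above: the pattern removes the "Komentarze: (N)" comment counter that sits between two spans, with or without the parentheses and trailing space. The following is a small sketch of the substitution run outside calibre, assuming markup of the shape shown; the sample HTML and the helper name `strip_comment_count` are hypothetical.

```python
import re

# Same pattern as in the recipe, written as a raw string.
KOMENTARZE = re.compile(r'</span>Komentarze: \(?[0-9]+\)? ?<span')


def strip_comment_count(html):
    # Replace the counter with nothing, keeping the surrounding span tags.
    return KOMENTARZE.sub(lambda match: '</span><span', html)


if __name__ == '__main__':
    sample = '<span>Autor</span>Komentarze: (12) <span>Tagi</span>'
    print(strip_comment_count(sample))
```

Text that does not match the pattern passes through unchanged, so the rule should be harmless on pages that have no comment counter.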

View File

@ -0,0 +1,21 @@
from calibre.web.feeds.news import BasicNewsRecipe
class AdvancedUserRecipe1299061355(BasicNewsRecipe):
title = u'Post Today'
language = 'th'
__author__ = "Chotechai P."
oldest_article = 7
max_articles_per_feed = 100
cover_url = 'http://upload.wikimedia.org/wikipedia/th/2/2e/Posttoday_Logo.png'
feeds = [(u'Breaking News', u'http://www.posttoday.com/rss/src/breakingnews.xml'), (u'\u0e02\u0e48\u0e32\u0e27', u'http://www.posttoday.com/rss/src/news.xml'), (u'\u0e27\u0e34\u0e40\u0e04\u0e23\u0e32\u0e30\u0e2b\u0e4c', u'http://www.posttoday.com/rss/src/analyse.xml'), (u'\u0e40\u0e21\u0e32\u0e17\u0e4c\u0e01\u0e31\u0e19\u0e43\u0e2b\u0e49 z', u'http://www.posttoday.com/rss/src/mouth.xml'), (u'\u0e44\u0e17\u0e22\u0e42\u0e0b\u0e44\u0e0b\u0e15\u0e35\u0e49', u'http://www.posttoday.com/rss/src/thaisociety.xml'), (u'\u0e44\u0e25\u0e1f\u0e4c\u0e2a\u0e44\u0e15\u0e25\u0e4c', u'http://www.posttoday.com/rss/src/lifestyle.xml'), (u'\u0e0a\u0e35\u0e49\u0e0a\u0e48\u0e2d\u0e07\u0e23\u0e27\u0e22', u'http://www.posttoday.com/rss/src/moneyguide.xml'), (u'\u0e1a\u0e49\u0e32\u0e19-\u0e04\u0e2d\u0e19\u0e42\u0e14', u'http://www.posttoday.com/rss/src/homecondo.xml'), (u'\u0e22\u0e32\u0e19\u0e22\u0e19\u0e15\u0e4c', u'http://www.posttoday.com/rss/src/motor.xml'), (u'\u0e14\u0e34\u0e08\u0e34\u0e15\u0e2d\u0e25\u0e44\u0e25\u0e1f\u0e4c', u'http://www.posttoday.com/rss/src/digitallife.xml'), (u'\u0e01\u0e35\u0e2c\u0e32', u'http://www.posttoday.com/rss/src/sport.xml'), (u'\u0e23\u0e2d\u0e1a\u0e42\u0e25\u0e01', u'http://www.posttoday.com/rss/src/world.xml'), (u'\u0e01\u0e34\u0e19-\u0e40\u0e17\u0e35\u0e48\u0e22\u0e27', u'http://www.posttoday.com/rss/src/eattravel.xml'), (u'Mind & Soul', u'http://www.posttoday.com/rss/src/mindsoul.xml'), (u'\u0e1a\u0e25\u0e47\u0e2d\u0e01 \u0e1a\u0e01.', u'http://www.posttoday.com/rss/src/blogs.xml')]
keep_only_tags = []
keep_only_tags.append(dict(name = 'div', attrs = {'class' :
'articleContents'}))
remove_tags = []
remove_tags.append(dict(name = 'label'))
remove_tags.append(dict(name = 'span'))
remove_tags.append(dict(name = 'div', attrs = {'class' :
'socialBookmark'}))
remove_tags.append(dict(name = 'div', attrs = {'class' :
'misc'}))

View File

@ -0,0 +1,49 @@
from calibre.web.feeds.news import BasicNewsRecipe
class AdvancedUserRecipe1286819935(BasicNewsRecipe):
title = u'RBC.ru'
__author__ = 'A. Chewi'
oldest_article = 7
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
conversion_options = {'linearize_tables' : True}
remove_attributes = ['style']
language = 'ru'
timefmt = ' [%a, %d %b, %Y]'
keep_only_tags = [dict(name='h2', attrs={}),
dict(name='div', attrs={'class': 'box _ga1_on_'}),
dict(name='h1', attrs={'class': 'news_section'}),
dict(name='div', attrs={'class': 'news_body dotted_border_bottom'}),
dict(name='table', attrs={'class': 'newsBody'}),
dict(name='h2', attrs={'class': 'black'})]
feeds = [(u'Главные новости', u'http://static.feed.rbc.ru/rbc/internal/rss.rbc.ru/rbc.ru/mainnews.rss'),
(u'Политика', u'http://static.feed.rbc.ru/rbc/internal/rss.rbc.ru/rbc.ru/politics.rss'),
(u'Экономика', u'http://static.feed.rbc.ru/rbc/internal/rss.rbc.ru/rbc.ru/economics.rss'),
(u'Общество', u'http://static.feed.rbc.ru/rbc/internal/rss.rbc.ru/rbc.ru/society.rss'),
(u'Происшествия', u'http://static.feed.rbc.ru/rbc/internal/rss.rbc.ru/rbc.ru/incidents.rss'),
(u'Финансовые новости Quote.rbc.ru', u'http://static.feed.rbc.ru/rbc/internal/rss.rbc.ru/quote.ru/mainnews.rss')]
remove_tags = [dict(name='div', attrs={'class': "video-frame"}),
dict(name='div', attrs={'class': "photo-container videoContainer videoSWFLinks videoPreviewSlideContainer notes"}),
dict(name='div', attrs={'class': "notes"}),
dict(name='div', attrs={'class': "publinks"}),
dict(name='a', attrs={'class': "print"}),
dict(name='div', attrs={'class': "photo-report_new notes newslider"}),
dict(name='div', attrs={'class': "videoContainer"}),
dict(name='div', attrs={'class': "videoPreviewSlideContainer"}),
dict(name='a', attrs={'class': "videoPreviewContainer"}),
dict(name='a', attrs={'class': "red"}),]
def preprocess_html(self, soup):
for alink in soup.findAll('a'):
if alink.string is not None:
tstr = alink.string
alink.replaceWith(tstr)
return soup
def print_version(self, url):
return url + '?print=true'

View File

@ -0,0 +1,136 @@
import re
import urllib2
import time
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup, SoupStrainer
from calibre import strftime
'''
Help Needed:
Strange characters still appear in the output, especially in the Great Movies descriptions in the TOC.
Can anyone help figure out why?
Change Log:
2011-02-19: Version 2: Added "Oscars" section and fixed date problem
'''
class Ebert(BasicNewsRecipe):
title = 'Roger Ebert'
__author__ = 'Shane Erstad'
version = 2
description = 'Roger Ebert Movie Reviews'
publisher = 'Chicago Sun Times'
category = 'movies'
oldest_article = 8
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
encoding = 'UTF-8'
masthead_url = 'http://rogerebert.suntimes.com/graphics/global/roger.jpg'
language = 'en'
remove_empty_feeds = False
PREFIX = 'http://rogerebert.suntimes.com'
patternReviews = r'<span class="*?movietitle"*?>(.*?)</span>.*?<div class="*?headline"*?>(.*?)</div>(.*?)</div>'
patternCommentary = r'<div class="*?headline"*?>.*?(<a href="/apps/pbcs.dll/article\?AID=.*?COMMENTARY.*?" id="ltred">.*?</a>).*?<div class="blurb clear">(.*?)</div>'
patternPeople = r'<div class="*?headline"*?>.*?(<a href="/apps/pbcs.dll/article\?AID=.*?PEOPLE.*?" id="ltred">.*?</a>).*?<div class="blurb clear">(.*?)</div>'
patternOscars = r'<div class="*?headline"*?>.*?(<a href="/apps/pbcs.dll/article\?AID=.*?OSCARS.*?" id="ltred">.*?</a>).*?<div class="blurb clear">(.*?)</div>'
patternGlossary = r'<div class="*?headline"*?>.*?(<a href="/apps/pbcs.dll/article\?AID=.*?GLOSSARY.*?" id="ltred">.*?</a>).*?<div class="blurb clear">(.*?)</div>'
conversion_options = {
'comments' : description
, 'tags' : category
, 'publisher' : publisher
, 'language' : language
, 'linearize_tables' : True
}
feeds = [
(u'Reviews' , u'http://rogerebert.suntimes.com/apps/pbcs.dll/section?category=reviews' )
,(u'Commentary' , u'http://rogerebert.suntimes.com/apps/pbcs.dll/section?category=COMMENTARY')
,(u'Great Movies' , u'http://rogerebert.suntimes.com/apps/pbcs.dll/section?category=REVIEWS08')
,(u'People' , u'http://rogerebert.suntimes.com/apps/pbcs.dll/section?category=PEOPLE')
,(u'Oscars' , u'http://rogerebert.suntimes.com/apps/pbcs.dll/section?category=OSCARS')
,(u'Glossary' , u'http://rogerebert.suntimes.com/apps/pbcs.dll/section?category=GLOSSARY')
]
preprocess_regexps = [
(re.compile(r'<font.*?>.*?This is a printer friendly.*?</font>.*?<hr>', re.DOTALL|re.IGNORECASE),
lambda m: '')
]
def print_version(self, url):
return url + '&template=printart'
def parse_index(self):
totalfeeds = []
lfeeds = self.get_feeds()
for feedobj in lfeeds:
feedtitle, feedurl = feedobj
self.log('\tFeedurl: ', feedurl)
self.report_progress(0, _('Fetching feed')+' %s...'%(feedtitle if feedtitle else feedurl))
articles = []
page = urllib2.urlopen(feedurl).read()
if feedtitle == 'Reviews' or feedtitle == 'Great Movies':
pattern = self.patternReviews
elif feedtitle == 'Commentary':
pattern = self.patternCommentary
elif feedtitle == 'People':
pattern = self.patternPeople
elif feedtitle == 'Glossary':
pattern = self.patternGlossary
elif feedtitle == 'Oscars':
pattern = self.patternOscars
regex = re.compile(pattern, re.IGNORECASE|re.DOTALL)
for match in regex.finditer(page):
if feedtitle == 'Reviews' or feedtitle == 'Great Movies':
movietitle = match.group(1)
thislink = match.group(2)
description = match.group(3)
elif feedtitle == 'Commentary' or feedtitle == 'People' or feedtitle == 'Glossary' or feedtitle == 'Oscars':
thislink = match.group(1)
description = match.group(2)
self.log(thislink)
for link in BeautifulSoup(thislink, parseOnlyThese=SoupStrainer('a')):
thisurl = self.PREFIX + link['href']
thislinktext = self.tag_to_string(link)
if feedtitle == 'Reviews' or feedtitle == 'Great Movies':
thistitle = movietitle
elif feedtitle == 'Commentary' or feedtitle == 'People' or feedtitle == 'Glossary' or feedtitle == 'Oscars':
thistitle = thislinktext
if thistitle == '':
continue
pattern2 = r'AID=\/(.*?)\/'
reg2 = re.compile(pattern2, re.IGNORECASE|re.DOTALL)
match2 = reg2.search(thisurl)
if match2:
c = time.strptime(match2.group(1),"%Y%m%d")
mydate=strftime("%A, %B %d, %Y", c)
else:
mydate = strftime("%A, %B %d, %Y")
self.log(mydate)
articles.append({
'title' :thistitle
,'date' :' [' + mydate + ']'
,'url' :thisurl
,'description':description
})
totalfeeds.append((feedtitle, articles))
return totalfeeds
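Review note on the date handling in parse_index above: the article date is taken from the `AID=/YYYYMMDD/` part of the link and parsed with `time.strptime`. The following is a minimal sketch of that step using only the standard library (the recipe itself formats with calibre's `strftime` wrapper); the helper names and the sample URL are hypothetical.

```python
import re
import time

# Same pattern as pattern2 in the recipe.
AID_DATE = re.compile(r'AID=\/(.*?)\/', re.IGNORECASE | re.DOTALL)


def parse_aid_date(url):
    """Return a struct_time for the date embedded in the URL, or None."""
    match = AID_DATE.search(url)
    if match is None:
        return None
    # Raises ValueError if the captured text is not an 8-digit date.
    return time.strptime(match.group(1), '%Y%m%d')


def article_date(url):
    parsed = parse_aid_date(url)
    if parsed is None:
        # Fall back to today's date, as the recipe does.
        return time.strftime('%A, %B %d, %Y')
    return time.strftime('%A, %B %d, %Y', parsed)


if __name__ == '__main__':
    sample = ('http://rogerebert.suntimes.com/apps/pbcs.dll/article'
              '?AID=/20110216/REVIEWS/110219987')  # hypothetical article link
    print(article_date(sample))
```

Note that a link whose AID segment is not a date would raise ValueError here, as it would in the recipe; wrapping the `strptime` call in a try/except may be worth considering.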

View File

@ -0,0 +1,59 @@
# -*- coding: utf-8 -*-
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = u'2011, Silviu Cotoar\u0103'
'''
romanialibera.ro
'''
from calibre.web.feeds.news import BasicNewsRecipe
class RomaniaLibera(BasicNewsRecipe):
title = u'Rom\u00e2nia Liber\u0103'
__author__ = u'Silviu Cotoar\u0103'
description = u'Rom\u00e2nia Liber\u0103'
publisher = u'Rom\u00e2nia Liber\u0103'
oldest_article = 5
language = 'ro'
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
category = 'Ziare,Stiri'
encoding = 'utf-8'
cover_url = 'http://www.romanialibera.ro/templates/lilac/images/sigla_1.gif'
conversion_options = {
'comments' : description
,'tags' : category
,'language' : language
,'publisher' : publisher
}
keep_only_tags = [
dict(name='div', attrs={'id':'articol'})
]
remove_tags = [
dict(name='div', attrs={'id':['art_actions']})
, dict(name='div', attrs={'class':['stats']})
, dict(name='div', attrs={'class':['data']})
, dict(name='div', attrs={'class':['autori']})
, dict(name='div', attrs={'class':['banda_explicatii_text']})
, dict(name='td', attrs={'class':['connect_widget_vertical_center connect_widget_button_cell']})
, dict(name='div', attrs={'class':['aceeasi_tema']})
, dict(name='div', attrs={'class':['art_after_text']})
, dict(name='div', attrs={'class':['navigare']})
, dict(name='div', attrs={'id':['art_text_left']})
]
remove_tags_after = [
dict(name='div', attrs={'class':'art_after_text'})
]
feeds = [
(u'Feeds', u'http://www.romanialibera.ro/rss.xml')
]
def preprocess_html(self, soup):
return self.adeify_images(soup)

View File

@ -69,12 +69,16 @@ class SeattleTimes(BasicNewsRecipe):
u'http://seattletimes.nwsource.com/rss/mostreadarticles.xml'),
]
keep_only_tags = [dict(id='content')]
remove_tags = [
dict(name=['object','link','script'])
,dict(name='p', attrs={'class':'permission'})
dict(name=['object','link','script']),
{'class':['permission', 'note', 'bottomtools',
'homedelivery']},
dict(id=["rightcolumn", 'footer', 'adbottom']),
]
def print_version(self, url):
return url
start_url, sep, rest_url = url.rpartition('_')
rurl, rsep, article_id = start_url.rpartition('/')
return u'http://seattletimes.nwsource.com/cgi-bin/PrintStory.pl?document_id=' + article_id
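Review note on the new print_version above: it takes the text between the last `/` and the last `_` of the article URL as the document id. The following is a stand-alone sketch of that string handling, assuming article URLs of the form `.../<id>_<slug>.html`; the sample URLs are hypothetical.

```python
PRINT_URL = 'http://seattletimes.nwsource.com/cgi-bin/PrintStory.pl?document_id='


def print_version(url):
    # Everything before the last underscore ends with the article id.
    start_url, sep, rest_url = url.rpartition('_')
    # The id is whatever follows the last slash of that prefix.
    rurl, rsep, article_id = start_url.rpartition('/')
    return PRINT_URL + article_id


if __name__ == '__main__':
    sample = ('http://seattletimes.nwsource.com/html/localnews/'
              '2014400550_example.html')  # hypothetical article URL
    print(print_version(sample))
```

If a URL contains no underscore, `rpartition` returns an empty prefix and the function yields a print URL with an empty `document_id`, so feeds that link to differently shaped URLs would need a fallback to the original URL.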

View File

@ -0,0 +1,55 @@
# -*- coding: utf-8 -*-
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = u'2011, Silviu Cotoar\u0103'
'''
sfin.ro
'''
from calibre.web.feeds.news import BasicNewsRecipe
class Sfin(BasicNewsRecipe):
title = u'S\u0103pt\u0103m\u00e2na Financiar\u0103'
__author__ = u'Silviu Cotoar\u0103'
description = 'SFIN'
publisher = 'SFIN'
oldest_article = 5
language = 'ro'
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
category = 'Ziare,Stiri,Economie,Business'
encoding = 'utf-8'
cover_url = 'http://img.9am.ro/images/logo_surse/saptamana_financiara.gif'
conversion_options = {
'comments' : description
,'tags' : category
,'language' : language
,'publisher' : publisher
}
keep_only_tags = [
dict(name='div', attrs={'class':'col2ContentLeft'})
, dict(name='div', attrs={'id':'contentArticol'})
]
remove_tags = [
dict(name='div', attrs={'class':['infoArticol']})
, dict(name='div', attrs={'class':['separator']})
, dict(name='div', attrs={'class':['tags']})
, dict(name='div', attrs={'id':['comments']})
, dict(name='div', attrs={'class':'boxForm'})
]
remove_tags_after = [
dict(name='div', attrs={'class':'tags'})
]
feeds = [
(u'Feeds', u'http://www.sfin.ro/rss')
]
def preprocess_html(self, soup):
return self.adeify_images(soup)

View File

@ -10,12 +10,14 @@ class AdvancedUserRecipe1278049615(BasicNewsRecipe):
max_articles_per_feed = 100
feeds = [(u'News', u'http://www.statesman.com/section-rss.do?source=news&includeSubSections=true'),
(u'Business', u'http://www.statesman.com/section-rss.do?source=business&includeSubSections=true'),
(u'Life', u'http://www.statesman.com/section-rss.do?source=life&includesubsection=true'),
(u'Editorial', u'http://www.statesman.com/section-rss.do?source=opinion&includesubsections=true'),
(u'Sports', u'http://www.statesman.com/section-rss.do?source=sports&includeSubSections=true')
]
feeds = [(u'News',
u'http://www.statesman.com/section-rss.do?source=news&includeSubSections=true'),
(u'Local', u'http://www.statesman.com/section-rss.do?source=local&includeSubSections=true'),
(u'Business', u'http://www.statesman.com/section-rss.do?source=business&includeSubSections=true'),
(u'Life', u'http://www.statesman.com/section-rss.do?source=life&includesubsection=true'),
(u'Editorial', u'http://www.statesman.com/section-rss.do?source=opinion&includesubsections=true'),
(u'Sports', u'http://www.statesman.com/section-rss.do?source=sports&includeSubSections=true')
]
masthead_url = "http://www.statesman.com/images/cmg-logo.gif"
#temp_files = []
#articles_are_obfuscated = True
@ -28,8 +30,11 @@ class AdvancedUserRecipe1278049615(BasicNewsRecipe):
conversion_options = {'linearize_tables':True}
remove_tags = [
dict(name='div', attrs={'id':'cxArticleOptions'}),
{'class':['perma', 'comments', 'trail', 'share-buttons',
'toggle_show_on']},
]
keep_only_tags = [
dict(name='div', attrs={'class':'cxArticleHeader'}),
dict(name='div', attrs={'id':'cxArticleBodyText'}),
dict(name='div', attrs={'class':'cxArticleHeader'}),
dict(name='div', attrs={'id':['cxArticleBodyText',
'content']}),
]

View File

@ -0,0 +1,51 @@
# -*- coding: utf-8 -*-
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = u'2011, Silviu Cotoar\u0103'
'''
superbebe.ro
'''
from calibre.web.feeds.news import BasicNewsRecipe
class Superbebe(BasicNewsRecipe):
title = u'Superbebe'
__author__ = u'Silviu Cotoar\u0103'
description = 'Superbebe'
publisher = 'Superbebe'
oldest_article = 5
language = 'ro'
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
category = 'Ziare,Reviste,Bebe,Mamici'
encoding = 'utf-8'
cover_url = 'http://www.superbebe.ro/images/superbebe.gif'
conversion_options = {
'comments' : description
,'tags' : category
,'language' : language
,'publisher' : publisher
}
keep_only_tags = [
dict(name='div', attrs={'class':'articol'})
]
remove_tags = [
dict(name='div', attrs={'class':['info']})
, dict(name='div', attrs={'class':['tags']})
]
remove_tags_after = [
dict(name='div', attrs={'class':['tags']})
]
feeds = [
(u'Feeds', u'http://www.superbebe.ro/rss')
]
def preprocess_html(self, soup):
return self.adeify_images(soup)

View File

@ -0,0 +1,25 @@
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = u'2011, Tomasz Dlugosz <tomek3d@gmail.com>'
'''
swiatczytnikow.pl
'''
import re
from calibre.web.feeds.news import BasicNewsRecipe
class swiatczytnikow(BasicNewsRecipe):
title = u'Swiat Czytnikow'
description = u'Czytniki e-książek w Polsce. Jak wybrać, kupić i korzystać z Amazon Kindle i innych'
language = 'pl'
__author__ = u'Tomasz D\u0142ugosz'
oldest_article = 7
max_articles_per_feed = 100
feeds = [(u'Świat Czytników - wpisy', u'http://swiatczytnikow.pl/feed')]
remove_tags = [dict(name = 'ul', attrs = {'class' : 'similar-posts'})]
preprocess_regexps = [(re.compile(u'<h3>Czytaj dalej:</h3>'), lambda match: '')]

View File

@ -0,0 +1,54 @@
# -*- coding: utf-8 -*-
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = u'2011, Silviu Cotoar\u0103'
'''
tabu.ro
'''
from calibre.web.feeds.news import BasicNewsRecipe
class TabuRo(BasicNewsRecipe):
title = u'Tabu'
__author__ = u'Silviu Cotoar\u0103'
description = 'Cel mai curajos site de femei'
publisher = 'Tabu'
oldest_article = 5
language = 'ro'
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
category = 'Ziare,Reviste,Femei'
encoding = 'utf-8'
cover_url = 'http://www.tabu.ro/img/tabu-logo2.png'
conversion_options = {
'comments' : description
,'tags' : category
,'language' : language
,'publisher' : publisher
}
keep_only_tags = [
dict(name='div', attrs={'id':'Article'}),
]
remove_tags = [
dict(name='div', attrs={'id':['advertisementArticle']}),
dict(name='div', attrs={'class':'voting_number'}),
dict(name='div', attrs={'id':'number_votes'}),
dict(name='div', attrs={'id':'rating_one'}),
dict(name='div', attrs={'class':'float: right;'})
]
remove_tags_after = [
dict(name='div', attrs={'id':'comments'}),
]
feeds = [
(u'Feeds', u'http://www.tabu.ro/rss_all.xml')
]
def preprocess_html(self, soup):
return self.adeify_images(soup)

View File

@ -0,0 +1,17 @@
from calibre.web.feeds.news import BasicNewsRecipe
class AdvancedUserRecipe1299054026(BasicNewsRecipe):
title = u'Thai Post Daily'
__author__ = 'Chotechai P.'
oldest_article = 7
max_articles_per_feed = 100
cover_url = 'http://upload.wikimedia.org/wikipedia/th/1/10/ThaiPost_Logo.png'
feeds = [(u'\u0e02\u0e48\u0e32\u0e27\u0e2b\u0e19\u0e49\u0e32\u0e2b\u0e19\u0e36\u0e48\u0e07', u'http://thaipost.net/taxonomy/term/1/all/feed'), (u'\u0e1a\u0e17\u0e1a\u0e23\u0e23\u0e13\u0e32\u0e18\u0e34\u0e01\u0e32\u0e23', u'http://thaipost.net/taxonomy/term/11/all/feed'), (u'\u0e40\u0e1b\u0e25\u0e27 \u0e2a\u0e35\u0e40\u0e07\u0e34\u0e19', u'http://thaipost.net/taxonomy/term/2/all/feed'), (u'\u0e2a\u0e20\u0e32\u0e1b\u0e23\u0e30\u0e0a\u0e32\u0e0a\u0e19', u'http://thaipost.net/taxonomy/term/3/all/feed'), (u'\u0e16\u0e39\u0e01\u0e17\u0e38\u0e01\u0e02\u0e49\u0e2d', u'http://thaipost.net/taxonomy/term/4/all/feed'), (u'\u0e01\u0e32\u0e23\u0e40\u0e21\u0e37\u0e2d\u0e07', u'http://thaipost.net/taxonomy/term/5/all/feed'), (u'\u0e17\u0e48\u0e32\u0e19\u0e02\u0e38\u0e19\u0e19\u0e49\u0e2d\u0e22', u'http://thaipost.net/taxonomy/term/12/all/feed'), (u'\u0e1a\u0e17\u0e04\u0e27\u0e32\u0e21\u0e1e\u0e34\u0e40\u0e28\u0e29', u'http://thaipost.net/taxonomy/term/66/all/feed'), (u'\u0e23\u0e32\u0e22\u0e07\u0e32\u0e19\u0e1e\u0e34\u0e40\u0e28\u0e29', u'http://thaipost.net/taxonomy/term/67/all/feed'), (u'\u0e1a\u0e31\u0e19\u0e17\u0e36\u0e01\u0e2b\u0e19\u0e49\u0e32 4', u'http://thaipost.net/taxonomy/term/13/all/feed'), (u'\u0e40\u0e2a\u0e35\u0e22\u0e1a\u0e0b\u0e36\u0e48\u0e07\u0e2b\u0e19\u0e49\u0e32', u'http://thaipost.net/taxonomy/term/64/all/feed'), (u'\u0e04\u0e31\u0e19\u0e1b\u0e32\u0e01\u0e2d\u0e22\u0e32\u0e01\u0e40\u0e25\u0e48\u0e32', u'http://thaipost.net/taxonomy/term/65/all/feed'), (u'\u0e40\u0e28\u0e23\u0e29\u0e10\u0e01\u0e34\u0e08', u'http://thaipost.net/taxonomy/term/6/all/feed'), (u'\u0e01\u0e23\u0e30\u0e08\u0e01\u0e44\u0e23\u0e49\u0e40\u0e07\u0e32', u'http://thaipost.net/taxonomy/term/14/all/feed'), (u'\u0e01\u0e23\u0e30\u0e08\u0e01\u0e2b\u0e31\u0e01\u0e21\u0e38\u0e21', u'http://thaipost.net/taxonomy/term/71/all/feed'), (u'\u0e04\u0e34\u0e14\u0e40\u0e2b\u0e19\u0e37\u0e2d\u0e01\u0e23\u0e30\u0e41\u0e2a', u'http://thaipost.net/taxonomy/term/69/all/feed'), 
(u'\u0e23\u0e32\u0e22\u0e07\u0e32\u0e19', u'http://thaipost.net/taxonomy/term/68/all/feed'), (u'\u0e2d\u0e34\u0e42\u0e04\u0e42\u0e1f\u0e01\u0e31\u0e2a', u'http://thaipost.net/taxonomy/term/10/all/feed'), (u'\u0e01\u0e32\u0e23\u0e28\u0e36\u0e01\u0e29\u0e32-\u0e2a\u0e32\u0e18\u0e32\u0e23\u0e13\u0e2a\u0e38\u0e02', u'http://thaipost.net/taxonomy/term/7/all/feed'), (u'\u0e15\u0e48\u0e32\u0e07\u0e1b\u0e23\u0e30\u0e40\u0e17\u0e28', u'http://thaipost.net/taxonomy/term/8/all/feed'), (u'\u0e01\u0e35\u0e2c\u0e32', u'http://thaipost.net/taxonomy/term/9/all/feed')]
def print_version(self, url):
return 'http://www.thaipost.net/print/' + url[32:]
remove_tags = []
remove_tags.append(dict(name = 'div', attrs = {'class' : 'print-logo'}))
remove_tags.append(dict(name = 'div', attrs = {'class' : 'print-site_name'}))
remove_tags.append(dict(name = 'div', attrs = {'class' : 'print-breadcrumb'}))

View File

@ -0,0 +1,56 @@
# -*- coding: utf-8 -*-
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = u'2011, Silviu Cotoar\u0103'
'''
unica.ro
'''
from calibre.web.feeds.news import BasicNewsRecipe
class Unica(BasicNewsRecipe):
title = u'Unica'
__author__ = u'Silviu Cotoar\u0103'
description = 'Asa cum esti tu'
publisher = 'Unica'
oldest_article = 5
language = 'ro'
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
category = 'Ziare,Reviste,Femei'
encoding = 'utf-8'
cover_url = 'http://www.unica.ro/fileadmin/images/logo.gif'
conversion_options = {
'comments' : description
,'tags' : category
,'language' : language
,'publisher' : publisher
}
keep_only_tags = [
dict(name='div', attrs={'id':'sticky'})
, dict(name='p', attrs={'class':'bodytext'})
]
remove_tags = [
dict(name='div', attrs={'class':['top-links']})
, dict(name='div', attrs={'id':['autor_name']})
, dict(name='div', attrs={'class':['box-r']})
, dict(name='div', attrs={'class':['category']})
, dict(name='div', attrs={'class':['data']})
]
remove_tags_after = [
dict(name='ul', attrs={'class':'pager'})
]
feeds = [
(u'Feeds', u'http://www.unica.ro/rss.html')
]
def preprocess_html(self, soup):
return self.adeify_images(soup)

View File

@ -2,9 +2,9 @@
# -*- coding: utf-8 mode: python -*-
__license__ = 'GPL v3'
__copyright__ = '2010, Steffen Siebert <calibre at steffensiebert.de>'
__copyright__ = '2010-2011, Steffen Siebert <calibre at steffensiebert.de>'
__docformat__ = 'restructuredtext de'
__version__ = '1.1'
__version__ = '1.2'
"""
Die Zeit EPUB
@ -13,21 +13,43 @@ Die Zeit EPUB
import os, urllib2, zipfile, re
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile
from calibre import walk
class ZeitEPUBAbo(BasicNewsRecipe):
title = u'Zeit Online Premium'
title = u'Die Zeit'
description = u'Das EPUB Abo der Zeit (needs subscription)'
language = 'de'
lang = 'de-DE'
__author__ = 'Steffen Siebert'
__author__ = 'Steffen Siebert and Tobias Isenberg'
needs_subscription = True
conversion_options = {
'no_default_epub_cover' : True
'no_default_epub_cover' : True,
# fixing the wrong left margin
'mobi_ignore_margins' : True,
}
preprocess_regexps = [
# filtering for correct dashes
(re.compile(r' - '), lambda match: u' \u2013 '), # regular "Gedankenstrich"
(re.compile(r' -,'), lambda match: u' \u2013,'), # "Gedankenstrich" before a comma
(re.compile(r'(?<=\d)-(?=\d)'), lambda match: u'\u2013'), # number-number
# filtering for unicode characters that are missing on the Kindle,
# try to replace them with meaningful work-arounds
(re.compile(u'\u2080'), lambda match: '<span style="font-size: 50%;">0</span>'), # subscript-0
(re.compile(u'\u2081'), lambda match: '<span style="font-size: 50%;">1</span>'), # subscript-1
(re.compile(u'\u2082'), lambda match: '<span style="font-size: 50%;">2</span>'), # subscript-2
(re.compile(u'\u2083'), lambda match: '<span style="font-size: 50%;">3</span>'), # subscript-3
(re.compile(u'\u2084'), lambda match: '<span style="font-size: 50%;">4</span>'), # subscript-4
(re.compile(u'\u2085'), lambda match: '<span style="font-size: 50%;">5</span>'), # subscript-5
(re.compile(u'\u2086'), lambda match: '<span style="font-size: 50%;">6</span>'), # subscript-6
(re.compile(u'\u2087'), lambda match: '<span style="font-size: 50%;">7</span>'), # subscript-7
(re.compile(u'\u2088'), lambda match: '<span style="font-size: 50%;">8</span>'), # subscript-8
(re.compile(u'\u2089'), lambda match: '<span style="font-size: 50%;">9</span>'), # subscript-9
]
def build_index(self):
domain = "http://premium.zeit.de"
url = domain + "/abovorteile/cgi-bin/_er_member/p4z.fpl?ER_Do=getUserData&ER_NextTemplate=login_ok"
@ -55,9 +77,36 @@ class ZeitEPUBAbo(BasicNewsRecipe):
zfile.extractall(self.output_dir)
tmp.close()
index = os.path.join(self.output_dir, 'content.opf')
self.report_progress(1,_('epub downloaded and extracted'))
# doing regular expression filtering
for path in walk('.'):
(shortname, extension) = os.path.splitext(path)
if extension.lower() in ('.html', '.htm', '.xhtml'):
with open(path, 'r+b') as f:
raw = f.read()
raw = raw.decode('utf-8')
for pat, func in self.preprocess_regexps:
raw = pat.sub(func, raw)
f.seek(0)
f.truncate()
f.write(raw.encode('utf-8'))
# adding real cover
self.report_progress(0,_('trying to download cover image (titlepage)'))
self.download_cover()
self.conversion_options["cover"] = self.cover_path
return index
# getting url of the cover
def get_cover_url(self):
try:
inhalt = self.index_to_soup('http://www.zeit.de/inhalt')
cover_url = inhalt.find('div', attrs={'class':'singlearchive clearfix'}).img['src'].replace('icon_','')
except Exception:
cover_url = 'http://images.zeit.de/bilder/titelseiten_zeit/1946/001_001.jpg'
return cover_url
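
The subscript rules and the in-place rewrite loop in `build_index` can be exercised outside calibre. The sketch below is not the recipe's code: it assumes Python 3 (`chr` returns a text character), generates the ten subscript rules in a loop instead of listing them, and uses hypothetical helper names (`subscript_rules`, `apply_rules`, `rewrite_in_place`, `demo`).

```python
import os
import re
import tempfile


def subscript_rules():
    """Build (pattern, function) pairs for U+2080..U+2089, shaped like preprocess_regexps."""
    rules = []
    for digit in range(10):
        pattern = re.compile(chr(0x2080 + digit))
        replacement = '<span style="font-size: 50%;">{0}</span>'.format(digit)
        # Bind the replacement as a default argument so each lambda keeps its own value.
        rules.append((pattern, lambda match, r=replacement: r))
    return rules


def apply_rules(raw, rules):
    """Apply every (pattern, function) pair to the text, in order."""
    for pat, func in rules:
        raw = pat.sub(func, raw)
    return raw


def rewrite_in_place(path, rules):
    """Read, filter and overwrite one file: read, seek(0), truncate, write."""
    with open(path, 'r+b') as f:
        raw = f.read().decode('utf-8')
        raw = apply_rules(raw, rules)
        f.seek(0)
        f.truncate()
        f.write(raw.encode('utf-8'))


def demo():
    """Write a small HTML snippet to a temporary file, filter it, return the result."""
    fd, path = tempfile.mkstemp(suffix='.html')
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write('<p>CO\u2082</p>'.encode('utf-8'))
        rewrite_in_place(path, subscript_rules())
        with open(path, 'rb') as f:
            return f.read().decode('utf-8')
    finally:
        os.remove(path)
```

Truncating after `seek(0)` matters because the filtered text can be shorter or longer than the original; without it, stale bytes from a longer original would remain at the end of the file.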

Some files were not shown because too many files have changed in this diff