Sync to trunk.
165
Changelog.yaml
@ -19,6 +19,169 @@
|
||||
# new recipes:
|
||||
# - title:
|
||||
|
||||
- version: 0.7.48
|
||||
date: 2011-03-04
|
||||
|
||||
new features:
|
||||
- title: "Changes to the internal database structure used by calibre"
|
||||
description: >
|
||||
"These changes will allow calibre, in the future, to support book language, arbitrary book identifiers and keep track of when the metadata for a book was last modified. WARNING: Because of these changes, if you downgrade calibre versions after upgrading to 0.7.48, you will lose any changes you make to the ISBN of book entries in your calibre database, so do not downgrade unless you really have to. Also note that the first time you start calibre after this update, the startup will be slow as the database structure is being changed."
|
||||
|
||||
- title: "Launch of a new website that catalogues DRM free ebooks. http://drmfree.calibre-ebook.com"
|
||||
description: "A growing catalogue of DRM free ebooks. Ebooks that you actually own after paying, instead of just renting."
|
||||
type: major
|
||||
|
||||
- title: "News download: Add an option to keep at most x issues of a particular periodical in the calibre library. Use the Advanced tab in the Fetch news dialog for your news source to set this option."
|
||||
tickets: [9168]
|
||||
|
||||
- title: "You can now right click on the cover in the book details panel to copy/paste a new cover."
|
||||
tickets: [9255]
|
||||
|
||||
- title: "Add an entry to the add books drop down menu to easily add formats to an existing book record"
|
||||
|
||||
- title: "Tag browser: Clicking on a nested category now searches for the category alone. Clicking twice searches for the category and all its descendants and so on."
|
||||
tickets: [9166, 9169]
|
||||
|
||||
- title: "Add a button to the Manage authors dialog to copy author sort values to author"
|
||||
|
||||
- title: "Decrease startup times on large libraries by using a faster algorithm to parse stored dates"
|
||||
|
||||
- title: "Add quick create links to easily create custom columns of commonly used types to the add custom column dialog"
|
||||
|
||||
- title: "Allow drag drop of images to change cover in book details window."
|
||||
tickets: [9226]
|
||||
|
||||
- title: "Device subsystem: Create a drive info file named driveinfo.calibre in the root of each device drive for USB connected devices. This file contains various useful data. API Change: The open method of the device plugins now accepts an extra parameter library_uuid which is the id of the calibre library connected to the device"
|
||||
|
||||
bug fixes:
|
||||
- title: "Conversion pipeline: Fix regression in 0.7.46 that caused loss of some CSS information when converting HTML produced by Microsoft Word. Also remove empty tags from Microsoft namespaces when parsing HTML"
|
||||
|
||||
- title: "Try harder to ensure that the worker log temporary files are deleted in windows"
|
||||
|
||||
- title: "CHM Input: Handle CHM files that don't specify a topics file."
|
||||
tickets: [9253]
|
||||
|
||||
- title: "Fix regression that caused memory leak in Tag Browser. This would show up as the memory usage of calibre increasing when switching libraries."
|
||||
tickets: [9246]
|
||||
|
||||
- title: "Fix bug that caused preferences->behavior to not show the output format set by the welcome wizard, and instead default to showing EPUB"
|
||||
|
||||
- title: "Fix bug that caused wrong books to be deleted from library if you choose 'delete from library and device' while the library is sorted by the On device column"
|
||||
|
||||
- title: "MOBI Input: Ignore all ASCII control codes except CR, NL and Tab."
|
||||
tickets: [9219]
|
||||
|
||||
improved recipes:
|
||||
- Credit Slips
|
||||
- Seattle Times
|
||||
- MacWorld
|
||||
- Austin Statesman
|
||||
- EPL Talk
|
||||
- Gawker
|
||||
- Deadspin
|
||||
|
||||
new recipes:
|
||||
- title: "Thai Post Today and Daily Post"
|
||||
author: "Chotechai P."
|
||||
|
||||
- title: "RBC.ru"
|
||||
author: Chewi
|
||||
|
||||
- title: Helsingin Sanomat
|
||||
author: oneillpt
|
||||
|
||||
- title: "LWN Weekly"
|
||||
author: David Cavalca
|
||||
|
||||
- title: "New York Times Sports and Technology Blogs"
|
||||
author: rylsfan
|
||||
|
||||
- title: "Historia and Buctaras"
|
||||
author: Silviu Cotoara
|
||||
|
||||
- title: "Buffalo News"
|
||||
author: ChappyOnIce
|
||||
|
||||
- title: "Dotpod"
|
||||
author: Federico Escalada
|
||||
|
||||
|
||||
|
||||
- version: 0.7.47
|
||||
date: 2011-02-25
|
||||
|
||||
new features:
|
||||
- title: "Tag Browser: Support the creation of nested User Categories"
|
||||
description: "See http://calibre-ebook.com/user_manual/gui.html#tag-browser for details"
|
||||
type: major
|
||||
|
||||
- title: "Disable Kent District Library plugin to download series information. The website could not handle the load calibre's 2 million users put on it. You can manually re-enable it if you really want series information, but it is very slow"
|
||||
|
||||
- title: "Drivers for the Wexler T7001, Archos 7, Wink and Xperia X10"
|
||||
|
||||
- title: "Comic Input: Add option to not add links to individual pages to the Table of Contents when converting CBC files"
|
||||
|
||||
- title: "EPUB Output: Try to ensure that the cover image always has an id='cover' to work around the Nook cover reading bug."
|
||||
tickets: [8182]
|
||||
|
||||
- title: "ODT input: Update odfpy library to latest version, adds support for bookmarks"
|
||||
|
||||
- title: "EPUB Output: Remove unnecessary CSS page breaks as they confuse the latest release of iBooks"
|
||||
|
||||
bug fixes:
|
||||
- title: "Fix regression in 0.7.46 that broke creating date and composite custom columns"
|
||||
|
||||
- title: "Linux binary build: Fix ImageMagick trying to load system modules instead of bundled modules"
|
||||
|
||||
- title: "Kobo driver: Handle missing firmware version file"
|
||||
|
||||
- title: "ODT Input: Do not force the background color to white."
|
||||
tickets: [9118]
|
||||
|
||||
- title: "MOBI Input: Do not specify text-align for every paragraph. Fixes text-align inheritance issues for newer MOBIs with nested divs."
|
||||
tickets: [9098]
|
||||
|
||||
- title: "EPUB Output: Do not set the file-as attribute on title elements in the OPF as the current OPF spec does not support file-as. Instead use a calibre extension to OPF."
|
||||
tickets: [9109]
|
||||
|
||||
- title: "Content server: Fix regression that broke browsing User Categories via OPDS"
|
||||
tickets: [9090]
|
||||
|
||||
- title: "Update the book details panel after adding books in case automerge is turned on and the current book is affected"
|
||||
tickets: [9073]
|
||||
|
||||
- title: "FB2 Output: Fix paragraph spacing sometimes being incorrect."
|
||||
tickets: [8927]
|
||||
|
||||
- title: "Tag Browser: Fix generation of search query for authors with quote characters in their names"
|
||||
tickets: [9071]
|
||||
|
||||
- title: "Fix bug that could cause download of cover/social metadata from Amazon to sometimes fail"
|
||||
|
||||
- title: "LRF Input: Workaround for broken LRF files from BookDesigner that have incomplete TextStyle elements"
|
||||
|
||||
improved recipes:
|
||||
- Le Monde
|
||||
- Gizmodo
|
||||
- Lifehacker
|
||||
- ESPN
|
||||
- Adevarul
|
||||
- gsp.ro
|
||||
- Ming Pao
|
||||
|
||||
new recipes:
|
||||
- title: "Flickr Blog"
|
||||
author: Ricardo Jurado
|
||||
|
||||
- title: "Various Romanian news sources"
|
||||
author: Silviu Cotoara
|
||||
|
||||
- title: "Osnews.pl and SwiatCzytnikow"
|
||||
author: Tomasz Dlugosz
|
||||
|
||||
- title: "Roger Ebert Journal"
|
||||
author: Shane Erstad
|
||||
|
||||
- version: 0.7.46
|
||||
date: 2011-02-18
|
||||
|
||||
@ -60,6 +223,8 @@
|
||||
- title: "TXT Input: New paragraph-type option (off) to disable modifying the paragraph structure."
|
||||
|
||||
- title: "Device driver for the Kendo/Yifang M7 and the Wolder Mibuk Life"
|
||||
|
||||
- title: "For people building calibre from source, note that calibre now requires SIP >= 4.12 to build"
|
||||
|
||||
bug fixes:
|
||||
- title: "Fix main memory and storage card for Cybook Orizon being swapped with some firmwares"
|
||||
|
@ -108,8 +108,10 @@ function init() {
|
||||
 function toplevel_layout() {
     var last = $(".toplevel li").last();
     var title = $('.toplevel h3').first();
-    var bottom = last.position().top + last.height() - title.position().top;
-    $("#main").height(Math.max(200, bottom+75));
+    if (title && title.position()) {
+        var bottom = last.position().top + last.height() - title.position().top;
+        $("#main").height(Math.max(200, bottom+75));
+    }
 }
|
||||
|
||||
function toplevel() {
|
||||
|
@ -83,7 +83,7 @@ categories_use_field_for_author_name = 'author'
|
||||
# Note that the "r'" in front of the { is necessary if there are backslashes
|
||||
# (\ characters) in the template. It doesn't hurt anything to leave it there
|
||||
# even if there aren't any backslashes.
|
||||
-categories_collapsed_name_template = r'{first.sort:shorten(4,"",0)} - {last.sort:shorten(4,"",0)}'
+categories_collapsed_name_template = r'{first.sort:shorten(4,,0)} - {last.sort:shorten(4,,0)}'
|
||||
categories_collapsed_rating_template = r'{first.avg_rating:4.2f:ifempty(0)} - {last.avg_rating:4.2f:ifempty(0)}'
|
||||
categories_collapsed_popularity_template = r'{first.count:d} - {last.count:d}'
|
||||
|
||||
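The collapsed-name template in this hunk relies on the template function `shorten(left chars, middle text, right chars)`. As a rough illustration of what `shorten(4,,0)` does to the first and last sort values, here is a minimal sketch in plain Python. The function names `shorten` and `collapsed_name` below are hypothetical stand-ins written for this note, not calibre's own formatter code, and the short-value rule is an assumption based on the documented behaviour.

```python
def shorten(value, left_chars, middle_text, right_chars):
    """Approximate the template function shorten() (illustrative only)."""
    # Assumption: values too short to be shortened are returned unchanged.
    if len(value) < left_chars + right_chars + len(middle_text):
        return value
    # value[-0:] would return the whole string, so handle 0 explicitly.
    tail = value[-right_chars:] if right_chars > 0 else ""
    return value[:left_chars] + middle_text + tail


def collapsed_name(first_sort, last_sort):
    # Mirrors r'{first.sort:shorten(4,,0)} - {last.sort:shorten(4,,0)}'
    return "%s - %s" % (shorten(first_sort, 4, "", 0),
                        shorten(last_sort, 4, "", 0))
```

With an empty middle text and zero right characters, each name is simply cut to its first four characters, so a collapsed node would be labelled with the first four characters of its first and last entries.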
@ -349,3 +349,9 @@ public_smtp_relay_delay = 301
|
||||
# after a restart of calibre.
|
||||
draw_hidden_section_indicators = True
|
||||
|
||||
#: The maximum width and height for covers saved in the calibre library
|
||||
# All covers in the calibre library will be resized, preserving aspect ratio,
|
||||
# to fit within this size. This is to prevent slowdowns caused by extremely
|
||||
# large covers
|
||||
maximum_cover_size = (1200, 1600)
|
||||
|
||||
|
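The new `maximum_cover_size` tweak says covers are resized, preserving aspect ratio, to fit within the given size. The arithmetic behind "fit within" can be sketched as follows. `fit_within` is a hypothetical helper written for this note; calibre's actual resizing code may round differently or use other names.

```python
def fit_within(width, height, max_size=(1200, 1600)):
    """Return (w, h) scaled down to fit inside max_size, keeping aspect ratio.

    Illustrative sketch only; images already inside the box are left alone.
    """
    max_w, max_h = max_size
    if width <= max_w and height <= max_h:
        return width, height
    # Integer cross-multiplication avoids floating point rounding surprises.
    if width * max_h >= height * max_w:
        # Width is the limiting dimension.
        return max_w, max(1, height * max_w // width)
    # Height is the limiting dimension.
    return max(1, width * max_h // height), max_h
```

A cover that is exactly twice the limit in both directions comes out at the limit, while a very wide or very tall cover is bounded by whichever side overflows more.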
BIN
resources/images/id_card.png
Normal file
After Width: | Height: | Size: 6.3 KiB |
BIN
resources/images/minusminus.png
Normal file
After Width: | Height: | Size: 1.8 KiB |
BIN
resources/images/news/20minutos.png
Normal file
After Width: | Height: | Size: 800 B |
BIN
resources/images/news/7seri.png
Normal file
After Width: | Height: | Size: 249 B |
BIN
resources/images/news/adevarul.png
Normal file
After Width: | Height: | Size: 401 B |
BIN
resources/images/news/aventurilapescuit.png
Normal file
After Width: | Height: | Size: 627 B |
BIN
resources/images/news/bucataras.png
Normal file
After Width: | Height: | Size: 765 B |
BIN
resources/images/news/capital.png
Normal file
After Width: | Height: | Size: 617 B |
BIN
resources/images/news/catavencu.png
Normal file
After Width: | Height: | Size: 1.6 KiB |
BIN
resources/images/news/chipro.png
Normal file
After Width: | Height: | Size: 181 B |
BIN
resources/images/news/credit_slips.png
Normal file
After Width: | Height: | Size: 4.7 KiB |
BIN
resources/images/news/csid.png
Normal file
After Width: | Height: | Size: 340 B |
BIN
resources/images/news/curierulnational.png
Normal file
After Width: | Height: | Size: 1.3 KiB |
BIN
resources/images/news/descopera.png
Normal file
After Width: | Height: | Size: 686 B |
BIN
resources/images/news/ecuisine.png
Normal file
After Width: | Height: | Size: 501 B |
BIN
resources/images/news/egirl.png
Normal file
After Width: | Height: | Size: 507 B |
BIN
resources/images/news/fhmro.png
Normal file
After Width: | Height: | Size: 836 B |
BIN
resources/images/news/gandul.png
Normal file
After Width: | Height: | Size: 527 B |
BIN
resources/images/news/go4it.png
Normal file
After Width: | Height: | Size: 827 B |
BIN
resources/images/news/gsp.png
Normal file
After Width: | Height: | Size: 367 B |
BIN
resources/images/news/historiaro.png
Normal file
After Width: | Height: | Size: 521 B |
BIN
resources/images/news/hotcity.png
Normal file
After Width: | Height: | Size: 722 B |
BIN
resources/images/news/hotnews.png
Normal file
After Width: | Height: | Size: 722 B |
BIN
resources/images/news/intrefete.png
Normal file
After Width: | Height: | Size: 411 B |
BIN
resources/images/news/jurnalulnational.png
Normal file
After Width: | Height: | Size: 863 B |
BIN
resources/images/news/kudika.png
Normal file
After Width: | Height: | Size: 432 B |
BIN
resources/images/news/lwn_weekly.png
Normal file
After Width: | Height: | Size: 387 B |
BIN
resources/images/news/mediafax.png
Normal file
After Width: | Height: | Size: 657 B |
BIN
resources/images/news/moneyro.png
Normal file
After Width: | Height: | Size: 219 B |
BIN
resources/images/news/nationalgeoro.png
Normal file
After Width: | Height: | Size: 123 B |
BIN
resources/images/news/nytimes_sports.png
Normal file
After Width: | Height: | Size: 2.1 KiB |
BIN
resources/images/news/nytimes_tech.png
Normal file
After Width: | Height: | Size: 11 KiB |
BIN
resources/images/news/prosport.png
Normal file
After Width: | Height: | Size: 272 B |
BIN
resources/images/news/realitatea.png
Normal file
After Width: | Height: | Size: 4.0 KiB |
BIN
resources/images/news/romanialibera.png
Normal file
After Width: | Height: | Size: 222 B |
BIN
resources/images/news/sfin.png
Normal file
After Width: | Height: | Size: 229 B |
BIN
resources/images/news/standardmoney.png
Normal file
After Width: | Height: | Size: 510 B |
BIN
resources/images/news/superbebe.png
Normal file
After Width: | Height: | Size: 307 B |
BIN
resources/images/news/tabu.png
Normal file
After Width: | Height: | Size: 441 B |
BIN
resources/images/news/unica.png
Normal file
After Width: | Height: | Size: 327 B |
BIN
resources/images/news/ziarulfinanciar.png
Normal file
After Width: | Height: | Size: 1.9 KiB |
BIN
resources/images/plusplus.png
Normal file
After Width: | Height: | Size: 4.1 KiB |
BIN
resources/images/tb_folder.png
Normal file
After Width: | Height: | Size: 6.3 KiB |
68
resources/recipes/20minutos.recipe
Normal file
@ -0,0 +1,68 @@
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = '2011, Darko Miletic <darko.miletic at gmail.com>'
|
||||
'''
|
||||
www.20minutos.es
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class t20Minutos(BasicNewsRecipe):
|
||||
title = '20 Minutos'
|
||||
__author__ = 'Darko Miletic'
|
||||
description = 'Diario de informacion general y local mas leido de Espania, noticias de ultima hora de Espania, el mundo, local, deportes, noticias curiosas y mas'
|
||||
publisher = '20 Minutos Online SL'
|
||||
category = 'news, politics, Spain'
|
||||
oldest_article = 2
|
||||
max_articles_per_feed = 200
|
||||
no_stylesheets = True
|
||||
encoding = 'utf8'
|
||||
use_embedded_content = True
|
||||
language = 'es'
|
||||
remove_empty_feeds = True
|
||||
publication_type = 'newspaper'
|
||||
masthead_url = 'http://estaticos.20minutos.es/css4/img/ui/logo-301x54.png'
|
||||
extra_css = """
|
||||
body{font-family: Arial,Helvetica,sans-serif }
|
||||
img{margin-bottom: 0.4em; display:block}
|
||||
"""
|
||||
|
||||
conversion_options = {
|
||||
'comments' : description
|
||||
, 'tags' : category
|
||||
, 'publisher' : publisher
|
||||
, 'language' : language
|
||||
}
|
||||
|
||||
remove_tags = [dict(attrs={'class':'mf-viral'})]
|
||||
remove_attributes=['border']
|
||||
|
||||
feeds = [
|
||||
(u'Principal' , u'http://20minutos.feedsportal.com/c/32489/f/478284/index.rss')
|
||||
,(u'Cine' , u'http://20minutos.feedsportal.com/c/32489/f/478285/index.rss')
|
||||
,(u'Internacional' , u'http://20minutos.feedsportal.com/c/32489/f/492689/index.rss')
|
||||
,(u'Deportes' , u'http://20minutos.feedsportal.com/c/32489/f/478286/index.rss')
|
||||
,(u'Nacional' , u'http://20minutos.feedsportal.com/c/32489/f/492688/index.rss')
|
||||
,(u'Economia' , u'http://20minutos.feedsportal.com/c/32489/f/492690/index.rss')
|
||||
,(u'Tecnologia' , u'http://20minutos.feedsportal.com/c/32489/f/478292/index.rss')
|
||||
]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
for item in soup.findAll(style=True):
|
||||
del item['style']
|
||||
for item in soup.findAll('a'):
|
||||
limg = item.find('img')
|
||||
if item.string is not None:
|
||||
str = item.string
|
||||
item.replaceWith(str)
|
||||
else:
|
||||
if limg:
|
||||
item.name = 'div'
|
||||
item.attrs = []
|
||||
else:
|
||||
str = self.tag_to_string(item)
|
||||
item.replaceWith(str)
|
||||
for item in soup.findAll('img'):
|
||||
if not item.has_key('alt'):
|
||||
item['alt'] = 'image'
|
||||
return soup
|
||||
|
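The recipe above sets `oldest_article = 2` and `max_articles_per_feed = 200`. As a reading aid, the combined effect of those two attributes on a single feed can be approximated like this. `select_articles` and the `(title, published)` tuple layout are assumptions made for the sketch; the real selection happens inside calibre's feed-parsing code and may differ in detail.

```python
from datetime import datetime, timedelta


def select_articles(articles, now, oldest_article=2, max_articles_per_feed=200):
    """Filter one feed's articles by age, then cap the count.

    articles: list of (title, published_datetime) tuples, newest first.
    Hypothetical helper, not part of the calibre API.
    """
    cutoff = now - timedelta(days=oldest_article)
    recent = [a for a in articles if a[1] >= cutoff]
    return recent[:max_articles_per_feed]


feed = [
    ("Nacional", datetime(2011, 3, 4, 8, 0)),
    ("Deportes", datetime(2011, 3, 3, 9, 30)),
    ("Cine", datetime(2011, 2, 28, 12, 0)),
]
kept = select_articles(feed, now=datetime(2011, 3, 4, 12, 0))
```

Here the article from 28 February is older than two days and is dropped; the other two are kept because the cap of 200 is never reached.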
51
resources/recipes/7seri.recipe
Normal file
@ -0,0 +1,51 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
#!/usr/bin/env python
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = u'2011, Silviu Cotoar\u0103'
|
||||
'''
|
||||
sapteseri.ro
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class SapteSeri(BasicNewsRecipe):
|
||||
title = u'Sapte Seri'
|
||||
__author__ = u'Silviu Cotoar\u0103'
|
||||
description = u'Sapte Seri'
|
||||
publisher = u'Sapte Seri'
|
||||
oldest_article = 5
|
||||
language = 'ro'
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
category = 'Ziare,Oras,Distractie,Fun'
|
||||
encoding = 'utf-8'
|
||||
remove_empty_feeds = True
|
||||
remove_javascript = True
|
||||
cover_url = 'http://www.sapteseri.ro/Images/logo.jpg'
|
||||
|
||||
conversion_options = {
|
||||
'comments' : description
|
||||
,'tags' : category
|
||||
,'language' : language
|
||||
,'publisher' : publisher
|
||||
}
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='h1', attrs={'id':'title'})
|
||||
, dict(name='div', attrs={'class':'mt10 mb10'})
|
||||
, dict(name='div', attrs={'class':'mb20 mt10'})
|
||||
, dict(name='div', attrs={'class':'mt5 mb20'})
|
||||
]
|
||||
|
||||
remove_tags = [
|
||||
dict(name='div', attrs={'id':['entityimgworking']})
|
||||
]
|
||||
|
||||
feeds = [
|
||||
(u'Ce se intampla azi in Bucuresti', u'http://www.sapteseri.ro/ro/feed/ce-se-intampla-azi/bucuresti/')
|
||||
]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
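The `keep_only_tags` and `remove_tags` lists in this recipe are made of `dict(name=..., attrs={...})` specifications, some with a plain string value and some with a list. A simplified model of how such a specification selects a tag is sketched below. `matches` is a hypothetical function, and the exact-string comparison is an assumption: the real matching is performed by the HTML parser library bundled with calibre and may treat attributes such as `class` differently.

```python
def matches(tag_name, tag_attrs, spec):
    """Return True if a tag satisfies a recipe-style tag specification.

    tag_attrs is a plain dict of attribute name -> value.
    Simplified model only: a list value means "any of these values".
    """
    name = spec.get('name')
    if name is not None and tag_name != name:
        return False
    for key, wanted in spec.get('attrs', {}).items():
        actual = tag_attrs.get(key)
        if isinstance(wanted, (list, tuple)):
            if actual not in wanted:
                return False
        elif actual != wanted:
            return False
    return True


keep_spec = dict(name='div', attrs={'class': 'mt10 mb10'})
remove_spec = dict(name='div', attrs={'id': ['entityimgworking']})
```

Under this model a tag survives when it matches any entry of `keep_only_tags` and no entry of `remove_tags`.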
@ -32,16 +32,25 @@ class Adevarul(BasicNewsRecipe):
|
||||
}
|
||||
|
||||
keep_only_tags = [ dict(name='div', attrs={'class':'article_header'})
|
||||
,dict(name='div', attrs={'class':'bd'})
|
||||
,dict(name='div', attrs={'class':'bb-tu first-t bb-article-body'})
|
||||
]
|
||||
|
||||
|
||||
remove_tags = [ dict(name='div', attrs={'class':'bb-wg-article_related_attachements'})
|
||||
remove_tags = [
|
||||
dict(name='li', attrs={'class':'author'})
|
||||
,dict(name='li', attrs={'class':'date'})
|
||||
,dict(name='li', attrs={'class':'comments'})
|
||||
,dict(name='div', attrs={'class':'bb-wg-article_related_attachements'})
|
||||
,dict(name='div', attrs={'class':'bb-md bb-md-article_comments'})
|
||||
,dict(name='form', attrs={'id':'bb-comment-create-form'})
|
||||
]
|
||||
,dict(name='form', attrs={'id':'bb-comment-create-form'})
|
||||
,dict(name='div', attrs={'id':'mediatag'})
|
||||
,dict(name='div', attrs={'id':'ft'})
|
||||
,dict(name='div', attrs={'id':'comment_wrapper'})
|
||||
]
|
||||
|
||||
remove_tags_after = [ dict(name='form', attrs={'id':'bb-comment-create-form'}) ]
|
||||
remove_tags_after = [
|
||||
dict(name='div', attrs={'id':'comment_wrapper'}),
|
||||
]
|
||||
|
||||
feeds = [ (u'\u0218tiri', u'http://www.adevarul.ro/rss/latest') ]
|
||||
|
||||
|
51
resources/recipes/aventurilapescuit.recipe
Normal file
@ -0,0 +1,51 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
#!/usr/bin/env python
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = u'2011, Silviu Cotoar\u0103'
|
||||
'''
|
||||
aventurilapescuit.ro
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class AventuriLaPescuit(BasicNewsRecipe):
|
||||
title = u'Aventuri La Pescuit'
|
||||
__author__ = u'Silviu Cotoar\u0103'
|
||||
description = 'Aventuri La Pescuit'
|
||||
publisher = 'Aventuri La Pescuit'
|
||||
oldest_article = 5
|
||||
language = 'ro'
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
category = 'Ziare,Pescuit,Hobby'
|
||||
encoding = 'utf-8'
|
||||
cover_url = 'http://www.aventurilapescuit.ro/images/logo.gif'
|
||||
|
||||
conversion_options = {
|
||||
'comments' : description
|
||||
,'tags' : category
|
||||
,'language' : language
|
||||
,'publisher' : publisher
|
||||
}
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='div', attrs={'id':'Article'})
|
||||
]
|
||||
|
||||
remove_tags = [
|
||||
dict(name='div', attrs={'class':['right option']})
|
||||
, dict(name='iframe', attrs={'scrolling':['no']})
|
||||
]
|
||||
|
||||
remove_tags_after = [
|
||||
dict(name='iframe', attrs={'scrolling':['no']})
|
||||
]
|
||||
|
||||
feeds = [
|
||||
(u'Feeds', u'http://www.aventurilapescuit.ro/sections/rssread/1')
|
||||
]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
56
resources/recipes/bucataras.recipe
Normal file
@ -0,0 +1,56 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
#!/usr/bin/env python
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = u'2011, Silviu Cotoar\u0103'
|
||||
'''
|
||||
bucataras.ro
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class Bucataras(BasicNewsRecipe):
|
||||
title = u'Bucataras'
|
||||
__author__ = u'Silviu Cotoar\u0103'
|
||||
description = ''
|
||||
publisher = 'Bucataras'
|
||||
oldest_article = 5
|
||||
language = 'ro'
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
category = 'Ziare,Bucatarie,Retete'
|
||||
encoding = 'utf-8'
|
||||
cover_url = 'http://www.bucataras.ro/templates/default/images/pink/logo.jpg'
|
||||
|
||||
conversion_options = {
|
||||
'comments' : description
|
||||
,'tags' : category
|
||||
,'language' : language
|
||||
,'publisher' : publisher
|
||||
}
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='h1', attrs={'class':'titlu'})
|
||||
, dict(name='div', attrs={'class':'contentL'})
|
||||
, dict(name='div', attrs={'class':'contentBottom'})
|
||||
|
||||
]
|
||||
|
||||
remove_tags = [
|
||||
dict(name='div', attrs={'class':['sociale']})
|
||||
, dict(name='div', attrs={'class':['contentR']})
|
||||
, dict(name='a', attrs={'target':['_self']})
|
||||
, dict(name='div', attrs={'class':['comentarii']})
|
||||
]
|
||||
|
||||
remove_tags_after = [
|
||||
dict(name='div', attrs={'class':['comentarii']})
|
||||
]
|
||||
|
||||
feeds = [
|
||||
(u'Feeds', u'http://www.bucataras.ro/rss/retete/')
|
||||
]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
58
resources/recipes/buffalo_news.recipe
Normal file
@ -0,0 +1,58 @@
|
||||
__license__ = 'GPL v3'
|
||||
__author__ = 'Todd Chapman'
|
||||
__copyright__ = 'Todd Chapman'
|
||||
__version__ = 'v0.2'
|
||||
__date__ = '2 March 2011'
|
||||
|
||||
'''
|
||||
http://www.buffalonews.com/RSS/
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class AdvancedUserRecipe1298680852(BasicNewsRecipe):
|
||||
title = u'Buffalo News'
|
||||
oldest_article = 2
|
||||
language = 'en'
|
||||
__author__ = 'ChappyOnIce'
|
||||
max_articles_per_feed = 20
|
||||
encoding = 'utf-8'
|
||||
masthead_url = 'http://www.buffalonews.com/buffalonews/skins/buffalonews/images/masthead/the_buffalo_news_logo.png'
|
||||
remove_javascript = True
|
||||
extra_css = 'body {text-align: justify;}\n \
|
||||
p {text-indent: 20px;}'
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='div', attrs={'class':['main-content-left']})
|
||||
]
|
||||
|
||||
remove_tags = [
|
||||
dict(name='div', attrs={'id':['commentCount']}),
|
||||
dict(name='div', attrs={'class':['story-list-links']})
|
||||
]
|
||||
|
||||
remove_tags_after = dict(name='div', attrs={'class':['body storyContent']})
|
||||
|
||||
feeds = [(u'City of Buffalo', u'http://www.buffalonews.com/city/communities/buffalo/?widget=rssfeed&view=feed&contentId=77944'),
|
||||
(u'Southern Erie County', u'http://www.buffalonews.com/city/communities/southern-erie/?widget=rssfeed&view=feed&contentId=77944'),
|
||||
(u'Eastern Erie County', u'http://www.buffalonews.com/city/communities/eastern-erie/?widget=rssfeed&view=feed&contentId=77944'),
|
||||
(u'Southern Tier', u'http://www.buffalonews.com/city/communities/southern-tier/?widget=rssfeed&view=feed&contentId=77944'),
|
||||
(u'Niagara County', u'http://www.buffalonews.com/city/communities/niagara-county/?widget=rssfeed&view=feed&contentId=77944'),
|
||||
(u'Business', u'http://www.buffalonews.com/business/?widget=rssfeed&view=feed&contentId=77944'),
|
||||
(u'MoneySmart', u'http://www.buffalonews.com/business/moneysmart/?widget=rssfeed&view=feed&contentId=77944'),
|
||||
(u'Bills & NFL', u'http://www.buffalonews.com/sports/bills-nfl/?widget=rssfeed&view=feed&contentId=77944'),
|
||||
(u'Sabres & NHL', u'http://www.buffalonews.com/sports/sabres-nhl/?widget=rssfeed&view=feed&contentId=77944'),
|
||||
(u'Bob DiCesare', u'http://www.buffalonews.com/sports/columns/bob-dicesare/?widget=rssfeed&view=feed&contentId=77944'),
|
||||
(u'Bucky Gleason', u'http://www.buffalonews.com/sports/columns/bucky-gleason/?widget=rssfeed&view=feed&contentId=77944'),
|
||||
(u'Mark Gaughan', u'http://www.buffalonews.com/sports/bills-nfl/inside-the-nfl/?widget=rssfeed&view=feed&contentId=77944'),
|
||||
(u'Mike Harrington', u'http://www.buffalonews.com/sports/columns/mike-harrington/?widget=rssfeed&view=feed&contentId=77944'),
|
||||
(u'Jerry Sullivan', u'http://www.buffalonews.com/sports/columns/jerry-sullivan/?widget=rssfeed&view=feed&contentId=77944'),
|
||||
(u'Other Sports Columns', u'http://www.buffalonews.com/sports/columns/other-sports-columns/?widget=rssfeed&view=feed&contentId=77944'),
|
||||
(u'Life', u'http://www.buffalonews.com/life/?widget=rssfeed&view=feed&contentId=77944'),
|
||||
(u'Bruce Andriatch', u'http://www.buffalonews.com/city/columns/bruce-andriatch/?widget=rssfeed&view=feed&contentId=77944'),
|
||||
(u'Donn Esmonde', u'http://www.buffalonews.com/city/columns/donn-esmonde/?widget=rssfeed&view=feed&contentId=77944'),
|
||||
(u'Rod Watson', u'http://www.buffalonews.com/city/columns/rod-watson/?widget=rssfeed&view=feed&contentId=77944'),
|
||||
(u'Entertainment', u'http://www.buffalonews.com/entertainment/?widget=rssfeed&view=feed&contentId=77944'),
|
||||
(u'Off Main Street', u'http://www.buffalonews.com/city/columns/off-main-street/?widget=rssfeed&view=feed&contentId=77944'),
|
||||
(u'Editorials', u'http://www.buffalonews.com/editorial-page/buffalo-news-editorials/?widget=rssfeed&view=feed&contentId=77944')
|
||||
]
|
52
resources/recipes/chipro.recipe
Normal file
@ -0,0 +1,52 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
#!/usr/bin/env python
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = u'2011, Silviu Cotoar\u0103'
|
||||
'''
|
||||
chip.ro
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class ChipRo(BasicNewsRecipe):
|
||||
title = u'Chip Online'
|
||||
__author__ = u'Silviu Cotoar\u0103'
|
||||
description = 'Chip Online'
|
||||
publisher = 'Chip Online'
|
||||
oldest_article = 5
|
||||
language = 'ro'
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
category = 'Ziare,Reviste,IT'
|
||||
encoding = 'utf-8'
|
||||
cover_url = 'http://www.chip.ro/images/logo.png'
|
||||
|
||||
conversion_options = {
|
||||
'comments' : description
|
||||
,'tags' : category
|
||||
,'language' : language
|
||||
,'publisher' : publisher
|
||||
}
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='h2', attrs={'class':'contentheading clearfix'})
|
||||
, dict(name='span', attrs={'class':'createby'})
|
||||
, dict(name='div', attrs={'class':'article-content'})
|
||||
]
|
||||
|
||||
remove_tags = [
|
||||
dict(name='div', attrs={'class':['sharemecompactbutton']})
|
||||
,dict(name='div', attrs={'align':['left']})
|
||||
,dict(name='div', attrs={'align':['center']})
|
||||
,dict(name='th', attrs={'class':['pagenav_prev']})
|
||||
,dict(name='table', attrs={'class':['pagenav']})
|
||||
]
|
||||
|
||||
feeds = [
|
||||
(u'Feeds', u'http://www.chip.ro/index.php?option=com_ninjarsssyndicator&feed_id=9&format=raw')
|
||||
]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
@ -1,35 +1,44 @@
|
||||
#!/usr/bin/env python
|
||||
__license__ = 'GPL 3'
|
||||
__copyright__ = 'zotzot'
|
||||
__copyright__ = 'zotzo'
|
||||
__docformat__ = 'restructuredtext en'
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
|
||||
class CreditSlips(BasicNewsRecipe):
|
||||
__license__ = 'GPL v3'
|
||||
__author__ = 'zotzot'
|
||||
language = 'en'
|
||||
version = 1
|
||||
__author__ = 'zotzot'
|
||||
version = 2
|
||||
title = u'Credit Slips.org'
|
||||
publisher = u'Bankr-L'
|
||||
category = u'Economic blog'
|
||||
description = u'All things about credit.'
|
||||
cover_url = 'http://bit.ly/hyZSTr'
|
||||
oldest_article = 50
|
||||
description = u'A discussion on credit and bankruptcy'
|
||||
cover_url = 'http://bit.ly/eAKNCB'
|
||||
oldest_article = 15
|
||||
max_articles_per_feed = 100
|
||||
use_embedded_content = True
|
||||
no_stylesheets = True
|
||||
remove_javascript = True
|
||||
|
||||
conversion_options = {
|
||||
'comments': description,
|
||||
'tags': category,
|
||||
'language': 'en',
|
||||
'publisher': publisher,
|
||||
}
|
||||
|
||||
feeds = [
|
||||
(u'Credit Slips', u'http://www.creditslips.org/creditslips/atom.xml')
|
||||
]
|
||||
conversion_options = {
|
||||
'comments': description,
|
||||
'tags': category,
|
||||
'language': 'en',
|
||||
'publisher': publisher
|
||||
}
|
||||
extra_css = '''
|
||||
body{font-family:verdana,arial,helvetica,geneva,sans-serif;}
|
||||
img {float: left; margin-right: 0.5em;}
|
||||
'''
|
||||
(u'Credit Slips', u'http://www.creditslips.org/creditslips/atom.xml')
|
||||
]
|
||||
|
||||
extra_css = '''
|
||||
.author {font-family:Helvetica,sans-serif; font-weight:normal;font-size:small;}
|
||||
h1 {font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
|
||||
p {font-family:Helvetica,Arial,sans-serif;font-size:small;}
|
||||
body {font-family:Helvetica,Arial,sans-serif;font-size:small;}
|
||||
'''
|
||||
|
||||
def populate_article_metadata(self, article, soup, first):
|
||||
h2 = soup.find('h2')
|
||||
h2.replaceWith(h2.prettify() + '<p><em>Posted by ' + article.author + '</em></p>')
|
||||
|
52
resources/recipes/csid.recipe
Normal file
@ -0,0 +1,52 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
#!/usr/bin/env python
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = u'2011, Silviu Cotoar\u0103'
|
||||
'''
|
||||
csid.ro
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class CSID(BasicNewsRecipe):
|
||||
title = u'Ce se \u00eent\u00e2mpl\u0103 doctore?'
|
||||
__author__ = u'Silviu Cotoar\u0103'
|
||||
description = u'Ce se \u00eent\u00e2mpl\u0103 doctore?'
|
||||
publisher = 'CSID'
|
||||
oldest_article = 5
|
||||
language = 'ro'
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
category = 'Ziare,Reviste,Femei,Health,Beauty'
|
||||
encoding = 'utf-8'
|
||||
cover_url = 'http://www.csid.ro/images/default/csid.gif'
|
||||
|
||||
conversion_options = {
|
||||
'comments' : description
|
||||
,'tags' : category
|
||||
,'language' : language
|
||||
,'publisher' : publisher
|
||||
}
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='div', attrs={'class':'content floatleft'})
|
||||
]
|
||||
|
||||
remove_tags = [
|
||||
dict(name='div', attrs={'id':['article_links']})
|
||||
, dict(name='div', attrs={'id':['tags']})
|
||||
, dict(name='p', attrs={'id':['tags']})
|
||||
]
|
||||
|
||||
remove_tags_after = [
|
||||
dict(name='p', attrs={'id':['tags']})
|
||||
]
|
||||
|
||||
feeds = [
|
||||
(u'Feeds', u'http://www.csid.ro/rss/')
|
||||
]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
54
resources/recipes/curierulnational.recipe
Normal file
@ -0,0 +1,54 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
#!/usr/bin/env python
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = u'2011, Silviu Cotoar\u0103'
|
||||
'''
|
||||
curierulnational.ro
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class CurierulNal(BasicNewsRecipe):
|
||||
title = u'Curierul Na\u0163ional'
|
||||
__author__ = u'Silviu Cotoar\u0103'
|
||||
description = ''
|
||||
publisher = u'Curierul Na\u0163ional'
|
||||
oldest_article = 5
|
||||
language = 'ro'
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
category = 'Ziare,Stiri'
|
||||
encoding = 'utf-8'
|
||||
cover_url = 'http://www.curierulnational.ro/logo.gif'
|
||||
|
||||
conversion_options = {
|
||||
'comments' : description
|
||||
,'tags' : category
|
||||
,'language' : language
|
||||
,'publisher' : publisher
|
||||
}
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='div', attrs={'id':'col1'})
|
||||
, dict(name='img', attrs={'id':'placeholder'})
|
||||
]
|
||||
|
||||
remove_tags = [
|
||||
dict(name='p', attrs={'id':['alteArticole']})
|
||||
, dict(name='div', attrs={'id':['textSize']})
|
||||
, dict(name='ul', attrs={'class':['unit-rating']})
|
||||
, dict(name='div', attrs={'id':['comments']})
|
||||
]
|
||||
|
||||
remove_tags_after = [
|
||||
dict(name='ul', attrs={'class':'unit-rating'})
|
||||
]
|
||||
|
||||
feeds = [
|
||||
(u'Feeds', u'http://www.curierulnational.ro/feed.xml')
|
||||
]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
@ -16,14 +16,9 @@ class Deadspin(BasicNewsRecipe):
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
encoding = 'utf-8'
|
||||
use_embedded_content = False
|
||||
use_embedded_content = True
|
||||
language = 'en'
|
||||
masthead_url = 'http://cache.gawkerassets.com/assets/deadspin.com/img/logo.png'
|
||||
extra_css = '''
|
||||
body{font-family: "Lucida Grande",Helvetica,Arial,sans-serif}
|
||||
img{margin-bottom: 1em}
|
||||
h1{font-family :Arial,Helvetica,sans-serif; font-size:large}
|
||||
'''
|
||||
conversion_options = {
|
||||
'comment' : description
|
||||
, 'tags' : category
|
||||
@ -31,13 +26,11 @@ class Deadspin(BasicNewsRecipe):
|
||||
, 'language' : language
|
||||
}
|
||||
|
||||
remove_attributes = ['width','height']
|
||||
keep_only_tags = [dict(attrs={'class':'content permalink'})]
|
||||
remove_tags_before = dict(name='h1')
|
||||
remove_tags = [dict(attrs={'class':'contactinfo'})]
|
||||
remove_tags_after = dict(attrs={'class':'contactinfo'})
|
||||
remove_tags = [
|
||||
{'class': 'feedflare'},
|
||||
]
|
||||
|
||||
feeds = [(u'Articles', u'http://feeds.gawker.com/deadspin/full')]
|
||||
feeds = [(u'Articles', u'http://feeds.gawker.com/deadspin/vip?format=xml')]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
||||
|
57
resources/recipes/descopera.recipe
Normal file
@ -0,0 +1,57 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
#!/usr/bin/env python
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = u'2011, Silviu Cotoar\u0103'
|
||||
'''
|
||||
descopera.ro
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class Descopera(BasicNewsRecipe):
|
||||
title = u'Descoper\u0103'
|
||||
__author__ = u'Silviu Cotoar\u0103'
|
||||
description = 'E lumea ta'
|
||||
publisher = 'Descopera'
|
||||
oldest_article = 5
|
||||
language = 'ro'
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
category = 'Ziare,Reviste,Descopera'
|
||||
encoding = 'utf-8'
|
||||
cover_url = 'http://www.descopera.ro/images/header_images/logo.gif'
|
||||
|
||||
conversion_options = {
|
||||
'comments' : description
|
||||
,'tags' : category
|
||||
,'language' : language
|
||||
,'publisher' : publisher
|
||||
}
|
||||
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='h1', attrs={'style':'font-family: Arial,Helvetica,sans-serif; font-size: 18px; color: rgb(51, 51, 51); font-weight: bold; margin: 10px 0pt; clear: both; float: left;width: 610px;'})
|
||||
,dict(name='div', attrs={'style':'margin-right: 15px; margin-bottom: 15px; float: left;'})
|
||||
, dict(name='p', attrs={'id':'itemDescription'})
|
||||
,dict(name='div', attrs={'id':'itemBody'})
|
||||
]
|
||||
|
||||
remove_tags = [
|
||||
dict(name='div', attrs={'class':['tools']})
|
||||
, dict(name='div', attrs={'class':['share']})
|
||||
, dict(name='div', attrs={'class':['category']})
|
||||
, dict(name='div', attrs={'id':['comments']})
|
||||
]
|
||||
|
||||
remove_tags_after = [
|
||||
dict(name='div', attrs={'id':'comments'})
|
||||
]
|
||||
|
||||
feeds = [
|
||||
(u'Feeds', u'http://www.descopera.ro/rss')
|
||||
]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
27
resources/recipes/dotpod.recipe
Normal file
@ -0,0 +1,27 @@
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = '2011-2011, Federico Escalada <fedeescalada at gmail.com>'
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class Dotpod(BasicNewsRecipe):
|
||||
__author__ = 'Federico Escalada'
|
||||
description = 'Tecnologia y Comunicacion Audiovisual'
|
||||
encoding = 'utf-8'
|
||||
language = 'es'
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
oldest_article = 7
|
||||
publication_type = 'blog'
|
||||
title = 'Dotpod'
|
||||
authors = 'Federico Picone'
|
||||
|
||||
conversion_options = {
|
||||
'authors' : authors
|
||||
,'comments' : description
|
||||
,'language' : language
|
||||
}
|
||||
|
||||
feeds = [('Dotpod', 'http://www.dotpod.com.ar/feed/')]
|
||||
|
||||
remove_tags = [dict(name='div', attrs={'class':'feedflare'})]
|
||||
|
55
resources/recipes/ecuisine.recipe
Normal file
@ -0,0 +1,55 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
#!/usr/bin/env python
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = u'2011, Silviu Cotoar\u0103'
|
||||
'''
|
||||
ecuisine.ro
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class EcuisineRo(BasicNewsRecipe):
|
||||
title = u'eCuisine'
|
||||
__author__ = u'Silviu Cotoar\u0103'
|
||||
description = u'Reinventeaz\u0103 pl\u0103cerea de a g\u0103ti'
|
||||
publisher = 'eCuisine'
|
||||
oldest_article = 5
|
||||
language = 'ro'
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
category = 'Ziare,Retete,Bucatarie'
|
||||
encoding = 'utf-8'
|
||||
cover_url = ''
|
||||
|
||||
conversion_options = {
|
||||
'comments' : description
|
||||
,'tags' : category
|
||||
,'language' : language
|
||||
,'publisher' : publisher
|
||||
}
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='div', attrs={'class':'page-title'})
|
||||
, dict(name='div', attrs={'class':'content clearfix'})
|
||||
]
|
||||
|
||||
remove_tags = [
|
||||
dict(name='ul', attrs={'id':['recipe-tabs']})
|
||||
, dict(name='div', attrs={'class':['recipe-body-rating clearfix']})
|
||||
, dict(name='div', attrs={'class':['recipe-body-flags']})
|
||||
, dict(name='div', attrs={'id':['tweetmeme_button']})
|
||||
, dict(name='div', attrs={'class':['fbshare']})
|
||||
, dict(name='a', attrs={'class':['button-rounded']})
|
||||
, dict(name='div', attrs={'class':['recipe-body-related']})
|
||||
, dict(name='div', attrs={'class':['fbshare']})
|
||||
, dict(name='div', attrs={'class':['link-wrapper']})
|
||||
]
|
||||
|
||||
feeds = [
|
||||
(u'Feeds', u'http://www.ecuisine.ro/rss')
|
||||
]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
43
resources/recipes/egirl.recipe
Normal file
@ -0,0 +1,43 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
#!/usr/bin/env python
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = u'2011, Silviu Cotoar\u0103'
|
||||
'''
|
||||
egirl.ro
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class EgirlRo(BasicNewsRecipe):
|
||||
title = u'egirl'
|
||||
__author__ = u'Silviu Cotoar\u0103'
|
||||
description = u'Necesar pentru tine'
|
||||
publisher = u'egirl'
|
||||
oldest_article = 5
|
||||
language = 'ro'
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
category = 'Ziare,Reviste,Femei'
|
||||
encoding = 'utf-8'
|
||||
cover_url = 'http://www.egirl.ro/images/egirlNou/logo_egirl.gif'
|
||||
|
||||
conversion_options = {
|
||||
'comments' : description
|
||||
,'tags' : category
|
||||
,'language' : language
|
||||
,'publisher' : publisher
|
||||
}
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='div', attrs={'id':'title_art'})
|
||||
, dict(name='div', attrs={'class':'content_style'})
|
||||
]
|
||||
|
||||
feeds = [
|
||||
(u'Feeds', u'http://www.egirl.ro/rss/egirl.xml')
|
||||
]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
@ -1,6 +1,6 @@
|
||||
#!/usr/bin/env python
|
||||
__license__ = 'GPL 3'
|
||||
__copyright__ = 'zotzot'
|
||||
__copyright__ = 'zotzo'
|
||||
__docformat__ = 'restructuredtext en'
|
||||
'''
|
||||
http://www.epltalk.com
|
||||
@ -9,10 +9,9 @@ from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
|
||||
class EPLTalkRecipe(BasicNewsRecipe):
|
||||
__license__ = 'GPL v3'
|
||||
__author__ = u'The Gaffer'
|
||||
language = 'en'
|
||||
version = 1
|
||||
version = 2
|
||||
__author__ = 'rylsfan'
|
||||
|
||||
title = u'EPL Talk'
|
||||
publisher = u'The Gaffer'
|
||||
@ -21,17 +20,40 @@ class EPLTalkRecipe(BasicNewsRecipe):
|
||||
description = u'News and Analysis from the English Premier League'
|
||||
cover_url = 'http://bit.ly/hJxZPu'
|
||||
|
||||
oldest_article = 45
|
||||
max_articles_per_feed = 150
|
||||
oldest_article = 3
|
||||
max_articles_per_feed = 100
|
||||
use_embedded_content = True
|
||||
remove_javascript = True
|
||||
encoding = 'utf8'
|
||||
|
||||
remove_tags_after = [dict(name='div', attrs={'class':'pd-rating'})]
|
||||
conversion_options = {
|
||||
'comment' : description
|
||||
, 'tags' : category
|
||||
, 'publisher' : publisher
|
||||
, 'language' : language
|
||||
}
|
||||
|
||||
feeds = [(u'EPL Talk', u'http://feeds.feedburner.com/EPLTalk')]
|
||||
remove_tags = [
|
||||
{'class': 'feedflare'},
|
||||
{'class': 'tweetmeme_button'},
|
||||
{'class': 'eplrelated'},
|
||||
{'p': 'Related posts:<ol>'},
|
||||
]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
||||
|
||||
feeds =[
|
||||
(u'EPL Talk', u'http://feeds.feedburner.com/EPLTalk'),
|
||||
(u'MLS Talk', u'http://feeds.feedburner.com/majorleaguesoccertalksite'),
|
||||
#(),
|
||||
#(),
|
||||
#(),
|
||||
]
|
||||
|
||||
extra_css = '''
|
||||
body{font-family:verdana,arial,helvetica,geneva,sans-serif;}
|
||||
img {float: left; margin-right: 0.5em;}
|
||||
'''
|
||||
h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
|
||||
h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
|
||||
p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
|
||||
body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
|
||||
'''
|
||||
|
@ -41,7 +41,8 @@ class ESPN(BasicNewsRecipe):
|
||||
'''
|
||||
|
||||
|
||||
feeds = [('Top Headlines', 'http://sports.espn.go.com/espn/rss/news'),
|
||||
feeds = [
|
||||
('Top Headlines', 'http://sports.espn.go.com/espn/rss/news'),
|
||||
'http://sports.espn.go.com/espn/rss/nfl/news',
|
||||
'http://sports.espn.go.com/espn/rss/nba/news',
|
||||
'http://sports.espn.go.com/espn/rss/mlb/news',
|
||||
@ -107,10 +108,11 @@ class ESPN(BasicNewsRecipe):
|
||||
if match and 'soccernet' not in url and 'bassmaster' not in url:
|
||||
return 'http://sports.espn.go.com/espn/print?'+match.group(1)+'&type=story'
|
||||
else:
|
||||
if match and 'soccernet' in url:
|
||||
splitlist = url.split("&", 5)
|
||||
newurl = 'http://soccernet.espn.go.com/print?'+match.group(1)+'&type=story' + '&' + str(splitlist[2] )
|
||||
return newurl
|
||||
if 'soccernet' in url:
|
||||
match = re.search(r'/id/(\d+)/', url)
|
||||
if match:
|
||||
return \
|
||||
'http://soccernet.espn.go.com/print?id=%s&type=story' % match.group(1)
|
||||
#else:
|
||||
# if 'bassmaster' in url:
|
||||
# return url
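
The soccernet branch in the hunk above pulls the numeric story id out of the article URL and builds a print-view URL from it. A minimal standalone sketch of that step, assuming a hypothetical helper name (`soccernet_print_url`) and a made-up sample URL; in the recipe this logic sits inside `print_version`:

```python
import re


def soccernet_print_url(url):
    """Return the soccernet print URL for an article URL, or None.

    Hypothetical helper for illustration only; it mirrors the regex and
    format string shown in the diff.
    """
    # The story id is the run of digits between "/id/" and the next "/".
    match = re.search(r'/id/(\d+)/', url)
    if match:
        return 'http://soccernet.espn.go.com/print?id=%s&type=story' % match.group(1)
    # No id segment in the URL: nothing to rewrite.
    return None
```

Returning `None` for URLs without an `/id/<digits>/` segment is a choice made for this sketch; the diff does not show what the recipe returns in that case.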
|
||||
|
53
resources/recipes/fhmro.recipe
Normal file
@ -0,0 +1,53 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
#!/usr/bin/env python
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = u'2011, Silviu Cotoar\u0103'
|
||||
'''
|
||||
fhm.ro
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class FHMro(BasicNewsRecipe):
|
||||
title = u'FHM Ro'
|
||||
__author__ = u'Silviu Cotoar\u0103'
|
||||
description = u'Pentru c\u0103 noi putem'
|
||||
publisher = 'FHM'
|
||||
oldest_article = 5
|
||||
language = 'ro'
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
category = 'Reviste'
|
||||
encoding = 'utf-8'
|
||||
cover_url = 'http://www.fhm.com/App_Resources/Images/Site/re-design/logo.gif'
|
||||
|
||||
conversion_options = {
|
||||
'comments' : description
|
||||
,'tags' : category
|
||||
,'language' : language
|
||||
,'publisher' : publisher
|
||||
}
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='div', attrs={'class':'contentMainTitle'})
|
||||
, dict(name='div', attrs={'class':'entry'})
|
||||
]
|
||||
|
||||
remove_tags_after = [
|
||||
dict(name='div', attrs={'class':['ratingblock ']})
|
||||
, dict(name='a', attrs={'rel':['tag']})
|
||||
]
|
||||
|
||||
remove_tags = [
|
||||
dict(name='div', attrs={'class':['ratingblock ']})
|
||||
, dict(name='div', attrs={'class':['socialize-containter']})
|
||||
]
|
||||
|
||||
feeds = [
|
||||
(u'Feeds', u'http://www.fhm.ro/feed')
|
||||
]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
48
resources/recipes/flickr.recipe
Normal file
@ -0,0 +1,48 @@
|
||||
__license__ = 'GPL v3'
|
||||
__author__ = 'Ricardo Jurado'
|
||||
__copyright__ = 'Ricardo Jurado'
|
||||
__version__ = 'v0.1'
|
||||
__date__ = '22 February 2011'
|
||||
|
||||
'''
|
||||
http://blog.flickr.net/
|
||||
'''
|
||||
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class AdvancedUserRecipe1297031650(BasicNewsRecipe):
|
||||
|
||||
title = u'Flickr Blog'
|
||||
masthead_url = 'http://flickrtheblog.files.wordpress.com/2008/11/flickblog_logo.gif'
|
||||
cover_url = 'http://flickrtheblog.files.wordpress.com/2008/11/flickblog_logo.gif'
|
||||
publisher = u''
|
||||
|
||||
__author__ = 'Ricardo Jurado'
|
||||
description = 'Pictures Blog'
|
||||
category = 'Blog,Pictures'
|
||||
|
||||
oldest_article = 120
|
||||
max_articles_per_feed = 10
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
encoding = 'UTF-8'
|
||||
remove_javascript = True
|
||||
language = 'en'
|
||||
|
||||
extra_css = """
|
||||
p{text-align: justify; font-size: 100%}
|
||||
body{ text-align: left; font-size:100% }
|
||||
h2{font-family: sans-serif; font-size:130%; font-weight:bold; text-align: justify; }
|
||||
.published{font-family:Arial,Helvetica,sans-serif; font-size:80%; }
|
||||
.posted{font-family:Arial,Helvetica,sans-serif; font-size:80%; }
|
||||
"""
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='div', attrs={'class':'entry'})
|
||||
]
|
||||
|
||||
feeds = [
|
||||
(u'BLOG', u'http://feeds.feedburner.com/Flickrblog'),
|
||||
#(u'BLOG', u'http://blog.flickr.net/es/feed/atom/')
|
||||
]
|
47
resources/recipes/flickr_es.recipe
Normal file
@ -0,0 +1,47 @@
|
||||
__license__ = 'GPL v3'
|
||||
__author__ = 'Ricardo Jurado'
|
||||
__copyright__ = 'Ricardo Jurado'
|
||||
__version__ = 'v0.1'
|
||||
__date__ = '22 February 2011'
|
||||
|
||||
'''
|
||||
http://blog.flickr.net/
|
||||
'''
|
||||
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class AdvancedUserRecipe1297031650(BasicNewsRecipe):
|
||||
|
||||
title = u'Flickr Blog'
|
||||
masthead_url = 'http://flickrtheblog.files.wordpress.com/2008/11/flickblog_logo.gif'
|
||||
cover_url = 'http://flickrtheblog.files.wordpress.com/2008/11/flickblog_logo.gif'
|
||||
publisher = u''
|
||||
|
||||
__author__ = 'Ricardo Jurado'
|
||||
description = 'Pictures Blog'
|
||||
category = 'Blog,Pictures'
|
||||
|
||||
oldest_article = 120
|
||||
max_articles_per_feed = 10
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
encoding = 'UTF-8'
|
||||
remove_javascript = True
|
||||
language = 'es'
|
||||
|
||||
extra_css = """
|
||||
p{text-align: justify; font-size: 100%}
|
||||
body{ text-align: left; font-size:100% }
|
||||
h2{font-family: sans-serif; font-size:130%; font-weight:bold; text-align: justify; }
|
||||
.published{font-family:Arial,Helvetica,sans-serif; font-size:80%; }
|
||||
.posted{font-family:Arial,Helvetica,sans-serif; font-size:80%; }
|
||||
"""
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='div', attrs={'class':'entry'})
|
||||
]
|
||||
|
||||
feeds = [
|
||||
(u'BLOG', u'http://blog.flickr.net/es/feed/atom/')
|
||||
]
|
@ -16,14 +16,10 @@ class Gawker(BasicNewsRecipe):
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
encoding = 'utf-8'
|
||||
use_embedded_content = False
|
||||
use_embedded_content = True
|
||||
language = 'en'
|
||||
masthead_url = 'http://cache.gawkerassets.com/assets/gawker.com/img/logo.png'
|
||||
extra_css = '''
|
||||
body{font-family: "Lucida Grande",Helvetica,Arial,sans-serif}
|
||||
img{margin-bottom: 1em}
|
||||
h1{font-family :Arial,Helvetica,sans-serif; font-size:large}
|
||||
'''
|
||||
|
||||
conversion_options = {
|
||||
'comment' : description
|
||||
, 'tags' : category
|
||||
@ -31,13 +27,11 @@ class Gawker(BasicNewsRecipe):
|
||||
, 'language' : language
|
||||
}
|
||||
|
||||
remove_attributes = ['width','height']
|
||||
keep_only_tags = [dict(attrs={'class':'content permalink'})]
|
||||
remove_tags_before = dict(name='h1')
|
||||
remove_tags = [dict(attrs={'class':'contactinfo'})]
|
||||
remove_tags_after = dict(attrs={'class':'contactinfo'})
|
||||
remove_tags = [
|
||||
{'class': 'feedflare'},
|
||||
]
|
||||
|
||||
feeds = [(u'Articles', u'http://feeds.gawker.com/gawker/full')]
|
||||
feeds = [(u'Articles', u'http://feeds.gawker.com/gawker/vip?format=xml')]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
||||
|
@ -17,10 +17,9 @@ class Gizmodo(BasicNewsRecipe):
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
encoding = 'utf-8'
|
||||
use_embedded_content = False
|
||||
use_embedded_content = True
|
||||
language = 'en'
|
||||
masthead_url = 'http://cache.gawkerassets.com/assets/gizmodo.com/img/logo.png'
|
||||
extra_css = ' body{font-family: "Lucida Grande",Helvetica,Arial,sans-serif} img{margin-bottom: 1em} '
|
||||
|
||||
conversion_options = {
|
||||
'comment' : description
|
||||
@ -29,13 +28,12 @@ class Gizmodo(BasicNewsRecipe):
|
||||
, 'language' : language
|
||||
}
|
||||
|
||||
remove_attributes = ['width','height']
|
||||
keep_only_tags = [dict(attrs={'class':'content permalink'})]
|
||||
remove_tags_before = dict(name='h1')
|
||||
remove_tags = [dict(attrs={'class':'contactinfo'})]
|
||||
remove_tags_after = dict(attrs={'class':'contactinfo'})
|
||||
feeds = [(u'Articles', u'http://feeds.gawker.com/gizmodo/vip?format=xml')]
|
||||
|
||||
remove_tags = [
|
||||
{'class': 'feedflare'},
|
||||
]
|
||||
|
||||
feeds = [(u'Articles', u'http://feeds.gawker.com/gizmodo/full')]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
||||
|
48
resources/recipes/go4it.recipe
Normal file
@ -0,0 +1,48 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
#!/usr/bin/env python
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = u'2011, Silviu Cotoar\u0103'
|
||||
'''
|
||||
go4it.ro
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class Go4ITro(BasicNewsRecipe):
|
||||
title = u'go4it'
|
||||
__author__ = u'Silviu Cotoar\u0103'
|
||||
description = 'Gadgeturi, Lifestyle, Tehnologie'
|
||||
publisher = 'go4it'
|
||||
oldest_article = 5
|
||||
language = 'ro'
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
category = 'Reviste,Ziare,IT'
|
||||
encoding = 'utf-8'
|
||||
cover_url = 'http://www.go4it.ro/images/logo.png'
|
||||
|
||||
conversion_options = {
|
||||
'comments' : description
|
||||
,'tags' : category
|
||||
,'language' : language
|
||||
,'publisher' : publisher
|
||||
}
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='div', attrs={'class':'subTitle clearfix'})
|
||||
, dict(name='div', attrs={'class':'story'})
|
||||
]
|
||||
|
||||
remove_tags = [
|
||||
dict(name='span', attrs={'class':['data']})
|
||||
, dict(name='a', attrs={'class':['comments']})
|
||||
]
|
||||
|
||||
feeds = [
|
||||
(u'Feeds', u'http://feeds2.feedburner.com/Go4itro-Stiri')
|
||||
]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
@ -1,20 +1,43 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
#!/usr/bin/env python
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = u'2011, Silviu Cotoar\u0103'
|
||||
'''
|
||||
gsp.ro
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class AdvancedUserRecipe1286351181(BasicNewsRecipe):
|
||||
title = u'gsp.ro'
|
||||
__author__ = 'bucsie'
|
||||
oldest_article = 2
|
||||
class GSP(BasicNewsRecipe):
|
||||
title = u'Gazeta Sporturilor'
|
||||
language = 'ro'
|
||||
__author__ = u'Silviu Cotoar\u0103'
|
||||
description = u'Gazeta Sporturilor'
|
||||
publisher = u'Gazeta Sporturilor'
|
||||
category = 'Ziare,Sport,Stiri,Romania'
|
||||
oldest_article = 5
|
||||
max_articles_per_feed = 100
|
||||
language='ro'
|
||||
cover_url ='http://www.gsp.ro/images/sigla_rosu.jpg'
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
encoding = 'utf-8'
|
||||
remove_javascript = True
|
||||
cover_url = 'http://www.gsp.ro/images/logo.jpg'
|
||||
|
||||
remove_tags = [
|
||||
dict(name='div', attrs={'class':['related_articles', 'articol_noteaza straight_line dotted_line_top', 'comentarii','mai_multe_articole']}),
|
||||
dict(name='div', attrs={'id':'icons'})
|
||||
]
|
||||
remove_tags_after = dict(name='div', attrs={'id':'adoceanintactrovccmgpmnyt'})
|
||||
conversion_options = {
|
||||
'comments' : description
|
||||
,'tags' : category
|
||||
,'language' : language
|
||||
,'publisher' : publisher
|
||||
}
|
||||
|
||||
feeds = [(u'toate stirile', u'http://www.gsp.ro/index.php?section=section&screen=rss')]
|
||||
keep_only_tags = [ dict(name='h1', attrs={'class':'serif title_2'})
|
||||
,dict(name='div', attrs={'id':'only_text'})
|
||||
,dict(name='span', attrs={'class':'block poza_principala'})
|
||||
]
|
||||
|
||||
feeds = [ (u'\u0218tiri', u'http://www.gsp.ro/rss.xml') ]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
||||
|
||||
def print_version(self, url):
|
||||
return 'http://www1.gsp.ro/print/' + url[(url.rindex('/')+1):]
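
The `print_version` above keeps only the last path segment of the article URL and appends it to the print endpoint. A minimal sketch of the same slicing as a free function, assuming a hypothetical name (`gsp_print_url`) and an invented sample URL:

```python
def gsp_print_url(url):
    """Map an article URL to its print URL (illustrative helper)."""
    # rindex('/') finds the last slash; everything after it is the slug.
    slug = url[url.rindex('/') + 1:]
    return 'http://www1.gsp.ro/print/' + slug
```

Note that a URL ending in `/` yields an empty slug with this approach, so the result is just the bare print endpoint.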
|
||||
|
31
resources/recipes/helsingin_sanomat.recipe
Normal file
@ -0,0 +1,31 @@
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class AdvancedUserRecipe1298137661(BasicNewsRecipe):
|
||||
title = u'Helsingin Sanomat'
|
||||
__author__ = 'oneillpt'
|
||||
language = 'fi'
|
||||
oldest_article = 7
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
remove_javascript = True
|
||||
conversion_options = {
|
||||
'linearize_tables' : True
|
||||
}
|
||||
remove_tags = [
|
||||
dict(name='a', attrs={'id':'articleCommentUrl'}),
|
||||
dict(name='p', attrs={'class':'newsSummary'}),
|
||||
dict(name='div', attrs={'class':'headerTools'})
|
||||
]
|
||||
|
||||
feeds = [(u'Uutiset - HS.fi', u'http://www.hs.fi/uutiset/rss/'), (u'Politiikka - HS.fi', u'http://www.hs.fi/politiikka/rss/'),
|
||||
(u'Ulkomaat - HS.fi', u'http://www.hs.fi/ulkomaat/rss/'), (u'Kulttuuri - HS.fi', u'http://www.hs.fi/kulttuuri/rss/'),
|
||||
(u'Kirjat - HS.fi', u'http://www.hs.fi/kulttuuri/kirjat/rss/'), (u'Elokuvat - HS.fi', u'http://www.hs.fi/kulttuuri/elokuvat/rss/')
|
||||
]
|
||||
|
||||
def print_version(self, url):
|
||||
j = url.rfind("/")
|
||||
s = url[j:]
|
||||
i = s.rfind("?ref=rss")
|
||||
if i > 0:
|
||||
s = s[:i]
|
||||
return "http://www.hs.fi/tulosta" + s
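
The `print_version` above takes the last path segment of the feed URL, strips a trailing `?ref=rss` marker if one is present, and appends the result to the `tulosta` (print) path. A minimal sketch of that transformation as a standalone function, assuming a hypothetical name (`hs_print_url`) and invented sample URLs:

```python
def hs_print_url(url):
    """Map a feed article URL to its print URL (illustrative helper)."""
    # Keep everything from the last "/" onward, including the slash.
    tail = url[url.rfind("/"):]
    # Drop the "?ref=rss" suffix when it appears after the first character.
    cut = tail.rfind("?ref=rss")
    if cut > 0:
        tail = tail[:cut]
    return "http://www.hs.fi/tulosta" + tail
```

URLs without the `?ref=rss` suffix pass through the `if` untouched, because `rfind` returns `-1` when the marker is absent.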
|
51
resources/recipes/historiaro.recipe
Normal file
@ -0,0 +1,51 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
#!/usr/bin/env python
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = u'2011, Silviu Cotoar\u0103'
|
||||
'''
|
||||
historia.ro
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class HistoriaRo(BasicNewsRecipe):
|
||||
title = u'Historia'
|
||||
__author__ = u'Silviu Cotoar\u0103'
|
||||
description = ''
|
||||
publisher = 'Historia'
|
||||
oldest_article = 5
|
||||
language = 'ro'
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
category = 'Ziare,Reviste,Istorie'
|
||||
encoding = 'utf-8'
|
||||
cover_url = 'http://www.historia.ro/sites/all/themes/historia/images/historia.png'
|
||||
|
||||
conversion_options = {
|
||||
'comments' : description
|
||||
,'tags' : category
|
||||
,'language' : language
|
||||
,'publisher' : publisher
|
||||
}
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='div', attrs={'class':'c_antet_title'})
|
||||
, dict(name='a', attrs={'class':'overlaybox'})
|
||||
, dict(name='div', attrs={'class':'art_content'})
|
||||
]
|
||||
|
||||
remove_tags = [
|
||||
dict(name='div', attrs={'class':['fl_left']})
|
||||
, dict(name='div', attrs={'id':['article_toolbar']})
|
||||
, dict(name='div', attrs={'class':['zoom_cont']})
|
||||
]
|
||||
|
||||
|
||||
feeds = [
|
||||
(u'Feeds', u'http://www.historia.ro/rss.xml')
|
||||
]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
43
resources/recipes/hotcity.recipe
Normal file
@ -0,0 +1,43 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
#!/usr/bin/env python
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = u'2011, Silviu Cotoar\u0103'
|
||||
'''
|
||||
hotcity.ro
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class HotcityRo(BasicNewsRecipe):
|
||||
title = u'Hotcity'
|
||||
__author__ = u'Silviu Cotoar\u0103'
|
||||
description = u'Cultura urban\u0103 feminin\u0103'
|
||||
publisher = 'Hotcity'
|
||||
oldest_article = 5
|
||||
language = 'ro'
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
category = 'Ziare,Reviste'
|
||||
encoding = 'utf-8'
|
||||
cover_url = 'http://www.hotcity.ro/i/bg_header.gif'
|
||||
|
||||
conversion_options = {
|
||||
'comments' : description
|
||||
,'tags' : category
|
||||
,'language' : language
|
||||
,'publisher' : publisher
|
||||
}
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='div', attrs={'class':'articol_title'})
|
||||
, dict(name='div', attrs={'class':'text'})
|
||||
]
|
||||
|
||||
feeds = [
|
||||
(u'Feeds', u'http://www.hotcity.ro/rss')
|
||||
]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
52
resources/recipes/intrefete.recipe
Normal file
@ -0,0 +1,52 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
#!/usr/bin/env python
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = u'2011, Silviu Cotoar\u0103'
|
||||
'''
|
||||
intrefete.ro
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class Intrefete(BasicNewsRecipe):
|
||||
title = u'\u00centre fete'
|
||||
__author__ = u'Silviu Cotoar\u0103'
|
||||
description = u'Petrece ziua cu stil, afl\u0103 ce e nou \u00eentre fete'
|
||||
publisher = u'Intre fete'
|
||||
oldest_article = 5
|
||||
language = 'ro'
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
category = 'Ziare,Reviste,Femei'
|
||||
encoding = 'utf-8'
|
||||
cover_url = 'http://storage0.dms.mpinteractiv.ro/media/2/1401/16788/5878693/5/logo.jpg?width=300'
|
||||
|
||||
conversion_options = {
|
||||
'comments' : description
|
||||
,'tags' : category
|
||||
,'language' : language
|
||||
,'publisher' : publisher
|
||||
}
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='div', attrs={'class':'article'})
|
||||
]
|
||||
|
||||
remove_tags = [
|
||||
dict(name='div', attrs={'class':['author']})
|
||||
, dict(name='div', attrs={'class':['tags']})
|
||||
, dict(name='iframe', attrs={'scrolling':['no']})
|
||||
]
|
||||
|
||||
remove_tags_after = [
|
||||
dict(name='iframe', attrs={'scrolling':['no']})
|
||||
]
|
||||
|
||||
feeds = [
|
||||
(u'Feeds', u'http://www.intrefete.ro/rss/')
|
||||
]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
47
resources/recipes/kudika.recipe
Normal file
@ -0,0 +1,47 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
#!/usr/bin/env python
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = u'2011, Silviu Cotoar\u0103'
|
||||
'''
|
||||
kudika.ro
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class Kudika(BasicNewsRecipe):
|
||||
title = u'Kudika'
|
||||
__author__ = u'Silviu Cotoar\u0103'
|
||||
description = u'Revist\u0103 pentru femei'
|
||||
publisher = 'Kudika'
|
||||
oldest_article = 5
|
||||
language = 'ro'
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
category = 'Ziare,Reviste,Femei'
|
||||
encoding = 'utf-8'
|
||||
cover_url = 'http://img.kudika.ro/images/template/page-logo.png'
|
||||
|
||||
conversion_options = {
|
||||
'comments' : description
|
||||
,'tags' : category
|
||||
,'language' : language
|
||||
,'publisher' : publisher
|
||||
}
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='div', attrs={'class':'header_recommend_article'}),
|
||||
dict(name='div', attrs={'id':'intertext_women'})
|
||||
]
|
||||
|
||||
remove_tags = [
|
||||
dict(name='p', attrs={'class':['page_breadcrumbs']})
|
||||
, dict(name='div', attrs={'class':['standard']})
|
||||
, dict(name='div', attrs={'id':['recommend_allover']})
|
||||
]
|
||||
|
||||
feeds = [ (u'Feeds', u'http://www.kudika.ro/feed.xml') ]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
@ -1,10 +1,15 @@
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = '2011'
|
||||
'''
|
||||
lemonde.fr
|
||||
'''
|
||||
import re
|
||||
from calibre.web.feeds.recipes import BasicNewsRecipe
|
||||
|
||||
class LeMonde(BasicNewsRecipe):
|
||||
title = 'Le Monde'
|
||||
__author__ = 'veezh'
|
||||
description = u'Actualit\xe9s'
|
||||
description = 'Actualités'
|
||||
oldest_article = 1
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
@ -12,13 +17,27 @@ class LeMonde(BasicNewsRecipe):
|
||||
use_embedded_content = False
|
||||
encoding = 'cp1252'
|
||||
publisher = 'lemonde.fr'
|
||||
category = 'news, France, world'
|
||||
language = 'fr'
|
||||
#publication_type = 'newsportal'
|
||||
extra_css = '''
|
||||
h1{font-size:130%;}
|
||||
.ariane{font-size:xx-small;}
|
||||
.source{font-size:xx-small;}
|
||||
#.href{font-size:xx-small;}
|
||||
.LM_caption{color:#666666; font-size:x-small;}
|
||||
#.main-article-info{font-family:Arial,Helvetica,sans-serif;}
|
||||
#full-contents{font-size:small; font-family:Arial,Helvetica,sans-serif;font-weight:normal;}
|
||||
#match-stats-summary{font-size:small; font-family:Arial,Helvetica,sans-serif;font-weight:normal;}
|
||||
'''
|
||||
#preprocess_regexps = [(re.compile(r'<!--.*?-->', re.DOTALL), lambda m: '')]
|
||||
conversion_options = {
|
||||
'comments' : description
|
||||
,'language' : language
|
||||
,'publisher' : publisher
|
||||
,'linearize_tables': True
|
||||
}
|
||||
'comments' : description
|
||||
,'tags' : category
|
||||
,'language' : language
|
||||
,'publisher' : publisher
|
||||
,'linearize_tables': True
|
||||
}
|
||||
|
||||
remove_empty_feeds = True
|
||||
|
||||
@ -32,15 +51,28 @@ class LeMonde(BasicNewsRecipe):
|
||||
return soup
|
||||
|
||||
preprocess_regexps = [
|
||||
(re.compile(r'([0-9])%'), lambda m: m.group(1) + ' %'),
|
||||
(re.compile(r'([0-9])([0-9])([0-9]) ([0-9])([0-9])([0-9])'), lambda m: m.group(1) + m.group(2) + m.group(3) + ' ' + m.group(4) + m.group(5) + m.group(6)),
|
||||
(re.compile(r'([0-9]) ([0-9])([0-9])([0-9])'), lambda m: m.group(1) + ' ' + m.group(2) + m.group(3) + m.group(4)),
|
||||
(re.compile(r'<span>'), lambda match: ' <span>'),
|
||||
(re.compile(r'\("'), lambda match: '(« '),
|
||||
(re.compile(r'"\)'), lambda match: ' »)'),
|
||||
(re.compile(r'“'), lambda match: '(« '),
|
||||
(re.compile(r'”'), lambda match: ' »)'),
|
||||
(re.compile(r'>\''), lambda match: '>‘'),
|
||||
(re.compile(r' \''), lambda match: ' ‘'),
|
||||
(re.compile(r'\''), lambda match: '’'),
|
||||
(re.compile(r'"<'), lambda match: ' »<'),
|
||||
(re.compile(r'"<em>'), lambda match: '<em>« '),
|
||||
(re.compile(r'"<em>"</em><em>'), lambda match: '<em>« '),
|
||||
(re.compile(r'"<a href='), lambda match: '« <a href='),
|
||||
(re.compile(r'</em>"'), lambda match: ' »</em>'),
|
||||
(re.compile(r'</a>"'), lambda match: ' »</a>'),
|
||||
(re.compile(r'"</'), lambda match: ' »</'),
|
||||
(re.compile(r'>"'), lambda match: '>« '),
|
||||
(re.compile(r'"<'), lambda match: ' »<'),
|
||||
(re.compile(r'’"'), lambda match: '’« '),
|
||||
(re.compile(r' "'), lambda match: ' « '),
|
||||
(re.compile(r'" '), lambda match: ' » '),
|
||||
(re.compile(r'\("'), lambda match: '(« '),
|
||||
(re.compile(r'"\)'), lambda match: ' »)'),
|
||||
(re.compile(r'"\.'), lambda match: ' ».'),
|
||||
(re.compile(r'",'), lambda match: ' »,'),
|
||||
(re.compile(r'"\?'), lambda match: ' »?'),
|
||||
@ -56,8 +88,14 @@ class LeMonde(BasicNewsRecipe):
|
||||
(re.compile(r' %'), lambda match: ' %'),
|
||||
(re.compile(r'\.jpg » border='), lambda match: '.jpg'),
|
||||
(re.compile(r'\.png » border='), lambda match: '.png'),
|
||||
(re.compile(r' – '), lambda match: ' – '),
|
||||
(re.compile(r' – '), lambda match: ' – '),
|
||||
(re.compile(r' - '), lambda match: ' – '),
|
||||
(re.compile(r' -,'), lambda match: ' –,'),
|
||||
(re.compile(r'»:'), lambda match: '» :'),
|
||||
]
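The `preprocess_regexps` list above pairs each compiled pattern with a replacement function. A minimal sketch of how such pairs could be applied, assuming they run in list order over the page HTML. `RULES` and `apply_rules` are hypothetical names used only for illustration; they are not part of the recipe or of calibre's API, and only two of the rules are reproduced here.

```python
import re

# Two rules copied from the list above, in the same (pattern, function) form.
RULES = [
    (re.compile(r'([0-9])%'), lambda m: m.group(1) + ' %'),
    (re.compile(r' - '), lambda m: u' \u2013 '),
]


def apply_rules(html, rules):
    """Apply each (pattern, function) pair in order (hypothetical helper)."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


sample = apply_rules(u'Hausse de 5% - selon', RULES)
```

Because each rule sees the output of the previous one, ordering matters: a later rule can match text that an earlier rule produced.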
|
||||
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='div', attrs={'class':['contenu']})
|
||||
]
|
||||
@ -65,11 +103,15 @@ class LeMonde(BasicNewsRecipe):
|
||||
remove_tags_after = [dict(id='appel_temoignage')]
|
||||
|
||||
def get_article_url(self, article):
|
||||
link = article.get('link')
|
||||
if 'blog' not in link:
|
||||
return link
|
||||
|
||||
url = article.get('guid', None)
|
||||
if '/chat/' in url or '.blog' in url or '/video/' in url or '/sport/' in url or '/portfolio/' in url or '/visuel/' in url :
|
||||
url = None
|
||||
return url
|
||||
|
||||
# def get_article_url(self, article):
|
||||
# link = article.get('link')
|
||||
# if 'blog' not in link and ('chat' not in link):
|
||||
# return link
|
||||
|
||||
feeds = [
|
||||
('A la une', 'http://www.lemonde.fr/rss/une.xml'),
|
||||
@ -94,3 +136,4 @@ class LeMonde(BasicNewsRecipe):
|
||||
cover_url = link_item.img['src']
|
||||
|
||||
return cover_url
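The `get_article_url` override in this recipe returns `None` for items whose `guid` points at chats, blogs, videos, sport, portfolios or visuals. A minimal standalone sketch of that filter, assuming a `None` return means the item is skipped. `SKIPPED_MARKERS` and `filter_article_url` are hypothetical names, and the guard for a missing `guid` is an addition that the recipe itself does not have.

```python
# Substrings taken from the condition in get_article_url above.
SKIPPED_MARKERS = ('/chat/', '.blog', '/video/', '/sport/',
                   '/portfolio/', '/visuel/')


def filter_article_url(article):
    """Return the item's guid, or None if it is missing or unwanted."""
    url = article.get('guid', None)
    if url is None:
        # Assumption: treat a missing guid like a skipped item.
        return None
    if any(marker in url for marker in SKIPPED_MARKERS):
        return None
    return url
```

Keeping the markers in a tuple makes it easier to add or drop a section than extending a chain of `or` conditions.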
|
||||
|
||||
|
@ -1,40 +1,52 @@
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class AdvancedUserRecipe1292550626(BasicNewsRecipe):
|
||||
title = 'Leduc - Wetaskiwin Pipestone Flyer'
|
||||
__author__ = 'Brian Hahn'
|
||||
description = 'News from Alberta, Canada'
|
||||
oldest_article = 56
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
#delay = 1
|
||||
use_embedded_content = False
|
||||
publisher = 'Pipestone Publishing'
|
||||
category = 'News, Alberta, Canada'
|
||||
language = 'en_CA'
|
||||
encoding = 'iso-8859-1'
|
||||
cover_url = 'http://www.pipestoneflyer.ca/images/calibre-cover.jpg'
|
||||
remove_tags_before = dict(id='ContentPanel')
|
||||
remove_tags_after = dict(id='ContentPanel')
|
||||
remove_tags = [dict(name='div', attrs={'id':'StoryNav'}),dict(name='div', attrs={'id':'BottomAds'}),dict(name='div', attrs={'id':'MoreStoryLinks'})]
|
||||
extra_css = 'img { margin:5px }'
|
||||
feeds = [
|
||||
('Feature', 'http://www.pipestoneflyer.ca/Feature.rss'),
|
||||
('Editors Desk', 'http://www.pipestoneflyer.ca/Editor%27s%20Desk.rss'),
|
||||
('Letters', 'http://www.pipestoneflyer.ca/Letters.rss'),
|
||||
('A Loco Viewpoint', 'http://www.pipestoneflyer.ca/A%20Loco%20Viewpoint.rss'),
|
||||
('Lifes Doorway', 'http://www.pipestoneflyer.ca/Life%27s%20Doorway.rss'),
|
||||
('From the Otherside', 'http://www.pipestoneflyer.ca/From%20the%20Otherside.rss'),
|
||||
('Opinion', 'http://www.pipestoneflyer.ca/Opinion.rss'),
|
||||
('Community', 'http://www.pipestoneflyer.ca/Community.rss'),
|
||||
('Sports', 'http://www.pipestoneflyer.ca/Sports.rss'),
|
||||
('Chambers', 'http://www.pipestoneflyer.ca/Chambers.rss'),
|
||||
('Government', 'http://www.pipestoneflyer.ca/Government.rss'),
|
||||
('Environment', 'http://www.pipestoneflyer.ca/Environment.rss'),
|
||||
('Health', 'http://www.pipestoneflyer.ca/Health.rss'),
|
||||
('Funnies', 'http://www.pipestoneflyer.ca/Funnies.rss'),
|
||||
('Faith', 'http://www.pipestoneflyer.ca/Faith.rss'),
|
||||
('News and Views', 'http://www.pipestoneflyer.ca/News%20and%20Views.rss'),
|
||||
('Obituaries', 'http://www.pipestoneflyer.ca/Obituaries.rss'),
|
||||
('Police Blotter', 'http://www.pipestoneflyer.ca/Police%20Blotter.rss'),
|
||||
]
|
||||
title = 'Leduc - Wetaskiwin Pipestone Flyer'
|
||||
__author__ = 'Brian Hahn'
|
||||
description = '''Provides news from central Alberta, Canada. This is a
|
||||
weekly publication that provides coverage from the Cities of Leduc and
|
||||
Wetaskiwin, including news from two complete counties, plus the towns and
|
||||
villages within. The counties of Leduc and Wetaskiwin provide news
|
||||
coverage of agriculture, sports, government, family, events and opinion.
|
||||
This publication is updated weekly, every Thursday.'''
|
||||
oldest_article = 13
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
#delay = 1
|
||||
use_embedded_content = False
|
||||
publisher = 'Pipestone Publishing'
|
||||
category = 'News, Alberta, Canada'
|
||||
language = 'en_CA'
|
||||
encoding = 'iso-8859-1'
|
||||
cover_url = 'http://www.pipestoneflyer.ca/images/calibre-cover.jpg'
|
||||
remove_tags_before = dict(id='ContentPanel')
|
||||
remove_tags_after = dict(id='ContentPanel')
|
||||
remove_tags = [dict(name='div',
|
||||
attrs={'id':'StoryNav'}),dict(name='div',
|
||||
attrs={'id':'BottomAds'}),dict(name='div', attrs={'id':'MoreStoryLinks'})]
|
||||
extra_css = 'img { margin:5px }'
|
||||
feeds = [
|
||||
('Feature', 'http://www.pipestoneflyer.ca/Feature.rss'),
|
||||
('Editors Desk', 'http://www.pipestoneflyer.ca/Editor%27s%20Desk.rss'),
|
||||
('Letters', 'http://www.pipestoneflyer.ca/Letters.rss'),
|
||||
('A Loco Viewpoint',
|
||||
'http://www.pipestoneflyer.ca/A%20Loco%20Viewpoint.rss'),
|
||||
('Lifes Doorway', 'http://www.pipestoneflyer.ca/Life%27s%20Doorway.rss'),
|
||||
('From the Otherside',
|
||||
'http://www.pipestoneflyer.ca/From%20the%20Otherside.rss'),
|
||||
('Opinion', 'http://www.pipestoneflyer.ca/Opinion.rss'),
|
||||
('Community', 'http://www.pipestoneflyer.ca/Community.rss'),
|
||||
('Sports', 'http://www.pipestoneflyer.ca/Sports.rss'),
|
||||
('Chambers', 'http://www.pipestoneflyer.ca/Chambers.rss'),
|
||||
('Government', 'http://www.pipestoneflyer.ca/Government.rss'),
|
||||
('Travel ', 'http://www.pipestoneflyer.ca/Travel%20.rss'),
|
||||
('Environment', 'http://www.pipestoneflyer.ca/Environment.rss'),
|
||||
('Health', 'http://www.pipestoneflyer.ca/Health.rss'),
|
||||
('Funnies', 'http://www.pipestoneflyer.ca/Funnies.rss'),
|
||||
('Events', 'http://www.pipestoneflyer.ca/Events.rss'),
|
||||
('Faith', 'http://www.pipestoneflyer.ca/Faith.rss'),
|
||||
('News and Views', 'http://www.pipestoneflyer.ca/News%20and%20Views.rss'),
|
||||
('Obituaries', 'http://www.pipestoneflyer.ca/Obituaries.rss'),
|
||||
('Police Blotter', 'http://www.pipestoneflyer.ca/Police%20Blotter.rss'),
|
||||
('Careers', 'http://www.pipestoneflyer.ca/Careers.rss'),
|
||||
]
|
||||
|
@ -16,15 +16,9 @@ class Lifehacker(BasicNewsRecipe):
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
encoding = 'utf-8'
|
||||
use_embedded_content = False
|
||||
use_embedded_content = True
|
||||
language = 'en'
|
||||
masthead_url = 'http://cache.gawkerassets.com/assets/lifehacker.com/img/logo.png'
|
||||
extra_css = '''
|
||||
body{font-family: "Lucida Grande",Helvetica,Arial,sans-serif}
|
||||
img{margin-bottom: 1em}
|
||||
h1{font-family :Arial,Helvetica,sans-serif; font-size:large}
|
||||
h2{font-family :Arial,Helvetica,sans-serif; font-size:x-small}
|
||||
'''
|
||||
conversion_options = {
|
||||
'comments' : description
|
||||
, 'tags' : category
|
||||
@ -32,20 +26,12 @@ class Lifehacker(BasicNewsRecipe):
|
||||
, 'language' : language
|
||||
}
|
||||
|
||||
remove_attributes = ['width', 'height', 'style']
|
||||
remove_tags_before = dict(name='h1')
|
||||
keep_only_tags = [dict(id='container')]
|
||||
remove_tags_after = dict(attrs={'class':'post-body'})
|
||||
remove_tags = [
|
||||
dict(id="sharemenu"),
|
||||
{'class': 'related'},
|
||||
{'class': 'feedflare'},
|
||||
]
|
||||
|
||||
feeds = [(u'Articles', u'http://feeds.gawker.com/lifehacker/full')]
|
||||
feeds = [(u'Articles', u'http://feeds.gawker.com/lifehacker/vip?format=xml')]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
||||
|
||||
def print_version(self, url):
|
||||
return url.replace('#!', '?_escaped_fragment_=')
|
||||
|
||||
|
104
resources/recipes/lwn_weekly.recipe
Normal file
@ -0,0 +1,104 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = '2011, Davide Cavalca <davide125 at tiscali.it>'
|
||||
'''
|
||||
lwn.net
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
import re
|
||||
|
||||
class WeeklyLWN(BasicNewsRecipe):
|
||||
title = 'LWN.net Weekly Edition'
|
||||
description = 'Weekly summary of what has happened in the free software world.'
|
||||
__author__ = 'Davide Cavalca'
|
||||
language = 'en'
|
||||
|
||||
cover_url = 'http://lwn.net/images/lcorner.png'
|
||||
#masthead_url = 'http://lwn.net/images/lcorner.png'
|
||||
publication_type = 'magazine'
|
||||
|
||||
remove_tags_before = dict(attrs={'class':'PageHeadline'})
|
||||
remove_tags_after = dict(attrs={'class':'ArticleText'})
|
||||
remove_tags = [dict(name=['h2', 'form'])]
|
||||
|
||||
conversion_options = { 'linearize_tables' : True }
|
||||
|
||||
oldest_article = 7.0
|
||||
needs_subscription = 'optional'
|
||||
|
||||
def get_browser(self):
|
||||
br = BasicNewsRecipe.get_browser(self)
|
||||
if self.username is not None and self.password is not None:
|
||||
br.open('https://lwn.net/login')
|
||||
br.select_form(name='loginform')
|
||||
br['Username'] = self.username
|
||||
br['Password'] = self.password
|
||||
br.submit()
|
||||
return br
|
||||
|
||||
def parse_index(self):
|
||||
if self.username is not None and self.password is not None:
|
||||
index_url = 'http://lwn.net/current/bigpage'
|
||||
else:
|
||||
index_url = 'http://lwn.net/free/bigpage'
|
||||
soup = self.index_to_soup(index_url)
|
||||
body = soup.body
|
||||
|
||||
articles = {}
|
||||
ans = []
|
||||
url_re = re.compile('^http://lwn.net/Articles/')
|
||||
|
||||
while True:
|
||||
tag_title = body.findNext(name='p', attrs={'class':'SummaryHL'})
|
||||
if tag_title == None:
|
||||
break
|
||||
|
||||
tag_section = tag_title.findPrevious(name='p', attrs={'class':'Cat1HL'})
|
||||
if tag_section == None:
|
||||
section = 'Front Page'
|
||||
else:
|
||||
section = tag_section.string
|
||||
|
||||
tag_section2 = tag_title.findPrevious(name='p', attrs={'class':'Cat2HL'})
|
||||
if tag_section2 != None:
|
||||
if tag_section2.findPrevious(name='p', attrs={'class':'Cat1HL'}) == tag_section:
|
||||
section = "%s: %s" %(section, tag_section2.string)
|
||||
|
||||
if section not in articles.keys():
|
||||
articles[section] = []
|
||||
if section not in ans:
|
||||
ans.append(section)
|
||||
|
||||
body = tag_title
|
||||
while True:
|
||||
tag_url = body.findNext(name='a', attrs={'href':url_re})
|
||||
if tag_url == None:
|
||||
break
|
||||
body = tag_url
|
||||
if tag_url.string == None:
|
||||
continue
|
||||
elif tag_url.string == 'Full Story':
|
||||
break
|
||||
elif tag_url.string.startswith('Comments ('):
|
||||
break
|
||||
else:
|
||||
continue
|
||||
|
||||
if tag_url == None:
|
||||
break
|
||||
|
||||
article = dict(
|
||||
title=tag_title.string,
|
||||
url=tag_url['href'].split('#')[0],
|
||||
description='', content='', date='')
|
||||
articles[section].append(article)
|
||||
|
||||
ans = [(key, articles[key]) for key in ans if articles.has_key(key)]
|
||||
if not ans:
|
||||
raise Exception('Could not find any articles.')
|
||||
|
||||
return ans
|
||||
|
||||
# vim: expandtab:ts=4:sw=4
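`parse_index` in the recipe above builds a dictionary of articles per section and a separate list that remembers the order in which sections were first seen, then returns `(section, articles)` pairs. A minimal sketch of just that grouping step, without any HTML parsing. `group_by_section` and its `(section, title, url)` input tuples are hypothetical and only stand in for the values the recipe extracts from the page.

```python
def group_by_section(items):
    """Group (section, title, url) tuples, keeping first-seen section order."""
    articles = {}
    order = []
    for section, title, url in items:
        if section not in articles:
            articles[section] = []
            order.append(section)
        articles[section].append(dict(
            title=title,
            url=url.split('#')[0],  # drop the fragment, as the recipe does
            description='', content='', date=''))
    return [(section, articles[section]) for section in order]


index = group_by_section([
    ('Front Page', 'A', 'http://lwn.net/Articles/1/#Comments'),
    ('Kernel', 'B', 'http://lwn.net/Articles/2/'),
    ('Front Page', 'C', 'http://lwn.net/Articles/3/'),
])
```

The separate `order` list is what keeps sections in page order; a plain dictionary in the Python 2 of this recipe would not preserve it.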
|
@ -11,7 +11,6 @@ http://www.macworld.co.uk/
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
from calibre.ptempfile import PersistentTemporaryFile
|
||||
|
||||
temp_files = []
|
||||
articles_are_obfuscated = True
|
||||
@ -36,26 +35,17 @@ class macWorld(BasicNewsRecipe):
|
||||
remove_javascript = True
|
||||
no_stylesheets = True
|
||||
|
||||
def get_obfuscated_article(self, url):
|
||||
br = self.get_browser()
|
||||
br.open(url+'&print')
|
||||
|
||||
response = br.follow_link(url, nr = 0)
|
||||
html = response.read()
|
||||
|
||||
self.temp_files.append(PersistentTemporaryFile('_fa.html'))
|
||||
self.temp_files[-1].write(html)
|
||||
self.temp_files[-1].close()
|
||||
return self.temp_files[-1].name
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='div', attrs={'id':'article'})
|
||||
dict(name='div', attrs={'id':'content'})
|
||||
]
|
||||
|
||||
remove_tags = [
|
||||
dict(name='div', attrs={'class':['toolBar','mac_tags','toolBar btmTools','textAds']}),
|
||||
{'class':['toolBar','mac_tags','toolBar btmTools','textAds']},
|
||||
dict(name='p', attrs={'class':'breadcrumbs'}),
|
||||
dict(name='div', attrs={'id':['breadcrumb','sidebar','comments']})
|
||||
dict(id=['breadcrumb','sidebar','comments','topContentWrapper',
|
||||
'rightColumn', 'aboveFootPromo', 'storyCarousel']),
|
||||
{'class':lambda x: x and ('tools' in x or 'toolBar'
|
||||
in x)}
|
||||
|
||||
]
|
||||
|
||||
|
@ -1,7 +1,9 @@
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = '2010, Eddie Lau'
|
||||
__copyright__ = '2010-2011, Eddie Lau'
|
||||
'''
|
||||
Change Log:
|
||||
2011/02/20: skip duplicated links in finance section, put photos which may extend a whole page to the back of the articles
|
||||
clean up the indentation
|
||||
2010/12/07: add entertainment section, use newspaper front page as ebook cover, suppress date display in section list
|
||||
(to avoid wrong date display in case the user generates the ebook in a time zone different from HKT)
|
||||
2010/11/22: add English section, remove eco-news section which is not updated daily, correct
|
||||
@ -18,21 +20,21 @@ from calibre.web.feeds.recipes import BasicNewsRecipe
|
||||
from contextlib import nested
|
||||
|
||||
|
||||
from calibre import __appname__
|
||||
from calibre.ebooks.BeautifulSoup import BeautifulSoup
|
||||
from calibre.ebooks.metadata.opf2 import OPFCreator
|
||||
from calibre.ebooks.metadata.toc import TOC
|
||||
from calibre.ebooks.metadata import MetaInformation
|
||||
|
||||
class MPHKRecipe(BasicNewsRecipe):
|
||||
IsKindleUsed = True # to avoid generating periodical in which CJK characters can't be displayed in section/article view
|
||||
|
||||
IsCJKWellSupported = True # Set to False to avoid generating periodical in which CJK characters can't be displayed in section/article view
|
||||
title = 'Ming Pao - Hong Kong'
|
||||
oldest_article = 1
|
||||
max_articles_per_feed = 100
|
||||
__author__ = 'Eddie Lau'
|
||||
description = 'Hong Kong Chinese Newspaper'
|
||||
publisher = 'news.mingpao.com'
|
||||
description = ('Hong Kong Chinese Newspaper (http://news.mingpao.com). If '
|
||||
'you are using a Kindle with firmware < 3.1, customize the '
|
||||
'recipe')
|
||||
publisher = 'MingPao'
|
||||
category = 'Chinese, News, Hong Kong'
|
||||
remove_javascript = True
|
||||
use_embedded_content = False
|
||||
@ -46,19 +48,20 @@ class MPHKRecipe(BasicNewsRecipe):
|
||||
masthead_url = 'http://news.mingpao.com/image/portals_top_logo_news.gif'
|
||||
keep_only_tags = [dict(name='h1'),
|
||||
dict(name='font', attrs={'style':['font-size:14pt; line-height:160%;']}), # for entertainment page title
|
||||
dict(attrs={'class':['photo']}),
|
||||
dict(attrs={'id':['newscontent']}), # entertainment page content
|
||||
dict(attrs={'id':['newscontent01','newscontent02']})]
|
||||
dict(attrs={'id':['newscontent01','newscontent02']}),
|
||||
dict(attrs={'class':['photo']})
|
||||
]
|
||||
remove_tags = [dict(name='style'),
|
||||
dict(attrs={'id':['newscontent135']})] # for the finance page
|
||||
remove_attributes = ['width']
|
||||
preprocess_regexps = [
|
||||
(re.compile(r'<h5>', re.DOTALL|re.IGNORECASE),
|
||||
lambda match: '<h1>'),
|
||||
(re.compile(r'</h5>', re.DOTALL|re.IGNORECASE),
|
||||
lambda match: '</h1>'),
|
||||
(re.compile(r'<p><a href=.+?</a></p>', re.DOTALL|re.IGNORECASE), # for entertainment page
|
||||
lambda match: '')
|
||||
(re.compile(r'<h5>', re.DOTALL|re.IGNORECASE),
|
||||
lambda match: '<h1>'),
|
||||
(re.compile(r'</h5>', re.DOTALL|re.IGNORECASE),
|
||||
lambda match: '</h1>'),
|
||||
(re.compile(r'<p><a href=.+?</a></p>', re.DOTALL|re.IGNORECASE), # for entertainment page
|
||||
lambda match: '')
|
||||
]
|
||||
|
||||
def image_url_processor(cls, baseurl, url):
|
||||
@ -107,6 +110,9 @@ class MPHKRecipe(BasicNewsRecipe):
|
||||
def get_fetchdate(self):
|
||||
return self.get_dtlocal().strftime("%Y%m%d")
|
||||
|
||||
def get_fetchformatteddate(self):
|
||||
return self.get_dtlocal().strftime("%Y-%m-%d")
|
||||
|
||||
def get_fetchday(self):
|
||||
# convert UTC to local hk time - at around HKT 6.00am, all news are available
|
||||
return self.get_dtlocal().strftime("%d")
|
||||
@ -121,84 +127,66 @@ class MPHKRecipe(BasicNewsRecipe):
|
||||
return cover
|
||||
|
||||
def parse_index(self):
|
||||
feeds = []
|
||||
dateStr = self.get_fetchdate()
|
||||
for title, url in [(u'\u8981\u805e Headline', 'http://news.mingpao.com/' + dateStr + '/gaindex.htm'),
|
||||
(u'\u6559\u80b2 Education', 'http://news.mingpao.com/' + dateStr + '/gfindex.htm'),
|
||||
(u'\u6e2f\u805e Local', 'http://news.mingpao.com/' + dateStr + '/gbindex.htm'),
|
||||
(u'\u793e\u8a55\u2027\u7b46\u9663 Editorial', 'http://news.mingpao.com/' + dateStr + '/mrindex.htm'),
|
||||
(u'\u8ad6\u58c7 Forum', 'http://news.mingpao.com/' + dateStr + '/faindex.htm'),
|
||||
(u'\u4e2d\u570b China', 'http://news.mingpao.com/' + dateStr + '/caindex.htm'),
|
||||
(u'\u570b\u969b World', 'http://news.mingpao.com/' + dateStr + '/taindex.htm'),
|
||||
('Tech News', 'http://news.mingpao.com/' + dateStr + '/naindex.htm'),
|
||||
(u'\u9ad4\u80b2 Sport', 'http://news.mingpao.com/' + dateStr + '/spindex.htm'),
|
||||
(u'\u526f\u520a Supplement', 'http://news.mingpao.com/' + dateStr + '/jaindex.htm'),
|
||||
(u'\u82f1\u6587 English', 'http://news.mingpao.com/' + dateStr + '/emindex.htm')]:
|
||||
articles = self.parse_section(url)
|
||||
if articles:
|
||||
feeds.append((title, articles))
|
||||
# special - finance
|
||||
fin_articles = self.parse_fin_section('http://www.mpfinance.com/htm/Finance/' + dateStr + '/News/ea,eb,ecindex.htm')
|
||||
if fin_articles:
|
||||
feeds.append((u'\u7d93\u6fdf Finance', fin_articles))
|
||||
# special - eco-friendly
|
||||
# eco_articles = self.parse_eco_section('http://tssl.mingpao.com/htm/marketing/eco/cfm/Eco1.cfm')
|
||||
# if eco_articles:
|
||||
# feeds.append((u'\u74b0\u4fdd Eco News', eco_articles))
|
||||
# special - entertainment
|
||||
ent_articles = self.parse_ent_section('http://ol.mingpao.com/cfm/star1.cfm')
|
||||
if ent_articles:
|
||||
feeds.append((u'\u5f71\u8996 Entertainment', ent_articles))
|
||||
return feeds
|
||||
feeds = []
|
||||
dateStr = self.get_fetchdate()
|
||||
for title, url in [(u'\u8981\u805e Headline', 'http://news.mingpao.com/' + dateStr + '/gaindex.htm'),
|
||||
(u'\u6e2f\u805e Local', 'http://news.mingpao.com/' + dateStr + '/gbindex.htm'),
|
||||
(u'\u793e\u8a55/\u7b46\u9663 Editorial', 'http://news.mingpao.com/' + dateStr + '/mrindex.htm'),
|
||||
(u'\u8ad6\u58c7 Forum', 'http://news.mingpao.com/' + dateStr + '/faindex.htm'),
|
||||
(u'\u4e2d\u570b China', 'http://news.mingpao.com/' + dateStr + '/caindex.htm'),
|
||||
(u'\u570b\u969b World', 'http://news.mingpao.com/' + dateStr + '/taindex.htm'),
|
||||
('Tech News', 'http://news.mingpao.com/' + dateStr + '/naindex.htm'),
|
||||
(u'\u6559\u80b2 Education', 'http://news.mingpao.com/' + dateStr + '/gfindex.htm'),
|
||||
(u'\u9ad4\u80b2 Sport', 'http://news.mingpao.com/' + dateStr + '/spindex.htm'),
|
||||
(u'\u526f\u520a Supplement', 'http://news.mingpao.com/' + dateStr + '/jaindex.htm'),
|
||||
(u'\u82f1\u6587 English', 'http://news.mingpao.com/' + dateStr + '/emindex.htm')]:
|
||||
articles = self.parse_section(url)
|
||||
if articles:
|
||||
feeds.append((title, articles))
|
||||
# special - finance
|
||||
fin_articles = self.parse_fin_section('http://www.mpfinance.com/htm/Finance/' + dateStr + '/News/ea,eb,ecindex.htm')
|
||||
if fin_articles:
|
||||
feeds.append((u'\u7d93\u6fdf Finance', fin_articles))
|
||||
# special - entertainment
|
||||
ent_articles = self.parse_ent_section('http://ol.mingpao.com/cfm/star1.cfm')
|
||||
if ent_articles:
|
||||
feeds.append((u'\u5f71\u8996 Film/TV', ent_articles))
|
||||
return feeds
|
||||
|
||||
def parse_section(self, url):
|
||||
dateStr = self.get_fetchdate()
|
||||
soup = self.index_to_soup(url)
|
||||
divs = soup.findAll(attrs={'class': ['bullet','bullet_grey']})
|
||||
current_articles = []
|
||||
included_urls = []
|
||||
divs.reverse()
|
||||
for i in divs:
|
||||
a = i.find('a', href = True)
|
||||
title = self.tag_to_string(a)
|
||||
url = a.get('href', False)
|
||||
url = 'http://news.mingpao.com/' + dateStr + '/' +url
|
||||
if url not in included_urls and url.rfind('Redirect') == -1:
|
||||
current_articles.append({'title': title, 'url': url, 'description':'', 'date':''})
|
||||
included_urls.append(url)
|
||||
current_articles.reverse()
|
||||
return current_articles
|
||||
dateStr = self.get_fetchdate()
|
||||
soup = self.index_to_soup(url)
|
||||
divs = soup.findAll(attrs={'class': ['bullet','bullet_grey']})
|
||||
current_articles = []
|
||||
included_urls = []
|
||||
divs.reverse()
|
||||
for i in divs:
|
||||
a = i.find('a', href = True)
|
||||
title = self.tag_to_string(a)
|
||||
url = a.get('href', False)
|
||||
url = 'http://news.mingpao.com/' + dateStr + '/' +url
|
||||
if url not in included_urls and url.rfind('Redirect') == -1:
|
||||
current_articles.append({'title': title, 'url': url, 'description':'', 'date':''})
|
||||
included_urls.append(url)
|
||||
current_articles.reverse()
|
||||
return current_articles
|
||||
|
||||
def parse_fin_section(self, url):
|
||||
dateStr = self.get_fetchdate()
|
||||
soup = self.index_to_soup(url)
|
||||
a = soup.findAll('a', href= True)
|
||||
current_articles = []
|
||||
for i in a:
|
||||
url = i.get('href', False)
|
||||
if not url.rfind(dateStr) == -1 and url.rfind('index') == -1:
|
||||
title = self.tag_to_string(i)
|
||||
url = 'http://www.mpfinance.com/cfm/' +url
|
||||
current_articles.append({'title': title, 'url': url, 'description':''})
|
||||
return current_articles
|
||||
|
||||
def parse_eco_section(self, url):
|
||||
dateStr = self.get_fetchdate()
|
||||
soup = self.index_to_soup(url)
|
||||
divs = soup.findAll(attrs={'class': ['bullet']})
|
||||
current_articles = []
|
||||
included_urls = []
|
||||
for i in divs:
|
||||
a = i.find('a', href = True)
|
||||
title = self.tag_to_string(a)
|
||||
url = a.get('href', False)
|
||||
url = 'http://tssl.mingpao.com/htm/marketing/eco/cfm/' +url
|
||||
if url not in included_urls and url.rfind('Redirect') == -1 and not url.rfind('.txt') == -1 and not url.rfind(dateStr) == -1:
|
||||
for i in a:
|
||||
url = 'http://www.mpfinance.com/cfm/' + i.get('href', False)
|
||||
if url not in included_urls and not url.rfind(dateStr) == -1 and url.rfind('index') == -1:
|
||||
title = self.tag_to_string(i)
|
||||
current_articles.append({'title': title, 'url': url, 'description':''})
|
||||
included_urls.append(url)
|
||||
return current_articles
|
||||
|
||||
def parse_ent_section(self, url):
|
||||
self.get_fetchdate()
|
||||
soup = self.index_to_soup(url)
|
||||
a = soup.findAll('a', href=True)
|
||||
a.reverse()
|
||||
@ -223,67 +211,71 @@ class MPHKRecipe(BasicNewsRecipe):
|
||||
return soup
|
||||
|
||||
def create_opf(self, feeds, dir=None):
|
||||
if self.IsKindleUsed == False:
|
||||
super(MPHKRecipe,self).create_opf(feeds, dir)
|
||||
return
|
||||
if dir is None:
|
||||
dir = self.output_dir
|
||||
title = self.short_title()
|
||||
title += ' ' + self.get_fetchdate()
|
||||
#if self.output_profile.periodical_date_in_title:
|
||||
# title += strftime(self.timefmt)
|
||||
mi = MetaInformation(title, [__appname__])
|
||||
mi.publisher = __appname__
|
||||
mi.author_sort = __appname__
|
||||
mi.publication_type = self.publication_type+':'+self.short_title()
|
||||
#mi.timestamp = nowf()
|
||||
mi.timestamp = self.get_dtlocal()
|
||||
mi.comments = self.description
|
||||
if not isinstance(mi.comments, unicode):
|
||||
mi.comments = mi.comments.decode('utf-8', 'replace')
|
||||
#mi.pubdate = nowf()
|
||||
mi.pubdate = self.get_dtlocal()
|
||||
opf_path = os.path.join(dir, 'index.opf')
|
||||
ncx_path = os.path.join(dir, 'index.ncx')
|
||||
opf = OPFCreator(dir, mi)
|
||||
# Add mastheadImage entry to <guide> section
|
||||
mp = getattr(self, 'masthead_path', None)
|
||||
if mp is not None and os.access(mp, os.R_OK):
|
||||
from calibre.ebooks.metadata.opf2 import Guide
|
||||
ref = Guide.Reference(os.path.basename(self.masthead_path), os.getcwdu())
|
||||
ref.type = 'masthead'
|
||||
ref.title = 'Masthead Image'
|
||||
opf.guide.append(ref)
|
||||
if self.IsCJKWellSupported == True:
|
||||
# use Chinese title
|
||||
title = u'\u660e\u5831 (\u9999\u6e2f) ' + self.get_fetchformatteddate()
|
||||
else:
|
||||
# use English title
|
||||
title = self.short_title() + ' ' + self.get_fetchformatteddate()
|
||||
if True: # force date in title
|
||||
# title += strftime(self.timefmt)
|
||||
mi = MetaInformation(title, [self.publisher])
|
||||
mi.publisher = self.publisher
|
||||
mi.author_sort = self.publisher
|
||||
if self.IsCJKWellSupported == True:
|
||||
mi.publication_type = 'periodical:'+self.publication_type+':'+self.short_title()
|
||||
else:
|
||||
mi.publication_type = self.publication_type+':'+self.short_title()
|
||||
#mi.timestamp = nowf()
|
||||
mi.timestamp = self.get_dtlocal()
|
||||
mi.comments = self.description
|
||||
if not isinstance(mi.comments, unicode):
|
||||
mi.comments = mi.comments.decode('utf-8', 'replace')
|
||||
#mi.pubdate = nowf()
|
||||
mi.pubdate = self.get_dtlocal()
|
||||
opf_path = os.path.join(dir, 'index.opf')
|
||||
ncx_path = os.path.join(dir, 'index.ncx')
|
||||
opf = OPFCreator(dir, mi)
|
||||
# Add mastheadImage entry to <guide> section
|
||||
mp = getattr(self, 'masthead_path', None)
|
||||
if mp is not None and os.access(mp, os.R_OK):
|
||||
from calibre.ebooks.metadata.opf2 import Guide
|
||||
ref = Guide.Reference(os.path.basename(self.masthead_path), os.getcwdu())
|
||||
ref.type = 'masthead'
|
||||
ref.title = 'Masthead Image'
|
||||
opf.guide.append(ref)
|
||||
|
||||
manifest = [os.path.join(dir, 'feed_%d'%i) for i in range(len(feeds))]
|
||||
manifest.append(os.path.join(dir, 'index.html'))
|
||||
manifest.append(os.path.join(dir, 'index.ncx'))
|
||||
manifest = [os.path.join(dir, 'feed_%d'%i) for i in range(len(feeds))]
|
||||
manifest.append(os.path.join(dir, 'index.html'))
|
||||
manifest.append(os.path.join(dir, 'index.ncx'))
|
||||
|
||||
# Get cover
|
||||
cpath = getattr(self, 'cover_path', None)
|
||||
if cpath is None:
|
||||
pf = open(os.path.join(dir, 'cover.jpg'), 'wb')
|
||||
if self.default_cover(pf):
|
||||
cpath = pf.name
|
||||
if cpath is not None and os.access(cpath, os.R_OK):
|
||||
opf.cover = cpath
|
||||
manifest.append(cpath)
|
||||
# Get cover
|
||||
cpath = getattr(self, 'cover_path', None)
|
||||
if cpath is None:
|
||||
pf = open(os.path.join(dir, 'cover.jpg'), 'wb')
|
||||
if self.default_cover(pf):
|
||||
cpath = pf.name
|
||||
if cpath is not None and os.access(cpath, os.R_OK):
|
||||
opf.cover = cpath
|
||||
manifest.append(cpath)
|
||||
|
||||
# Get masthead
|
||||
mpath = getattr(self, 'masthead_path', None)
|
||||
if mpath is not None and os.access(mpath, os.R_OK):
|
||||
manifest.append(mpath)
|
||||
# Get masthead
|
||||
mpath = getattr(self, 'masthead_path', None)
|
||||
if mpath is not None and os.access(mpath, os.R_OK):
|
||||
manifest.append(mpath)
|
||||
|
||||
opf.create_manifest_from_files_in(manifest)
|
||||
for mani in opf.manifest:
|
||||
if mani.path.endswith('.ncx'):
|
||||
mani.id = 'ncx'
|
||||
if mani.path.endswith('mastheadImage.jpg'):
|
||||
mani.id = 'masthead-image'
|
||||
entries = ['index.html']
|
||||
toc = TOC(base_path=dir)
|
||||
self.play_order_counter = 0
|
||||
self.play_order_map = {}
|
||||
opf.create_manifest_from_files_in(manifest)
|
||||
for mani in opf.manifest:
|
||||
if mani.path.endswith('.ncx'):
|
||||
mani.id = 'ncx'
|
||||
if mani.path.endswith('mastheadImage.jpg'):
|
||||
mani.id = 'masthead-image'
|
||||
entries = ['index.html']
|
||||
toc = TOC(base_path=dir)
|
||||
self.play_order_counter = 0
|
||||
self.play_order_map = {}
|
||||
|
||||
def feed_index(num, parent):
|
||||
f = feeds[num]
|
||||
@ -321,7 +313,7 @@ class MPHKRecipe(BasicNewsRecipe):
|
||||
prefix = '/'.join('..'for i in range(2*len(re.findall(r'link\d+', last))))
|
||||
templ = self.navbar.generate(True, num, j, len(f),
|
||||
not self.has_single_feed,
|
||||
a.orig_url, __appname__, prefix=prefix,
|
||||
a.orig_url, self.publisher, prefix=prefix,
|
||||
center=self.center_navbar)
|
||||
elem = BeautifulSoup(templ.render(doctype='xhtml').decode('utf-8')).find('div')
|
||||
body.insert(len(body.contents), elem)
|
||||
@ -344,7 +336,7 @@ class MPHKRecipe(BasicNewsRecipe):
|
||||
if not desc:
|
||||
desc = None
|
||||
feed_index(i, toc.add_item('feed_%d/index.html'%i, None,
|
||||
f.title, play_order=po, description=desc, author=auth))
|
||||
f.title, play_order=po, description=desc, author=auth))
|
||||
|
||||
else:
|
||||
entries.append('feed_%d/index.html'%0)
|
||||
@ -357,4 +349,3 @@ class MPHKRecipe(BasicNewsRecipe):
|
||||
|
||||
with nested(open(opf_path, 'wb'), open(ncx_path, 'wb')) as (opf_file, ncx_file):
|
||||
opf.render(opf_file, ncx_file)
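The change log for this recipe mentions skipping duplicated links in the finance section, which the revised `parse_fin_section` does with an `included_urls` list. A minimal sketch of that filter on plain strings, assuming the same three conditions (not already seen, contains the fetch date, is not an index page). `collect_finance_links` is a hypothetical helper, and it uses a set instead of the recipe's list for the membership check.

```python
def collect_finance_links(hrefs, date_str,
                          base='http://www.mpfinance.com/cfm/'):
    """Return absolute URLs for today's articles, without duplicates."""
    seen = set()
    result = []
    for href in hrefs:
        url = base + href
        if url in seen or date_str not in url or 'index' in url:
            continue
        seen.add(url)
        result.append(url)
    return result


links = collect_finance_links(
    ['20110220/ea1.htm', '20110220/ea1.htm',
     '20110220/eaindex.htm', '20110219/eb1.htm'],
    '20110220')
```

The result list keeps the first occurrence of each link in page order, while the set only answers the "seen before?" question.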
|
||||
|
||||
|
48
resources/recipes/nationalgeoro.recipe
Normal file
@ -0,0 +1,48 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
#!/usr/bin/env python
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = u'2011, Silviu Cotoar\u0103'
|
||||
'''
|
||||
natgeo.ro
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class NationalGeoRo(BasicNewsRecipe):
|
||||
title = u'National Geographic RO'
|
||||
__author__ = u'Silviu Cotoar\u0103'
|
||||
description = u'S\u0103 avem grij\u0103 de planet\u0103'
|
||||
publisher = 'National Geographic'
|
||||
oldest_article = 5
|
||||
language = 'ro'
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
category = 'Reviste'
|
||||
encoding = 'utf-8'
|
||||
cover_url = 'http://wiki.benecke.com/images/c/c4/NatGeographic_Logo.jpg'
|
||||
|
||||
conversion_options = {
|
||||
'comments' : description
|
||||
,'tags' : category
|
||||
,'language' : language
|
||||
,'publisher' : publisher
|
||||
}
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='h2', attrs={'class':'contentheading clearfix'})
|
||||
, dict(name='div', attrs={'class':'article-content'})
|
||||
|
||||
]
|
||||
|
||||
remove_tags = [
|
||||
dict(name='div', attrs={'class':['phocagallery']})
|
||||
]
|
||||
|
||||
feeds = [
|
||||
(u'Feeds', u'http://www.natgeo.ro/index.php?format=feed&type=rss')
|
||||
]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
@ -88,8 +88,8 @@ class NYTimes(BasicNewsRecipe):
|
||||
|
||||
if headlinesOnly:
|
||||
title='New York Times Headlines'
|
||||
description = 'Headlines from the New York Times'
|
||||
needs_subscription = False
|
||||
description = 'Headlines from the New York Times. Needs a subscription from http://www.nytimes.com'
|
||||
needs_subscription = 'optional'
|
||||
elif webEdition:
|
||||
title='New York Times (Web)'
|
||||
description = 'New York Times on the Web'
|
||||
|
55
resources/recipes/nytimes_sports.recipe
Normal file
@ -0,0 +1,55 @@
|
||||
#!/usr/bin/env python
|
||||
# encoding: utf-8
|
||||
|
||||
from __future__ import with_statement
|
||||
__license__ = 'GPL 3'
|
||||
__copyright__ = 'zotzo'
|
||||
__docformat__ = 'restructuredtext en'
|
||||
"""
|
||||
http://fifthdown.blogs.nytimes.com/
|
||||
http://offthedribble.blogs.nytimes.com/
|
||||
http://thequad.blogs.nytimes.com/
|
||||
http://slapshot.blogs.nytimes.com/
|
||||
http://goal.blogs.nytimes.com/
|
||||
http://bats.blogs.nytimes.com/
|
||||
http://straightsets.blogs.nytimes.com/
|
||||
http://formulaone.blogs.nytimes.com/
|
||||
http://onpar.blogs.nytimes.com/
|
||||
"""
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
|
||||
class NYTimesSports(BasicNewsRecipe):
|
||||
title = 'New York Times Sports Beat'
|
||||
language = 'en'
|
||||
__author__ = 'rylsfan'
|
||||
description = 'In-depth sports from the New York Times'
|
||||
publisher = 'The New York Times'
|
||||
category = 'Sports'
|
||||
oldest_article = 3
|
||||
max_articles_per_feed = 25
|
||||
no_stylesheets = True
|
||||
language = 'en'
|
||||
#cover_url ='http://bit.ly/h8F4DO'
|
||||
feeds = [
|
||||
(u'The Fifth Down', u'http://fifthdown.blogs.nytimes.com/feed/'),
|
||||
(u'Off The Dribble', u'http://offthedribble.blogs.nytimes.com/feed/'),
|
||||
(u'The Quad', u'http://thequad.blogs.nytimes.com/feed/'),
|
||||
(u'Slap Shot', u'http://slapshot.blogs.nytimes.com/feed/'),
|
||||
(u'Goal', u'http://goal.blogs.nytimes.com/feed/'),
|
||||
(u'Bats', u'http://bats.blogs.nytimes.com/feed/'),
|
||||
(u'Straight Sets', u'http://straightsets.blogs.nytimes.com/feed/'),
|
||||
(u'Formula One', u'http://formulaone.blogs.nytimes.com/feed/'),
|
||||
(u'On Par', u'http://onpar.blogs.nytimes.com/feed/'),
|
||||
]
|
||||
keep_only_tags = [dict(name='div', attrs={'id':'header'}),
|
||||
dict(name='h1'),
|
||||
dict(name='h2'),
|
||||
dict(name='div', attrs={'class':'entry-content'})]
|
||||
extra_css = '''
|
||||
h1{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
|
||||
h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
|
||||
p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
|
||||
body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
|
||||
'''
|
@ -96,18 +96,18 @@ class NYTimes(BasicNewsRecipe):
|
||||
if headlinesOnly:
|
||||
title='New York Times Headlines'
|
||||
description = 'Headlines from the New York Times'
|
||||
needs_subscription = False
|
||||
needs_subscription = True
|
||||
elif webEdition:
|
||||
title='New York Times (Web)'
|
||||
description = 'New York Times on the Web'
|
||||
needs_subscription = True
|
||||
elif replaceKindleVersion:
|
||||
title='The New York Times'
|
||||
title='The New York Times'
|
||||
description = 'Today\'s New York Times'
|
||||
needs_subscription = True
|
||||
else:
|
||||
title='New York Times'
|
||||
description = 'Today\'s New York Times'
|
||||
description = 'Today\'s New York Times. Needs subscription from http://www.nytimes.com'
|
||||
needs_subscription = True
|
||||
|
||||
|
||||
@ -676,7 +676,7 @@ class NYTimes(BasicNewsRecipe):
|
||||
if hlines:
|
||||
for hline in hlines:
|
||||
hline.extract()
|
||||
|
||||
|
||||
#find all section headers
|
||||
hlines = runAround.findAll('h6')
|
||||
if hlines:
|
||||
|
46
resources/recipes/nytimes_tech.recipe
Normal file
@ -0,0 +1,46 @@
|
||||
#!/usr/bin/env python
|
||||
# encoding: utf-8
|
||||
|
||||
from __future__ import with_statement
|
||||
__license__ = 'GPL 3'
|
||||
__copyright__ = 'zotzo'
|
||||
__docformat__ = 'restructuredtext en'
|
||||
"""
|
||||
http://pogue.blogs.nytimes.com/
|
||||
"""
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
|
||||
class NYTimesTechnology(BasicNewsRecipe):
|
||||
title = 'New York Times Technology Beat'
|
||||
language = 'en'
|
||||
__author__ = 'David Pogue'
|
||||
description = 'The latest in technology from David Pogue'
|
||||
publisher = 'The New York Times'
|
||||
category = 'Technology'
|
||||
oldest_article = 14
|
||||
max_articles_per_feed = 25
|
||||
no_stylesheets = True
|
||||
language = 'en'
|
||||
cover_url ='http://bit.ly/g0SKJT'
|
||||
feeds = [
|
||||
(u'Pogues Posts', u'http://pogue.blogs.nytimes.com/feed/'),
|
||||
(u'Bits', u'http://bits.blogs.nytimes.com/feed/'),
|
||||
(u'Gadgetwise', u'http://gadgetwise.blogs.nytimes.com/feed/'),
|
||||
(u'Open', u'http://open.blogs.nytimes.com/feed/')
|
||||
]
|
||||
keep_only_tags = [dict(name='div', attrs={'id':'header'}),
|
||||
dict(name='h1'),
|
||||
dict(name='h2'),
|
||||
dict(name='div', attrs={'class':'entry-content'})]
|
||||
extra_css = '''
|
||||
h1{font-family:Arial,Helvetica,sans-serif;
|
||||
font-weight:bold;font-size:large;}
|
||||
|
||||
h2{font-family:Arial,Helvetica,sans-serif;
|
||||
font-weight:normal;font-size:small;}
|
||||
|
||||
p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
|
||||
body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
|
||||
'''
|
50
resources/recipes/osnews_pl.recipe
Normal file
@ -0,0 +1,50 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
'''
|
||||
OSNews.pl
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
import re
|
||||
|
||||
class OSNewsRecipe(BasicNewsRecipe):
|
||||
__author__ = u'Mori & Tomasz D\u0142ugosz'
|
||||
language = 'pl'
|
||||
|
||||
title = u'OSnews.pl'
|
||||
publisher = u'OSnews.pl'
|
||||
description = u'OSnews.pl jest spo\u0142eczno\u015bciowym serwisem informacyjnym po\u015bwi\u0119conym oprogramowaniu, systemom operacyjnym i \u015bwiatowi IT'
|
||||
|
||||
no_stylesheets = True
|
||||
remove_javascript = True
|
||||
encoding = 'utf-8'
|
||||
use_embedded_content = False
|
||||
|
||||
oldest_article = 7
|
||||
max_articles_per_feed = 100
|
||||
|
||||
extra_css = '''
|
||||
.news-heading {font-size:150%}
|
||||
.newsinformations li {display:inline;}
|
||||
blockquote {border:2px solid #000; padding:5px;}
|
||||
'''
|
||||
|
||||
feeds = [
|
||||
(u'OSNews.pl', u'http://feeds.feedburner.com/OSnewspl')
|
||||
]
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name = 'a', attrs = {'class' : 'news-heading'}),
|
||||
dict(name = 'div', attrs = {'class' : 'newsinformations'}),
|
||||
dict(name = 'div', attrs = {'id' : 'news-content'})
|
||||
]
|
||||
|
||||
remove_tags = [
|
||||
dict(name = 'div', attrs = {'class' : 'sociable'}),
|
||||
dict(name = 'div', attrs = {'class' : 'post_prev'}),
|
||||
dict(name = 'div', attrs = {'class' : 'post_next'}),
|
||||
dict(name = 'div', attrs = {'class' : 'clr'})
|
||||
]
|
||||
|
||||
preprocess_regexps = [(re.compile(u'</span>Komentarze: \(?[0-9]+\)? ?<span'), lambda match: '</span><span')]
|
21
resources/recipes/post_today.recipe
Normal file
@ -0,0 +1,21 @@
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class AdvancedUserRecipe1299061355(BasicNewsRecipe):
|
||||
title = u'Post Today'
|
||||
language = 'th'
|
||||
__author__ = "Chotechai P."
|
||||
oldest_article = 7
|
||||
max_articles_per_feed = 100
|
||||
cover_url = 'http://upload.wikimedia.org/wikipedia/th/2/2e/Posttoday_Logo.png'
|
||||
feeds = [(u'Breaking News', u'http://www.posttoday.com/rss/src/breakingnews.xml'), (u'\u0e02\u0e48\u0e32\u0e27', u'http://www.posttoday.com/rss/src/news.xml'), (u'\u0e27\u0e34\u0e40\u0e04\u0e23\u0e32\u0e30\u0e2b\u0e4c', u'http://www.posttoday.com/rss/src/analyse.xml'), (u'\u0e40\u0e21\u0e32\u0e17\u0e4c\u0e01\u0e31\u0e19\u0e43\u0e2b\u0e49 z', u'http://www.posttoday.com/rss/src/mouth.xml'), (u'\u0e44\u0e17\u0e22\u0e42\u0e0b\u0e44\u0e0b\u0e15\u0e35\u0e49', u'http://www.posttoday.com/rss/src/thaisociety.xml'), (u'\u0e44\u0e25\u0e1f\u0e4c\u0e2a\u0e44\u0e15\u0e25\u0e4c', u'http://www.posttoday.com/rss/src/lifestyle.xml'), (u'\u0e0a\u0e35\u0e49\u0e0a\u0e48\u0e2d\u0e07\u0e23\u0e27\u0e22', u'http://www.posttoday.com/rss/src/moneyguide.xml'), (u'\u0e1a\u0e49\u0e32\u0e19-\u0e04\u0e2d\u0e19\u0e42\u0e14', u'http://www.posttoday.com/rss/src/homecondo.xml'), (u'\u0e22\u0e32\u0e19\u0e22\u0e19\u0e15\u0e4c', u'http://www.posttoday.com/rss/src/motor.xml'), (u'\u0e14\u0e34\u0e08\u0e34\u0e15\u0e2d\u0e25\u0e44\u0e25\u0e1f\u0e4c', u'http://www.posttoday.com/rss/src/digitallife.xml'), (u'\u0e01\u0e35\u0e2c\u0e32', u'http://www.posttoday.com/rss/src/sport.xml'), (u'\u0e23\u0e2d\u0e1a\u0e42\u0e25\u0e01', u'http://www.posttoday.com/rss/src/world.xml'), (u'\u0e01\u0e34\u0e19-\u0e40\u0e17\u0e35\u0e48\u0e22\u0e27', u'http://www.posttoday.com/rss/src/eattravel.xml'), (u'Mind & Soul', u'http://www.posttoday.com/rss/src/mindsoul.xml'), (u'\u0e1a\u0e25\u0e47\u0e2d\u0e01 \u0e1a\u0e01.', u'http://www.posttoday.com/rss/src/blogs.xml')]
|
||||
keep_only_tags = []
|
||||
keep_only_tags.append(dict(name = 'div', attrs = {'class' :
|
||||
'articleContents'}))
|
||||
|
||||
remove_tags = []
|
||||
remove_tags.append(dict(name = 'label'))
|
||||
remove_tags.append(dict(name = 'span'))
|
||||
remove_tags.append(dict(name = 'div', attrs = {'class' :
|
||||
'socialBookmark'}))
|
||||
remove_tags.append(dict(name = 'div', attrs = {'class' :
|
||||
'misc'}))
|
49
resources/recipes/rbc_ru.recipe
Normal file
@ -0,0 +1,49 @@
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class AdvancedUserRecipe1286819935(BasicNewsRecipe):
|
||||
title = u'RBC.ru'
|
||||
__author__ = 'A. Chewi'
|
||||
oldest_article = 7
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
conversion_options = {'linearize_tables' : True}
|
||||
remove_attributes = ['style']
|
||||
language = 'ru'
|
||||
timefmt = ' [%a, %d %b, %Y]'
|
||||
|
||||
keep_only_tags = [dict(name='h2', attrs={}),
|
||||
dict(name='div', attrs={'class': 'box _ga1_on_'}),
|
||||
dict(name='h1', attrs={'class': 'news_section'}),
|
||||
dict(name='div', attrs={'class': 'news_body dotted_border_bottom'}),
|
||||
dict(name='table', attrs={'class': 'newsBody'}),
|
||||
dict(name='h2', attrs={'class': 'black'})]
|
||||
|
||||
feeds = [(u'Главные новости', u'http://static.feed.rbc.ru/rbc/internal/rss.rbc.ru/rbc.ru/mainnews.rss'),
|
||||
(u'Политика', u'http://static.feed.rbc.ru/rbc/internal/rss.rbc.ru/rbc.ru/politics.rss'),
|
||||
(u'Экономика', u'http://static.feed.rbc.ru/rbc/internal/rss.rbc.ru/rbc.ru/economics.rss'),
|
||||
(u'Общество', u'http://static.feed.rbc.ru/rbc/internal/rss.rbc.ru/rbc.ru/society.rss'),
|
||||
(u'Происшествия', u'http://static.feed.rbc.ru/rbc/internal/rss.rbc.ru/rbc.ru/incidents.rss'),
|
||||
(u'Финансовые новости Quote.rbc.ru', u'http://static.feed.rbc.ru/rbc/internal/rss.rbc.ru/quote.ru/mainnews.rss')]
|
||||
|
||||
|
||||
remove_tags = [dict(name='div', attrs={'class': "video-frame"}),
|
||||
dict(name='div', attrs={'class': "photo-container videoContainer videoSWFLinks videoPreviewSlideContainer notes"}),
|
||||
dict(name='div', attrs={'class': "notes"}),
|
||||
dict(name='div', attrs={'class': "publinks"}),
|
||||
dict(name='a', attrs={'class': "print"}),
|
||||
dict(name='div', attrs={'class': "photo-report_new notes newslider"}),
|
||||
dict(name='div', attrs={'class': "videoContainer"}),
|
||||
dict(name='div', attrs={'class': "videoPreviewSlideContainer"}),
|
||||
dict(name='a', attrs={'class': "videoPreviewContainer"}),
|
||||
dict(name='a', attrs={'class': "red"}),]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
for alink in soup.findAll('a'):
|
||||
if alink.string is not None:
|
||||
tstr = alink.string
|
||||
alink.replaceWith(tstr)
|
||||
return soup
|
||||
|
||||
def print_version(self, url):
|
||||
return url + '?print=true'
|
136
resources/recipes/roger_ebert_blog.recipe
Normal file
@ -0,0 +1,136 @@
|
||||
import re
|
||||
import urllib2
|
||||
import time
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
from calibre.ebooks.BeautifulSoup import BeautifulSoup, SoupStrainer
|
||||
from calibre import strftime
|
||||
|
||||
'''
|
||||
Help Needed:
|
||||
Still can't figure out why I'm getting strange characters, especially in the Great Movies descriptions in the TOC.
|
||||
Can anyone help me figure that out?
|
||||
|
||||
Change Log:
|
||||
2011-02-19: Version 2: Added "Oscars" section and fixed date problem
|
||||
'''
|
||||
|
||||
class Ebert(BasicNewsRecipe):
|
||||
title = 'Roger Ebert'
|
||||
__author__ = 'Shane Erstad'
|
||||
version = 2
|
||||
description = 'Roger Ebert Movie Reviews'
|
||||
publisher = 'Chicago Sun Times'
|
||||
category = 'movies'
|
||||
oldest_article = 8
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
encoding = 'UTF-8'
|
||||
masthead_url = 'http://rogerebert.suntimes.com/graphics/global/roger.jpg'
|
||||
language = 'en'
|
||||
remove_empty_feeds = False
|
||||
PREFIX = 'http://rogerebert.suntimes.com'
|
||||
patternReviews = r'<span class="*?movietitle"*?>(.*?)</span>.*?<div class="*?headline"*?>(.*?)</div>(.*?)</div>'
|
||||
patternCommentary = r'<div class="*?headline"*?>.*?(<a href="/apps/pbcs.dll/article\?AID=.*?COMMENTARY.*?" id="ltred">.*?</a>).*?<div class="blurb clear">(.*?)</div>'
|
||||
patternPeople = r'<div class="*?headline"*?>.*?(<a href="/apps/pbcs.dll/article\?AID=.*?PEOPLE.*?" id="ltred">.*?</a>).*?<div class="blurb clear">(.*?)</div>'
|
||||
patternOscars = r'<div class="*?headline"*?>.*?(<a href="/apps/pbcs.dll/article\?AID=.*?OSCARS.*?" id="ltred">.*?</a>).*?<div class="blurb clear">(.*?)</div>'
|
||||
patternGlossary = r'<div class="*?headline"*?>.*?(<a href="/apps/pbcs.dll/article\?AID=.*?GLOSSARY.*?" id="ltred">.*?</a>).*?<div class="blurb clear">(.*?)</div>'
|
||||
|
||||
|
||||
|
||||
conversion_options = {
|
||||
'comment' : description
|
||||
, 'tags' : category
|
||||
, 'publisher' : publisher
|
||||
, 'language' : language
|
||||
, 'linearize_tables' : True
|
||||
}
|
||||
|
||||
|
||||
feeds = [
|
||||
(u'Reviews' , u'http://rogerebert.suntimes.com/apps/pbcs.dll/section?category=reviews' )
|
||||
,(u'Commentary' , u'http://rogerebert.suntimes.com/apps/pbcs.dll/section?category=COMMENTARY')
|
||||
,(u'Great Movies' , u'http://rogerebert.suntimes.com/apps/pbcs.dll/section?category=REVIEWS08')
|
||||
,(u'People' , u'http://rogerebert.suntimes.com/apps/pbcs.dll/section?category=PEOPLE')
|
||||
,(u'Oscars' , u'http://rogerebert.suntimes.com/apps/pbcs.dll/section?category=OSCARS')
|
||||
,(u'Glossary' , u'http://rogerebert.suntimes.com/apps/pbcs.dll/section?category=GLOSSARY')
|
||||
|
||||
]
|
||||
|
||||
preprocess_regexps = [
|
||||
(re.compile(r'<font.*?>.*?This is a printer friendly.*?</font>.*?<hr>', re.DOTALL|re.IGNORECASE),
|
||||
lambda m: '')
|
||||
]
|
||||
|
||||
|
||||
|
||||
def print_version(self, url):
|
||||
return url + '&template=printart'
|
||||
|
||||
def parse_index(self):
|
||||
totalfeeds = []
|
||||
lfeeds = self.get_feeds()
|
||||
for feedobj in lfeeds:
|
||||
feedtitle, feedurl = feedobj
|
||||
self.log('\tFeedurl: ', feedurl)
|
||||
self.report_progress(0, _('Fetching feed')+' %s...'%(feedtitle if feedtitle else feedurl))
|
||||
articles = []
|
||||
page = urllib2.urlopen(feedurl).read()
|
||||
|
||||
if feedtitle == 'Reviews' or feedtitle == 'Great Movies':
|
||||
pattern = self.patternReviews
|
||||
elif feedtitle == 'Commentary':
|
||||
pattern = self.patternCommentary
|
||||
elif feedtitle == 'People':
|
||||
pattern = self.patternPeople
|
||||
elif feedtitle == 'Glossary':
|
||||
pattern = self.patternGlossary
|
||||
elif feedtitle == 'Oscars':
|
||||
pattern = self.patternOscars
|
||||
|
||||
|
||||
regex = re.compile(pattern, re.IGNORECASE|re.DOTALL)
|
||||
|
||||
for match in regex.finditer(page):
|
||||
if feedtitle == 'Reviews' or feedtitle == 'Great Movies':
|
||||
movietitle = match.group(1)
|
||||
thislink = match.group(2)
|
||||
description = match.group(3)
|
||||
elif feedtitle == 'Commentary' or feedtitle == 'People' or feedtitle == 'Glossary' or feedtitle == 'Oscars':
|
||||
thislink = match.group(1)
|
||||
description = match.group(2)
|
||||
|
||||
self.log(thislink)
|
||||
|
||||
for link in BeautifulSoup(thislink, parseOnlyThese=SoupStrainer('a')):
|
||||
thisurl = self.PREFIX + link['href']
|
||||
thislinktext = self.tag_to_string(link)
|
||||
|
||||
if feedtitle == 'Reviews' or feedtitle == 'Great Movies':
|
||||
thistitle = movietitle
|
||||
elif feedtitle == 'Commentary' or feedtitle == 'People' or feedtitle == 'Glossary' or feedtitle == 'Oscars':
|
||||
thistitle = thislinktext
|
||||
|
||||
if thistitle == '':
|
||||
continue
|
||||
|
||||
|
||||
pattern2 = r'AID=\/(.*?)\/'
|
||||
reg2 = re.compile(pattern2, re.IGNORECASE|re.DOTALL)
|
||||
match2 = reg2.search(thisurl)
|
||||
if match2:
|
||||
c = time.strptime(match2.group(1),"%Y%m%d")
|
||||
mydate=strftime("%A, %B %d, %Y", c)
|
||||
else:
|
||||
mydate = strftime("%A, %B %d, %Y")
|
||||
self.log(mydate)
|
||||
|
||||
articles.append({
|
||||
'title' :thistitle
|
||||
,'date' :' [' + mydate + ']'
|
||||
,'url' :thisurl
|
||||
,'description':description
|
||||
})
|
||||
totalfeeds.append((feedtitle, articles))
|
||||
|
||||
return totalfeeds
|
59
resources/recipes/romanialibera.recipe
Normal file
@ -0,0 +1,59 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
#!/usr/bin/env python
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = u'2011, Silviu Cotoar\u0103'
|
||||
'''
|
||||
romanialibera.ro
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class RomaniaLibera(BasicNewsRecipe):
|
||||
title = u'Rom\u00e2nia Liber\u0103'
|
||||
__author__ = u'Silviu Cotoar\u0103'
|
||||
description = u'Rom\u00e2nia Liber\u0103'
|
||||
publisher = u'Rom\u00e2nia Liber\u0103'
|
||||
oldest_article = 5
|
||||
language = 'ro'
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
category = 'Ziare,Stiri'
|
||||
encoding = 'utf-8'
|
||||
cover_url = 'http://www.romanialibera.ro/templates/lilac/images/sigla_1.gif'
|
||||
|
||||
conversion_options = {
|
||||
'comments' : description
|
||||
,'tags' : category
|
||||
,'language' : language
|
||||
,'publisher' : publisher
|
||||
}
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='div', attrs={'id':'articol'})
|
||||
]
|
||||
|
||||
remove_tags = [
|
||||
dict(name='div', attrs={'id':['art_actions']})
|
||||
, dict(name='div', attrs={'class':['stats']})
|
||||
, dict(name='div', attrs={'class':['data']})
|
||||
, dict(name='div', attrs={'class':['autori']})
|
||||
, dict(name='div', attrs={'class':['banda_explicatii_text']})
|
||||
, dict(name='td', attrs={'class':['connect_widget_vertical_center connect_widget_button_cell']})
|
||||
, dict(name='div', attrs={'class':['aceeasi_tema']})
|
||||
, dict(name='div', attrs={'class':['art_after_text']})
|
||||
, dict(name='div', attrs={'class':['navigare']})
|
||||
, dict(name='div', attrs={'id':['art_text_left']})
|
||||
]
|
||||
|
||||
remove_tags_after = [
|
||||
dict(name='div', attrs={'class':'art_after_text'})
|
||||
]
|
||||
|
||||
feeds = [
|
||||
(u'Feeds', u'http://www.romanialibera.ro/rss.xml')
|
||||
]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
@ -69,12 +69,16 @@ class SeattleTimes(BasicNewsRecipe):
|
||||
u'http://seattletimes.nwsource.com/rss/mostreadarticles.xml'),
|
||||
]
|
||||
|
||||
keep_only_tags = [dict(id='content')]
|
||||
remove_tags = [
|
||||
dict(name=['object','link','script'])
|
||||
,dict(name='p', attrs={'class':'permission'})
|
||||
dict(name=['object','link','script']),
|
||||
{'class':['permission', 'note', 'bottomtools',
|
||||
'homedelivery']},
|
||||
dict(id=["rightcolumn", 'footer', 'adbottom']),
|
||||
]
|
||||
|
||||
def print_version(self, url):
|
||||
return url
|
||||
start_url, sep, rest_url = url.rpartition('_')
|
||||
rurl, rsep, article_id = start_url.rpartition('/')
|
||||
return u'http://seattletimes.nwsource.com/cgi-bin/PrintStory.pl?document_id=' + article_id
|
||||
|
55
resources/recipes/sfin.recipe
Normal file
@ -0,0 +1,55 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
#!/usr/bin/env python
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = u'2011, Silviu Cotoar\u0103'
|
||||
'''
|
||||
sfin.ro
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class Sfin(BasicNewsRecipe):
|
||||
title = u'S\u0103pt\u0103m\u00e2na Financiar\u0103'
|
||||
__author__ = u'Silviu Cotoar\u0103'
|
||||
description = 'SFIN'
|
||||
publisher = 'SFIN'
|
||||
oldest_article = 5
|
||||
language = 'ro'
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
category = 'Ziare,Stiri,Economie,Business'
|
||||
encoding = 'utf-8'
|
||||
cover_url = 'http://img.9am.ro/images/logo_surse/saptamana_financiara.gif'
|
||||
|
||||
conversion_options = {
|
||||
'comments' : description
|
||||
,'tags' : category
|
||||
,'language' : language
|
||||
,'publisher' : publisher
|
||||
}
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='div', attrs={'class':'col2ContentLeft'})
|
||||
, dict(name='div', attrs={'id':'contentArticol'})
|
||||
]
|
||||
|
||||
remove_tags = [
|
||||
dict(name='div', attrs={'class':['infoArticol']})
|
||||
, dict(name='div', attrs={'class':['separator']})
|
||||
, dict(name='div', attrs={'class':['tags']})
|
||||
, dict(name='div', attrs={'id':['comments']})
|
||||
, dict(name='div', attrs={'class':'boxForm'})
|
||||
]
|
||||
|
||||
remove_tags_after = [
|
||||
dict(name='div', attrs={'class':'tags'})
|
||||
]
|
||||
|
||||
feeds = [
|
||||
(u'Feeds', u'http://www.sfin.ro/rss')
|
||||
]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
@ -10,12 +10,14 @@ class AdvancedUserRecipe1278049615(BasicNewsRecipe):
|
||||
|
||||
max_articles_per_feed = 100
|
||||
|
||||
feeds = [(u'News', u'http://www.statesman.com/section-rss.do?source=news&includeSubSections=true'),
|
||||
(u'Business', u'http://www.statesman.com/section-rss.do?source=business&includeSubSections=true'),
|
||||
(u'Life', u'http://www.statesman.com/section-rss.do?source=life&includesubsection=true'),
|
||||
(u'Editorial', u'http://www.statesman.com/section-rss.do?source=opinion&includesubsections=true'),
|
||||
(u'Sports', u'http://www.statesman.com/section-rss.do?source=sports&includeSubSections=true')
|
||||
]
|
||||
feeds = [(u'News',
|
||||
u'http://www.statesman.com/section-rss.do?source=news&includeSubSections=true'),
|
||||
(u'Local', u'http://www.statesman.com/section-rss.do?source=local&includeSubSections=true'),
|
||||
(u'Business', u'http://www.statesman.com/section-rss.do?source=business&includeSubSections=true'),
|
||||
(u'Life', u'http://www.statesman.com/section-rss.do?source=life&includesubsection=true'),
|
||||
(u'Editorial', u'http://www.statesman.com/section-rss.do?source=opinion&includesubsections=true'),
|
||||
(u'Sports', u'http://www.statesman.com/section-rss.do?source=sports&includeSubSections=true')
|
||||
]
|
||||
masthead_url = "http://www.statesman.com/images/cmg-logo.gif"
|
||||
#temp_files = []
|
||||
#articles_are_obfuscated = True
|
||||
@ -28,8 +30,11 @@ class AdvancedUserRecipe1278049615(BasicNewsRecipe):
|
||||
conversion_options = {'linearize_tables':True}
|
||||
remove_tags = [
|
||||
dict(name='div', attrs={'id':'cxArticleOptions'}),
|
||||
{'class':['perma', 'comments', 'trail', 'share-buttons',
|
||||
'toggle_show_on']},
|
||||
]
|
||||
keep_only_tags = [
|
||||
dict(name='div', attrs={'class':'cxArticleHeader'}),
|
||||
dict(name='div', attrs={'id':'cxArticleBodyText'}),
|
||||
dict(name='div', attrs={'class':'cxArticleHeader'}),
|
||||
dict(name='div', attrs={'id':['cxArticleBodyText',
|
||||
'content']}),
|
||||
]
|
||||
|
51
resources/recipes/superbebe.recipe
Normal file
@ -0,0 +1,51 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
#!/usr/bin/env python
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = u'2011, Silviu Cotoar\u0103'
|
||||
'''
|
||||
superbebe.ro
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class Superbebe(BasicNewsRecipe):
|
||||
title = u'Superbebe'
|
||||
__author__ = u'Silviu Cotoar\u0103'
|
||||
description = 'Superbebe'
|
||||
publisher = 'Superbebe'
|
||||
oldest_article = 5
|
||||
language = 'ro'
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
category = 'Ziare,Reviste,Bebe,Mamici'
|
||||
encoding = 'utf-8'
|
||||
cover_url = 'http://www.superbebe.ro/images/superbebe.gif'
|
||||
|
||||
conversion_options = {
|
||||
'comments' : description
|
||||
,'tags' : category
|
||||
,'language' : language
|
||||
,'publisher' : publisher
|
||||
}
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='div', attrs={'class':'articol'})
|
||||
]
|
||||
|
||||
remove_tags = [
|
||||
dict(name='div', attrs={'class':['info']})
|
||||
, dict(name='div', attrs={'class':['tags']})
|
||||
]
|
||||
|
||||
remove_tags_after = [
|
||||
dict(name='div', attrs={'class':['tags']})
|
||||
]
|
||||
|
||||
feeds = [
|
||||
(u'Feeds', u'http://www.superbebe.ro/rss')
|
||||
]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
25
resources/recipes/swiatkindle.recipe
Normal file
@ -0,0 +1,25 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = u'2011, Tomasz Dlugosz <tomek3d@gmail.com>'
|
||||
'''
|
||||
swiatczytnikow.pl
|
||||
'''
|
||||
|
||||
import re
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class swiatczytnikow(BasicNewsRecipe):
|
||||
title = u'Swiat Czytnikow'
|
||||
description = u'Czytniki e-książek w Polsce. Jak wybrać, kupić i korzystać z Amazon Kindle i innych'
|
||||
language = 'pl'
|
||||
__author__ = u'Tomasz D\u0142ugosz'
|
||||
oldest_article = 7
|
||||
max_articles_per_feed = 100
|
||||
|
||||
feeds = [(u'Świat Czytników - wpisy', u'http://swiatczytnikow.pl/feed')]
|
||||
|
||||
remove_tags = [dict(name = 'ul', attrs = {'class' : 'similar-posts'})]
|
||||
|
||||
preprocess_regexps = [(re.compile(u'<h3>Czytaj dalej:</h3>'), lambda match: '')]
|
||||
|
54
resources/recipes/tabu.recipe
Normal file
@ -0,0 +1,54 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
#!/usr/bin/env python
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = u'2011, Silviu Cotoar\u0103'
|
||||
'''
|
||||
tabu.ro
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class TabuRo(BasicNewsRecipe):
|
||||
title = u'Tabu'
|
||||
__author__ = u'Silviu Cotoar\u0103'
|
||||
description = 'Cel mai curajos site de femei'
|
||||
publisher = 'Tabu'
|
||||
oldest_article = 5
|
||||
language = 'ro'
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
category = 'Ziare,Reviste,Femei'
|
||||
encoding = 'utf-8'
|
||||
cover_url = 'http://www.tabu.ro/img/tabu-logo2.png'
|
||||
|
||||
conversion_options = {
|
||||
'comments' : description
|
||||
,'tags' : category
|
||||
,'language' : language
|
||||
,'publisher' : publisher
|
||||
}
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='div', attrs={'id':'Article'}),
|
||||
]
|
||||
|
||||
remove_tags = [
|
||||
dict(name='div', attrs={'id':['advertisementArticle']}),
|
||||
dict(name='div', attrs={'class':'voting_number'}),
|
||||
dict(name='div', attrs={'id':'number_votes'}),
|
||||
dict(name='div', attrs={'id':'rating_one'}),
|
||||
dict(name='div', attrs={'class':'float: right;'})
|
||||
]
|
||||
|
||||
remove_tags_after = [
|
||||
dict(name='div', attrs={'id':'comments'}),
|
||||
]
|
||||
|
||||
feeds = [
|
||||
(u'Feeds', u'http://www.tabu.ro/rss_all.xml')
|
||||
]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
17
resources/recipes/thai_post_daily.recipe
Normal file
@ -0,0 +1,17 @@
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class AdvancedUserRecipe1299054026(BasicNewsRecipe):
|
||||
title = u'Thai Post Daily'
|
||||
__author__ = 'Chotechai P.'
|
||||
oldest_article = 7
|
||||
max_articles_per_feed = 100
|
||||
cover_url = 'http://upload.wikimedia.org/wikipedia/th/1/10/ThaiPost_Logo.png'
|
||||
feeds = [(u'\u0e02\u0e48\u0e32\u0e27\u0e2b\u0e19\u0e49\u0e32\u0e2b\u0e19\u0e36\u0e48\u0e07', u'http://thaipost.net/taxonomy/term/1/all/feed'), (u'\u0e1a\u0e17\u0e1a\u0e23\u0e23\u0e13\u0e32\u0e18\u0e34\u0e01\u0e32\u0e23', u'http://thaipost.net/taxonomy/term/11/all/feed'), (u'\u0e40\u0e1b\u0e25\u0e27 \u0e2a\u0e35\u0e40\u0e07\u0e34\u0e19', u'http://thaipost.net/taxonomy/term/2/all/feed'), (u'\u0e2a\u0e20\u0e32\u0e1b\u0e23\u0e30\u0e0a\u0e32\u0e0a\u0e19', u'http://thaipost.net/taxonomy/term/3/all/feed'), (u'\u0e16\u0e39\u0e01\u0e17\u0e38\u0e01\u0e02\u0e49\u0e2d', u'http://thaipost.net/taxonomy/term/4/all/feed'), (u'\u0e01\u0e32\u0e23\u0e40\u0e21\u0e37\u0e2d\u0e07', u'http://thaipost.net/taxonomy/term/5/all/feed'), (u'\u0e17\u0e48\u0e32\u0e19\u0e02\u0e38\u0e19\u0e19\u0e49\u0e2d\u0e22', u'http://thaipost.net/taxonomy/term/12/all/feed'), (u'\u0e1a\u0e17\u0e04\u0e27\u0e32\u0e21\u0e1e\u0e34\u0e40\u0e28\u0e29', u'http://thaipost.net/taxonomy/term/66/all/feed'), (u'\u0e23\u0e32\u0e22\u0e07\u0e32\u0e19\u0e1e\u0e34\u0e40\u0e28\u0e29', u'http://thaipost.net/taxonomy/term/67/all/feed'), (u'\u0e1a\u0e31\u0e19\u0e17\u0e36\u0e01\u0e2b\u0e19\u0e49\u0e32 4', u'http://thaipost.net/taxonomy/term/13/all/feed'), (u'\u0e40\u0e2a\u0e35\u0e22\u0e1a\u0e0b\u0e36\u0e48\u0e07\u0e2b\u0e19\u0e49\u0e32', u'http://thaipost.net/taxonomy/term/64/all/feed'), (u'\u0e04\u0e31\u0e19\u0e1b\u0e32\u0e01\u0e2d\u0e22\u0e32\u0e01\u0e40\u0e25\u0e48\u0e32', u'http://thaipost.net/taxonomy/term/65/all/feed'), (u'\u0e40\u0e28\u0e23\u0e29\u0e10\u0e01\u0e34\u0e08', u'http://thaipost.net/taxonomy/term/6/all/feed'), (u'\u0e01\u0e23\u0e30\u0e08\u0e01\u0e44\u0e23\u0e49\u0e40\u0e07\u0e32', u'http://thaipost.net/taxonomy/term/14/all/feed'), (u'\u0e01\u0e23\u0e30\u0e08\u0e01\u0e2b\u0e31\u0e01\u0e21\u0e38\u0e21', u'http://thaipost.net/taxonomy/term/71/all/feed'), (u'\u0e04\u0e34\u0e14\u0e40\u0e2b\u0e19\u0e37\u0e2d\u0e01\u0e23\u0e30\u0e41\u0e2a', u'http://thaipost.net/taxonomy/term/69/all/feed'), 
(u'\u0e23\u0e32\u0e22\u0e07\u0e32\u0e19', u'http://thaipost.net/taxonomy/term/68/all/feed'), (u'\u0e2d\u0e34\u0e42\u0e04\u0e42\u0e1f\u0e01\u0e31\u0e2a', u'http://thaipost.net/taxonomy/term/10/all/feed'), (u'\u0e01\u0e32\u0e23\u0e28\u0e36\u0e01\u0e29\u0e32-\u0e2a\u0e32\u0e18\u0e32\u0e23\u0e13\u0e2a\u0e38\u0e02', u'http://thaipost.net/taxonomy/term/7/all/feed'), (u'\u0e15\u0e48\u0e32\u0e07\u0e1b\u0e23\u0e30\u0e40\u0e17\u0e28', u'http://thaipost.net/taxonomy/term/8/all/feed'), (u'\u0e01\u0e35\u0e2c\u0e32', u'http://thaipost.net/taxonomy/term/9/all/feed')]
|
||||
|
||||
def print_version(self, url):
|
||||
return 'http://www.thaipost.net/print/' + url[32:]
|
||||
|
||||
remove_tags = []
|
||||
remove_tags.append(dict(name = 'div', attrs = {'class' : 'print-logo'}))
|
||||
remove_tags.append(dict(name = 'div', attrs = {'class' : 'print-site_name'}))
|
||||
remove_tags.append(dict(name = 'div', attrs = {'class' : 'print-breadcrumb'}))
|
56
resources/recipes/unica.recipe
Normal file
@ -0,0 +1,56 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
#!/usr/bin/env python
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = u'2011, Silviu Cotoar\u0103'
|
||||
'''
|
||||
unica.ro
|
||||
'''
|
||||
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
|
||||
class Unica(BasicNewsRecipe):
|
||||
title = u'Unica'
|
||||
__author__ = u'Silviu Cotoar\u0103'
|
||||
description = 'Asa cum esti tu'
|
||||
publisher = 'Unica'
|
||||
oldest_article = 5
|
||||
language = 'ro'
|
||||
max_articles_per_feed = 100
|
||||
no_stylesheets = True
|
||||
use_embedded_content = False
|
||||
category = 'Ziare,Reviste,Femei'
|
||||
encoding = 'utf-8'
|
||||
cover_url = 'http://www.unica.ro/fileadmin/images/logo.gif'
|
||||
|
||||
conversion_options = {
|
||||
'comments' : description
|
||||
,'tags' : category
|
||||
,'language' : language
|
||||
,'publisher' : publisher
|
||||
}
|
||||
|
||||
keep_only_tags = [
|
||||
dict(name='div', attrs={'id':'sticky'})
|
||||
, dict(name='p', attrs={'class':'bodytext'})
|
||||
|
||||
]
|
||||
|
||||
remove_tags = [
|
||||
dict(name='div', attrs={'class':['top-links']})
|
||||
, dict(name='div', attrs={'id':['autor_name']})
|
||||
, dict(name='div', attrs={'class':['box-r']})
|
||||
, dict(name='div', attrs={'class':['category']})
|
||||
, dict(name='div', attrs={'class':['data']})
|
||||
]
|
||||
|
||||
remove_tags_after = [
|
||||
dict(name='ul', attrs={'class':'pager'})
|
||||
]
|
||||
|
||||
feeds = [
|
||||
(u'Feeds', u'http://www.unica.ro/rss.html')
|
||||
]
|
||||
|
||||
def preprocess_html(self, soup):
|
||||
return self.adeify_images(soup)
|
@ -2,9 +2,9 @@
|
||||
# -*- coding: utf-8 mode: python -*-
|
||||
|
||||
__license__ = 'GPL v3'
|
||||
__copyright__ = '2010, Steffen Siebert <calibre at steffensiebert.de>'
|
||||
__copyright__ = '2010-2011, Steffen Siebert <calibre at steffensiebert.de>'
|
||||
__docformat__ = 'restructuredtext de'
|
||||
__version__ = '1.1'
|
||||
__version__ = '1.2'
|
||||
|
||||
"""
|
||||
Die Zeit EPUB
|
||||
@ -13,21 +13,43 @@ Die Zeit EPUB
|
||||
import os, urllib2, zipfile, re
|
||||
from calibre.web.feeds.news import BasicNewsRecipe
|
||||
from calibre.ptempfile import PersistentTemporaryFile
|
||||
from calibre import walk
|
||||
|
||||
class ZeitEPUBAbo(BasicNewsRecipe):
|
||||
|
||||
title = u'Zeit Online Premium'
|
||||
title = u'Die Zeit'
|
||||
description = u'Das EPUB Abo der Zeit (needs subscription)'
|
||||
language = 'de'
|
||||
lang = 'de-DE'
|
||||
|
||||
__author__ = 'Steffen Siebert'
|
||||
__author__ = 'Steffen Siebert and Tobias Isenberg'
|
||||
needs_subscription = True
|
||||
|
||||
conversion_options = {
|
||||
'no_default_epub_cover' : True
|
||||
'no_default_epub_cover' : True,
|
||||
# fixing the wrong left margin
|
||||
'mobi_ignore_margins' : True,
|
||||
}
|
||||
|
||||
preprocess_regexps = [
|
||||
# filtering for correct dashes
|
||||
(re.compile(r' - '), lambda match: ' – '), # regular "Gedankenstrich"
|
||||
(re.compile(r' -,'), lambda match: ' –,'), # "Gedankenstrich" before a comma
|
||||
(re.compile(r'(?<=\d)-(?=\d)'), lambda match: '–'), # number-number
|
||||
# filtering for unicode characters that are missing on the Kindle,
|
||||
# try to replace them with meaningful work-arounds
|
||||
(re.compile(u'\u2080'), lambda match: '<span style="font-size: 50%;">0</span>'), # subscript-0
|
||||
(re.compile(u'\u2081'), lambda match: '<span style="font-size: 50%;">1</span>'), # subscript-1
|
||||
(re.compile(u'\u2082'), lambda match: '<span style="font-size: 50%;">2</span>'), # subscript-2
|
||||
(re.compile(u'\u2083'), lambda match: '<span style="font-size: 50%;">3</span>'), # subscript-3
|
||||
(re.compile(u'\u2084'), lambda match: '<span style="font-size: 50%;">4</span>'), # subscript-4
|
||||
(re.compile(u'\u2085'), lambda match: '<span style="font-size: 50%;">5</span>'), # subscript-5
|
||||
(re.compile(u'\u2086'), lambda match: '<span style="font-size: 50%;">6</span>'), # subscript-6
|
||||
(re.compile(u'\u2087'), lambda match: '<span style="font-size: 50%;">7</span>'), # subscript-7
|
||||
(re.compile(u'\u2088'), lambda match: '<span style="font-size: 50%;">8</span>'), # subscript-8
|
||||
(re.compile(u'\u2089'), lambda match: '<span style="font-size: 50%;">9</span>'), # subscript-9
|
||||
]
|
||||
|
||||
def build_index(self):
|
||||
domain = "http://premium.zeit.de"
|
||||
url = domain + "/abovorteile/cgi-bin/_er_member/p4z.fpl?ER_Do=getUserData&ER_NextTemplate=login_ok"
|
||||
@ -55,9 +77,36 @@ class ZeitEPUBAbo(BasicNewsRecipe):
|
||||
zfile.extractall(self.output_dir)
|
||||
|
||||
tmp.close()
|
||||
|
||||
index = os.path.join(self.output_dir, 'content.opf')
|
||||
|
||||
self.report_progress(1,_('epub downloaded and extracted'))
|
||||
|
||||
# doing regular expression filtering
|
||||
for path in walk('.'):
|
||||
(shortname, extension) = os.path.splitext(path)
|
||||
if extension.lower() in ('.html', '.htm', '.xhtml'):
|
||||
with open(path, 'r+b') as f:
|
||||
raw = f.read()
|
||||
raw = raw.decode('utf-8')
|
||||
for pat, func in self.preprocess_regexps:
|
||||
raw = pat.sub(func, raw)
|
||||
f.seek(0)
|
||||
f.truncate()
|
||||
f.write(raw.encode('utf-8'))
|
||||
|
||||
# adding real cover
|
||||
self.report_progress(0,_('trying to download cover image (titlepage)'))
|
||||
self.download_cover()
|
||||
self.conversion_options["cover"] = self.cover_path
|
||||
|
||||
return index
|
||||
|
||||
# getting url of the cover
|
||||
def get_cover_url(self):
|
||||
try:
|
||||
inhalt = self.index_to_soup('http://www.zeit.de/inhalt')
|
||||
cover_url = inhalt.find('div', attrs={'class':'singlearchive clearfix'}).img['src'].replace('icon_','')
|
||||
except Exception:
|
||||
cover_url = 'http://images.zeit.de/bilder/titelseiten_zeit/1946/001_001.jpg'
|
||||
return cover_url
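The regular expression filtering added in this change can be tried outside calibre. The following is a minimal sketch, not the recipe's code: RULES, apply_rules, rewrite_in_place and demo are hypothetical names, the ten subscript patterns are folded into one character class as an assumed equivalent, and it targets Python 3 while the recipe itself is Python 2.

```python
# -*- coding: utf-8 -*-
import os
import re
import tempfile

# Markup used as a stand-in for subscript digits missing from the device font.
SUBSCRIPT = '<span style="font-size: 50%;">{}</span>'

# Same shape as preprocess_regexps: (compiled pattern, replacement function).
# RULES is a hypothetical name used only in this sketch.
RULES = [
    (re.compile(r' - '), lambda m: ' \u2013 '),          # spaced hyphen -> en dash
    (re.compile(r' -,'), lambda m: ' \u2013,'),          # dash before a comma
    (re.compile(r'(?<=\d)-(?=\d)'), lambda m: '\u2013'),  # number-number
    # One character class (U+2080..U+2089) instead of ten separate patterns.
    (re.compile('[\u2080-\u2089]'),
     lambda m: SUBSCRIPT.format(ord(m.group()) - 0x2080)),
]


def apply_rules(text, rules=RULES):
    """Apply every (pattern, function) pair in order to a unicode string."""
    for pat, func in rules:
        text = pat.sub(func, text)
    return text


def rewrite_in_place(path, rules=RULES):
    """Read a UTF-8 file, filter it, and overwrite it through the same handle."""
    with open(path, 'r+b') as f:
        raw = f.read().decode('utf-8')
        raw = apply_rules(raw, rules)
        f.seek(0)      # go back to the start before writing
        f.truncate()   # drop the old content so no stale bytes remain
        f.write(raw.encode('utf-8'))


def demo():
    """Write a small sample file, filter it in place, and return the result."""
    fd, path = tempfile.mkstemp(suffix='.html')
    os.close(fd)
    try:
        with open(path, 'wb') as f:
            f.write('<p>CO\u2082 - 1990-2011</p>'.encode('utf-8'))
        rewrite_in_place(path)
        with open(path, 'rb') as f:
            return f.read().decode('utf-8')
    finally:
        os.remove(path)
```

Calling demo() runs the whole round trip on a temporary file. The dash rules run before the subscript rule, so the hyphen in the inserted font-size attribute is never rewritten.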
|
||||
|