Merge from trunk

Charles Haley 2013-03-31 10:30:57 +02:00
commit ccd8d30109
245 changed files with 76406 additions and 46125 deletions

View File

@ -38,6 +38,8 @@ calibre_plugins/
recipes/.git
recipes/.gitignore
recipes/README.md
recipes/icon_checker.py
recipes/readme_updater.py
recipes/katalog_egazeciarz.recipe
recipes/tv_axnscifi.recipe
recipes/tv_comedycentral.recipe
@ -60,6 +62,7 @@ recipes/tv_tvpkultura.recipe
recipes/tv_tvppolonia.recipe
recipes/tv_tvpuls.recipe
recipes/tv_viasathistory.recipe
recipes/icons/katalog_egazeciarz.png
recipes/icons/tv_axnscifi.png
recipes/icons/tv_comedycentral.png
recipes/icons/tv_discoveryscience.png

View File

@ -1,3 +1,4 @@
# vim:fileencoding=UTF-8:ts=2:sw=2:sta:et:sts=2:ai
# Each release can have new features and bug fixes. Each of which
# must have a title and can optionally have linked tickets and a description.
# In addition they can have a type field which defaults to minor, but should be major
@ -19,6 +20,105 @@
# new recipes:
# - title:
- version: 0.9.25
date: 2013-03-29
new features:
- title: "Automatic adding: When checking for duplicates is enabled, use the same duplicates found dialog as is used during manual adding."
tickets: [1160914]
- title: "ToC Editor: Allow searching to find a location quickly when browsing through the book to select a location for a ToC item"
- title: "ToC Editor: Add a button to quickly flatten the entire table of contents"
- title: "Conversion: When converting a single book to EPUB or AZW3, add an option to automatically launch the Table of Contents editor after the conversion completes. Found under the Table of Contents section of the conversion dialog."
bug fixes:
- title: "calibredb: Nicer error messages when user provides invalid input"
tickets: [1160452,1160631]
- title: "News download: Always use the .jpg extension for jpeg images as apparently Moon+ Reader cannot handle .jpeg"
- title: "Fix Book Details popup keyboard navigation doesn't work on a Mac"
tickets: [1159610]
- title: "Fix a regression that caused the case of the book files to not be changed when changing the case of the title/author on case insensitive filesystems"
improved recipes:
- RTE news
- Various Polish news sources
- Psychology Today
- Foreign Affairs
- History Today
- Harpers Magazine (printed edition)
- Business Week Magazine
- The Hindu
- Irish Times
- Le Devoir
new recipes:
- title: Fortune Magazine
author: Rick Shang
- title: Eclipse Online
author: Jim DeVona
- version: 0.9.24
date: 2013-03-22
new features:
- title: "ToC Editor: Allow auto-generation of Table of Contents entries from headings and/or links in the book"
- title: "EPUB/MOBI Catalogs: Allow saving used settings as presets which can be loaded easily later."
tickets: [1155587]
- title: "Indicate which columns are custom columns when selecting columns in the Preferences"
tickets: [1158066]
- title: "News download: Add an option recipe authors can set to have calibre automatically reduce the size of downloaded images by lowering their quality"
bug fixes:
- title: "News download: Fix a regression in 0.9.23 that prevented oldest_article from working with some RSS feeds."
- title: "Conversion: handle the :before and :after pseudo CSS selectors correctly"
- title: "AZW3 Output: Handle the case of the <guide> reference to a ToC containing an anchor correctly."
tickets: [1158413]
- title: "BiBTeX catalogs: Fix ISBN not being output and the library_name field causing catalog generation to fail"
tickets: [1156432, 1158127]
- title: "Conversion: Add support for CSS stylesheets that wrap their rules inside a @media rule."
tickets: [1157345]
- title: "Cover browser: Fix scrolling not working for books after the 32678'th book in a large library."
tickets: [1153204]
- title: "Linux: Update bundled libmtp version"
- title: "Clear the Book details panel when the current search returns no matches."
tickets: [1153026]
- title: "Fix a regression that broke creation of advanced column coloring rules"
tickets: [1156291]
- title: "Amazon metadata download: Handle cover images loaded via javascript on the amazon.de site"
- title: "Nicer error message when exporting a generated csv catalog to a file open in another program on windows."
tickets: [1155539]
- title: "Fix ebook-convert -h showing ANSI escape codes in the windows command prompt"
tickets: [1158499]
improved recipes:
- Various Polish news sources
- kath.net
- Il Giornale
- Kellog Insight
new recipes:
- title:
- version: 0.9.23
date: 2013-03-15

View File

@ -434,6 +434,18 @@ a number of older formats either do not support a metadata based Table of Conten
documents do not have one. In these cases, the options in this section can help you automatically
generate a Table of Contents in the converted ebook, based on the actual content in the input document.
.. note:: Using these options can be a little challenging to get exactly right.
If you prefer creating/editing the Table of Contents by hand, convert to
the EPUB or AZW3 formats and select the checkbox at the bottom of the
screen that says
:guilabel:`Manually fine-tune the Table of Contents after conversion`.
This will launch the ToC Editor tool after the conversion. It allows you to
create entries in the Table of Contents by simply clicking the place in the
book where you want the entry to point. You can also use the ToC Editor by
itself, without doing a conversion. Go to :guilabel:`Preferences->Toolbars`
and add the ToC Editor to the main toolbar. Then just select the book you
want to edit and click the ToC Editor button.
The first option is :guilabel:`Force use of auto-generated Table of Contents`. By checking this option
you can have |app| override any Table of Contents found in the metadata of the input document with the
auto generated one.
@ -456,7 +468,7 @@ For example, to remove all entries titles "Next" or "Previous" use::
Next|Previous
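
As a rough illustration of what such a filter expression does, the following Python sketch drops every entry whose title matches it. The helper name ``filter_toc_titles`` and the sample titles are invented for this example, and the choice to match the whole title case-insensitively is an assumption, not something the text above specifies.

```python
import re


def filter_toc_titles(titles, pattern):
    """Return the titles that do NOT match pattern (hypothetical helper)."""
    # Assumption: the expression must match the whole title, ignoring case.
    rx = re.compile(pattern, re.IGNORECASE)
    return [t for t in titles if rx.fullmatch(t) is None]


if __name__ == '__main__':
    sample = ['Chapter 1', 'Next', 'Chapter 2', 'Previous']
    print(filter_toc_titles(sample, r'Next|Previous'))
```

Under these assumptions a title such as "Next steps" is kept, because only titles that consist entirely of "Next" or "Previous" are removed.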
Finally, the :guilabel:`Level 1,2,3 TOC` options allow you to create a sophisticated multi-level Table of Contents.
The :guilabel:`Level 1,2,3 TOC` options allow you to create a sophisticated multi-level Table of Contents.
They are XPath expressions that match tags in the intermediate XHTML produced by the conversion pipeline. See the
:ref:`conversion-introduction` for how to get access to this XHTML. Also read the :ref:`xpath-tutorial`, to learn
how to construct XPath expressions. Next to each option is a button that launches a wizard to help with the creation

View File

@ -87,7 +87,9 @@ this bug.
How do I convert a collection of HTML files in a specific order?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In order to convert a collection of HTML files in a specific order, you have to create a table of contents file. That is, another HTML file that contains links to all the other files in the desired order. Such a file looks like::
In order to convert a collection of HTML files in a specific order, you have to
create a table of contents file. That is, another HTML file that contains links
to all the other files in the desired order. Such a file looks like::
<html>
<body>
@ -102,19 +104,36 @@ In order to convert a collection of HTML files in a specific oder, you have to c
</body>
</html>
Then just add this HTML file to the GUI and use the convert button to create your ebook.
Then, just add this HTML file to the GUI and use the convert button to create
your ebook. You can use the option in the Table of Contents section in the
conversion dialog to control how the Table of Contents is generated.
.. note:: By default, when adding HTML files, |app| follows links in the files in *depth first* order. This means that if file A.html links to B.html and C.html and D.html, but B.html also links to D.html, then the files will be in the order A.html, B.html, D.html, C.html. If instead you want the order to be A.html, B.html, C.html, D.html then you must tell |app| to add your files in *breadth first* order. Do this by going to Preferences->Plugins and customizing the HTML to ZIP plugin.
.. note:: By default, when adding HTML files, |app| follows links in the files
in *depth first* order. This means that if file A.html links to B.html and
C.html and D.html, but B.html also links to D.html, then the files will be
in the order A.html, B.html, D.html, C.html. If instead you want the order
to be A.html, B.html, C.html, D.html then you must tell |app| to add your
files in *breadth first* order. Do this by going to Preferences->Plugins
and customizing the HTML to ZIP plugin.
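
The difference between the two orders in the note above can be reproduced with a small, self-contained Python sketch. It is not the code the HTML to ZIP plugin runs; the ``LINKS`` table and both function names are made up to mirror the A.html to D.html example.

```python
from collections import deque

# Hypothetical link graph from the note:
# A links to B, C and D; B also links to D.
LINKS = {
    'A.html': ['B.html', 'C.html', 'D.html'],
    'B.html': ['D.html'],
    'C.html': [],
    'D.html': [],
}


def depth_first(start, links):
    """Follow each link as deep as possible before moving to the next one."""
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        order.append(name)
        for child in links.get(name, []):
            visit(child)

    visit(start)
    return order


def breadth_first(start, links):
    """Add all links of one file before following any of them further."""
    order, seen = [start], {start}
    queue = deque([start])
    while queue:
        name = queue.popleft()
        for child in links.get(name, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


if __name__ == '__main__':
    print(depth_first('A.html', LINKS))
    print(breadth_first('A.html', LINKS))
```

In the depth first walk, D.html is reached through B.html before C.html is visited, which is why it moves ahead of C.html in the result.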
The EPUB I produced with |app| is not valid?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|app| does not guarantee that an EPUB produced by it is valid. The only guarantee it makes is that if you feed it valid XHTML 1.1 + CSS 2.1 it will output a valid EPUB. |app| is designed for ebook consumers, not producers. It tries hard to ensure that EPUBs it produces actually work as intended on a wide variety of devices, a goal that is incompatible with producing valid EPUBs, and one that is far more important to the vast majority of its users. If you need a tool that always produces valid EPUBs, |app| is not for you.
|app| does not guarantee that an EPUB produced by it is valid. The only
guarantee it makes is that if you feed it valid XHTML 1.1 + CSS 2.1 it will
output a valid EPUB. |app| is designed for ebook consumers, not producers. It
tries hard to ensure that EPUBs it produces actually work as intended on a wide
variety of devices, a goal that is incompatible with producing valid EPUBs, and
one that is far more important to the vast majority of its users. If you need a
tool that always produces valid EPUBs, |app| is not for you.
How do I use some of the advanced features of the conversion tools?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can get help on any individual feature of the converters by mousing over it in the GUI or running ``ebook-convert dummy.html .epub -h`` at a terminal. A good place to start is to look at the following demo files that demonstrate some of the advanced features:
* `html-demo.zip <http://calibre-ebook.com/downloads/html-demo.zip>`_
You can get help on any individual feature of the converters by mousing over
it in the GUI or running ``ebook-convert dummy.html .epub -h`` at a terminal.
A good place to start is to look at the following demo file that demonstrates
some of the advanced features:
`html-demo.zip <http://calibre-ebook.com/downloads/html-demo.zip>`_
Device Integration
@ -126,11 +145,11 @@ Device Integration
What devices does |app| support?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|app| can directly connect to all the major (and most of the minor) ebook reading devices,
smartphones, tablets, etc.
In addition, using the :guilabel:`Connect to folder` function you can use it with any ebook reader that exports itself as a USB disk.
You can even connect to Apple devices (via iTunes), using the :guilabel:`Connect to iTunes`
function.
|app| can directly connect to all the major (and most of the minor) ebook
reading devices, smartphones, tablets, etc. In addition, using the
:guilabel:`Connect to folder` function you can use it with any ebook reader
that exports itself as a USB disk. You can even connect to Apple devices (via
iTunes), using the :guilabel:`Connect to iTunes` function.
.. _devsupport:

View File

@ -10,46 +10,35 @@ class Adventure_zone(BasicNewsRecipe):
oldest_article = 20
max_articles_per_feed = 100
cover_url = 'http://www.adventure-zone.info/inne/logoaz_2012.png'
index='http://www.adventure-zone.info/fusion/'
index = 'http://www.adventure-zone.info/fusion/'
use_embedded_content = False
preprocess_regexps = [(re.compile(r"<td class='capmain'>Komentarze</td>", re.IGNORECASE), lambda m: ''),
(re.compile(r'</?table.*?>'), lambda match: ''),
(re.compile(r'</?tbody.*?>'), lambda match: '')]
remove_tags_before= dict(name='td', attrs={'class':'main-bg'})
remove_tags= [dict(name='img', attrs={'alt':'Drukuj'})]
remove_tags_after= dict(id='comments')
extra_css = '.main-bg{text-align: left;} td.capmain{ font-size: 22px; }'
remove_tags_before = dict(name='td', attrs={'class':'main-bg'})
remove_tags = [dict(name='img', attrs={'alt':'Drukuj'})]
remove_tags_after = dict(id='comments')
extra_css = '.main-bg{text-align: left;} td.capmain{ font-size: 22px; } img.news-category {float: left; margin-right: 5px;}'
feeds = [(u'Nowinki', u'http://www.adventure-zone.info/fusion/feeds/news.php')]
'''def parse_feeds (self):
feeds = BasicNewsRecipe.parse_feeds(self)
soup=self.index_to_soup(u'http://www.adventure-zone.info/fusion/feeds/news.php')
tag=soup.find(name='channel')
titles=[]
for r in tag.findAll(name='image'):
r.extract()
art=tag.findAll(name='item')
for i in art:
titles.append(i.title.string)
for feed in feeds:
for article in feed.articles[:]:
article.title=titles[feed.articles.index(article)]
return feeds'''
'''def get_cover_url(self):
soup = self.index_to_soup('http://www.adventure-zone.info/fusion/news.php')
cover=soup.find(id='box_OstatninumerAZ')
self.cover_url='http://www.adventure-zone.info/fusion/'+ cover.center.a.img['src']
return getattr(self, 'cover_url', self.cover_url)'''
def populate_article_metadata(self, article, soup, first):
result = re.search('(.+) - Adventure Zone', soup.title.string)
if result:
article.title = result.group(1)
result = result.group(1)
else:
result = soup.body.find('strong')
if result:
article.title = result.string
result = result.string
if result:
result = result.replace('&amp;', '&')
result = result.replace('&#39;', '')
article.title = result
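
The title handling in populate_article_metadata can be tried outside calibre with a sketch like the one below. It assumes the page title is already available as a plain string; the function name `clean_title` and its `fallback` parameter are hypothetical and stand in for the `soup.title.string` and `<strong>` lookups of the recipe.

```python
import re


def clean_title(page_title, fallback=None):
    """Strip the ' - Adventure Zone' suffix and undo two HTML entities."""
    match = re.search(r'(.+) - Adventure Zone', page_title or '')
    # Fall back to an alternative string (the recipe uses the first <strong>).
    title = match.group(1) if match else fallback
    if title:
        title = title.replace('&amp;', '&')
        # As in the recipe, an escaped apostrophe is dropped, not restored.
        title = title.replace('&#39;', '')
    return title


if __name__ == '__main__':
    print(clean_title('Sam &amp; Max - Adventure Zone'))
```

If neither the suffix pattern nor the fallback yields a string, the sketch returns None, which corresponds to the recipe leaving article.title untouched.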
def skip_ad_pages(self, soup):
skip_tag = soup.body.find(name='td', attrs={'class':'main-bg'})
@ -77,5 +66,4 @@ class Adventure_zone(BasicNewsRecipe):
if a.has_key('href') and 'http://' not in a['href'] and 'https://' not in a['href']:
a['href']=self.index + a['href']
return soup

View File

@ -0,0 +1,54 @@
from __future__ import unicode_literals
__license__ = 'WTFPL'
__author__ = '2013, François D. <franek at chicour.net>'
__description__ = 'Get some fresh news from Arrêt sur images'
from calibre.web.feeds.recipes import BasicNewsRecipe
class Asi(BasicNewsRecipe):
title = 'Arrêt sur images'
__author__ = 'François D. (aka franek)'
description = 'Global news in french from news site "Arrêt sur images"'
oldest_article = 7.0
language = 'fr'
needs_subscription = True
max_articles_per_feed = 100
simultaneous_downloads = 1
timefmt = '[%a, %d %b %Y %I:%M +0200]'
cover_url = 'http://www.arretsurimages.net/images/header/menu/menu_1.png'
use_embedded_content = False
no_stylesheets = True
remove_javascript = True
feeds = [
('vite dit et gratuit', 'http://www.arretsurimages.net/vite-dit.rss'),
('Toutes les chroniques', 'http://www.arretsurimages.net/chroniques.rss'),
('Contenus et dossiers', 'http://www.arretsurimages.net/dossiers.rss'),
]
conversion_options = { 'smarten_punctuation' : True }
remove_tags = [dict(id='vite-titre'), dict(id='header'), dict(id='wrap-connexion'), dict(id='col_right'), dict(name='div', attrs={'class':'bloc-chroniqueur-2'}), dict(id='footercontainer')]
def print_version(self, url):
return url.replace('contenu.php', 'contenu-imprimable.php')
def get_browser(self):
# Need to use robust HTML parser
br = BasicNewsRecipe.get_browser(self, use_robust_parser=True)
if self.username is not None and self.password is not None:
br.open('http://www.arretsurimages.net/index.php')
br.select_form(nr=0)
br.form.set_all_readonly(False)
br['redir'] = 'forum/login.php'
br['username'] = self.username
br['password'] = self.password
br.submit()
return br

View File

@ -2,12 +2,12 @@
from calibre.web.feeds.news import BasicNewsRecipe
class Astroflesz(BasicNewsRecipe):
title = u'Astroflesz'
title = u'Astroflesz'
oldest_article = 7
__author__ = 'fenuks'
description = u'astroflesz.pl - to portal poświęcony astronomii. Informuje zarówno o aktualnych wydarzeniach i odkryciach naukowych, jak również zapowiada ciekawe zjawiska astronomiczne'
category = 'astronomy'
language = 'pl'
__author__ = 'fenuks'
description = u'astroflesz.pl - to portal poświęcony astronomii. Informuje zarówno o aktualnych wydarzeniach i odkryciach naukowych, jak również zapowiada ciekawe zjawiska astronomiczne'
category = 'astronomy'
language = 'pl'
cover_url = 'http://www.astroflesz.pl/templates/astroflesz/images/logo/logo.png'
ignore_duplicate_articles = {'title', 'url'}
max_articles_per_feed = 100
@ -17,4 +17,11 @@ class Astroflesz(BasicNewsRecipe):
keep_only_tags = [dict(id="k2Container")]
remove_tags_after = dict(name='div', attrs={'class':'itemLinks'})
remove_tags = [dict(name='div', attrs={'class':['itemLinks', 'itemToolbar', 'itemRatingBlock']})]
feeds = [(u'Wszystkie', u'http://astroflesz.pl/?format=feed')]
feeds = [(u'Wszystkie', u'http://astroflesz.pl/?format=feed')]
def postprocess_html(self, soup, first_fetch):
t = soup.find(attrs={'class':'itemIntroText'})
if t:
for i in t.findAll('img'):
i['style'] = 'float: left; margin-right: 5px;'
return soup

View File

@ -1,17 +1,20 @@
from calibre.web.feeds.news import BasicNewsRecipe
import re
class BadaniaNet(BasicNewsRecipe):
title = u'badania.net'
title = u'badania.net'
__author__ = 'fenuks'
description = u'chcesz wiedzieć więcej?'
category = 'science'
language = 'pl'
description = u'chcesz wiedzieć więcej?'
category = 'science'
language = 'pl'
cover_url = 'http://badania.net/wp-content/badanianet_green_transparent.png'
extra_css = '.alignleft {float:left; margin-right:5px;} .alignright {float:right; margin-left:5px;}'
oldest_article = 7
max_articles_per_feed = 100
no_stylesheets = True
preprocess_regexps = [(re.compile(r"<h4>Tekst sponsoruje</h4>", re.IGNORECASE), lambda m: ''),]
remove_empty_feeds = True
use_embedded_content = False
remove_tags = [dict(attrs={'class':['omc-flex-category', 'omc-comment-count', 'omc-single-tags']})]
remove_tags_after = dict(attrs={'class':'omc-single-tags'})
keep_only_tags = [dict(id='omc-full-article')]
feeds = [(u'Psychologia', u'http://badania.net/category/psychologia/feed/'), (u'Technologie', u'http://badania.net/category/technologie/feed/'), (u'Biologia', u'http://badania.net/category/biologia/feed/'), (u'Chemia', u'http://badania.net/category/chemia/feed/'), (u'Zdrowie', u'http://badania.net/category/zdrowie/'), (u'Seks', u'http://badania.net/category/psychologia-ewolucyjna-tematyka-seks/feed/')]
feeds = [(u'Psychologia', u'http://badania.net/category/psychologia/feed/'), (u'Technologie', u'http://badania.net/category/technologie/feed/'), (u'Biologia', u'http://badania.net/category/biologia/feed/'), (u'Chemia', u'http://badania.net/category/chemia/feed/'), (u'Zdrowie', u'http://badania.net/category/zdrowie/'), (u'Seks', u'http://badania.net/category/psychologia-ewolucyjna-tematyka-seks/feed/')]

View File

@ -1,5 +1,7 @@
from calibre.web.feeds.news import BasicNewsRecipe
import re
from calibre.ebooks.BeautifulSoup import Comment
class BenchmarkPl(BasicNewsRecipe):
title = u'Benchmark.pl'
__author__ = 'fenuks'
@ -13,10 +15,10 @@ class BenchmarkPl(BasicNewsRecipe):
no_stylesheets = True
remove_attributes = ['style']
preprocess_regexps = [(re.compile(ur'<h3><span style="font-size: small;">&nbsp;Zobacz poprzednie <a href="http://www.benchmark.pl/news/zestawienie/grupa_id/135">Opinie dnia:</a></span>.*</body>', re.DOTALL|re.IGNORECASE), lambda match: '</body>'), (re.compile(ur'Więcej o .*?</ul>', re.DOTALL|re.IGNORECASE), lambda match: '')]
keep_only_tags=[dict(name='div', attrs={'class':['m_zwykly', 'gallery']}), dict(id='article')]
remove_tags_after=dict(name='div', attrs={'class':'body'})
remove_tags=[dict(name='div', attrs={'class':['kategoria', 'socialize', 'thumb', 'panelOcenaObserwowane', 'categoryNextToSocializeGallery', 'breadcrumb', 'footer', 'moreTopics']}), dict(name='table', attrs={'background':'http://www.benchmark.pl/uploads/backend_img/a/fotki_newsy/opinie_dnia/bg.png'}), dict(name='table', attrs={'width':'210', 'cellspacing':'1', 'cellpadding':'4', 'border':'0', 'align':'right'})]
INDEX= 'http://www.benchmark.pl'
keep_only_tags = [dict(name='div', attrs={'class':['m_zwykly', 'gallery']}), dict(id='article')]
remove_tags_after = dict(id='article')
remove_tags = [dict(name='div', attrs={'class':['comments', 'body', 'kategoria', 'socialize', 'thumb', 'panelOcenaObserwowane', 'categoryNextToSocializeGallery', 'breadcrumb', 'footer', 'moreTopics']}), dict(name='table', attrs = {'background':'http://www.benchmark.pl/uploads/backend_img/a/fotki_newsy/opinie_dnia/bg.png'}), dict(name='table', attrs={'width':'210', 'cellspacing':'1', 'cellpadding':'4', 'border':'0', 'align':'right'})]
INDEX = 'http://www.benchmark.pl'
feeds = [(u'Aktualności', u'http://www.benchmark.pl/rss/aktualnosci-pliki.xml'),
(u'Testy i recenzje', u'http://www.benchmark.pl/rss/testy-recenzje-minirecenzje.xml')]
@ -27,7 +29,12 @@ class BenchmarkPl(BasicNewsRecipe):
soup2 = self.index_to_soup(nexturl['href'])
nexturl = soup2.find(attrs={'class':'next'})
pagetext = soup2.find(name='div', attrs={'class':'body'})
appendtag.find('div', attrs={'class':'k_ster'}).extract()
tag = appendtag.find('div', attrs={'class':'k_ster'})
if tag:
tag.extract()
comments = pagetext.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
comment.extract()
pos = len(appendtag.contents)
appendtag.insert(pos, pagetext)
if appendtag.find('div', attrs={'class':'k_ster'}):
@ -37,40 +44,44 @@ class BenchmarkPl(BasicNewsRecipe):
def image_article(self, soup, appendtag):
nexturl=soup.find('div', attrs={'class':'preview'})
if nexturl is not None:
nexturl=nexturl.find('a', attrs={'class':'move_next'})
image=appendtag.find('div', attrs={'class':'preview'}).div['style'][16:]
image=self.INDEX + image[:image.find("')")]
nexturl = soup.find('div', attrs={'class':'preview'})
if nexturl:
nexturl = nexturl.find('a', attrs={'class':'move_next'})
image = appendtag.find('div', attrs={'class':'preview'}).div['style'][16:]
image = self.INDEX + image[:image.find("')")]
appendtag.find(attrs={'class':'preview'}).name='img'
appendtag.find(attrs={'class':'preview'})['src']=image
appendtag.find('a', attrs={'class':'move_next'}).extract()
while nexturl is not None:
nexturl= self.INDEX + nexturl['href']
while nexturl:
nexturl = self.INDEX + nexturl['href']
soup2 = self.index_to_soup(nexturl)
nexturl=soup2.find('a', attrs={'class':'move_next'})
image=soup2.find('div', attrs={'class':'preview'}).div['style'][16:]
image=self.INDEX + image[:image.find("')")]
nexturl = soup2.find('a', attrs={'class':'move_next'})
image = soup2.find('div', attrs={'class':'preview'}).div['style'][16:]
image = self.INDEX + image[:image.find("')")]
soup2.find(attrs={'class':'preview'}).name='img'
soup2.find(attrs={'class':'preview'})['src']=image
pagetext=soup2.find('div', attrs={'class':'gallery'})
pagetext = soup2.find('div', attrs={'class':'gallery'})
pagetext.find('div', attrs={'class':'title'}).extract()
pagetext.find('div', attrs={'class':'thumb'}).extract()
pagetext.find('div', attrs={'class':'panelOcenaObserwowane'}).extract()
if nexturl is not None:
if nexturl:
pagetext.find('a', attrs={'class':'move_next'}).extract()
pagetext.find('a', attrs={'class':'move_back'}).extract()
comments = pagetext.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
comment.extract()
pos = len(appendtag.contents)
appendtag.insert(pos, pagetext)
def preprocess_html(self, soup):
if soup.find('div', attrs={'class':'preview'}) is not None:
if soup.find('div', attrs={'class':'preview'}):
self.image_article(soup, soup.body)
else:
self.append_page(soup, soup.body)
for a in soup('a'):
if a.has_key('href') and 'http://' not in a['href'] and 'https://' not in a['href']:
a['href']=self.INDEX + a['href']
if a.has_key('href') and not a['href'].startswith('http'):
a['href'] = self.INDEX + a['href']
for r in soup.findAll(attrs={'class':['comments', 'body']}):
r.extract()
return soup
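
The link-fixing loop above prepends INDEX to every href that does not start with "http". A standalone sketch of the same idea follows; it swaps plain string concatenation for the standard library's `urljoin`, which is a substitution made for this example and not what the recipe does. The function name `absolutize` is hypothetical, and the sketch targets Python 3, whereas the recipe itself is Python 2 code.

```python
from urllib.parse import urljoin

INDEX = 'http://www.benchmark.pl'


def absolutize(href, base=INDEX):
    """Return href unchanged if it is absolute, else resolve it against base."""
    if href.startswith(('http://', 'https://')):
        return href
    # The trailing slash makes urljoin treat base as a directory.
    return urljoin(base + '/', href)


if __name__ == '__main__':
    for link in ('/news/1', 'news/1', 'https://example.com/x'):
        print(absolutize(link))
```

One reason to consider `urljoin` is that it handles a relative href without a leading slash, where simple concatenation would glue the path directly onto the host name.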

View File

@ -14,7 +14,7 @@ from calibre.web.feeds.news import BasicNewsRecipe
class biweekly(BasicNewsRecipe):
__author__ = u'Łukasz Grąbczewski'
title = 'Biweekly'
language = 'en'
language = 'en_PL'
publisher = 'National Audiovisual Institute'
publication_type = 'magazine'
description = u'link with culture [English edition of Polish magazine]: literature, theatre, film, art, music, views, talks'

View File

@ -0,0 +1,30 @@
__license__ = 'GPL v3'
from calibre.web.feeds.news import BasicNewsRecipe
class BlogBiszopa(BasicNewsRecipe):
title = u'Blog Biszopa'
__author__ = 'fenuks'
description = u'Zapiski z Granitowego Miasta'
category = 'history'
#publication_type = ''
language = 'pl'
#encoding = ''
#extra_css = ''
cover_url = 'http://blogbiszopa.pl/wp-content/themes/biszop/images/logo.png'
masthead_url = ''
use_embedded_content = False
oldest_article = 7
max_articles_per_feed = 100
no_stylesheets = True
remove_empty_feeds = True
remove_javascript = True
remove_attributes = ['style', 'font']
ignore_duplicate_articles = {'title', 'url'}
keep_only_tags = [dict(id='main-content')]
remove_tags = [dict(name='footer')]
#remove_tags_after = {}
#remove_tags_before = {}
feeds = [(u'Artyku\u0142y', u'http://blogbiszopa.pl/feed/')]

View File

@ -11,8 +11,8 @@ class BusinessWeekMagazine(BasicNewsRecipe):
category = 'news'
encoding = 'UTF-8'
keep_only_tags = [
dict(name='div', attrs={'id':'article_body_container'}),
]
dict(name='div', attrs={'id':'article_body_container'}),
]
remove_tags = [dict(name='ui'),dict(name='li'),dict(name='div', attrs={'id':['share-email']})]
remove_javascript = True
no_stylesheets = True
@ -25,6 +25,7 @@ class BusinessWeekMagazine(BasicNewsRecipe):
#Find date
mag=soup.find('h2',text='Magazine')
self.log(mag)
dates=self.tag_to_string(mag.findNext('h3'))
self.timefmt = u' [%s]'%dates
@ -32,7 +33,7 @@ class BusinessWeekMagazine(BasicNewsRecipe):
div0 = soup.find ('div', attrs={'class':'column left'})
section_title = ''
feeds = OrderedDict()
for div in div0.findAll('h4'):
for div in div0.findAll(['h4','h5']):
articles = []
section_title = self.tag_to_string(div.findPrevious('h3')).strip()
title=self.tag_to_string(div.a).strip()
@ -48,7 +49,7 @@ class BusinessWeekMagazine(BasicNewsRecipe):
feeds[section_title] += articles
div1 = soup.find ('div', attrs={'class':'column center'})
section_title = ''
for div in div1.findAll('h5'):
for div in div1.findAll(['h4','h5']):
articles = []
desc=self.tag_to_string(div.findNext('p')).strip()
section_title = self.tag_to_string(div.findPrevious('h3')).strip()

View File

@ -1,5 +1,6 @@
from calibre.web.feeds.news import BasicNewsRecipe
import re
class Ciekawostki_Historyczne(BasicNewsRecipe):
title = u'Ciekawostki Historyczne'
oldest_article = 7
@ -7,42 +8,31 @@ class Ciekawostki_Historyczne(BasicNewsRecipe):
description = u'Serwis popularnonaukowy - odkrycia, kontrowersje, historia, ciekawostki, badania, ciekawostki z przeszłości.'
category = 'history'
language = 'pl'
masthead_url= 'http://ciekawostkihistoryczne.pl/wp-content/themes/Wordpress_Magazine/images/logo-ciekawostki-historyczne-male.jpg'
cover_url='http://ciekawostkihistoryczne.pl/wp-content/themes/Wordpress_Magazine/images/logo-ciekawostki-historyczne-male.jpg'
masthead_url = 'http://ciekawostkihistoryczne.pl/wp-content/themes/Wordpress_Magazine/images/logo-ciekawostki-historyczne-male.jpg'
cover_url = 'http://ciekawostkihistoryczne.pl/wp-content/themes/Wordpress_Magazine/images/logo-ciekawostki-historyczne-male.jpg'
max_articles_per_feed = 100
extra_css = 'img.alignleft {float:left; margin-right:5px;} .alignright {float:right; margin-left:5px;}'
oldest_article = 12
preprocess_regexps = [(re.compile(ur'Ten artykuł ma kilka stron.*?</fb:like>', re.DOTALL), lambda match: ''), (re.compile(ur'<h2>Zobacz też:</h2>.*?</ol>', re.DOTALL), lambda match: '')]
no_stylesheets=True
remove_empty_feeds=True
keep_only_tags=[dict(name='div', attrs={'class':'post'})]
remove_tags=[dict(id='singlepostinfo')]
no_stylesheets = True
remove_empty_feeds = True
keep_only_tags = [dict(name='div', attrs={'class':'post'})]
recursions = 5
remove_tags = [dict(id='singlepostinfo')]
feeds = [(u'Staro\u017cytno\u015b\u0107', u'http://ciekawostkihistoryczne.pl/tag/starozytnosc/feed/'), (u'\u015aredniowiecze', u'http://ciekawostkihistoryczne.pl/tag/sredniowiecze/feed/'), (u'Nowo\u017cytno\u015b\u0107', u'http://ciekawostkihistoryczne.pl/tag/nowozytnosc/feed/'), (u'XIX wiek', u'http://ciekawostkihistoryczne.pl/tag/xix-wiek/feed/'), (u'1914-1939', u'http://ciekawostkihistoryczne.pl/tag/1914-1939/feed/'), (u'1939-1945', u'http://ciekawostkihistoryczne.pl/tag/1939-1945/feed/'), (u'Powojnie (od 1945)', u'http://ciekawostkihistoryczne.pl/tag/powojnie/feed/'), (u'Recenzje', u'http://ciekawostkihistoryczne.pl/category/recenzje/feed/')]
def append_page(self, soup, appendtag):
tag=soup.find(name='h7')
if tag:
if tag.br:
pass
elif tag.nextSibling.name=='p':
tag=tag.nextSibling
nexturl = tag.findAll('a')
for nextpage in nexturl:
tag.extract()
nextpage= nextpage['href']
soup2 = self.index_to_soup(nextpage)
pagetext = soup2.find(name='div', attrs={'class':'post'})
for r in pagetext.findAll('div', attrs={'id':'singlepostinfo'}):
r.extract()
for r in pagetext.findAll('div', attrs={'class':'wp-caption alignright'}):
r.extract()
for r in pagetext.findAll('h1'):
r.extract()
pagetext.find('h6').nextSibling.extract()
pagetext.find('h7').nextSibling.extract()
pos = len(appendtag.contents)
appendtag.insert(pos, pagetext)
def is_link_wanted(self, url, tag):
return 'ciekawostkihistoryczne' in url and url[-2] in {'2', '3', '4', '5', '6'}
def preprocess_html(self, soup):
self.append_page(soup, soup.body)
def postprocess_html(self, soup, first_fetch):
tag = soup.find('h7')
if tag:
tag.nextSibling.extract()
if not first_fetch:
for r in soup.findAll(['h1']):
r.extract()
soup.find('h6').nextSibling.extract()
return soup
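
The new is_link_wanted check decides which links to follow by looking at the second-to-last character of the URL. The sketch below isolates that test as a plain function so it can be tried on sample strings; the URLs are invented for illustration and the real method also receives a `tag` argument that is left out here.

```python
WANTED_PAGE_DIGITS = {'2', '3', '4', '5', '6'}


def is_link_wanted(url):
    """True for site URLs whose second-to-last character is a digit 2-6."""
    return 'ciekawostkihistoryczne' in url and url[-2] in WANTED_PAGE_DIGITS


if __name__ == '__main__':
    base = 'http://ciekawostkihistoryczne.pl/2013/03/01/example/'
    print(is_link_wanted(base + '2/'))   # continuation page
    print(is_link_wanted(base))          # first page of the article
```

Because only one character is inspected, any URL on the site that happens to end in a digit from 2 to 6 followed by a slash passes the test, not just continuation pages.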

View File

@ -1,5 +1,5 @@
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
import re
from calibre.web.feeds.news import BasicNewsRecipe
class Computerworld_pl(BasicNewsRecipe):
title = u'Computerworld.pl'
@ -12,8 +12,16 @@ class Computerworld_pl(BasicNewsRecipe):
no_stylesheets = True
oldest_article = 7
max_articles_per_feed = 100
keep_only_tags = [dict(attrs={'class':['tyt_news', 'prawo', 'autor', 'tresc']})]
remove_tags_after = dict(name='div', attrs={'class':'rMobi'})
remove_tags = [dict(name='div', attrs={'class':['nnav', 'rMobi']}), dict(name='table', attrs={'class':'ramka_slx'})]
remove_attributes = ['style',]
preprocess_regexps = [(re.compile(u'Zobacz również:', re.IGNORECASE), lambda m: ''), (re.compile(ur'[*]+reklama[*]+', re.IGNORECASE), lambda m: ''),]
keep_only_tags = [dict(id=['szpaltaL', 's2011'])]
remove_tags_after = dict(name='div', attrs={'class':'tresc'})
remove_tags = [dict(attrs={'class':['nnav', 'rMobi', 'tagi', 'rec']}),]
feeds = [(u'Wiadomo\u015bci', u'http://rssout.idg.pl/cw/news_iso.xml')]
def skip_ad_pages(self, soup):
if soup.title.string.lower() == 'advertisement':
tag = soup.find(name='a')
if tag:
new_soup = self.index_to_soup(tag['href'], raw=True)
return new_soup

View File

@ -1,5 +1,6 @@
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup
from calibre.ebooks.BeautifulSoup import BeautifulSoup, Comment
class CoNowegoPl(BasicNewsRecipe):
title = u'conowego.pl'
__author__ = 'fenuks'
@ -10,6 +11,7 @@ class CoNowegoPl(BasicNewsRecipe):
oldest_article = 7
max_articles_per_feed = 100
INDEX = 'http://www.conowego.pl/'
extra_css = '.news-single-img {float:left; margin-right:5px;}'
no_stylesheets = True
remove_empty_feeds = True
use_embedded_content = False
@ -35,6 +37,9 @@ class CoNowegoPl(BasicNewsRecipe):
pos = len(appendtag.contents)
appendtag.insert(pos, pagetext)
comments = appendtag.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
comment.extract()
for r in appendtag.findAll(attrs={'class':['pages', 'paginationWrap']}):
r.extract()

View File

@ -12,11 +12,13 @@ class CzasGentlemanow(BasicNewsRecipe):
ignore_duplicate_articles = {'title', 'url'}
oldest_article = 7
max_articles_per_feed = 100
extra_css = '.gallery-item {float:left; margin-right: 10px; max-width: 20%;} .alignright {text-align: right; float:right; margin-left:5px;}\
.wp-caption-text {text-align: left;} img.aligncenter {display: block; margin-left: auto; margin-right: auto;} .alignleft {float: left; margin-right:5px;}'
no_stylesheets = True
remove_empty_feeds = True
preprocess_regexps = [(re.compile(u'<h3>Może Cię też zainteresować:</h3>'), lambda m: '')]
use_embedded_content = False
keep_only_tags = [dict(name='div', attrs={'class':'content'})]
remove_tags = [dict(attrs={'class':'meta_comments'}), dict(id=['comments', 'related_posts_thumbnails'])]
remove_tags = [dict(attrs={'class':'meta_comments'}), dict(id=['comments', 'related_posts_thumbnails', 'respond'])]
remove_tags_after = dict(id='comments')
feeds = [(u'M\u0119ski \u015awiat', u'http://czasgentlemanow.pl/category/meski-swiat/feed/'), (u'Styl', u'http://czasgentlemanow.pl/category/styl/feed/'), (u'Vademecum Gentlemana', u'http://czasgentlemanow.pl/category/vademecum/feed/'), (u'Dom i rodzina', u'http://czasgentlemanow.pl/category/dom-i-rodzina/feed/'), (u'Honor', u'http://czasgentlemanow.pl/category/honor/feed/'), (u'Gad\u017cety Gentlemana', u'http://czasgentlemanow.pl/category/gadzety-gentlemana/feed/')]

View File

@ -16,6 +16,7 @@ class Dobreprogramy_pl(BasicNewsRecipe):
extra_css = '.title {font-size:22px;}'
oldest_article = 8
max_articles_per_feed = 100
remove_attributes = ['style', 'width', 'height']
preprocess_regexps = [(re.compile(ur'<div id="\S+360pmp4">Twoja przeglądarka nie obsługuje Flasha i HTML5 lub wyłączono obsługę JavaScript...</div>'), lambda match: '') ]
keep_only_tags=[dict(attrs={'class':['news', 'entry single']})]
remove_tags = [dict(attrs={'class':['newsOptions', 'noPrint', 'komentarze', 'tags font-heading-master']}), dict(id='komentarze'), dict(name='iframe')]
@ -28,4 +29,11 @@ class Dobreprogramy_pl(BasicNewsRecipe):
for a in soup('a'):
if a.has_key('href') and 'http://' not in a['href'] and 'https://' not in a['href']:
a['href']=self.index + a['href']
for r in soup.findAll('iframe'):
r.parent.extract()
return soup
def postprocess_html(self, soup, first_fetch):
for r in soup.findAll('span', text=''):
if not r.string:
r.extract()
return soup

View File

@ -8,6 +8,7 @@ class BasicUserRecipe1337668045(BasicNewsRecipe):
cover_url = 'http://drytooling.com.pl/images/drytooling-kindle.png'
description = u'Drytooling.com.pl jest serwisem wspinaczki zimowej, alpinizmu i himalaizmu. Jeśli uwielbiasz zimę, nie możesz doczekać się aż wyciągniesz szpej z szafki i uderzysz w Tatry, Alpy, czy może Himalaje, to znajdziesz tutaj naprawdę dużo interesujących Cię treści! Zapraszamy!'
__author__ = u'Damian Granowski'
language = 'pl'
oldest_article = 100
max_articles_per_feed = 20
auto_cleanup = True

View File

@ -1,4 +1,5 @@
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Comment
class Dzieje(BasicNewsRecipe):
title = u'dzieje.pl'
@ -8,11 +9,12 @@ class Dzieje(BasicNewsRecipe):
category = 'history'
language = 'pl'
ignore_duplicate_articles = {'title', 'url'}
extra_css = '.imagecache-default {float:left; margin-right:20px;}'
index = 'http://dzieje.pl'
oldest_article = 8
max_articles_per_feed = 100
remove_javascript=True
no_stylesheets= True
remove_javascript = True
no_stylesheets = True
keep_only_tags = [dict(name='h1', attrs={'class':'title'}), dict(id='content-area')]
remove_tags = [dict(attrs={'class':'field field-type-computed field-field-tagi'}), dict(id='dogory')]
#feeds = [(u'Dzieje', u'http://dzieje.pl/rss.xml')]
@ -28,16 +30,19 @@ class Dzieje(BasicNewsRecipe):
pagetext = soup2.find(id='content-area').find(attrs={'class':'content'})
for r in pagetext.findAll(attrs={'class':['fieldgroup group-groupkul', 'fieldgroup group-zdjeciekult', 'fieldgroup group-zdjecieciekaw', 'fieldgroup group-zdjecieksiazka', 'fieldgroup group-zdjeciedu', 'field field-type-filefield field-field-zdjecieglownawyd']}):
r.extract()
pos = len(appendtag.contents)
appendtag.insert(pos, pagetext)
comments = pagetext.findAll(text=lambda text:isinstance(text, Comment))
# appendtag.insert(pos, pagetext)
tag = soup2.find('li', attrs={'class':'pager-next'})
for r in appendtag.findAll(attrs={'class':['item-list', 'field field-type-computed field-field-tagi', ]}):
r.extract()
comments = appendtag.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
comment.extract()
def find_articles(self, url):
articles = []
soup=self.index_to_soup(url)
tag=soup.find(id='content-area').div.div
soup = self.index_to_soup(url)
tag = soup.find(id='content-area').div.div
for i in tag.findAll('div', recursive=False):
temp = i.find(attrs={'class':'views-field-title'}).span.a
title = temp.string
@ -64,7 +69,7 @@ class Dzieje(BasicNewsRecipe):
def preprocess_html(self, soup):
for a in soup('a'):
if a.has_key('href') and 'http://' not in a['href'] and 'https://' not in a['href']:
a['href']=self.index + a['href']
if a.has_key('href') and not a['href'].startswith('http'):
a['href'] = self.index + a['href']
self.append_page(soup, soup.body)
return soup

View File

@ -2,6 +2,8 @@
from calibre.web.feeds.news import BasicNewsRecipe
import re
from calibre.ebooks.BeautifulSoup import Comment
class Dziennik_pl(BasicNewsRecipe):
title = u'Dziennik.pl'
__author__ = 'fenuks'
@ -9,17 +11,17 @@ class Dziennik_pl(BasicNewsRecipe):
category = 'newspaper'
language = 'pl'
masthead_url= 'http://5.s.dziennik.pl/images/logos.png'
cover_url= 'http://5.s.dziennik.pl/images/logos.png'
cover_url = 'http://5.s.dziennik.pl/images/logos.png'
no_stylesheets = True
oldest_article = 7
max_articles_per_feed = 100
remove_javascript=True
remove_empty_feeds=True
remove_javascript = True
remove_empty_feeds = True
ignore_duplicate_articles = {'title', 'url'}
extra_css= 'ul {list-style: none; padding: 0; margin: 0;} li {float: left;margin: 0 0.15em;}'
extra_css = 'ul {list-style: none; padding: 0; margin: 0;} li {float: left;margin: 0 0.15em;}'
preprocess_regexps = [(re.compile("Komentarze:"), lambda m: ''), (re.compile('<p><strong><a href=".*?">&gt;&gt;&gt; CZYTAJ TAKŻE: ".*?"</a></strong></p>'), lambda m: '')]
keep_only_tags=[dict(id='article')]
remove_tags=[dict(name='div', attrs={'class':['art_box_dodatki', 'new_facebook_icons2', 'leftArt', 'article_print', 'quiz-widget', 'belka-spol', 'belka-spol belka-spol-bottom', 'art_data_tags', 'cl_right', 'boxRounded gal_inside']}), dict(name='a', attrs={'class':['komentarz', 'article_icon_addcommnent']})]
keep_only_tags = [dict(id='article')]
remove_tags = [dict(name='div', attrs={'class':['art_box_dodatki', 'new_facebook_icons2', 'leftArt', 'article_print', 'quiz-widget', 'belka-spol', 'belka-spol belka-spol-bottom', 'art_data_tags', 'cl_right', 'boxRounded gal_inside']}), dict(name='a', attrs={'class':['komentarz', 'article_icon_addcommnent']})]
feeds = [(u'Wszystko', u'http://rss.dziennik.pl/Dziennik-PL/'),
(u'Wiadomości', u'http://rss.dziennik.pl/Dziennik-Wiadomosci'),
(u'Gospodarka', u'http://rss.dziennik.pl/Dziennik-Gospodarka'),
@ -34,26 +36,29 @@ class Dziennik_pl(BasicNewsRecipe):
(u'Nieruchomości', u'http://rss.dziennik.pl/Dziennik-Nieruchomosci')]
def skip_ad_pages(self, soup):
tag=soup.find(name='a', attrs={'title':'CZYTAJ DALEJ'})
tag = soup.find(name='a', attrs={'title':'CZYTAJ DALEJ'})
if tag:
new_soup=self.index_to_soup(tag['href'], raw=True)
new_soup = self.index_to_soup(tag['href'], raw=True)
return new_soup
def append_page(self, soup, appendtag):
tag=soup.find('a', attrs={'class':'page_next'})
tag = soup.find('a', attrs={'class':'page_next'})
if tag:
appendtag.find('div', attrs={'class':'article_paginator'}).extract()
while tag:
soup2= self.index_to_soup(tag['href'])
tag=soup2.find('a', attrs={'class':'page_next'})
soup2 = self.index_to_soup(tag['href'])
tag = soup2.find('a', attrs={'class':'page_next'})
if not tag:
for r in appendtag.findAll('div', attrs={'class':'art_src'}):
r.extract()
pagetext = soup2.find(name='div', attrs={'class':'article_body'})
for dictionary in self.remove_tags:
v=pagetext.findAll(name=dictionary['name'], attrs=dictionary['attrs'])
v = pagetext.findAll(name=dictionary['name'], attrs=dictionary['attrs'])
for delete in v:
delete.extract()
comments = pagetext.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
comment.extract()
pos = len(appendtag.contents)
appendtag.insert(pos, pagetext)
if appendtag.find('div', attrs={'class':'article_paginator'}):

View File

@ -1,5 +1,7 @@
import re
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Comment
class DziennikWschodni(BasicNewsRecipe):
title = u'Dziennik Wschodni'
__author__ = 'fenuks'
@ -72,6 +74,10 @@ class DziennikWschodni(BasicNewsRecipe):
if pagetext:
pos = len(appendtag.contents)
appendtag.insert(pos, pagetext)
comments = appendtag.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
comment.extract()
def preprocess_html(self, soup):
self.append_page(soup, soup.body)

View File

@ -1,5 +1,6 @@
import re
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Comment
class EchoDnia(BasicNewsRecipe):
title = u'Echo Dnia'
@ -68,6 +69,10 @@ class EchoDnia(BasicNewsRecipe):
if pagetext:
pos = len(appendtag.contents)
appendtag.insert(pos, pagetext)
comments = appendtag.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
comment.extract()
def preprocess_html(self, soup):
self.append_page(soup, soup.body)

View File

@ -0,0 +1,38 @@
from calibre.web.feeds.news import BasicNewsRecipe
class EclipseOnline(BasicNewsRecipe):
#
# oldest_article specifies the maximum age, in days, of posts to retrieve.
# The default of 32 is intended to work well with a "days of month = 1"
# recipe schedule to download "monthly issues" of Eclipse Online.
# Increase this value to include additional posts. However, the RSS feed
# currently only includes the 10 most recent posts, so that's the max.
#
oldest_article = 32
title = u'Eclipse Online'
description = u'"Where strange and wonderful things happen, where reality is eclipsed for a little while with something magical and new." Eclipse Online is edited by Jonathan Strahan and published online by Night Shade Books. http://www.nightshadebooks.com/category/eclipse/'
publication_type = 'magazine'
language = 'en'
__author__ = u'Jim DeVona'
__version__ = '1.0'
# For now, use this Eclipse Online logo as the ebook cover image.
# (Disable the cover_url line to let Calibre generate a default cover, including date.)
cover_url = 'http://www.nightshadebooks.com/wp-content/uploads/2012/10/Eclipse-Logo.jpg'
# Extract the "post" div containing the story (minus redundant metadata) from each page.
keep_only_tags = [dict(name='div', attrs={'class':lambda x: x and 'post' in x})]
remove_tags = [dict(name='span', attrs={'class': ['post-author', 'post-category', 'small']})]
# Nice plain markup (like Eclipse's) works best for most e-readers.
# Disregard any special styling rules, but center illustrations.
auto_cleanup = False
no_stylesheets = True
remove_attributes = ['style', 'align']
extra_css = '.wp-caption {text-align: center;} .wp-caption-text {font-size: small; font-style: italic;}'
# Tell Calibre where to look for article links. It will proceed to retrieve
# these posts and format them into an ebook according to the above rules.
feeds = ['http://www.nightshadebooks.com/category/eclipse/feed/']

View File

@ -9,7 +9,7 @@ class EkologiaPl(BasicNewsRecipe):
language = 'pl'
cover_url = 'http://www.ekologia.pl/assets/images/logo/ekologia_pl_223x69.png'
ignore_duplicate_articles = {'title', 'url'}
extra_css = '.title {font-size: 200%;}'
extra_css = '.title {font-size: 200%;} .imagePowiazane, .imgCon {float:left; margin-right:5px;}'
oldest_article = 7
max_articles_per_feed = 100
no_stylesheets = True

View File

@ -3,85 +3,153 @@
__license__ = 'GPL v3'
__copyright__ = '2010, matek09, matek09@gmail.com'
from calibre.web.feeds.news import BasicNewsRecipe
import re
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup, Comment
class Esensja(BasicNewsRecipe):
title = u'Esensja'
__author__ = 'matek09'
description = 'Monthly magazine'
encoding = 'utf-8'
no_stylesheets = True
language = 'pl'
remove_javascript = True
HREF = '0'
title = u'Esensja'
__author__ = 'matek09 & fenuks'
description = 'Magazyn kultury popularnej'
encoding = 'utf-8'
no_stylesheets = True
language = 'pl'
remove_javascript = True
masthead_url = 'http://esensja.pl/img/wrss.gif'
oldest_article = 1
URL = 'http://esensja.pl'
HREF = '0'
remove_attributes = ['style', 'bgcolor', 'alt', 'color']
keep_only_tags = [dict(attrs={'class':'sekcja'}), ]
#keep_only_tags.append(dict(name = 'div', attrs = {'class' : 'article'})
#remove_tags_before = dict(dict(name = 'div', attrs = {'class' : 't-title'}))
remove_tags_after = dict(id='tekst')
#keep_only_tags =[]
#keep_only_tags.append(dict(name = 'div', attrs = {'class' : 'article'})
remove_tags_before = dict(dict(name = 'div', attrs = {'class' : 't-title'}))
remove_tags_after = dict(dict(name = 'img', attrs = {'src' : '../../../2000/01/img/tab_bot.gif'}))
remove_tags = [dict(name = 'img', attrs = {'src' : ['../../../2000/01/img/tab_top.gif', '../../../2000/01/img/tab_bot.gif']}),
dict(name = 'div', attrs = {'class' : 't-title2 nextpage'}),
#dict(attrs={'rel':'lightbox[galeria]'})
dict(attrs={'class':['tekst_koniec', 'ref', 'wykop']}),
dict(attrs={'itemprop':['copyrightHolder', 'publisher']}),
dict(id='komentarze')
]
remove_tags =[]
remove_tags.append(dict(name = 'img', attrs = {'src' : '../../../2000/01/img/tab_top.gif'}))
remove_tags.append(dict(name = 'img', attrs = {'src' : '../../../2000/01/img/tab_bot.gif'}))
remove_tags.append(dict(name = 'div', attrs = {'class' : 't-title2 nextpage'}))
extra_css = '''
.t-title {font-size: x-large; font-weight: bold; text-align: left}
.t-author {font-size: x-small; text-align: left}
.t-title2 {font-size: x-small; font-style: italic; text-align: left}
.text {font-size: small; text-align: left}
.annot-ref {font-style: italic; text-align: left}
'''
extra_css = '''
.t-title {font-size: x-large; font-weight: bold; text-align: left}
.t-author {font-size: x-small; text-align: left}
.t-title2 {font-size: x-small; font-style: italic; text-align: left}
.text {font-size: small; text-align: left}
.annot-ref {font-style: italic; text-align: left}
'''
preprocess_regexps = [(re.compile(r'alt="[^"]*"'), lambda match: ''),
(re.compile(ur'(title|alt)="[^"]*?"', re.DOTALL), lambda match: ''),
]
preprocess_regexps = [(re.compile(r'alt="[^"]*"'),
lambda match: '')]
def parse_index(self):
soup = self.index_to_soup('http://www.esensja.pl/magazyn/')
a = soup.find('a', attrs={'href' : re.compile('.*/index.html')})
year = a['href'].split('/')[0]
month = a['href'].split('/')[1]
self.HREF = 'http://www.esensja.pl/magazyn/' + year + '/' + month + '/iso/'
soup = self.index_to_soup(self.HREF + '01.html')
self.cover_url = 'http://www.esensja.pl/magazyn/' + year + '/' + month + '/img/ilustr/cover_b.jpg'
feeds = []
chapter = ''
subchapter = ''
articles = []
intro = soup.find('div', attrs={'class' : 'n-title'})
'''
introduction = {'title' : self.tag_to_string(intro.a),
'url' : self.HREF + intro.a['href'],
'date' : '',
'description' : ''}
chapter = 'Wprowadzenie'
articles.append(introduction)
'''
for tag in intro.findAllNext(attrs={'class': ['chapter', 'subchapter', 'n-title']}):
if tag.name == 'td':
if len(articles) > 0:
section = chapter
if len(subchapter) > 0:
section += ' - ' + subchapter
feeds.append((section, articles))
articles = []
if tag['class'] == 'chapter':
chapter = self.tag_to_string(tag).capitalize()
subchapter = ''
else:
subchapter = self.tag_to_string(tag)
subchapter = self.tag_to_string(tag)
continue
finalurl = tag.a['href']
if not finalurl.startswith('http'):
finalurl = self.HREF + finalurl
articles.append({'title' : self.tag_to_string(tag.a), 'url' : finalurl, 'date' : '', 'description' : ''})
a = self.index_to_soup(finalurl)
i = 1
while True:
div = a.find('div', attrs={'class' : 't-title2 nextpage'})
if div is not None:
link = div.a['href']
if not link.startswith('http'):
link = self.HREF + link
a = self.index_to_soup(link)
articles.append({'title' : self.tag_to_string(tag.a) + ' c. d. ' + str(i), 'url' : link, 'date' : '', 'description' : ''})
i = i + 1
else:
break
def parse_index(self):
soup = self.index_to_soup('http://www.esensja.pl/magazyn/')
a = soup.find('a', attrs={'href' : re.compile('.*/index.html')})
year = a['href'].split('/')[0]
month = a['href'].split('/')[1]
self.HREF = 'http://www.esensja.pl/magazyn/' + year + '/' + month + '/iso/'
soup = self.index_to_soup(self.HREF + '01.html')
self.cover_url = 'http://www.esensja.pl/magazyn/' + year + '/' + month + '/img/ilustr/cover_b.jpg'
feeds = []
intro = soup.find('div', attrs={'class' : 'n-title'})
introduction = {'title' : self.tag_to_string(intro.a),
'url' : self.HREF + intro.a['href'],
'date' : '',
'description' : ''}
chapter = 'Wprowadzenie'
subchapter = ''
articles = []
articles.append(introduction)
for tag in intro.findAllNext(attrs={'class': ['chapter', 'subchapter', 'n-title']}):
if tag.name == 'td':
if len(articles) > 0:
section = chapter
if len(subchapter) > 0:
section += ' - ' + subchapter
feeds.append((section, articles))
articles = []
if tag['class'] == 'chapter':
chapter = self.tag_to_string(tag).capitalize()
subchapter = ''
else:
subchapter = self.tag_to_string(tag)
subchapter = self.tag_to_string(tag)
continue
articles.append({'title' : self.tag_to_string(tag.a), 'url' : self.HREF + tag.a['href'], 'date' : '', 'description' : ''})
return feeds
a = self.index_to_soup(self.HREF + tag.a['href'])
i = 1
while True:
div = a.find('div', attrs={'class' : 't-title2 nextpage'})
if div is not None:
a = self.index_to_soup(self.HREF + div.a['href'])
articles.append({'title' : self.tag_to_string(tag.a) + ' c. d. ' + str(i), 'url' : self.HREF + div.a['href'], 'date' : '', 'description' : ''})
i = i + 1
else:
break
def append_page(self, soup, appendtag):
r = appendtag.find(attrs={'class':'wiecej_xxx'})
if r:
nr = r.findAll(attrs={'class':'tn-link'})[-1]
try:
nr = int(nr.a.string)
except (AttributeError, TypeError, ValueError):
return
baseurl = soup.find(attrs={'property':'og:url'})['content'] + '&strona={0}'
for number in range(2, nr+1):
soup2 = self.index_to_soup(baseurl.format(number))
pagetext = soup2.find(attrs={'class':'tresc'})
pos = len(appendtag.contents)
appendtag.insert(pos, pagetext)
for r in appendtag.findAll(attrs={'class':['wiecej_xxx', 'tekst_koniec']}):
r.extract()
for r in appendtag.findAll('script'):
r.extract()
comments = appendtag.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
comment.extract()
def preprocess_html(self, soup):
self.append_page(soup, soup.body)
for tag in soup.findAll(attrs={'class':'img_box_right'}):
temp = tag.find('img')
src = ''
if temp:
src = temp.get('src', '')
for r in tag.findAll('a', recursive=False):
r.extract()
info = tag.find(attrs={'class':'img_info'})
text = str(tag)
if not src:
src = re.search('src="[^"]*?"', text)
if src:
src = src.group(0)
src = src[5:].replace('//', '/')
if src:
tag.contents = []
tag.insert(0, BeautifulSoup('<img src="{0}{1}" />'.format(self.URL, src)))
if info:
tag.insert(len(tag.contents), info)
return soup
return feeds

View File

@ -0,0 +1,109 @@
__license__ = 'GPL v3'
import re
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup, Comment
class EsensjaRSS(BasicNewsRecipe):
title = u'Esensja (RSS)'
__author__ = 'fenuks'
description = u'Magazyn kultury popularnej'
category = 'reading, fantasy, reviews, boardgames, culture'
#publication_type = ''
language = 'pl'
encoding = 'utf-8'
INDEX = 'http://www.esensja.pl'
extra_css = '''.t-title {font-size: x-large; font-weight: bold; text-align: left}
.t-author {font-size: x-small; text-align: left}
.t-title2 {font-size: x-small; font-style: italic; text-align: left}
.text {font-size: small; text-align: left}
.annot-ref {font-style: italic; text-align: left}
'''
cover_url = ''
masthead_url = 'http://esensja.pl/img/wrss.gif'
use_embedded_content = False
oldest_article = 7
max_articles_per_feed = 100
no_stylesheets = True
remove_empty_feeds = True
remove_javascript = True
ignore_duplicate_articles = {'title', 'url'}
preprocess_regexps = [(re.compile(r'alt="[^"]*"'), lambda match: ''),
(re.compile(ur'(title|alt)="[^"]*?"', re.DOTALL), lambda match: ''),
]
remove_attributes = ['style', 'bgcolor', 'alt', 'color']
keep_only_tags = [dict(attrs={'class':'sekcja'}), ]
remove_tags_after = dict(id='tekst')
remove_tags = [dict(name = 'img', attrs = {'src' : ['../../../2000/01/img/tab_top.gif', '../../../2000/01/img/tab_bot.gif']}),
dict(name = 'div', attrs = {'class' : 't-title2 nextpage'}),
#dict(attrs={'rel':'lightbox[galeria]'})
dict(attrs={'class':['tekst_koniec', 'ref', 'wykop']}),
dict(attrs={'itemprop':['copyrightHolder', 'publisher']}),
dict(id='komentarze')
]
feeds = [(u'Książka', u'http://esensja.pl/rss/ksiazka.rss'),
(u'Film', u'http://esensja.pl/rss/film.rss'),
(u'Komiks', u'http://esensja.pl/rss/komiks.rss'),
(u'Gry', u'http://esensja.pl/rss/gry.rss'),
(u'Muzyka', u'http://esensja.pl/rss/muzyka.rss'),
(u'Twórczość', u'http://esensja.pl/rss/tworczosc.rss'),
(u'Varia', u'http://esensja.pl/rss/varia.rss'),
(u'Zgryźliwi Tetrycy', u'http://esensja.pl/rss/tetrycy.rss'),
(u'Nowe książki', u'http://esensja.pl/rss/xnowosci.rss'),
(u'Ostatnio dodane książki', u'http://esensja.pl/rss/xdodane.rss'),
]
def get_cover_url(self):
soup = self.index_to_soup(self.INDEX)
cover = soup.find(id='panel_1')
self.cover_url = self.INDEX + cover.find('a')['href'].replace('index.html', '') + 'img/ilustr/cover_b.jpg'
return getattr(self, 'cover_url', self.cover_url)
def append_page(self, soup, appendtag):
r = appendtag.find(attrs={'class':'wiecej_xxx'})
if r:
nr = r.findAll(attrs={'class':'tn-link'})[-1]
try:
nr = int(nr.a.string)
except (AttributeError, TypeError, ValueError):
return
baseurl = soup.find(attrs={'property':'og:url'})['content'] + '&strona={0}'
for number in range(2, nr+1):
soup2 = self.index_to_soup(baseurl.format(number))
pagetext = soup2.find(attrs={'class':'tresc'})
pos = len(appendtag.contents)
appendtag.insert(pos, pagetext)
for r in appendtag.findAll(attrs={'class':['wiecej_xxx', 'tekst_koniec']}):
r.extract()
for r in appendtag.findAll('script'):
r.extract()
comments = appendtag.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
comment.extract()
def preprocess_html(self, soup):
self.append_page(soup, soup.body)
for tag in soup.findAll(attrs={'class':'img_box_right'}):
temp = tag.find('img')
src = ''
if temp:
src = temp.get('src', '')
for r in tag.findAll('a', recursive=False):
r.extract()
info = tag.find(attrs={'class':'img_info'})
text = str(tag)
if not src:
src = re.search('src="[^"]*?"', text)
if src:
src = src.group(0)
src = src[5:].replace('//', '/')
if src:
tag.contents = []
tag.insert(0, BeautifulSoup('<img src="{0}{1}" />'.format(self.INDEX, src)))
if info:
tag.insert(len(tag.contents), info)
return soup

View File

@ -1,19 +1,54 @@
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:fdm=marker:ai
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Comment
import re
class FilmOrgPl(BasicNewsRecipe):
title = u'Film.org.pl'
__author__ = 'fenuks'
description = u"Recenzje, analizy, artykuły, rankingi - wszystko o filmie dla miłośników kina. Opisy efektów specjalnych, wersji reżyserskich, remake'ów, sequeli. No i forum filmowe. Jedne z największych w Polsce."
category = 'film'
language = 'pl'
title = u'Film.org.pl'
__author__ = 'fenuks'
description = u"Recenzje, analizy, artykuły, rankingi - wszystko o filmie dla miłośników kina. Opisy efektów specjalnych, wersji reżyserskich, remake'ów, sequeli. No i forum filmowe. Jedne z największych w Polsce."
category = 'film'
language = 'pl'
extra_css = '.alignright {float:right; margin-left:5px;} .alignleft {float:left; margin-right:5px;} .recenzja-title {font-size: 150%; margin-top: 5px; margin-bottom: 5px;}'
cover_url = 'http://film.org.pl/wp-content/themes/KMF/images/logo_kmf10.png'
ignore_duplicate_articles = {'title', 'url'}
oldest_article = 7
max_articles_per_feed = 100
no_stylesheets = True
remove_javascript = True
remove_empty_feeds = True
use_embedded_content = True
preprocess_regexps = [(re.compile(ur'<h3>Przeczytaj także:</h3>.*', re.IGNORECASE|re.DOTALL), lambda m: '</body>'), (re.compile(ur'<div>Artykuł</div>', re.IGNORECASE), lambda m: ''), (re.compile(ur'<div>Ludzie filmu</div>', re.IGNORECASE), lambda m: '')]
remove_tags = [dict(name='img', attrs={'alt':['Ludzie filmu', u'Artykuł']})]
feeds = [(u'Recenzje', u'http://film.org.pl/r/recenzje/feed/'), (u'Artyku\u0142', u'http://film.org.pl/a/artykul/feed/'), (u'Analiza', u'http://film.org.pl/a/analiza/feed/'), (u'Ranking', u'http://film.org.pl/a/ranking/feed/'), (u'Blog', u'http://film.org.pl/kmf/blog/feed/'), (u'Ludzie', u'http://film.org.pl/a/ludzie/feed/'), (u'Seriale', u'http://film.org.pl/a/seriale/feed/'), (u'Oceanarium', u'http://film.org.pl/a/ocenarium/feed/'), (u'VHS', u'http://film.org.pl/a/vhs-a/feed/')]
use_embedded_content = False
remove_attributes = ['style']
preprocess_regexps = [(re.compile(ur'<h3>Przeczytaj także:</h3>.*', re.IGNORECASE|re.DOTALL), lambda m: '</body>'), (re.compile(ur'</?center>', re.IGNORECASE|re.DOTALL), lambda m: ''), (re.compile(ur'<div>Artykuł</div>', re.IGNORECASE), lambda m: ''), (re.compile(ur'<div>Ludzie filmu</div>', re.IGNORECASE), lambda m: ''), (re.compile(ur'(<br ?/?>\s*?){2,}', re.IGNORECASE|re.DOTALL), lambda m: '')]
keep_only_tags = [dict(name=['h11', 'h16', 'h17']), dict(attrs={'class':'editor'})]
remove_tags_after = dict(id='comments')
remove_tags = [dict(name=['link', 'meta', 'style']), dict(name='img', attrs={'alt':['Ludzie filmu', u'Artykuł']}), dict(id='comments'), dict(attrs={'style':'border: 0pt none ; margin: 0pt; padding: 0pt;'}), dict(name='p', attrs={'class':'rating'}), dict(attrs={'layout':'button_count'})]
feeds = [(u'Recenzje', u'http://film.org.pl/r/recenzje/feed/'), (u'Artyku\u0142', u'http://film.org.pl/a/artykul/feed/'), (u'Analiza', u'http://film.org.pl/a/analiza/feed/'), (u'Ranking', u'http://film.org.pl/a/ranking/feed/'), (u'Blog', u'http://film.org.pl/kmf/blog/feed/'), (u'Ludzie', u'http://film.org.pl/a/ludzie/feed/'), (u'Seriale', u'http://film.org.pl/a/seriale/feed/'), (u'Oceanarium', u'http://film.org.pl/a/ocenarium/feed/'), (u'VHS', u'http://film.org.pl/a/vhs-a/feed/')]
def append_page(self, soup, appendtag):
tag = soup.find('div', attrs={'class': 'pagelink'})
if tag:
for nexturl in tag.findAll('a'):
url = nexturl['href']
soup2 = self.index_to_soup(url)
pagetext = soup2.find(attrs={'class': 'editor'})
comments = pagetext.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
comment.extract()
pos = len(appendtag.contents)
appendtag.insert(pos, pagetext)
for r in appendtag.findAll(attrs={'class': 'pagelink'}):
r.extract()
for r in appendtag.findAll(attrs={'id': 'comments'}):
r.extract()
for r in appendtag.findAll(attrs={'style':'border: 0pt none ; margin: 0pt; padding: 0pt;'}):
r.extract()
for r in appendtag.findAll(attrs={'layout':'button_count'}):
r.extract()
def preprocess_html(self, soup):
for c in soup.findAll('h11'):
c.name = 'h1'
self.append_page(soup, soup.body)
for r in soup.findAll('br'):
r.extract()
return soup

View File

@ -1,6 +1,7 @@
from calibre.web.feeds.news import BasicNewsRecipe
import re
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup
class FilmWebPl(BasicNewsRecipe):
title = u'FilmWeb'
__author__ = 'fenuks'
@ -14,11 +15,12 @@ class FilmWebPl(BasicNewsRecipe):
no_stylesheets = True
remove_empty_feeds = True
ignore_duplicate_articles = {'title', 'url'}
preprocess_regexps = [(re.compile(u'\(kliknij\,\ aby powiększyć\)', re.IGNORECASE), lambda m: ''), ]#(re.compile(ur' | ', re.IGNORECASE), lambda m: '')]
remove_javascript = True
preprocess_regexps = [(re.compile(u'\(kliknij\,\ aby powiększyć\)', re.IGNORECASE), lambda m: ''), (re.compile(ur'(<br ?/?>\s*?<br ?/?>\s*?)+', re.IGNORECASE), lambda m: '<br />')]#(re.compile(ur' | ', re.IGNORECASE), lambda m: '')]
extra_css = '.hdrBig {font-size:22px;} ul {list-style-type:none; padding: 0; margin: 0;}'
remove_tags = [dict(name='div', attrs={'class':['recommendOthers']}), dict(name='ul', attrs={'class':'fontSizeSet'}), dict(attrs={'class':'userSurname anno'})]
#remove_tags = [dict()]
remove_attributes = ['style',]
keep_only_tags = [dict(name='h1', attrs={'class':['hdrBig', 'hdrEntity']}), dict(name='div', attrs={'class':['newsInfo', 'newsInfoSmall', 'reviewContent description']})]
keep_only_tags = [dict(attrs={'class':['hdr hdr-super', 'newsContent']})]
feeds = [(u'News / Filmy w produkcji', 'http://www.filmweb.pl/feed/news/category/filminproduction'),
(u'News / Festiwale, nagrody i przeglądy', u'http://www.filmweb.pl/feed/news/category/festival'),
(u'News / Seriale', u'http://www.filmweb.pl/feed/news/category/serials'),
@ -42,6 +44,11 @@ class FilmWebPl(BasicNewsRecipe):
if skip_tag is not None:
return self.index_to_soup(skip_tag['href'], raw=True)
def postprocess_html(self, soup, first_fetch):
for r in soup.findAll(attrs={'class':'singlephoto'}):
r['style'] = 'float:left; margin-right: 10px;'
return soup
def preprocess_html(self, soup):
for a in soup('a'):
if a.has_key('href') and 'http://' not in a['href'] and 'https://' not in a['href']:
@ -51,9 +58,8 @@ class FilmWebPl(BasicNewsRecipe):
for i in soup.findAll('sup'):
if not i.string or i.string.startswith('(kliknij'):
i.extract()
tag = soup.find(name='ul', attrs={'class':'inline sep-line'})
if tag:
tag.name = 'div'
for t in tag.findAll('li'):
t.name = 'div'
for r in soup.findAll(id=re.compile(r'photo-\d+')):
r.extract()
for r in soup.findAll(style=re.compile('float: ?left')):
r['class'] = 'singlephoto'
return soup

View File

@ -1,6 +1,5 @@
from calibre.web.feeds.news import BasicNewsRecipe
import re
from calibre.ptempfile import PersistentTemporaryFile
class ForeignAffairsRecipe(BasicNewsRecipe):
''' there are three modifications:
@ -45,7 +44,6 @@ class ForeignAffairsRecipe(BasicNewsRecipe):
'publisher': publisher}
temp_files = []
articles_are_obfuscated = True
def get_cover_url(self):
soup = self.index_to_soup(self.FRONTPAGE)
@ -53,20 +51,6 @@ class ForeignAffairsRecipe(BasicNewsRecipe):
img_url = div.find('img')['src']
return self.INDEX + img_url
def get_obfuscated_article(self, url):
br = self.get_browser()
br.open(url)
response = br.follow_link(url_regex = r'/print/[0-9]+', nr = 0)
html = response.read()
self.temp_files.append(PersistentTemporaryFile('_fa.html'))
self.temp_files[-1].write(html)
self.temp_files[-1].close()
return self.temp_files[-1].name
def parse_index(self):
answer = []
@ -89,10 +73,10 @@ class ForeignAffairsRecipe(BasicNewsRecipe):
if div.find('a') is not None:
originalauthor=self.tag_to_string(div.findNext('div', attrs = {'class':'views-field-field-article-book-nid'}).div.a)
title=subsectiontitle+': '+self.tag_to_string(div.span.a)+' by '+originalauthor
url=self.INDEX+div.span.a['href']
url=self.INDEX+self.index_to_soup(self.INDEX+div.span.a['href']).find('a', attrs={'class':'fa_addthis_print'})['href']
atr=div.findNext('div', attrs = {'class': 'views-field-field-article-display-authors-value'})
if atr is not None:
author=self.tag_to_string(atr.span.a)
author=self.tag_to_string(atr.span)
else:
author=''
desc=div.findNext('span', attrs = {'class': 'views-field-field-article-summary-value'})
@ -106,10 +90,10 @@ class ForeignAffairsRecipe(BasicNewsRecipe):
for div in sec.findAll('div', attrs = {'class': 'views-field-title'}):
if div.find('a') is not None:
title=self.tag_to_string(div.span.a)
url=self.INDEX+div.span.a['href']
url=self.INDEX+self.index_to_soup(self.INDEX+div.span.a['href']).find('a', attrs={'class':'fa_addthis_print'})['href']
atr=div.findNext('div', attrs = {'class': 'views-field-field-article-display-authors-value'})
if atr is not None:
author=self.tag_to_string(atr.span.a)
author=self.tag_to_string(atr.span)
else:
author=''
desc=div.findNext('span', attrs = {'class': 'views-field-field-article-summary-value'})
@ -119,7 +103,7 @@ class ForeignAffairsRecipe(BasicNewsRecipe):
description=''
articles.append({'title':title, 'date':None, 'url':url, 'description':description, 'author':author})
if articles:
answer.append((section, articles))
answer.append((section, articles))
return answer
def preprocess_html(self, soup):

View File

@ -0,0 +1,75 @@
from calibre.web.feeds.recipes import BasicNewsRecipe
from collections import OrderedDict
class Fortune(BasicNewsRecipe):
title = 'Fortune Magazine'
__author__ = 'Rick Shang'
description = 'FORTUNE is a global business magazine that has been revered in its content and credibility since 1930. FORTUNE covers the entire field of business, including specific companies and business trends, prominent business leaders, and new ideas shaping the global marketplace.'
language = 'en'
category = 'news'
encoding = 'UTF-8'
keep_only_tags = [dict(attrs={'id':['storycontent']})]
remove_tags = [dict(attrs={'class':['hed_side','socialMediaToolbarContainer']})]
remove_javascript = True
no_stylesheets = True
needs_subscription = True
def get_browser(self):
br = BasicNewsRecipe.get_browser(self)
br.open('http://money.cnn.com/2013/03/21/smallbusiness/legal-marijuana-startups.pr.fortune/index.html')
br.select_form(name="paywall-form")
br['email'] = self.username
br['password'] = self.password
br.submit()
return br
def parse_index(self):
articles = []
soup0 = self.index_to_soup('http://money.cnn.com/magazines/fortune/')
# Go to the latest issue
soup = self.index_to_soup(soup0.find('div',attrs={'class':'latestissue'}).find('a',href=True)['href'])
#Find cover & date
cover_item = soup.find('div', attrs={'id':'cover-story'})
cover = cover_item.find('img',src=True)
self.cover_url = cover['src']
date = self.tag_to_string(cover_item.find('div', attrs={'class':'tocDate'})).strip()
self.timefmt = u' [%s]'%date
feeds = OrderedDict()
section_title = ''
# Check out the cover story
articles = []
coverstory=soup.find('div', attrs={'class':'cnnHeadline'})
title=self.tag_to_string(coverstory.a).strip()
url=coverstory.a['href']
desc=self.tag_to_string(coverstory.findNext('p', attrs={'class':'cnnBlurbTxt'}))
articles.append({'title':title, 'url':url, 'description':desc, 'date':''})
feeds['Cover Story'] = []
feeds['Cover Story'] += articles
for post in soup.findAll('div', attrs={'class':'cnnheader'}):
section_title = self.tag_to_string(post).strip()
articles = []
ul=post.findNext('ul')
for link in ul.findAll('li'):
links=link.find('h2')
title=self.tag_to_string(links.a).strip()
url=links.a['href']
desc=self.tag_to_string(link.find('p', attrs={'class':'cnnBlurbTxt'}))
articles.append({'title':title, 'url':url, 'description':desc, 'date':''})
if articles:
if section_title not in feeds:
feeds[section_title] = []
feeds[section_title] += articles
ans = [(key, val) for key, val in feeds.iteritems()]
return ans

View File

@ -1,5 +1,6 @@
import re
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Comment
class GazetaLubuska(BasicNewsRecipe):
title = u'Gazeta Lubuska'
@ -58,6 +59,10 @@ class GazetaLubuska(BasicNewsRecipe):
if pagetext:
pos = len(appendtag.contents)
appendtag.insert(pos, pagetext)
comments = appendtag.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
comment.extract()
def preprocess_html(self, soup):
self.append_page(soup, soup.body)

View File

@ -1,5 +1,6 @@
import re
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Comment
class GazetaPomorska(BasicNewsRecipe):
title = u'Gazeta Pomorska'
@ -85,6 +86,10 @@ class GazetaPomorska(BasicNewsRecipe):
if pagetext:
pos = len(appendtag.contents)
appendtag.insert(pos, pagetext)
comments = appendtag.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
comment.extract()
def preprocess_html(self, soup):
self.append_page(soup, soup.body)

View File

@ -1,5 +1,6 @@
import re
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Comment
class GazetaWspolczesna(BasicNewsRecipe):
title = u'Gazeta Wsp\xf3\u0142czesna'
@ -57,6 +58,10 @@ class GazetaWspolczesna(BasicNewsRecipe):
if pagetext:
pos = len(appendtag.contents)
appendtag.insert(pos, pagetext)
comments = appendtag.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
comment.extract()
def preprocess_html(self, soup):
self.append_page(soup, soup.body)

View File

@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Comment
class Gazeta_Wyborcza(BasicNewsRecipe):
title = u'Gazeta.pl'
@ -16,6 +16,7 @@ class Gazeta_Wyborcza(BasicNewsRecipe):
max_articles_per_feed = 100
remove_javascript = True
no_stylesheets = True
ignore_duplicate_articles = {'title', 'url'}
remove_tags_before = dict(id='k0')
remove_tags_after = dict(id='banP4')
remove_tags = [dict(name='div', attrs={'class':'rel_box'}), dict(attrs={'class':['date', 'zdjP', 'zdjM', 'pollCont', 'rel_video', 'brand', 'txt_upl']}), dict(name='div', attrs={'id':'footer'})]
@ -48,6 +49,9 @@ class Gazeta_Wyborcza(BasicNewsRecipe):
url = self.INDEX + link['href']
soup2 = self.index_to_soup(url)
pagetext = soup2.find(id='artykul')
comments = pagetext.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
comment.extract()
pos = len(appendtag.contents)
appendtag.insert(pos, pagetext)
tag = soup2.find('div', attrs={'id': 'Str'})
@ -65,6 +69,9 @@ class Gazeta_Wyborcza(BasicNewsRecipe):
nexturl = pagetext.find(id='gal_btn_next')
if nexturl:
nexturl = nexturl.a['href']
comments = pagetext.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
comment.extract()
pos = len(appendtag.contents)
appendtag.insert(pos, pagetext)
rem = appendtag.find(id='gal_navi')
@ -105,3 +112,7 @@ class Gazeta_Wyborcza(BasicNewsRecipe):
soup = self.index_to_soup('http://wyborcza.pl/' + cover.contents[3].a['href'])
self.cover_url = 'http://wyborcza.pl' + soup.img['src']
return getattr(self, 'cover_url', self.cover_url)
'''def image_url_processor(self, baseurl, url):
print "@@@@@@@@", url
return url.replace('http://wyborcza.pl/ ', '')'''

View File

@ -1,5 +1,6 @@
import re
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Comment
class GCN(BasicNewsRecipe):
title = u'Gazeta Codziennej Nowiny'
@ -16,36 +17,36 @@ class GCN(BasicNewsRecipe):
remove_empty_feeds = True
no_stylesheets = True
ignore_duplicate_articles = {'title', 'url'}
preprocess_regexps = [(re.compile(ur'Czytaj:.*?</a>', re.DOTALL), lambda match: ''), (re.compile(ur'Przeczytaj także:.*?</a>', re.DOTALL|re.IGNORECASE), lambda match: ''),
remove_attributes = ['style']
preprocess_regexps = [(re.compile(ur'Czytaj:.*?</a>', re.DOTALL), lambda match: ''), (re.compile(ur'Przeczytaj także:.*?</a>', re.DOTALL|re.IGNORECASE), lambda match: ''),
(re.compile(ur'Przeczytaj również:.*?</a>', re.DOTALL|re.IGNORECASE), lambda match: ''), (re.compile(ur'Zobacz też:.*?</a>', re.DOTALL|re.IGNORECASE), lambda match: '')]
keep_only_tags = [dict(id=['article', 'cover', 'photostory'])]
remove_tags = [dict(id=['articleTags', 'articleMeta', 'boxReadIt', 'articleGalleries', 'articleConnections',
'ForumArticleComments', 'articleRecommend', 'jedynkiLinks', 'articleGalleryConnections',
'photostoryConnections', 'articleEpaper', 'articlePoll', 'articleAlarm', 'articleByline']),
'ForumArticleComments', 'articleRecommend', 'jedynkiLinks', 'articleGalleryConnections',
'photostoryConnections', 'articleEpaper', 'articlePoll', 'articleAlarm', 'articleByline']),
dict(attrs={'class':'articleFunctions'})]
feeds = [(u'Wszystkie', u'http://www.nowiny24.pl/rss.xml'),
(u'Podkarpacie', u'http://www.nowiny24.pl/podkarpacie.xml'),
(u'Bieszczady', u'http://www.nowiny24.pl/bieszczady.xml'),
(u'Rzeszów', u'http://www.nowiny24.pl/rzeszow.xml'),
(u'Przemyśl', u'http://www.nowiny24.pl/przemysl.xml'),
(u'Leżajsk', u'http://www.nowiny24.pl/lezajsk.xml'),
(u'Łańcut', u'http://www.nowiny24.pl/lancut.xml'),
(u'Dębica', u'http://www.nowiny24.pl/debica.xml'),
(u'Jarosław', u'http://www.nowiny24.pl/jaroslaw.xml'),
(u'Krosno', u'http://www.nowiny24.pl/krosno.xml'),
(u'Mielec', u'http://www.nowiny24.pl/mielec.xml'),
(u'Nisko', u'http://www.nowiny24.pl/nisko.xml'),
(u'Sanok', u'http://www.nowiny24.pl/sanok.xml'),
(u'Stalowa Wola', u'http://www.nowiny24.pl/stalowawola.xml'),
(u'Tarnobrzeg', u'http://www.nowiny24.pl/tarnobrzeg.xml'),
(u'Sport', u'http://www.nowiny24.pl/sport.xml'),
(u'Dom', u'http://www.nowiny24.pl/dom.xml'),
(u'Auto', u'http://www.nowiny24.pl/auto.xml'),
(u'Praca', u'http://www.nowiny24.pl/praca.xml'),
(u'Zdrowie', u'http://www.nowiny24.pl/zdrowie.xml'),
feeds = [(u'Wszystkie', u'http://www.nowiny24.pl/rss.xml'),
(u'Podkarpacie', u'http://www.nowiny24.pl/podkarpacie.xml'),
(u'Bieszczady', u'http://www.nowiny24.pl/bieszczady.xml'),
(u'Rzeszów', u'http://www.nowiny24.pl/rzeszow.xml'),
(u'Przemyśl', u'http://www.nowiny24.pl/przemysl.xml'),
(u'Leżajsk', u'http://www.nowiny24.pl/lezajsk.xml'),
(u'Łańcut', u'http://www.nowiny24.pl/lancut.xml'),
(u'Dębica', u'http://www.nowiny24.pl/debica.xml'),
(u'Jarosław', u'http://www.nowiny24.pl/jaroslaw.xml'),
(u'Krosno', u'http://www.nowiny24.pl/krosno.xml'),
(u'Mielec', u'http://www.nowiny24.pl/mielec.xml'),
(u'Nisko', u'http://www.nowiny24.pl/nisko.xml'),
(u'Sanok', u'http://www.nowiny24.pl/sanok.xml'),
(u'Stalowa Wola', u'http://www.nowiny24.pl/stalowawola.xml'),
(u'Tarnobrzeg', u'http://www.nowiny24.pl/tarnobrzeg.xml'),
(u'Sport', u'http://www.nowiny24.pl/sport.xml'),
(u'Dom', u'http://www.nowiny24.pl/dom.xml'),
(u'Auto', u'http://www.nowiny24.pl/auto.xml'),
(u'Praca', u'http://www.nowiny24.pl/praca.xml'),
(u'Zdrowie', u'http://www.nowiny24.pl/zdrowie.xml'),
(u'Wywiady', u'http://www.nowiny24.pl/wywiady.xml')]
def get_cover_url(self):
@ -78,6 +79,10 @@ class GCN(BasicNewsRecipe):
pos = len(appendtag.contents)
appendtag.insert(pos, pagetext)
comments = appendtag.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
comment.extract()
def preprocess_html(self, soup):
self.append_page(soup, soup.body)
return soup

View File

@ -11,12 +11,13 @@ class Gildia(BasicNewsRecipe):
language = 'pl'
oldest_article = 8
max_articles_per_feed = 100
remove_empty_feeds=True
no_stylesheets=True
remove_empty_feeds = True
no_stylesheets = True
ignore_duplicate_articles = {'title', 'url'}
preprocess_regexps = [(re.compile(ur'</?sup>'), lambda match: '') ]
remove_tags=[dict(name='div', attrs={'class':'backlink'}), dict(name='div', attrs={'class':'im_img'}), dict(name='div', attrs={'class':'addthis_toolbox addthis_default_style'})]
keep_only_tags=dict(name='div', attrs={'class':'widetext'})
ignore_duplicate_articles = {'title', 'url'}
remove_tags = [dict(name='div', attrs={'class':'backlink'}), dict(name='div', attrs={'class':'im_img'}), dict(name='div', attrs={'class':'addthis_toolbox addthis_default_style'})]
keep_only_tags = dict(name='div', attrs={'class':'widetext'})
feeds = [(u'Gry', u'http://www.gry.gildia.pl/rss'), (u'Literatura', u'http://www.literatura.gildia.pl/rss'), (u'Film', u'http://www.film.gildia.pl/rss'), (u'Horror', u'http://www.horror.gildia.pl/rss'), (u'Konwenty', u'http://www.konwenty.gildia.pl/rss'), (u'Plansz\xf3wki', u'http://www.planszowki.gildia.pl/rss'), (u'Manga i anime', u'http://www.manga.gildia.pl/rss'), (u'Star Wars', u'http://www.starwars.gildia.pl/rss'), (u'Techno', u'http://www.techno.gildia.pl/rss'), (u'Historia', u'http://www.historia.gildia.pl/rss'), (u'Magia', u'http://www.magia.gildia.pl/rss'), (u'Bitewniaki', u'http://www.bitewniaki.gildia.pl/rss'), (u'RPG', u'http://www.rpg.gildia.pl/rss'), (u'LARP', u'http://www.larp.gildia.pl/rss'), (u'Muzyka', u'http://www.muzyka.gildia.pl/rss'), (u'Nauka', u'http://www.nauka.gildia.pl/rss')]
@ -34,7 +35,7 @@ class Gildia(BasicNewsRecipe):
def preprocess_html(self, soup):
for a in soup('a'):
if a.has_key('href') and 'http://' not in a['href'] and 'https://' not in a['href']:
if a.has_key('href') and not a['href'].startswith('http'):
if '/gry/' in a['href']:
a['href']='http://www.gry.gildia.pl' + a['href']
elif u'książk' in soup.title.string.lower() or u'komiks' in soup.title.string.lower():

26
recipes/gofin_pl.recipe Normal file
View File

@ -0,0 +1,26 @@
#!/usr/bin/env python
__license__ = 'GPL v3'
__author__ = 'teepel <teepel44@gmail.com>'
'''
gofin.pl
'''
from calibre.web.feeds.news import BasicNewsRecipe
class gofin(BasicNewsRecipe):
title = u'Gofin'
__author__ = 'teepel <teepel44@gmail.com>'
language = 'pl'
description =u'Portal Podatkowo-Księgowy'
INDEX='http://gofin.pl'
oldest_article = 7
max_articles_per_feed = 100
remove_empty_feeds= True
simultaneous_downloads = 5
remove_javascript=True
no_stylesheets=True
auto_cleanup = True
feeds = [(u'Podatki', u'http://www.rss.gofin.pl/podatki.xml'), (u'Prawo Pracy', u'http://www.rss.gofin.pl/prawopracy.xml'), (u'Rachunkowo\u015b\u0107', u'http://www.rss.gofin.pl/rachunkowosc.xml'), (u'Sk\u0142adki, zasi\u0142ki, emerytury', u'http://www.rss.gofin.pl/zasilki.xml'),(u'Firma', u'http://www.rss.gofin.pl/firma.xml'), (u'Prawnik radzi', u'http://www.rss.gofin.pl/prawnikradzi.xml')]

View File

@ -1,22 +1,23 @@
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup
class Gram_pl(BasicNewsRecipe):
title = u'Gram.pl'
__author__ = 'fenuks'
description = u'Serwis społecznościowy o grach: recenzje, newsy, zapowiedzi, encyklopedia gier, forum. Gry PC, PS3, X360, PS Vita, sprzęt dla graczy.'
category = 'games'
language = 'pl'
title = u'Gram.pl'
__author__ = 'fenuks'
description = u'Serwis społecznościowy o grach: recenzje, newsy, zapowiedzi, encyklopedia gier, forum. Gry PC, PS3, X360, PS Vita, sprzęt dla graczy.'
category = 'games'
language = 'pl'
oldest_article = 8
index='http://www.gram.pl'
max_articles_per_feed = 100
ignore_duplicate_articles = {'title', 'url'}
no_stylesheets= True
remove_empty_feeds = True
#extra_css = 'h2 {font-style: italic; font-size:20px;} .picbox div {float: left;}'
#extra_css = 'h2 {font-style: italic; font-size:20px;} .picbox div {float: left;}'
cover_url=u'http://www.gram.pl/www/01/img/grampl_zima.png'
keep_only_tags= [dict(id='articleModule')]
remove_tags = [dict(attrs={'class':['breadCrump', 'dymek', 'articleFooter', 'twitter-share-button']})]
feeds = [(u'Informacje', u'http://www.gram.pl/feed_news.asp'),
remove_tags = [dict(attrs={'class':['breadCrump', 'dymek', 'articleFooter', 'twitter-share-button']}), dict(name='aside')]
feeds = [(u'Informacje', u'http://www.gram.pl/feed_news.asp'),
(u'Publikacje', u'http://www.gram.pl/feed_news.asp?type=articles')
]
@ -45,4 +46,4 @@ class Gram_pl(BasicNewsRecipe):
tag=soup.find(name='span', attrs={'class':'platforma'})
if tag:
tag.name = 'p'
return soup
return soup

View File

@ -1,5 +1,6 @@
import time
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Comment
class GryOnlinePl(BasicNewsRecipe):
title = u'Gry-Online.pl'
@ -40,10 +41,14 @@ class GryOnlinePl(BasicNewsRecipe):
r.extract()
for r in pagetext.findAll(attrs={'itemprop':'description'}):
r.extract()
pos = len(appendtag.contents)
appendtag.insert(pos, pagetext)
for r in appendtag.findAll(attrs={'class':['n5p', 'add-info', 'twitter-share-button', 'lista lista3 lista-gry']}):
r.extract()
comments = appendtag.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
comment.extract()
else:
tag = appendtag.find('div', attrs={'class':'S018stronyr'})
if tag:
@ -70,16 +75,22 @@ class GryOnlinePl(BasicNewsRecipe):
r.extract()
for r in pagetext.findAll(attrs={'itemprop':'description'}):
r.extract()
comments = pagetext.findAll(text=lambda text:isinstance(text, Comment))
[comment.extract() for comment in comments]
pos = len(appendtag.contents)
appendtag.insert(pos, pagetext)
for r in appendtag.findAll(attrs={'class':['n5p', 'add-info', 'twitter-share-button', 'lista lista3 lista-gry', 'S018strony']}):
r.extract()
comments = appendtag.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
comment.extract()
def image_url_processor(self, baseurl, url):
if url.startswith('..'):
return url[2:]
else:
return url
return url
def preprocess_html(self, soup):
self.append_page(soup, soup.body)

View File

@ -77,10 +77,9 @@ class Harpers_full(BasicNewsRecipe):
self.timefmt = u' [%s]'%date
#get cover
coverurl='http://harpers.org/wp-content/themes/harpers/ajax_microfiche.php?img=harpers-'+re.split('harpers.org/',currentIssue_url)[1]+'gif/0001.gif'
soup2 = self.index_to_soup(coverurl)
self.cover_url = self.tag_to_string(soup2.find('img')['src'])
self.cover_url = soup1.find('div', attrs = {'class':'picture_hp'}).find('img', src=True)['src']
self.log(self.cover_url)
articles = []
count = 0
for item in soup1.findAll('div', attrs={'class':'articleData'}):

View File

@ -2,7 +2,6 @@ from __future__ import with_statement
__license__ = 'GPL 3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
import time
from calibre.web.feeds.news import BasicNewsRecipe
class TheHindu(BasicNewsRecipe):
@ -14,44 +13,42 @@ class TheHindu(BasicNewsRecipe):
max_articles_per_feed = 100
no_stylesheets = True
keep_only_tags = [dict(id='content')]
remove_tags = [dict(attrs={'class':['article-links', 'breadcr']}),
dict(id=['email-section', 'right-column', 'printfooter', 'topover',
'slidebox', 'th_footer'])]
auto_cleanup = True
extra_css = '.photo-caption { font-size: smaller }'
def preprocess_raw_html(self, raw, url):
return raw.replace('<body><p>', '<p>').replace('</p></body>', '</p>')
def postprocess_html(self, soup, first_fetch):
for t in soup.findAll(['table', 'tr', 'td','center']):
t.name = 'div'
return soup
def parse_index(self):
today = time.strftime('%Y-%m-%d')
soup = self.index_to_soup(
'http://www.thehindu.com/todays-paper/tp-index/?date=' + today)
div = soup.find(id='left-column')
feeds = []
soup = self.index_to_soup('http://www.thehindu.com/todays-paper/')
div = soup.find('div', attrs={'id':'left-column'})
soup.find(id='subnav-tpbar').extract()
current_section = None
current_articles = []
for x in div.findAll(['h3', 'div']):
if current_section and x.get('class', '') == 'tpaper':
a = x.find('a', href=True)
if a is not None:
title = self.tag_to_string(a)
self.log('\tFound article:', title)
current_articles.append({'url':a['href']+'?css=print',
'title':title, 'date': '',
'description':''})
if x.name == 'h3':
if current_section and current_articles:
feeds = []
for x in div.findAll(['a', 'span']):
if x.name == 'span' and x['class'] == 's-link':
# Section heading found
if current_articles and current_section:
feeds.append((current_section, current_articles))
current_section = self.tag_to_string(x)
self.log('Found section:', current_section)
current_articles = []
self.log('\tFound section:', current_section)
elif x.name == 'a':
title = self.tag_to_string(x)
url = x.get('href', False)
if not url or not title:
continue
self.log('\t\tFound article:', title)
self.log('\t\t\t', url)
current_articles.append({'title': title, 'url':url,
'description':'', 'date':''})
if current_articles and current_section:
feeds.append((current_section, current_articles))
return feeds
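
Reviewer note: the rewritten parse_index above walks a flat stream of section headings and article links and flushes each section when the next heading appears. A minimal standalone sketch of that grouping logic follows, assuming the page has already been reduced to plain tuples. The function name group_into_sections and the tuple shapes are hypothetical and are not part of the recipe or of calibre.

```python
def group_into_sections(items):
    """Group ('section', title) and ('article', title, url) tuples into feeds.

    Hypothetical helper for illustration only; the recipe itself iterates
    over BeautifulSoup tags rather than tuples.
    """
    feeds = []
    current_section = None
    current_articles = []
    for item in items:
        if item[0] == 'section':
            # A new heading closes the previous section, if it has articles.
            if current_section and current_articles:
                feeds.append((current_section, current_articles))
            current_section = item[1]
            current_articles = []
        elif item[0] == 'article':
            title, url = item[1], item[2]
            if not title or not url:
                continue  # skip links without a title or a target
            current_articles.append(
                {'title': title, 'url': url, 'description': '', 'date': ''})
    # The last section has no following heading, so flush it explicitly.
    if current_section and current_articles:
        feeds.append((current_section, current_articles))
    return feeds
```

As in the recipe, articles seen before the first heading are discarded when that heading arrives, and sections that collect no articles never reach the result.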

View File

@ -1,27 +1,22 @@
from calibre.web.feeds.news import BasicNewsRecipe
class Historia_org_pl(BasicNewsRecipe):
title = u'Historia.org.pl'
__author__ = 'fenuks'
description = u'Artykuły dotyczące historii w układzie epok i tematów, forum. Najlepsza strona historii. Matura z historii i egzamin gimnazjalny z historii.'
cover_url = 'http://lh3.googleusercontent.com/_QeRQus12wGg/TOvHsZ2GN7I/AAAAAAAAD_o/LY1JZDnq7ro/logo5.jpg'
category = 'history'
language = 'pl'
title = u'Historia.org.pl'
__author__ = 'fenuks'
description = u'Artykuły dotyczące historii w układzie epok i tematów, forum. Najlepsza strona historii. Matura z historii i egzamin gimnazjalny z historii.'
cover_url = 'http://lh3.googleusercontent.com/_QeRQus12wGg/TOvHsZ2GN7I/AAAAAAAAD_o/LY1JZDnq7ro/logo5.jpg'
category = 'history'
language = 'pl'
oldest_article = 8
extra_css = 'img {float: left; margin-right: 10px;} .alignleft {float: left; margin-right: 10px;}'
remove_empty_feeds= True
no_stylesheets = True
use_embedded_content = True
max_articles_per_feed = 100
ignore_duplicate_articles = {'title', 'url'}
feeds = [(u'Wszystkie', u'http://historia.org.pl/feed/'),
(u'Wiadomości', u'http://historia.org.pl/Kategoria/wiadomosci/feed/'),
(u'Publikacje', u'http://historia.org.pl/Kategoria/artykuly/feed/'),
(u'Publicystyka', u'http://historia.org.pl/Kategoria/publicystyka/feed/'),
(u'Recenzje', u'http://historia.org.pl/Kategoria/recenzje/feed/'),
(u'Projekty', u'http://historia.org.pl/Kategoria/projekty/feed/'),]
def print_version(self, url):
return url + '?tmpl=component&print=1&layout=default&page='
feeds = [(u'Wszystkie', u'http://historia.org.pl/feed/'),
(u'Wiadomości', u'http://historia.org.pl/Kategoria/wiadomosci/feed/'),
(u'Publikacje', u'http://historia.org.pl/Kategoria/artykuly/feed/'),
(u'Publicystyka', u'http://historia.org.pl/Kategoria/publicystyka/feed/'),
(u'Recenzje', u'http://historia.org.pl/Kategoria/recenzje/feed/'),
(u'Projekty', u'http://historia.org.pl/Kategoria/projekty/feed/'),]

View File

@ -1,6 +1,6 @@
import re
from calibre.web.feeds.recipes import BasicNewsRecipe
from collections import OrderedDict
import re
from calibre.web.feeds.news import BasicNewsRecipe
class HistoryToday(BasicNewsRecipe):
@ -19,7 +19,6 @@ class HistoryToday(BasicNewsRecipe):
needs_subscription = True
def get_browser(self):
br = BasicNewsRecipe.get_browser(self)
if self.username is not None and self.password is not None:
@ -46,8 +45,9 @@ class HistoryToday(BasicNewsRecipe):
#Go to issue
soup = self.index_to_soup('http://www.historytoday.com/contents')
cover = soup.find('div',attrs={'id':'content-area'}).find('img')['src']
cover = soup.find('div',attrs={'id':'content-area'}).find('img', attrs={'src':re.compile('.*cover.*')})['src']
self.cover_url=cover
self.log(self.cover_url)
#Go to the main body
@ -84,4 +84,3 @@ class HistoryToday(BasicNewsRecipe):
def cleanup(self):
self.browser.open('http://www.historytoday.com/logout')

Binary file not shown. After: 898 B

Binary file not shown. After: 755 B

recipes/icons/esenja.png Normal file
Binary file not shown. After: 329 B

Binary file not shown. After: 329 B

recipes/icons/gofin_pl.png Normal file
Binary file not shown. After: 618 B

recipes/icons/histmag.png Normal file
Binary file not shown. After: 537 B

Binary file not shown. Before: 806 B. After: 869 B

Binary file not shown. After: 857 B

recipes/icons/km_blog.png Normal file
Binary file not shown. After: 532 B

Binary file not shown. After: 1.3 KiB

recipes/icons/nowy_obywatel.png Executable file → Normal file
Before: 480 B. After: 480 B

Binary file not shown. After: 697 B

recipes/icons/sport_pl.png Normal file
Binary file not shown. After: 627 B

Binary file not shown. After: 863 B

View File

@ -1,5 +1,7 @@
from calibre.web.feeds.news import BasicNewsRecipe
import re
from calibre.ebooks.BeautifulSoup import Comment
class in4(BasicNewsRecipe):
title = u'IN4.pl'
oldest_article = 7
@ -8,14 +10,14 @@ class in4(BasicNewsRecipe):
description = u'Serwis Informacyjny - Aktualnosci, recenzje'
category = 'IT'
language = 'pl'
index='http://www.in4.pl/'
index = 'http://www.in4.pl/'
#cover_url= 'http://www.in4.pl/recenzje/337/in4pl.jpg'
no_stylesheets = True
remove_empty_feeds = True
preprocess_regexps = [(re.compile(ur'<a title="translate into.*?</a>', re.DOTALL), lambda match: '') ]
keep_only_tags=[dict(name='div', attrs={'class':'left_alone'})]
remove_tags_after=dict(name='img', attrs={'title':'komentarze'})
remove_tags=[dict(name='img', attrs={'title':'komentarze'})]
keep_only_tags = [dict(name='div', attrs={'class':'left_alone'})]
remove_tags_after = dict(name='img', attrs={'title':'komentarze'})
remove_tags = [dict(name='img', attrs={'title':'komentarze'})]
feeds = [(u'Wiadomo\u015bci', u'http://www.in4.pl/rss.php'), (u'Recenzje', u'http://www.in4.pl/rss_recenzje.php'), (u'Mini recenzje', u'http://www.in4.pl/rss_mini.php')]
def append_page(self, soup, appendtag):
@ -28,10 +30,13 @@ class in4(BasicNewsRecipe):
while nexturl:
soup2 = self.index_to_soup(nexturl)
pagetext = soup2.find(id='news')
comments = pagetext.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
comment.extract()
pos = len(appendtag.contents)
appendtag.insert(pos, pagetext)
nexturl=None
tag=soup2.findAll('a')
nexturl = None
tag = soup2.findAll('a')
for z in tag:
if z.string and u'następna str' in z.string:
nexturl='http://www.in4.pl/' + z['href']

View File

@ -1,21 +1,20 @@
from calibre.web.feeds.news import BasicNewsRecipe
class INFRA(BasicNewsRecipe):
title = u'INFRA'
title = u'INFRA'
oldest_article = 7
max_articles_per_feed = 100
__author__ = 'fenuks'
description = u'Serwis Informacyjny INFRA - UFO, Zjawiska Paranormalne, Duchy, Tajemnice świata.'
cover_url = 'http://npn.nazwa.pl/templates/ja_teline_ii/images/logo.jpg'
category = 'UFO'
__author__ = 'fenuks'
description = u'Serwis Informacyjny INFRA - UFO, Zjawiska Paranormalne, Duchy, Tajemnice świata.'
cover_url = 'http://i.imgur.com/j7hJT.jpg'
category = 'UFO'
index='http://infra.org.pl'
language = 'pl'
language = 'pl'
max_articles_per_feed = 100
no_stylesheers=True
remove_tags_before=dict(name='h2', attrs={'class':'contentheading'})
remove_tags_after=dict(attrs={'class':'pagenav'})
remove_tags=[dict(attrs={'class':'pagenav'})]
feeds = [(u'Najnowsze wiadomo\u015bci', u'http://www.infra.org.pl/index.php?option=com_rd_rss&id=1')]
remove_attributes = ['style']
no_stylesheets = True
keep_only_tags = [dict(id='ja-current-content')]
feeds = [(u'Najnowsze wiadomo\u015bci', u'http://www.infra.org.pl/rss')]
def preprocess_html(self, soup):
for item in soup.findAll(style=True):

View File

@ -1,23 +1,24 @@
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = u'2010, Tomasz Dlugosz <tomek3d@gmail.com>'
__copyright__ = u'2010-2013, Tomasz Dlugosz <tomek3d@gmail.com>'
'''
fakty.interia.pl
'''
import re
from calibre.web.feeds.news import BasicNewsRecipe
class InteriaFakty(BasicNewsRecipe):
title = u'Interia.pl - Fakty'
description = u'Fakty ze strony interia.pl'
language = 'pl'
oldest_article = 7
oldest_article = 1
__author__ = u'Tomasz D\u0142ugosz'
simultaneous_downloads = 2
no_stylesheets = True
remove_javascript = True
max_articles_per_feed = 100
remove_empty_feeds= True
use_embedded_content = False
ignore_duplicate_articles = {'title', 'url'}
feeds = [(u'Kraj', u'http://kanaly.rss.interia.pl/kraj.xml'),
(u'\u015awiat', u'http://kanaly.rss.interia.pl/swiat.xml'),
@ -26,14 +27,36 @@ class InteriaFakty(BasicNewsRecipe):
(u'Wywiady', u'http://kanaly.rss.interia.pl/wywiady.xml'),
(u'Ciekawostki', u'http://kanaly.rss.interia.pl/ciekawostki.xml')]
keep_only_tags = [dict(name='div', attrs={'id':'article'})]
keep_only_tags = [
dict(name='h1'),
dict(name='div', attrs={'class': ['lead textContent', 'text textContent', 'source']})]
remove_tags = [
dict(name='div', attrs={'class':'box fontSizeSwitch'}),
dict(name='div', attrs={'class':'clear'}),
dict(name='div', attrs={'class':'embed embedLeft articleEmbedArticleList articleEmbedArticleListTitle'}),
dict(name='span', attrs={'class':'keywords'})]
remove_tags = [dict(name='div', attrs={'class':['embed embedAd', 'REMOVE', 'boxHeader']})]
preprocess_regexps = [
(re.compile(i[0], re.IGNORECASE | re.DOTALL), i[1]) for i in
[
(r'embed embed(Left|Right|Center) articleEmbed(Audio|Wideo articleEmbedVideo|ArticleFull|ArticleTitle|ArticleListTitle|AlbumHorizontal)">', lambda match: 'REMOVE">'),
(r'</div> <div class="source">', lambda match: ''),
(r'<p><a href="http://forum.interia.pl.*?</a></p>', lambda match: '')
]
]
def get_article_url(self, article):
link = article.get('link', None)
if link and 'galerie' not in link and link.split('/')[-1]=="story01.htm":
link=link.split('/')[-2]
encoding = {'0B': '.', '0C': '/', '0A': '0', '0F': '=', '0G': '&',
'0D': '?', '0E': '-', '0H': ',', '0I': '_', '0N': '.com', '0L': 'http://'}
for k, v in encoding.iteritems():
link = link.replace(k, v)
return link
def print_version(self, url):
chunks = url.split(',')
return chunks[0] + '/podglad-wydruku'+ ',' + ','.join(chunks[1:])
extra_css = '''
h2 { font-size: 1.2em; }
'''
h1 { font-size:130% }
div.info { font-style:italic; font-size:70%}
'''
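
Reviewer note: get_article_url above undoes the feedsportal escaping by calling str.replace once per table entry, so the result can depend on dictionary iteration order (for example, '0AB' decodes to '0B' or to '.' depending on which entry runs first). The sketch below is a single-pass variant, not the recipe's method. The escape table is copied from the diff, while the function name decode_feedsportal_link and the module-level names are assumptions made for illustration.

```python
import re

# Escape table copied from the recipe's get_article_url.
FEEDSPORTAL_CODES = {
    '0B': '.', '0C': '/', '0A': '0', '0F': '=', '0G': '&',
    '0D': '?', '0E': '-', '0H': ',', '0I': '_', '0N': '.com',
    '0L': 'http://',
}

# One alternation over all escape codes, so each code is matched in place.
_CODE_RE = re.compile('|'.join(re.escape(k) for k in FEEDSPORTAL_CODES))


def decode_feedsportal_link(link):
    """Return the article URL hidden in a feedsportal link.

    Links that are empty, point at galleries, or do not end in
    'story01.htm' are returned unchanged, mirroring the recipe's check.
    """
    if not link or 'galerie' in link:
        return link
    parts = link.split('/')
    if len(parts) < 2 or parts[-1] != 'story01.htm':
        return link
    # Single left-to-right pass: decoded output is never rescanned.
    return _CODE_RE.sub(lambda m: FEEDSPORTAL_CODES[m.group(0)], parts[-2])
```

The single pass costs one compiled pattern and gives the same answer regardless of the order in which the table is written, at the price of diverging from the recipe's replace loop on inputs such as '0AB'.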

View File

@ -1,7 +1,7 @@
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = u'2010, Tomasz Dlugosz <tomek3d@gmail.com>'
__copyright__ = u'2010-2013, Tomasz Dlugosz <tomek3d@gmail.com>'
'''
sport.interia.pl
'''
@ -13,61 +13,51 @@ class InteriaSport(BasicNewsRecipe):
title = u'Interia.pl - Sport'
description = u'Sport ze strony interia.pl'
language = 'pl'
oldest_article = 7
oldest_article = 1
__author__ = u'Tomasz D\u0142ugosz'
simultaneous_downloads = 3
no_stylesheets = True
remove_javascript = True
max_articles_per_feed = 100
remove_empty_feeds= True
use_embedded_content = False
ignore_duplicate_articles = {'title', 'url'}
feeds = [(u'Wydarzenia sportowe', u'http://kanaly.rss.interia.pl/sport.xml'),
(u'Pi\u0142ka no\u017cna', u'http://kanaly.rss.interia.pl/pilka_nozna.xml'),
(u'Siatk\xf3wka', u'http://kanaly.rss.interia.pl/siatkowka.xml'),
(u'Koszyk\xf3wka', u'http://kanaly.rss.interia.pl/koszykowka.xml'),
(u'NBA', u'http://kanaly.rss.interia.pl/nba.xml'),
(u'Kolarstwo', u'http://kanaly.rss.interia.pl/kolarstwo.xml'),
(u'\u017bu\u017cel', u'http://kanaly.rss.interia.pl/zuzel.xml'),
(u'Tenis', u'http://kanaly.rss.interia.pl/tenis.xml')]
keep_only_tags = [dict(name='div', attrs={'id':'article'})]
keep_only_tags = [
dict(name='h1'),
dict(name='div', attrs={'class': ['lead textContent', 'text textContent', 'source']})]
remove_tags = [dict(name='div', attrs={'class':'object gallery'}),
dict(name='div', attrs={'class':'box fontSizeSwitch'})]
extra_css = '''
.articleDate {
font-size: 0.5em;
color: black;
}
.articleFoto {
display: block;
font-family: sans;
font-size: 0.5em;
text-indent: 0
color: black;
}
.articleText {
display: block;
margin-bottom: 1em;
margin-left: 0;
margin-right: 0;
margin-top: 1em
color: black;
}
.articleLead {
font-size: 1.2em;
}
'''
remove_tags = [dict(name='div', attrs={'class':['embed embedAd', 'REMOVE', 'boxHeader']})]
preprocess_regexps = [
(re.compile(i[0], re.IGNORECASE | re.DOTALL), i[1]) for i in
[
(r'<p><a href.*?</a></p>', lambda match: ''),
# FIXME
#(r'(<div id="newsAddContent">)(.*?)(<a href=".*">)(.*?)(</a>)', lambda match: '\1\2\4'),
(r'<p>(<i>)?<b>(ZOBACZ|CZYTAJ) T.*?</div>', lambda match: '</div>')
(r'<p>(<i>)?<b>(ZOBACZ|CZYTAJ) T.*?</div>', lambda match: '</div>'),
(r'embed embed(Left|Right|Center) articleEmbed(Audio|Wideo articleEmbedVideo|ArticleFull|ArticleTitle|ArticleListTitle|AlbumHorizontal)">', lambda match: 'REMOVE">'),
(r'</div> <div class="source">', lambda match: ''),
(r'<p><a href="http://forum.interia.pl.*?</a></p>', lambda match: '')
]
]
def get_article_url(self, article):
link = article.get('link', None)
if link and 'galerie' not in link and link.split('/')[-1]=="story01.htm":
link=link.split('/')[-2]
encoding = {'0B': '.', '0C': '/', '0A': '0', '0F': '=', '0G': '&',
'0D': '?', '0E': '-', '0H': ',', '0I': '_', '0N': '.com', '0L': 'http://'}
for k, v in encoding.iteritems():
link = link.replace(k, v)
return link
def print_version(self, url):
chunks = url.split(',')
return chunks[0] + '/podglad-wydruku'+ ',' + ','.join(chunks[1:])
extra_css = '''
h1 { font-size:130% }
div.info { font-style:italic; font-size:70%}
'''

View File

@ -1,65 +1,62 @@
__license__ = 'GPL v3'
__copyright__ = "2008, Derry FitzGerald. 2009 Modified by Ray Kinsella and David O'Callaghan, 2011 Modified by Phil Burns"
__copyright__ = "2008, Derry FitzGerald. 2009 Modified by Ray Kinsella and David O'Callaghan, 2011 Modified by Phil Burns, 2013 Tom Scholl"
'''
irishtimes.com
'''
import re
import urlparse, re
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile
class IrishTimes(BasicNewsRecipe):
title = u'The Irish Times'
encoding = 'ISO-8859-15'
__author__ = "Derry FitzGerald, Ray Kinsella, David O'Callaghan and Phil Burns"
__author__ = "Derry FitzGerald, Ray Kinsella, David O'Callaghan and Phil Burns, Tom Scholl"
language = 'en_IE'
timefmt = ' (%A, %B %d, %Y)'
masthead_url = 'http://www.irishtimes.com/assets/images/generic/website/logo_theirishtimes.png'
encoding = 'utf-8'
oldest_article = 1.0
max_articles_per_feed = 100
max_articles_per_feed = 100
remove_empty_feeds = True
no_stylesheets = True
simultaneous_downloads= 5
r = re.compile('.*(?P<url>http:\/\/(www.irishtimes.com)|(rss.feedsportal.com\/c)\/.*\.html?).*')
remove_tags = [dict(name='div', attrs={'class':'footer'})]
extra_css = 'p, div { margin: 0pt; border: 0pt; text-indent: 0.5em } .headline {font-size: large;} \n .fact { padding-top: 10pt }'
temp_files = []
articles_are_obfuscated = True
feeds = [
('Frontpage', 'http://www.irishtimes.com/feeds/rss/newspaper/index.rss'),
('Ireland', 'http://www.irishtimes.com/feeds/rss/newspaper/ireland.rss'),
('World', 'http://www.irishtimes.com/feeds/rss/newspaper/world.rss'),
('Finance', 'http://www.irishtimes.com/feeds/rss/newspaper/finance.rss'),
('Features', 'http://www.irishtimes.com/feeds/rss/newspaper/features.rss'),
('Sport', 'http://www.irishtimes.com/feeds/rss/newspaper/sport.rss'),
('Opinion', 'http://www.irishtimes.com/feeds/rss/newspaper/opinion.rss'),
('Letters', 'http://www.irishtimes.com/feeds/rss/newspaper/letters.rss'),
('Magazine', 'http://www.irishtimes.com/feeds/rss/newspaper/magazine.rss'),
('Health', 'http://www.irishtimes.com/feeds/rss/newspaper/health.rss'),
('Education & Parenting', 'http://www.irishtimes.com/feeds/rss/newspaper/education.rss'),
('Motors', 'http://www.irishtimes.com/feeds/rss/newspaper/motors.rss'),
('An Teanga Bheo', 'http://www.irishtimes.com/feeds/rss/newspaper/anteangabheo.rss'),
('Commercial Property', 'http://www.irishtimes.com/feeds/rss/newspaper/commercialproperty.rss'),
('Science Today', 'http://www.irishtimes.com/feeds/rss/newspaper/sciencetoday.rss'),
('Property', 'http://www.irishtimes.com/feeds/rss/newspaper/property.rss'),
('The Tickets', 'http://www.irishtimes.com/feeds/rss/newspaper/theticket.rss'),
('Weekend', 'http://www.irishtimes.com/feeds/rss/newspaper/weekend.rss'),
('News features', 'http://www.irishtimes.com/feeds/rss/newspaper/newsfeatures.rss'),
('Obituaries', 'http://www.irishtimes.com/feeds/rss/newspaper/obituaries.rss'),
('News', 'http://www.irishtimes.com/cmlink/the-irish-times-news-1.1319192'),
('World', 'http://www.irishtimes.com/cmlink/irishtimesworldfeed-1.1321046'),
('Politics', 'http://www.irishtimes.com/cmlink/irish-times-politics-rss-1.1315953'),
('Business', 'http://www.irishtimes.com/cmlink/the-irish-times-business-1.1319195'),
('Culture', 'http://www.irishtimes.com/cmlink/the-irish-times-culture-1.1319213'),
('Sport', 'http://www.irishtimes.com/cmlink/the-irish-times-sport-1.1319194'),
('Debate', 'http://www.irishtimes.com/cmlink/debate-1.1319211'),
('Life & Style', 'http://www.irishtimes.com/cmlink/the-irish-times-life-style-1.1319214'),
]
def print_version(self, url):
if url.count('rss.feedsportal.com'):
#u = url.replace('0Bhtml/story01.htm','_pf0Bhtml/story01.htm')
u = url.find('irishtimes')
u = 'http://www.irishtimes.com' + url[u + 12:]
u = u.replace('0C', '/')
u = u.replace('A', '')
u = u.replace('0Bhtml/story01.htm', '_pf.html')
else:
u = url.replace('.html','_pf.html')
return u
def get_obfuscated_article(self, url):
# Insert a pic from the original url, but use content from the print url
pic = None
pics = self.index_to_soup(url)
div = pics.find('div', {'class' : re.compile('image-carousel')})
if div:
pic = div.img
if pic:
try:
pic['src'] = urlparse.urljoin(url, pic['src'])
pic.extract()
except:
pic = None
content = self.index_to_soup(url + '?mode=print&ot=example.AjaxPageLayout.ot')
if pic:
content.p.insert(0, pic)
self.temp_files.append(PersistentTemporaryFile('_fa.html'))
self.temp_files[-1].write(content.prettify())
self.temp_files[-1].close()
return self.temp_files[-1].name
def get_article_url(self, article):
return article.link
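
The two URL manipulations in `get_obfuscated_article` above can be checked in isolation. This is a small Python 3 sketch under stated assumptions: `urllib.parse.urljoin` is the Python 3 location of the `urlparse.urljoin` call used in the recipe, the helper names `absolute_image_src` and `print_url` are invented for illustration, and the article URLs in the usage note are placeholders rather than real pages.

```python
from urllib.parse import urljoin

# Query string appended by the recipe to request the print layout.
PRINT_SUFFIX = '?mode=print&ot=example.AjaxPageLayout.ot'


def absolute_image_src(article_url, src):
    """Resolve an <img src> value against the page it was found on."""
    return urljoin(article_url, src)


def print_url(article_url):
    """Build the print-layout URL the same way the recipe does."""
    return article_url + PRINT_SUFFIX
```

`urljoin` leaves an already absolute `src` untouched, resolves a root-relative one (`/img/a.jpg`) against the host, and resolves a bare file name against the article's directory, which is why the recipe can apply it unconditionally before moving the image into the print page.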


@ -11,12 +11,10 @@ class AdvancedUserRecipe1295262156(BasicNewsRecipe):
auto_cleanup = True
encoding='iso-8859-1'
feeds = [(u'kath.net', u'http://www.kath.net/2005/xml/index.xml')]
def print_version(self, url):
return url+"&print=yes"
return url+"/print/yes"
extra_css = 'td.textb {font-size: medium;}'


@ -1,14 +1,16 @@
import re
from calibre.web.feeds.news import BasicNewsRecipe
class KDEFamilyPl(BasicNewsRecipe):
title = u'KDEFamily.pl'
__author__ = 'fenuks'
description = u'KDE w Polsce'
category = 'open source, KDE'
language = 'pl'
title = u'KDEFamily.pl'
__author__ = 'fenuks'
description = u'KDE w Polsce'
category = 'open source, KDE'
language = 'pl'
cover_url = 'http://www.mykde.home.pl/kdefamily/wp-content/uploads/2012/07/logotype-e1341585198616.jpg'
oldest_article = 7
max_articles_per_feed = 100
preprocess_regexps = [(re.compile(r"Podobne wpisy.*", re.IGNORECASE|re.DOTALL), lambda m: '')]
no_stylesheets = True
use_embedded_content = True
feeds = [(u'Wszystko', u'http://kdefamily.pl/feed/')]
feeds = [(u'Wszystko', u'http://kdefamily.pl/feed/')]
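
The effect of the `preprocess_regexps` entry in this recipe can be previewed on a sample string. The sketch below is illustrative only: `apply_preprocess_regexps` is a hypothetical stand-in, assumed to behave like calling `pattern.sub(func, html)` for each pair in order, and the sample HTML is invented.

```python
import re

# Same (pattern, replacement function) pair as in the recipe.
preprocess_regexps = [
    (re.compile(r"Podobne wpisy.*", re.IGNORECASE | re.DOTALL), lambda m: ''),
]


def apply_preprocess_regexps(html, rules):
    """Apply each (compiled pattern, callable) pair to the raw HTML in order."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


sample = '<p>Tekst</p>\n<h3>Podobne wpisy</h3>\n<ul><li>link</li></ul>'
cleaned = apply_preprocess_regexps(sample, preprocess_regexps)
```

Because `re.DOTALL` lets `.*` run across newlines, everything from the first "Podobne wpisy" to the end of the document is dropped, including any closing tags that followed it, so the HTML parser has to repair the truncated markup afterwards.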

36
recipes/km_blog.recipe Normal file

@ -0,0 +1,36 @@
__license__ = 'GPL v3'
__author__ = 'teepel <teepel44@gmail.com>, Artur Stachecki <artur.stachecki@gmail.com>'
'''
korwin-mikke.pl/blog
'''
from calibre.web.feeds.news import BasicNewsRecipe
class km_blog(BasicNewsRecipe):
title = u'Korwin-Mikke Blog'
__author__ = 'teepel <teepel44@gmail.com>'
language = 'pl'
description ='Wiadomości z bloga korwin-mikke.pl/blog'
INDEX='http://korwin-mikke.pl/blog'
remove_empty_feeds= True
oldest_article = 7
max_articles_per_feed = 100
remove_javascript=True
no_stylesheets=True
remove_empty_feeds = True
feeds = [(u'blog', u'http://korwin-mikke.pl/blog/rss')]
keep_only_tags =[]
# This line should show the title of the article, but it doesn't work
keep_only_tags.append(dict(name = 'div', attrs = {'class' : 'posts view'}))
keep_only_tags.append(dict(name = 'div', attrs = {'class' : 'text'}))
keep_only_tags.append(dict(name = 'h1'))
remove_tags =[]
remove_tags.append(dict(name = 'p', attrs = {'class' : 'float_right'}))
remove_tags.append(dict(name = 'p', attrs = {'class' : 'date'}))
remove_tags_after=[(dict(name = 'div', attrs = {'class': 'text'}))]


@ -3,10 +3,10 @@ from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup
class Konflikty(BasicNewsRecipe):
title = u'Konflikty Zbrojne'
__author__ = 'fenuks'
cover_url = 'http://www.konflikty.pl/images/tapety_logo.jpg'
language = 'pl'
title = u'Konflikty Zbrojne'
__author__ = 'fenuks'
cover_url = 'http://www.konflikty.pl/images/tapety_logo.jpg'
language = 'pl'
description = u'Zbiór ciekawych artykułów historycznych, militarnych oraz recenzji książek, gier i filmów. Najświeższe informacje o lotnictwie, wojskach lądowych i polityce.'
category='military, history'
oldest_article = 7
@ -14,19 +14,20 @@ class Konflikty(BasicNewsRecipe):
no_stylesheets = True
keep_only_tags=[dict(attrs={'class':['title1', 'image']}), dict(id='body')]
feeds = [(u'Aktualności', u'http://www.konflikty.pl/rss_aktualnosci_10.xml'),
(u'Historia', u'http://www.konflikty.pl/rss_historia_10.xml'),
(u'Militaria', u'http://www.konflikty.pl/rss_militaria_10.xml'),
(u'Relacje', u'http://www.konflikty.pl/rss_relacje_10.xml'),
(u'Recenzje', u'http://www.konflikty.pl/rss_recenzje_10.xml'),
(u'Teksty źródłowe', u'http://www.konflikty.pl/rss_tekstyzrodlowe_10.xml')]
feeds = [(u'Aktualności', u'http://www.konflikty.pl/rss_aktualnosci_10.xml'),
(u'Historia', u'http://www.konflikty.pl/rss_historia_10.xml'),
(u'Militaria', u'http://www.konflikty.pl/rss_militaria_10.xml'),
(u'Relacje', u'http://www.konflikty.pl/rss_relacje_10.xml'),
(u'Recenzje', u'http://www.konflikty.pl/rss_recenzje_10.xml'),
(u'Teksty źródłowe', u'http://www.konflikty.pl/rss_tekstyzrodlowe_10.xml')]
def preprocess_html(self, soup):
for item in soup.findAll(style=True):
del item['style']
for image in soup.findAll(name='a', attrs={'class':'image'}):
image['style'] = 'width: 210px; float: left; margin-right:5px;'
if image.img and image.img.has_key('alt'):
image.name='div'
pos = len(image.contents)
image.insert(pos, BeautifulSoup('<p style="font-style:italic;">'+image.img['alt']+'</p>'))
return soup
return soup


@ -2,21 +2,27 @@
from calibre.web.feeds.news import BasicNewsRecipe
class Kosmonauta(BasicNewsRecipe):
title = u'Kosmonauta.net'
__author__ = 'fenuks'
description = u'polskojęzyczny portal w całości dedykowany misjom kosmicznym i badaniom kosmosu.'
category = 'astronomy'
language = 'pl'
title = u'Kosmonauta.net'
__author__ = 'fenuks'
description = u'polskojęzyczny portal w całości dedykowany misjom kosmicznym i badaniom kosmosu.'
category = 'astronomy'
language = 'pl'
cover_url = 'http://bi.gazeta.pl/im/4/10393/z10393414X,Kosmonauta-net.jpg'
extra_css = '.thumbnail {float:left;margin-right:5px;}'
no_stylesheets = True
INDEX = 'http://www.kosmonauta.net'
oldest_article = 7
no_stylesheets = True
remove_javascript = True
remove_attributes = ['style']
max_articles_per_feed = 100
keep_only_tags = [dict(name='div', attrs={'class':'item-page'})]
remove_tags = [dict(attrs={'class':['article-tools clearfix', 'cedtag', 'nav clearfix', 'jwDisqusForm']})]
remove_tags = [dict(attrs={'class':['article-tools clearfix', 'cedtag', 'nav clearfix', 'jwDisqusForm']}), dict(attrs={'alt':['Poprzednia strona', 'Następna strona']})]
remove_tags_after = dict(name='div', attrs={'class':'cedtag'})
feeds = [(u'Kosmonauta.net', u'http://www.kosmonauta.net/?format=feed&type=atom')]
feeds = [(u'Kosmonauta.net', u'http://www.kosmonauta.net/?format=feed&type=atom')]
def print_version(self, url):
return url + '?tmpl=component&print=1&layout=default&page='
def preprocess_html(self, soup):
for a in soup.findAll(name='a'):
@ -24,5 +30,4 @@ class Kosmonauta(BasicNewsRecipe):
href = a['href']
if not href.startswith('http'):
a['href'] = self.INDEX + href
return soup
return soup


@ -1,5 +1,6 @@
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup as bs
from calibre.ebooks.BeautifulSoup import BeautifulSoup as bs, Comment
class KurierGalicyjski(BasicNewsRecipe):
title = u'Kurier Galicyjski'
__author__ = 'fenuks'
@ -42,6 +43,9 @@ class KurierGalicyjski(BasicNewsRecipe):
r.extract()
for r in appendtag.findAll(attrs={'style':'border-top-width: thin; border-top-style: dashed; border-top-color: #CCC; border-bottom-width: thin; border-bottom-style: dashed; border-bottom-color: #CCC; padding-top:5px; padding-bottom:5px; text-align:right; margin-top:10px; height:20px;'}):
r.extract()
comments = appendtag.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
comment.extract()
def preprocess_html(self, soup):
self.append_page(soup, soup.body)


@ -1,5 +1,6 @@
import re
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Comment
class KurierPoranny(BasicNewsRecipe):
title = u'Kurier Poranny'
@ -72,6 +73,11 @@ class KurierPoranny(BasicNewsRecipe):
if pagetext:
pos = len(appendtag.contents)
appendtag.insert(pos, pagetext)
comments = appendtag.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
comment.extract()
def preprocess_html(self, soup):
self.append_page(soup, soup.body)


@ -2,7 +2,7 @@ __license__ = 'GPL v3'
__author__ = 'Lorenzo Vigentini and Olivier Daigle'
__copyright__ = '2012, Lorenzo Vigentini <l.vigentini at gmail.com>, Olivier Daigle <odaigle _at nuvucameras __dot__ com>'
__version__ = 'v1.01'
__date__ = '22, December 2012'
__date__ = '17, March 2013'
__description__ = 'Canadian Paper '
'''
@ -28,10 +28,14 @@ class ledevoir(BasicNewsRecipe):
oldest_article = 1
max_articles_per_feed = 200
min_articles_per_feed = 0
use_embedded_content = False
recursion = 10
needs_subscription = 'optional'
compress_news_images = True
compress_news_images_auto_size = 4
filterDuplicates = False
url_list = []
@ -66,16 +70,16 @@ class ledevoir(BasicNewsRecipe):
feeds = [
(u'A la une', 'http://www.ledevoir.com/rss/manchettes.xml'),
# (u'Édition complete', 'http://feeds2.feedburner.com/fluxdudevoir'),
# (u'Opinions', 'http://www.ledevoir.com/rss/opinions.xml'),
# (u'Chroniques', 'http://www.ledevoir.com/rss/chroniques.xml'),
# (u'Politique', 'http://www.ledevoir.com/rss/section/politique.xml?id=51'),
# (u'International', 'http://www.ledevoir.com/rss/section/international.xml?id=76'),
# (u'Culture', 'http://www.ledevoir.com/rss/section/culture.xml?id=48'),
# (u'Environnement', 'http://www.ledevoir.com/rss/section/environnement.xml?id=78'),
# (u'Societe', 'http://www.ledevoir.com/rss/section/societe.xml?id=52'),
# (u'Economie', 'http://www.ledevoir.com/rss/section/economie.xml?id=49'),
# (u'Sports', 'http://www.ledevoir.com/rss/section/sports.xml?id=85'),
(u'Édition complete', 'http://feeds2.feedburner.com/fluxdudevoir'),
(u'Opinions', 'http://www.ledevoir.com/rss/opinions.xml'),
(u'Chroniques', 'http://www.ledevoir.com/rss/chroniques.xml'),
(u'Politique', 'http://www.ledevoir.com/rss/section/politique.xml?id=51'),
(u'International', 'http://www.ledevoir.com/rss/section/international.xml?id=76'),
(u'Culture', 'http://www.ledevoir.com/rss/section/culture.xml?id=48'),
(u'Environnement', 'http://www.ledevoir.com/rss/section/environnement.xml?id=78'),
(u'Societe', 'http://www.ledevoir.com/rss/section/societe.xml?id=52'),
(u'Economie', 'http://www.ledevoir.com/rss/section/economie.xml?id=49'),
(u'Sports', 'http://www.ledevoir.com/rss/section/sports.xml?id=85'),
(u'Art de vivre', 'http://www.ledevoir.com/rss/section/art-de-vivre.xml?id=50')
]
@ -113,3 +117,23 @@ class ledevoir(BasicNewsRecipe):
self.url_list.append(url)
return url
'''
def postprocess_html(self, soup, first):
#process all the images. assumes that the new html has the correct path
if first == 0:
return soup
for tag in soup.findAll(lambda tag: tag.name.lower()=='img' and tag.has_key('src')):
iurl = tag['src']
img = Image()
img.open(iurl)
# width, height = img.size
# print 'img is: ', iurl, 'width is: ', width, 'height is: ', height
if img < 0:
raise RuntimeError('Out of memory')
img.set_compression_quality(30)
img.save(iurl)
return soup
'''


@ -1,4 +1,5 @@
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Comment
class LinuxJournal(BasicNewsRecipe):
title = u'Linux Journal'
@ -25,6 +26,9 @@ class LinuxJournal(BasicNewsRecipe):
soup2 = self.index_to_soup('http://www.linuxjournal.com'+ nexturl)
pagetext = soup2.find(attrs={'class':'node-inner'}).find(attrs={'class':'content'})
next = appendtag.find('li', attrs={'class':'pager-next'})
comments = pagetext.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
comment.extract()
pos = len(appendtag.contents)
appendtag.insert(pos, pagetext)
tag = appendtag.find('div', attrs={'class':'links'})
@ -33,4 +37,4 @@ class LinuxJournal(BasicNewsRecipe):
def preprocess_html(self, soup):
self.append_page(soup, soup.body)
return soup
return soup


@ -1,13 +0,0 @@
from calibre.web.feeds.news import CalibrePeriodical
class MiDDay(CalibrePeriodical):
title = 'MiDDay'
calibre_periodicals_slug = 'midday'
description = '''Get your dose of the latest news, views and fun - from the
world of politics, sports and Bollywood to the cartoons, comics and games of
the entertainment section - Indias leading tabloid has it all. To subscribe
visit <a href="http://news.calibre-ebook.com/periodical/midday">calibre
Periodicals</a>.'''
language = 'en_IN'


@ -2,13 +2,14 @@
import re
from calibre.web.feeds.news import BasicNewsRecipe
class Mlody_technik(BasicNewsRecipe):
title = u'Młody technik'
__author__ = 'fenuks'
description = u'Młody technik'
category = 'science'
language = 'pl'
title = u'Młody technik'
__author__ = 'fenuks'
description = u'Młody technik'
category = 'science'
language = 'pl'
#cover_url = 'http://science-everywhere.pl/wp-content/uploads/2011/10/mt12.jpg'
no_stylesheets = True
extra_css = 'img.alignleft {float: left; margin-right: 5px;}'
preprocess_regexps = [(re.compile(r"<h4>Podobne</h4>", re.IGNORECASE), lambda m: '')]
oldest_article = 7
max_articles_per_feed = 100
@ -17,18 +18,18 @@ class Mlody_technik(BasicNewsRecipe):
keep_only_tags = [dict(id='content')]
remove_tags = [dict(attrs={'class':'st-related-posts'})]
remove_tags_after = dict(attrs={'class':'entry-content clearfix'})
feeds = [(u'Wszystko', u'http://www.mt.com.pl/feed'),
#(u'MT NEWS 24/7', u'http://www.mt.com.pl/kategoria/mt-newsy-24-7/feed'),
(u'Info zoom', u'http://www.mt.com.pl/kategoria/info-zoom/feed'),
(u'm.technik', u'http://www.mt.com.pl/kategoria/m-technik/feed'),
(u'Szkoła', u'http://www.mt.com.pl/kategoria/szkola-2/feed'),
(u'Na Warsztacie', u'http://www.mt.com.pl/kategoria/na-warsztacie/feed'),
(u'Z pasji do...', u'http://www.mt.com.pl/kategoria/z-pasji-do/feed'),
(u'MT testuje', u'http://www.mt.com.pl/kategoria/mt-testuje/feed')]
feeds = [(u'Wszystko', u'http://www.mt.com.pl/feed'),
#(u'MT NEWS 24/7', u'http://www.mt.com.pl/kategoria/mt-newsy-24-7/feed'),
(u'Info zoom', u'http://www.mt.com.pl/kategoria/info-zoom/feed'),
(u'm.technik', u'http://www.mt.com.pl/kategoria/m-technik/feed'),
(u'Szkoła', u'http://www.mt.com.pl/kategoria/szkola-2/feed'),
(u'Na Warsztacie', u'http://www.mt.com.pl/kategoria/na-warsztacie/feed'),
(u'Z pasji do...', u'http://www.mt.com.pl/kategoria/z-pasji-do/feed'),
(u'MT testuje', u'http://www.mt.com.pl/kategoria/mt-testuje/feed')]
def get_cover_url(self):
soup = self.index_to_soup('http://www.mt.com.pl/')
tag = soup.find(attrs={'class':'xoxo'})
if tag:
self.cover_url = tag.find('img')['src']
return getattr(self, 'cover_url', self.cover_url)
return getattr(self, 'cover_url', self.cover_url)


@ -1,16 +1,18 @@
from calibre.web.feeds.news import BasicNewsRecipe
import re
class NaukawPolsce(BasicNewsRecipe):
title = u'Nauka w Polsce'
__author__ = 'fenuks'
description = u'Serwis Nauka w Polsce ma za zadanie popularyzację polskiej nauki. Można na nim znaleźć wiadomości takie jak: osiągnięcia polskich naukowców, wydarzenia na polskich uczelniach, osiągnięcia studentów, konkursy dla badaczy, staże i stypendia naukowe, wydarzenia w polskiej nauce, kalendarium wydarzeń w nauce, materiały wideo o nauce.'
category = 'science'
language = 'pl'
title = u'Nauka w Polsce'
__author__ = 'fenuks'
description = u'Serwis Nauka w Polsce ma za zadanie popularyzację polskiej nauki. Można na nim znaleźć wiadomości takie jak: osiągnięcia polskich naukowców, wydarzenia na polskich uczelniach, osiągnięcia studentów, konkursy dla badaczy, staże i stypendia naukowe, wydarzenia w polskiej nauce, kalendarium wydarzeń w nauce, materiały wideo o nauce.'
category = 'science'
language = 'pl'
cover_url = 'http://www.naukawpolsce.pap.pl/Themes/Pap/images/logo-pl.gif'
oldest_article = 7
max_articles_per_feed = 100
no_stylesheets = True
remove_empty_feeds = True
extra_css = '.miniaturka {float: left; margin-right: 5px; max-width: 350px;} .miniaturka-dol-strony {display: inline-block; margin: 0 15px; width: 120px;}'
ignore_duplicate_articles = {'title', 'url'}
index = 'http://www.naukawpolsce.pl'
keep_only_tags = [dict(name='div', attrs={'class':'margines wiadomosc'})]
remove_tags = [dict(name='div', attrs={'class':'tagi'})]
@ -23,8 +25,8 @@ class NaukawPolsce(BasicNewsRecipe):
url = self.index + i.h1.a['href']
date = '' #i.span.string
articles.append({'title' : title,
'url' : url,
'date' : date,
'url' : url,
'date' : date,
'description' : ''
})
return articles
@ -44,4 +46,4 @@ class NaukawPolsce(BasicNewsRecipe):
def preprocess_html(self, soup):
for p in soup.findAll(name='p', text=re.compile('&nbsp;')):
p.extract()
return soup
return soup


@ -1,16 +1,19 @@
from calibre.web.feeds.news import BasicNewsRecipe
class Niebezpiecznik_pl(BasicNewsRecipe):
title = u'Niebezpiecznik.pl'
__author__ = 'fenuks'
description = u'Niebezpiecznik.pl o bezpieczeństwie i nie...'
category = 'hacking, IT'
language = 'pl'
title = u'Niebezpiecznik.pl'
__author__ = 'fenuks'
description = u'Niebezpiecznik.pl o bezpieczeństwie i nie...'
category = 'hacking, IT'
language = 'pl'
oldest_article = 8
extra_css = '.entry {margin-top: 25px;}'
remove_attributes = ['style']
max_articles_per_feed = 100
no_stylesheets = True
remove_empty_feeds = True
cover_url = u'http://userlogos.org/files/logos/Karmody/niebezpiecznik_01.png'
remove_tags = [dict(name='div', attrs={'class':['sociable']}), dict(name='h4'), dict(attrs={'class':'similar-posts'})]
keep_only_tags = [dict(name='div', attrs={'class':['title', 'entry']})]
feeds = [(u'Wiadomości', u'http://feeds.feedburner.com/niebezpiecznik/'),
('Blog', 'http://feeds.feedburner.com/niebezpiecznik/linkblog/')]
feeds = [(u'Wiadomości', u'http://feeds.feedburner.com/niebezpiecznik/'),
('Blog', 'http://feeds.feedburner.com/niebezpiecznik/linkblog/')]


@ -1,5 +1,6 @@
import re
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Comment
class NTO(BasicNewsRecipe):
title = u'Nowa Trybuna Opolska'
@ -57,6 +58,10 @@ class NTO(BasicNewsRecipe):
if pagetext:
pos = len(appendtag.contents)
appendtag.insert(pos, pagetext)
comments = appendtag.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
comment.extract()
def preprocess_html(self, soup):
self.append_page(soup, soup.body)


@ -35,7 +35,10 @@ class NewYorkTimesBookReview(BasicNewsRecipe):
continue
if x['class'] in {'story', 'ledeStory'}:
tt = 'h3' if x['class'] == 'story' else 'h1'
a = x.find(tt).find('a', href=True)
try:
a = x.find(tt).find('a', href=True)
except AttributeError:
continue
title = self.tag_to_string(a)
url = a['href'] + '&pagewanted=all'
self.log('\tFound article:', title, url)


@ -1,4 +1,6 @@
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Comment
class OCLab(BasicNewsRecipe):
title = u'OCLab.pl'
oldest_article = 7
@ -26,6 +28,10 @@ class OCLab(BasicNewsRecipe):
appendtag.insert(pos, pagetext)
for r in appendtag.findAll(attrs={'class':'post-nav-bottom-list'}):
r.extract()
comments = appendtag.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
comment.extract()
def preprocess_html(self, soup):
self.append_page(soup, soup.body)
return soup


@ -0,0 +1,41 @@
#!/usr/bin/env python
__license__ = 'GPL v3'
from calibre.web.feeds.news import BasicNewsRecipe
class OptyczneRecipe(BasicNewsRecipe):
__author__ = u'Artur Stachecki <artur.stachecki@gmail.com>'
language = 'pl'
title = u'optyczne.pl'
category = u'News'
description = u'Najlepsze testy obiektywów, testy aparatów cyfrowych i testy lornetek w sieci!'
cover_url=''
remove_empty_feeds= True
no_stylesheets=True
oldest_article = 7
max_articles_per_feed = 100000
recursions = 0
no_stylesheets = True
remove_javascript = True
keep_only_tags =[]
keep_only_tags.append(dict(name = 'div', attrs = {'class' : 'news'}))
remove_tags =[]
remove_tags.append(dict(name = 'div', attrs = {'class' : 'center'}))
remove_tags.append(dict(name = 'div', attrs = {'class' : 'news_foto'}))
remove_tags.append(dict(name = 'div', attrs = {'align' : 'right'}))
extra_css = '''
body {font-family: Arial,Helvetica,sans-serif;}
h1{text-align: left;}
h2{font-size: medium; font-weight: bold;}
p.lead {font-weight: bold; text-align: left;}
.authordate {font-size: small; color: #696969;}
.fot{font-size: x-small; color: #666666;}
'''
feeds = [
('Aktualnosci', 'http://www.optyczne.pl/rss.xml'),
]


@ -1,11 +1,12 @@
from calibre.web.feeds.news import BasicNewsRecipe
class OSWorld(BasicNewsRecipe):
title = u'OSWorld.pl'
__author__ = 'fenuks'
description = u'OSWorld.pl to serwis internetowy, dzięki któremu poznasz czym naprawdę jest Open Source. Serwis poświęcony jest wolnemu oprogramowaniu jak linux mint, centos czy ubunty. Znajdziecie u nasz artykuły, unity oraz informacje o certyfikatach CACert. OSWorld to mały świat wielkich systemów!'
category = 'OS, IT, open source, Linux'
language = 'pl'
title = u'OSWorld.pl'
__author__ = 'fenuks'
description = u'OSWorld.pl to serwis internetowy, dzięki któremu poznasz czym naprawdę jest Open Source. Serwis poświęcony jest wolnemu oprogramowaniu jak linux mint, centos czy ubunty. Znajdziecie u nasz artykuły, unity oraz informacje o certyfikatach CACert. OSWorld to mały świat wielkich systemów!'
category = 'OS, IT, open source, Linux'
language = 'pl'
cover_url = 'http://osworld.pl/wp-content/uploads/osworld-kwadrat-128x111.png'
extra_css = 'img.alignleft {float: left; margin-right: 5px;}'
oldest_article = 7
max_articles_per_feed = 100
no_stylesheets = True
@ -14,7 +15,7 @@ class OSWorld(BasicNewsRecipe):
keep_only_tags = [dict(id=['dzial', 'posts'])]
remove_tags = [dict(attrs={'class':'post-comments'})]
remove_tags_after = dict(attrs={'class':'entry clr'})
feeds = [(u'Artyku\u0142y', u'http://osworld.pl/category/artykuly/feed/'), (u'Nowe wersje', u'http://osworld.pl/category/nowe-wersje/feed/')]
feeds = [(u'Artyku\u0142y', u'http://osworld.pl/category/artykuly/feed/'), (u'Nowe wersje', u'http://osworld.pl/category/nowe-wersje/feed/')]
def append_page(self, soup, appendtag):
tag = appendtag.find(attrs={'id':'paginacja'})
@ -30,4 +31,4 @@ class OSWorld(BasicNewsRecipe):
def preprocess_html(self, soup):
self.append_page(soup, soup.body)
return soup
return soup


@ -1,5 +1,6 @@
import re
from calibre.web.feeds.news import BasicNewsRecipe
class Overclock_pl(BasicNewsRecipe):
title = u'Overclock.pl'
oldest_article = 7
@ -21,4 +22,4 @@ class Overclock_pl(BasicNewsRecipe):
if 'articles/show' in url:
return url.replace('show', 'showall')
else:
return url
return url


@ -1,20 +1,21 @@
from calibre.web.feeds.news import BasicNewsRecipe
class PC_Centre(BasicNewsRecipe):
title = u'PC Centre'
title = u'PC Centre'
oldest_article = 7
max_articles_per_feed = 100
__author__ = 'fenuks'
description = u'Portal komputerowy, a w nim: testy sprzętu komputerowego, recenzje gier i oprogramowania. a także opisy produktów związanych z komputerami.'
category = 'IT'
language = 'pl'
__author__ = 'fenuks'
description = u'Portal komputerowy, a w nim: testy sprzętu komputerowego, recenzje gier i oprogramowania. a także opisy produktów związanych z komputerami.'
category = 'IT'
language = 'pl'
masthead_url= 'http://pccentre.pl/views/images/logo.gif'
cover_url= 'http://pccentre.pl/views/images/logo.gif'
no_stylesheets = True
remove_empty_feeds = True
ignore_duplicate_articles = {'title', 'url'}
#keep_only_tags= [dict(id='content')]
#remove_tags=[dict(attrs={'class':['ikony r', 'list_of_content', 'dot accordion']}), dict(id='comments')]
remove_tags=[dict(attrs={'class':'logo_print'})]
feeds = [(u'Aktualno\u015bci', u'http://pccentre.pl/backend.php'), (u'Publikacje', u'http://pccentre.pl/backend.php?mode=a'), (u'Sprz\u0119t komputerowy', u'http://pccentre.pl/backend.php?mode=n&section=2'), (u'Oprogramowanie', u'http://pccentre.pl/backend.php?mode=n&section=3'), (u'Gry komputerowe i konsole', u'http://pccentre.pl/backend.php?mode=n&section=4'), (u'Internet', u'http://pccentre.pl/backend.php?mode=n&section=7'), (u'Bezpiecze\u0144stwo', u'http://pccentre.pl/backend.php?mode=n&section=5'), (u'Multimedia', u'http://pccentre.pl/backend.php?mode=n&section=6'), (u'Biznes', u'http://pccentre.pl/backend.php?mode=n&section=9')]
feeds = [(u'Aktualno\u015bci', u'http://pccentre.pl/backend.php'), (u'Publikacje', u'http://pccentre.pl/backend.php?mode=a'), (u'Sprz\u0119t komputerowy', u'http://pccentre.pl/backend.php?mode=n&section=2'), (u'Oprogramowanie', u'http://pccentre.pl/backend.php?mode=n&section=3'), (u'Gry komputerowe i konsole', u'http://pccentre.pl/backend.php?mode=n&section=4'), (u'Internet', u'http://pccentre.pl/backend.php?mode=n&section=7'), (u'Bezpiecze\u0144stwo', u'http://pccentre.pl/backend.php?mode=n&section=5'), (u'Multimedia', u'http://pccentre.pl/backend.php?mode=n&section=6'), (u'Biznes', u'http://pccentre.pl/backend.php?mode=n&section=9')]
def print_version(self, url):
return url.replace('show', 'print')


@ -1,4 +1,8 @@
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Comment
# This recipe is currently not working
class PC_Foster(BasicNewsRecipe):
title = u'PC Foster'
oldest_article = 7
@ -29,6 +33,9 @@ class PC_Foster(BasicNewsRecipe):
appendtag.insert(pos, pagetext)
for r in appendtag.findAll(attrs={'class':'review_content double'}):
r.extract()
comments = appendtag.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
comment.extract()
def preprocess_html(self, soup):
self.append_page(soup, soup.body)


@ -67,12 +67,13 @@ class PsychologyToday(BasicNewsRecipe):
title = title + u' (%s)'%author
article_page= self.index_to_soup('http://www.psychologytoday.com'+post.find('a', href=True)['href'])
print_page=article_page.find('li', attrs={'class':'print_html first'})
url='http://www.psychologytoday.com'+print_page.find('a',href=True)['href']
desc = self.tag_to_string(post.find('div', attrs={'class':'collection-node-description'})).strip()
self.log('Found article:', title)
self.log('\t', url)
self.log('\t', desc)
articles.append({'title':title, 'url':url, 'date':'','description':desc})
if print_page is not None:
url='http://www.psychologytoday.com'+print_page.find('a',href=True)['href']
desc = self.tag_to_string(post.find('div', attrs={'class':'collection-node-description'})).strip()
self.log('Found article:', title)
self.log('\t', url)
self.log('\t', desc)
articles.append({'title':title, 'url':url, 'date':'','description':desc})
return [('Current Issue', articles)]


@ -23,8 +23,8 @@ class PublicoPT(BasicNewsRecipe):
remove_empty_feeds = True
extra_css = ' body{font-family: Arial,Helvetica,sans-serif } img{margin-bottom: 0.4em} '
keep_only_tags = [dict(attrs={'class':['content-noticia-title','artigoHeader','ECOSFERA_MANCHETE','noticia','textoPrincipal','ECOSFERA_texto_01']})]
remove_tags = [dict(attrs={'class':['options','subcoluna']})]
keep_only_tags = [dict(attrs={'class':['hentry article single']})]
remove_tags = [dict(attrs={'class':['entry-options entry-options-above group','entry-options entry-options-below group', 'module tag-list']})]
feeds = [
(u'Geral', u'http://feeds.feedburner.com/publicoRSS'),


@ -1,4 +1,6 @@
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Comment
class PurePC(BasicNewsRecipe):
title = u'PurePC'
oldest_article = 7
@ -27,7 +29,10 @@ class PurePC(BasicNewsRecipe):
appendtag.insert(pos, pagetext)
for r in appendtag.findAll(attrs={'class':['PageMenuList', 'pager', 'fivestar-widget']}):
r.extract()
comments = appendtag.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
comment.extract()
def preprocess_html(self, soup):
self.append_page(soup, soup.body)
return soup
return soup


@ -6,10 +6,12 @@ class RTE(BasicNewsRecipe):
max_articles_per_feed = 100
__author__ = u'Robin Phillips'
language = 'en_IE'
auto_cleanup=True
auto_cleanup_keep = '//figure[@class="photography gal642 single"]'
remove_tags = [dict(attrs={'class':['topAd','botad','previousNextItem','headline','footerLinks','footernav']})]
feeds = [(u'News', u'http://www.rte.ie/rss/news.xml'), (u'Sport', u'http://www.rte.ie/rss/sport.xml'), (u'Soccer', u'http://www.rte.ie/rss/soccer.xml'), (u'GAA', u'http://www.rte.ie/rss/gaa.xml'), (u'Rugby', u'http://www.rte.ie/rss/rugby.xml'), (u'Racing', u'http://www.rte.ie/rss/racing.xml'), (u'Business', u'http://www.rte.ie/rss/business.xml'), (u'Entertainment', u'http://www.rte.ie/rss/entertainment.xml')]
def print_version(self, url):
return url.replace('http://www', 'http://m')
#def print_version(self, url):
#return url.replace('http://www', 'http://m')

recipes/sport_pl.recipe Normal file

@ -0,0 +1,71 @@
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = 'teepel 2012'
'''
sport.pl
'''
from calibre.web.feeds.news import BasicNewsRecipe
class sport_pl(BasicNewsRecipe):
title = 'Sport.pl'
__author__ = 'teepel <teepel44@gmail.com>'
language = 'pl'
description =u'Największy portal sportowy w Polsce. Wiadomości sportowe z najważniejszych wydarzeń, relacje i wyniki meczów na żywo.'
masthead_url='http://press.gazeta.pl/file/mediakit/154509/c8/sportpl.jpg'
oldest_article = 1
max_articles_per_feed = 100
remove_javascript=True
no_stylesheets=True
remove_empty_feeds = True
keep_only_tags =[]
keep_only_tags.append(dict(name = 'div', attrs = {'id' : 'article'}))
remove_tags =[]
remove_tags.append(dict(name = 'a', attrs = {'href' : 'www.gazeta.pl'}))
feeds = [
(u'Wszystkie wiadomości', u'http://rss.gazeta.pl/pub/rss/sport.xml'),
(u'Piłka nożna', u'http://www.sport.pl/pub/rss/sport/pilka_nozna.htm'),
(u'F1', u'http://www.sport.pl/pub/rss/sportf1.htm'),
(u'Tenis', u'http://serwisy.gazeta.pl/pub/rss/tenis.htm'),
(u'Siatkówka', u'http://gazeta.pl.feedsportal.com/c/32739/f/611628/index.rss'),
(u'Koszykówka', u'http://gazeta.pl.feedsportal.com/c/32739/f/611647/index.rss'),
(u'Piłka ręczna', u'http://gazeta.pl.feedsportal.com/c/32739/f/611635/index.rss'),
(u'Inne sporty', u'http://gazeta.pl.feedsportal.com/c/32739/f/611649/index.rss'),
]
def parse_feeds(self):
feeds = BasicNewsRecipe.parse_feeds(self)
for feed in feeds:
for article in feed.articles[:]:
if '[ZDJĘCIA]' in article.title:
article.title = article.title.replace('[ZDJĘCIA]','')
elif '[WIDEO]' in article.title:
article.title = article.title.replace('[WIDEO]','')
return feeds
def print_version(self, url):
if 'feedsportal' in url:
segment = url.split('/')
urlPart = segment[-2]
urlPart = urlPart.replace('0L0Ssport0Bpl0C','')
urlPart = urlPart.replace('0C10H','/')
urlPart = urlPart.replace('0H',',')
urlPart = urlPart.replace('0I','_')
urlPart = urlPart.replace('A','')
segment1 = urlPart.split('/')
seg1 = segment1[0]
seg2 = segment1[1]
segment2 = seg2.split(',')
part = segment2[0] + ',' + segment2[1]
return 'http://www.sport.pl/' + seg1 + '/2029020,' + part + '.html'
else:
segment = url.split('/')
part2 = segment[-2]
part1 = segment[-1]
segment2 = part1.split(',')
part = segment2[1] + ',' + segment2[2]
return 'http://www.sport.pl/' + part2 + '/2029020,' + part + '.html'
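
The `print_version` method above rewrites article URLs into sport.pl's print layout. A minimal standalone sketch of the same string handling, written for Python 3 outside calibre, may make the two branches easier to follow. The function name `to_print_url` and the two sample URLs are hypothetical; only the replacement tokens, their order, and the `2029020` path segment are taken from the recipe.

```python
# Illustrative sketch only: a plain-function mirror of the recipe's
# print_version logic. Sample URLs below are made up for demonstration.

# Same replacements, in the same order, as in the recipe.
REPLACEMENTS = (
    ('0L0Ssport0Bpl0C', ''),
    ('0C10H', '/'),
    ('0H', ','),
    ('0I', '_'),
    ('A', ''),
)


def to_print_url(url):
    if 'feedsportal' in url:
        # The recipe reads the encoded article address from the
        # second-to-last path segment of the feedsportal link.
        token = url.split('/')[-2]
        for old, new in REPLACEMENTS:
            token = token.replace(old, new)
        section, rest = token.split('/')[:2]
        ids = rest.split(',')
        part = ids[0] + ',' + ids[1]
    else:
        # Direct links: section is the second-to-last segment, and the
        # two identifiers are the 2nd and 3rd comma-separated fields.
        pieces = url.split('/')
        section = pieces[-2]
        ids = pieces[-1].split(',')
        part = ids[1] + ',' + ids[2]
    return 'http://www.sport.pl/' + section + '/2029020,' + part + '.html'


# Hypothetical inputs, shaped the way the recipe's code expects them.
SAMPLE_DIRECT = 'http://www.sport.pl/pilka/1,65037,9876543,Title.html'
SAMPLE_FEEDSPORTAL = (
    'http://gazeta.pl.feedsportal.com/c/32739/f/611628/s/1/l/'
    '0L0Ssport0Bpl0Cpilka0C10H654320H98765430HTitle0Bhtml/story01.htm'
)
```

Both branches end up building the same kind of address: the section name, the fixed `2029020` segment, and two numeric identifiers. As in the recipe, a URL that does not match the expected shape raises an exception rather than falling back to the original link.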

File diff suppressed because one or more lines are too long

View File

@ -1,18 +1,20 @@
from calibre.web.feeds.news import BasicNewsRecipe
import re
class Tablety_pl(BasicNewsRecipe):
title = u'Tablety.pl'
__author__ = 'fenuks'
description = u'Tablety, gry i aplikacje na tablety.'
title = u'Tablety.pl'
__author__ = 'fenuks'
description = u'Tablety, gry i aplikacje na tablety.'
masthead_url= 'http://www.tablety.pl/wp-content/themes/kolektyw/img/logo.png'
cover_url = 'http://www.tablety.pl/wp-content/themes/kolektyw/img/logo.png'
category = 'IT'
language = 'pl'
use_embedded_content=True
cover_url = 'http://www.tablety.pl/wp-content/themes/kolektyw/img/logo.png'
category = 'IT'
language = 'pl'
use_embedded_content = False
no_stylesheets = True
oldest_article = 8
max_articles_per_feed = 100
preprocess_regexps = [(re.compile(ur'<p><strong>Przeczytaj także.*?</a></strong></p>', re.DOTALL), lambda match: ''), (re.compile(ur'<p><strong>Przeczytaj koniecznie.*?</a></strong></p>', re.DOTALL), lambda match: '')]
keep_only_tags = [dict(id='news_block')]
#remove_tags_before=dict(name="h1", attrs={'class':'entry-title'})
#remove_tags_after=dict(name="footer", attrs={'class':'entry-footer clearfix'})
#remove_tags=[dict(name='footer', attrs={'class':'entry-footer clearfix'}), dict(name='div', attrs={'class':'entry-comment-counter'})]
feeds = [(u'Najnowsze posty', u'http://www.tablety.pl/feed/')]
remove_tags=[dict(attrs={'class':['comments_icon', 'wp-polls', 'entry-comments']})]
feeds = [(u'Najnowsze posty', u'http://www.tablety.pl/feed/')]
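
The `preprocess_regexps` entry in this recipe pairs each compiled pattern with a function that returns the replacement text; both patterns target the "Przeczytaj także" and "Przeczytaj koniecznie" cross-link paragraphs and replace them with an empty string. A rough Python 3 sketch of how such pairs behave on raw HTML is given here. The helper `apply_rules` and the sample markup are assumptions made for illustration, not calibre's internal code.

```python
import re

# The two (pattern, replacement function) pairs from the recipe,
# written with Python 3 string literals instead of ur'' literals.
RULES = [
    (re.compile(r'<p><strong>Przeczytaj także.*?</a></strong></p>', re.DOTALL),
     lambda match: ''),
    (re.compile(r'<p><strong>Przeczytaj koniecznie.*?</a></strong></p>', re.DOTALL),
     lambda match: ''),
]


def apply_rules(html, rules=RULES):
    """Hypothetical helper: run each substitution rule over the HTML in order."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html


# Made-up sample page fragment with one cross-link paragraph.
SAMPLE_HTML = (
    '<p>Tekst</p>'
    '<p><strong>Przeczytaj także: <a href="#">Inny\nwpis</a></strong></p>'
    '<p>Koniec</p>'
)
CLEANED = apply_rules(SAMPLE_HTML)
```

The non-greedy `.*?` together with `re.DOTALL` lets a match span line breaks while still stopping at the first closing `</a></strong></p>`, so only the cross-link paragraph is removed.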

recipes/trystero.recipe Normal file

@ -0,0 +1,26 @@
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = u'2013, Tomasz Dlugosz <tomek3d@gmail.com>'
'''
trystero.pl
'''
from calibre.web.feeds.news import BasicNewsRecipe
class trystero(BasicNewsRecipe):
title = 'Trystero'
__author__ = u'Tomasz D\u0142ugosz'
language = 'pl'
description =u'Trystero.pl jest niezależnym blogiem finansowym. Publikowane na nim teksty dotyczą rynku kapitałowego, ekonomii, gospodarki i życia społecznego w takiej mniej więcej kolejności.'
oldest_article = 7
remove_javascript=True
no_stylesheets=True
feeds = [(u'Newsy', u'http://www.trystero.pl/feed')]
keep_only_tags = [
dict(name='h1'),
dict(name='div', attrs={'class': ['post-content']})]

View File

@ -1,5 +1,6 @@
import re
from calibre.web.feeds.news import BasicNewsRecipe
class UbuntuPomoc(BasicNewsRecipe):
title = u'Ubuntu-pomoc.org'
__author__ = 'fenuks'
@ -15,8 +16,8 @@ class UbuntuPomoc(BasicNewsRecipe):
remove_empty_feeds = True
use_embedded_content = False
remove_attrs = ['style']
keep_only_tags = [dict(attrs={'class':'post'})]
remove_tags_after = dict(attrs={'class':'underEntry'})
remove_tags = [dict(attrs={'class':['underPostTitle', 'yarpp-related', 'underEntry', 'social', 'tags', 'commentlist', 'youtube_sc']}), dict(id=['wp_rp_first', 'commentReply'])]
keep_only_tags = [dict(name='article')]
#remove_tags_after = dict(attrs={'class':'underEntry'})
remove_tags = [dict(attrs={'class':['yarpp-related', 'youtube_sc', 'share']}), dict(name='footer')]
feeds = [(u'Ca\u0142o\u015b\u0107', u'http://feeds.feedburner.com/Ubuntu-Pomoc'),
(u'Gry', u'http://feeds.feedburner.com/GryUbuntu-pomoc')]
]

View File

@ -0,0 +1,28 @@
__license__ = 'GPL v3'
from calibre.web.feeds.news import BasicNewsRecipe
class WebSecurity(BasicNewsRecipe):
title = u'WebSecurity'
__author__ = 'fenuks'
description = u'WebSecurity.pl to największy w Polsce portal o bezpieczeństwie sieciowym.'
category = ''
#publication_type = ''
language = 'pl'
#encoding = ''
#extra_css = ''
cover_url = 'http://websecurity.pl/images/websecurity-logo.png'
masthead_url = ''
use_embedded_content = False
oldest_article = 7
max_articles_per_feed = 100
no_stylesheets = True
remove_empty_feeds = True
remove_javascript = True
remove_attributes = ['style', 'font']
ignore_duplicate_articles = {'title', 'url'}
keep_only_tags = [dict(attrs={'class':'article single'}), dict(id='content')]
remove_tags = [dict(attrs={'class':['sociable', 'no-comments']})]
remove_tags_after = dict(attrs={'class':'sociable'})
feeds = [(u'Wszystkie', u'http://websecurity.pl/feed/'), (u'Aktualno\u015bci', u'http://websecurity.pl/aktualnosci/feed/'), (u'Artyku\u0142y', u'http://websecurity.pl/artykuly/feed/'), (u'Blogosfera', u'http://websecurity.pl/blogosfera/wpisy/feed/')]

View File

@ -1,30 +1,30 @@
from calibre.web.feeds.news import BasicNewsRecipe
class WirtualneMedia(BasicNewsRecipe):
title = u'wirtualnemedia.pl'
title = u'wirtualnemedia.pl'
oldest_article = 7
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
remove_empty_feeds = True
__author__ = 'fenuks'
description = u'Portal o mediach, reklamie, internecie, PR, telekomunikacji - nr 1 w Polsce - WirtualneMedia.pl - wiadomości z pierwszej ręki.'
category = 'internet'
language = 'pl'
__author__ = 'fenuks'
extra_css = '.thumbnail {float:left; max-width:150px; margin-right:5px;}'
description = u'Portal o mediach, reklamie, internecie, PR, telekomunikacji - nr 1 w Polsce - WirtualneMedia.pl - wiadomości z pierwszej ręki.'
category = 'internet'
language = 'pl'
ignore_duplicate_articles = {'title', 'url'}
masthead_url= 'http://i.wp.pl/a/f/jpeg/8654/wirtualnemedia.jpeg'
cover_url= 'http://static.wirtualnemedia.pl/img/logo_wirtualnemedia_newsletter.gif'
remove_tags=[dict(id=['header', 'footer'])]
feeds = [(u'Gospodarka', u'http://www.wirtualnemedia.pl/rss/wm_gospodarka.xml'),
(u'Internet', u'http://www.wirtualnemedia.pl/rss/wm_internet.xml'),
(u'Kultura', u'http://www.wirtualnemedia.pl/rss/wm_kulturarozrywka.xml'),
(u'Badania', u'http://www.wirtualnemedia.pl/rss/wm_marketing.xml'),
(u'Prasa', u'http://www.wirtualnemedia.pl/rss/wm_prasa.xml'),
(u'Radio', u'http://www.wirtualnemedia.pl/rss/wm_radio.xml'),
(u'Reklama', u'http://www.wirtualnemedia.pl/rss/wm_reklama.xml'),
(u'PR', u'http://www.wirtualnemedia.pl/rss/wm_relations.xml'),
(u'Technologie', u'http://www.wirtualnemedia.pl/rss/wm_telekomunikacja.xml'),
(u'Telewizja', u'http://www.wirtualnemedia.pl/rss/wm_telewizja_rss.xml')
]
feeds = [(u'Gospodarka', u'http://www.wirtualnemedia.pl/rss/wm_gospodarka.xml'),
(u'Internet', u'http://www.wirtualnemedia.pl/rss/wm_internet.xml'),
(u'Kultura', u'http://www.wirtualnemedia.pl/rss/wm_kulturarozrywka.xml'),
(u'Badania', u'http://www.wirtualnemedia.pl/rss/wm_marketing.xml'),
(u'Prasa', u'http://www.wirtualnemedia.pl/rss/wm_prasa.xml'),
(u'Radio', u'http://www.wirtualnemedia.pl/rss/wm_radio.xml'),
(u'Reklama', u'http://www.wirtualnemedia.pl/rss/wm_reklama.xml'),
(u'PR', u'http://www.wirtualnemedia.pl/rss/wm_relations.xml'),
(u'Technologie', u'http://www.wirtualnemedia.pl/rss/wm_telekomunikacja.xml'),
(u'Telewizja', u'http://www.wirtualnemedia.pl/rss/wm_telewizja_rss.xml')]
def print_version(self, url):
return url.replace('artykul', 'print')
return url.replace('artykul', 'print')

View File

@ -1,5 +1,6 @@
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:fdm=marker:ai
from calibre.web.feeds.news import BasicNewsRecipe
class ZTS(BasicNewsRecipe):
title = u'Zaufana Trzecia Strona'
__author__ = 'fenuks'
@ -7,6 +8,7 @@ class ZTS(BasicNewsRecipe):
category = 'IT, security'
language = 'pl'
cover_url = 'http://www.zaufanatrzeciastrona.pl/wp-content/uploads/2012/08/z3s_h100.png'
extra_css = '.thumbnail {float: left; margin-right:5px;}'
oldest_article = 7
max_articles_per_feed = 100
no_stylesheets = True

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@ -9,14 +9,14 @@ msgstr ""
"Project-Id-Version: calibre\n"
"Report-Msgid-Bugs-To: FULL NAME <EMAIL@ADDRESS>\n"
"POT-Creation-Date: 2011-11-25 14:01+0000\n"
"PO-Revision-Date: 2013-02-26 12:21+0000\n"
"Last-Translator: Miguel Angel del Olmo <silinio45@gmail.com>\n"
"PO-Revision-Date: 2013-03-19 21:03+0000\n"
"Last-Translator: Jorge Luis Granda <costeelation@hotmail.com>\n"
"Language-Team: Español; Castellano <>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Launchpad-Export-Date: 2013-02-27 04:37+0000\n"
"X-Generator: Launchpad (build 16506)\n"
"X-Launchpad-Export-Date: 2013-03-20 04:42+0000\n"
"X-Generator: Launchpad (build 16532)\n"
#. name for aaa
msgid "Ghotuo"
@ -9808,7 +9808,7 @@ msgstr "Huave; San Mateo Del Mar"
#. name for huw
msgid "Hukumina"
msgstr ""
msgstr "Hukumina"
#. name for hux
msgid "Huitoto; Nüpode"
@ -9816,15 +9816,15 @@ msgstr "Huitoto; Nipode"
#. name for huy
msgid "Hulaulá"
msgstr ""
msgstr "Hulaulá"
#. name for huz
msgid "Hunzib"
msgstr ""
msgstr "Hunzib"
#. name for hvc
msgid "Haitian Vodoun Culture Language"
msgstr ""
msgstr "Idioma de la cultura haitiana vodoun"
#. name for hve
msgid "Huave; San Dionisio Del Mar"
@ -9832,11 +9832,11 @@ msgstr "Huave; San Dionisio Del Mar"
#. name for hvk
msgid "Haveke"
msgstr ""
msgstr "Haveke"
#. name for hvn
msgid "Sabu"
msgstr ""
msgstr "Sabu"
#. name for hvv
msgid "Huave; Santa María Del Mar"
@ -9844,7 +9844,7 @@ msgstr "Huave; Santa María Del Mar"
#. name for hwa
msgid "Wané"
msgstr ""
msgstr "Wané"
#. name for hwc
msgid "Creole English; Hawai'i"
@ -9856,7 +9856,7 @@ msgstr ""
#. name for hya
msgid "Hya"
msgstr ""
msgstr "Hya"
#. name for hye
msgid "Armenian"
@ -9864,7 +9864,7 @@ msgstr "Armenio"
#. name for iai
msgid "Iaai"
msgstr ""
msgstr "Iaai"
#. name for ian
msgid "Iatmul"
@ -30664,31 +30664,31 @@ msgstr ""
#. name for zpu
msgid "Zapotec; Yalálag"
msgstr ""
msgstr "Zapotec; Yalálag"
#. name for zpv
msgid "Zapotec; Chichicapan"
msgstr ""
msgstr "Zapotec; Chichicapan"
#. name for zpw
msgid "Zapotec; Zaniza"
msgstr ""
msgstr "Zapotec; Zaniza"
#. name for zpx
msgid "Zapotec; San Baltazar Loxicha"
msgstr ""
msgstr "Zapotec; San Baltazar Loxicha"
#. name for zpy
msgid "Zapotec; Mazaltepec"
msgstr ""
msgstr "Zapotec; Mazaltepec"
#. name for zpz
msgid "Zapotec; Texmelucan"
msgstr ""
msgstr "Zapotec; Texmelucan"
#. name for zqe
msgid "Zhuang; Qiubei"
msgstr ""
msgstr "Zhuang; Qiubei"
#. name for zra
msgid "Kara (Korea)"
@ -30732,7 +30732,7 @@ msgstr "Malayo estándar"
#. name for zsr
msgid "Zapotec; Southern Rincon"
msgstr ""
msgstr "Zapotec; Southern Rincon"
#. name for zsu
msgid "Sukurum"
@ -30760,11 +30760,11 @@ msgstr "Zapoteco de Santa Catarina Albarradas"
#. name for ztp
msgid "Zapotec; Loxicha"
msgstr ""
msgstr "Zapotec; Loxicha"
#. name for ztq
msgid "Zapotec; Quioquitani-Quierí"
msgstr ""
msgstr "Zapotec; Quioquitani-Quierí"
#. name for zts
msgid "Zapotec; Tilquiapan"

View File

@ -12,14 +12,14 @@ msgstr ""
"Report-Msgid-Bugs-To: Debian iso-codes team <pkg-isocodes-"
"devel@lists.alioth.debian.org>\n"
"POT-Creation-Date: 2011-11-25 14:01+0000\n"
"PO-Revision-Date: 2013-02-04 07:01+0000\n"
"PO-Revision-Date: 2013-03-16 14:32+0000\n"
"Last-Translator: drMerry <Unknown>\n"
"Language-Team: Dutch <vertaling@vrijschrift.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Launchpad-Export-Date: 2013-02-05 04:44+0000\n"
"X-Generator: Launchpad (build 16468)\n"
"X-Launchpad-Export-Date: 2013-03-17 04:58+0000\n"
"X-Generator: Launchpad (build 16532)\n"
"Language: nl\n"
#. name for aaa
@ -340,7 +340,7 @@ msgstr "Adi"
#. name for adj
msgid "Adioukrou"
msgstr ""
msgstr "Adiokrou"
#. name for adl
msgid "Galo"
@ -352,11 +352,11 @@ msgstr "Adang"
#. name for ado
msgid "Abu"
msgstr ""
msgstr "Abu"
#. name for adp
msgid "Adap"
msgstr ""
msgstr "Adap"
#. name for adq
msgid "Adangbe"
@ -372,7 +372,7 @@ msgstr "Adamorobe gebarentaal"
#. name for adt
msgid "Adnyamathanha"
msgstr ""
msgstr "Adnyamathanha"
#. name for adu
msgid "Aduge"
@ -392,7 +392,7 @@ msgstr "Adyghe"
#. name for adz
msgid "Adzera"
msgstr ""
msgstr "Adzera"
#. name for aea
msgid "Areba"
@ -416,11 +416,11 @@ msgstr "Pashai; noordoost"
#. name for aek
msgid "Haeke"
msgstr ""
msgstr "Haeke"
#. name for ael
msgid "Ambele"
msgstr ""
msgstr "Ambele"
#. name for aem
msgid "Arem"
@ -432,7 +432,7 @@ msgstr "Armeense gebarentaal"
#. name for aeq
msgid "Aer"
msgstr ""
msgstr "Aer"
#. name for aer
msgid "Arrernte; Eastern"
@ -440,7 +440,7 @@ msgstr "Arrernte; oostelijk"
#. name for aes
msgid "Alsea"
msgstr ""
msgstr "Alsea"
#. name for aeu
msgid "Akeu"
@ -468,7 +468,7 @@ msgstr "Andai"
#. name for afe
msgid "Putukwam"
msgstr ""
msgstr "Putukwam"
#. name for afg
msgid "Afghan Sign Language"

View File

@ -13,14 +13,14 @@ msgstr ""
"Report-Msgid-Bugs-To: Debian iso-codes team <pkg-isocodes-"
"devel@lists.alioth.debian.org>\n"
"POT-Creation-Date: 2011-11-25 14:01+0000\n"
"PO-Revision-Date: 2013-02-21 23:51+0000\n"
"PO-Revision-Date: 2013-03-23 10:17+0000\n"
"Last-Translator: Глория Хрусталёва <gloriya@hushmail.com>\n"
"Language-Team: Russian <debian-l10n-russian@lists.debian.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Launchpad-Export-Date: 2013-02-23 05:19+0000\n"
"X-Generator: Launchpad (build 16506)\n"
"X-Launchpad-Export-Date: 2013-03-24 04:45+0000\n"
"X-Generator: Launchpad (build 16540)\n"
"Language: ru\n"
#. name for aaa
@ -5381,7 +5381,7 @@ msgstr ""
#. name for cof
msgid "Colorado"
msgstr ""
msgstr "Колорадо"
#. name for cog
msgid "Chong"
@ -5505,7 +5505,7 @@ msgstr ""
#. name for cqu
msgid "Quechua; Chilean"
msgstr ""
msgstr "Кечуа; Чилийский"
#. name for cra
msgid "Chara"

Some files were not shown because too many files have changed in this diff