Sync to trunk.

John Schember 2011-09-21 18:40:51 -04:00
commit e7d0944fc2
439 changed files with 279013 additions and 440395 deletions


@ -19,6 +19,169 @@
# new recipes:
# - title:
- version: 0.8.19
date: 2011-09-16
new features:
- title: "Driver for Sony Ericsson Xperia Arc"
- title: "MOBI Output: Add option in Preferences->Output Options->MOBI Output to enable the share via Facebook feature for calibre produced MOBI files. Note that enabling this disables the sync last read position across multiple devices feature. Don't ask me why, ask Amazon."
- title: "Content server: Update metadata when sending MOBI as well as EPUB files"
- title: "News download: Add an auto_cleanup_keep variable that allows recipe writers to tell the auto cleanup to never remove a specified element"
- title: "Conversion: Remove paragraph spacing: If you set the indent size negative, calibre will now leave the indents specified in the input document"
bug fixes:
- title: "Fix regression in 0.8.18 that broke PDF Output"
- title: "MOBI Output: Revert change in 0.8.18 that marked news downloads with a single section as blogs, as the Kindle does not auto archive them"
- title: "PDF output on OSX now generates proper non image based documents"
- title: "RTF Input: Fix handling of internal links and underlined text"
tickets: [845328]
- title: "Fix language sometimes not getting set when downloading metadata in the edit metadata dialog"
- title: "Fix regression that broke killing of multiple jobs"
tickets: [850764]
- title: "Fix bug processing author names with initials when downloading metadata from ozon.ru."
tickets: [845420]
- title: "Fix a memory leak in the Copy to library operation which also fixes the metadata.db being held open in the destination library"
tickets: [849469]
- title: "Keyboard shortcuts: Allow use of symbol keys like >,*,etc."
tickets: [847378]
- title: "EPUB Output: When splitting be a little cleverer about discarding 'empty' pages"
improved recipes:
- Twitch Films
- Japan Times
- People/US Magazine mashup
- Business World India
- Inquirer.net
- Guardian/Observer
new recipes:
- title: RT
author: Darko Miletic
- title: CIO Magazine
author: Julio Map
- title: India Today and Hindustan Times
author: Krittika Goyal
- title: Pagina 12 Print Edition
author: Pablo Marfil
- version: 0.8.18
date: 2011-09-09
new features:
- title: "Kindle news download: On Kindle 3 and newer have the View Articles and Sections menu remember the current article."
tickets: [748741]
- title: "Conversion: Add option to unsmarten puctuation under Look & Feel"
- title: "Driver of Motorola Ex124G and Pandigital Nova Tablet"
- title: "Allow downloading metadata from amazon.co.jp. To use it, configure the amazon metadata source to use the Japanese amazon site."
tickets: [842447]
- title: "When automatically generating author sort for author name, ignore common prefixes like Mr. Dr. etc. Controllable via tweak. Also add a tweak to allow control of how a string is split up into multiple authors."
tickets: [795984]
- title: "TXT Output: Preserve as much formatting as possible when generating Markdown output including various CSS styles"
bug fixes:
- title: "Fix pubdate incorrect when used in save to disk template in timezones ahead of GMT."
tickets: [844445]
- title: "When attempting to stop multiple device jobs at once, only show a single error message"
tickets: [841588]
- title: "Fix conversion of large EPUB files to PDF erroring out on systems with a limited number of available file handles"
tickets: [816616]
- title: "EPUB catalog generation: Fix some entries going off the left edge of the page for unread/wishlist items"
- title: "When setting language in an EPUB file always use the 2 letter language code in preference to the three letter code, when possible."
tickets: [841201]
- title: "Content server: Fix --url-prefix not used for links in the book details view."
- title: "MOBI Input: When links in a MOBI file point to just before block elements, and there is a page break on the block element, the links can end up pointing to the wrong place on conversion. Adjust the location in such cases to point to the block element directly."
improved recipes:
- Kopalnia Wiedzy
- FilmWeb.pl
- Philadelphia Inquirer
- Honolulu Star Advertiser
- Counterpunch
new recipes:
- title: Various Polish news sources
author: fenuks
- version: 0.8.17
date: 2011-09-02
new features:
- title: "Basic support for Amazon AZW4 format (PDF wrapped inside a MOBI)"
- title: "When showing the cover browser in a separate window, allow the use of the V, D shortcut keys to view the current book and send it to device respectively."
tickets: [836402]
- title: "Add an option in Preferences->Miscellaneous to abort conversion jobs that take too long."
tickets: [835233]
- title: "Driver for HTC Evo and HP TouchPad (with kindle app)"
- title: "Preferences->Adding books, detect when the user specifies a test expression with no file extension and popup a warning"
bug fixes:
- title: "E-book viewer: Ensure toolbars are always visible"
- title: "Content server: Fix grouping of Tags/authors not working for some non english languages with Internet Explorer"
tickets: [835238]
- title: "When downloading metadata from amazon, fix italics inside brackets getting lost."
tickets: [836857]
- title: "Get Books: Add EscapeMagazine.pl and RW2010.pl stores"
- title: "Conversion pipeline: Fix conversion of cm/mm to pts. Fixes use of cm as a length unit when converting to MOBI."
- title: "When showing the cover browser in a separate window, focus the cover browser so that keyboard shortcuts work immediately."
tickets: [835933]
- title: "HTMLZ Output: Fix special chars like ampersands, etc. not being converted to entities"
- title: "Keyboard shortcuts config: Fix clicking done in the shortcut editor with shortcuts set to default caused the displayed shortcut to be always set to None"
- title: "Fix bottom most entries in keyboard shortcuts not editable"
improved recipes:
- Hacker News
- Nikkei News
new recipes:
- title: "Haber 7 and Hira"
author: thomass
- title: "NTV and NTVSpor by A Erdogan"
author: A Erdogan
- version: 0.8.16
date: 2011-08-26

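For context on the auto_cleanup_keep variable noted in the 0.8.19 entry above: it is a recipe-level attribute holding an XPath expression for elements the automatic cleanup must never remove. A minimal sketch of how a recipe writer might use it (the feed URL and XPath below are illustrative placeholders, not part of this commit):

from calibre.web.feeds.news import BasicNewsRecipe

class AutoCleanupKeepExample(BasicNewsRecipe):
    title = 'auto_cleanup_keep example'
    oldest_article = 7
    max_articles_per_feed = 25
    # Let the readability-style cleanup strip boilerplate automatically...
    auto_cleanup = True
    # ...but never remove elements matching this XPath (e.g. thumbnail figures)
    auto_cleanup_keep = '//div[@class="thumbnail"]'
    feeds = [(u'Example feed', u'http://example.com/rss.xml')]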

@ -0,0 +1,38 @@
from calibre.web.feeds.news import BasicNewsRecipe
class Adventure_zone(BasicNewsRecipe):
title = u'Adventure Zone'
__author__ = 'fenuks'
description = 'Adventure zone - adventure games from A to Z'
category = 'games'
language = 'pl'
oldest_article = 15
max_articles_per_feed = 100
no_stylesheets = True
remove_tags_before= dict(name='td', attrs={'class':'main-bg'})
remove_tags_after= dict(name='td', attrs={'class':'main-body middle-border'})
extra_css = '.main-bg{text-align: left;} td.capmain{ font-size: 22px; }'
feeds = [(u'Nowinki', u'http://www.adventure-zone.info/fusion/feeds/news.php')]
def get_cover_url(self):
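# Take the cover image from the 'latest issue' box on the news page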
soup = self.index_to_soup('http://www.adventure-zone.info/fusion/news.php')
cover=soup.find(id='box_OstatninumerAZ')
self.cover_url='http://www.adventure-zone.info/fusion/'+ cover.center.a.img['src']
return getattr(self, 'cover_url', self.cover_url)
def skip_ad_pages(self, soup):
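# For preview ('zapowied...') and review ('recenzj...') links, fetch the printable version of the article instead of the ad page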
skip_tag = soup.body.findAll(name='a')
if skip_tag is not None:
for r in skip_tag:
if 'articles.php?' in r['href']:
if r.strong is not None:
word=r.strong.string
if 'zapowied' in word or 'recenzj' in word:
return self.index_to_soup('http://www.adventure-zone.info/fusion/print.php?type=A&item_id'+r['href'][r['href'].find('_id')+3:], raw=True)
else:
None
def print_version(self, url):
return url.replace('news.php?readmore', 'print.php?type=N&item_id')


@ -0,0 +1,18 @@
from calibre.web.feeds.news import BasicNewsRecipe
class AstroNEWS(BasicNewsRecipe):
title = u'AstroNEWS'
__author__ = 'fenuks'
description = 'AstroNEWS- astronomy every day'
category = 'astronomy, science'
language = 'pl'
oldest_article = 8
max_articles_per_feed = 100
auto_cleanup = True
cover_url='http://news.astronet.pl/img/logo_news.jpg'
# no_stylesheets= True
feeds = [(u'Wiadomości', u'http://news.astronet.pl/rss.cgi')]
def print_version(self, url):
return url.replace('astronet.pl/', 'astronet.pl/print.cgi?')


@ -0,0 +1,15 @@
from calibre.web.feeds.news import BasicNewsRecipe
class Astronomia_pl(BasicNewsRecipe):
title = u'Astronomia.pl'
__author__ = 'fenuks'
description = 'Astronomia - polish astronomy site'
cover_url = 'http://www.astronomia.pl/grafika/logo.gif'
category = 'astronomy, science'
language = 'pl'
oldest_article = 8
max_articles_per_feed = 100
#no_stylesheets=True
remove_tags_before=dict(name='div', attrs={'id':'a1'})
keep_only_tags=[dict(name='div', attrs={'id':['a1', 'h2']})]
feeds = [(u'Wiadomości z astronomii i astronautyki', u'http://www.astronomia.pl/rss/')]


@ -1,15 +1,52 @@
from calibre.web.feeds.news import BasicNewsRecipe
class Bash_org_pl(BasicNewsRecipe):
title = u'Bash.org.pl'
__author__ = 'fenuks'
description = 'Bash.org.pl - funny quotations from IRC discussions'
category = 'funny quotations, humour'
language = 'pl'
oldest_article = 15
cover_url = u'http://userlogos.org/files/logos/dzikiosiol/none_0.png'
max_articles_per_feed = 100
max_articles_per_feed = 50
no_stylesheets= True
keep_only_tags= [dict(name='div', attrs={'class':'quote post-content post-body'})]
feeds = [(u'Cytaty', u'http://bash.org.pl/rss')]
keep_only_tags= [dict(name='a', attrs={'class':'qid click'}),
dict(name='div', attrs={'class':'quote post-content post-body'})]
def latest_articles(self):
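# Build one article entry for each quote linked from the latest-quotes page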
articles = []
soup=self.index_to_soup(u'http://bash.org.pl/latest/')
#date=soup.find('div', attrs={'class':'right'}).string
tags=soup.findAll('a', attrs={'class':'qid click'})
for a in tags:
title=a.string
url='http://bash.org.pl' +a['href']
articles.append({'title' : title,
'url' : url,
'date' : '',
'description' : ''
})
return articles
def random_articles(self):
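# Load the random-quote page once per article and collect whichever quote comes back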
articles = []
for i in range(self.max_articles_per_feed):
soup=self.index_to_soup(u'http://bash.org.pl/random/')
#date=soup.find('div', attrs={'class':'right'}).string
url=soup.find('a', attrs={'class':'qid click'})
title=url.string
url='http://bash.org.pl' +url['href']
articles.append({'title' : title,
'url' : url,
'date' : '',
'description' : ''
})
return articles
def parse_index(self):
feeds = []
feeds.append((u"Najnowsze", self.latest_articles()))
feeds.append((u"Losowe", self.random_articles()))
return feeds


@ -0,0 +1,70 @@
from calibre.web.feeds.news import BasicNewsRecipe
import re
class Benchmark_pl(BasicNewsRecipe):
title = u'Benchmark.pl'
__author__ = 'fenuks'
description = u'benchmark.pl -IT site'
cover_url = 'http://www.ieaddons.pl/benchmark/logo_benchmark_new.gif'
category = 'IT'
language = 'pl'
oldest_article = 8
max_articles_per_feed = 100
no_stylesheets=True
preprocess_regexps = [(re.compile(ur'\bWięcej o .*</body>', re.DOTALL|re.IGNORECASE), lambda match: '</body>')]
keep_only_tags=[dict(name='div', attrs={'class':['m_zwykly', 'gallery']})]
remove_tags_after=dict(name='div', attrs={'class':'body'})
remove_tags=[dict(name='div', attrs={'class':['kategoria', 'socialize', 'thumb', 'panelOcenaObserwowane', 'categoryNextToSocializeGallery']})]
INDEX= 'http://www.benchmark.pl'
feeds = [(u'Aktualności', u'http://www.benchmark.pl/rss/aktualnosci-pliki.xml'),
(u'Testy i recenzje', u'http://www.benchmark.pl/rss/testy-recenzje-minirecenzje.xml')]
def append_page(self, soup, appendtag):
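# Stitch multi-page articles together by following the 'next' link and appending each page's body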
nexturl = soup.find('span', attrs={'class':'next'})
while nexturl is not None:
nexturl= self.INDEX + nexturl.parent['href']
soup2 = self.index_to_soup(nexturl)
nexturl=soup2.find('span', attrs={'class':'next'})
pagetext = soup2.find(name='div', attrs={'class':'body'})
appendtag.find('div', attrs={'class':'k_ster'}).extract()
pos = len(appendtag.contents)
appendtag.insert(pos, pagetext)
if appendtag.find('div', attrs={'class':'k_ster'}) is not None:
appendtag.find('div', attrs={'class':'k_ster'}).extract()
def image_article(self, soup, appendtag):
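# Handle image galleries: turn each page's preview into an <img> tag and append the gallery text page by page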
nexturl=soup.find('div', attrs={'class':'preview'})
if nexturl is not None:
nexturl=nexturl.find('a', attrs={'class':'move_next'})
image=appendtag.find('div', attrs={'class':'preview'}).div['style'][16:]
image=self.INDEX + image[:image.find("')")]
appendtag.find(attrs={'class':'preview'}).name='img'
appendtag.find(attrs={'class':'preview'})['src']=image
appendtag.find('a', attrs={'class':'move_next'}).extract()
while nexturl is not None:
nexturl= self.INDEX + nexturl['href']
soup2 = self.index_to_soup(nexturl)
nexturl=soup2.find('a', attrs={'class':'move_next'})
image=soup2.find('div', attrs={'class':'preview'}).div['style'][16:]
image=self.INDEX + image[:image.find("')")]
soup2.find(attrs={'class':'preview'}).name='img'
soup2.find(attrs={'class':'preview'})['src']=image
pagetext=soup2.find('div', attrs={'class':'gallery'})
pagetext.find('div', attrs={'class':'title'}).extract()
pagetext.find('div', attrs={'class':'thumb'}).extract()
pagetext.find('div', attrs={'class':'panelOcenaObserwowane'}).extract()
if nexturl is not None:
pagetext.find('a', attrs={'class':'move_next'}).extract()
pagetext.find('a', attrs={'class':'move_back'}).extract()
pos = len(appendtag.contents)
appendtag.insert(pos, pagetext)
def preprocess_html(self, soup):
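# Galleries go through image_article(); ordinary articles get multi-page stitching via append_page()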
if soup.find('div', attrs={'class':'preview'}) is not None:
self.image_article(soup, soup.body)
else:
self.append_page(soup, soup.body)
return soup


@ -0,0 +1,61 @@
from calibre.web.feeds.recipes import BasicNewsRecipe
import re
class SportsIllustratedRecipe(BasicNewsRecipe) :
__author__ = 'ape'
__copyright__ = 'ape'
__license__ = 'GPL v3'
language = 'de'
description = 'Berliner Zeitung'
version = 2
title = u'Berliner Zeitung'
timefmt = ' [%d.%m.%Y]'
no_stylesheets = True
remove_javascript = True
use_embedded_content = False
publication_type = 'newspaper'
keep_only_tags = [dict(name='div', attrs={'class':'teaser t_split t_artikel'})]
INDEX = 'http://www.berlinonline.de/berliner-zeitung/'
def parse_index(self):
base = 'http://www.berlinonline.de'
answer = []
articles = {}
more = 1
soup = self.index_to_soup(self.INDEX)
# Get list of links to ressorts from index page
ressort_list = soup.findAll('ul', attrs={'class': re.compile('ressortlist')})
for ressort in ressort_list[0].findAll('a'):
feed_title = ressort.string
print 'Analyzing', feed_title
if not articles.has_key(feed_title):
articles[feed_title] = []
answer.append(feed_title)
# Load ressort page.
feed = self.index_to_soup('http://www.berlinonline.de' + ressort['href'])
# find mainbar div which contains the list of all articles
for article_container in feed.findAll('div', attrs={'class': re.compile('mainbar')}):
# iterate over all articles
for article_teaser in article_container.findAll('div', attrs={'class': re.compile('teaser')}):
# extract title of article
if article_teaser.h3 != None:
article = {'title' : article_teaser.h3.a.string, 'date' : u'', 'url' : base + article_teaser.h3.a['href'], 'description' : u''}
articles[feed_title].append(article)
else:
# Skip teasers for missing photos
if article_teaser.div.p.contents[0].find('Foto:') > -1:
continue
article = {'title': 'Weitere Artikel ' + str(more), 'date': u'', 'url': base + article_teaser.div.p.a['href'], 'description': u''}
articles[feed_title].append(article)
more += 1
answer = [[key, articles[key]] for key in answer if articles.has_key(key)]
return answer
def get_masthead_url(self):
return 'http://www.berlinonline.de/.img/berliner-zeitung/blz_logo.gif'


@ -1,55 +1,50 @@
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal kovid@kovidgoyal.net'
__docformat__ = 'restructuredtext en'
__copyright__ = '2008 Kovid Goyal kovid@kovidgoyal.net, 2010 Darko Miletic <darko.miletic at gmail.com>'
'''
businessweek.com
www.businessweek.com
'''
from calibre.web.feeds.news import BasicNewsRecipe
class BusinessWeek(BasicNewsRecipe):
title = 'Business Week'
description = 'Business News, Stock Market and Financial Advice'
__author__ = 'ChuckEggDotCom and Sujata Raman'
language = 'en'
__author__ = 'Kovid Goyal and Darko Miletic'
description = 'Read the latest international business news & stock market news. Get updated company profiles, financial advice, global economy and technology news.'
publisher = 'Bloomberg L.P.'
category = 'Business, business news, stock market, stock market news, financial advice, company profiles, financial advice, global economy, technology news'
oldest_article = 7
max_articles_per_feed = 10
max_articles_per_feed = 200
no_stylesheets = True
encoding = 'utf8'
use_embedded_content = False
language = 'en'
remove_empty_feeds = True
publication_type = 'magazine'
cover_url = 'http://images.businessweek.com/mz/covers/current_120x160.jpg'
masthead_url = 'http://assets.businessweek.com/images/bw-logo.png'
extra_css = """
body{font-family: Helvetica,Arial,sans-serif }
img{margin-bottom: 0.4em; display:block}
.tagline{color: gray; font-style: italic}
.photoCredit{font-size: small; color: gray}
"""
recursions = 1
match_regexps = [r'http://www.businessweek.com/.*_page_[1-9].*']
extra_css = '''
h1{font-family :Arial,Helvetica,sans-serif; font-size:large;}
.news_story_title{font-family :Arial,Helvetica,sans-serif; font-size:large;font-weight:bold;}
h2{font-family :Arial,Helvetica,sans-serif; font-size:medium;color:#666666;}
h3{text-transform:uppercase;font-family :Arial,Helvetica,sans-serif; font-size:large;font-weight:bold;}
h4{font-family :Arial,Helvetica,sans-serif; font-size:small;font-weight:bold;}
p{font-family :Arial,Helvetica,sans-serif; }
#lede600{font-size:x-small;}
#storybody{font-size:x-small;}
p{font-family :Arial,Helvetica,sans-serif;}
.strap{font-family :Arial,Helvetica,sans-serif; font-size:x-small; color:#064599;}
.byline{font-family :Arial,Helvetica,sans-serif; font-size:x-small;}
.postedBy{font-family :Arial,Helvetica,sans-serif; font-size:x-small;color:#666666;}
.trackback{font-family :Arial,Helvetica,sans-serif; font-size:x-small;color:#666666;}
.date{font-family :Arial,Helvetica,sans-serif; font-size:x-small;color:#666666;}
.wrapper{font-family :Arial,Helvetica,sans-serif; font-size:x-small;}
.photoCredit{font-family :Arial,Helvetica,sans-serif; font-size:x-small;color:#666666;}
.tagline{font-family :Arial,Helvetica,sans-serif; font-size:x-small;color:#666666;}
.pageCount{color:#666666;font-family :Arial,Helvetica,sans-serif; font-size:x-small;}
.note{font-family :Arial,Helvetica,sans-serif; font-size:small;color:#666666;font-style:italic;}
.highlight{font-family :Arial,Helvetica,sans-serif; font-size:small;background-color:#FFF200;}
.annotation{font-family :Arial,Helvetica,sans-serif; font-size:x-small;color:#666666;}
'''
conversion_options = {
'comment' : description
, 'tags' : category
, 'publisher' : publisher
, 'language' : language
}
remove_tags = [ dict(name='div', attrs={'id':["log","feedback","footer","secondarynav","secondnavbar","header","email","bw2-header","column2","wrapper-bw2-footer","wrapper-mgh-footer","inset","commentForm","commentDisplay","bwExtras","bw2-umbrella","readerComments","leg","rightcol"]}),
dict(name='div', attrs={'class':["menu",'sponsorbox smallertext',"TopNavTile","graybottom leaderboard"]}),
dict(name='img', alt ="News"),
dict(name='td', width ="1"),
remove_tags = [
dict(attrs={'class':'inStory'})
,dict(name=['meta','link','iframe','base','embed','object','table','th','tr','td'])
,dict(attrs={'id':['inset','videoDisplay']})
]
keep_only_tags = [dict(name='div', attrs={'id':['story-body','storyBody','article_body','articleBody']})]
remove_attributes = ['lang']
match_regexps = [r'http://www.businessweek.com/.*_page_[1-9].*']
feeds = [
(u'Top Stories', u'http://www.businessweek.com/topStories/rss/topStories.rss'),
@ -75,19 +70,36 @@ class BusinessWeek(BasicNewsRecipe):
]
def get_article_url(self, article):
url = article.get('guid', None)
if 'podcasts' in url:
return None
if 'surveys' in url:
return None
if 'images' in url:
return None
if 'feedroom' in url:
return None
if '/magazine/toc/' in url:
return None
rurl, sep, rest = url.rpartition('?')
if rurl:
return rurl
return rest
if 'podcasts' in url or 'surveys' in url:
url = None
def print_version(self, url):
if '/news/' in url or '/blog/' in url:
return url
if '/magazine' in url:
rurl = url.replace('http://www.businessweek.com/','http://www.businessweek.com/printer/')
else:
rurl = url.replace('http://www.businessweek.com/','http://www.businessweek.com/print/')
return rurl.replace('/investing/','/investor/')
def postprocess_html(self, soup, first):
for tag in soup.findAll(name=['ul','li','table','td','tr','span']):
tag.name = 'div'
for tag in soup.findAll(name= 'div',attrs={ 'id':'pageNav'}):
tag.extract()
def preprocess_html(self, soup):
for item in soup.findAll(style=True):
del item['style']
for alink in soup.findAll('a'):
if alink.string is not None:
tstr = alink.string
alink.replaceWith(tstr)
return soup


@ -4,95 +4,73 @@ __copyright__ = '2009-2010, Darko Miletic <darko.miletic at gmail.com>'
www.businessworld.in
'''
from calibre import strftime
import re
from calibre.web.feeds.news import BasicNewsRecipe
class BusinessWorldMagazine(BasicNewsRecipe):
title = 'Business World Magazine'
__author__ = 'Darko Miletic'
__author__ = 'Kovid Goyal'
description = 'News from India'
publisher = 'ABP Pvt Ltd Publication'
category = 'news, politics, finances, India, Asia'
delay = 1
no_stylesheets = True
INDEX = 'http://www.businessworld.in/bw/Magazine_Current_Issue'
INDEX = 'http://www.businessworld.in/businessworld/magazine_latest_issue.php'
ROOT = 'http://www.businessworld.in'
use_embedded_content = False
encoding = 'utf-8'
language = 'en_IN'
extra_css = """
img{display: block; margin-bottom: 0.5em}
body{font-family: Arial,Helvetica,sans-serif}
h2{color: gray; display: block}
"""
conversion_options = {
'comment' : description
, 'tags' : category
, 'publisher' : publisher
, 'language' : language
}
def is_in_list(self,linklist,url):
for litem in linklist:
if litem == url:
return True
return False
auto_cleanup = True
def parse_index(self):
br = self.browser
br.open(self.ROOT)
raw = br.open(br.click_link(text_regex=re.compile('Current.*Issue',
re.I))).read()
soup = self.index_to_soup(raw)
mc = soup.find(attrs={'class':'mag_cover'})
if mc is not None:
img = mc.find('img', src=True)
if img is not None:
self.cover_url = img['src']
feeds = []
current_section = None
articles = []
linklist = []
soup = self.index_to_soup(self.INDEX)
for tag in soup.findAll(['h3', 'h2']):
inner_a = tag.find('a')
if tag.name == 'h3' and inner_a is not None:
continue
if tag.name == 'h2' and (inner_a is None or current_section is
None):
continue
if tag.name == 'h3':
if current_section is not None and articles:
feeds.append((current_section, articles))
current_section = self.tag_to_string(tag)
self.log('Found section:', current_section)
articles = []
elif tag.name == 'h2':
url = inner_a.get('href', None)
if url is None: continue
if url.startswith('/'): url = self.ROOT + url
title = self.tag_to_string(inner_a)
h1 = tag.findPreviousSibling('h1')
if h1 is not None:
title = self.tag_to_string(h1) + title
self.log('\tFound article:', title)
articles.append({'title':title, 'url':url, 'date':'',
'description':''})
if current_section and articles:
feeds.append((current_section, articles))
return feeds
tough = soup.find('div', attrs={'id':'tough'})
if tough:
for item in tough.findAll('h1'):
description = ''
title_prefix = ''
feed_link = item.find('a')
if feed_link and feed_link.has_key('href'):
url = self.ROOT + feed_link['href']
if not self.is_in_list(linklist,url):
title = title_prefix + self.tag_to_string(feed_link)
date = strftime(self.timefmt)
articles.append({
'title' :title
,'date' :date
,'url' :url
,'description':description
})
linklist.append(url)
for item in soup.findAll('div', attrs={'class':'nametitle'}):
description = ''
title_prefix = ''
feed_link = item.find('a')
if feed_link and feed_link.has_key('href'):
url = self.ROOT + feed_link['href']
if not self.is_in_list(linklist,url):
title = title_prefix + self.tag_to_string(feed_link)
date = strftime(self.timefmt)
articles.append({
'title' :title
,'date' :date
,'url' :url
,'description':description
})
linklist.append(url)
return [(soup.head.title.string, articles)]
keep_only_tags = [dict(name='div', attrs={'id':'printwrapper'})]
remove_tags = [dict(name=['object','link','meta','base','iframe','link','table'])]
def print_version(self, url):
return url.replace('/bw/','/bw/storyContent/')
def get_cover_url(self):
cover_url = None
soup = self.index_to_soup(self.INDEX)
cover_item = soup.find('img',attrs={'class':'toughbor'})
if cover_item:
cover_url = self.ROOT + cover_item['src']
return cover_url

recipes/cgm_pl.recipe (new file)

@ -0,0 +1,40 @@
from calibre.web.feeds.news import BasicNewsRecipe
class CGM(BasicNewsRecipe):
title = u'CGM'
oldest_article = 7
__author__ = 'fenuks'
description = u'Codzienna Gazeta Muzyczna'
cover_url = 'http://www.krafcy.com/foto/tinymce/Image/cgm%281%29.jpg'
category = 'music'
language = 'pl'
use_embedded_content = False
max_articles_per_feed = 100
no_stylesheets=True
extra_css = 'div {color:black;} strong {color:black;} span {color:black;} p {color:black;}'
remove_tags_before=dict(id='mainContent')
remove_tags_after=dict(name='div', attrs={'class':'fbContainer'})
remove_tags=[dict(name='div', attrs={'class':'fbContainer'}),
dict(name='p', attrs={'class':['tagCloud', 'galleryAuthor']}),
dict(id=['movieShare', 'container'])]
feeds = [(u'Informacje', u'http://www.cgm.pl/rss.xml'), (u'Polecamy', u'http://www.cgm.pl/rss,4,news.xml'),
(u'Recenzje', u'http://www.cgm.pl/rss,1,news.xml')]
def preprocess_html(self, soup):
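# Drop known ad images and replace the Flash gallery embed with a plain <img> tag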
ad=soup.findAll('img')
for r in ad:
if '/_vault/_article_photos/5841.jpg' in r['src'] or '_vault/_article_photos/5807.jpg' in r['src'] or 'article_photos/5841.jpg' in r['src'] or 'article_photos/5825.jpg' in r['src'] or '_article_photos/5920.jpg' in r['src'] or '_article_photos/5919.jpg' in r['src'] or '_article_photos/5918.jpg' in r['src'] or '_article_photos/5914.jpg' in r['src'] or '_article_photos/5911.jpg' in r['src'] or '_article_photos/5923.jpg' in r['src'] or '_article_photos/5921.jpg' in r['src']:
ad[ad.index(r)].extract()
gallery=soup.find('div', attrs={'class':'galleryFlash'})
if gallery:
img=gallery.find('embed')
if img:
img=img['src'][35:]
img='http://www.cgm.pl/_vault/_gallery/_photo/'+img
param=gallery.findAll(name='param')
for i in param:
i.extract()
gallery.contents[1].name='img'
gallery.contents[1]['src']=img
return soup

recipes/china_post.recipe (new file)

@ -0,0 +1,29 @@
from calibre.web.feeds.news import BasicNewsRecipe
class CP(BasicNewsRecipe):
title = u'China Post'
language = 'en_CN'
__author__ = 'Krittika Goyal'
oldest_article = 1 #days
max_articles_per_feed = 25
use_embedded_content = False
no_stylesheets = True
auto_cleanup = True
feeds = [
('Top Stories',
'http://www.chinapost.com.tw/rss/front.xml'),
('Taiwan',
'http://www.chinapost.com.tw/rss/taiwan.xml'),
('China',
'http://www.chinapost.com.tw/rss/china.xml'),
('Business',
'http://www.chinapost.com.tw/rss/business.xml'),
('World',
'http://www.chinapost.com.tw/rss/international.xml'),
('Sports',
'http://www.chinapost.com.tw/rss/sports.xml'),
]


@ -1,35 +1,47 @@
from calibre.web.feeds.news import BasicNewsRecipe
class Cicero(BasicNewsRecipe):
timefmt = ' [%Y-%m-%d]'
title = u'Cicero'
__author__ = 'mad@sharktooth.de'
class BasicUserRecipe1316245412(BasicNewsRecipe):
title = u'Cicero Online'
description = u'Magazin f\xfcr politische Kultur'
oldest_article = 7
publisher = 'Ringier Publishing GmbH'
category = 'news, politics, Germany'
language = 'de'
encoding = 'UTF-8'
__author__ = 'Armin Geller' # Upd. 2011-09-19
oldest_article = 7
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
publisher = 'Ringier Publishing'
category = 'news, politics, Germany'
encoding = 'iso-8859-1'
publication_type = 'magazine'
masthead_url = 'http://www.cicero.de/img2/cicero_logo_rss.gif'
auto_cleanup = False
# remove_javascript = True
remove_tags = [
dict(name='div', attrs={'id':["header", "navigation", "skip-link", "header-print", "header-print-url", "meta-toolbar", "footer"]}),
dict(name='div', attrs={'class':["region region-sidebar-first column sidebar", "breadcrumb", "breadcrumb-title", "meta", "comment-wrapper",
"field field-name-field-show-teaser-right field-type-list-boolean field-label-above"]}),
dict(name='div', attrs={'title':["Dossier Auswahl"]}),
dict(name='h2', attrs={'class':["title comment-form"]}),
dict(name='form', attrs={'class':["comment-form user-info-from-cookie"]}),
# 2011-09-19 clean-up on first feed historical caricature- and video preview pictures and social icons
dict(name='table', attrs={'class':["mcx-social-horizontal", "page-header"]}), # 2011-09-19
dict(name='div', attrs={'class':["page-header", "view view-alle-karikaturen view-id-alle_karikaturen view-display-id-default view-dom-id-1",
"pagination",
"view view-letzte-videos view-id-letzte_videos view-display-id-default view-dom-id-1"]}), # 2011-09-19
]
feeds = [
(u'Das gesamte Portfolio', u'http://www.cicero.de/rss/rss.php?ress_id='),
#(u'Alle Heft-Inhalte', u'http://www.cicero.de/rss/rss.php?ress_id=heft'),
#(u'Alle Online-Inhalte', u'http://www.cicero.de/rss/rss.php?ress_id=online'),
#(u'Berliner Republik', u'http://www.cicero.de/rss/rss.php?ress_id=4'),
#(u'Weltb\xfchne', u'http://www.cicero.de/rss/rss.php?ress_id=1'),
#(u'Salon', u'http://www.cicero.de/rss/rss.php?ress_id=7'),
#(u'Kapital', u'http://www.cicero.de/rss/rss.php?ress_id=6'),
#(u'Netzst\xfccke', u'http://www.cicero.de/rss/rss.php?ress_id=9'),
#(u'Leinwand', u'http://www.cicero.de/rss/rss.php?ress_id=12'),
#(u'Bibliothek', u'http://www.cicero.de/rss/rss.php?ress_id=15'),
(u'Kolumne - Alle Kolulmnen', u'http://www.cicero.de/rss/rss2.php?ress_id='),
#(u'Kolumne - Schreiber, Berlin', u'http://www.cicero.de/rss/rss2.php?ress_id=35'),
#(u'Kolumne - TV Kritik', u'http://www.cicero.de/rss/rss2.php?ress_id=34')
(u'Das gesamte Portfolio', u'http://www.cicero.de/rss.xml'),
(u'Berliner Republik', u'http://www.cicero.de/berliner-republik.xml'),
(u'Weltb\xfchne', u'http://www.cicero.de/weltbuehne.xml'),
(u'Kapital', u'http://www.cicero.de/kapital.xml'),
(u'Salon', u'http://www.cicero.de/salon.xml'),
(u'Blogs', u'http://www.cicero.de/blogs.xml'), #seems not to be in use at the moment
]
def print_version(self, url):
return 'http://www.cicero.de/page_print.php?' + url.rpartition('?')[2]
return url + '?print'
# def get_cover_url(self):
# return 'http://www.cicero.de/sites/all/themes/cicero/logo.png' # need to find a good logo on their home page!

recipes/cio_magazine.recipe (new file)

@ -0,0 +1,128 @@
# The first comments describe the trouble I had with Python
# When you get a UTF8 error, check the comments (accented characters). In Notepad++: Search, Goto, position, and you will see it.
# Edit with Notepad++. If a '-' shows up where it should not, the indentation is wrong... Edit - Blank operations - tab to space
# I finally understood what the 'from' means... these are paths inside pylib.zip...
# With 'from' you import only one symbol... with 'import', the whole library
from calibre.web.feeds.news import BasicNewsRecipe
# sys is not needed... I tried to use it to write to stderr
from calibre import strftime
# Used to format the article's date
import string, re
# Used for regular expressions
# Seen in pylib.zip... the first letter is uppercase
# These last two were a vague attempt at setting a cookie (not used)
class CIO_Magazine(BasicNewsRecipe):
title = 'CIO Magazine'
oldest_article = 14
max_articles_per_feed = 100
auto_cleanup = True
__author__ = 'Julio Map'
description = 'CIO is the leading information brand for today-s busy Chief information Officer - CIO Magazine bi-monthly '
language = 'en'
encoding = 'utf8'
cover_url = 'http://www.cio.com/homepage/images/hp-cio-logo-linkedin.png'
remove_tags_before = dict(name='div', attrs={'id':'container'})
# Absolutely unnecessary... in the end I found a print_version (see further down)
# Within a given issue...
# issue_details contains the title and the sections of this issue
# DetailModule, inside issue_details, contains the URLs and summaries
# Within a given article...
# Article-default-body contains the text. But as I said, I found a print_version
no_stylesheets = True
remove_javascript = True
def print_version(self,url):
# This method is called by the framework... do not call it yourself (it would then run twice)
# A printable version of the articles exists by changing
# http://www.cio.com/article/<num>/<title> into
# http://www.cio.com/article/print/<num>, which contains all the pages inside the div id=container
if url.startswith('/'):
url = 'http://www.cio.com'+url
segments = url.split('/')
printURL = '/'.join(segments[0:4]) + '/print/' + segments[4] +'#'
return printURL
def parse_index(self):
###########################################################################
# This method should be implemented in recipes that parse a website
# instead of feeds to generate a list of articles. Typical uses are for
# news sources that have a Print Edition webpage that lists all the
# articles in the current print edition. If this function is implemented,
# it will be used in preference to BasicNewsRecipe.parse_feeds().
#
# It must return a list. Each element of the list must be a 2-element
# tuple of the form ('feed title', list of articles).
#
# Each list of articles must contain dictionaries of the form:
#
# {
# 'title' : article title,
# 'url' : URL of print version,
# 'date' : The publication date of the article as a string,
# 'description' : A summary of the article
# 'content' : The full article (can be an empty string). This is used by FullContentProfile
# }
#
# For an example, see the recipe for downloading The Atlantic.
# In addition, you can add 'author' for the author of the article.
###############################################################################
# First, find the most recently created issue
soupinicial = self.index_to_soup('http://www.cio.com/magazine')
# It is the first link inside the DIV with class content_body
a= soupinicial.find(True, attrs={'class':'content_body'}).find('a', href=True)
INDEX = re.sub(r'\?.*', '', a['href'])
# Since cio.com uses relative links, prepend the domain name.
if INDEX.startswith('/'): # guarding against them no longer using relative links
INDEX = 'http://www.cio.com'+INDEX
# And check in the logs that we are doing it right
print ("INDEX in parse_index: ", INDEX)
# Now we know which issue it is... process it.
soup = self.index_to_soup(INDEX)
articles = {}
key = None
feeds = []
# To start, keep only two DIVs, 'heading' and 'issue_item'
# From the first we take the categories (key) and from the second the URLs and summaries
for div in soup.findAll(True,
attrs={'class':['heading', 'issue_item']}):
if div['class'] == 'heading':
key = string.capwords(self.tag_to_string(div.span))
print ("Key: ",key) # Esto es para depurar
articles[key] = []
feeds.append(key)
elif div['class'] == 'issue_item':
a = div.find('a', href=True)
if not a:
continue
url = re.sub(r'\?.*', '', a['href'])
print("url: ",url) # Esto es para depurar
title = self.tag_to_string(a, use_alt=True).strip() # Ya para nota, quitar al final las dos ultimas palabras
pubdate = strftime('%a, %d %b') # No es la fecha de publicacion sino la de colecta
summary = div.find('p') # Dentro de la div 'issue_item' el unico parrafo que hay es el resumen
description = '' # Si hay summary la description sera el summary... si no, la dejamos en blanco
if summary:
description = self.tag_to_string(summary, use_alt=False)
print ("Description = ", description)
feed = key if key is not None else 'Uncategorized' # This is copied from the NY Times recipe
if not articles.has_key(feed):
articles[feed] = []
if not 'podcasts' in url:
articles[feed].append(
dict(title=title, url=url, date=pubdate,
description=description,
content=''))
feeds = [(key, articles[key]) for key in feeds if articles.has_key(key)]
return feeds

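A stripped-down sketch of the parse_index() contract that the comments in the CIO recipe above describe (the feed name and URL here are placeholders, not from this commit):

from calibre.web.feeds.news import BasicNewsRecipe

class MinimalIndexSketch(BasicNewsRecipe):
    title = 'parse_index sketch'

    def parse_index(self):
        # Each article is a dict; date, description and content may be empty strings.
        articles = [{'title': 'Example article',
                     'url': 'http://example.com/article/print/1',
                     'date': '', 'description': '', 'content': ''}]
        # Return a list of ('feed title', list of articles) tuples.
        return [('Example section', articles)]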

@ -1,40 +1,10 @@
import re
from lxml.html import parse
from calibre.web.feeds.news import BasicNewsRecipe
class Counterpunch(BasicNewsRecipe):
'''
Parses counterpunch.com for articles
'''
title = 'Counterpunch'
description = 'Daily political opinion from www.Counterpunch.com'
language = 'en'
__author__ = 'O. Emmerson'
keep_only_tags = [dict(name='td', attrs={'width': '522'})]
max_articles_per_feed = 10
title = u'Counterpunch'
oldest_article = 7
max_articles_per_feed = 100
auto_cleanup = True
def parse_index(self):
feeds = []
title, url = 'Counterpunch', 'http://www.counterpunch.com'
articles = self.parse_page(url)
if articles:
feeds.append((title, articles))
return feeds
def parse_page(self, url):
parsed_page = parse(url).getroot()
articles = []
unwanted_text = re.compile('Website\ of\ the|I\ urge\ you|Subscribe\ now|DONATE|\@asis\.com|donation\ button|click\ over\ to\ our')
parsed_articles = [a for a in parsed_page.cssselect("html>body>table tr>td>p[class='style2']") if not unwanted_text.search(a.text_content())]
for art in parsed_articles:
try:
author = art.text
title = art.cssselect("a")[0].text + ' by {0}'.format(author)
art_url = 'http://www.counterpunch.com/' + art.cssselect("a")[0].attrib['href']
articles.append({'title': title, 'url': art_url})
except Exception as e:
e
#print('Handler Error: ', e, 'title :', a.text_content())
pass
return articles
feeds = [(u'Counterpunch', u'http://www.counterpunch.org/category/article/feed/')]


@ -1,5 +1,5 @@
from calibre.web.feeds.news import BasicNewsRecipe
import re
class Dobreprogramy_pl(BasicNewsRecipe):
title = 'Dobreprogramy.pl'
@ -15,6 +15,7 @@ class Dobreprogramy_pl(BasicNewsRecipe):
extra_css = '.title {font-size:22px;}'
oldest_article = 8
max_articles_per_feed = 100
preprocess_regexps = [(re.compile(ur'<div id="\S+360pmp4">Twoja przeglądarka nie obsługuje Flasha i HTML5 lub wyłączono obsługę JavaScript...</div>'), lambda match: '') ]
remove_tags = [dict(name='div', attrs={'class':['komentarze', 'block', 'portalInfo', 'menuBar', 'topBar']})]
keep_only_tags = [dict(name='div', attrs={'class':['mainBar', 'newsContent', 'postTitle title', 'postInfo', 'contentText', 'content']})]
feeds = [(u'Aktualności', 'http://feeds.feedburner.com/dobreprogramy/Aktualnosci'),

recipes/dzieje_pl.recipe (new file)

@ -0,0 +1,17 @@
from calibre.web.feeds.news import BasicNewsRecipe
class Dzieje(BasicNewsRecipe):
title = u'dzieje.pl'
__author__ = 'fenuks'
description = 'Dzieje - history of Poland'
cover_url = 'http://www.dzieje.pl/sites/default/files/dzieje_logo.png'
category = 'history'
language = 'pl'
oldest_article = 8
max_articles_per_feed = 100
remove_javascript=True
no_stylesheets= True
remove_tags_before= dict(name='h1', attrs={'class':'title'})
remove_tags_after= dict(id='dogory')
remove_tags=[dict(id='dogory')]
feeds = [(u'Dzieje', u'http://dzieje.pl/rss.xml')]


@ -77,30 +77,21 @@ class Economist(BasicNewsRecipe):
continue
self.log('Found section: %s'%section_title)
articles = []
for h5 in section.findAll('h5'):
article_title = self.tag_to_string(h5).strip()
if not article_title:
continue
data = h5.findNextSibling(attrs={'class':'article'})
if data is None: continue
a = data.find('a', href=True)
if a is None: continue
url = a['href']
if url.startswith('/'): url = 'http://www.economist.com'+url
url += '/print'
article_title += ': %s'%self.tag_to_string(a).strip()
articles.append({'title':article_title, 'url':url,
'description':'', 'date':''})
if not articles:
# We have last or first section
for art in section.findAll(attrs={'class':'article'}):
a = art.find('a', href=True)
subsection = ''
for node in section.findAll(attrs={'class':'article'}):
subsec = node.findPreviousSibling('h5')
if subsec is not None:
subsection = self.tag_to_string(subsec)
prefix = (subsection+': ') if subsection else ''
a = node.find('a', href=True)
if a is not None:
url = a['href']
if url.startswith('/'): url = 'http://www.economist.com'+url
url += '/print'
title = self.tag_to_string(a)
if title:
title = prefix + title
self.log('\tFound article:', title)
articles.append({'title':title, 'url':url,
'description':'', 'date':''})


@ -69,30 +69,21 @@ class Economist(BasicNewsRecipe):
continue
self.log('Found section: %s'%section_title)
articles = []
for h5 in section.findAll('h5'):
article_title = self.tag_to_string(h5).strip()
if not article_title:
continue
data = h5.findNextSibling(attrs={'class':'article'})
if data is None: continue
a = data.find('a', href=True)
if a is None: continue
url = a['href']
if url.startswith('/'): url = 'http://www.economist.com'+url
url += '/print'
article_title += ': %s'%self.tag_to_string(a).strip()
articles.append({'title':article_title, 'url':url,
'description':'', 'date':''})
if not articles:
# We have last or first section
for art in section.findAll(attrs={'class':'article'}):
a = art.find('a', href=True)
subsection = ''
for node in section.findAll(attrs={'class':'article'}):
subsec = node.findPreviousSibling('h5')
if subsec is not None:
subsection = self.tag_to_string(subsec)
prefix = (subsection+': ') if subsection else ''
a = node.find('a', href=True)
if a is not None:
url = a['href']
if url.startswith('/'): url = 'http://www.economist.com'+url
url += '/print'
title = self.tag_to_string(a)
if title:
title = prefix + title
self.log('\tFound article:', title)
articles.append({'title':title, 'url':url,
'description':'', 'date':''})


@ -0,0 +1,15 @@
from calibre.web.feeds.news import BasicNewsRecipe
class Elektroda(BasicNewsRecipe):
title = u'Elektroda'
oldest_article = 8
__author__ = 'fenuks'
description = 'Elektroda.pl'
cover_url = 'http://demotywatory.elektroda.pl/Thunderpic/logo.gif'
category = 'electronics'
language = 'pl'
max_articles_per_feed = 100
remove_tags_before=dict(name='span', attrs={'class':'postbody'})
remove_tags_after=dict(name='td', attrs={'class':'spaceRow'})
remove_tags=[dict(name='a', attrs={'href':'#top'})]
feeds = [(u'Elektroda', u'http://www.elektroda.pl/rtvforum/rss.php')]


@ -32,9 +32,9 @@ class Filmweb_pl(BasicNewsRecipe):
(u'Recenzje użytkowników', u'http://www.filmweb.pl/feed/user-reviews/latest')]
def skip_ad_pages(self, soup):
skip_tag = soup.find('a', attrs={'class':'welcomeScreenButton'})['href']
#self.log.warn(skip_tag)
skip_tag = soup.find('a', attrs={'class':'welcomeScreenButton'})
if skip_tag is not None:
return self.index_to_soup(skip_tag, raw=True)
else:
None
self.log.warn('skip_tag')
self.log.warn(skip_tag)
return self.index_to_soup(skip_tag['href'], raw=True)


@ -16,13 +16,13 @@ class Fleshbot(BasicNewsRecipe):
max_articles_per_feed = 100
no_stylesheets = True
encoding = 'utf-8'
use_embedded_content = False
use_embedded_content = True
language = 'en'
masthead_url = 'http://cache.fleshbot.com/assets/base/img/thumbs140x140/fleshbot.com.png'
masthead_url = 'http://cache.gawkerassets.com/assets/kotaku.com/img/logo.png'
extra_css = '''
body{font-family: "Lucida Grande",Helvetica,Arial,sans-serif}
img{margin-bottom: 1em}
h1{font-family :Arial,Helvetica,sans-serif; font-size:x-large}
h1{font-family :Arial,Helvetica,sans-serif; font-size:large}
'''
conversion_options = {
'comment' : description
@ -31,13 +31,12 @@ class Fleshbot(BasicNewsRecipe):
, 'language' : language
}
remove_attributes = ['width','height']
keep_only_tags = [dict(attrs={'class':'content permalink'})]
remove_tags_before = dict(name='h1')
remove_tags = [dict(attrs={'class':'contactinfo'})]
remove_tags_after = dict(attrs={'class':'contactinfo'})
feeds = [(u'Articles', u'http://feeds.gawker.com/fleshbot/vip?format=xml')]
remove_tags = [
{'class': 'feedflare'},
]
feeds = [(u'Articles', u'http://feeds.gawker.com/fleshbot/full')]
def preprocess_html(self, soup):
return self.adeify_images(soup)

recipes/gildia_pl.recipe (new file)

@ -0,0 +1,26 @@
from calibre.web.feeds.news import BasicNewsRecipe
class Gildia(BasicNewsRecipe):
title = u'Gildia.pl'
__author__ = 'fenuks'
description = 'Gildia - cultural site'
cover_url = 'http://www.film.gildia.pl/_n_/portal/redakcja/logo/logo-gildia.pl-500.jpg'
category = 'culture'
language = 'pl'
oldest_article = 8
max_articles_per_feed = 100
no_stylesheets=True
remove_tags=[dict(name='div', attrs={'class':'backlink'}), dict(name='div', attrs={'class':'im_img'}), dict(name='div', attrs={'class':'addthis_toolbox addthis_default_style'})]
keep_only_tags=dict(name='div', attrs={'class':'widetext'})
feeds = [(u'Gry', u'http://www.gry.gildia.pl/rss'), (u'Literatura', u'http://www.literatura.gildia.pl/rss'), (u'Film', u'http://www.film.gildia.pl/rss'), (u'Horror', u'http://www.horror.gildia.pl/rss'), (u'Konwenty', u'http://www.konwenty.gildia.pl/rss'), (u'Plansz\xf3wki', u'http://www.planszowki.gildia.pl/rss'), (u'Manga i anime', u'http://www.manga.gildia.pl/rss'), (u'Star Wars', u'http://www.starwars.gildia.pl/rss'), (u'Techno', u'http://www.techno.gildia.pl/rss'), (u'Historia', u'http://www.historia.gildia.pl/rss'), (u'Magia', u'http://www.magia.gildia.pl/rss'), (u'Bitewniaki', u'http://www.bitewniaki.gildia.pl/rss'), (u'RPG', u'http://www.rpg.gildia.pl/rss'), (u'LARP', u'http://www.larp.gildia.pl/rss'), (u'Muzyka', u'http://www.muzyka.gildia.pl/rss'), (u'Nauka', u'http://www.nauka.gildia.pl/rss')]
def skip_ad_pages(self, soup):
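# If the feed lands on a teaser page, follow the 'recenzja' (review) link and download the full article instead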
content = soup.find('div', attrs={'class':'news'})
skip_tag= content.findAll(name='a')
if skip_tag is not None:
for link in skip_tag:
if 'recenzja' in link['href']:
self.log.warn('odnosnik')
self.log.warn(link['href'])
return self.index_to_soup(link['href'], raw=True)


@ -0,0 +1,13 @@
from calibre.web.feeds.news import BasicNewsRecipe
class GreenLinux(BasicNewsRecipe):
title = u'GreenLinux.pl'
__author__ = 'fenuks'
category = 'IT'
language = 'pl'
cover_url = 'http://lh5.ggpht.com/_xd_6Y9kXhEc/S8tjyqlfhfI/AAAAAAAAAYU/zFNTp07ZQko/top.png'
oldest_article = 15
max_articles_per_feed = 100
auto_cleanup = True
feeds = [(u'Newsy', u'http://feeds.feedburner.com/greenlinux')]


@ -0,0 +1,38 @@
from calibre.web.feeds.recipes import BasicNewsRecipe
class Gry_online_pl(BasicNewsRecipe):
title = u'Gry-Online.pl'
__author__ = 'fenuks'
description = 'Gry-Online.pl - computer games'
category = 'games'
language = 'pl'
oldest_article = 13
INDEX= 'http://www.gry-online.pl/'
cover_url='http://www.gry-online.pl/img/1st_10/1st-gol-logo.png'
max_articles_per_feed = 100
no_stylesheets= True
extra_css = 'p.wn1{font-size:22px;}'
remove_tags_after= [dict(name='div', attrs={'class':['tresc-newsa']})]
keep_only_tags = [dict(name='div', attrs={'class':['txthead']}), dict(name='p', attrs={'class':['wtx1', 'wn1', 'wob']}), dict(name='a', attrs={'class':['num_str_nex']})]
#remove_tags= [dict(name='div', attrs={'class':['news_plat']})]
feeds = [(u'Newsy', 'http://www.gry-online.pl/rss/news.xml'), ('Teksty', u'http://www.gry-online.pl/rss/teksty.xml')]
def append_page(self, soup, appendtag):
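# Recursively follow the 'next page' link and append each page's paragraphs to the article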
nexturl = soup.find('a', attrs={'class':'num_str_nex'})
if appendtag.find('a', attrs={'class':'num_str_nex'}) is not None:
appendtag.find('a', attrs={'class':'num_str_nex'}).replaceWith('\n')
if nexturl is not None:
if 'strona' in nexturl.div.string:
nexturl= self.INDEX + nexturl['href']
soup2 = self.index_to_soup(nexturl)
pagetext = soup2.findAll(name='p', attrs={'class':['wtx1', 'wn1', 'wob']})
for tag in pagetext:
pos = len(appendtag.contents)
appendtag.insert(pos, tag)
self.append_page(soup2, appendtag)
def preprocess_html(self, soup):
self.append_page(soup, soup.body)
return soup


@ -15,8 +15,10 @@ class Guardian(BasicNewsRecipe):
title = u'The Guardian and The Observer'
if date.today().weekday() == 6:
base_url = "http://www.guardian.co.uk/theobserver"
cover_pic = 'Observer digital edition'
else:
base_url = "http://www.guardian.co.uk/theguardian"
cover_pic = 'Guardian digital edition'
__author__ = 'Seabound and Sujata Raman'
language = 'en_GB'
@ -79,7 +81,7 @@ class Guardian(BasicNewsRecipe):
# soup = self.index_to_soup("http://www.guardian.co.uk/theobserver")
soup = self.index_to_soup(self.base_url)
# find cover pic
img = soup.find( 'img',attrs ={'alt':'Guardian digital edition'})
img = soup.find( 'img',attrs ={'alt':self.cover_pic})
if img is not None:
self.cover_url = img['src']
# end find cover pic

recipes/h7_tumspor.recipe (new file)

@ -0,0 +1,50 @@
# -*- coding: utf-8 -*-
from calibre.web.feeds.news import BasicNewsRecipe
class Haber7TS (BasicNewsRecipe):
title = u'H7 TÜMSPOR'
__author__ = u'thomass'
description = ' Haber 7 TÜMSPOR sitesinden tüm branşlarda spor haberleri '
oldest_article =2
max_articles_per_feed =100
no_stylesheets = True
#delay = 1
#use_embedded_content = False
encoding = 'ISO 8859-9'
publisher = 'thomass'
category = 'güncel, haber, türkçe,spor,futbol'
language = 'tr'
publication_type = 'newspaper'
conversion_options = {
'tags' : category
,'language' : language
,'publisher' : publisher
,'linearize_tables': True
}
extra_css = ' #newsheadcon h1{font-weight: bold; font-size: 18px;color:#0000FF} '
keep_only_tags = [dict(name='div', attrs={'class':['intNews','leftmidmerge']})]
remove_tags = [dict(name='div', attrs={'id':['blocktitle','banner46860body']}),dict(name='div', attrs={'class':[ 'Breadcrumb','shr','mobile/home.jpg','etiket','yorumYazNew','shr','y-list','banner','lftBannerShowcase','comments','interNews','lftBanner','midblock','rightblock','comnum','commentcon',]}) ,dict(name='a', attrs={'class':['saveto','sendto','comlink','newsshare',]}),dict(name='iframe', attrs={'name':['frm111','frm107']}) ,dict(name='ul', attrs={'class':['nocPagi','leftmidmerge']})]
cover_img_url = 'http://image.tumspor.com/v2/images/tasarim/images/logo.jpg'
masthead_url = 'http://image.tumspor.com/v2/images/tasarim/images/logo.jpg'
remove_empty_feeds= True
feeds = [
( u'Futbol', u'http://open.dapper.net/services/h7tsfutbol'),
( u'Basketbol', u'http://open.dapper.net/services/h7tsbasket'),
( u'Tenis', u'http://open.dapper.net/services/h7tstenis'),
( u'NBA', u'http://open.dapper.net/services/h7tsnba'),
( u'Diğer Sporlar', u'http://open.dapper.net/services/h7tsdiger'),
( u'Yazarlar & Magazin', u'http://open.dapper.net/services/h7tsyazarmagazin'),
]
def preprocess_html(self, soup):
for alink in soup.findAll('a'):
if alink.string is not None:
tstr = alink.string
alink.replaceWith(tstr)
return soup
# def print_version(self, url):
# return url.replace('http://www.aksiyon.com.tr/aksiyon/newsDetail_getNewsById.action?load=detay&', 'http://www.aksiyon.com.tr/aksiyon/mobile_detailn.action?')

recipes/haber7.recipe (new file)

@ -0,0 +1,60 @@
# -*- coding: utf-8 -*-
from calibre.web.feeds.news import BasicNewsRecipe
class Haber7 (BasicNewsRecipe):
title = u'Haber 7'
__author__ = u'thomass'
description = ' Haber 7 sitesinden haberler '
oldest_article =2
max_articles_per_feed =100
no_stylesheets = True
#delay = 1
#use_embedded_content = False
encoding = 'ISO 8859-9'
publisher = 'thomass'
category = 'güncel, haber, türkçe'
language = 'tr'
publication_type = 'newspaper'
conversion_options = {
'tags' : category
,'language' : language
,'publisher' : publisher
,'linearize_tables': True
}
extra_css = 'body{ font-size: 12px}h2{font-weight: bold; font-size: 18px;color:#0000FF} #newsheadcon h1{font-weight: bold; font-size: 18px;color:#0000FF}'
keep_only_tags = [dict(name='div', attrs={'class':['intNews','leftmidmerge']})]
remove_tags = [dict(name='div', attrs={'id':['blocktitle','banner46860body']}),dict(name='div', attrs={'class':[ 'Breadcrumb','shr','mobile/home.jpg','etiket','yorumYazNew','shr','y-list','banner','lftBannerShowcase','comments','interNews','lftBanner','midblock','rightblock','comnum','commentcon',]}) ,dict(name='a', attrs={'class':['saveto','sendto','comlink','newsshare',]}),dict(name='iframe', attrs={'name':['frm111','frm107']}) ,dict(name='ul', attrs={'class':['nocPagi','leftmidmerge']})]
cover_img_url = 'http://dl.dropbox.com/u/39726752/haber7.JPG'
masthead_url = 'http://dl.dropbox.com/u/39726752/haber7.JPG'
remove_empty_feeds= True
feeds = [
( u'Siyaset', u'http://open.dapper.net/services/h7siyaset'),
( u'Güncel', u'http://open.dapper.net/services/h7guncel'),
( u'Yaşam', u'http://open.dapper.net/services/h7yasam'),
( u'Ekonomi', u'http://open.dapper.net/services/h7ekonomi'),
( u'3. Sayfa', u'http://open.dapper.net/services/h73sayfa'),
( u'Dünya', u'http://open.dapper.net/services/h7dunya'),
( u'Medya', u'http://open.dapper.net/services/h7medya'),
( u'Yazarlar', u'http://open.dapper.net/services/h7yazarlar'),
( u'Bilim', u'http://open.dapper.net/services/h7bilim'),
( u'Eğitim', u'http://open.dapper.net/services/h7egitim'),
( u'Spor', u'http://open.dapper.net/services/h7sporv3'),
]
def preprocess_html(self, soup):
for alink in soup.findAll('a'):
if alink.string is not None:
tstr = alink.string
alink.replaceWith(tstr)
return soup
# def print_version(self, url):
# return url.replace('http://www.aksiyon.com.tr/aksiyon/newsDetail_getNewsById.action?load=detay&', 'http://www.aksiyon.com.tr/aksiyon/mobile_detailn.action?')


@ -7,6 +7,7 @@ Hacker News
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile
from urlparse import urlparse
import re
class HackerNews(BasicNewsRecipe):
title = 'Hacker News'
@ -14,8 +15,8 @@ class HackerNews(BasicNewsRecipe):
description = u'Hacker News, run by Y Combinator. Anything that good hackers would find interesting, with a focus on programming and startups.'
publisher = 'Y Combinator'
category = 'news, programming, it, technology'
masthead_url = 'http://i55.tinypic.com/2u6io76.png'
cover_url = 'http://i55.tinypic.com/2u6io76.png'
masthead_url = 'http://img585.imageshack.us/img585/5011/hnle.png'
cover_url = 'http://img585.imageshack.us/img585/5011/hnle.png'
delay = 1
max_articles_per_feed = 30
use_embedded_content = False
@ -42,12 +43,42 @@ class HackerNews(BasicNewsRecipe):
def get_hn_content(self, url):
self.log('get_hn_content(' + url + ')')
# this could be improved
br = self.get_browser()
f = br.open(url)
html = f.read()
f.close()
return html
soup = self.index_to_soup(url)
main = soup.find('tr').findNextSiblings('tr', limit=2)[1].td
title = self.tag_to_string(main.find('td', 'title'))
link = main.find('td', 'title').find('a')['href']
if link.startswith('item?'):
link = 'http://news.ycombinator.com/' + link
readable_link = link.rpartition('http://')[2].rpartition('https://')[2]
subtext = self.tag_to_string(main.find('td', 'subtext'))
title_content_td = main.find('td', 'title').findParent('tr').findNextSiblings('tr', limit=3)[2].findAll('td', limit=2)[1]
title_content = u''
if not title_content_td.find('form'):
title_content_td.name ='div'
title_content = title_content_td.prettify()
comments = u''
for td in main.findAll('td', 'default'):
comhead = td.find('span', 'comhead')
if comhead:
com_title = u'<h4>' + self.tag_to_string(comhead).replace(' | link', '') + u'</h4>'
comhead.parent.extract()
br = td.find('br')
if br:
br.extract()
reply = td.find('a', attrs = {'href' : re.compile('^reply?')})
if reply:
reply.parent.extract()
td.name = 'div'
indent_width = (int(td.parent.find('td').img['width']) * 2) / 3
td['style'] = 'padding-left: ' + str(indent_width) + 'px'
comments = comments + com_title + td.prettify()
body = u'<h3>' + title + u'</h3><p><a href="' + link + u'">' + readable_link + u'</a><br/><strong>' + subtext + u'</strong></p>' + title_content + u'<br/>'
body = body + comments
return u'<html><title>' + title + u'</title><body>' + body + '</body></html>'
def get_obfuscated_article(self, url):
if url.startswith('http://news.ycombinator.com'):
@ -83,4 +114,10 @@ class HackerNews(BasicNewsRecipe):
article.text_summary = self.prettyify_url(article.url)
article.summary = article.text_summary
# def parse_index(self):
# feeds = []
# feeds.append((u'Hacker News',[{'title': 'Testing', 'url': 'http://news.ycombinator.com/item?id=2935944'}]))
# return feeds


@ -11,9 +11,14 @@ class HBR(BasicNewsRecipe):
timefmt = ' [%B %Y]'
language = 'en'
no_stylesheets = True
recipe_disabled = ('hbr.org has started requiring the use of javascript'
' to log into their website. This is unsupported in calibre, so'
' this recipe has been disabled. If you would like to see '
' HBR supported in calibre, contact hbr.org and ask them'
' to provide a javascript free login method.')
LOGIN_URL = 'http://hbr.org/login?request_url=/'
LOGOUT_URL = 'http://hbr.org/logout?request_url=/'
LOGIN_URL = 'https://hbr.org/login?request_url=/'
LOGOUT_URL = 'https://hbr.org/logout?request_url=/'
INDEX = 'http://hbr.org/archive-toc/BR'
@ -44,7 +49,7 @@ class HBR(BasicNewsRecipe):
br['signin-form:username'] = self.username
br['signin-form:password'] = self.password
raw = br.submit().read()
if 'My Account' not in raw:
if '>Sign out<' not in raw:
raise Exception('Failed to login, are you sure your username and password are correct?')
try:
link = br.find_link(text='Sign out')


@ -5,10 +5,15 @@ class HBR(BasicNewsRecipe):
title = 'Harvard Business Review Blogs'
description = 'To subscribe go to http://hbr.harvardbusiness.org'
needs_subscription = True
__author__ = 'Kovid Goyal'
language = 'en'
no_stylesheets = True
#recipe_disabled = ('hbr.org has started requiring the use of javascript'
# ' to log into their website. This is unsupported in calibre, so'
# ' this recipe has been disabled. If you would like to see '
# ' HBR supported in calibre, contact hbr.org and ask them'
# ' to provide a javascript free login method.')
needs_subscription = False
LOGIN_URL = 'http://hbr.org/login?request_url=/'
LOGOUT_URL = 'http://hbr.org/logout?request_url=/'
@ -36,6 +41,7 @@ class HBR(BasicNewsRecipe):
def get_browser(self):
br = BasicNewsRecipe.get_browser(self)
self.logout_url = None
return br
#'''
br.open(self.LOGIN_URL)

View File

@ -0,0 +1,29 @@
from calibre.web.feeds.news import BasicNewsRecipe
class HindustanTimes(BasicNewsRecipe):
title = u'Hindustan Times'
language = 'en_IN'
__author__ = 'Krittika Goyal'
oldest_article = 1 #days
max_articles_per_feed = 25
use_embedded_content = False
no_stylesheets = True
auto_cleanup = True
feeds = [
('News',
'http://feeds.hindustantimes.com/HT-NewsSectionPage-Topstories'),
('Views',
'http://feeds.hindustantimes.com/HT-ViewsSectionpage-Topstories'),
('Cricket',
'http://feeds.hindustantimes.com/HT-Cricket-TopStories'),
('Business',
'http://feeds.hindustantimes.com/HT-BusinessSectionpage-TopStories'),
('Entertainment',
'http://feeds.hindustantimes.com/HT-HomePage-Entertainment'),
('Lifestyle',
'http://feeds.hindustantimes.com/HT-Homepage-LifestyleNews'),
]

recipes/hira.recipe (new file, 52 lines)
View File

@ -0,0 +1,52 @@
# coding=utf-8
from calibre.web.feeds.recipes import BasicNewsRecipe
class Hira(BasicNewsRecipe):
title = 'Hira'
__author__ = 'thomass'
description = 'مجلة حراء مجلة علمية ثقافية فكرية تصدر كل شهرين، تعنى بالعلوم الطبيعية والإنسانية والاجتماعية وتحاور أسرار النفس البشرية وآفاق الكون الشاسعة بالمنظور القرآني الإيماني في تآلف وتناسب بين العلم والإيمان، والعقل والقلب، والفكر والواقع.'
oldest_article = 63
max_articles_per_feed = 50
no_stylesheets = True
#delay = 1
use_embedded_content = False
encoding = 'utf-8'
publisher = 'thomass'
category = 'News'
language = 'ar'
publication_type = 'magazine'
extra_css = ' .title-detail-wrap{ font-weight: bold ;text-align:right;color:#FF0000;font-size:25px}.title-detail{ font-family:sans-serif;text-align:right;} '
conversion_options = {
'tags' : category
,'language' : language
,'publisher' : publisher
,'linearize_tables': True
,'base-font-size':'10'
}
#html2lrf_options = []
keep_only_tags = [
dict(name='div', attrs={'class':['title-detail']})
]
remove_tags = [
dict(name='div', attrs={'class':['clear', 'bbsp']}),
]
remove_attributes = [
'width','height'
]
feeds = [
(u'حراء', 'http://open.dapper.net/services/hira'),
]
def preprocess_html(self, soup):
for alink in soup.findAll('a'):
if alink.string is not None:
tstr = alink.string
alink.replaceWith(tstr)
return soup

View File

@ -0,0 +1,13 @@
from calibre.web.feeds.news import BasicNewsRecipe
class Historia_org_pl(BasicNewsRecipe):
title = u'Historia.org.pl'
__author__ = 'fenuks'
description = u'history site'
cover_url = 'http://lh3.googleusercontent.com/_QeRQus12wGg/TOvHsZ2GN7I/AAAAAAAAD_o/LY1JZDnq7ro/logo5.jpg'
category = 'history'
language = 'pl'
oldest_article = 8
max_articles_per_feed = 100
feeds = [(u'Artykuły', u'http://www.historia.org.pl/index.php?format=feed&type=rss')]

[15 binary icon images, not shown. Named new files: recipes/icons/cgm_pl.png (837 B), recipes/icons/dzieje_pl.png (642 B), recipes/icons/lomza.png (2.0 KiB), recipes/icons/rtnews.png (606 B), recipes/icons/ubuntu_pl.png (508 B).]

View File

@ -4,16 +4,16 @@ from calibre.web.feeds.news import BasicNewsRecipe
class IDGse(BasicNewsRecipe):
title = 'IDG'
description = 'IDG.se'
language = 'se'
__author__ = 'zapt0'
language = 'sv'
description = 'IDG.se'
oldest_article = 1
max_articles_per_feed = 40
max_articles_per_feed = 256
no_stylesheets = True
encoding = 'ISO-8859-1'
remove_javascript = True
feeds = [(u'Senaste nytt',u'http://feeds.idg.se/idg/vzzs')]
feeds = [(u'Dagens IDG-nyheter',u'http://feeds.idg.se/idg/ETkj?format=xml')]
def print_version(self,url):
return url + '?articleRenderMode=print&m=print'
@ -30,4 +30,3 @@ class IDGse(BasicNewsRecipe):
dict(name='div', attrs={'id':['preamble_ad']}),
dict(name='ul', attrs={'class':['share']})
]

View File

@ -1,76 +1,25 @@
from calibre.web.feeds.news import BasicNewsRecipe
class IndiaToday(BasicNewsRecipe):
title = 'India Today'
__author__ = 'Kovid Goyal'
title = u'India Today'
language = 'en_IN'
timefmt = ' [%d %m, %Y]'
oldest_article = 700
max_articles_per_feed = 10
__author__ = 'Krittika Goyal'
oldest_article = 15 #days
max_articles_per_feed = 25
no_stylesheets = True
auto_cleanup = True
remove_tags_before = dict(id='content_story_title')
remove_tags_after = dict(id='rightblockdiv')
remove_tags = [dict(id=['rightblockdiv', 'share_links'])]
extra_css = '#content_story_title { font-size: 170%; font-weight: bold;}'
conversion_options = { 'linearize_tables': True }
def it_get_index(self):
soup = self.index_to_soup('http://indiatoday.intoday.in/site/archive')
a = soup.find('a', href=lambda x: x and 'issueId=' in x)
url = 'http://indiatoday.intoday.in/site/'+a.get('href')
img = a.find('img')
self.cover_url = img.get('src')
return self.index_to_soup(url)
def parse_index(self):
soup = self.it_get_index()
feeds, current_section, current_articles = [], None, []
for x in soup.findAll(name=['h1', 'a']):
if x.name == 'h1':
if current_section and current_articles:
feeds.append((current_section, current_articles))
current_section = self.tag_to_string(x)
current_articles = []
self.log('\tFound section:', current_section)
elif x.name == 'a' and 'Story' in x.get('href', ''):
title = self.tag_to_string(x)
url = x.get('href')
url = url.replace(' ', '%20')
if not url.startswith('/'):
url = 'http://indiatoday.intoday.in/site/' + url
if title and url:
url += '?complete=1'
self.log('\tFound article:', title)
self.log('\t\t', url)
desc = ''
h3 = x.parent.findNextSibling('h3')
if h3 is not None:
desc = 'By ' + self.tag_to_string(h3)
h4 = h3.findNextSibling('h4')
if h4 is not None:
desc = self.tag_to_string(h4) + ' ' + desc
if desc:
self.log('\t\t', desc)
current_articles.append({'title':title, 'description':desc,
'url':url, 'date':''})
if current_section and current_articles:
feeds.append((current_section, current_articles))
return feeds
def postprocess_html(self, soup, first):
a = soup.find(text='Print')
if a is not None:
tr = a.findParent('tr')
if tr is not None:
tr.extract()
return soup
feeds = [
('Latest News', 'http://indiatoday.intoday.in/rss/article.jsp?sid=4'),
('Cover Story', 'http://indiatoday.intoday.in/rss/article.jsp?sid=30'),
('Nation', 'http://indiatoday.intoday.in/rss/article.jsp?sid=36'),
('States', 'http://indiatoday.intoday.in/rss/article.jsp?sid=21'),
('Economy', 'http://indiatoday.intoday.in/rss/article.jsp?sid=34'),
('World', 'http://indiatoday.intoday.in/rss/article.jsp?sid=61'),
('Sport', 'http://indiatoday.intoday.in/rss/article.jsp?sid=41'),
]

View File

@ -7,56 +7,33 @@ www.inquirer.net
'''
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag
class InquirerNet(BasicNewsRecipe):
title = 'Inquirer.net'
__author__ = 'Darko Miletic'
__author__ = 'Krittika Goyal'
description = 'News from the Philippines'
oldest_article = 2
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
encoding = 'cp1252'
encoding = 'utf8'
publisher = 'inquirer.net'
category = 'news, politics, philippines'
lang = 'en'
language = 'en'
extra_css = ' .fontheadline{font-size: x-large} .fontsubheadline{font-size: large} .fontkick{font-size: medium}'
use_embedded_content = False
html2lrf_options = [
'--comment', description
, '--category', category
, '--publisher', publisher
, '--ignore-tables'
]
html2epub_options = 'publisher="' + publisher + '"\ncomments="' + description + '"\ntags="' + category + '"\nlinearize_tables=True'
remove_tags = [dict(name=['object','link','script','iframe','form'])]
no_stylesheets = True
auto_cleanup = True
feeds = [
(u'Breaking news', u'http://services.inquirer.net/rss/breakingnews.xml' )
,(u'Top stories' , u'http://services.inquirer.net/rss/topstories.xml' )
,(u'Sports' , u'http://services.inquirer.net/rss/brk_breakingnews.xml' )
,(u'InfoTech' , u'http://services.inquirer.net/rss/infotech_tech.xml' )
,(u'InfoTech' , u'http://services.inquirer.net/rss/infotech_tech.xml' )
,(u'Business' , u'http://services.inquirer.net/rss/inq7money_breaking_news.xml' )
,(u'Editorial' , u'http://services.inquirer.net/rss/opinion_editorial.xml' )
,(u'Global Nation', u'http://services.inquirer.net/rss/globalnation_breakingnews.xml')
(u'Inquirer', u'http://www.inquirer.net/fullfeed')
]
def preprocess_html(self, soup):
mlang = Tag(soup,'meta',[("http-equiv","Content-Language"),("content",self.lang)])
mcharset = Tag(soup,'meta',[("http-equiv","Content-Type"),("content","text/html; charset=utf-8")])
soup.head.insert(0,mlang)
soup.head.insert(1,mcharset)
for item in soup.findAll(style=True):
del item['style']
return soup
def get_browser(self):
br = BasicNewsRecipe.get_browser(self)
br.set_handle_gzip(True)
return br
def print_version(self, url):
rest, sep, art = url.rpartition('/view/')
art_id, sp, rrest = art.partition('/')
return 'http://services.inquirer.net/print/print.php?article_id=' + art_id

View File

@ -1,7 +1,5 @@
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = '2008, Darko Miletic <darko.miletic at gmail.com>'
__copyright__ = '2008-2011, Darko Miletic <darko.miletic at gmail.com>'
'''
japantimes.co.jp
'''
@ -9,24 +7,61 @@ japantimes.co.jp
from calibre.web.feeds.news import BasicNewsRecipe
class JapanTimes(BasicNewsRecipe):
title = u'The Japan Times'
title = 'The Japan Times'
__author__ = 'Darko Miletic'
description = 'News from Japan'
language = 'en'
oldest_article = 7
max_articles_per_feed = 100
description = "Daily news and features on Japan from the most widely read English-language newspaper in Japan. Coverage includes national news, business news, sports news, commentary and features on living in Japan, entertainment, the arts, education and more."
language = 'en_JP'
category = 'news, politics, japan'
publisher = 'The Japan Times'
oldest_article = 5
max_articles_per_feed = 150
no_stylesheets = True
use_embedded_content = False
encoding = 'utf8'
publication_type = 'newspaper'
masthead_url = 'http://search.japantimes.co.jp/images/header_title.gif'
extra_css = 'body{font-family: Geneva,Arial,Helvetica,sans-serif}'
keep_only_tags = [ dict(name='div', attrs={'id':'searchresult'}) ]
remove_tags_after = [ dict(name='div', attrs={'id':'mainbody' }) ]
conversion_options = {
'comment' : description
, 'tags' : category
, 'publisher' : publisher
, 'language' : language
, 'linearize_tables' : True
}
keep_only_tags = [dict(name='div', attrs={'id':'printresult'})]
remove_tags = [
dict(name='div' , attrs={'id':'ads' })
,dict(name='table', attrs={'width':470})
dict(name=['iframe','meta','link','embed','object','base'])
,dict(attrs={'id':'searchfooter'})
]
feeds = [(u'The Japan Times', u'http://feeds.feedburner.com/japantimes')]
remove_attributes = ['border']
def get_article_url(self, article):
rurl = BasicNewsRecipe.get_article_url(self, article)
return rurl.partition('?')[0]
feeds = [
(u'The Japan Times', u'http://feedproxy.google.com/japantimes')
]
def print_version(self, url):
return url.replace('/cgi-bin/','/print/')
def preprocess_html(self, soup):
for item in soup.findAll(style=True):
del item['style']
for item in soup.findAll('img'):
if not item.has_key('alt'):
item['alt'] = 'image'
for item in soup.findAll('photo'):
item.name = 'div'
for item in soup.head.findAll('paragraph'):
item.extract()
for item in soup.findAll('wwfilename'):
item.extract()
for item in soup.findAll('jtcategory'):
item.extract()
for item in soup.findAll('nomooter'):
item.extract()
for item in soup.body.findAll('paragraph'):
item.name = 'p'
return soup

View File

@ -1,4 +1,3 @@
# -*- coding: utf-8 -*-
__license__ = 'GPL v3'
__copyright__ = '2011, Attis <attis@attis.one.pl>'
__version__ = 'v. 0.1'
@ -19,7 +18,7 @@ class KopalniaWiedzy(BasicNewsRecipe):
remove_javascript = True
no_stylesheets = True
remove_tags = [{'name':'p', 'attrs': {'class': 'keywords'} }]
remove_tags = [{'name':'p', 'attrs': {'class': 'keywords'} }, {'name':'div', 'attrs': {'class':'sexy-bookmarks sexy-bookmarks-bg-caring'}}]
remove_tags_after = dict(attrs={'class':'ad-square'})
keep_only_tags = [dict(name="div", attrs={'id':'articleContent'})]
extra_css = '.topimage {margin-top: 30px}'

recipes/ksiazka_pl.recipe (new file, 28 lines)
View File

@ -0,0 +1,28 @@
from calibre.web.feeds.news import BasicNewsRecipe
import re
class Ksiazka_net_pl(BasicNewsRecipe):
title = u'ksiazka.net.pl'
__author__ = 'fenuks'
description = u'Ksiazka.net.pl - book vortal'
cover_url = 'http://www.ksiazka.net.pl/fileadmin/templates/ksiazka.net.pl/images/1PortalKsiegarski-logo.jpg'
category = 'books'
language = 'pl'
oldest_article = 8
max_articles_per_feed = 100
no_stylesheets= True
#extra_css = 'img {float: right;}'
preprocess_regexps = [(re.compile(ur'Podoba mi się, kupuję:'), lambda match: '<br />')]
remove_tags_before= dict(name='div', attrs={'class':'m-body'})
remove_tags_after= dict(name='div', attrs={'class':'m-body-link'})
remove_tags=[dict(attrs={'class':['mk_library-icon', 'm-body-link', 'tagi']})]
feeds = [(u'Wiadomości', u'http://www.ksiazka.net.pl/?id=wiadomosci&type=100'),
(u'Książki', u'http://www.ksiazka.net.pl/?id=ksiazki&type=100'),
(u'Rynek', u'http://www.ksiazka.net.pl/?id=rynek&type=100')]
def image_url_processor(self, baseurl, url):
if (('file://' in url) and ('www.ksiazka.net.pl/' not in url)):
return 'http://www.ksiazka.net.pl/' + url[8:]
elif 'http://' not in url:
return 'http://www.ksiazka.net.pl/' + url
else:
return url

View File

@ -46,7 +46,9 @@ class LATimes(BasicNewsRecipe):
remove_tags_after=dict(name='p', attrs={'class':'copyright'})
remove_tags = [
dict(name=['meta','link','iframe','object','embed'])
,dict(attrs={'class':['toolSet','articlerail','googleAd','entry-footer-left','entry-footer-right','entry-footer-social','google-ad-story-bottom','sphereTools']})
,dict(attrs={'class':['toolSet','articlerail','googleAd','entry-footer-left',
'entry-footer-right','entry-footer-social','google-ad-story-bottom',
'sphereTools', 'nextgen-share-tools']})
,dict(attrs={'id':['article-promo','googleads','moduleArticleToolsContainer','gallery-subcontent']})
]
remove_attributes=['lang','xmlns:fb','xmlns:og','border','xtags','i','article_body']

recipes/lomza.recipe (new file, 14 lines)
View File

@ -0,0 +1,14 @@
from calibre.web.feeds.news import BasicNewsRecipe
class Lomza(BasicNewsRecipe):
title = u'4Lomza'
__author__ = 'fenuks'
description = u'4Łomża - regional site'
cover_url = 'http://www.4lomza.pl/i/logo4lomza_m.jpg'
language = 'pl'
oldest_article = 15
no_stylesheets = True
max_articles_per_feed = 100
remove_tags=[dict(name='div', attrs={'class':['bxbanner', 'drukuj', 'wyslijznajomemu']})]
keep_only_tags=[dict(name='div', attrs={'class':'wiadomosc'})]
feeds = [(u'Łomża', u'http://feeds.feedburner.com/4lomza.pl')]

View File

@ -4,25 +4,17 @@ from calibre.web.feeds.news import BasicNewsRecipe
class AdvancedUserRecipe1308306308(BasicNewsRecipe):
title = u'Macleans Magazine'
language = 'en_CA'
__author__ = 'sexymax15'
oldest_article = 30
max_articles_per_feed = 12
__author__ = 'Medius'
oldest_article = 7
cover_url = 'http://www.rogersmagazines.com/rms_covers/md/CLE_md.jpg'
use_embedded_content = False
remove_empty_feeds = True
no_stylesheets = True
remove_javascript = True
remove_tags = [dict(name ='img'),dict (id='header'),{'class':'postmetadata'}]
remove_tags_after = {'class':'postmetadata'}
remove_tags = [dict(id='header'),{'class':'comment'}]
remove_tags_after = {'class':'pagination'}
feeds = [(u'Blog Central', u'http://www2.macleans.ca/category/blog-central/feed/'),
(u'Canada', u'http://www2.macleans.ca/category/canada/feed/'),
(u'World', u'http://www2.macleans.ca/category/world-from-the-magazine/feed/'),
(u'Business', u'http://www2.macleans.ca/category/business/feed/'),
(u'Arts & Culture', u'http://www2.macleans.ca/category/arts-culture/feed/'),
(u'Opinion', u'http://www2.macleans.ca/category/opinion/feed/'),
(u'Health', u'http://www2.macleans.ca/category/health-from-the-magazine/feed/'),
(u'Environment', u'http://www2.macleans.ca/category/environment-from-the-magazine/feed/')]
def print_version(self, url):
return url + 'print/'
feeds = [(u'Canada', u'http://www2.macleans.ca/category/canada/feed/'),
(u'World', u'http://www2.macleans.ca/category/news-politics/world/feed/'), (u'Business', u'http://www2.macleans.ca/category/business/feed/'), (u'Arts & Culture', u'http://www2.macleans.ca/category/arts/feed/'), (u'Opinion', u'http://www2.macleans.ca/category/opinion/feed/'), (u'Health', u'http://www2.macleans.ca/category/life/health/feed/'), (u'Sports', u'http://www2.macleans.ca/category/life/sports/feed/'), (u'Environment', u'http://www2.macleans.ca/category/life/environment/feed/'), (u'Technology', u'http://www2.macleans.ca/category/life/technology/feed/'), (u'Travel', u'http://www2.macleans.ca/category/life/travel/feed/'), (u'Blog Central', u'http://www2.macleans.ca/category/blog-central/feed/')]

View File

@ -2,7 +2,6 @@ from calibre.web.feeds.news import BasicNewsRecipe
class AdvancedUserRecipe1306097511(BasicNewsRecipe):
title = u'Metro Nieuws NL'
description = u'Metro Nieuws - NL'
# Version 1.2, updated cover image to match the changed website.
# added info date on title
oldest_article = 2
@ -11,14 +10,14 @@ class AdvancedUserRecipe1306097511(BasicNewsRecipe):
description = u'Metro Nederland'
language = u'nl'
simultaneous_downloads = 5
delay = 1
# timefmt = ' [%A, %d %B, %Y]'
#delay = 1
auto_cleanup = True
auto_cleanup_keep = '//div[@class="article-image-caption-2column"]|//div[@id="date"]'
timefmt = ' [%A, %d %b %Y]'
no_stylesheets = True
remove_javascript = True
remove_empty_feeds = True
cover_url = 'http://www.oldreadmetro.com/img/en/metroholland/last/1/small.jpg'
remove_empty_feeds = True
publication_type = 'newspaper'
remove_tags_before = dict(name='div', attrs={'id':'date'})
remove_tags_after = dict(name='div', attrs={'id':'column-1-3'})

View File

@ -12,10 +12,15 @@ __UseChineseTitle__ = False
__KeepImages__ = True
# (HK only) Turn below to true if you wish to use life.mingpao.com as the main article source
__UseLife__ = True
# (HK only) if __UseLife__ is true, turn this on if you want to include the column section
__InclCols__ = False
'''
Change Log:
2011/09/21: fetching "column" section is made optional. Default is False
2011/09/18: parse "column" section stuff from source text file directly.
2011/09/07: disable "column" section as it is no longer offered free.
2011/06/26: add fetching Vancouver and Toronto versions of the paper, also provide captions for images using life.mingpao fetch source
provide options to remove all images in the file
2011/05/12: switch the main parse source to life.mingpao.com, which has more photos on the article pages
@ -51,16 +56,19 @@ class MPRecipe(BasicNewsRecipe):
title = 'Ming Pao - Hong Kong'
description = 'Hong Kong Chinese Newspaper (http://news.mingpao.com)'
category = 'Chinese, News, Hong Kong'
extra_css = 'img {display: block; margin-left: auto; margin-right: auto; margin-top: 10px; margin-bottom: 10px;} font>b {font-size:200%; font-weight:bold;}'
extra_css = 'img {display: block; margin-left: auto; margin-right: auto; margin-top: 10px; margin-bottom: 10px;} font>b {font-size:200%; font-weight:bold;} div[class=heading] {font-size:200%; font-weight:bold;} div[class=images] {font-size:50%;}'
masthead_url = 'http://news.mingpao.com/image/portals_top_logo_news.gif'
keep_only_tags = [dict(name='h1'),
dict(name='font', attrs={'style':['font-size:14pt; line-height:160%;']}), # for entertainment page title
dict(name='font', attrs={'color':['AA0000']}), # for column articles title
dict(attrs={'class':['heading']}), # for heading from txt
dict(attrs={'id':['newscontent']}), # entertainment and column page content
dict(attrs={'id':['newscontent01','newscontent02']}),
dict(attrs={'class':['content']}), # for content from txt
dict(attrs={'class':['photo']}),
dict(name='table', attrs={'width':['100%'], 'border':['0'], 'cellspacing':['5'], 'cellpadding':['0']}), # content in printed version of life.mingpao.com
dict(name='img', attrs={'width':['180'], 'alt':['按圖放大']}) # images for source from life.mingpao.com
dict(name='img', attrs={'width':['180'], 'alt':['按圖放大']}), # images for source from life.mingpao.com
dict(attrs={'class':['images']}) # for images from txt
]
if __KeepImages__:
remove_tags = [dict(name='style'),
@ -230,12 +238,20 @@ class MPRecipe(BasicNewsRecipe):
(u'\u570b\u969b World', 'http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr +'&Category=nalta', 'nal'),
(u'\u7d93\u6fdf Finance', 'http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr + '&Category=nalea', 'nal'),
(u'\u9ad4\u80b2 Sport', 'http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr + '&Category=nalsp', 'nal'),
(u'\u5f71\u8996 Film/TV', 'http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr + '&Category=nalma', 'nal'),
(u'\u5c08\u6b04 Columns', 'http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr +'&Category=ncolumn', 'ncl')]:
(u'\u5f71\u8996 Film/TV', 'http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr + '&Category=nalma', 'nal')
]:
articles = self.parse_section2(url, keystr)
if articles:
feeds.append((title, articles))
if __InclCols__ == True:
# parse column section articles directly from .txt files
for title, url, keystr in [(u'\u5c08\u6b04 Columns', 'http://life.mingpao.com/cfm/dailynews2.cfm?Issue=' + dateStr +'&Category=ncolumn', 'ncl')
]:
articles = self.parse_section2_txt(url, keystr)
if articles:
feeds.append((title, articles))
for title, url in [(u'\u526f\u520a Supplement', 'http://news.mingpao.com/' + dateStr + '/jaindex.htm'),
(u'\u82f1\u6587 English', 'http://news.mingpao.com/' + dateStr + '/emindex.htm')]:
articles = self.parse_section(url)
@ -356,6 +372,24 @@ class MPRecipe(BasicNewsRecipe):
current_articles.reverse()
return current_articles
# parse from text file of life.mingpao.com
def parse_section2_txt(self, url, keystr):
self.get_fetchdate()
soup = self.index_to_soup(url)
a = soup.findAll('a', href=True)
a.reverse()
current_articles = []
included_urls = []
for i in a:
title = self.tag_to_string(i)
url = 'http://life.mingpao.com/cfm/' + i.get('href', False)
if (url not in included_urls) and (not url.rfind('.txt') == -1) and (not url.rfind(keystr) == -1):
url = url.replace('cfm/dailynews3.cfm?File=', 'ftp/Life3/') # use printed version of the article
current_articles.append({'title': title, 'url': url, 'description': ''})
included_urls.append(url)
current_articles.reverse()
return current_articles
# parse from www.mingpaovan.com
def parse_section3(self, url, baseUrl):
self.get_fetchdate()
@ -438,6 +472,39 @@ class MPRecipe(BasicNewsRecipe):
current_articles.reverse()
return current_articles
# preprocess those .txt based files
def preprocess_raw_html(self, raw_html, url):
if url.rfind('ftp') == -1:
return raw_html
else:
splitter = re.compile(r'\n') # split the raw text into lines
new_raw_html = '<html><head><title>Untitled</title></head><body><div class="images">'
next_is_img_txt = False
title_started = False
met_article_start_char = False
for item in splitter.split(raw_html):
if item.startswith(u'\u3010'):
met_article_start_char = True
new_raw_html = new_raw_html + '</div><div class="content"><p>' + item + '<p>\n'
else:
if next_is_img_txt == False:
if item.startswith('='):
next_is_img_txt = True
new_raw_html += '<img src="' + str(item)[1:].strip() + '.jpg" /><p>\n'
else:
if met_article_start_char == False:
if title_started == False:
new_raw_html = new_raw_html + '</div><div class="heading">' + item + '\n'
title_started = True
else:
new_raw_html = new_raw_html + item + '\n'
else:
new_raw_html = new_raw_html + item + '<p>\n'
else:
next_is_img_txt = False
new_raw_html = new_raw_html + item + '\n'
return new_raw_html + '</div></body></html>'
def preprocess_html(self, soup):
for item in soup.findAll(style=True):
del item['style']
@ -591,4 +658,3 @@ class MPRecipe(BasicNewsRecipe):
with nested(open(opf_path, 'wb'), open(ncx_path, 'wb')) as (opf_file, ncx_file):
opf.render(opf_file, ncx_file)

View File

@ -37,24 +37,24 @@ class NikkeiNet_paper_subscription(BasicNewsRecipe):
#br.set_debug_responses(True)
if self.username is not None and self.password is not None:
print "----------------------------open top page----------------------------------------"
print "-------------------------open top page-------------------------------------"
br.open('http://www.nikkei.com/')
print "----------------------------open first login form--------------------------------"
print "-------------------------open first login form-----------------------------"
link = br.links(url_regex="www.nikkei.com/etc/accounts/login").next()
br.follow_link(link)
#response = br.response()
#print response.get_data()
print "----------------------------JS redirect(send autoPostForm)-----------------------"
print "-------------------------JS redirect(send autoPostForm)--------------------"
br.select_form(name='autoPostForm')
br.submit()
#response = br.response()
print "----------------------------got login form---------------------------------------"
print "-------------------------got login form------------------------------------"
br.select_form(name='LA0210Form01')
br['LA0210Form01:LA0210Email'] = self.username
br['LA0210Form01:LA0210Password'] = self.password
br.submit()
#response = br.response()
print "----------------------------JS redirect------------------------------------------"
print "-------------------------JS redirect---------------------------------------"
br.select_form(nr=0)
br.submit()
@ -64,18 +64,23 @@ class NikkeiNet_paper_subscription(BasicNewsRecipe):
return br
def cleanup(self):
print "----------------------------logout-----------------------------------------------"
print "-------------------------logout--------------------------------------------"
self.browser.open('https://regist.nikkei.com/ds/etc/accounts/logout')
def parse_index(self):
print "----------------------------get index of paper-----------------------------------"
print "-------------------------get index of paper--------------------------------"
result = []
soup = self.index_to_soup('http://www.nikkei.com/paper/')
#soup = self.index_to_soup(self.test_data())
for sect in soup.findAll('div', 'cmn-section kn-special JSID_baseSection'):
sections = soup.findAll('div', 'cmn-section kn-special JSID_baseSection')
if len(sections) == 0:
sections = soup.findAll('div', 'cmn-section kn-special')
for sect in sections:
sect_title = sect.find('h3', 'cmnc-title').string
sect_result = []
for elem in sect.findAll(attrs={'class':['cmn-article_title']}):
if elem.span.a == None or elem.span.a['href'].startswith('javascript') :
continue
url = 'http://www.nikkei.com' + elem.span.a['href']
url = re.sub("/article/", "/print-article/", url) # print version.
span = elem.span.a.span
@ -84,6 +89,5 @@ class NikkeiNet_paper_subscription(BasicNewsRecipe):
sect_result.append(dict(title=title, url=url, date='',
description='', content=''))
result.append([sect_title, sect_result])
#pp.pprint(result)
return result

recipes/ntv_spor.recipe (new file, 34 lines)
View File

@ -0,0 +1,34 @@
from calibre.web.feeds.news import BasicNewsRecipe
class AdvancedUserRecipe1313512459(BasicNewsRecipe):
title = u'NTVSpor'
__author__ = 'A Erdogan'
description = 'News from Turkey'
publisher = 'NTVSpor.net'
category = 'sports, Turkey'
oldest_article = 7
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
masthead_url = 'http://www.ntvspor.net/HTML/r/i/l.png'
language = 'tr'
extra_css ='''
body{font-family:Arial,Helvetica,sans-serif; font-size:small; align:left; color:#000000}
h1{font-size:large; color:#000000}
h2{font-size:small; color:#000000}
p{font-size:small; color:#000000}
'''
conversion_options = {
'comment' : description
, 'tags' : category
, 'publisher' : publisher
, 'language' : language
}
remove_tags = [dict(name=['embed','il','ul','iframe','object','link','base']), dict(name='div', attrs={'id':'contentPhotoGallery'}), dict(name='div', attrs={'class':'SocialMediaWrapper'}), dict(name='div', attrs={'class':'grid2'}), dict(name='div', attrs={'class':'grid8'}), dict(name='div', attrs={'id':'anonsBar'}), dict(name='div', attrs={'id':'header'})]
remove_tags_before = dict(name='h1', attrs={'style':['margin-top: 6px;']})
remove_tags_after = dict(name='div', attrs={'id':'newsBody'})
feeds = [(u'NTVSpor', u'http://www.ntvspor.net/Rss/anasayfa')]

recipes/ntv_tr.recipe (new file, 45 lines)
View File

@ -0,0 +1,45 @@
from calibre.web.feeds.news import BasicNewsRecipe
class NTVMSNBC(BasicNewsRecipe):
title = u'NTV'
__author__ = 'A Erdogan'
description = 'News from Turkey'
publisher = 'NTV'
category = 'news, politics, Turkey'
oldest_article = 7
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = False
masthead_url = 'http://www.ntvmsnbc.com/images/MSNBC/msnbc_ban.gif'
language = 'tr'
remove_tags_before = dict(name='h1')
remove_tags_after = dict(attrs={'id':'haberDetayYazi'})
extra_css ='''
body{font-family:Arial,Helvetica,sans-serif; font-size:small; align:left; color:#000000}
h1{font-size:large; color:#000000}
h2{font-size:small; color:#000000}
p{font-size:small; color:#000000}
'''
conversion_options = {
'comment' : description
, 'tags' : category
, 'publisher' : publisher
, 'language' : language
}
remove_tags = [dict(name=['embed','il','ul','iframe','object','link','base']), dict(name='div', attrs={'style':['padding: 0pt 10px 10px;']}), dict(name='div', attrs={'style':['padding: 0pt 10px 10px;']}), dict(name='div', attrs={'class':['textSmallGrey w320']}), dict(name='div', attrs={'style':['font-family:Arial; font-size:16px;font-weight:bold; font-color:#003366; margin-bottom:20px; margin-top:20px; border-bottom:solid 1px;border-color: #CCC; padding-bottom:2px;']})]
remove_tags_before = dict(name='h1')
remove_tags_after = dict(name='div', attrs={'style':['font-family:Arial; font-size:16px;font-weight:bold; font-color:#003366; margin-bottom:20px; margin-top:20px; border-bottom:solid 1px;border-color: #CCC; padding-bottom:2px;']})
feeds = [(u'NTV', u'http://www.ntvmsnbc.com/id/3032091/device/rss/rss.xml')]
def print_version(self, url):
articleid = url.rpartition('/id/')[2]
return 'http://www.ntvmsnbc.com/id/' + articleid + '/print/1/displaymode/1098/'
def preprocess_html(self, soup):
return self.adeify_images(soup)

View File

@ -0,0 +1,95 @@
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'
'''
pagina12.com.ar
'''
import re
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag, NavigableString
class Pagina12(BasicNewsRecipe):
title = 'Pagina/12 - Edicion Impresa'
__author__ = 'Pablo Marfil'
description = 'Diario argentino'
INDEX = 'http://www.pagina12.com.ar/diario/secciones/index.html'
language = 'es'
encoding = 'cp1252'
remove_tags_before = dict(id='fecha')
remove_tags_after = dict(id='fin')
remove_tags = [dict(id=['fecha', 'fin', 'pageControls','logo','logo_suple','fecha_suple','volver'])]
masthead_url = 'http://www.pagina12.com.ar/commons/imgs/logo-home.gif'
no_stylesheets = True
preprocess_regexps= [(re.compile(r'<!DOCTYPE[^>]+>', re.I), lambda m:'')]
def get_cover_url(self):
soup = self.index_to_soup('http://www.pagina12.com.ar/diario/principal/diario/index.html')
for image in soup.findAll('img',alt=True):
if image['alt'].startswith('Tapa de la fecha'):
return image['src']
print image
return None
def parse_index(self):
articles = []
numero = 1
raw = self.index_to_soup('http://www.pagina12.com.ar/diario/secciones/index.html', raw=True)
raw = re.sub(r'(?i)<!DOCTYPE[^>]+>', '', raw)
soup = self.index_to_soup(raw)
feeds = []
seen_titles = set([])
for section in soup.findAll('div','seccionx'):
numero+=1
print (numero)
section_title = self.tag_to_string(section.find('div','desplegable_titulo on_principal right'))
self.log('Found section:', section_title)
articles = []
for post in section.findAll('h2'):
h = post.find('a', href=True)
title = self.tag_to_string(h)
if title in seen_titles:
continue
seen_titles.add(title)
a = post.find('a', href=True)
url = a['href']
if url.startswith('/'):
url = 'http://pagina12.com.ar/imprimir'+url
p = post.find('div', attrs={'h2'})
desc = None
self.log('\tFound article:', title, 'at', url)
if p is not None:
desc = self.tag_to_string(p)
self.log('\t\t', desc)
articles.append({'title':title, 'url':url, 'description':desc,
'date':''})
if articles:
feeds.append((section_title, articles))
return feeds
def postprocess_html(self, soup, first):
for table in soup.findAll('table', align='right'):
img = table.find('img')
if img is not None:
img.extract()
caption = self.tag_to_string(table).strip()
div = Tag(soup, 'div')
div['style'] = 'text-align:center'
div.insert(0, img)
div.insert(1, Tag(soup, 'br'))
if caption:
div.insert(2, NavigableString(caption))
table.replaceWith(div)
return soup

View File

@ -51,14 +51,13 @@ class pcWorld(BasicNewsRecipe):
keep_only_tags = [
dict(name='div', attrs={'class':'article'})
]
remove_tags = [
dict(name='div', attrs={'class':['toolBar','mac_tags','toolBar btmTools','recommend longRecommend','recommend shortRecommend','textAds']}),
dict(name='div', attrs={'id':['sidebar','comments','mac_tags']}),
dict(name='ul', attrs={'class':'tools'}),
dict(name='li', attrs={'class':'sub'})
dict(name='ul', attrs={'class':['tools', 'tools clearfix']}),
dict(name='li', attrs={'class':'sub'}),
dict(name='p', attrs={'id':'userDesire'})
]
feeds = [
(u'PCWorld Headlines', u'http://feeds.pcworld.com/pcworld/latestnews'),
(u'How-To', u'http://feeds.pcworld.com/pcworld/update/howto'),

View File

@ -14,54 +14,11 @@ class PeopleMag(BasicNewsRecipe):
use_embedded_content = False
oldest_article = 2
max_articles_per_feed = 50
use_embedded_content = False
extra_css = '''
h1{font-family:verdana,arial,helvetica,sans-serif; font-size: large;}
h2{font-family:verdana,arial,helvetica,sans-serif; font-size: small;}
.body-content{font-family:verdana,arial,helvetica,sans-serif; font-size: small;}
.byline {font-size: small; color: #666666; font-style:italic; }
.lastline {font-size: small; color: #666666; font-style:italic;}
.contact {font-size: small; color: #666666;}
.contact p {font-size: small; color: #666666;}
.photoCaption { font-family:verdana,arial,helvetica,sans-serif; font-size:x-small;}
.photoCredit{ font-family:verdana,arial,helvetica,sans-serif; font-size:x-small; color:#666666;}
.article_timestamp{font-size:x-small; color:#666666;}
a {font-family:verdana,arial,helvetica,sans-serif; font-size: x-small;}
'''
keep_only_tags = [
dict(name='div', attrs={'class': 'panel_news_article_main'}),
dict(name='div', attrs={'class':'article_content'}),
dict(name='div', attrs={'class': 'headline'}),
dict(name='div', attrs={'class': 'post'}),
dict(name='div', attrs={'class': 'packageheadlines'}),
dict(name='div', attrs={'class': 'snap_preview'}),
dict(name='div', attrs={'id': 'articlebody'})
]
remove_tags = [
dict(name='div', attrs={'class':'share_comments'}),
dict(name='p', attrs={'class':'twitter_facebook'}),
dict(name='div', attrs={'class':'share_comments_bottom'}),
dict(name='h2', attrs={'id':'related_content'}),
dict(name='div', attrs={'class':'next_article'}),
dict(name='div', attrs={'class':'prev_article'}),
dict(name='ul', attrs={'id':'sharebar'}),
dict(name='div', attrs={'class':'sharelinkcont'}),
dict(name='div', attrs={'class':'categories'}),
dict(name='ul', attrs={'class':'categories'}),
dict(name='div', attrs={'class':'related_content'}),
dict(name='div', attrs={'id':'promo'}),
dict(name='div', attrs={'class':'linksWrapper'}),
dict(name='p', attrs={'class':'tag tvnews'}),
dict(name='p', attrs={'class':'tag movienews'}),
dict(name='p', attrs={'class':'tag musicnews'}),
dict(name='p', attrs={'class':'tag couples'}),
dict(name='p', attrs={'class':'tag gooddeeds'}),
dict(name='p', attrs={'class':'tag weddings'}),
dict(name='p', attrs={'class':'tag health'})
]
no_stylesheets = True
auto_cleanup = True
auto_cleanup_keep = '//div[@id="article-image"]'
feeds = [
@ -69,26 +26,4 @@ class PeopleMag(BasicNewsRecipe):
('US Headlines', 'http://www.usmagazine.com/celebrity_news/rss')
]
def get_article_url(self, article):
ans = article.link
try:
self.log('Looking for full story link in', ans)
soup = self.index_to_soup(ans)
x = soup.find(text="View All")
if x is not None:
ans = ans + '?viewAll=y'
self.log('Found full story link', ans)
except:
pass
return ans
def postprocess_html(self, soup,first):
for tag in soup.findAll(name='div',attrs={'class':"container_ate_qandatitle"}):
tag.extract()
for tag in soup.findAll(name='br'):
tag.extract()
return soup

View File

@ -1,45 +1,35 @@
#!/usr/bin/env python
from calibre.web.feeds.news import BasicNewsRecipe
class AdvancedUserRecipe1308312288(BasicNewsRecipe):
class BasicUserRecipe1314970845(BasicNewsRecipe):
title = u'Philadelphia Inquirer'
__author__ = 'sexymax15'
oldest_article = 3
max_articles_per_feed = 50
auto_cleanup = True
language= 'en'
description = 'Daily news from the Philadelphia Inquirer'
oldest_article = 15
max_articles_per_feed = 20
use_embedded_content = False
remove_empty_feeds = True
no_stylesheets = True
remove_javascript = True
__author__ = 'bing'
requires_version = (0, 8, 16)
# remove_tags_before = {'class':'article_timestamp'}
#remove_tags_after = {'class':'graylabel'}
keep_only_tags= [dict(name=['h1','p'])]
remove_tags = [dict(name=['hr','dl','dt','img','meta','iframe','link','script','form','input','label']),
dict(id=['toggleConfirmEmailDiv','toggleTOS','toggleUsernameMsgDiv','toggleConfirmYear','navT1_philly','secondaryNav','navPlacement','globalPrimaryNav'
,'ugc-footer-philly','bv_footer_include','footer','header',
'container_rag_bottom','section_rectangle','contentrightside'])
,{'class':['megamenu3 megamenu','container misc','container_inner misc_inner'
,'misccontainer_left_32','headlineonly','misccontainer_middle_32'
,'misccontainer_right_32','headline formBegin',
'post_balloon','relatedlist','linkssubhead','b_sq','dotted-rule-above'
,'container','headlines-digest','graylabel','container_inner'
,'rlinks_colorbar1','rlinks_colorbar2','supercontainer','container_5col_left','container_image_left',
'digest-headline2','digest-lead','container_5col_leftmiddle',
'container_5col_middlemiddle','container_5col_rightmiddle'
,'container_5col_right','divclear','supercontainer_outer force-width',
'supercontainer','containertitle kicker-title',
'pollquestion','pollchoice','photomore','pollbutton','container rssbox','containertitle video ',
'containertitle_image ','container_tabtwo','selected'
,'shadetabs','selected','tabcontentstyle','tabcontent','inner_container'
,'arrow','container_ad','containertitlespacer','adUnit','tracking','sitemsg_911 clearfix']}]
extra_css = """
h1{font-family: Georgia,serif; font-size: xx-large}
"""
feeds = [(u'News', u'http://www.philly.com/philly_news.rss')]
feeds = [
(u'Front Page', u'http://www.philly.com/inquirer_front_page.rss'),
(u'Philly.com News', u'http://www.philly.com/philly_news.rss'),
(u'National/World (Philly.com)', u'http://www.philly.com/philly_news_nation.rss'),
(u'Politics (Philly.com)', u'http://www.philly.com/philly_politics.rss'),
(u'Local (Philly.com)', u'http://www.philly.com/philly_news_local.rss'),
(u'South Jersey News', u'http://www.philly.com/inq_news_south_jersey.rss'),
(u'Sports', u'http://www.philly.com/inquirer_sports.rss'),
(u'Tech News', u'http://www.philly.com/philly_tech.rss'),
(u'Daily Magazine', u'http://www.philly.com/inq_magazine_daily.rss'),
(u'Weekend', u'http://www.philly.com/inq_entertainment_weekend.rss'),
(u'Business', u'http://www.philly.com/inq_business.rss'),
(u'Education', u'http://www.philly.com/inquirer_education.rss'),
(u'Books', u'http://www.philly.com/inq_books.rss'),
(u'Entertainment', u'http://www.philly.com/inq_entertainment.rss'),
(u'Food', u'http://www.philly.com/inq_food.rss'),
(u'Health and Science', u'http://www.philly.com/inquirer_health_science.rss'),
(u'Home and Design', u'http://www.philly.com/inq_home_design.rss'),
(u'News Columnists', u'http://www.philly.com/inq_columnists.rss'),
(u'Editorial', u'http://www.philly.com/inq_news_editorial.rss'),
(u'Travel', u'http://www.philly.com/inquirer_travel.rss'),
(u'Obituaries', u'http://www.philly.com/inquirer_obituaries.rss')
]

recipes/rtnews.recipe (new file, 64 lines)
View File

@ -0,0 +1,64 @@
__license__ = 'GPL v3'
__copyright__ = '2011, Darko Miletic <darko.miletic at gmail.com>'
'''
rt.com
'''
from calibre.web.feeds.news import BasicNewsRecipe
class RT_eng(BasicNewsRecipe):
title = 'RT in English'
__author__ = 'Darko Miletic'
description = 'RT is the first Russian 24/7 English-language news channel which brings the Russian view on global news.'
publisher = 'Autonomous Nonprofit Organization "TV-Novosti"'
category = 'news, politics, economy, finances, Russia, world'
oldest_article = 2
no_stylesheets = True
encoding = 'utf8'
masthead_url = 'http://rt.com/s/css/img/printlogo.gif'
use_embedded_content = False
remove_empty_feeds = True
language = 'en_RU'
publication_type = 'newsportal'
extra_css = """
body{font-family: Arial,Helvetica,sans-serif}
h1{font-family: Georgia,"Times New Roman",Times,serif}
.grey{color: gray}
.fs12{font-size: small}
"""
conversion_options = {
'comment' : description
, 'tags' : category
, 'publisher': publisher
, 'language' : language
}
keep_only_tags = [dict(name='div', attrs={'class':'all'})]
remove_tags = [
dict(name=['object','link','embed','iframe','meta','link'])
,dict(attrs={'class':'crumbs oh'})
]
remove_attributes = ['clear']
feeds = [
(u'Politics' , u'http://rt.com/politics/rss/' )
,(u'USA' , u'http://rt.com/usa/news/rss/' )
,(u'Business' , u'http://rt.com/business/news/rss/' )
,(u'Sport' , u'http://rt.com/sport/rss/' )
,(u'Art&Culture', u'http://rt.com/art-and-culture/news/rss/')
]
def print_version(self, url):
return url + 'print/'
def preprocess_html(self, soup):
for item in soup.findAll(style=True):
del item['style']
for item in soup.findAll('a'):
str = item.string
if str is None:
str = self.tag_to_string(item)
item.replaceWith(str)
return soup

View File

@ -1,5 +1,5 @@
__license__ = 'GPL v3'
__copyright__ = '2009-2011, Darko Miletic <darko.miletic at gmail.com>'
__copyright__ = '2011, M. Ching modified from work 2009-2011 Darko Miletic <darko.miletic at gmail.com>'
'''
staradvertiser.com
'''
@ -7,12 +7,13 @@ staradvertiser.com
from calibre.web.feeds.news import BasicNewsRecipe
class Starbulletin(BasicNewsRecipe):
title = 'Honolulu Star Advertiser'
title = 'Honolulu Star-Advertiser'
__author__ = 'Darko Miletic'
description = 'Latest national and local Hawaii sports news'
publisher = 'Honolulu Star-Advertiser'
category = 'news, Honolulu, Hawaii'
oldest_article = 2
needs_subscription = True
max_articles_per_feed = 100
language = 'en'
no_stylesheets = True
@ -20,12 +21,12 @@ class Starbulletin(BasicNewsRecipe):
encoding = 'utf8'
publication_type = 'newspaper'
masthead_url = 'http://media.staradvertiser.com/designimages/star-advertiser-logo-small.gif'
extra_css = """
body{font-family: Verdana,Arial,Helvetica,sans-serif}
h1,.brown,.postCredit{color: #663300}
.storyDeck{font-size: 1.2em; font-weight: bold}
img{display: block}
"""
# extra_css = """
# body{font-family: Verdana,Arial,Helvetica,sans-serif}
# h1,.brown,.hsa_postCredit{color: #663300}
# .storyDeck{font-size: 1.2em; font-weight: bold}
# img{display: block}
# """
conversion_options = {
'comment' : description
@ -35,25 +36,37 @@ class Starbulletin(BasicNewsRecipe):
, 'linearize_tables' : True
}
keep_only_tags = [
dict(attrs={'id':'storyTitle'})
,dict(attrs={'class':['storyDeck','postCredit']})
,dict(name='span',attrs={'class':'brown'})
dict(attrs={'id':'hsa_storyTitle'})
,dict(attrs={'id':'hsa_storyTitle article-important'})
,dict(attrs={'class':['hsa_dateStamp','hsa_postCredit','storyDeck']})
,dict(name='span',attrs={'class':['hsa_dateStamp','hsa_postCredit']})
,dict(name='span',attrs={'class':['hsa_dateStamp article-important','hsa_postCredit article-important']})
,dict(name='div',attrs={'class':'storytext article-important'})
,dict(name='div',attrs={'class':'storytext'})
]
remove_tags = [
dict(name=['object','link','script','span','meta','base','iframe'])
dict(name=['object','link','script','meta','base','iframe'])
# removed 'span' from preceding list to permit keeping of author and timestamp
,dict(attrs={'class':['insideStoryImage','insideStoryAd']})
,dict(attrs={'name':'fb_share'})
]
def get_browser(self):
br = BasicNewsRecipe.get_browser()
if self.username is not None and self.password is not None:
br.open('http://www.staradvertiser.com/manage/Login/')
br.select_form(name='loginForm')
br['email'] = self.username
br['password'] = self.password
br.submit()
return br
feeds = [
(u'Headlines' , u'http://www.staradvertiser.com/staradvertiser_headlines.rss' )
,(u'News' , u'http://www.staradvertiser.com/news/index.rss' )
,(u'Sports' , u'http://www.staradvertiser.com/sports/index.rss' )
,(u'Features' , u'http://www.staradvertiser.com/features/index.rss' )
,(u'Editorials', u'http://www.staradvertiser.com/editorials/index.rss' )
,(u'Business' , u'http://www.staradvertiser.com/business/index.rss' )
,(u'Travel' , u'http://www.staradvertiser.com/travel/index.rss' )
(u'Breaking News', u'http://www.staradvertiser.com/news/breaking/index.rss')
,(u'News', u'http://www.staradvertiser.com/newspremium/index.rss')
,(u'Business', u'http://www.staradvertiser.com/businesspremium/index.rss')
,(u'Sports', u'http://www.staradvertiser.com/sportspremium/index.rss')
,(u'Features', u'http://www.staradvertiser.com/featurespremium/index.rss')
]
def preprocess_html(self, soup):

recipes/tablety_pl.recipe (new file, 12 lines)
View File

@ -0,0 +1,12 @@
from calibre.web.feeds.news import BasicNewsRecipe
class Tablety_pl(BasicNewsRecipe):
title = u'Tablety.pl'
__author__ = 'fenuks'
description = u'tablety.pl - latest tablet news'
cover_url = 'http://www.tablety.pl/wp-content/themes/kolektyw/img/logo.png'
category = 'IT'
language = 'pl'
oldest_article = 8
max_articles_per_feed = 100
feeds = [(u'Najnowsze posty', u'http://www.tablety.pl/feed/')]

recipes/taipei.recipe (new file, 30 lines)
View File

@ -0,0 +1,30 @@
from calibre.web.feeds.news import BasicNewsRecipe
class TN(BasicNewsRecipe):
title = u'Taipei Times'
language = 'en_CN'
__author__ = 'Krittika Goyal'
oldest_article = 1 #days
max_articles_per_feed = 25
use_embedded_content = False
no_stylesheets = True
auto_cleanup = True
auto_cleanup_keep = '//*[@class="main_ipic"]'
feeds = [
('Editorials',
'http://www.taipeitimes.com/xml/editorials.rss'),
('Taiwan',
'http://www.taipeitimes.com/xml/taiwan.rss'),
('Features',
'http://www.taipeitimes.com/xml/feat.rss'),
('Business',
'http://www.taipeitimes.com/xml/biz.rss'),
('World',
'http://www.taipeitimes.com/xml/world.rss'),
('Sports',
'http://www.taipeitimes.com/xml/sport.rss'),
]

View File

@ -15,12 +15,12 @@ class Time(BasicNewsRecipe):
# ' publish complete articles on the web.')
title = u'Time'
__author__ = 'Kovid Goyal'
description = 'Weekly magazine'
description = ('Weekly US magazine.')
encoding = 'utf-8'
no_stylesheets = True
language = 'en'
remove_javascript = True
#needs_subscription = 'optional'
keep_only_tags = [
{
@ -41,6 +41,21 @@ class Time(BasicNewsRecipe):
preprocess_regexps = [(re.compile(
r'<meta .+/>'), lambda m:'')]
def get_browser(self):
br = BasicNewsRecipe.get_browser(self)
if False and self.username and self.password:
# This site uses javascript in its login process
res = br.open('http://www.time.com/time/magazine')
br.select_form(nr=1)
br['username'] = self.username
br['password'] = self.password
res = br.submit()
raw = res.read()
if '>Log Out<' not in raw:
raise ValueError('Failed to login to time.com, check'
' your username and password')
return br
def parse_index(self):
raw = self.index_to_soup('http://www.time.com/time/magazine', raw=True)
root = html.fromstring(raw)

View File

@ -1,12 +1,9 @@
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = '2009, Darko Miletic <darko.miletic at gmail.com>'
__copyright__ = '2009-2011, Darko Miletic <darko.miletic at gmail.com>'
'''
twitchfilm.net/site/
twitchfilm.net/news/
'''
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag
class Twitchfilm(BasicNewsRecipe):
title = 'Twitch Films'
@ -15,29 +12,46 @@ class Twitchfilm(BasicNewsRecipe):
oldest_article = 30
max_articles_per_feed = 100
no_stylesheets = True
use_embedded_content = True
use_embedded_content = False
encoding = 'utf-8'
publisher = 'Twitch'
masthead_url = 'http://twitchfilm.com/img/logo.png'
category = 'twitch, twitchfilm, movie news, movie reviews, cult cinema, independent cinema, anime, foreign cinema, geek talk'
language = 'en'
lang = 'en-US'
conversion_options = {
'comment' : description
, 'tags' : category
, 'publisher': publisher
, 'language' : lang
, 'pretty_print' : True
, 'language' : language
}
remove_tags = [dict(name='div', attrs={'class':'feedflare'})]
keep_only_tags=[dict(attrs={'class':'asset-header'})]
remove_tags_after=dict(attrs={'class':'asset-body'})
remove_tags = [ dict(name='div', attrs={'class':['social','categories']})
, dict(attrs={'id':'main-asset'})
, dict(name=['meta','link','iframe','embed','object'])
]
feeds = [(u'News', u'http://feedproxy.google.com/TwitchEverything')]
feeds = [(u'News', u'http://feeds.twitchfilm.net/TwitchEverything')]
def preprocess_html(self, soup):
mtag = Tag(soup,'meta',[('http-equiv','Content-Type'),('context','text/html; charset=utf-8')])
soup.head.insert(0,mtag)
soup.html['lang'] = self.lang
return self.adeify_images(soup)
for item in soup.findAll(style=True):
del item['style']
for item in soup.findAll('a'):
limg = item.find('img')
if item.string is not None:
str = item.string
item.replaceWith(str)
else:
if limg:
item.name = 'div'
item.attrs = []
else:
str = self.tag_to_string(item)
item.replaceWith(str)
for item in soup.findAll('img'):
if not item.has_key('alt'):
item['alt'] = 'image'
return soup

recipes/ubuntu_pl.recipe (new file, 16 lines)
View File

@ -0,0 +1,16 @@
from calibre.web.feeds.news import BasicNewsRecipe
class Ubuntu_pl(BasicNewsRecipe):
title = u'UBUNTU.pl'
__author__ = 'fenuks'
description = 'UBUNTU.pl - Polish Ubuntu community site'
cover_url = 'http://ubuntu.pl/img/logo.jpg'
category = 'linux, IT'
language = 'pl'
no_stylesheets = True
oldest_article = 8
max_articles_per_feed = 100
extra_css = '#main {text-align:left;}'
keep_only_tags= [dict(name='td', attrs={'class':'teaser-node-mc'}), dict(name='h3', attrs={'class':'entry-title'}), dict(name='div', attrs={'class':'entry-content'})]
remove_tags_after= [dict(name='div' , attrs={'class':'content'})]
feeds = [('Czytelnia Ubuntu', 'http://feeds.feedburner.com/ubuntu-czytelnia'), (u'WikiGames', u'http://feeds.feedburner.com/WikiGames')]

View File

@ -13,6 +13,7 @@ class USAToday(BasicNewsRecipe):
title = 'USA Today'
__author__ = 'Kovid Goyal'
oldest_article = 1
publication_type = 'newspaper'
timefmt = ''
max_articles_per_feed = 20
language = 'en'

View File

@ -94,9 +94,11 @@ class WallStreetJournal(BasicNewsRecipe):
if date is not None:
self.timefmt = ' [%s]'%self.tag_to_string(date)
cov = soup.find('a', attrs={'class':'icon pdf'}, href=True)
cov = soup.find('div', attrs={'class':'itpSectionHeaderPdf'})
if cov is not None:
self.cover_url = cov['href']
a = cov.find('a', href=True)
if a is not None:
self.cover_url = a['href']
feeds = []
div = soup.find('div', attrs={'class':'itpHeader'})

View File

@ -61,18 +61,27 @@ authors_completer_append_separator = False
# selecting 'manage authors', and pressing 'Recalculate all author sort values'.
# The author name suffixes are words that are ignored when they occur at the
# end of an author name. The case of the suffix is ignored and trailing
# periods are automatically handled.
# periods are automatically handled. The same is true for prefixes.
# The author name copy words are a set of words which if they occur in an
# author name cause the automatically geenrated author sort string to be
# author name cause the automatically generated author sort string to be
# identical to the author name. This means that the sort for a string like Acme
# Inc. will be Acme Inc. instead of Inc., Acme
author_sort_copy_method = 'comma'
author_name_suffixes = ('Jr', 'Sr', 'Inc', 'Ph.D', 'Phd',
'MD', 'M.D', 'I', 'II', 'III', 'IV',
'Junior', 'Senior')
author_name_prefixes = ('Mr', 'Mrs', 'Ms', 'Dr', 'Prof')
author_name_copywords = ('Corporation', 'Company', 'Co.', 'Agency', 'Council',
'Committee', 'Inc.', 'Institute', 'Society', 'Club', 'Team')
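As a rough illustration of the behaviour described above (a standalone sketch, not calibre's actual author-sort code), the prefix, suffix and copy-word handling amounts to something like the following, using the same tuples defined here:

def example_author_sort(name,
                        prefixes=('Mr', 'Mrs', 'Ms', 'Dr', 'Prof'),
                        suffixes=('Jr', 'Sr', 'Inc', 'Ph.D', 'Phd', 'MD', 'M.D',
                                  'I', 'II', 'III', 'IV', 'Junior', 'Senior'),
                        copywords=('Corporation', 'Company', 'Co.', 'Agency', 'Council',
                                   'Committee', 'Inc.', 'Institute', 'Society', 'Club', 'Team')):
    # Copy words force the sort string to be identical to the author name itself
    if any(w in name.split() for w in copywords):
        return name
    tokens = name.split()
    # Prefixes and suffixes match case-insensitively, ignoring trailing periods
    while tokens and tokens[0].rstrip('.').lower() in [p.lower() for p in prefixes]:
        tokens.pop(0)
    while tokens and tokens[-1].rstrip('.').lower() in [s.lower() for s in suffixes]:
        tokens.pop()
    if len(tokens) < 2:
        return ' '.join(tokens)
    # The 'comma' copy method: surname first, then the remaining names
    return tokens[-1] + ', ' + ' '.join(tokens[:-1])

print(example_author_sort('Dr. John Smith Jr.'))  # Smith, John
print(example_author_sort('Acme Inc.'))           # Acme Inc.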
#: Splitting multiple author names
# By default, calibre splits a string containing multiple author names on
# ampersands and the words "and" and "with". You can customize the splitting
# by changing the regular expression below. Strings are split on whatever the
# specified regular expression matches.
# Default: r'(?i),?\s+(and|with)\s+'
authors_split_regex = r'(?i),?\s+(and|with)\s+'
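A quick example of how this default expression splits a string (a sketch, not calibre's internal splitting code; note that re.split also returns the captured 'and'/'with' separator, which is filtered out below):

import re

def split_authors(raw, pattern=r'(?i),?\s+(and|with)\s+'):
    parts = re.split(pattern, raw)
    # Drop the captured separators and surrounding whitespace
    return [p.strip() for p in parts if p and p.lower() not in ('and', 'with')]

print(split_authors('John Doe and Jane Smith'))          # ['John Doe', 'Jane Smith']
print(split_authors('Tom Clancy with Grant Blackwood'))  # ['Tom Clancy', 'Grant Blackwood']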
#: Use author sort in Tag Browser
# Set which author field to display in the tags pane (the list of authors,
# series, publishers etc on the left hand side). The choices are author and

View File

@ -98,7 +98,7 @@
<xsl:apply-templates/>
</emph>
</xsl:when>
<xsl:when test = "@underlined">
<xsl:when test = "@underlined and @underlined != 'false'">
<emph rend = "paragraph-emph-underlined">
<xsl:apply-templates/>
</emph>
@ -220,7 +220,7 @@
</xsl:template>
<xsl:template name="parse-styles-attrs">
<!--<xsl:text>position:relative;</xsl:text>-->
<!--<xsl:text>position:relative;</xsl:text>
<xsl:if test="@space-before">
<xsl:text>padding-top:</xsl:text>
<xsl:value-of select="@space-before"/>
@ -230,7 +230,7 @@
<xsl:text>padding-bottom:</xsl:text>
<xsl:value-of select="@space-after"/>
<xsl:text>pt;</xsl:text>
</xsl:if>
</xsl:if>-->
<xsl:if test="@left-indent">
<xsl:text>padding-left:</xsl:text>
<xsl:value-of select="@left-indent"/>
@ -256,15 +256,15 @@
<xsl:value-of select="'italic'"/>
<xsl:text>;</xsl:text>
</xsl:if>
<xsl:if test="@underline and @underline != 'false'">
<xsl:if test="@underlined and @underlined != 'false'">
<xsl:text>text-decoration:underline</xsl:text>
<xsl:text>;</xsl:text>
</xsl:if>
<xsl:if test="@line-spacing">
<!--<xsl:if test="@line-spacing">
<xsl:text>line-height:</xsl:text>
<xsl:value-of select="@line-spacing"/>
<xsl:text>pt;</xsl:text>
</xsl:if>
</xsl:if>-->
<xsl:if test="(@align = 'just')">
<xsl:text>text-align: justify;</xsl:text>
</xsl:if>
@ -314,7 +314,6 @@
</xsl:attribute>
<xsl:apply-templates/>
</xsl:element>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
@ -446,8 +445,15 @@
<xsl:template match = "rtf:field[@type='hyperlink']">
<xsl:element name ="a">
<xsl:attribute name = "href">
<xsl:value-of select = "@link"/>
<xsl:attribute name = "href"><xsl:if test="not(contains(@link, '/'))">#</xsl:if><xsl:value-of select = "@link"/></xsl:attribute>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match = "rtf:field[@type='bookmark-start']">
<xsl:element name ="a">
<xsl:attribute name = "id">
<xsl:value-of select = "@number"/>
</xsl:attribute>
<xsl:apply-templates/>
</xsl:element>

View File

@ -63,10 +63,10 @@ class Check(Command):
for f in x[-1]:
y = self.j(x[0], f)
mtime = os.stat(y).st_mtime
if f.endswith('.py') and f not in ('ptempfile.py', 'feedparser.py',
'pyparsing.py', 'markdown.py') and \
'genshi' not in y and cache.get(y, 0) != mtime and \
'prs500/driver.py' not in y:
if (f.endswith('.py') and f not in ('ptempfile.py', 'feedparser.py',
'pyparsing.py', 'markdown.py') and
'genshi' not in y and cache.get(y, 0) != mtime and
'prs500/driver.py' not in y):
yield y, mtime
for x in os.walk(self.j(self.d(self.SRC), 'recipes')):

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -8,17 +8,18 @@
msgid ""
msgstr ""
"Project-Id-Version: iso_639_3\n"
"Report-Msgid-Bugs-To: Debian iso-codes team <pkg-isocodes-devel@lists.alioth."
"debian.org>\n"
"POT-Creation-Date: 2011-05-27 14:59+0200\n"
"PO-Revision-Date: 2011-07-10 19:40+0300\n"
"Last-Translator: Roumen Petrov <transl@roumenpetrov.info>\n"
"Report-Msgid-Bugs-To: Debian iso-codes team <pkg-isocodes-"
"devel@lists.alioth.debian.org>\n"
"POT-Creation-Date: 2011-09-02 16:21+0000\n"
"PO-Revision-Date: 2011-08-27 04:12+0000\n"
"Last-Translator: Roumen Petrov <Unknown>\n"
"Language-Team: Bulgarian <dict@fsa-bg.org>\n"
"Language: bg\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"
"X-Launchpad-Export-Date: 2011-09-03 04:56+0000\n"
"X-Generator: Launchpad (build 13830)\n"
"Language: bg\n"
#. name for aaa
msgid "Ghotuo"

File diff suppressed because it is too large.


@ -9,16 +9,18 @@
msgid ""
msgstr ""
"Project-Id-Version: iso_639_3\n"
"Report-Msgid-Bugs-To: Debian iso-codes team <pkg-isocodes-devel@lists.alioth."
"debian.org>\n"
"POT-Creation-Date: 2011-05-27 14:59+0200\n"
"PO-Revision-Date: 2009-04-19 06:03+0000\n"
"Last-Translator: Denis <Unknown>\n"
"Report-Msgid-Bugs-To: Debian iso-codes team <pkg-isocodes-"
"devel@lists.alioth.debian.org>\n"
"POT-Creation-Date: 2011-09-02 16:21+0000\n"
"PO-Revision-Date: 2011-08-27 07:57+0000\n"
"Last-Translator: Kovid Goyal <Unknown>\n"
"Language-Team: Breton <brenux@free.fr>\n"
"Language: br\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Launchpad-Export-Date: 2011-09-03 04:56+0000\n"
"X-Generator: Launchpad (build 13830)\n"
"Language: br\n"
#. name for aaa
msgid "Ghotuo"
@ -125,9 +127,8 @@ msgid "Ayta, Ambala"
msgstr "Ayta, Ambala"
#. name for abd
#, fuzzy
msgid "Manide"
msgstr "Bade"
msgstr ""
#. name for abe
msgid "Abnaki, Western"
@ -174,9 +175,8 @@ msgid "Abon"
msgstr "Abon"
#. name for abp
#, fuzzy
msgid "Ayta, Abellen"
msgstr "Ayta, Abenlen"
msgstr ""
#. name for abq
msgid "Abaza"
@ -599,9 +599,8 @@ msgid "Aguacateco"
msgstr "Aguakateko"
#. name for agv
#, fuzzy
msgid "Dumagat, Remontado"
msgstr "Agta, Remontado"
msgstr ""
#. name for agw
msgid "Kahua"
@ -1288,9 +1287,8 @@ msgid "Bukiyip"
msgstr "Bukiyip"
#. name for apf
#, fuzzy
msgid "Agta, Pahanan"
msgstr "Ayta, Bataan"
msgstr ""
#. name for apg
msgid "Ampanang"
@ -1325,9 +1323,8 @@ msgid "Apinayé"
msgstr "Apinaye"
#. name for apo
#, fuzzy
msgid "Ambul"
msgstr "Ambulas"
msgstr ""
#. name for app
msgid "Apma"
@ -1402,9 +1399,8 @@ msgid "Arhâ"
msgstr "Arhâ"
#. name for aqz
#, fuzzy
msgid "Akuntsu"
msgstr "Awutu"
msgstr ""
#. name for ara
msgid "Arabic"
@ -2003,9 +1999,8 @@ msgid "Ayta, Sorsogon"
msgstr "Ayta, Sorsogon"
#. name for ayt
#, fuzzy
msgid "Ayta, Magbukun"
msgstr "Ayta, Bataan"
msgstr ""
#. name for ayu
msgid "Ayu"
@ -2368,9 +2363,8 @@ msgid "Bade"
msgstr "Bade"
#. name for bdf
#, fuzzy
msgid "Biage"
msgstr "Bade"
msgstr ""
#. name for bdg
msgid "Bonggi"
@ -2641,9 +2635,8 @@ msgid "Bondo"
msgstr "Bondo"
#. name for bfx
#, fuzzy
msgid "Bantayanon"
msgstr "Bantawa"
msgstr ""
#. name for bfy
msgid "Bagheli"
@ -3110,9 +3103,8 @@ msgid "Bakumpai"
msgstr ""
#. name for bks
#, fuzzy
msgid "Sorsoganon, Northern"
msgstr "Bai, Kreisteiz"
msgstr ""
#. name for bkt
msgid "Boloki"
@ -3355,9 +3347,8 @@ msgid "Bookan"
msgstr ""
#. name for bnc
#, fuzzy
msgid "Bontok"
msgstr "Bondo"
msgstr ""
#. name for bnd
msgid "Banda (Indonesia)"
@ -4220,9 +4211,8 @@ msgid "Dibole"
msgstr ""
#. name for bvy
#, fuzzy
msgid "Baybayanon"
msgstr "Babango"
msgstr ""
#. name for bvz
msgid "Bauzi"
@ -4453,9 +4443,8 @@ msgid "Baygo"
msgstr ""
#. name for byh
#, fuzzy
msgid "Bhujel"
msgstr "Bhele"
msgstr ""
#. name for byi
msgid "Buyu"
@ -4794,9 +4783,8 @@ msgid "Cacua"
msgstr ""
#. name for cbw
#, fuzzy
msgid "Kinabalian"
msgstr "Ainbai"
msgstr ""
#. name for cby
msgid "Carabayo"
@ -6487,9 +6475,8 @@ msgid "Duma"
msgstr ""
#. name for dmb
#, fuzzy
msgid "Dogon, Mombo"
msgstr "One, Molmo"
msgstr ""
#. name for dmc
msgid "Dimir"
@ -7008,9 +6995,8 @@ msgid "Ebughu"
msgstr ""
#. name for ebk
#, fuzzy
msgid "Bontok, Eastern"
msgstr "Abnaki, Eastern"
msgstr ""
#. name for ebo
msgid "Teke-Ebo"
@ -7561,9 +7547,8 @@ msgid "Fars, Northwestern"
msgstr ""
#. name for fbl
#, fuzzy
msgid "Bikol, West Albay"
msgstr "Bikolano, Albay"
msgstr ""
#. name for fcs
msgid "Quebec Sign Language"
@ -7726,9 +7711,8 @@ msgid "French, Old (842-ca. 1400)"
msgstr ""
#. name for frp
#, fuzzy
msgid "Arpitan"
msgstr "Arta"
msgstr ""
#. name for frq
msgid "Forak"
@ -10823,9 +10807,8 @@ msgid "Ngile"
msgstr ""
#. name for jls
#, fuzzy
msgid "Jamaican Sign Language"
msgstr "Yezh ar sinoù Amerika"
msgstr ""
#. name for jma
msgid "Dima"
@ -13540,9 +13523,8 @@ msgid "Kpatili"
msgstr ""
#. name for kyn
#, fuzzy
msgid "Binukidnon, Northern"
msgstr "Bai, Kreisteiz"
msgstr ""
#. name for kyo
msgid "Kelon"
@ -13697,9 +13679,8 @@ msgid "Kalabra"
msgstr ""
#. name for laa
#, fuzzy
msgid "Subanen, Southern"
msgstr "Bai, Hanternoz"
msgstr ""
#. name for lab
msgid "Linear A"
@ -13834,9 +13815,8 @@ msgid "Ladakhi"
msgstr ""
#. name for lbk
#, fuzzy
msgid "Bontok, Central"
msgstr "Bicolano, Kreiz"
msgstr ""
#. name for lbl
msgid "Bikol, Libon"
@ -14567,9 +14547,8 @@ msgid "Lanoh"
msgstr ""
#. name for lni
#, fuzzy
msgid "Daantanai'"
msgstr "Babatana"
msgstr ""
#. name for lnj
msgid "Leningitij"
@ -14832,18 +14811,16 @@ msgid "Trinidad and Tobago Sign Language"
msgstr ""
#. name for lsy
#, fuzzy
msgid "Mauritian Sign Language"
msgstr "Yezh ar sinoù Aotrich"
msgstr ""
#. name for ltc
msgid "Chinese, Late Middle"
msgstr ""
#. name for ltg
#, fuzzy
msgid "Latgalian"
msgstr "Katalaneg"
msgstr ""
#. name for lti
msgid "Leti (Indonesia)"
@ -14974,9 +14951,8 @@ msgid "Lavukaleve"
msgstr ""
#. name for lvs
#, fuzzy
msgid "Latvian, Standard"
msgstr "Arabeg dre ziouer"
msgstr ""
#. name for lvu
msgid "Levuka"
@ -16127,9 +16103,8 @@ msgid "Kituba (Congo)"
msgstr ""
#. name for mkx
#, fuzzy
msgid "Manobo, Kinamiging"
msgstr "Manobo, Ata"
msgstr ""
#. name for mky
msgid "Makian, East"
@ -16476,9 +16451,8 @@ msgid "Morori"
msgstr ""
#. name for mom
#, fuzzy
msgid "Mangue"
msgstr "Mapudungun"
msgstr ""
#. name for mon
msgid "Mongolian"
@ -16829,9 +16803,8 @@ msgid "Maremgi"
msgstr ""
#. name for mry
#, fuzzy
msgid "Mandaya"
msgstr "Makayam"
msgstr ""
#. name for mrz
msgid "Marind"
@ -17026,9 +16999,8 @@ msgid "Asaro'o"
msgstr ""
#. name for mtw
#, fuzzy
msgid "Binukidnon, Southern"
msgstr "Bai, Hanternoz"
msgstr ""
#. name for mtx
msgid "Mixtec, Tidaá"
@ -18327,9 +18299,8 @@ msgid "Nias"
msgstr ""
#. name for nib
#, fuzzy
msgid "Nakame"
msgstr "Barama"
msgstr ""
#. name for nid
msgid "Ngandi"
@ -18892,9 +18863,8 @@ msgid "Noiri"
msgstr ""
#. name for noj
#, fuzzy
msgid "Nonuya"
msgstr "Nanubae"
msgstr ""
#. name for nok
msgid "Nooksack"
@ -18985,9 +18955,8 @@ msgid "Napu"
msgstr ""
#. name for nqg
#, fuzzy
msgid "Nago, Southern"
msgstr "Bai, Hanternoz"
msgstr ""
#. name for nqk
msgid "Ede Nago, Kura"
@ -19558,9 +19527,8 @@ msgid "Obispeño"
msgstr ""
#. name for obk
#, fuzzy
msgid "Bontok, Southern"
msgstr "Bai, Hanternoz"
msgstr ""
#. name for obl
msgid "Oblo"
@ -20523,9 +20491,8 @@ msgid "Pomo, Southern"
msgstr ""
#. name for pes
#, fuzzy
msgid "Persian, Iranian"
msgstr "Paranan"
msgstr ""
#. name for pev
msgid "Pémono"
@ -21564,9 +21531,8 @@ msgid "Poyanáwa"
msgstr ""
#. name for pys
#, fuzzy
msgid "Paraguayan Sign Language"
msgstr "Yezh ar sinoù Afganistan"
msgstr ""
#. name for pyu
msgid "Puyuma"
@ -21921,9 +21887,8 @@ msgid "Palaung, Rumai"
msgstr ""
#. name for rbk
#, fuzzy
msgid "Bontok, Northern"
msgstr "Bai, Kreisteiz"
msgstr ""
#. name for rbl
msgid "Bikol, Miraya"
@ -22890,9 +22855,8 @@ msgid "Irish, Old (to 900)"
msgstr ""
#. name for sgb
#, fuzzy
msgid "Ayta, Mag-antsi"
msgstr "Ayta, Bataan"
msgstr ""
#. name for sgc
msgid "Kipsigis"
@ -22959,9 +22923,8 @@ msgid "Sierra Leone Sign Language"
msgstr ""
#. name for sgy
#, fuzzy
msgid "Sanglechi"
msgstr "Balouchi"
msgstr ""
#. name for sgz
msgid "Sursurunga"
@ -23912,9 +23875,8 @@ msgid "Suruí"
msgstr ""
#. name for srv
#, fuzzy
msgid "Sorsoganon, Southern"
msgstr "Aymara, Hanternoz"
msgstr ""
#. name for srw
msgid "Serua"
@ -25009,9 +24971,8 @@ msgid "Tagalog"
msgstr ""
#. name for tgn
#, fuzzy
msgid "Tandaganon"
msgstr "Inabaknon"
msgstr ""
#. name for tgo
msgid "Sudest"
@ -26750,9 +26711,8 @@ msgid "Uma' Lung"
msgstr ""
#. name for ulw
#, fuzzy
msgid "Ulwa"
msgstr "Alawa"
msgstr ""
#. name for uma
msgid "Umatilla"
@ -27115,9 +27075,8 @@ msgid "Babar, Southeast"
msgstr ""
#. name for vbk
#, fuzzy
msgid "Bontok, Southwestern"
msgstr "Bai, Hanternoz"
msgstr ""
#. name for vec
msgid "Venetian"
@ -28352,9 +28311,8 @@ msgid "Breton, Middle"
msgstr "Krennvrezhoneg"
#. name for xbn
#, fuzzy
msgid "Kenaboi"
msgstr "Karnai"
msgstr ""
#. name for xbo
msgid "Bolgarian"
@ -28581,9 +28539,8 @@ msgid "Kalkoti"
msgstr ""
#. name for xkb
#, fuzzy
msgid "Nago, Northern"
msgstr "Bai, Kreisteiz"
msgstr ""
#. name for xkc
msgid "Kho'ini"
@ -28742,9 +28699,8 @@ msgid "Makhuwa-Marrevone"
msgstr ""
#. name for xmd
#, fuzzy
msgid "Mbudum"
msgstr "Buduma"
msgstr ""
#. name for xme
msgid "Median"
@ -29335,9 +29291,8 @@ msgid "Banda-Yangere"
msgstr ""
#. name for yak
#, fuzzy
msgid "Yakama"
msgstr "Barama"
msgstr ""
#. name for yal
msgid "Yalunka"
@ -29348,9 +29303,8 @@ msgid "Yamba"
msgstr ""
#. name for yan
#, fuzzy
msgid "Mayangna"
msgstr "Ginyanga"
msgstr ""
#. name for yao
msgid "Yao"
@ -29705,9 +29659,8 @@ msgid "Yaul"
msgstr ""
#. name for ylb
#, fuzzy
msgid "Yaleba"
msgstr "Baba"
msgstr ""
#. name for yle
msgid "Yele"
@ -29898,9 +29851,8 @@ msgid "Yombe"
msgstr ""
#. name for yon
#, fuzzy
msgid "Yongkom"
msgstr "Bobongko"
msgstr ""
#. name for yor
msgid "Yoruba"

3 file diffs suppressed because they are too large.


@ -6,20 +6,18 @@
msgid ""
msgstr ""
"Project-Id-Version: iso_639_3\n"
"Report-Msgid-Bugs-To: Debian iso-codes team <pkg-isocodes-devel@lists.alioth."
"debian.org>\n"
"POT-Creation-Date: 2011-05-27 14:59+0200\n"
"PO-Revision-Date: 2009-03-01 03:27-0600\n"
"Report-Msgid-Bugs-To: Debian iso-codes team <pkg-isocodes-"
"devel@lists.alioth.debian.org>\n"
"POT-Creation-Date: 2011-09-02 16:21+0000\n"
"PO-Revision-Date: 2011-08-27 03:17+0000\n"
"Last-Translator: Reşat SABIQ <tilde.birlik@gmail.com>\n"
"Language-Team: Crimean Tatar <tilde-birlik-tercime@lists.sourceforge.net>\n"
"Language: crh\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Generator: KBabel 1.11.4\n"
"X-Launchpad-Export-Date: 2009-03-01 08:44+0000\n"
"X-Generator: Launchpad (build Unknown)\n"
"Plural-Forms: nplurals=2; plural=n != 1;\n"
"X-Launchpad-Export-Date: 2011-09-03 04:58+0000\n"
"X-Generator: Launchpad (build 13830)\n"
"Language: crh\n"
#. name for aaa
msgid "Ghotuo"
@ -126,9 +124,8 @@ msgid "Ayta, Ambala"
msgstr "Ayta, Ambala"
#. name for abd
#, fuzzy
msgid "Manide"
msgstr "Mander"
msgstr ""
#. name for abe
msgid "Abnaki, Western"
@ -175,9 +172,8 @@ msgid "Abon"
msgstr "Abon"
#. name for abp
#, fuzzy
msgid "Ayta, Abellen"
msgstr "Ayta, Abenlen"
msgstr ""
#. name for abq
msgid "Abaza"
@ -600,9 +596,8 @@ msgid "Aguacateco"
msgstr "Aguacateco"
#. name for agv
#, fuzzy
msgid "Dumagat, Remontado"
msgstr "Agta, Remontado"
msgstr ""
#. name for agw
msgid "Kahua"
@ -1289,9 +1284,8 @@ msgid "Bukiyip"
msgstr "Bukiyip"
#. name for apf
#, fuzzy
msgid "Agta, Pahanan"
msgstr "Ayta, Bataan"
msgstr ""
#. name for apg
msgid "Ampanang"
@ -1326,9 +1320,8 @@ msgid "Apinayé"
msgstr "Apinayé"
#. name for apo
#, fuzzy
msgid "Ambul"
msgstr "Ambulas"
msgstr ""
#. name for app
msgid "Apma"
@ -1379,9 +1372,8 @@ msgid "Archi"
msgstr "Archi"
#. name for aqd
#, fuzzy
msgid "Dogon, Ampari"
msgstr "Dogon, Jamsay"
msgstr ""
#. name for aqg
msgid "Arigidi"
@ -1404,9 +1396,8 @@ msgid "Arhâ"
msgstr "Arhâ"
#. name for aqz
#, fuzzy
msgid "Akuntsu"
msgstr "Awutu"
msgstr ""
#. name for ara
msgid "Arabic"
@ -2005,9 +1996,8 @@ msgid "Ayta, Sorsogon"
msgstr "Ayta, Sorsogon"
#. name for ayt
#, fuzzy
msgid "Ayta, Magbukun"
msgstr "Ayta, Mag-Indi"
msgstr ""
#. name for ayu
msgid "Ayu"
@ -2370,9 +2360,8 @@ msgid "Bade"
msgstr "Bade"
#. name for bdf
#, fuzzy
msgid "Biage"
msgstr "Bitare"
msgstr ""
#. name for bdg
msgid "Bonggi"
@ -2643,9 +2632,8 @@ msgid "Bondo"
msgstr "Bondo"
#. name for bfx
#, fuzzy
msgid "Bantayanon"
msgstr "Bantoanon"
msgstr ""
#. name for bfy
msgid "Bagheli"
@ -3112,9 +3100,8 @@ msgid "Bakumpai"
msgstr "Bakumpai"
#. name for bks
#, fuzzy
msgid "Sorsoganon, Northern"
msgstr "Sorsogon, Masbate"
msgstr ""
#. name for bkt
msgid "Boloki"
@ -3129,9 +3116,8 @@ msgid "Bekwarra"
msgstr "Bekwarra"
#. name for bkw
#, fuzzy
msgid "Bekwel"
msgstr "Bekwil"
msgstr ""
#. name for bkx
msgid "Baikeno"
@ -3358,9 +3344,8 @@ msgid "Bookan"
msgstr "Bookan"
#. name for bnc
#, fuzzy
msgid "Bontok"
msgstr "Bondo"
msgstr ""
#. name for bnd
msgid "Banda (Indonesia)"
@ -4223,9 +4208,8 @@ msgid "Dibole"
msgstr "Dibole"
#. name for bvy
#, fuzzy
msgid "Baybayanon"
msgstr "Babango"
msgstr ""
#. name for bvz
msgid "Bauzi"
@ -4456,9 +4440,8 @@ msgid "Baygo"
msgstr "Baygo"
#. name for byh
#, fuzzy
msgid "Bhujel"
msgstr "Bhele"
msgstr ""
#. name for byi
msgid "Buyu"
@ -4621,9 +4604,8 @@ msgid "Basa (Nigeria)"
msgstr ""
#. name for bzx
#, fuzzy
msgid "Bozo, Kɛlɛngaxo"
msgstr "Bozo, Hainyaxo"
msgstr ""
#. name for bzy
msgid "Obanliku"
@ -4798,9 +4780,8 @@ msgid "Cacua"
msgstr "Cacua"
#. name for cbw
#, fuzzy
msgid "Kinabalian"
msgstr "Kabardian"
msgstr ""
#. name for cby
msgid "Carabayo"
@ -6491,9 +6472,8 @@ msgid "Duma"
msgstr "Duma"
#. name for dmb
#, fuzzy
msgid "Dogon, Mombo"
msgstr "Dogon, Kolum So"
msgstr ""
#. name for dmc
msgid "Dimir"
@ -6688,9 +6668,8 @@ msgid "Dair"
msgstr "Dair"
#. name for drc
#, fuzzy
msgid "Minderico"
msgstr "Mindiri"
msgstr ""
#. name for drd
msgid "Darmiya"
@ -7013,9 +6992,8 @@ msgid "Ebughu"
msgstr "Ebughu"
#. name for ebk
#, fuzzy
msgid "Bontok, Eastern"
msgstr "Abnaki, Şarqiy"
msgstr ""
#. name for ebo
msgid "Teke-Ebo"
@ -7230,9 +7208,8 @@ msgid "Emplawas"
msgstr "Emplawas"
#. name for emx
#, fuzzy
msgid "Erromintxela"
msgstr "Edomite"
msgstr ""
#. name for emy
msgid "Mayan, Epigraphic"
@ -7567,9 +7544,8 @@ msgid "Fars, Northwestern"
msgstr ""
#. name for fbl
#, fuzzy
msgid "Bikol, West Albay"
msgstr "Bicolano, Albay"
msgstr ""
#. name for fcs
msgid "Quebec Sign Language"
@ -7732,9 +7708,8 @@ msgid "French, Old (842-ca. 1400)"
msgstr ""
#. name for frp
#, fuzzy
msgid "Arpitan"
msgstr "Arta"
msgstr ""
#. name for frq
msgid "Forak"
@ -10401,9 +10376,8 @@ msgid "Nkem-Nkum"
msgstr "Nkem-Nkum"
#. name for isk
#, fuzzy
msgid "Ishkashimi"
msgstr "Sanglechi-Ishkashimi"
msgstr ""
#. name for isl
msgid "Icelandic"
@ -10830,9 +10804,8 @@ msgid "Ngile"
msgstr "Ngile"
#. name for jls
#, fuzzy
msgid "Jamaican Sign Language"
msgstr "Zambiyalı İşaret Tili"
msgstr ""
#. name for jma
msgid "Dima"
@ -11215,9 +11188,8 @@ msgid "Keliko"
msgstr "Keliko"
#. name for kbp
#, fuzzy
msgid "Kabiyè"
msgstr "Kabiyé"
msgstr ""
#. name for kbq
msgid "Kamano"
@ -12192,9 +12164,8 @@ msgid "Kendeje"
msgstr "Kendeje"
#. name for klg
#, fuzzy
msgid "Tagakaulo"
msgstr "Tagalog"
msgstr ""
#. name for klh
msgid "Weliki"
@ -12217,9 +12188,8 @@ msgid "Kalagan, Kagan"
msgstr "Kalagan, Kagan"
#. name for klm
#, fuzzy
msgid "Migum"
msgstr "Mogum"
msgstr ""
#. name for kln
msgid "Kalenjin"
@ -12286,9 +12256,8 @@ msgid "Dong, Southern"
msgstr ""
#. name for kmd
#, fuzzy
msgid "Kalinga, Majukayang"
msgstr "Kalinga, Madukayang"
msgstr ""
#. name for kme
msgid "Bakole"
@ -13551,9 +13520,8 @@ msgid "Kpatili"
msgstr "Kpatili"
#. name for kyn
#, fuzzy
msgid "Binukidnon, Northern"
msgstr "Özbekçe, Şimaliy"
msgstr ""
#. name for kyo
msgid "Kelon"
@ -13708,9 +13676,8 @@ msgid "Kalabra"
msgstr "Kalabra"
#. name for laa
#, fuzzy
msgid "Subanen, Southern"
msgstr "Özbekçe, Cenübiy"
msgstr ""
#. name for lab
msgid "Linear A"
@ -13849,9 +13816,8 @@ msgid "Bontok, Central"
msgstr ""
#. name for lbl
#, fuzzy
msgid "Bikol, Libon"
msgstr "Biao Mon"
msgstr ""
#. name for lbm
msgid "Lodhi"
@ -14578,9 +14544,8 @@ msgid "Lanoh"
msgstr "Lanoh"
#. name for lni
#, fuzzy
msgid "Daantanai'"
msgstr "Lantanai"
msgstr ""
#. name for lnj
msgid "Leningitij"
@ -14843,18 +14808,16 @@ msgid "Trinidad and Tobago Sign Language"
msgstr "Trinidad Tobago İşaret Tili"
#. name for lsy
#, fuzzy
msgid "Mauritian Sign Language"
msgstr "Maritime İşaret Tili"
msgstr ""
#. name for ltc
msgid "Chinese, Late Middle"
msgstr ""
#. name for ltg
#, fuzzy
msgid "Latgalian"
msgstr "Katalanca"
msgstr ""
#. name for lti
msgid "Leti (Indonesia)"
@ -14985,9 +14948,8 @@ msgid "Lavukaleve"
msgstr "Lavukaleve"
#. name for lvs
#, fuzzy
msgid "Latvian, Standard"
msgstr "Letonyalı İşaret Tili"
msgstr ""
#. name for lvu
msgid "Levuka"
@ -15378,9 +15340,8 @@ msgid "Massalat"
msgstr "Massalat"
#. name for mdh
#, fuzzy
msgid "Maguindanaon"
msgstr "Maguindanao"
msgstr ""
#. name for mdi
msgid "Mamvu"
@ -16139,9 +16100,8 @@ msgid "Kituba (Congo)"
msgstr "Kituba (Kongo)"
#. name for mkx
#, fuzzy
msgid "Manobo, Kinamiging"
msgstr "Manobo, Cinamiguin"
msgstr ""
#. name for mky
msgid "Makian, East"
@ -16488,9 +16448,8 @@ msgid "Morori"
msgstr "Morori"
#. name for mom
#, fuzzy
msgid "Mangue"
msgstr "Mang"
msgstr ""
#. name for mon
msgid "Mongolian"
@ -16769,9 +16728,8 @@ msgid "Elseng"
msgstr "Elseng"
#. name for mrg
#, fuzzy
msgid "Mising"
msgstr "Maisin"
msgstr ""
#. name for mrh
msgid "Chin, Mara"
@ -16842,9 +16800,8 @@ msgid "Maremgi"
msgstr "Maremgi"
#. name for mry
#, fuzzy
msgid "Mandaya"
msgstr "Mandara"
msgstr ""
#. name for mrz
msgid "Marind"
@ -17039,9 +16996,8 @@ msgid "Asaro'o"
msgstr "Asaro'o"
#. name for mtw
#, fuzzy
msgid "Binukidnon, Southern"
msgstr "Sesotho"
msgstr ""
#. name for mtx
msgid "Mixtec, Tidaá"
@ -18340,9 +18296,8 @@ msgid "Nias"
msgstr "Nias"
#. name for nib
#, fuzzy
msgid "Nakame"
msgstr "Nakama"
msgstr ""
#. name for nid
msgid "Ngandi"
@ -18905,9 +18860,8 @@ msgid "Noiri"
msgstr "Noiri"
#. name for noj
#, fuzzy
msgid "Nonuya"
msgstr "Nkonya"
msgstr ""
#. name for nok
msgid "Nooksack"
@ -18998,9 +18952,8 @@ msgid "Napu"
msgstr "Napu"
#. name for nqg
#, fuzzy
msgid "Nago, Southern"
msgstr "Sesotho"
msgstr ""
#. name for nqk
msgid "Ede Nago, Kura"
@ -19067,9 +19020,8 @@ msgid "Kalapuya, Northern"
msgstr ""
#. name for nru
#, fuzzy
msgid "Narua"
msgstr "Sarua"
msgstr ""
#. name for nrx
msgid "Ngurmbur"
@ -19272,9 +19224,8 @@ msgid "Niuafo'ou"
msgstr "Niuafo'ou"
#. name for nun
#, fuzzy
msgid "Anong"
msgstr "Konongo"
msgstr ""
#. name for nuo
msgid "Nguôn"
@ -19549,9 +19500,8 @@ msgid "Nzakambay"
msgstr "Nzakambay"
#. name for nzz
#, fuzzy
msgid "Dogon, Nanga Dama"
msgstr "Dogon, Yanda Dom"
msgstr ""
#. name for oaa
msgid "Orok"
@ -19574,9 +19524,8 @@ msgid "Obispeño"
msgstr "Obispeño"
#. name for obk
#, fuzzy
msgid "Bontok, Southern"
msgstr "Sesotho"
msgstr ""
#. name for obl
msgid "Oblo"
@ -20539,9 +20488,8 @@ msgid "Pomo, Southern"
msgstr ""
#. name for pes
#, fuzzy
msgid "Persian, Iranian"
msgstr "Paranan"
msgstr ""
#. name for pev
msgid "Pémono"
@ -20588,9 +20536,8 @@ msgid "Rerep"
msgstr "Rerep"
#. name for pgl
#, fuzzy
msgid "Irish, Primitive"
msgstr "Türkçe, Qırım; Qırımtatarca"
msgstr ""
#. name for pgn
msgid "Paelignian"
@ -21293,9 +21240,8 @@ msgid "Puri"
msgstr "Puri"
#. name for prs
#, fuzzy
msgid "Persian, Afghan"
msgstr "Farsça"
msgstr ""
#. name for prt
msgid "Phai"
@ -21582,9 +21528,8 @@ msgid "Poyanáwa"
msgstr "Poyanáwa"
#. name for pys
#, fuzzy
msgid "Paraguayan Sign Language"
msgstr "Uruguaylı İşaret Tili"
msgstr ""
#. name for pyu
msgid "Puyuma"
@ -21939,14 +21884,12 @@ msgid "Palaung, Rumai"
msgstr "Palaung, Rumai"
#. name for rbk
#, fuzzy
msgid "Bontok, Northern"
msgstr "Sesotho"
msgstr ""
#. name for rbl
#, fuzzy
msgid "Bikol, Miraya"
msgstr "Samo, Maya"
msgstr ""
#. name for rcf
msgid "Creole French, Réunion"
@ -22909,9 +22852,8 @@ msgid "Irish, Old (to 900)"
msgstr ""
#. name for sgb
#, fuzzy
msgid "Ayta, Mag-antsi"
msgstr "Ayta, Mag-Indi"
msgstr ""
#. name for sgc
msgid "Kipsigis"
@ -22958,9 +22900,8 @@ msgid "Sangisari"
msgstr "Sangisari"
#. name for sgs
#, fuzzy
msgid "Samogitian"
msgstr "Samoan"
msgstr ""
#. name for sgt
msgid "Brokpake"
@ -22979,9 +22920,8 @@ msgid "Sierra Leone Sign Language"
msgstr "Sierra Leone İşaret Tili"
#. name for sgy
#, fuzzy
msgid "Sanglechi"
msgstr "Shangzhai"
msgstr ""
#. name for sgz
msgid "Sursurunga"
@ -23000,9 +22940,8 @@ msgid "Sonde"
msgstr "Sonde"
#. name for shd
#, fuzzy
msgid "Kundal Shahi"
msgstr "Kudmali"
msgstr ""
#. name for she
msgid "Sheko"
@ -23933,9 +23872,8 @@ msgid "Suruí"
msgstr "Suruí"
#. name for srv
#, fuzzy
msgid "Sorsoganon, Southern"
msgstr "Sesotho"
msgstr ""
#. name for srw
msgid "Serua"
@ -23966,9 +23904,8 @@ msgid "Siroi"
msgstr "Siroi"
#. name for sse
#, fuzzy
msgid "Sama, Bangingih"
msgstr "Sama, Pangutaran"
msgstr ""
#. name for ssf
msgid "Thao"
@ -24799,9 +24736,8 @@ msgid "Tai Nüa"
msgstr "Tai Nüa"
#. name for tde
#, fuzzy
msgid "Dogon, Tiranige Diga"
msgstr "Dogon, Tene Kan"
msgstr ""
#. name for tdf
msgid "Talieng"
@ -25032,9 +24968,8 @@ msgid "Tagalog"
msgstr "Tagalog"
#. name for tgn
#, fuzzy
msgid "Tandaganon"
msgstr "Tondano"
msgstr ""
#. name for tgo
msgid "Sudest"
@ -25317,9 +25252,8 @@ msgid "Tukumanféd"
msgstr "Tukumanféd"
#. name for tkg
#, fuzzy
msgid "Malagasy, Tesaka"
msgstr "Malagasy, Bara"
msgstr ""
#. name for tkl
msgid "Tokelau"
@ -26774,9 +26708,8 @@ msgid "Uma' Lung"
msgstr "Uma' Lung"
#. name for ulw
#, fuzzy
msgid "Ulwa"
msgstr "Ukwa"
msgstr ""
#. name for uma
msgid "Umatilla"
@ -27139,9 +27072,8 @@ msgid "Babar, Southeast"
msgstr ""
#. name for vbk
#, fuzzy
msgid "Bontok, Southwestern"
msgstr "Sesotho"
msgstr ""
#. name for vec
msgid "Venetian"
@ -28376,9 +28308,8 @@ msgid "Breton, Middle"
msgstr ""
#. name for xbn
#, fuzzy
msgid "Kenaboi"
msgstr "Kenati"
msgstr ""
#. name for xbo
msgid "Bolgarian"
@ -28605,9 +28536,8 @@ msgid "Kalkoti"
msgstr "Kalkoti"
#. name for xkb
#, fuzzy
msgid "Nago, Northern"
msgstr "Naga, Nocte"
msgstr ""
#. name for xkc
msgid "Kho'ini"
@ -28766,9 +28696,8 @@ msgid "Makhuwa-Marrevone"
msgstr "Makhuwa-Marrevone"
#. name for xmd
#, fuzzy
msgid "Mbudum"
msgstr "Buduma"
msgstr ""
#. name for xme
msgid "Median"
@ -28835,9 +28764,8 @@ msgid "Kamu"
msgstr "Kamu"
#. name for xmv
#, fuzzy
msgid "Malagasy, Tankarana"
msgstr "Malagasy, Antankarana"
msgstr ""
#. name for xmw
msgid "Malagasy, Tsimihety"
@ -28888,9 +28816,8 @@ msgid "Kanashi"
msgstr "Kanashi"
#. name for xnt
#, fuzzy
msgid "Narragansett"
msgstr "Karagas"
msgstr ""
#. name for xoc
msgid "O'chi'chi'"
@ -29361,9 +29288,8 @@ msgid "Banda-Yangere"
msgstr "Banda-Yangere"
#. name for yak
#, fuzzy
msgid "Yakama"
msgstr "Nakama"
msgstr ""
#. name for yal
msgid "Yalunka"
@ -29374,9 +29300,8 @@ msgid "Yamba"
msgstr "Yamba"
#. name for yan
#, fuzzy
msgid "Mayangna"
msgstr "Mayaguduna"
msgstr ""
#. name for yao
msgid "Yao"
@ -29731,9 +29656,8 @@ msgid "Yaul"
msgstr "Yaul"
#. name for ylb
#, fuzzy
msgid "Yaleba"
msgstr "Yareba"
msgstr ""
#. name for yle
msgid "Yele"
@ -29924,9 +29848,8 @@ msgid "Yombe"
msgstr "Yombe"
#. name for yon
#, fuzzy
msgid "Yongkom"
msgstr "Yonggom"
msgstr ""
#. name for yor
msgid "Yoruba"
@ -30777,9 +30700,8 @@ msgid "Mirgan"
msgstr "Mirgan"
#. name for zrn
#, fuzzy
msgid "Zerenkel"
msgstr "Zirenkel"
msgstr ""
#. name for zro
msgid "Záparo"

11 file diffs suppressed because they are too large.

Some files were not shown because too many files have changed in this diff.