News downloads: When getting an article URL from an RSS feed, look first for an original article link. This speeds up downloads from news services that use a syndication service like feedburner or pheedo to publish their RSS feeds, because the article can be fetched directly instead of through the syndication service's redirect link.

Kovid Goyal 2010-01-21 11:28:45 -07:00
parent 69c10e202c
commit 3e69d4c2aa
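The lookup described in the commit message can be sketched in plain Python. This is a minimal sketch, assuming a feedparser-style entry modeled as a plain dict, in which a namespaced element such as `feedburner:origLink` appears under a key ending in `_origlink` (for example `feedburner_origlink`). The function name `pick_article_url` and the example URLs are hypothetical.

```python
from typing import Mapping, Optional


def pick_article_url(article: Mapping[str, str]) -> Optional[str]:
    """Return the original article link if the entry has one, else 'link'.

    Assumption: `article` behaves like a feedparser entry, where an element
    such as feedburner:origLink is exposed under a key ending in '_origlink'.
    """
    for key in article.keys():
        if key.endswith('_origlink'):
            url = article[key]
            # Mirror the commit: only accept absolute http:// URLs.
            if url and url.startswith('http://'):
                return url
    # Fall back to the feed's own (possibly redirecting) link.
    return article.get('link', None)


entry = {
    'link': 'http://feeds.example.com/~r/news/~3/abc123/',
    'feedburner_origlink': 'http://news.example.com/2010/01/story.html',
}
print(pick_article_url(entry))  # http://news.example.com/2010/01/story.html
print(pick_article_url({'link': 'http://news.example.com/other.html'}))
```

Because the loop matches any key ending in `_origlink` rather than one hard-coded name, the same code would also pick up other syndication services that follow that naming pattern.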

@@ -357,9 +357,17 @@ class BasicNewsRecipe(Recipe):
         Override in a subclass to customize extraction of the :term:`URL` that points
         to the content for each article. Return the
         article URL. It is called with `article`, an object representing a parsed article
-        from a feed. See `feedsparser <http://www.feedparser.org/docs/>`_.
-        By default it returns `article.link <http://www.feedparser.org/docs/reference-entry-link.html>`_.
+        from a feed. See `feedparser <http://www.feedparser.org/docs/>`_.
+        By default it looks for the original link (for feeds syndicated via a
+        service like feedburner or pheedo) and if found,
+        returns that or else returns
+        `article.link <http://www.feedparser.org/docs/reference-entry-link.html>`_.
         '''
+        for key in article.keys():
+            if key.endswith('_origlink'):
+                url = article[key]
+                if url and url.startswith('http://'):
+                    return url
         return article.get('link', None)
 
     def preprocess_html(self, soup):
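The docstring says `get_article_url` is meant to be overridden in a subclass. The sketch below shows that pattern, assuming a hypothetical `StubRecipe` base class that stands in for calibre's `BasicNewsRecipe` and reproduces only the default behaviour from this commit, so the snippet runs without calibre installed. The subclass `ExampleRecipe` and its rule (prefer a `guid` value that is already a full URL) are illustrative assumptions, not part of the commit.

```python
from typing import Mapping, Optional


class StubRecipe:
    """Hypothetical stand-in for calibre's BasicNewsRecipe.

    Only the default get_article_url logic from the commit is reproduced.
    """

    def get_article_url(self, article: Mapping[str, str]) -> Optional[str]:
        for key in article.keys():
            if key.endswith('_origlink'):
                url = article[key]
                if url and url.startswith('http://'):
                    return url
        return article.get('link', None)


class ExampleRecipe(StubRecipe):
    """Hypothetical recipe for a feed whose guid holds the real article URL."""

    def get_article_url(self, article: Mapping[str, str]) -> Optional[str]:
        guid = article.get('guid')
        if guid and guid.startswith(('http://', 'https://')):
            return guid
        # Otherwise keep the default origlink-then-link behaviour.
        return super().get_article_url(article)


recipe = ExampleRecipe()
print(recipe.get_article_url({'guid': 'https://news.example.com/a.html',
                              'link': 'http://feeds.example.com/~r/a'}))
print(recipe.get_article_url({'guid': 'tag:example.com,2010:42',
                              'feedburner_origlink': 'http://news.example.com/b.html'}))
```

Delegating to `super()` keeps the original-link lookup as a fallback, so the subclass only changes the one case it cares about.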