mirror of
https://github.com/kovidgoyal/calibre.git
synced 2025-07-09 03:04:10 -04:00
This commit is contained in:
parent
7595b2a6f3
commit
bd1daa67a7
@ -3,7 +3,7 @@
|
|||||||
|
|
||||||
__license__ = 'GPL v3'
|
__license__ = 'GPL v3'
|
||||||
__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'
|
__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'
|
||||||
import shutil, sys, os, inspect, re
|
import sys, os, inspect, re
|
||||||
from sphinx.builder import StandaloneHTMLBuilder, bold
|
from sphinx.builder import StandaloneHTMLBuilder, bold
|
||||||
from sphinx.util import rpartition
|
from sphinx.util import rpartition
|
||||||
from sphinx.ext.autodoc import get_module_charset, prepare_docstring
|
from sphinx.ext.autodoc import get_module_charset, prepare_docstring
|
||||||
|
@ -89,6 +89,11 @@ Content From The Web
|
|||||||
:depth: 1
|
:depth: 1
|
||||||
:local:
|
:local:
|
||||||
|
|
||||||
|
I obtained a recipe for a news site as a .py file from somewhere, how do I use it?
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
Start the :guilabel:`Add custom news sources` dialog (from the :guilabel:`Fetch news` menu) and click the :guilabel:`Switch to advanced mode` button. Delete everything in the box with the recipe source code and copy paste the contents of your .py file into the box. Click :guilabel:`Add/update recipe`.
|
||||||
|
|
||||||
|
|
||||||
I want |app| to download news from my favorite news website.
|
I want |app| to download news from my favorite news website.
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
If you are reasonably proficient with computers, you can teach |app| to download news from any website of your choosing. To learn how to do this see :ref:`news`.
|
If you are reasonably proficient with computers, you can teach |app| to download news from any website of your choosing. To learn how to do this see :ref:`news`.
|
||||||
|
@ -152,25 +152,50 @@ We see several new features in this :term:`recipe`. First, we have::
|
|||||||
This sets the displayed time on the front page of the created e-book to be in the format,
|
This sets the displayed time on the front page of the created e-book to be in the format,
|
||||||
``Day, Day_Number Month, Year``. See :attr:`timefmt <libprs500.web.feeds.news.BasicNewsRecipe.timefmt>`.
|
``Day, Day_Number Month, Year``. See :attr:`timefmt <libprs500.web.feeds.news.BasicNewsRecipe.timefmt>`.
|
||||||
|
|
||||||
|
Then we see a group of directives to cleanup the downloaded :term:`HTML`::
|
||||||
|
|
||||||
|
remove_tags_before = dict(name='h1')
|
||||||
|
remove_tags_after = dict(id='footer')
|
||||||
|
remove_tags = ...
|
||||||
|
|
||||||
|
These remove everything before the first ``<h1>`` tag and everything after the first tag whose id is ``footer``. See :attr:`remove_tags <libprs500.web.feeds.news.BasicNewsRecipe.remove_tags>`, :attr:`remove_tags_before <libprs500.web.feeds.news.BasicNewsRecipe.remove_tags_before>`, :attr:`remove_tags_after <libprs500.web.feeds.news.BasicNewsRecipe.remove_tags_after>`.
|
||||||
|
|
||||||
|
The next interesting feature is::
|
||||||
|
|
||||||
|
needs_subscription = True
|
||||||
|
...
|
||||||
|
def get_growser(self):
|
||||||
|
...
|
||||||
|
|
||||||
|
``needs_subscription = True`` tells |app| that this recipe needs a username and password in order to access the content. This causes, |app| to ask for a username and password whenever you try to use this recipe. The code in :meth:`libprs500.web.feeds.news.BasicNewsRecipe.get_browser` actually does the login into the NYT website. Once logged in, |app| will use the same, logged in, browser instance to fetch all content. See `mechanize <http://wwwsearch.sourceforge.net/mechanize/>`_ to understand the code in ``get_browser``.
|
||||||
|
|
||||||
|
The last new feature is the :meth:`libprs500.web.feeds.news.BasicNewsRecipe.parse_index` method. Its job is to go to http://www.nytimes.com/pages/todayspaper/index.html and fetch the list of articles that appear in *todays* paper. While more complex than simply using :term:`RSS`, the recipe creates an e-book that corresponds very closely to the days paper. ``parse_index`` makes heavy use of `BeautifulSoup <http://www.crummy.com/software/BeautifulSoup/documentation.html>`_ to parse the daily paper webpage.
|
||||||
|
|
||||||
Tips for developing new recipes
|
Tips for developing new recipes
|
||||||
---------------------------------
|
---------------------------------
|
||||||
|
|
||||||
The best way to develop new recipes is to use the command line interface. Create the recipe using your favorite python editor and save it to a file say :file:`myrecipe.py`. You can download content using this recipe with the command:
|
The best way to develop new recipes is to use the command line interface. Create the recipe using your favorite python editor and save it to a file say :file:`myrecipe.py`. You can download content using this recipe with the command::
|
||||||
|
|
||||||
:command:`feeds2disk` :option:`--debug` :option:`--test` myrecipe.py
|
feeds2disk --debug --test myrecipe.py
|
||||||
|
|
||||||
The :command:`feeds2disk` will download all the webpages and save them to the current directory. The :option:`--debug` makes feeds2disk spit out a lot of information about what it is doing. The :option:`--test` makes it download only a couple of articles from at most two feeds.
|
The :command:`feeds2disk` will download all the webpages and save them to the current directory. The :option:`--debug` makes feeds2disk spit out a lot of information about what it is doing. The :option:`--test` makes it download only a couple of articles from at most two feeds.
|
||||||
|
|
||||||
Once the download is complete, you can look at the downloaded :term:`HTML` by opening the file :file:`index.html` in a browser. Once you're satisfied that the download and preprocessing is happening correctly, you can generate an LRF ebook with the command
|
Once the download is complete, you can look at the downloaded :term:`HTML` by opening the file :file:`index.html` in a browser. Once you're satisfied that the download and preprocessing is happening correctly, you can generate an LRF ebook with the command::
|
||||||
|
|
||||||
:command:`html2lrf` :option:`--use-spine` :option:`--page-break-before` "$" index.html
|
html2lrf --use-spine --page-break-before "$" index.html
|
||||||
|
|
||||||
If the generated :term:`LRF` looks good, you can finally, run
|
If the generated :term:`LRF` looks good, you can finally, run::
|
||||||
|
|
||||||
:command:`feeds2lrf` myrecipe.py
|
feeds2lrf myrecipe.py
|
||||||
|
|
||||||
to see the final :term:`LRF` format e-book generated from your recipe. If you're satisfied with your recipe, consider attaching it to `the wiki <http://libprs500.kovidgoyal.net/wiki/UserRecipes>`_, so that others can use it as well. If you feel there is enough demand to justify its inclusion into the set of built-in recipes, add a comment to the ticket http://libprs500.kovidgoyal.net/ticket/405
|
to see the final :term:`LRF` format e-book generated from your recipe. If you're satisfied with your recipe, consider attaching it to `the wiki <http://libprs500.kovidgoyal.net/wiki/UserRecipes>`_, so that others can use it as well. If you feel there is enough demand to justify its inclusion into the set of built-in recipes, add a comment to the ticket http://libprs500.kovidgoyal.net/ticket/405
|
||||||
|
|
||||||
|
|
||||||
|
If you just want to quickly test a couple of feeds, you can use the :option:`--feeds` option::
|
||||||
|
|
||||||
|
feeds2disk --feeds "['http://feeds.newsweek.com/newsweek/TopNews', 'http://feeds.newsweek.com/headlines/politics']"
|
||||||
|
|
||||||
|
|
||||||
.. seealso::
|
.. seealso::
|
||||||
|
|
||||||
:ref:`feeds2disk`
|
:ref:`feeds2disk`
|
||||||
|
Loading…
x
Reference in New Issue
Block a user