mirror of
https://github.com/kovidgoyal/calibre.git
synced 2025-07-09 03:04:10 -04:00
Start documenting the new function mode
This commit is contained in:
parent
c983f4f999
commit
2349f4507a
@ -247,6 +247,13 @@ that you can apply. You can even select multiple entries in the list by holding
down the Ctrl Key while clicking so as to run multiple search and replace
expressions in a single operation.

Function mode
^^^^^^^^^^^^^^^^^^^^^

Function mode allows you to write arbitrarily powerful Python functions that
are run on every Find/replace. You can do pretty much any text manipulation you
like in function mode. For more information, see :doc:`function_mode`.

Automated tools
-------------------

@ -693,3 +700,7 @@ particularly useful to directly create EPUB files from your own hand-edited
HTML files. You can do this via :guilabel:`File->Import an HTML or DOCX file as
a new book`.

.. toctree::
    :hidden:

    function_mode
110
manual/function_mode.rst
Normal file
@ -0,0 +1,110 @@
Function Mode for Search & Replace in the Editor
=======================================================================

The Search & Replace tool in the editor supports a *function mode*. In this
mode, you can combine regular expressions (see :doc:`regexp`) with
arbitrarily powerful Python functions to do all sorts of advanced text
processing.

In the standard *regexp* mode for search and replace, you specify both a
regular expression to search for and a template that is used to replace
all found matches. In function mode, instead of using a fixed template, you
specify an arbitrary function, in the
`Python programming language <https://docs.python.org/2.7/>`_. This allows
you to do lots of things that are not possible with simple templates.
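
As a rough illustration of the difference, the sketch below uses Python's standard ``re`` module outside the editor. The same find expression is applied once with a fixed template and once with a function. The sample text and the ``shout`` helper are made up for this example; inside the editor, the match is supplied to your function for you.

```python
import re

text = '<b>one</b> and <b>two</b>'
find = r'<b>([^<>]+)</b>'

# Regexp mode: a fixed template, identical for every match
with_template = re.sub(find, r'<i>\1</i>', text)

# Function mode: arbitrary code computes each replacement
def shout(match):
    return '<i>' + match.group(1).upper() + '</i>'

with_function = re.sub(find, shout, text)

print(with_template)   # <i>one</i> and <i>two</i>
print(with_function)   # <i>ONE</i> and <i>TWO</i>
```

Upper-casing the captured text is something a template alone cannot express here, which is the kind of gap function mode is meant to fill.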

Techniques for using function mode and the syntax will be described by means of
examples, showing you how to create functions to perform progressively more
complex tasks.


.. image:: images/function_replace.png
    :alt: The Function mode
    :align: center

Automatically fixing the case of headings in the document
-------------------------------------------------------------

Here, we will leverage one of the builtin functions in the editor to
automatically change the case of all text inside heading tags to title case::

    Find expression: <[Hh][1-6][^>]*>([^<>]+)</[hH][1-6]>

For the function, simply choose the :guilabel:`Title-case text` builtin
function. This will change titles that look like: ``<h1>some TITLE</h1>`` to
``<h1>Some Title</h1>``.
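
To see roughly what happens to each match, here is a sketch that can be run outside the editor with the standard ``re`` module. It is not the implementation of the builtin: ``title_case_heading`` is a hypothetical stand-in that uses ``str.title()``, which is cruder than proper title-casing (for example, it capitalizes the letter after an apostrophe).

```python
import re

HEADING = re.compile(r'<[Hh][1-6][^>]*>([^<>]+)</[hH][1-6]>')

def title_case_heading(match):
    # Hypothetical helper: change only the text captured by group 1,
    # leaving the opening and closing tags exactly as they were.
    whole = match.group()
    start = match.start(1) - match.start()
    end = match.end(1) - match.start()
    return whole[:start] + whole[start:end].title() + whole[end:]

fixed = HEADING.sub(title_case_heading, '<h1>some TITLE</h1>')
print(fixed)   # <h1>Some Title</h1>
```

Splicing by the offsets of group 1 keeps any attributes in the opening tag untouched.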


Your first custom function - smartening hyphens
------------------------------------------------------------------

The real power of function mode comes from being able to create your own
functions to process text in arbitrary ways. The Smarten Punctuation tool in
the editor leaves individual hyphens alone, so you can use this function to
replace them with em-dashes.

To create a new function, simply click the Create/Edit button and copy the
Python code from below.

.. code-block:: python

    def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
        return match.group().replace('--', '—').replace('-', '—')

Every Search & Replace custom function must have a unique name and consist of a
Python function named ``replace`` that accepts all the arguments shown above.
For the moment, we won't worry about all the different arguments to the
``replace()`` function. Just focus on the ``match`` argument. It represents a
match when running a search and replace. Its full documentation is available
`here <https://docs.python.org/2.7/library/re.html#match-objects>`_.
``match.group()`` simply returns all the matched text and all we do is replace
hyphens in that text with em-dashes, first replacing double hyphens and
then single hyphens.
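
If match objects are new to you, the following sketch shows a few of their standard methods using Python's ``re`` module on its own. The sample text is invented, and the parentheses in the expression are added here only to demonstrate ``group(1)``.

```python
import re

match = re.search(r'>([^<>]+)<', '<p>semi-detached</p>')

whole = match.group()    # the whole match, including > and <
inner = match.group(1)   # only the first parenthesized group
offsets = match.span()   # (start, end) offsets of the whole match

print(whole)
print(inner)
print(offsets)
```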

Use this function with the find regular expression::

    >[^<>]+<

And it will replace all hyphens with em-dashes, but only in actual text and not
inside HTML tag definitions.
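
The effect can be checked outside the editor with a small driver, sketched below under some assumptions: ``run`` is a hypothetical helper that stands in for the editor, and every argument other than ``match`` is a placeholder value, since the function ignores them.

```python
import re

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    return match.group().replace('--', '—').replace('-', '—')

def run(html):
    # Hypothetical driver standing in for the editor: only ``match`` is
    # meaningful here, the remaining arguments are placeholders.
    def call(match):
        return replace(match, 1, 'sample.html', None, None, {}, {})
    return re.sub(r'>[^<>]+<', call, html)

result = run('<p class="intro-text">well-known -- or not</p>')
print(result)   # <p class="intro-text">well—known — or not</p>
```

The hyphen in the ``class`` attribute survives because the find expression only matches text that sits between a ``>`` and the next ``<``.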


The power of function mode - using a spelling dictionary to fix mis-hyphenated words
----------------------------------------------------------------------------------------

Often, ebooks created from scans of printed books contain mis-hyphenated words
-- words that were split at the end of the line on the printed page. We will
write a simple function to automatically find and fix such words.

.. code-block:: python

    import regex

    def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):

        def replace_word(wmatch):
            # Try to remove the hyphen and replace the words if the resulting
            # hyphen free word is recognized by the dictionary
            without_hyphen = wmatch.group(1) + wmatch.group(2)
            if dictionaries.recognized(without_hyphen):
                return without_hyphen
            return wmatch.group()

        # Search for words split by a hyphen
        return regex.sub(r'(\w+)\s*-\s*(\w+)', replace_word, match.group(), flags=regex.VERSION1 | regex.UNICODE)

Use this function with the same find expression as before, namely::

    >[^<>]+<

And it will automatically fix all mis-hyphenated words in the text of the book.
The main trick is to use one of the useful extra arguments to the replace
function, ``dictionaries``. This refers to the dictionaries the editor itself
uses to spell check text in the book. What this function does is look for words
separated by a hyphen, remove the hyphen and check if the dictionary recognizes
the composite word. If it does, the original words are replaced by the
hyphen-free composite word.

Note that one limitation of this technique is that it only works for
mono-lingual books, because, by default, ``dictionaries.recognized()`` uses the
main language of the book.
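
To experiment with the logic outside the editor, something like the following sketch may help. It rests on several assumptions: ``FakeDictionaries`` is a hypothetical stand-in that imitates only the ``recognized()`` call used above, the word list is invented, and the standard ``re`` module is swapped in for ``regex`` so that the example has no third-party dependency.

```python
import re

class FakeDictionaries(object):
    # Hypothetical stand-in for the editor's ``dictionaries`` argument;
    # only the ``recognized()`` method used in this section is imitated.
    def __init__(self, words):
        self.words = set(words)

    def recognized(self, word):
        return word.lower() in self.words

def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):

    def replace_word(wmatch):
        # Join the two halves and keep the result only if it is a known word
        without_hyphen = wmatch.group(1) + wmatch.group(2)
        if dictionaries.recognized(without_hyphen):
            return without_hyphen
        return wmatch.group()

    # Standard-library ``re`` is used here instead of ``regex``
    return re.sub(r'(\w+)\s*-\s*(\w+)', replace_word, match.group(), flags=re.UNICODE)

fake = FakeDictionaries(['document', 'well', 'known'])
sample = '<p>The docu- ment is well-known</p>'
# Placeholder values are passed for every argument the function ignores
fixed = re.sub(r'>[^<>]+<', lambda m: replace(m, 1, 'sample.html', None, fake, {}, {}), sample)
print(fixed)   # <p>The document is well-known</p>
```

Here ``docu- ment`` is joined because ``document`` is in the fake word list, while ``well-known`` is left alone because ``wellknown`` is not.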
BIN
manual/images/function_replace.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 12 KiB |