String changes

Kovid Goyal 2017-12-15 19:42:20 +05:30
parent cfba208bc3
commit 5c9e8dce11
GPG Key ID: 06BC317B515ACE7C


@@ -14,8 +14,8 @@ used throughout the rest of calibre.
Character classes Character classes
------------------ ------------------
Character classes are useful to represent differet groups of characters, Character classes are useful to represent different groups of characters,
succintly. succinctly.
Examples: Examples:
@@ -72,7 +72,7 @@ Shorthand character classes
| ``\S`` | Any “non-whitespace” character | | ``\S`` | Any “non-whitespace” character |
| | | | | |
+---------------------+----------------------------------------------------------------------------------------------------------------------------------------------+ +---------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
| ``.`` | Any character except newline. Use the “dot all” checkbox or the ``(?s)`` regex modifier to include the newline character. | | ``.`` | Any character except newline. Use the “dot all” checkbox or the ``(?s)`` regexp modifier to include the newline character. |
| | | | | |
+---------------------+----------------------------------------------------------------------------------------------------------------------------------------------+ +---------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
@@ -122,7 +122,7 @@ Alternation
----------- -----------
The ``|`` character in a regular expression is a logical ``OR``. It means The ``|`` character in a regular expression is a logical ``OR``. It means
that either the preceeding or the follwoing expression can match. that either the preceding or the following expression can match.
Exclusion Exclusion
--------- ---------
@@ -164,7 +164,7 @@ character. The most useful anchors for text processing are:
``\K`` ``\K``
Resets the start position of the selection to its position in the pattern. Resets the start position of the selection to its position in the pattern.
Some regex engines (but not calibre) do not allow lookbehind of variable Some regexp engines (but not calibre) do not allow lookbehind of variable
length, especially with quantifiers. When you can use ``\K`` with these length, especially with quantifiers. When you can use ``\K`` with these
engines, it also allows you to get rid of this limit by writing the engines, it also allows you to get rid of this limit by writing the
equivalent of a positive lookbehind of variable length. equivalent of a positive lookbehind of variable length.
@@ -181,7 +181,7 @@ Groups
Group that does not capture the selection Group that does not capture the selection
``(?>expression)`` ``(?>expression)``
Atomic Group: As soon as the expression is satisfied, the regex engine Atomic Group: As soon as the expression is satisfied, the regexp engine
passes, and if the rest of the pattern fails, it will not backtrack to passes, and if the rest of the pattern fails, it will not backtrack to
try other combinations with the expression. Atomic groups do not try other combinations with the expression. Atomic groups do not
capture. capture.
@@ -218,7 +218,7 @@ Lookarounds
Lookaheads and lookbehinds do not consume characters, they are zero length and Lookaheads and lookbehinds do not consume characters, they are zero length and
do not capture. They are atomic groups: as soon as the assertion is satisfied, do not capture. They are atomic groups: as soon as the assertion is satisfied,
the regex engine passes, and if the rest of the pattern fails, it will not the regexp engine passes, and if the rest of the pattern fails, it will not
backtrack inside the lookaround to try other combinations. backtrack inside the lookaround to try other combinations.
When looking for multiple matches in a string, at the starting position of each When looking for multiple matches in a string, at the starting position of each
@@ -227,7 +227,7 @@ position. Therefore, on the string 123, the pattern ``(?<=\d)\d`` (a digit prece
by a digit) should, in theory, select 2 and 3. On the other hand, ``\d\K\d`` can by a digit) should, in theory, select 2 and 3. On the other hand, ``\d\K\d`` can
only select 2, because the starting position after the first selection is only select 2, because the starting position after the first selection is
immediately before 3, and there are not enough digits for a second match. immediately before 3, and there are not enough digits for a second match.
Similarly, ``\d(\d)`` only captures 2. In calibre's regex engine practice, the Similarly, ``\d(\d)`` only captures 2. In calibre's regexp engine practice, the
positive lookbehind behaves in the same way, and selects only 2, contrary to positive lookbehind behaves in the same way, and selects only 2, contrary to
theory. theory.
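
Note on this hunk: the "in theory" behaviour described here can be checked outside calibre. The sketch below uses Python's standard ``re`` module, which is an assumption made purely for illustration. It is not calibre's search engine and it does not support ``\K``, so only the lookbehind and the capture-group cases are shown.

```python
import re

text = "123"

# Positive lookbehind: a digit preceded by a digit.
# The lookbehind consumes nothing, so the search resumes right after each match.
lookbehind_matches = re.findall(r"(?<=\d)\d", text)

# Two consumed digits, capturing the second one.
# After matching "12" the search resumes before "3", which has no partner.
captured = re.findall(r"\d(\d)", text)

print(lookbehind_matches)  # ['2', '3']
print(captured)            # ['2']
```

With the standard module the lookbehind selects both 2 and 3, which is the theoretical result the text mentions. The calibre-specific behaviour described in the paragraph has to be verified inside calibre itself.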
@@ -297,16 +297,16 @@ Special characters
| | | | | |
+--------------------+-------------------+ +--------------------+-------------------+
Metacharacters Meta-characters
-------------- ---------------
Metacharacters are those that have a special meaning for the regex engine. Of Meta-characters are those that have a special meaning for the regexp engine. Of
these, twelve must be preceded by an escape character, the backslash (``\``), to these, twelve must be preceded by an escape character, the backslash (``\``), to
lose their special meaning and become a regular character again:: lose their special meaning and become a regular character again::
^ . [ ] $ ( ) * + ? | \ ^ . [ ] $ ( ) * + ? | \
Seven other metacharacters do not need to be preceded by a backslash (but can Seven other meta-characters do not need to be preceded by a backslash (but can
be without any other consequence):: be without any other consequence)::
{ } ! < > = : { } ! < > = :
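
Note on this hunk: a short sketch of what escaping changes, written against Python's standard ``re`` module as an assumed stand-in for calibre's engine. The sample strings are made up for the illustration.

```python
import re

# An unescaped dot is a meta-character: it matches any character.
unescaped_dot = re.fullmatch(r"1.5", "1x5") is not None

# Preceded by a backslash, the dot only matches a literal dot.
escaped_dot_on_x = re.fullmatch(r"1\.5", "1x5") is not None
escaped_dot_on_dot = re.fullmatch(r"1\.5", "1.5") is not None

# Characters from the second list match literally with or without a backslash.
plain = re.fullmatch(r"<a=b:c>!", "<a=b:c>!") is not None
escaped = re.fullmatch(r"\<a\=b\:c\>\!", "<a=b:c>!") is not None

print(unescaped_dot, escaped_dot_on_x, escaped_dot_on_dot)  # True False True
print(plain, escaped)                                       # True True
```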
@@ -315,10 +315,10 @@ be without any other consequence)::
Special characters lose their status if they are used inside a class (between Special characters lose their status if they are used inside a class (between
brackets ``[]``). The closing bracket and the dash have a special status in a brackets ``[]``). The closing bracket and the dash have a special status in a
class. Outside the class, the dash is a simple literal, the closing bracket class. Outside the class, the dash is a simple literal, the closing bracket
remains a metacharacter. remains a meta-character.
The slash (/) and the number sign (or hash character) (#) are not The slash (/) and the number sign (or hash character) (#) are not
metacharacters, they dont need to be escaped. meta-characters, they dont need to be escaped.
In some tools, like regex101.com with the Python engine, double quotes have the In some tools, like regex101.com with the Python engine, double quotes have the
special status of separator, and must be escaped, or the options changed. This special status of separator, and must be escaped, or the options changed. This
@@ -328,7 +328,7 @@ Modes
----- -----
``(?s)`` ``(?s)``
Causes the dot (``.``)to match newline characters as well Causes the dot (``.``) to match newline characters as well
``(?m)`` ``(?m)``
Makes the ``^`` and ``$`` anchors match the start and end of lines Makes the ``^`` and ``$`` anchors match the start and end of lines
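
Note on this hunk: the two modes can be compared side by side. The sketch below assumes Python's standard ``re`` module, which accepts the same inline ``(?s)`` and ``(?m)`` modifiers. The two-line sample string is invented for the example.

```python
import re

text = "one\ntwo"

# Without (?s) the dot stops at the newline, so nothing matches.
dot_default = re.findall(r"one.two", text)
# With (?s) the dot also matches the newline.
dot_all = re.findall(r"(?s)one.two", text)

# Without (?m) the anchors refer to the whole string only.
anchors_default = re.findall(r"^\w+$", text)
# With (?m) the anchors match at the start and end of every line.
anchors_multiline = re.findall(r"(?m)^\w+$", text)

print(dot_default, dot_all)                # [] ['one\ntwo']
print(anchors_default, anchors_multiline)  # [] ['one', 'two']
```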