diff --git a/manual/regexp.rst b/manual/regexp.rst
index c7046f92b3..011a8d255c 100644
--- a/manual/regexp.rst
+++ b/manual/regexp.rst
@@ -34,7 +34,7 @@ A regular expression is a way to describe sets of strings. A single regular expr
 Care to explain?
 --------------------

-Well, that's why we're here. First, this is the most important concept in regular expressions: *A string by itself is a regular expression that matches itself*. That is to say, if I wanted to match the string ``"Hello, World!"`` using a regular expression, the regular expression to use would be ``Hello, World!``. And yes, it really is that simple. You'll notice, though, that this *only* matches the exact string ``"Hello, World!"``, not e.g. ``"Hello, wOrld!"`` or ``"hello, world!"`` or any other such variation.
+Well, that's why we're here. First, this is the most important concept in regular expressions: *A string by itself is a regular expression that matches itself*. That is to say, if I wanted to match the string ``"Hello, World!"`` using a regular expression, the regular expression to use would be ``Hello, World!``. And yes, it really is that simple. You'll notice, though, that this *only* matches the exact string ``"Hello, World!"``, not e.g. ``"Hello, wOrld!"`` or ``"hello, world!"`` or any other such variation.

 That doesn't sound too bad. What's next?
 ------------------------------------------
@@ -58,15 +58,15 @@ You can of course do that: Just put a backslash in front of any special characte
 So, what are the most useful sets?
 ------------------------------------

-Knew you'd ask. Some useful sets are ``[0-9]`` matching a single number, ``[a-z]`` matching a single lowercase letter, ``[A-Z]`` matching a single uppercase letter, ``[a-zA-Z]`` matching a single letter and ``[a-zA-Z0-9]`` matching a single letter or number. You can also use an escape sequence as shorthand::
+Knew you'd ask. Some useful sets are ``[0-9]`` matching a single number, ``[a-z]`` matching a single lowercase letter, ``[A-Z]`` matching a single uppercase letter, ``[a-zA-Z]`` matching a single letter and ``[a-zA-Z0-9]`` matching a single letter or number. You can also use an escape sequence as shorthand::

 \d is equivalent to [0-9]
 \w is equivalent to [a-zA-Z0-9_]
 \s is equivalent to any whitespace
-
+
 .. note::
-"Whitespace" is a term for anything that won't be printed. These characters include space, tabulator, line feed, form feed and carriage return.
+"Whitespace" is a term for anything that won't be printed. These characters include space, tabulator, line feed, form feed and carriage return.
-
+
 As a last note on sets, you can also define a set as any character *but* those in the set. You do that by including the character ``"^"`` as the *very first character in the set*. Thus, ``[^a]`` would match any character excluding "a". That's called complementing the set. Those escape sequence shorthands we saw earlier can also be complemented: ``"\D"`` means any non-number character, thus being equivalent to ``[^0-9]``. The other shorthands can be complemented by, you guessed it, using the respective uppercase letter instead of the lowercase one. So, going back to the example ``<[^>]*>`` from the previous section, now you can see that the character set it's using tries to match any character except for a closing angle bracket.

 But if I had a few varying strings I wanted to match, things get complicated?
@@ -87,7 +87,7 @@ In the beginning, you said there was a way to make a regular expression case ins

 Yes, I did, thanks for paying attention and reminding me. You can tell calibre how you want certain things handled by using something called flags. You include flags in your expression by using the special construct ``(?flags go here)`` where, obviously, you'd replace "flags go here" with the specific flags you want. For ignoring case, the flag is ``i``, thus you include ``(?i)`` in your expression. Thus, ``(?i)test`` would match "Test", "tEst", "TEst" and any case variation you could think of.

-Another useful flag lets the dot match any character at all, *including* the newline, the flag ``s``. If you want to use multiple flags in an expression, just put them in the same statement: ``(?is)`` would ignore case and make the dot match all. It doesn't matter which flag you state first, ``(?si)`` would be equivalent to the above.
+Another useful flag lets the dot match any character at all, *including* the newline, the flag ``s``. If you want to use multiple flags in an expression, just put them in the same statement: ``(?is)`` would ignore case and make the dot match all. It doesn't matter which flag you state first, ``(?si)`` would be equivalent to the above.

 I think I'm beginning to understand these regular expressions now... how do I use them in calibre?
 -----------------------------------------------------------------------------------------------------
@@ -104,7 +104,7 @@ Let's begin with the conversion settings, which is really neat. In the :guilabel
 http://www.processtext.com/abclit.html
 It had only been two years since Addison v. Clark.
 The court case gave us a revised version of what life was
-
+
 (shamelessly ripped out of `this thread