mirror of
https://github.com/kovidgoyal/calibre.git
synced 2025-07-09 03:04:10 -04:00
...
This commit is contained in:
parent
2aaef2f34c
commit
333ab06fbd
@ -39,7 +39,7 @@ Well, that's why we're here. First, this is the most important concept in regula
|
|||||||
That doesn't sound too bad. What's next?
|
That doesn't sound too bad. What's next?
|
||||||
------------------------------------------
|
------------------------------------------
|
||||||
|
|
||||||
Next is the beginning of the really good stuff. Remember where I said that regular expressions can match multiple strings? This is were it gets a little more complicated. Say, as a somewhat more practical exercise, the ebook you wanted to convert had a nasty footer counting the pages, like "Page 5 of 423". Obviously the page number would rise from 1 to 423, thus you'd have to match 423 different strings, right? Wrong, actually: regular expressions allow you to define sets of characters that are matched: To define a set, you put all the characters you want to be in the set into square brackets. So, for example, the set ``[abc]`` would match either the character "a", "b" or "c". *Sets will always only match one of the characters in the set*. They "understand" character ranges, that is, if you wanted to match all the lower case characters, you'd use the set ``[a-z]`` for lower- and uppercase characters you'd use ``[a-zA-Z]`` and so on. Got the idea? So, obviously, using the expression ``Page [0-9] of 423`` you'd be able to match the first 9 pages, thus reducing the expressions needed to three: The second expression ``Page [0-9][0-9] of 423`` would match all two-digit page numbers, and I'm sure you can guess what the third expression would look like. Yes, go ahead. Write it down.
|
Next is the beginning of the really good stuff. Remember where I said that regular expressions can match multiple strings? This is where it gets a little more complicated. Say, as a somewhat more practical exercise, the ebook you wanted to convert had a nasty footer counting the pages, like "Page 5 of 423". Obviously the page number would rise from 1 to 423, thus you'd have to match 423 different strings, right? Wrong, actually: regular expressions allow you to define sets of characters that are matched: To define a set, you put all the characters you want to be in the set into square brackets. So, for example, the set ``[abc]`` would match either the character "a", "b" or "c". *Sets will always only match one of the characters in the set*. They "understand" character ranges, that is, if you wanted to match all the lower case characters, you'd use the set ``[a-z]`` for lower- and uppercase characters you'd use ``[a-zA-Z]`` and so on. Got the idea? So, obviously, using the expression ``Page [0-9] of 423`` you'd be able to match the first 9 pages, thus reducing the expressions needed to three: The second expression ``Page [0-9][0-9] of 423`` would match all two-digit page numbers, and I'm sure you can guess what the third expression would look like. Yes, go ahead. Write it down.
|
||||||
|
|
||||||
Hey, neat! This is starting to make sense!
|
Hey, neat! This is starting to make sense!
|
||||||
---------------------------------------------
|
---------------------------------------------
|
||||||
|
Loading…
x
Reference in New Issue
Block a user