Commit Graph

43497 Commits

Author SHA1 Message Date
Kovid Goyal 3dccfae35a String changes 2021-06-21 12:37:46 +05:30
Kovid Goyal 7fe5fff311 Add tests for stemming 2021-06-21 11:48:22 +05:30
Kovid Goyal 2bfc3d1e7f ... 2021-06-20 22:00:22 +05:30
Kovid Goyal ff01e16610 Fix #1459 (Recipe: slight improvement to Hindu) 2021-06-20 21:15:06 +05:30
Kovid Goyal 9904f33941 Add libstemmer to Arch CI deps 2021-06-20 14:55:22 +05:30
Kovid Goyal f5d56958b8 Start work on stemming for the ICU tokenizer 2021-06-20 14:43:24 +05:30
Kovid Goyal 5565c3395e ... 2021-06-20 12:39:12 +05:30
Kovid Goyal 8457379487 Add libstemmer as a dependency. Will be used for tokenizing in the new ICU based FTS tokenizer 2021-06-20 12:38:04 +05:30
Kovid Goyal 6c3f1ebb5f Fix #1933004 [TheAtlantic.com recipe not downloading article text](https://bugs.launchpad.net/calibre/+bug/1933004) 2021-06-20 09:09:15 +05:30
Kovid Goyal 755b58d1f5 Test tokenization with different UI languages 2021-06-19 15:16:48 +05:30
Kovid Goyal 6f7766fbf4 Another script block tokenizer test 2021-06-19 15:00:07 +05:30
Kovid Goyal 53168e075e Alias for test name option 2021-06-19 14:59:48 +05:30
Kovid Goyal c75f20a875 Don't repeatedly lookup the word iterator 2021-06-19 14:47:02 +05:30
Kovid Goyal a547ffd26e Fix script block loop: use the correct language based iterator and also update the start of the block correctly 2021-06-19 14:27:54 +05:30
Kovid Goyal fafacae005 Merge branch 'master' of https://github.com/cbhaley/calibre (Fixes #1932984 [AttributeError on 'Add-subcategory to <main-category>'](https://bugs.launchpad.net/calibre/+bug/1932984)) 2021-06-19 13:59:30 +05:30
Kovid Goyal 6f7454f1ad Ensure text fed to the FTS engine is in NFKC form 2021-06-19 13:58:28 +05:30
Charles Haley 7e49b481e9 Bug 1932984: AttributeError on 'Add-subcategory to <main-category>' 2021-06-19 09:17:09 +01:00
Kovid Goyal 52a87af143 Bounds check access to byte_offsets 2021-06-19 13:34:29 +05:30
Kovid Goyal d9c0da9ec3 ... 2021-06-19 13:13:03 +05:30
Kovid Goyal 6e62ccab38 Forgot to test boolean operators in queries 2021-06-19 11:50:46 +05:30
Kovid Goyal e0dad27caa tests for fts query syntax 2021-06-19 11:47:52 +05:30
Kovid Goyal 310a1a7d2e Add FTS tokenizer tests with Chinese 2021-06-19 10:54:34 +05:30
Kovid Goyal ef78b19912 Also hold global lock when constructing a tokenizer and setting its current_ui_language 2021-06-18 21:40:14 +05:30
Kovid Goyal d9b773bd19 Ensure tokenizer tests are run with a fixed UI language 2021-06-18 21:38:15 +05:30
Kovid Goyal c86f439e64 ... 2021-06-18 21:16:59 +05:30
Kovid Goyal 6ef1ec1656 Add currency and other symbols to allowed token characters 2021-06-18 21:04:31 +05:30
Kovid Goyal 2cf31be2ba Use ICU Word BreakIterator for tokenization 2021-06-18 18:06:15 +05:30
Kovid Goyal 879262929e Merge branch 'master' of https://github.com/MorganSeltzer000/calibre (E-book viewer: Fix scrolling backwards by screen-fulls not working with very large page margins.) 2021-06-18 07:50:31 +05:30
Kovid Goyal febc066142 A function to ensure lang specific iterators 2021-06-18 07:43:10 +05:30
Morgan Seltzer 501d6d0cf2 Fixed Pageup Occasionally Failing. Before, pageup failed when the page margins were greater than half the screen width, because previous_screen_location() went backward by screen_inline, which did not account for the margins but worked most of the time due to later rounding. Now this has been fixed. Signed-off-by: Morgan Seltzer <MorganSeltzer000@gmail.com> 2021-06-17 12:42:18 -05:00
Kovid Goyal 87b85cac39 Start work on ICU word break iterator based tokenization 2021-06-17 15:56:12 +05:30
Kovid Goyal 0cb9637e8c ... 2021-06-17 14:38:00 +05:30
Kovid Goyal d818bc17b8 ... 2021-06-17 12:12:59 +05:30
Kovid Goyal 6302937c4f Allow directly testing the tokenizer 2021-06-17 12:10:24 +05:30
Kovid Goyal 4127117e8a Add a UI language based iterator 2021-06-17 09:53:02 +05:30
Kovid Goyal 06d34a2df9 Add a test for snippets 2021-06-17 08:31:16 +05:30
Kovid Goyal 53b8bed17a Function to get available locales for break iteration 2021-06-17 07:25:15 +05:30
Kovid Goyal f138d716a5 Merge branch 'python3.10' of https://github.com/swt2c/calibre 2021-06-17 06:16:25 +05:30
Scott Talbert 2e272a39d0 Fix building with Python 3.10 2021-06-16 14:19:40 -04:00
Kovid Goyal 6773b36a42 Forgot to add header to extension definition 2021-06-16 21:57:44 +05:30
Kovid Goyal 584eacdee4 E-book viewer: Fix font sizes specified in absolute units not being honored in locales where the decimal separator is not the period. Fixes #1932152 [The e-book viewer ignores font-size property when using some absolute lenght units](https://bugs.launchpad.net/calibre/+bug/1932152) 2021-06-16 21:55:51 +05:30
Kovid Goyal 12e9769b4b Don't resize scratch unnecessarily 2021-06-16 21:40:17 +05:30
Kovid Goyal 22af8ab304 silence compiler warning 2021-06-16 21:38:32 +05:30
Kovid Goyal 9e77e2848e ... 2021-06-16 20:39:45 +05:30
Kovid Goyal 03b7feb507 Avoid ipython repeated exception when not available 2021-06-16 19:47:54 +05:30
Kovid Goyal a37c14499c Fix building of sqlite_extension on ancient Linux 2021-06-16 17:14:31 +05:30
Kovid Goyal d8595e5bf5 Fix ICU build on Windows 2021-06-16 17:02:07 +05:30
Kovid Goyal ae25a1f425 Also add test without diacritics removal 2021-06-16 16:16:03 +05:30
Kovid Goyal bbee5b0acb Implement diacritics removal in the new tokenizer 2021-06-16 14:54:15 +05:30
Kovid Goyal ab313c836f Implement the unicode61 tokenizer with ICU. Still have to implement removal of diacritics 2021-06-16 12:51:43 +05:30