This is invalid, but there apparently exist some books in the wild that
use it. Sigh. See #2146609 (*LOTS* of undesired splits on EPUB to AZW3 conversion)
- Add hyphen_is_extra_break flag to icu_BreakIterator struct
- Set flag at creation time by checking if any extra break char is
a hyphen (0x2d or 0x2010) via IS_HYPHEN_CHAR
- Move IS_HYPHEN_CHAR macro before struct definition so it's usable
in the constructor
- Guard all hyphen-joining logic (leading_hyphen, trailing_hyphen,
is_hyphen_sep) and sub-segment trailing-hyphen detection behind
!bi->hyphen_is_extra_break check
- Add test: BreakIterator with '-' extra break splits 'out-of-the-box'
into ['out', 'of', 'the', 'box']
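
The effect of the flag can be sketched in Python. This is a toy model, not
calibre's C code: ToyBreakIterator and its regex tokenizer are hypothetical
stand-ins for icu_BreakIterator and ICU word segmentation, and only the
hyphen behaviour described above is modelled.

```python
import re

HYPHEN_CODEPOINTS = {0x2D, 0x2010}  # the two code points named in the commit


def is_hyphen_char(cp):
    return cp in HYPHEN_CODEPOINTS


class ToyBreakIterator:
    """Hypothetical stand-in for icu_BreakIterator (hyphen handling only)."""

    def __init__(self, extra_break_chars=""):
        # Decided once, at creation time.
        self.hyphen_is_extra_break = any(
            is_hyphen_char(ord(c)) for c in extra_break_chars)

    def split(self, text):
        words = []
        prev_end = -1          # end offset of the previous token
        last_was_word = False  # previous token was a word, not a hyphen
        joinable = False       # previous token was a hyphen glued to a word
        # Crude tokenizer: runs of word characters, or a single hyphen.
        for m in re.finditer(r"\w+|[-\u2010]", text):
            tok = m.group()
            adjacent = m.start() == prev_end
            prev_end = m.end()
            if len(tok) == 1 and is_hyphen_char(ord(tok)):
                # Hyphen-joining is guarded by the flag.
                joinable = (not self.hyphen_is_extra_break
                            and adjacent and last_was_word)
                last_was_word = False
                continue
            if joinable and adjacent:
                words[-1] += text[m.start() - 1] + tok  # re-attach the hyphen
            else:
                words.append(tok)
            joinable = False
            last_was_word = True
        return words


print(ToyBreakIterator().split("out-of-the-box"))     # ['out-of-the-box']
print(ToyBreakIterator("-").split("out-of-the-box"))  # ['out', 'of', 'the', 'box']
```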
Co-authored-by: kovidgoyal <1308621+kovidgoyal@users.noreply.github.com>
Agent-Logs-Url: https://github.com/kovidgoyal/calibre/sessions/b439270b-8a40-4b51-96f2-8f869de7983d
- Add optional extra_word_break_chars field to icu_BreakIterator struct,
stored as a sorted UChar32[] for efficient lookup
- icu_BreakIterator_new accepts optional 3rd argument (Python str) that is
parsed into a sorted UChar32[] via insertion sort; only applies to UBRK_WORD
- icu_BreakIterator_dealloc frees the extra chars array
- New find_extra_word_break() inline helper scans a UTF-16 segment for the
first matching extra-break codepoint using U16_NEXT + linear search
- BreakIterState gains extra_break_active/seg_start/seg_end sub-segmentation
state fields (zero-initialized by memset in break_iter_state_init)
- break_iter_state_next refactored from a while loop to for(;;) so that
pending sub-segments are drained before more ICU data is fetched. When an
extra break falls inside an ICU word segment, the piece before it flows
through the normal hyphen-joining logic and the tail is deferred.
Trailing-hyphen detection on sub-segments enables hyphen-joining with
subsequent ICU segments
- Fast path: when num_extra_word_break_chars == 0 the only added cost is a
single comparison
- Tests added covering: letter extra break char, count_words/split2, adjacent
breaks, multiple chars, None arg, surrogate-pair extra break char
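
A rough Python sketch of the lookup and sub-segmentation described above.
The names mirror the commit but the code is illustrative only: Python
strings iterate by code point, so there is no U16_NEXT step, and two details
are assumptions rather than facts about the C code: the break character
itself is dropped, and adjacent breaks produce no empty pieces.

```python
def parse_extra_break_chars(chars):
    """Insertion-sort the code points of `chars` (None is treated as empty)."""
    out = []
    for ch in chars or "":
        cp = ord(ch)
        i = len(out)
        while i > 0 and out[i - 1] > cp:
            i -= 1
        out.insert(i, cp)
    return out


def find_extra_word_break(segment, sorted_cps):
    """Index of the first extra-break code point in `segment`, or -1."""
    if not sorted_cps:  # fast path: nothing configured
        return -1
    for idx, ch in enumerate(segment):
        cp = ord(ch)
        for candidate in sorted_cps:  # linear search over the sorted array
            if candidate == cp:
                return idx
            if candidate > cp:        # sorted, so no later entry can match
                break
    return -1


def split_segment(segment, sorted_cps):
    """Split one word segment at every extra break (assumed to be dropped)."""
    pieces = []
    rest = segment
    while True:
        pos = find_extra_word_break(rest, sorted_cps)
        if pos < 0:
            if rest:
                pieces.append(rest)
            return pieces
        if pos > 0:
            pieces.append(rest[:pos])
        rest = rest[pos + 1:]  # the tail is handled on the next pass


cps = parse_extra_break_chars("X")
print(split_segment("fooXbarXXbaz", cps))  # ['foo', 'bar', 'baz']
```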
Co-authored-by: kovidgoyal <1308621+kovidgoyal@users.noreply.github.com>
Agent-Logs-Url: https://github.com/kovidgoyal/calibre/sessions/c003ae42-1e56-4dbb-9ef2-9f1645b76c70
Previously a single 'pending-annot-upload' IDB key was used, so only
the last-annotated book's pending upload survived an app kill. With
multiple books annotated offline (or across multiple tabs), earlier
books' uploads were silently dropped from the queue. A related bug
caused stale in-memory state from a previous book to be used on Sync
after navigating between books in the same tab, potentially sending the
wrong annotations to the wrong book endpoint.
Changes:
- IDB key is now 'pending-annot-upload:{library_id}/{book_id}/{fmt}',
one entry per book, so all books' pending uploads survive independently
- New get_all_pending_annot_uploads() uses an IDB cursor range query to
retrieve every pending entry
- clear_pending_annot_upload() now takes book identity params and a
completion callback so the next upload starts only after the IDB
delete has committed
- _make_annot_upload_done() returns a per-book closure used as the ajax
callback, replacing the single _annot_upload_done method
- After each successful upload, _upload_next_from_idb() fetches and
uploads the next pending entry, draining the queue sequentially
- _on_network_restored() no longer requires a book to be open, so
pending uploads from other books are flushed even from the homepage
- load_book() clears unsynced_amap and the indicator timer/state so
stale in-memory state from the previous book is never used
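
The per-book keys and sequential draining can be sketched in Python. The
real code runs in the browser against IndexedDB; ToyStore, pending_key and
drain_pending_uploads below are hypothetical names, and a dict stands in
for the IDB object store. Stopping on the first failed upload is an
assumption of this sketch.

```python
KEY_PREFIX = "pending-annot-upload:"


def pending_key(library_id, book_id, fmt):
    # One entry per book, following the key format in the commit.
    return f"{KEY_PREFIX}{library_id}/{book_id}/{fmt}"


class ToyStore:
    """Dict-backed stand-in for the IDB store (hypothetical)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def delete(self, key, on_done):
        self._data.pop(key, None)
        on_done()  # called only after the delete has "committed"

    def get_all_with_prefix(self, prefix):
        # Stand-in for the cursor range query.
        items = [(k, v) for k, v in self._data.items() if k.startswith(prefix)]
        return sorted(items, key=lambda kv: kv[0])


def drain_pending_uploads(store, upload):
    """Upload entries one at a time; stop at the first failure."""
    uploaded = []

    def upload_next():
        entries = store.get_all_with_prefix(KEY_PREFIX)
        if not entries:
            return
        key, payload = entries[0]
        if not upload(key, payload):
            return  # entry stays queued for a later retry
        uploaded.append(key)
        # The next upload starts only after this entry is deleted.
        store.delete(key, on_done=upload_next)

    upload_next()
    return uploaded


store = ToyStore()
store.put(pending_key("lib", 1, "EPUB"), {"annots": ["a"]})
store.put(pending_key("lib", 2, "EPUB"), {"annots": ["b"]})
print(drain_pending_uploads(store, lambda key, payload: True))
```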
When multiple tabs open simultaneously, coordinate with navigator.locks
using ifAvailable so only the first tab calls persist(). Tabs that lose
the lock skip the call entirely, preventing the browser from showing
redundant permission prompts. Falls back to the persisted() check for
browsers without Web Locks support.
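
The decision flow can be approximated in Python, with a non-blocking
threading.Lock acquire as an analogue of the ifAvailable option. FakeStorage
and request_persistence are hypothetical names, and checking persisted()
before trying the lock is an assumption about ordering, not a description of
the actual browser code.

```python
import threading


class FakeStorage:
    """Hypothetical stand-in for navigator.storage; counts prompts shown."""

    def __init__(self):
        self.granted = False
        self.prompts = 0

    def persisted(self):
        return self.granted

    def persist(self):
        self.prompts += 1  # the browser would show a permission prompt here

    def answer_prompt(self, allow):
        self.granted = allow


def request_persistence(storage, lock):
    """Return what a tab did: already-persisted, requested or skipped."""
    if storage.persisted():
        return "already-persisted"
    if lock is None:
        # No lock support: the persisted() check above is the only guard.
        storage.persist()
        return "requested"
    if not lock.acquire(blocking=False):  # analogue of ifAvailable
        return "skipped"                  # another tab is already asking
    storage.persist()  # lock stays held while the prompt is pending
    return "requested"


storage, lock = FakeStorage(), threading.Lock()
print(request_persistence(storage, lock))  # requested
print(request_persistence(storage, lock))  # skipped
print(storage.prompts)                     # 1
```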
Check navigator.storage.persisted() before calling persist(), so tabs
opened after permission was already granted do not trigger a redundant
browser prompt.
Replace hardcoded #d4a017 with builtin_color('yellow', is_dark_theme()),
which resolves to #ffeb6b (light theme) or #906e00 (dark theme), the
same values used for yellow text highlights. This keeps a single source
of truth in calibre/constants.py.
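
A minimal sketch of the lookup idea, using only the two hex values quoted
above. builtin_color_sketch and its table are hypothetical and are not the
real contents of calibre/constants.py.

```python
# Hypothetical table: one place that owns the theme-dependent values.
BUILTIN_COLORS_SKETCH = {
    "yellow": {"light": "#ffeb6b", "dark": "#906e00"},
}


def builtin_color_sketch(name, is_dark):
    """Resolve a named color for the current theme."""
    return BUILTIN_COLORS_SKETCH[name]["dark" if is_dark else "light"]


print(builtin_color_sketch("yellow", False))  # #ffeb6b
print(builtin_color_sketch("yellow", True))   # #906e00
```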