calibre

mirror of https://github.com/kovidgoyal/calibre.git synced 2026-05-30 18:45:20 -04:00

Author	SHA1	Message	Date
Hassan Raza	e1cdb70dc2	Add rooted path containment helper	2026-05-23 20:02:00 +05:00
Kovid Goyal	387f1d05fa	Merge branch 'count-pages-fixed-layout' of https://github.com/un-pogaz/calibre	2026-05-23 07:02:39 +05:30
Kovid Goyal	7d2f1597ea	Merge branch 'fix/http-connection-header-tokens' of https://github.com/M-Hassan-Raza/calibre	2026-05-23 06:52:37 +05:30
un-pogaz	332ccea5c8	pages count: support fixed-layout	2026-05-22 20:48:01 +02:00
Hassan Raza	289b77463a	Parse Connection header tokens	2026-05-22 22:58:10 +05:00
Kovid Goyal	cb2b1d195f	Merge branch 'fix/http-content-length-framing' of https://github.com/M-Hassan-Raza/calibre	2026-05-22 22:58:51 +05:30
Hassan Raza	74d8ab0c1b	Reject invalid HTTP Content-Length framing	2026-05-22 22:21:25 +05:00
Kovid Goyal	f31ee236ce	Merge branch 'fix/copy-to-library-move-duplicate' of https://github.com/M-Hassan-Raza/calibre	2026-05-22 22:33:02 +05:30
Hassan Raza	648343f888	Fix ignored duplicate moves in content server	2026-05-22 12:59:03 +05:00
Kovid Goyal	a5dd4a47cd	Merge branch 'fix-content-server-restrictions' of https://github.com/M-Hassan-Raza/calibre	2026-05-22 07:40:28 +05:30
Hassan Raza	5a15ed3d5a	Respect content server book restrictions	2026-05-21 22:21:33 +05:00
Kovid Goyal	66501a6ae7	Merge branch 'master' of https://github.com/unkn0w7n/calibre	2026-05-21 14:51:46 +05:30
unkn0w7n	b8b56c3607	Update indian_express.recipe	2026-05-21 14:47:34 +05:30
unkn0w7n	b7b876196e	Update business_standard_print.recipe	2026-05-21 14:46:58 +05:30
Kovid Goyal	92e0132d62	Bump dependency for CVE	2026-05-20 20:38:48 +05:30
Kovid Goyal	9cd210dad0	pep8	2026-05-19 07:29:13 +05:30
Kovid Goyal	6dbc00d054	Merge branch 'ap-filter-by-publish-date' of https://github.com/claybdavis/calibre	2026-05-19 07:28:41 +05:30
Kovid Goyal	80f379147d	Merge branch 'newcriterion-wp-migration' of https://github.com/claybdavis/calibre	2026-05-19 07:27:51 +05:30
Kovid Goyal	d8490c2208	Merge branch 'bbc-sport-headline-block-fix' of https://github.com/claybdavis/calibre	2026-05-19 07:26:53 +05:30
Kovid Goyal	19d2488ca5	Merge branch 'bbc-drop-dead-feeds' of https://github.com/claybdavis/calibre	2026-05-19 07:25:58 +05:30
Kovid Goyal	806a3a5bfc	Merge branch 'dependabot/github_actions/actions-8abaa2cbc6' of https://github.com/kovidgoyal/calibre	2026-05-19 07:24:30 +05:30
dependabot[bot]	b450e0e2ca	Bump github/codeql-action from 4.35.3 to 4.35.4 in the actions group Bumps the actions group with 1 update: [github/codeql-action](https://github.com/github/codeql-action). Updates `github/codeql-action` from 4.35.3 to 4.35.4 - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/github/codeql-action/compare/v4.35.3...v4.35.4) --- updated-dependencies: - dependency-name: github/codeql-action dependency-version: 4.35.4 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: actions ... Signed-off-by: dependabot[bot] <support@github.com>	2026-05-19 01:37:27 +00:00
claybdavis	821f82b730	bbc: drop three dead/stale feed entries Audit (2026-05-18) of the BBC News feed list found three entries that no longer produce content: * Special Reports (https://feeds.bbci.co.uk/news/special_reports/rss.xml) HTTP/2 404. Wayback's last successful capture is 2024-07-23, so the URL has been dead for roughly two years. * Also in the News (https://feeds.bbci.co.uk/news/also_in_the_news/rss.xml) HTTP/2 404. Wayback has no successful captures of this URL at all. * Magazine (https://feeds.bbci.co.uk/news/magazine/rss.xml) 301-redirects to /news/stories/rss.xml which still returns 200 OK, but the content there has been stale since December 2022. The endpoint is alive; the section is abandoned. Because the recipes set remove_empty_feeds=True these three have been silently swallowed on every fetch, costing four wasted HTTP calls per run (the Magazine redirect doubles up). Dropping them cleans the active feed list without changing what readers actually receive. bbc.recipe had all three active entries; bbc_fast.recipe only carried Magazine. Both files patched accordingly. The ~50 commented-out legacy feed URLs in the same block are NOT touched here -- that is a separate cleanup.	2026-05-18 16:10:22 -05:00
claybdavis	507feb15fa	bbc: emit <h1> for Sport articles (handle new headline block-type) Sport articles fetched via recipes/bbc.recipe and recipes/bbc_fast.recipe were shipping with no <h1>, because BBC restructured the window.__INITIAL_DATA__ JSON. article['headline'] is now None for Sport, and the headline lives either in a new 'headline' block-type or — for the 'high-impact' layout — nested under a 'topper' block's model.heading.blocks list. The previous parse_article_json loop only branched on 'image' and 'text', so neither variant produced anything. Fix: prefer the plain-text article['metadata']['seoHeadline'] when the legacy article['headline'] field is empty, and as a defensive fallback extract the headline from a 'headline' or 'topper' block via a small extract_text_block_plaintext helper. Verified against live Sport URLs covering both block-type variants; legacy News articles that still populate article['headline'] are unaffected. bbc_fast.recipe carries an identical copy of parse_article_json, so the same patch is applied to both files.	2026-05-18 16:00:45 -05:00
claybdavis	a8797f05f1	newcriterion: update parse_index and login for WordPress migration newcriterion.com moved from October CMS to WordPress. The old recipe looked for <div id="main"> and an /issues/YYYY/M/ URL pattern, so parse_index crashed with AttributeError: 'NoneType' object has no attribute 'findAll' against the new layout. Rewrites parse_index for the new markup: issue URLs of the form /issues/<month>-<year>/, a <div class="issue-layout"> container, and <article class="article-display"> blocks with <h2><a> for title+URL and <p class="post-excerpt"> for the dek. Also ports get_browser from the old October-CMS XHR signin endpoint to standard wp-login.php form submission, and drops the now-unused urlencode, mechanize.Request, and re imports.	2026-05-18 14:50:10 -05:00
claybdavis	bdf0679ecf	ap: filter articles by article:published_time meta tag AP has no RSS feeds and parse_index had no date logic, so the framework's oldest_article knob was a no-op and cross-day duplicates accumulated indefinitely. This fix fetches each candidate article during indexing and reads <meta property="article:published_time"> to populate both the article's timestamp (which oldest_article actually filters on) and a formatted date string for the TOC. Cached per-URL across the front-page walk so duplicate links are fetched once. Articles whose published_time can't be read are skipped with a warning rather than kept dateless. Sets oldest_article = 1 (AP publishes constantly). The trade-off is roughly 30-60s extra wall time and a doubling of HTTP volume per run, paid for by dropping the ~24 stale articles per consecutive-day fetch. Same per-URL-fetch idiom as #3132 (latimes og:description).	2026-05-18 13:27:20 -05:00
Kovid Goyal	8544091c82	Content server: Apply null metadata when serving book files. Matches behavior of save to disk. Fixes #2152879 [tags not deleted in ePub by content server](https://bugs.launchpad.net/calibre/+bug/2152879 )	2026-05-18 14:27:32 +05:30
Kovid Goyal	68c567b372	Bump dependency for CVE	2026-05-16 13:32:20 +05:30
Kovid Goyal	111abb9a43	Merge branch 'propublica-drop-newsroom-blurb' of https://github.com/claybdavis/calibre	2026-05-14 11:16:44 +05:30
Kovid Goyal	23b27a71be	Merge branch 'latimes-fetch-og-description' of https://github.com/claybdavis/calibre	2026-05-14 11:15:54 +05:30
Kovid Goyal	af5f132bdf	Merge branch 'latimes-drop-follow-link' of https://github.com/claybdavis/calibre	2026-05-14 11:15:35 +05:30
Kovid Goyal	3c800802d5	Merge branch 'latimes-narrow-date-and-fix-images' of https://github.com/claybdavis/calibre	2026-05-14 11:14:17 +05:30
Kovid Goyal	b07017934c	Merge branch 'wapo-print-tag-subhead-and-caption' of https://github.com/claybdavis/calibre	2026-05-14 11:13:20 +05:30
Kovid Goyal	1fbb66b58f	Merge branch 'newyorker-drop-cartoon-stubs' of https://github.com/claybdavis/calibre	2026-05-14 11:10:39 +05:30
Kovid Goyal	125cdbf7e4	Merge branch 'tls-safari-ua' of https://github.com/claybdavis/calibre	2026-05-14 11:10:03 +05:30
claybdavis	8da3da0b66	propublica: drop the "ProPublica is a nonprofit newsroom" blurb The ProPublica WordPress theme prepends a standing "ProPublica is a nonprofit newsroom that investigates abuses of power..." block to every article body, wrapped as `<div class="wp-block-propublica-notes--top wp-block-propublica-note">`. This fix extends remove_tags to strip it via the --top BEM modifier. (The bare wp-block-propublica-note class is reused for in-body editor's-note boxes that should pass through, so the modifier scopes the strip to the top-of-article boilerplate only.)	2026-05-14 00:25:34 -05:00
claybdavis	d5c66c2653	latimes: populate per-article description from og:description LAT's section index pages only attach a teaser to <10% of article tiles, so most TOC entries in the resulting EPUB show no description under the headline. This fix fetches each article page once during indexing and reads <meta property="og:description"> to populate art['description']. Cached per-URL across the section walks so duplicates are fetched once. Descriptions shorter than 20 chars are dropped (LAT occasionally publishes placeholder text).	2026-05-14 00:19:21 -05:00
claybdavis	19692fbaeb	latimes: drop the Twitter Follow link from the byline LATimes bylines contain an <a data-social-trigger="enhancedByline"> wrapping an SVG <use> sprite reference and the literal text "Follow". The sprite targets a <shape> ID in a sibling <svg> elsewhere in the source document; that cross-document reference can't resolve in the converted EPUB, so the anchor renders as an empty icon slot followed by a badly placed "Follow." This fix decomposes the anchor in preprocess_html. The "Staff Writer" <span> sibling is not a descendant, so the rubric stays in place.	2026-05-14 00:13:57 -05:00
claybdavis	58dda3ab13	latimes: narrow article regex to today+yesterday, strip picture wrappers Two issues fixed in LATimes: 1. The /story/YYYY-MM-DD/ regex in parse_index was date-agnostic, so each section walk pulled articles from arbitrary past dates. This fix narrows the date path-segment to today+yesterday, the same idea as chicago_tribune.recipe's inline tdy/yest filter (just expressed as a regex alternation since latimes uses re.compile). 2. LAT wraps each photo in <picture><source srcset="remote-webp"/> <img sizes="100vw" fetchpriority="high" src="local.jpg"/></picture>. The <source> URLs never load in EPUB (remote, no network at read time), and sizes="100vw" tells the reader to render at full viewport width; combined with the image's natural aspect ratio, that overflows portrait photos onto the next page. Decomposes <source>, unwraps <picture>, and removes the sizes/fetchpriority hints so readers respect the image's embedded dimensions.	2026-05-14 00:08:03 -05:00
claybdavis	93f6f0dfc8	wash_post_print: tag subhead and image captions for styling hooks The recipe emits the dek as `<p class="subt">...</h3>`. The opening and closing tags are mismatched, leaving the dek without a clean hook. Also, captions (promo, video, image) are emitted as a bare <div> with no class. The fixed recipe emits the subhead as `<h3 class="subhead">...</h3>` (matched pair) and tags each image-caption div with `class="caption"`. This change gives downstream stylesheets proper targets.	2026-05-13 23:59:30 -05:00
claybdavis	68ef0e420b	new_yorker: skip Cartoon Caption Contest and Slideshow stub tiles The magazine TOC walks <a class="summary-item__hed-link"> anchors inside every SummaryItemWrapper, including two non-article tile types that have no article body to extract: - summary-item--externallink (Cartoon Caption Contest — links to the interactive contest widget) - summary-item--gallery (Cartoon Slide Show — links to the gallery widget) Filters both via the BEM modifier on the wrapper. Real articles use summary-item--article (including Crossword) and pass through unchanged.	2026-05-13 23:37:28 -05:00
claybdavis	f78f24ae2a	tls_mag: force Safari UA so CloudFront serves the real page the-tls.com is fronted by CloudFront, which serves Calibre's default Chrome UA a CAPTCHA page instead of the issue page. The CAPTCHA response lacks the rel=shortlink Link header that parse_index reads to extract the WordPress issue id, so the recipe crashes at soup.find('link', rel='shortlink')['href']. Overrides get_browser to force a Safari UA, which CloudFront passes through cleanly.	2026-05-13 23:31:20 -05:00
Kovid Goyal	4c04c56033	Merge branch 'arb-disable-auto-cleanup' of https://github.com/claybdavis/calibre	2026-05-14 05:53:04 +05:30
Kovid Goyal	0407cbad79	Merge branch 'vox-keep-byline-and-lede' of https://github.com/claybdavis/calibre	2026-05-14 05:52:18 +05:30
claybdavis	3a9dfe8bbb	asianreviewofbooks: disable auto_cleanup so reviewer info isn't dropped auto_cleanup keeps only the body, but on asianreviewofbooks.com the reviewer name, publication date, category tags, h1 title, book-cover thumbnail, and the contributor-bio block at the foot all live in sibling sections within the same WordPress single-entry article wrapper — so they get stripped as chrome. Disables auto_cleanup and selects the whole <article class="... single-entry ..."> wrapper via keep_only_tags. remove_tags strips the JS-only social-share container, the duplicate post-tags footer, and the "Related" carousel of other reviews appended after the body.	2026-05-13 16:52:13 -05:00
claybdavis	90e286c831	vox: fetch live pages so byline, breadcrumb, and dek aren't dropped Vox's RSS feeds carry the article body inline in <content type="html">, and BasicNewsRecipe uses that embedded copy when present. The embedded copy is missing the byline, the section breadcrumb (e.g. "Future Perfect"), the dek, and the publication timestamp — all of which live only on the live page. Sets use_embedded_content = False so the live page is fetched, and selects the lede block (breadcrumb + h1 + dek + byline + timestamp + hero image) and body container explicitly via keep_only_tags. Drops the social-share row via remove_tags.	2026-05-13 16:49:05 -05:00
Kovid Goyal	ed5a92d9ae	Clean up test	2026-05-13 15:49:33 +05:30
Kovid Goyal	cf3a6f4e6f	E-book viewer: Fix incorrect search match offsets in normal search mode when the text contains non-BMP Unicode characters. Fixes #2152227 [search ignors some chars](https://bugs.launchpad.net/calibre/+bug/2152227 )	2026-05-13 15:46:02 +05:30
Kovid Goyal	801af5a41d	Ignore inapplicable CVE	2026-05-13 15:03:50 +05:30
Kovid Goyal	150094ddb7	Merge branch 'conversation-keep-byline-disclosure' of https://github.com/claybdavis/calibre	2026-05-13 11:31:33 +05:30
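
Commit 58dda3ab13 above describes narrowing a /story/YYYY-MM-DD/ regex to today and yesterday by means of a regex alternation. The sketch below illustrates that idea only; it is not the code from latimes.recipe. The function name story_url_pattern and the sample URLs are hypothetical, introduced purely for illustration.

```python
import re
from datetime import date, timedelta


def story_url_pattern(today):
    """Build a regex that matches /story/YYYY-MM-DD/ for today or yesterday only.

    Hypothetical helper: the real recipe may build its pattern differently.
    """
    days = (today, today - timedelta(days=1))
    # Join the two ISO dates into an alternation, e.g. 2026-05-14|2026-05-13
    alternation = '|'.join(re.escape(d.isoformat()) for d in days)
    return re.compile(r'/story/(?:' + alternation + r')/')


# A fixed date keeps the example reproducible; a recipe would use date.today().
pattern = story_url_pattern(date(2026, 5, 14))

# Made-up URLs, used only to exercise the pattern.
urls = [
    'https://www.latimes.com/california/story/2026-05-14/example-a',
    'https://www.latimes.com/world/story/2026-05-13/example-b',
    'https://www.latimes.com/sports/story/2026-04-30/example-c',
]
recent = [u for u in urls if pattern.search(u)]
```

Here recent keeps the first two URLs and drops the third, whose date segment falls outside the two-day window. Building the alternation from date objects, rather than hard-coding strings, lets the pattern roll forward on each run.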


53513 Commits