439 Commits

Author SHA1 Message Date
shamoon
566afdffca
Enhancement: unify text search to use tantivy (#12485) 2026-04-03 13:53:45 -07:00
Trenton H
aed9abe48c
Feature: Replace Whoosh with tantivy search backend (#12471)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Antoine Mérino <3023499+Merinorus@users.noreply.github.com>
2026-04-02 12:38:22 -07:00
shamoon
d2328b776a
Performance: support bulk edit without id lists (#12355) 2026-03-31 18:23:28 +00:00
shamoon
245514ad10
Performance: deprecate and remove usage of all in API results (#12309) 2026-03-31 07:55:59 -07:00
shamoon
f715533770
Performance: support passing selection data with filtered document requests (#12300) 2026-03-30 16:38:52 +00:00
shamoon
129da3ade7
Tweakhancement: show file extension in StoragePath test (#12452) 2026-03-28 13:58:33 -07:00
shamoon
ae0474450f
Chore: logger, response and template sanitization cleanup (#12439) 2026-03-26 07:36:02 -07:00
Trenton H
701735f6e5
Chore: Drop old signal and unneeded apps, transition to parser registry instead (#12405)
* refactor: switch consumer and callers to ParserRegistry (Phase 4)

Replace all Django signal-based parser discovery with direct registry
calls. Removes `_parser_cleanup`, `parser_is_new_style` shims, and all
old-style isinstance checks. All parser instantiation now uses the
`with parser_class() as parser:` context manager pattern.

- documents/parsers.py: delegate to get_parser_registry(); drop lru_cache
- documents/consumer.py: use registry + context manager; remove shims
- documents/tasks.py: same pattern
- documents/management/commands/document_thumbnails.py: same pattern
- documents/views.py: get_metadata uses context manager
- documents/checks.py: use get_parser_registry().all_parsers()
- paperless/parsers/registry.py: add all_parsers() public method
- tests: update mocks to target documents.consumer.get_parser_class_for_mime_type

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor: drop get_parser_class_for_mime_type; callers use registry directly

All callers now call get_parser_registry().get_parser_for_file() with
the actual filename and path, enabling score() to use file extension
hints. The MIME-only helper is removed.

- consumer.py: passes self.filename + self.working_copy
- tasks.py: passes document.original_filename + document.source_path
- document_thumbnails.py: same pattern
- views.py: passes Path(file).name + Path(file)
- parsers.py: internal helpers inline the registry call with filename=""
- test_parsers.py: drop TestParserDiscovery (was testing mock behavior);
  TestParserAvailability uses registry directly
- test_consumer.py: mocks switch to documents.consumer.get_parser_registry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor: remove document_consumer_declaration signal infrastructure

Remove the document_consumer_declaration signal that was previously used
for parser registration. Each parser app no longer connects to this signal,
and the signal declaration itself has been removed from documents/signals.

Changes:
- Remove document_consumer_declaration from documents/signals/__init__.py
- Remove ready() methods and signal imports from all parser app configs
- Delete signal shim files (signals.py) from all parser apps:
  - paperless_tesseract/signals.py
  - paperless_text/signals.py
  - paperless_tika/signals.py
  - paperless_mail/signals.py
  - paperless_remote/signals.py

Parser discovery now happens exclusively through the ParserRegistry
system introduced in the previous refactor phases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor: remove empty paperless_text and paperless_tika Django apps

After parser classes were moved to paperless/parsers/ in the plugin
refactor, these Django apps contained only empty AppConfig classes
with no models, views, tasks, migrations, or other functionality.

- Remove paperless_text and paperless_tika from INSTALLED_APPS
- Delete empty app directories entirely
- Update pyproject.toml test exclusions
- Clean stale mypy baseline entries for moved parser files

paperless_remote app is retained as it contains meaningful system
checks for Azure AI configuration.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Moves the checks and tests to the main application and removes the old applications

* Adds a comment to satisy Sonar

* refactor: remove automatic log_summary() call from get_parser_registry()

The summary was logged once per process, causing it to appear repeatedly
during Docker startup (management commands, web server, each Celery
worker subprocess). External parsers are already announced individually
at INFO when discovered; the full summary is redundant noise.
log_summary() is retained on ParserRegistry for manual/debug use.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Cleans up the duplicate test file/fixture

* Fixes a race condition where webserver threads could race to populate the registry

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 06:53:32 -07:00
shamoon
0f84af27d0
Merge branch 'main' into dev
# Conflicts:
#	docs/setup.md
#	src-ui/src/main.ts
#	src/documents/tests/test_api_bulk_edit.py
#	src/documents/tests/test_api_custom_fields.py
#	src/documents/tests/test_api_search.py
#	src/documents/tests/test_api_status.py
#	src/documents/tests/test_workflows.py
#	src/paperless_mail/tests/test_api.py
2026-03-21 02:12:19 -07:00
shamoon
3cbdf5d0b7
Fix: require view permission for more-like search 2026-03-21 01:20:59 -07:00
shamoon
9e9fc6213c Resolve GHSA-96jx-fj7m-qh6x 2026-03-20 15:39:15 -07:00
shamoon
87ebd13abc
Fix: remove pagination from document notes api spec (#12388) 2026-03-18 06:48:05 -07:00
Trenton H
aea2927a02
Feature: Convert Tika parser to the plugin system (#12333)
* Chore: move Tika parser and tests to paperless/

Move TikaDocumentParser and its tests to the canonical parser package
location, matching the pattern established for TextDocumentParser:

- src/paperless_tika/parsers.py → src/paperless/parsers/tika.py
- src/paperless_tika/tests/test_tika_parser.py → src/paperless/tests/parsers/test_tika_parser.py
- src/paperless_tika/tests/samples/ → src/paperless/tests/samples/tika/

Merge tika fixtures (tika_parser, sample_odt_file, sample_docx_file,
sample_doc_file, sample_broken_odt) into the shared parsers conftest.
Remove the now-empty src/paperless_tika/tests/conftest.py.

Content is unchanged — this commit is rename-only so git history is
preserved on the moved files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Feature: Phase 3 — migrate TikaDocumentParser to ParserProtocol

Refactor TikaDocumentParser to satisfy ParserProtocol without subclassing
the legacy DocumentParser ABC:

- Add ClassVars: name, version, author, url
- Add supported_mime_types() classmethod (12 Office/ODF/RTF MIME types)
- Add score() classmethod — returns None when TIKA_ENABLED is False, 10 otherwise
- can_produce_archive = False (PDF is for display, not an OCR archive)
- requires_pdf_rendition = True (Office formats need PDF for browser display)
- __enter__/__exit__ via ExitStack: TikaClient opened once per parser
  lifetime and shared across parse() and extract_metadata() calls
- extract_metadata() falls back to a short-lived TikaClient when called
  outside a context manager (legacy view-layer metadata path)
- _convert_to_pdf() uses OutputTypeConfig() to honour the database-stored
  ApplicationConfiguration before falling back to the env-var setting
- Rename convert_to_pdf → _convert_to_pdf (private helper)

Update paperless_tika/signals.py shim to import from the new module path
and drop the legacy logging_group/progress_callback kwargs.

Update documents/consumer.py to extend the existing TextDocumentParser
special cases to also cover TikaDocumentParser (parse/get_thumbnail
signatures, __exit__ cleanup).

Add TestTikaParserRegistryInterface (7 tests) covering score(), properties,
and ParserProtocol isinstance check.  Update existing tests to use the new
accessor API (get_text, get_date, get_archive_path, _convert_to_pdf).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix: update remaining imports and move live Tika tests after parser migration

- src/documents/tests/test_parsers.py: import TikaDocumentParser from
  paperless.parsers.tika (old paperless_tika.parsers no longer exists)
- git mv paperless_tika/tests/test_live_tika.py →
  paperless/tests/parsers/test_live_tika.py to co-locate all Tika tests
  with the parser; update import and replace old attribute API
  (tika_parser.text/.archive_path) with accessor methods
  (get_text/get_archive_path)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix: satisfy mypy and pyrefly for TikaDocumentParser

Use a TYPE_CHECKING-guarded assert to narrow self._tika_client from
TikaClient | None to TikaClient at the point of use in parse().  The
assert is visible to type checkers (TYPE_CHECKING=True) so both mypy
and pyrefly accept the subsequent attribute accesses without error;
at runtime TYPE_CHECKING is False so the assert never executes and no
ruff S101 suppression is required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix: require context manager for TikaDocumentParser; clean up client lifecycle

- consumer.py: call __enter__ for new-style parsers so _tika_client and
  _gotenberg_client are set before parse() is invoked
- views.py: use `with parser` (via nullcontext for old-style parsers) in
  get_metadata so extract_metadata always runs inside a context manager
- tika.py: GotenbergClient added to ExitStack alongside TikaClient;
  inline client creation removed from extract_metadata and _convert_to_pdf;
  __exit__ uses ExitStack.close() instead of __exit__ pass-through

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 15:43:28 -07:00
shamoon
2bb4af2be6
Change: sort custom fields alphabetically by default (#12358) 2026-03-15 22:52:02 -07:00
shamoon
45b363659e
Chore: mark document detail email action as deprecated (#12308) 2026-03-12 15:42:14 +00:00
shamoon
86573fc1a0
Chore: separate actions from bulk edit endpoint (#12286) 2026-03-10 18:55:36 +00:00
shamoon
85a18e5911
Enhancement: saved view sharing (#12142) 2026-03-04 14:15:43 -08:00
shamoon
d51a118aac
Merge branch 'main' into dev 2026-03-04 13:31:20 -08:00
shamoon
5b809122b5
Fix: apply ordering after annotating tag document count (#12238) 2026-03-04 00:33:13 -08:00
shamoon
96ac7b2336
Tweak: Ignore version docs for workflows (#12217) 2026-03-02 08:21:14 -08:00
shamoon
f65807b906
Merge branch 'main' into dev
# Conflicts:
#	docs/setup.md
#	src-ui/src/app/components/manage/document-attributes/management-list/management-list.component.ts
#	src/documents/tests/test_api_documents.py
2026-02-28 02:31:20 -08:00
shamoon
c7f83212a3
Enforce on selection_data too 2026-02-28 01:27:40 -08:00
Jan Kleine
c86ebc0260
Enhancment: Formatted filename for single document downloads (#12095)
---------

Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
2026-02-26 18:06:47 +00:00
shamoon
ceee769e26
Feature: document file versions (#12061) 2026-02-26 16:46:54 +00:00
Trenton H
53ac338946
Breaking: Removes API v1 and the related serializer (#12166) 2026-02-26 04:06:43 +00:00
shamoon
426c0a8974
Chore: typing fixes 2026-02-16 09:54:06 -08:00
shamoon
4884b67714
Fix more typing failures 2026-02-16 09:37:33 -08:00
shamoon
02896a15fd
Fix: only pass user to SerializerWithPerms serializers 2026-02-16 09:31:33 -08:00
shamoon
d8e07b8d84
Fix typing issue 2026-02-16 09:17:20 -08:00
shamoon
be4e29a19c
Merge branch 'main' into dev 2026-02-16 09:01:19 -08:00
shamoon
afaf39e43a
Fix/GHSA-x395-6h48-wr8v 2026-02-16 00:02:15 -08:00
shamoon
5b45b89d35
Performance fix: use subqueries to improve object retrieval in large installs (#11950) 2026-02-05 08:46:32 -08:00
Trenton H
2ec8ec96c8
Feature: Enable users to customize date parsing via plugins (#11931) 2026-02-03 20:09:13 +00:00
Sebastian Steinbeißer
3b5ffbf9fa
Chore(mypy): Annotate None returns for typing improvements (#11213) 2026-02-02 08:44:12 -08:00
shamoon
c3b036e0d3
Merge branch 'main' into dev 2026-01-31 09:10:33 -08:00
shamoon
c8c4c7c749
Security: enforce permissions for post_document 2026-01-30 12:14:18 -08:00
shamoon
e4b861d76f
Fix: prevent note deletion outside doc 2026-01-29 13:35:01 -08:00
shamoon
1f074390e4
Feature: sharelink bundles (#11682) 2026-01-27 18:54:51 +00:00
Antoine Mérino
df07b8a03e
Performance: faster statistics panel on dashboard (#11760) 2026-01-26 12:10:57 -08:00
shamoon
857aaca493
Merge branch 'release/v2.20.x' into dev 2026-01-26 09:25:58 -08:00
shamoon
891f4a2faf
Fix: correctly extract all ids for nested tags (#11888) 2026-01-26 09:12:03 -08:00
shamoon
2312314aa7
Performance: improve treenode inefficiencies (#11606) 2026-01-25 21:47:08 -08:00
Trenton H
d0032c18be
Breaking: Remove support for document and thumbnail encryption (#11850) 2026-01-24 19:29:54 -08:00
shamoon
00bb92e3e1
Fix: support ordering by storage path name (#11661) 2026-01-13 09:36:14 -08:00
shamoon
e940764fe0
Feature: Paperless AI (#10319) 2026-01-13 16:24:42 +00:00
shamoon
7b666e7569
Performance: improve treenode inefficiencies (#11606) 2026-01-12 13:04:03 -08:00
shamoon
5b1e66be91
Feature: password removal action (#11656) 2026-01-08 21:36:11 +00:00
shamoon
99724a25a2
Fix: support ordering by storage path name (#11661) 2025-12-28 16:05:21 -08:00
shamoon
b12f1e757c
Fixhancement: refactor email attachment logic (#11336) 2025-11-14 17:28:46 +00:00
shamoon
0219df5b67
Fixhancement: trim whitespace for some text searches (#11357) 2025-11-14 08:09:09 -08:00