Updated v3 Ideas List (markdown)

2025-07-09 03:04:12 -04:00 · 2025-07-01 09:44:25 -07:00 · a550f7e348
commit a550f7e348
parent aa26a44a12
1 changed files with 10 additions and 20 deletions
--- a/v3-Ideas-List.md
+++ b/v3-Ideas-List.md
@@ -9,7 +9,7 @@
 ### Settings Updates

 - Remove all but Django settings from the environment
-- Separate OCR vs other settings
+- Separate OCR vs other settings (call them site settings?)
 - Create multiple levels of OCR settings:
  - A default system configuration, controlled by staff/superusers
  - A user specific settings set
@@ -29,18 +29,24 @@
 - An initial task takes the file, waits for it to be unmodified, then determines the next task to start.
 - Or alternatively, the initial task builds a pipeline and starts that.
 - Handles deciding if the file can be consumed, rather than when a new file is seen (see plugin ideas)
+- Make each step along the way a well-defined status update, sent over the websocket, but also allow configuring something like apprise/ntfy
+- TODO: If something fails along the chain, the DB shouldn't be updated. Maybe one task with multiple steps, wrapped in a transaction?
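
A minimal sketch of the "one task, multiple steps, wrapped in a transaction" idea, assuming plain `sqlite3` in place of the Django ORM. The names `run_steps`, `store_content`, `make_thumbnail`, and the `documents` table are hypothetical and only illustrate the rollback behaviour; `notify` stands in for whatever would push status over the websocket or apprise/ntfy.

```python
import sqlite3
from typing import Callable

# Hypothetical step signature: takes the connection and the package, returns the package.
Step = Callable[[sqlite3.Connection, dict], dict]


def run_steps(conn: sqlite3.Connection, package: dict, steps: list, notify: Callable[[str], None]) -> bool:
    """Run every step inside one transaction; roll back all writes if any step fails."""
    try:
        with conn:  # sqlite3 commits on success and rolls back on an exception
            for step in steps:
                notify(f"{step.__name__}: started")
                package = step(conn, package)
                notify(f"{step.__name__}: done")
    except Exception as exc:
        notify(f"failed: {exc}")
        return False
    return True


def store_content(conn: sqlite3.Connection, package: dict) -> dict:
    # Writes to the DB early in the chain.
    conn.execute("INSERT INTO documents (title) VALUES (?)", (package["title"],))
    return package


def make_thumbnail(conn: sqlite3.Connection, package: dict) -> dict:
    # Simulates a failure later in the chain.
    raise RuntimeError("thumbnail generation failed")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (title TEXT)")
events: list = []
ok = run_steps(conn, {"title": "invoice.pdf"}, [store_content, make_thumbnail], events.append)
row_count = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
```

In Django the same shape would presumably use `transaction.atomic()` around the steps; that mapping is an assumption, not something this list has settled on.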

 ### Actual Plugins

 - Design a system to allow plugins, while splitting apart the current code into plugins
 - I can see the following being plugins:
-  - Parsers (obviously.  Includes things like AI/cloud OCR to get the content or even could talk to a remote, but local network API)
+  - Parsers (obviously; includes things like AI/cloud OCR to get the content, or could even talk to a remote API)
  - Archive generation (example, use Gotenberg to convert a PDF to PDF/A instead of ocrmypdf)
  - Thumbnail generation (maybe you want to handle PDFs differently than JPEGs?)
  - Date parsing (handling non-latin dates, for example)
  - Machine learning (provides an interface which returns the proposed tags, type, etc)
-- Ideally, plugins should be registered when installed, declaring what mime types they support
+- Ideally, plugins should be registered when installed, declaring what mime types they support, with some sort of conflict resolution
 - With the settings updates above, a workflow could also be used to set the parser based on matching certain values
+- Provide "paperless", a core set of functionality, including models
+- Provide the existing parsers, re-configured to match the new format
+- Rework the other parts to conform to the plugin API spec
+
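
One possible shape for registration by mime type with conflict resolution, sketched under assumptions: `ParserPlugin`, `PluginRegistry`, the `priority` field, and the plugin names are all hypothetical, and "highest priority wins, ties are an error" is just one resolution policy among several.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ParserPlugin:
    name: str
    mime_types: tuple
    priority: int = 0  # hypothetical tie-breaker; higher wins


class PluginRegistry:
    def __init__(self) -> None:
        # Maps a mime type to every plugin that declared support for it.
        self._by_mime = defaultdict(list)

    def register(self, plugin: ParserPlugin) -> None:
        for mime in plugin.mime_types:
            self._by_mime[mime].append(plugin)

    def resolve(self, mime_type: str) -> ParserPlugin:
        candidates = sorted(
            self._by_mime.get(mime_type, []),
            key=lambda p: p.priority,
            reverse=True,
        )
        if not candidates:
            raise LookupError(f"no plugin supports {mime_type}")
        # Two plugins with the same top priority is an unresolved conflict.
        if len(candidates) > 1 and candidates[0].priority == candidates[1].priority:
            raise ValueError(
                f"conflict for {mime_type}: {candidates[0].name} vs {candidates[1].name}"
            )
        return candidates[0]


registry = PluginRegistry()
registry.register(ParserPlugin("local_ocr", ("application/pdf", "image/png")))
registry.register(ParserPlugin("remote_ocr", ("application/pdf",), priority=10))
chosen = registry.resolve("application/pdf").name
```

A workflow-selected parser (from the settings ideas) could bypass `resolve` entirely, so the priority would only matter as the fallback.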

 ### Simpler consumer

@@ -85,23 +91,6 @@
 - The getting of a image or PDF document content should be separated from the generation of an archive file
 - Just too many interactions between them, leading to odd combinations

-## Break apart consumer
-
-- The consumer does so much stuff, break it apart into smaller, more discrete steps
-- Make each step well defined with possible status/states to report over the websocket and/or notifications
-- Make it a chain of tasks, passing a package through which accumulates data, etc, before being saved
-
-## Settings Manager
-
-- Allow multiple levels of settings to be defined
-  - From matching, apply certain settings
-  - From the user (if known), apply their settings
-  - From the system wide settings
-  - From environment variable settings
-  - Then defaults
-- settings at lower levels have less priority, so a matched setting is never changed
-- Settings travel through the new consumer with the document
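
The priority order in the list above (matched, user, system, environment, defaults) could be modelled with `collections.ChainMap`, which returns the value from the first mapping that has the key. This is only a sketch: `resolve_settings` and the setting names are hypothetical.

```python
from collections import ChainMap


def resolve_settings(matched: dict, user: dict, system: dict, env: dict, defaults: dict) -> ChainMap:
    """Earlier mappings win, so a matched setting is never overridden by a lower level."""
    return ChainMap(matched, user, system, env, defaults)


settings = resolve_settings(
    matched={"ocr_language": "deu"},
    user={"ocr_language": "fra", "ocr_mode": "redo"},
    system={"ocr_mode": "skip", "ocr_pages": 5},
    env={"ocr_pages": 0},
    defaults={"ocr_language": "eng", "ocr_mode": "skip", "ocr_pages": 0, "ocr_clean": "clean"},
)

language = settings["ocr_language"]  # taken from the matched level
mode = settings["ocr_mode"]          # taken from the user level
pages = settings["ocr_pages"]        # taken from the system level
clean = settings["ocr_clean"]        # falls through to the defaults
```

Because the result is a single mapping, it could travel through the consumer alongside the document without each step needing to know where a value came from.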
-
 ## Django Ninja

 - Really like the OpenAPI spec it generates
@@ -114,6 +103,7 @@
  - Could track, with some resolution, when a token was last used.  Might be nice to display and allow removing old tokens which haven't been used
  - Could implement expiration too
 - Async pagination isn't working quite yet
+- No idea about allauth/oidc integration

 ## Vector Embeddings