Mirror of https://github.com/paperless-ngx/paperless-ngx.git, synced 2025-10-31 02:27:10 -04:00

Documentation: Fix list indentation (#8050)

---------

Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>

parent 149d770ad1
commit 605aa50b00
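The Markdown hunks below all follow one mechanical pattern. Each `- ` bullet marker is padded to four columns (`-   `), and its 2-space continuation lines are indented to four spaces. The commit message does not say how the change was produced, so the following is only an illustrative Python sketch of that rewrite. It assumes flat `-` bullet lists whose continuation lines use exactly two spaces. Nested lists and fenced code are not handled. `reindent_bullets` is a hypothetical name and is not part of paperless-ngx or Prettier.

```python
import re

# Hypothetical helper (not part of paperless-ngx or Prettier) that mimics the
# 2-space -> 4-space bullet rewrite. Assumes flat "-" lists whose continuation
# lines use exactly two spaces; nested lists and fenced code are not handled.
BULLET = re.compile(r"^- (?=\S)")  # "- " followed directly by text


def reindent_bullets(markdown: str) -> str:
    out = []
    in_item = False  # True while we are inside a bullet item
    for line in markdown.splitlines(keepends=True):
        if BULLET.match(line):
            # Pad the marker to four columns: "- text" -> "-   text".
            out.append("-   " + line[2:])
            in_item = True
        elif in_item and line.startswith("  ") and not line.startswith("   "):
            # Two-space continuation line -> four-space continuation line.
            out.append("  " + line)
        else:
            # Blank or unrelated lines end the item and pass through unchanged.
            in_item = False
            out.append(line)
    return "".join(out)


before = (
    "- **All:** Requires that every word provided appears in the PDF,\n"
    "  albeit not in the order provided.\n"
)
print(reindent_bullets(before), end="")
# -   **All:** Requires that every word provided appears in the PDF,
#     albeit not in the order provided.
```

An already converted `-   ` marker no longer matches the `- ` pattern. As a result, running the sketch a second time leaves the text unchanged.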
@@ -7,9 +7,9 @@
     "trailingComma": "es5",
     "overrides": [
         {
-            "files": ["index.md", "administration.md"],
+            "files": ["docs/*.md"],
             "options": {
-                "tabWidth": 4
+                "tabWidth": 4,
             }
         }
     ]
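The hunk above appears to come from the project's Prettier configuration (its filename is not shown here, but `trailingComma`, `overrides` and `tabWidth` are Prettier options). It widens the 4-space override from `index.md` and `administration.md` to every `docs/*.md` file, presumably so that Prettier keeps the 4-space list layout in the Markdown hunks that follow. The commit does not state why that layout is wanted. One plausible reason is that the `!!! note` admonitions suggest the docs are rendered with Python-Markdown (as used by MkDocs), which expects content nested in a list item to be indented by four spaces.

As a rough safeguard against lists that still use the old style, here is a minimal Python sketch that reports bullet continuation lines indented by fewer than four spaces. `find_short_indents` is a hypothetical helper, and it assumes flat `-` lists only.

```python
import re

# Hypothetical lint helper (not part of paperless-ngx or Prettier): reports
# continuation lines of flat "-" bullet items indented by 1-3 spaces.
# Nested lists, numbered lists and fenced code are not special-cased.
BULLET = re.compile(r"^-\s+\S")


def find_short_indents(markdown: str, width: int = 4) -> list[int]:
    """Return 1-based line numbers of under-indented continuation lines."""
    problems = []
    in_item = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        if BULLET.match(line):
            in_item = True
        elif not line.strip():
            in_item = False  # a blank line ends the item in this simple model
        elif in_item:
            indent = len(line) - len(line.lstrip(" "))
            if indent == 0:
                in_item = False  # unindented text is not treated as continuation
            elif indent < width:
                problems.append(number)
    return problems


old = (
    "- **Exact:** Matches only if the match appears exactly as provided\n"
    "  (i.e. preserve ordering) in the PDF.\n"
)
new = (
    "-   **Exact:** Matches only if the match appears exactly as provided\n"
    "    (i.e. preserve ordering) in the PDF.\n"
)
print(find_short_indents(old))  # [2]
print(find_short_indents(new))  # []
```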
@@ -25,20 +25,20 @@ documents.
 
 The following algorithms are available:
 
-- **None:** No matching will be performed.
-- **Any:** Looks for any occurrence of any word provided in match in
-  the PDF. If you define the match as `Bank1 Bank2`, it will match
-  documents containing either of these terms.
-- **All:** Requires that every word provided appears in the PDF,
-  albeit not in the order provided.
-- **Exact:** Matches only if the match appears exactly as provided
-  (i.e. preserve ordering) in the PDF.
-- **Regular expression:** Parses the match as a regular expression and
-  tries to find a match within the document.
-- **Fuzzy match:** Uses a partial matching based on locating the tag text
-  inside the document, using a [partial ratio](https://rapidfuzz.github.io/RapidFuzz/Usage/fuzz.html#partial-ratio)
-- **Auto:** Tries to automatically match new documents. This does not
-  require you to set a match. See the [notes below](#automatic-matching).
+-   **None:** No matching will be performed.
+-   **Any:** Looks for any occurrence of any word provided in match in
+    the PDF. If you define the match as `Bank1 Bank2`, it will match
+    documents containing either of these terms.
+-   **All:** Requires that every word provided appears in the PDF,
+    albeit not in the order provided.
+-   **Exact:** Matches only if the match appears exactly as provided
+    (i.e. preserve ordering) in the PDF.
+-   **Regular expression:** Parses the match as a regular expression and
+    tries to find a match within the document.
+-   **Fuzzy match:** Uses a partial matching based on locating the tag text
+    inside the document, using a [partial ratio](https://rapidfuzz.github.io/RapidFuzz/Usage/fuzz.html#partial-ratio)
+-   **Auto:** Tries to automatically match new documents. This does not
+    require you to set a match. See the [notes below](#automatic-matching).
 
 When using the _any_ or _all_ matching algorithms, you can search for
 terms that consist of multiple words by enclosing them in double quotes.
@@ -69,33 +69,33 @@ Paperless tries to hide much of the involved complexity with this
 approach. However, there are a couple caveats you need to keep in mind
 when using this feature:
 
-- Changes to your documents are not immediately reflected by the
-  matching algorithm. The neural network needs to be _trained_ on your
-  documents after changes. Paperless periodically (default: once each
-  hour) checks for changes and does this automatically for you.
-- The Auto matching algorithm only takes documents into account which
-  are NOT placed in your inbox (i.e. have any inbox tags assigned to
-  them). This ensures that the neural network only learns from
-  documents which you have correctly tagged before.
-- The matching algorithm can only work if there is a correlation
-  between the tag, correspondent, document type, or storage path and
-  the document itself. Your bank statements usually contain your bank
-  account number and the name of the bank, so this works reasonably
-  well, However, tags such as "TODO" cannot be automatically
-  assigned.
-- The matching algorithm needs a reasonable number of documents to
-  identify when to assign tags, correspondents, storage paths, and
-  types. If one out of a thousand documents has the correspondent
-  "Very obscure web shop I bought something five years ago", it will
-  probably not assign this correspondent automatically if you buy
-  something from them again. The more documents, the better.
-- Paperless also needs a reasonable amount of negative examples to
-  decide when not to assign a certain tag, correspondent, document
-  type, or storage path. This will usually be the case as you start
-  filling up paperless with documents. Example: If all your documents
-  are either from "Webshop" or "Bank", paperless will assign one
-  of these correspondents to ANY new document, if both are set to
-  automatic matching.
+-   Changes to your documents are not immediately reflected by the
+    matching algorithm. The neural network needs to be _trained_ on your
+    documents after changes. Paperless periodically (default: once each
+    hour) checks for changes and does this automatically for you.
+-   The Auto matching algorithm only takes documents into account which
+    are NOT placed in your inbox (i.e. have any inbox tags assigned to
+    them). This ensures that the neural network only learns from
+    documents which you have correctly tagged before.
+-   The matching algorithm can only work if there is a correlation
+    between the tag, correspondent, document type, or storage path and
+    the document itself. Your bank statements usually contain your bank
+    account number and the name of the bank, so this works reasonably
+    well, However, tags such as "TODO" cannot be automatically
+    assigned.
+-   The matching algorithm needs a reasonable number of documents to
+    identify when to assign tags, correspondents, storage paths, and
+    types. If one out of a thousand documents has the correspondent
+    "Very obscure web shop I bought something five years ago", it will
+    probably not assign this correspondent automatically if you buy
+    something from them again. The more documents, the better.
+-   Paperless also needs a reasonable amount of negative examples to
+    decide when not to assign a certain tag, correspondent, document
+    type, or storage path. This will usually be the case as you start
+    filling up paperless with documents. Example: If all your documents
+    are either from "Webshop" or "Bank", paperless will assign one
+    of these correspondents to ANY new document, if both are set to
+    automatic matching.
 
 ## Hooking into the consumption process {#consume-hooks}
 
@@ -242,12 +242,12 @@ webserver:
 
 Troubleshooting:
 
-- Monitor the Docker Compose log
-  `cd ~/paperless-ngx; docker compose logs -f`
-- Check your script's permission e.g. in case of permission error
-  `sudo chmod 755 post-consumption-example.sh`
-- Pipe your scripts's output to a log file e.g.
-  `echo "${DOCUMENT_ID}" | tee --append /usr/src/paperless/scripts/post-consumption-example.log`
+-   Monitor the Docker Compose log
+    `cd ~/paperless-ngx; docker compose logs -f`
+-   Check your script's permission e.g. in case of permission error
+    `sudo chmod 755 post-consumption-example.sh`
+-   Pipe your scripts's output to a log file e.g.
+    `echo "${DOCUMENT_ID}" | tee --append /usr/src/paperless/scripts/post-consumption-example.log`
 
 ## File name handling {#file-name-handling}
 
@@ -302,35 +302,35 @@ will create a directory structure as follows:
 
 Paperless provides the following variables for use within filenames:
 
-- `{{ asn }}`: The archive serial number of the document, or "none".
-- `{{ correspondent }}`: The name of the correspondent, or "none".
-- `{{ document_type }}`: The name of the document type, or "none".
-- `{{ tag_list }}`: A comma separated list of all tags assigned to the
-  document.
-- `{{ title }}`: The title of the document.
-- `{{ created }}`: The full date (ISO format) the document was created.
-- `{{ created_year }}`: Year created only, formatted as the year with
-  century.
-- `{{ created_year_short }}`: Year created only, formatted as the year
-  without century, zero padded.
-- `{{ created_month }}`: Month created only (number 01-12).
-- `{{ created_month_name }}`: Month created name, as per locale
-- `{{ created_month_name_short }}`: Month created abbreviated name, as per
-  locale
-- `{{ created_day }}`: Day created only (number 01-31).
-- `{{ added }}`: The full date (ISO format) the document was added to
-  paperless.
-- `{{ added_year }}`: Year added only.
-- `{{ added_year_short }}`: Year added only, formatted as the year without
-  century, zero padded.
-- `{{ added_month }}`: Month added only (number 01-12).
-- `{{ added_month_name }}`: Month added name, as per locale
-- `{{ added_month_name_short }}`: Month added abbreviated name, as per
-  locale
-- `{{ added_day }}`: Day added only (number 01-31).
-- `{{ owner_username }}`: Username of document owner, if any, or "none"
-- `{{ original_name }}`: Document original filename, minus the extension, if any, or "none"
-- `{{ doc_pk }}`: The paperless identifier (primary key) for the document.
+-   `{{ asn }}`: The archive serial number of the document, or "none".
+-   `{{ correspondent }}`: The name of the correspondent, or "none".
+-   `{{ document_type }}`: The name of the document type, or "none".
+-   `{{ tag_list }}`: A comma separated list of all tags assigned to the
+    document.
+-   `{{ title }}`: The title of the document.
+-   `{{ created }}`: The full date (ISO format) the document was created.
+-   `{{ created_year }}`: Year created only, formatted as the year with
+    century.
+-   `{{ created_year_short }}`: Year created only, formatted as the year
+    without century, zero padded.
+-   `{{ created_month }}`: Month created only (number 01-12).
+-   `{{ created_month_name }}`: Month created name, as per locale
+-   `{{ created_month_name_short }}`: Month created abbreviated name, as per
+    locale
+-   `{{ created_day }}`: Day created only (number 01-31).
+-   `{{ added }}`: The full date (ISO format) the document was added to
+    paperless.
+-   `{{ added_year }}`: Year added only.
+-   `{{ added_year_short }}`: Year added only, formatted as the year without
+    century, zero padded.
+-   `{{ added_month }}`: Month added only (number 01-12).
+-   `{{ added_month_name }}`: Month added name, as per locale
+-   `{{ added_month_name_short }}`: Month added abbreviated name, as per
+    locale
+-   `{{ added_day }}`: Day added only (number 01-31).
+-   `{{ owner_username }}`: Username of document owner, if any, or "none"
+-   `{{ original_name }}`: Document original filename, minus the extension, if any, or "none"
+-   `{{ doc_pk }}`: The paperless identifier (primary key) for the document.
 
 !!! warning
 
@@ -381,10 +381,10 @@ before empty placeholders are removed as well, empty directories are omitted.
 When a single storage layout is not sufficient for your use case, storage paths allow for more complex
 structure to set precisely where each document is stored in the file system.
 
-- Each storage path is a [`PAPERLESS_FILENAME_FORMAT`](configuration.md#PAPERLESS_FILENAME_FORMAT) and
-  follows the rules described above
-- Each document is assigned a storage path using the matching algorithms described above, but can be
-  overwritten at any time
+-   Each storage path is a [`PAPERLESS_FILENAME_FORMAT`](configuration.md#PAPERLESS_FILENAME_FORMAT) and
+    follows the rules described above
+-   Each document is assigned a storage path using the matching algorithms described above, but can be
+    overwritten at any time
 
 For example, you could define the following two storage paths:
 
@@ -435,8 +435,8 @@ with more complex logic.
 
 #### Additional Variables
 
-- `{{ tag_name_list }}`: A list of tag names applied to the document, ordered by the tag name. Note this is a list, not a single string
-- `{{ custom_fields }}`: A mapping of custom field names to their type and value. A user can access the mapping by field name or check if a field is applied by checking its existence in the variable.
+-   `{{ tag_name_list }}`: A list of tag names applied to the document, ordered by the tag name. Note this is a list, not a single string
+-   `{{ custom_fields }}`: A mapping of custom field names to their type and value. A user can access the mapping by field name or check if a field is applied by checking its existence in the variable.
 
 !!! tip
 
@@ -532,15 +532,15 @@ installation, you can use volumes to accomplish this:
 
 ```yaml
 services:
   # ...
-  webserver:
-    environment:
-      - PAPERLESS_ENABLE_FLOWER
-    ports:
-      - 5555:5555 # (2)!
-    # ...
-    volumes:
-      - /path/to/my/flowerconfig.py:/usr/src/paperless/src/paperless/flowerconfig.py:ro # (1)!
+    webserver:
+        environment:
+            - PAPERLESS_ENABLE_FLOWER
+        ports:
+            - 5555:5555 # (2)!
+        # ...
+        volumes:
+            - /path/to/my/flowerconfig.py:/usr/src/paperless/src/paperless/flowerconfig.py:ro # (1)!
 ```
 
 1. Note the `:ro` tag means the file will be mounted as read only.
@@ -571,11 +571,11 @@ For example, using Docker Compose:
 
 ```yaml
 services:
   # ...
-  webserver:
-    # ...
-    volumes:
-      - /path/to/my/scripts:/custom-cont-init.d:ro # (1)!
+    webserver:
+        # ...
+        volumes:
+            - /path/to/my/scripts:/custom-cont-init.d:ro # (1)!
 ```
 
 1. Note the `:ro` tag means the folder will be mounted as read only. This is for extra security against changes
@@ -623,16 +623,16 @@ Paperless is able to utilize barcodes for automatically performing some tasks.
 
 At this time, the library utilized for detection of barcodes supports the following types:
 
-- AN-13/UPC-A
-- UPC-E
-- EAN-8
-- Code 128
-- Code 93
-- Code 39
-- Codabar
-- Interleaved 2 of 5
-- QR Code
-- SQ Code
+-   AN-13/UPC-A
+-   UPC-E
+-   EAN-8
+-   Code 128
+-   Code 93
+-   Code 39
+-   Codabar
+-   Interleaved 2 of 5
+-   QR Code
+-   SQ Code
 
 You may check for updates on the [zbar library homepage](https://github.com/mchehab/zbar).
 For usage in Paperless, the type of barcode does not matter, only the contents of it.
@@ -819,9 +819,9 @@ If using docker, you'll need to add the following volume mounts to your `docker-
 
 ```yaml
 webserver:
-  volumes:
-    - /home/user/.gnupg/pubring.gpg:/usr/src/paperless/.gnupg/pubring.gpg
-    - <path to gpg-agent.extra socket>:/usr/src/paperless/.gnupg/S.gpg-agent
+    volumes:
+        - /home/user/.gnupg/pubring.gpg:/usr/src/paperless/.gnupg/pubring.gpg
+        - <path to gpg-agent.extra socket>:/usr/src/paperless/.gnupg/S.gpg-agent
 ```
 
 For a 'bare-metal' installation no further configuration is necessary. If you
@@ -829,9 +829,9 @@ want to use a separate `GNUPG_HOME`, you can do so by configuring the [PAPERLESS
 
 ### Troubleshooting
 
-- Make sure, that `gpg-agent` is running on your host machine
-- Make sure, that encryption and decryption works from inside the container using the `gpg` commands from above.
-- Check that all files in `/usr/src/paperless/.gnupg` have correct permissions
+-   Make sure, that `gpg-agent` is running on your host machine
+-   Make sure, that encryption and decryption works from inside the container using the `gpg` commands from above.
+-   Check that all files in `/usr/src/paperless/.gnupg` have correct permissions
 
 ```shell
 paperless@9da1865df327:~/.gnupg$ ls -al

332 docs/api.md

@@ -8,23 +8,23 @@ most of the available filters and ordering fields.
 
 The API provides the following main endpoints:
 
-- `/api/correspondents/`: Full CRUD support.
-- `/api/custom_fields/`: Full CRUD support.
-- `/api/documents/`: Full CRUD support, except POSTing new documents.
-  See [below](#file-uploads).
-- `/api/document_types/`: Full CRUD support.
-- `/api/groups/`: Full CRUD support.
-- `/api/logs/`: Read-Only.
-- `/api/mail_accounts/`: Full CRUD support.
-- `/api/mail_rules/`: Full CRUD support.
-- `/api/profile/`: GET, PATCH
-- `/api/share_links/`: Full CRUD support.
-- `/api/storage_paths/`: Full CRUD support.
-- `/api/tags/`: Full CRUD support.
-- `/api/tasks/`: Read-only.
-- `/api/users/`: Full CRUD support.
-- `/api/workflows/`: Full CRUD support.
-- `/api/search/` GET, see [below](#global-search).
+-   `/api/correspondents/`: Full CRUD support.
+-   `/api/custom_fields/`: Full CRUD support.
+-   `/api/documents/`: Full CRUD support, except POSTing new documents.
+    See [below](#file-uploads).
+-   `/api/document_types/`: Full CRUD support.
+-   `/api/groups/`: Full CRUD support.
+-   `/api/logs/`: Read-Only.
+-   `/api/mail_accounts/`: Full CRUD support.
+-   `/api/mail_rules/`: Full CRUD support.
+-   `/api/profile/`: GET, PATCH
+-   `/api/share_links/`: Full CRUD support.
+-   `/api/storage_paths/`: Full CRUD support.
+-   `/api/tags/`: Full CRUD support.
+-   `/api/tasks/`: Read-only.
+-   `/api/users/`: Full CRUD support.
+-   `/api/workflows/`: Full CRUD support.
+-   `/api/search/` GET, see [below](#global-search).
 
 All of these endpoints except for the logging endpoint allow you to
 fetch (and edit and delete where appropriate) individual objects by
@@ -33,32 +33,32 @@ appending their primary key to the path, e.g. `/api/documents/454/`.
 The objects served by the document endpoint contain the following
 fields:
 
-- `id`: ID of the document. Read-only.
-- `title`: Title of the document.
-- `content`: Plain text content of the document.
-- `tags`: List of IDs of tags assigned to this document, or empty
-  list.
-- `document_type`: Document type of this document, or null.
-- `correspondent`: Correspondent of this document or null.
-- `created`: The date time at which this document was created.
-- `created_date`: The date (YYYY-MM-DD) at which this document was
-  created. Optional. If also passed with created, this is ignored.
-- `modified`: The date at which this document was last edited in
-  paperless. Read-only.
-- `added`: The date at which this document was added to paperless.
-  Read-only.
-- `archive_serial_number`: The identifier of this document in a
-  physical document archive.
-- `original_file_name`: Verbose filename of the original document.
-  Read-only.
-- `archived_file_name`: Verbose filename of the archived document.
-  Read-only. Null if no archived document is available.
-- `notes`: Array of notes associated with the document.
-- `page_count`: Number of pages.
-- `set_permissions`: Allows setting document permissions. Optional,
-  write-only. See [below](#permissions).
-- `custom_fields`: Array of custom fields & values, specified as
-  `{ field: CUSTOM_FIELD_ID, value: VALUE }`
+-   `id`: ID of the document. Read-only.
+-   `title`: Title of the document.
+-   `content`: Plain text content of the document.
+-   `tags`: List of IDs of tags assigned to this document, or empty
+    list.
+-   `document_type`: Document type of this document, or null.
+-   `correspondent`: Correspondent of this document or null.
+-   `created`: The date time at which this document was created.
+-   `created_date`: The date (YYYY-MM-DD) at which this document was
+    created. Optional. If also passed with created, this is ignored.
+-   `modified`: The date at which this document was last edited in
+    paperless. Read-only.
+-   `added`: The date at which this document was added to paperless.
+    Read-only.
+-   `archive_serial_number`: The identifier of this document in a
+    physical document archive.
+-   `original_file_name`: Verbose filename of the original document.
+    Read-only.
+-   `archived_file_name`: Verbose filename of the archived document.
+    Read-only. Null if no archived document is available.
+-   `notes`: Array of notes associated with the document.
+-   `page_count`: Number of pages.
+-   `set_permissions`: Allows setting document permissions. Optional,
+    write-only. See [below](#permissions).
+-   `custom_fields`: Array of custom fields & values, specified as
+    `{ field: CUSTOM_FIELD_ID, value: VALUE }`
 
 !!! note
 
@@ -69,11 +69,11 @@ fields:
 In addition to that, the document endpoint offers these additional
 actions on individual documents:
 
-- `/api/documents/<pk>/download/`: Download the document.
-- `/api/documents/<pk>/preview/`: Display the document inline, without
-  downloading it.
-- `/api/documents/<pk>/thumb/`: Download the PNG thumbnail of a
-  document.
+-   `/api/documents/<pk>/download/`: Download the document.
+-   `/api/documents/<pk>/preview/`: Display the document inline, without
+    downloading it.
+-   `/api/documents/<pk>/thumb/`: Download the PNG thumbnail of a
+    document.
 
 Paperless generates archived PDF/A documents from consumed files and
 stores both the original files as well as the archived files. By
@@ -107,30 +107,30 @@ Access the metadata of a document with an ID `id` at
 
 The endpoint reports the following data:
 
-- `original_checksum`: MD5 checksum of the original document.
-- `original_size`: Size of the original document, in bytes.
-- `original_mime_type`: Mime type of the original document.
-- `media_filename`: Current filename of the document, under which it
-  is stored inside the media directory.
-- `has_archive_version`: True, if this document is archived, false
-  otherwise.
-- `original_metadata`: A list of metadata associated with the original
-  document. See below.
-- `archive_checksum`: MD5 checksum of the archived document, or null.
-- `archive_size`: Size of the archived document in bytes, or null.
-- `archive_metadata`: Metadata associated with the archived document,
-  or null. See below.
+-   `original_checksum`: MD5 checksum of the original document.
+-   `original_size`: Size of the original document, in bytes.
+-   `original_mime_type`: Mime type of the original document.
+-   `media_filename`: Current filename of the document, under which it
+    is stored inside the media directory.
+-   `has_archive_version`: True, if this document is archived, false
+    otherwise.
+-   `original_metadata`: A list of metadata associated with the original
+    document. See below.
+-   `archive_checksum`: MD5 checksum of the archived document, or null.
+-   `archive_size`: Size of the archived document in bytes, or null.
+-   `archive_metadata`: Metadata associated with the archived document,
+    or null. See below.
 
 File metadata is reported as a list of objects in the following form:
 
 ```json
 [
-  {
-    "namespace": "http://ns.adobe.com/pdf/1.3/",
-    "prefix": "pdf",
-    "key": "Producer",
-    "value": "SparklePDF, Fancy edition"
-  }
+    {
+        "namespace": "http://ns.adobe.com/pdf/1.3/",
+        "prefix": "pdf",
+        "key": "Producer",
+        "value": "SparklePDF, Fancy edition"
+    }
 ]
 ```
 
@@ -140,9 +140,9 @@ document. Paperless only reports PDF metadata at this point.
 
 ## Documents additional endpoints
 
-- `/api/documents/<id>/notes/`: Retrieve notes for a document.
-- `/api/documents/<id>/share_links/`: Retrieve share links for a document.
-- `/api/documents/<id>/history/`: Retrieve history of changes for a document.
+-   `/api/documents/<id>/notes/`: Retrieve notes for a document.
+-   `/api/documents/<id>/share_links/`: Retrieve share links for a document.
+-   `/api/documents/<id>/history/`: Retrieve history of changes for a document.
 
 ## Authorization
 
@@ -228,10 +228,10 @@ Full text searching is available on the `/api/documents/` endpoint. Two
 specific query parameters cause the API to return full text search
 results:
 
-- `/api/documents/?query=your%20search%20query`: Search for a document
-  using a full text query. For details on the syntax, see [Basic Usage - Searching](usage.md#basic-usage_searching).
-- `/api/documents/?more_like_id=1234`: Search for documents similar to
-  the document with id 1234.
+-   `/api/documents/?query=your%20search%20query`: Search for a document
+    using a full text query. For details on the syntax, see [Basic Usage - Searching](usage.md#basic-usage_searching).
+-   `/api/documents/?more_like_id=1234`: Search for documents similar to
+    the document with id 1234.
 
 Pagination works exactly the same as it does for normal requests on this
 endpoint.
@@ -268,12 +268,12 @@ attribute with various information about the search results:
 }
 ```
 
-- `score` is an indication how well this document matches the query
-  relative to the other search results.
-- `highlights` is an excerpt from the document content and highlights
-  the search terms with `<span>` tags as shown above.
-- `rank` is the index of the search results. The first result will
-  have rank 0.
+-   `score` is an indication how well this document matches the query
+    relative to the other search results.
+-   `highlights` is an excerpt from the document content and highlights
+    the search terms with `<span>` tags as shown above.
+-   `rank` is the index of the search results. The first result will
+    have rank 0.
 
 ### Filtering by custom fields
 
@@ -284,33 +284,33 @@ use cases:
 1. Documents with a custom field "due" (date) between Aug 1, 2024 and
    Sept 1, 2024 (inclusive):
 
-   `?custom_field_query=["due", "range", ["2024-08-01", "2024-09-01"]]`
+    `?custom_field_query=["due", "range", ["2024-08-01", "2024-09-01"]]`
 
 2. Documents with a custom field "customer" (text) that equals "bob"
    (case sensitive):
 
-   `?custom_field_query=["customer", "exact", "bob"]`
+    `?custom_field_query=["customer", "exact", "bob"]`
 
 3. Documents with a custom field "answered" (boolean) set to `true`:
 
-   `?custom_field_query=["answered", "exact", true]`
+    `?custom_field_query=["answered", "exact", true]`
 
 4. Documents with a custom field "favorite animal" (select) set to either
    "cat" or "dog":
 
-   `?custom_field_query=["favorite animal", "in", ["cat", "dog"]]`
+    `?custom_field_query=["favorite animal", "in", ["cat", "dog"]]`
 
 5. Documents with a custom field "address" (text) that is empty:
 
-   `?custom_field_query=["OR", ["address", "isnull", true], ["address", "exact", ""]]`
+    `?custom_field_query=["OR", ["address", "isnull", true], ["address", "exact", ""]]`
 
 6. Documents that don't have a field called "foo":
 
-   `?custom_field_query=["foo", "exists", false]`
+    `?custom_field_query=["foo", "exists", false]`
 
 7. Documents that have document links "references" to both document 3 and 7:
 
-   `?custom_field_query=["references", "contains", [3, 7]]`
+    `?custom_field_query=["references", "contains", [3, 7]]`
 
 All field types support basic operations including `exact`, `in`, `isnull`,
 and `exists`. String, URL, and monetary fields support case-insensitive
@@ -326,8 +326,8 @@ Get auto completions for a partial search term.
 
 Query parameters:
 
-- `term`: The incomplete term.
-- `limit`: Amount of results. Defaults to 10.
+-   `term`: The incomplete term.
+-   `limit`: Amount of results. Defaults to 10.
 
 Results returned by the endpoint are ordered by importance of the term
 in the document index. The first result is the term that has the highest
@@ -351,19 +351,19 @@ from there.
 
 The endpoint supports the following optional form fields:
 
-- `title`: Specify a title that the consumer should use for the
-  document.
-- `created`: Specify a DateTime where the document was created (e.g.
-  "2016-04-19" or "2016-04-19 06:15:00+02:00").
-- `correspondent`: Specify the ID of a correspondent that the consumer
-  should use for the document.
-- `document_type`: Similar to correspondent.
-- `storage_path`: Similar to correspondent.
-- `tags`: Similar to correspondent. Specify this multiple times to
-  have multiple tags added to the document.
-- `archive_serial_number`: An optional archive serial number to set.
-- `custom_fields`: An array of custom field ids to assign (with an empty
-  value) to the document.
+-   `title`: Specify a title that the consumer should use for the
+    document.
+-   `created`: Specify a DateTime where the document was created (e.g.
+    "2016-04-19" or "2016-04-19 06:15:00+02:00").
+-   `correspondent`: Specify the ID of a correspondent that the consumer
+    should use for the document.
+-   `document_type`: Similar to correspondent.
+-   `storage_path`: Similar to correspondent.
+-   `tags`: Similar to correspondent. Specify this multiple times to
+    have multiple tags added to the document.
+-   `archive_serial_number`: An optional archive serial number to set.
+-   `custom_fields`: An array of custom field ids to assign (with an empty
+    value) to the document.
 
 The endpoint will immediately return HTTP 200 if the document consumption
 process was started successfully, with the UUID of the consumption task
@@ -429,50 +429,50 @@ a json payload of the format:

 The following methods are supported:

-- `set_correspondent`
-  - Requires `parameters`: `{ "correspondent": CORRESPONDENT_ID }`
-- `set_document_type`
-  - Requires `parameters`: `{ "document_type": DOCUMENT_TYPE_ID }`
-- `set_storage_path`
-  - Requires `parameters`: `{ "storage_path": STORAGE_PATH_ID }`
-- `add_tag`
-  - Requires `parameters`: `{ "tag": TAG_ID }`
-- `remove_tag`
-  - Requires `parameters`: `{ "tag": TAG_ID }`
-- `modify_tags`
-  - Requires `parameters`: `{ "add_tags": [LIST_OF_TAG_IDS] }` and / or `{ "remove_tags": [LIST_OF_TAG_IDS] }`
-- `delete`
-  - No `parameters` required
-- `reprocess`
-  - No `parameters` required
-- `set_permissions`
-  - Requires `parameters`:
-    - `"set_permissions": PERMISSIONS_OBJ` (see format [above](#permissions)) and / or
-    - `"owner": OWNER_ID or null`
-    - `"merge": true or false` (defaults to false)
-  - The `merge` flag determines if the supplied permissions will overwrite all existing permissions (including
-    removing them) or be merged with existing permissions.
-- `merge`
-  - No additional `parameters` required.
-  - The ordering of the merged document is determined by the list of IDs.
-  - Optional `parameters`:
-    - `"metadata_document_id": DOC_ID` apply metadata (tags, correspondent, etc.) from this document to the merged document.
-    - `"delete_originals": true` to delete the original documents. This requires the calling user being the owner of
-      all documents that are merged.
-- `split`
-  - Requires `parameters`:
-    - `"pages": [..]` The list should be a list of pages and/or a ranges, separated by commas e.g. `"[1,2-3,4,5-7]"`
-  - Optional `parameters`:
-    - `"delete_originals": true` to delete the original document after consumption. This requires the calling user being the owner of
-      the document.
-  - The split operation only accepts a single document.
-- `rotate`
-  - Requires `parameters`:
-    - `"degrees": DEGREES`. Must be an integer i.e. 90, 180, 270
-- `delete_pages`
-  - Requires `parameters`:
-    - `"pages": [..]` The list should be a list of integers e.g. `"[2,3,4]"`
-  - The delete_pages operation only accepts a single document.
+-   `set_correspondent`
+    -   Requires `parameters`: `{ "correspondent": CORRESPONDENT_ID }`
+-   `set_document_type`
+    -   Requires `parameters`: `{ "document_type": DOCUMENT_TYPE_ID }`
+-   `set_storage_path`
+    -   Requires `parameters`: `{ "storage_path": STORAGE_PATH_ID }`
+-   `add_tag`
+    -   Requires `parameters`: `{ "tag": TAG_ID }`
+-   `remove_tag`
+    -   Requires `parameters`: `{ "tag": TAG_ID }`
+-   `modify_tags`
+    -   Requires `parameters`: `{ "add_tags": [LIST_OF_TAG_IDS] }` and / or `{ "remove_tags": [LIST_OF_TAG_IDS] }`
+-   `delete`
+    -   No `parameters` required
+-   `reprocess`
+    -   No `parameters` required
+-   `set_permissions`
+    -   Requires `parameters`:
+        -   `"set_permissions": PERMISSIONS_OBJ` (see format [above](#permissions)) and / or
+        -   `"owner": OWNER_ID or null`
+        -   `"merge": true or false` (defaults to false)
+    -   The `merge` flag determines if the supplied permissions will overwrite all existing permissions (including
+        removing them) or be merged with existing permissions.
+-   `merge`
+    -   No additional `parameters` required.
+    -   The ordering of the merged document is determined by the list of IDs.
+    -   Optional `parameters`:
+        -   `"metadata_document_id": DOC_ID` apply metadata (tags, correspondent, etc.) from this document to the merged document.
+        -   `"delete_originals": true` to delete the original documents. This requires the calling user being the owner of
+            all documents that are merged.
+-   `split`
+    -   Requires `parameters`:
+        -   `"pages": [..]` The list should be a list of pages and/or a ranges, separated by commas e.g. `"[1,2-3,4,5-7]"`
+    -   Optional `parameters`:
+        -   `"delete_originals": true` to delete the original document after consumption. This requires the calling user being the owner of
+            the document.
+    -   The split operation only accepts a single document.
+-   `rotate`
+    -   Requires `parameters`:
+        -   `"degrees": DEGREES`. Must be an integer i.e. 90, 180, 270
+-   `delete_pages`
+    -   Requires `parameters`:
+        -   `"pages": [..]` The list should be a list of integers e.g. `"[2,3,4]"`
+    -   The delete_pages operation only accepts a single document.

 ### Objects

@@ -494,16 +494,16 @@ operations, using the endpoint: `/api/bulk_edit_objects/`, which requires a json

 The REST API is versioned since Paperless-ngx 1.3.0.

-- Versioning ensures that changes to the API don't break older
-  clients.
-- Clients specify the specific version of the API they wish to use
-  with every request and Paperless will handle the request using the
-  specified API version.
-- Even if the underlying data model changes, older API versions will
-  always serve compatible data.
-- If no version is specified, Paperless will serve version 1 to ensure
-  compatibility with older clients that do not request a specific API
-  version.
+-   Versioning ensures that changes to the API don't break older
+    clients.
+-   Clients specify the specific version of the API they wish to use
+    with every request and Paperless will handle the request using the
+    specified API version.
+-   Even if the underlying data model changes, older API versions will
+    always serve compatible data.
+-   If no version is specified, Paperless will serve version 1 to ensure
+    compatibility with older clients that do not request a specific API
+    version.

 API versions are specified by submitting an additional HTTP `Accept`
 header with every request:
@@ -540,19 +540,19 @@ Initial API version.

 #### Version 2

-- Added field `Tag.color`. This read/write string field contains a hex
-  color such as `#a6cee3`.
-- Added read-only field `Tag.text_color`. This field contains the text
-  color to use for a specific tag, which is either black or white
-  depending on the brightness of `Tag.color`.
-- Removed field `Tag.colour`.
+-   Added field `Tag.color`. This read/write string field contains a hex
+    color such as `#a6cee3`.
+-   Added read-only field `Tag.text_color`. This field contains the text
+    color to use for a specific tag, which is either black or white
+    depending on the brightness of `Tag.color`.
+-   Removed field `Tag.colour`.

 #### Version 3

-- Permissions endpoints have been added.
-- The format of the `/api/ui_settings/` has changed.
+-   Permissions endpoints have been added.
+-   The format of the `/api/ui_settings/` has changed.

 #### Version 4

-- Consumption templates were refactored to workflows and API endpoints
-  changed as such.
+-   Consumption templates were refactored to workflows and API endpoints
+    changed as such.
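The `custom_field_query` values in the API documentation context above are JSON arrays passed as a single query-string parameter. Below is a minimal Python sketch of building such a URL with the standard library. The base URL `http://localhost:8000/api/documents/` and the helper names `custom_field_query_url` and `decode_custom_field_query` are assumptions for illustration, and the sketch only encodes the query; it does not send a request.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed base URL of a local Paperless-ngx instance (hypothetical; adjust as needed).
BASE_URL = "http://localhost:8000/api/documents/"


def custom_field_query_url(query: list, base_url: str = BASE_URL) -> str:
    """Build a URL carrying `query` as JSON in `custom_field_query` (hypothetical helper)."""
    # json.dumps maps Python True/False/None to JSON true/false/null;
    # urlencode then percent-encodes the brackets, quotes and spaces.
    return f"{base_url}?{urlencode({'custom_field_query': json.dumps(query)})}"


def decode_custom_field_query(url: str) -> list:
    """Recover the JSON array from a URL produced by custom_field_query_url."""
    raw = parse_qs(urlsplit(url).query)["custom_field_query"][0]
    return json.loads(raw)


if __name__ == "__main__":
    # "answered" (boolean) set to true, as in the documentation example.
    print(custom_field_query_url(["answered", "exact", True]))
    # "address" (text) that is empty: either null or an empty string.
    print(custom_field_query_url(
        ["OR", ["address", "isnull", True], ["address", "exact", ""]]
    ))
```

Encoding matters here because the raw JSON contains spaces, quotes and brackets that are not safe to place in a URL unescaped.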

docs/changelog.md (7560 lines changed)

File diff suppressed because it is too large

@@ -8,17 +8,17 @@ common [OCR](#ocr) related settings and some frontend settings. If set, these wi
 preference over the settings via environment variables. If not set, the environment setting
 or applicable default will be utilized instead.

-- If you run paperless on docker, `paperless.conf` is not used.
-  Rather, configure paperless by copying necessary options to
-  `docker-compose.env`.
+-   If you run paperless on docker, `paperless.conf` is not used.
+    Rather, configure paperless by copying necessary options to
+    `docker-compose.env`.

-- If you are running paperless on anything else, paperless will search
-  for the configuration file in these locations and use the first one
-  it finds:
-  - The environment variable `PAPERLESS_CONFIGURATION_PATH`
-  - `/path/to/paperless/paperless.conf`
-  - `/etc/paperless.conf`
-  - `/usr/local/etc/paperless.conf`
+-   If you are running paperless on anything else, paperless will search
+    for the configuration file in these locations and use the first one
+    it finds:
+    -   The environment variable `PAPERLESS_CONFIGURATION_PATH`
+    -   `/path/to/paperless/paperless.conf`
+    -   `/etc/paperless.conf`
+    -   `/usr/local/etc/paperless.conf`

 ## Required services

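The configuration-file lookup order described in the hunk above can be illustrated with a short sketch. It is not Paperless-ngx's own code: `find_config_file` is a hypothetical helper, and `/path/to/paperless/paperless.conf` is kept as the documentation's placeholder for the installation directory. It covers only the non-Docker case, since the documentation states that `paperless.conf` is not used on Docker.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence

# Fixed locations, in the order the documentation lists them.
# "/path/to/paperless" is the documentation's placeholder for the install directory.
DEFAULT_LOCATIONS = (
    "/path/to/paperless/paperless.conf",
    "/etc/paperless.conf",
    "/usr/local/etc/paperless.conf",
)


def find_config_file(
    env: Optional[Mapping[str, str]] = None,
    locations: Sequence[str] = DEFAULT_LOCATIONS,
) -> Optional[str]:
    """Return the first existing configuration file, or None (hypothetical helper)."""
    env = os.environ if env is None else env
    candidates = []
    # 1. A path given via PAPERLESS_CONFIGURATION_PATH is checked first.
    env_path = env.get("PAPERLESS_CONFIGURATION_PATH")
    if env_path:
        candidates.append(env_path)
    # 2. Then each fixed location, in order; the first existing file wins.
    candidates.extend(locations)
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None


if __name__ == "__main__":
    print(find_config_file() or "no configuration file found")
```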
@@ -6,23 +6,23 @@ on Paperless-ngx.
 Check out the source from GitHub. The repository is organized in the
 following way:

-- `main` always represents the latest release and will only see
-  changes when a new release is made.
-- `dev` contains the code that will be in the next release.
-- `feature-X` contains bigger changes that will be in some release, but
-  not necessarily the next one.
+-   `main` always represents the latest release and will only see
+    changes when a new release is made.
+-   `dev` contains the code that will be in the next release.
+-   `feature-X` contains bigger changes that will be in some release, but
+    not necessarily the next one.

 When making functional changes to Paperless-ngx, _always_ make your changes
 on the `dev` branch.

 Apart from that, the folder structure is as follows:

-- `docs/` - Documentation.
-- `src-ui/` - Code of the front end.
-- `src/` - Code of the back end.
-- `scripts/` - Various scripts that help with different parts of
-  development.
-- `docker/` - Files required to build the docker image.
+-   `docs/` - Documentation.
+-   `src-ui/` - Code of the front end.
+-   `src/` - Code of the back end.
+-   `scripts/` - Various scripts that help with different parts of
+    development.
+-   `docker/` - Files required to build the docker image.

 ## Contributing to Paperless-ngx

@@ -99,17 +99,17 @@ first-time setup.

 7.  You can now either ...

-    - install redis or
+    -   install redis or

-    - use the included `scripts/start_services.sh` to use docker to fire
-      up a redis instance (and some other services such as tika,
-      gotenberg and a database server) or
+    -   use the included `scripts/start_services.sh` to use docker to fire
+        up a redis instance (and some other services such as tika,
+        gotenberg and a database server) or

-    - spin up a bare redis container
+    -   spin up a bare redis container

-      ```
-      $ docker run -d -p 6379:6379 --restart unless-stopped redis:latest
-      ```
+        ```
+        $ docker run -d -p 6379:6379 --restart unless-stopped redis:latest
+        ```

 8.  Continue with either back-end or front-end development – or both :-).

@@ -122,9 +122,9 @@ work well for development, but you can use whatever you want.
 Configure the IDE to use the `src/`-folder as the base source folder.
 Configure the following launch configurations in your IDE:

-- `python3 manage.py runserver`
-- `python3 manage.py document_consumer`
-- `celery --app paperless worker -l DEBUG` (or any other log level)
+-   `python3 manage.py runserver`
+-   `python3 manage.py document_consumer`
+-   `celery --app paperless worker -l DEBUG` (or any other log level)

 To start them all:

@@ -150,11 +150,11 @@ $ ng build --configuration production

 ### Testing

-- Run `pytest` in the `src/` directory to execute all tests. This also
-  generates a HTML coverage report. When runnings test, `paperless.conf`
-  is loaded as well. However, the tests rely on the default
-  configuration. This is not ideal. But for now, make sure no settings
-  except for DEBUG are overridden when testing.
+-   Run `pytest` in the `src/` directory to execute all tests. This also
+    generates a HTML coverage report. When runnings test, `paperless.conf`
+    is loaded as well. However, the tests rely on the default
+    configuration. This is not ideal. But for now, make sure no settings
+    except for DEBUG are overridden when testing.

 !!! note

@@ -245,14 +245,14 @@ these parts have to be translated separately.

 ### Front end localization

-- The AngularJS front end does localization according to the [Angular
-  documentation](https://angular.io/guide/i18n).
-- The source language of the project is "en_US".
-- The source strings end up in the file `src-ui/messages.xlf`.
-- The translated strings need to be placed in the
-  `src-ui/src/locale/` folder.
-- In order to extract added or changed strings from the source files,
-  call `ng extract-i18n`.
+-   The AngularJS front end does localization according to the [Angular
+    documentation](https://angular.io/guide/i18n).
+-   The source language of the project is "en_US".
+-   The source strings end up in the file `src-ui/messages.xlf`.
+-   The translated strings need to be placed in the
+    `src-ui/src/locale/` folder.
+-   In order to extract added or changed strings from the source files,
+    call `ng extract-i18n`.

 Adding new languages requires adding the translated files in the
 `src-ui/src/locale/` folder and adjusting a couple files.
@@ -298,18 +298,18 @@ A majority of the strings that appear in the back end appear only when
 the admin is used. However, some of these are still shown on the front
 end (such as error messages).

-- The django application does localization according to the [Django
-  documentation](https://docs.djangoproject.com/en/3.1/topics/i18n/translation/).
-- The source language of the project is "en_US".
-- Localization files end up in the folder `src/locale/`.
-- In order to extract strings from the application, call
-  `python3 manage.py makemessages -l en_US`. This is important after
-  making changes to translatable strings.
-- The message files need to be compiled for them to show up in the
-  application. Call `python3 manage.py compilemessages` to do this.
-  The generated files don't get committed into git, since these are
-  derived artifacts. The build pipeline takes care of executing this
-  command.
+-   The django application does localization according to the [Django
+    documentation](https://docs.djangoproject.com/en/3.1/topics/i18n/translation/).
+-   The source language of the project is "en_US".
+-   Localization files end up in the folder `src/locale/`.
+-   In order to extract strings from the application, call
+    `python3 manage.py makemessages -l en_US`. This is important after
+    making changes to translatable strings.
+-   The message files need to be compiled for them to show up in the
+    application. Call `python3 manage.py compilemessages` to do this.
+    The generated files don't get committed into git, since these are
+    derived artifacts. The build pipeline takes care of executing this
+    command.

 Adding new languages requires adding the translated files in the
 `src/locale/`-folder and adjusting the file
@@ -378,10 +378,10 @@ base code.
 Paperless-ngx uses parsers to add documents. A parser is
 responsible for:

-- Retrieving the content from the original
-- Creating a thumbnail
-- _optional:_ Retrieving a created date from the original
-- _optional:_ Creating an archived document from the original
+-   Retrieving the content from the original
+-   Creating a thumbnail
+-   _optional:_ Retrieving a created date from the original
+-   _optional:_ Creating an archived document from the original

 Custom parsers can be added to Paperless-ngx to support more file types. In
 order to do that, you need to write the parser itself and announce its
@@ -439,14 +439,14 @@ def myparser_consumer_declaration(sender, **kwargs):
     }
 ```

-- `parser` is a reference to a class that extends `DocumentParser`.
-- `weight` is used whenever two or more parsers are able to parse a
-  file: The parser with the higher weight wins. This can be used to
-  override the parsers provided by Paperless-ngx.
-- `mime_types` is a dictionary. The keys are the mime types your
-  parser supports and the value is the default file extension that
-  Paperless-ngx should use when storing files and serving them for
-  download. We could guess that from the file extensions, but some
-  mime types have many extensions associated with them and the Python
-  methods responsible for guessing the extension do not always return
-  the same value.
+-   `parser` is a reference to a class that extends `DocumentParser`.
+-   `weight` is used whenever two or more parsers are able to parse a
+    file: The parser with the higher weight wins. This can be used to
+    override the parsers provided by Paperless-ngx.
+-   `mime_types` is a dictionary. The keys are the mime types your
+    parser supports and the value is the default file extension that
+    Paperless-ngx should use when storing files and serving them for
+    download. We could guess that from the file extensions, but some
+    mime types have many extensions associated with them and the Python
+    methods responsible for guessing the extension do not always return
+    the same value.
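The `weight` rule in the parser documentation above (when two or more parsers can handle a file, the one with the higher weight wins) can be sketched as a selection over declaration dictionaries with the `parser`, `weight`, and `mime_types` keys described there. `TextParser`, `CustomTextParser`, and `select_declaration` are hypothetical names; this illustrates the rule and is not how Paperless-ngx itself implements parser selection.

```python
from typing import List, Optional


class TextParser:
    """Stand-in for a built-in parser class extending DocumentParser (hypothetical)."""


class CustomTextParser:
    """Stand-in for a custom parser meant to override the built-in one (hypothetical)."""


# Declarations shaped like the dictionary described in the documentation:
# `parser`, `weight`, and `mime_types` (mime type -> default file extension).
DECLARATIONS: List[dict] = [
    {"parser": TextParser, "weight": 0, "mime_types": {"text/plain": ".txt"}},
    {"parser": CustomTextParser, "weight": 10, "mime_types": {"text/plain": ".txt"}},
]


def select_declaration(mime_type: str, declarations: List[dict] = DECLARATIONS) -> Optional[dict]:
    """Return the highest-weight declaration supporting mime_type, or None."""
    supporting = [d for d in declarations if mime_type in d["mime_types"]]
    if not supporting:
        return None
    # max() keeps the first of equally weighted declarations (an arbitrary tie-break here).
    return max(supporting, key=lambda d: d["weight"])


if __name__ == "__main__":
    best = select_declaration("text/plain")
    print(best["parser"].__name__, best["mime_types"]["text/plain"])
```

Returning the whole declaration, rather than only the class, keeps the default extension from `mime_types` available alongside the chosen parser.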

docs/faq.md (44 lines changed)

@@ -40,28 +40,28 @@ system. On Linux, chances are high that this location is
 You can always drag those files out of that folder to use them
 elsewhere. Here are a couple notes about that.

-- Paperless-ngx never modifies your original documents. It keeps
-  checksums of all documents and uses a scheduled sanity checker to
-  check that they remain the same.
-- By default, paperless uses the internal ID of each document as its
-  filename. This might not be very convenient for export. However, you
-  can adjust the way files are stored in paperless by
-  [configuring the filename format](advanced_usage.md#file-name-handling).
-- [The exporter](administration.md#exporter) is
-  another easy way to get your files out of paperless with reasonable
-  file names.
+-   Paperless-ngx never modifies your original documents. It keeps
+    checksums of all documents and uses a scheduled sanity checker to
+    check that they remain the same.
+-   By default, paperless uses the internal ID of each document as its
+    filename. This might not be very convenient for export. However, you
+    can adjust the way files are stored in paperless by
+    [configuring the filename format](advanced_usage.md#file-name-handling).
+-   [The exporter](administration.md#exporter) is
+    another easy way to get your files out of paperless with reasonable
+    file names.

 ## _What file types does paperless-ngx support?_

 **A:** Currently, the following files are supported:

-- PDF documents, PNG images, JPEG images, TIFF images, GIF images and
-  WebP images are processed with OCR and converted into PDF documents.
-- Plain text documents are supported as well and are added verbatim to
-  paperless.
-- With the optional Tika integration enabled (see [Tika configuration](https://docs.paperless-ngx.com/configuration#tika)),
-  Paperless also supports various Office documents (.docx, .doc, odt,
-  .ppt, .pptx, .odp, .xls, .xlsx, .ods).
+-   PDF documents, PNG images, JPEG images, TIFF images, GIF images and
+    WebP images are processed with OCR and converted into PDF documents.
+-   Plain text documents are supported as well and are added verbatim to
+    paperless.
+-   With the optional Tika integration enabled (see [Tika configuration](https://docs.paperless-ngx.com/configuration#tika)),
+    Paperless also supports various Office documents (.docx, .doc, odt,
+    .ppt, .pptx, .odp, .xls, .xlsx, .ods).

 Paperless-ngx determines the type of a file by inspecting its content.
 The file extensions do not matter.
@@ -127,11 +127,11 @@ ASGI-enabled web server as well that processes WebSocket connections,
 and configure Apache to redirect WebSocket connections to this server.
 Multiple options for ASGI servers exist:

-- `gunicorn` with `uvicorn` as the worker implementation (the default
-  of paperless)
-- `daphne` as a standalone server, which is the reference
-  implementation for ASGI.
-- `uvicorn` as a standalone server
+-   `gunicorn` with `uvicorn` as the worker implementation (the default
+    of paperless)
+-   `daphne` as a standalone server, which is the reference
+    implementation for ASGI.
+-   `uvicorn` as a standalone server

 ## _What about the Redis licensing change and using one of the open source forks_?

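The FAQ context above states that Paperless-ngx determines a file's type by inspecting its content rather than its extension. As a simplified illustration, the sketch below matches a few well-known leading byte signatures for the image and PDF formats the FAQ lists. It is a hand-written stand-in, not Paperless-ngx's mechanism: the setup documentation in this commit lists `libmagic-dev` for mime type detection, and libmagic covers far more formats than this table. `sniff_mime_type` and `sniff_file` are hypothetical names, and plain text is deliberately left unrecognized.

```python
from typing import Optional

# A few well-known file signatures ("magic bytes"); far from exhaustive.
SIGNATURES = (
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
)


def sniff_mime_type(data: bytes) -> Optional[str]:
    """Guess a mime type from the leading bytes, ignoring any file name."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None  # unknown, including plain text


def sniff_file(path: str) -> Optional[str]:
    """Read just enough of a file to check the signatures above."""
    with open(path, "rb") as handle:
        return sniff_mime_type(handle.read(16))
```

Because only the leading bytes are inspected, a PDF saved as `invoice.txt` is still reported as `application/pdf`.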

docs/setup.md (222 lines changed)

@@ -2,11 +2,11 @@

 You can go multiple routes to setup and run Paperless:

-- [Use the easy install docker script](#docker_script)
-- [Pull the image from Docker Hub](#docker_hub)
-- [Build the Docker image yourself](#docker_build)
-- [Install Paperless directly on your system manually (bare metal)](#bare_metal)
-- A user-maintained list of commercial hosting providers can be found [in the wiki](https://github.com/paperless-ngx/paperless-ngx/wiki/Related-Projects)
+-   [Use the easy install docker script](#docker_script)
+-   [Pull the image from Docker Hub](#docker_hub)
+-   [Build the Docker image yourself](#docker_build)
+-   [Install Paperless directly on your system manually (bare metal)](#bare_metal)
+-   A user-maintained list of commercial hosting providers can be found [in the wiki](https://github.com/paperless-ngx/paperless-ngx/wiki/Related-Projects)

 The Docker routes are quick & easy. These are the recommended routes.
 This configures all the stuff from the above automatically so that it
@@ -105,14 +105,14 @@ steps described in [Docker setup](#docker_hub) automatically.

     ```yaml
     ports:
-      - 8000:8000
+        - 8000:8000
     ```

     Replace the part BEFORE the colon with a port of your choice:

     ```yaml
     ports:
-      - 8010:8000
+        - 8010:8000
     ```

     Don't change the part after the colon or edit other lines that
@@ -129,11 +129,11 @@ steps described in [Docker setup](#docker_hub) automatically.
     If you want to run Paperless as a rootless container, you will need
     to do the following in your `docker-compose.yml`:

-    - set the `user` running the container to map to the `paperless`
-      user in the container. This value (`user_id` below), should be
-      the same id that `USERMAP_UID` and `USERMAP_GID` are set to in
-      the next step. See `USERMAP_UID` and `USERMAP_GID`
-      [here](configuration.md#docker).
+    -   set the `user` running the container to map to the `paperless`
+        user in the container. This value (`user_id` below), should be
+        the same id that `USERMAP_UID` and `USERMAP_GID` are set to in
+        the next step. See `USERMAP_UID` and `USERMAP_GID`
+        [here](configuration.md#docker).

     Your entry for Paperless should contain something like:

@@ -222,7 +222,7 @@ steps described in [Docker setup](#docker_hub) automatically.

     ```yaml
     webserver:
-      image: ghcr.io/paperless-ngx/paperless-ngx:latest
+        image: ghcr.io/paperless-ngx/paperless-ngx:latest
     ```

     and replace it with a line that instructs Docker Compose to build
@@ -230,8 +230,8 @@ steps described in [Docker setup](#docker_hub) automatically.

     ```yaml
     webserver:
-      build:
-        context: .
+        build:
+            context: .
     ```

 4.  Follow steps 3 to 8 of [Docker Setup](#docker_hub). When asked to run
@@ -257,20 +257,20 @@ are released, dependency support is confirmed, etc.

 1.  Install dependencies. Paperless requires the following packages.

-    - `python3`
-    - `python3-pip`
-    - `python3-dev`
-    - `default-libmysqlclient-dev` for MariaDB
-    - `pkg-config` for mysqlclient (python dependency)
-    - `fonts-liberation` for generating thumbnails for plain text
-      files
-    - `imagemagick` >= 6 for PDF conversion
-    - `gnupg` for handling encrypted documents
-    - `libpq-dev` for PostgreSQL
-    - `libmagic-dev` for mime type detection
-    - `mariadb-client` for MariaDB compile time
-    - `libzbar0` for barcode detection
-    - `poppler-utils` for barcode detection
+    -   `python3`
+    -   `python3-pip`
+    -   `python3-dev`
+    -   `default-libmysqlclient-dev` for MariaDB
+    -   `pkg-config` for mysqlclient (python dependency)
+    -   `fonts-liberation` for generating thumbnails for plain text
+        files
+    -   `imagemagick` >= 6 for PDF conversion
+    -   `gnupg` for handling encrypted documents
+    -   `libpq-dev` for PostgreSQL
+    -   `libmagic-dev` for mime type detection
+    -   `mariadb-client` for MariaDB compile time
+    -   `libzbar0` for barcode detection
+    -   `poppler-utils` for barcode detection

     Use this list for your preferred package management:

@@ -281,17 +281,17 @@ are released, dependency support is confirmed, etc.
     These dependencies are required for OCRmyPDF, which is used for text
     recognition.

-    - `unpaper`
-    - `ghostscript`
-    - `icc-profiles-free`
-    - `qpdf`
-    - `liblept5`
-    - `libxml2`
-    - `pngquant` (suggested for certain PDF image optimizations)
-    - `zlib1g`
-    - `tesseract-ocr` >= 4.0.0 for OCR
-    - `tesseract-ocr` language packs (`tesseract-ocr-eng`,
-      `tesseract-ocr-deu`, etc)
+    -   `unpaper`
+    -   `ghostscript`
+    -   `icc-profiles-free`
+    -   `qpdf`
+    -   `liblept5`
+    -   `libxml2`
+    -   `pngquant` (suggested for certain PDF image optimizations)
+    -   `zlib1g`
+    -   `tesseract-ocr` >= 4.0.0 for OCR
+    -   `tesseract-ocr` language packs (`tesseract-ocr-eng`,
+        `tesseract-ocr-deu`, etc)

     Use this list for your preferred package management:

| @ -301,15 +301,15 @@ are released, dependency support is confirmed, etc. | ||||
| 
 | ||||
|     On Raspberry Pi, these libraries are required as well: | ||||
| 
 | ||||
|     - `libatlas-base-dev` | ||||
|     - `libxslt1-dev` | ||||
|     - `mime-support` | ||||
|     -   `libatlas-base-dev` | ||||
|     -   `libxslt1-dev` | ||||
|     -   `mime-support` | ||||
| 
 | ||||
|     You will also need these for installing some of the python dependencies: | ||||
| 
 | ||||
|     - `build-essential` | ||||
|     - `python3-setuptools` | ||||
|     - `python3-wheel` | ||||
|     -   `build-essential` | ||||
|     -   `python3-setuptools` | ||||
|     -   `python3-wheel` | ||||
| 
 | ||||
|     Use this list for your preferred package management: | ||||
| 
 | ||||
| @ -361,33 +361,33 @@ are released, dependency support is confirmed, etc. | ||||
|     needs. Required settings for getting | ||||
|     paperless running are: | ||||
| 
 | ||||
|     - [`PAPERLESS_REDIS`](configuration.md#PAPERLESS_REDIS) should point to your redis server, such as | ||||
|       <redis://localhost:6379>. | ||||
|     - [`PAPERLESS_DBENGINE`](configuration.md#PAPERLESS_DBENGINE) optional, and should be one of `postgres`, | ||||
|       `mariadb`, or `sqlite` | ||||
|     - [`PAPERLESS_DBHOST`](configuration.md#PAPERLESS_DBHOST) should be the hostname on which your | ||||
|       PostgreSQL server is running. Do not configure this to use | ||||
|       SQLite instead. Also configure port, database name, user and | ||||
|       password as necessary. | ||||
|     - [`PAPERLESS_CONSUMPTION_DIR`](configuration.md#PAPERLESS_CONSUMPTION_DIR) should point to a folder which | ||||
|       paperless should watch for documents. You might want to have | ||||
|       this somewhere else. Likewise, [`PAPERLESS_DATA_DIR`](configuration.md#PAPERLESS_DATA_DIR) and | ||||
|       [`PAPERLESS_MEDIA_ROOT`](configuration.md#PAPERLESS_MEDIA_ROOT) define where paperless stores its data. | ||||
|       If you like, you can point both to the same directory. | ||||
|     - [`PAPERLESS_SECRET_KEY`](configuration.md#PAPERLESS_SECRET_KEY) should be a random sequence of | ||||
|       characters. It's used for authentication. Failure to do so | ||||
|       allows third parties to forge authentication credentials. | ||||
|     - [`PAPERLESS_URL`](configuration.md#PAPERLESS_URL) if you are behind a reverse proxy. This should | ||||
|       point to your domain. Please see | ||||
|       [configuration](configuration.md) for more | ||||
|       information. | ||||
|     -   [`PAPERLESS_REDIS`](configuration.md#PAPERLESS_REDIS) should point to your redis server, such as | ||||
|         <redis://localhost:6379>. | ||||
|     -   [`PAPERLESS_DBENGINE`](configuration.md#PAPERLESS_DBENGINE) optional, and should be one of `postgres`, | ||||
|         `mariadb`, or `sqlite` | ||||
|     -   [`PAPERLESS_DBHOST`](configuration.md#PAPERLESS_DBHOST) should be the hostname on which your | ||||
|         PostgreSQL server is running. Do not configure this to use | ||||
|         SQLite instead. Also configure port, database name, user and | ||||
|         password as necessary. | ||||
|     -   [`PAPERLESS_CONSUMPTION_DIR`](configuration.md#PAPERLESS_CONSUMPTION_DIR) should point to a folder which | ||||
|         paperless should watch for documents. You might want to have | ||||
|         this somewhere else. Likewise, [`PAPERLESS_DATA_DIR`](configuration.md#PAPERLESS_DATA_DIR) and | ||||
|         [`PAPERLESS_MEDIA_ROOT`](configuration.md#PAPERLESS_MEDIA_ROOT) define where paperless stores its data. | ||||
|         If you like, you can point both to the same directory. | ||||
|     -   [`PAPERLESS_SECRET_KEY`](configuration.md#PAPERLESS_SECRET_KEY) should be a random sequence of | ||||
|         characters. It's used for authentication. Failure to do so | ||||
|         allows third parties to forge authentication credentials. | ||||
|     -   [`PAPERLESS_URL`](configuration.md#PAPERLESS_URL) if you are behind a reverse proxy. This should | ||||
|         point to your domain. Please see | ||||
|         [configuration](configuration.md) for more | ||||
|         information. | ||||
| 
 | ||||
|     Many more adjustments can be made to paperless, especially the OCR | ||||
|     part. The following options are recommended for everyone: | ||||
| 
 | ||||
|     - Set [`PAPERLESS_OCR_LANGUAGE`](configuration.md#PAPERLESS_OCR_LANGUAGE) to the language most of your | ||||
|       documents are written in. | ||||
|     - Set [`PAPERLESS_TIME_ZONE`](configuration.md#PAPERLESS_TIME_ZONE) to your local time zone. | ||||
|     -   Set [`PAPERLESS_OCR_LANGUAGE`](configuration.md#PAPERLESS_OCR_LANGUAGE) to the language most of your | ||||
|         documents are written in. | ||||
|     -   Set [`PAPERLESS_TIME_ZONE`](configuration.md#PAPERLESS_TIME_ZONE) to your local time zone. | ||||
| 
 | ||||
|     !!! warning | ||||
| 
 | ||||
| @ -395,9 +395,9 @@ are released, dependency support is confirmed, etc. | ||||
| 
 | ||||
| 7.  Create the following directories if they are missing: | ||||
| 
 | ||||
|     - `/opt/paperless/media` | ||||
|     - `/opt/paperless/data` | ||||
|     - `/opt/paperless/consume` | ||||
|     -   `/opt/paperless/media` | ||||
|     -   `/opt/paperless/data` | ||||
|     -   `/opt/paperless/consume` | ||||
| 
 | ||||
|     Adjust as necessary if you configured different folders. | ||||
|     Ensure that the paperless user has write permissions for every one | ||||
| @ -586,21 +586,21 @@ your setup depending on how you installed paperless. | ||||
| This setup describes how to update an existing paperless Docker | ||||
| installation. The important things to keep in mind are as follows: | ||||
| 
 | ||||
| - Read the [changelog](changelog.md) and | ||||
|   take note of breaking changes. | ||||
| - You should decide if you want to stick with SQLite or want to | ||||
|   migrate your database to PostgreSQL. See [documentation](#sqlite_to_psql) | ||||
|   for details on | ||||
|   how to move your data from SQLite to PostgreSQL. Both work fine with | ||||
|   paperless. However, if you already have a database server running | ||||
|   for other services, you might as well use it for paperless as well. | ||||
| - The task scheduler of paperless, which is used to execute periodic | ||||
|   tasks such as email checking and maintenance, requires a | ||||
|   [redis](https://redis.io/) message broker instance. The | ||||
|   Docker Compose route takes care of that. | ||||
| - The layout of the folder structure for your documents and data | ||||
|   remains the same, so you can just plug your old docker volumes into | ||||
|   paperless-ngx and expect it to find everything where it should be. | ||||
| -   Read the [changelog](changelog.md) and | ||||
|     take note of breaking changes. | ||||
| -   You should decide if you want to stick with SQLite or want to | ||||
|     migrate your database to PostgreSQL. See [documentation](#sqlite_to_psql) | ||||
|     for details on | ||||
|     how to move your data from SQLite to PostgreSQL. Both work fine with | ||||
|     paperless. However, if you already have a database server running | ||||
|     for other services, you might as well use it for paperless as well. | ||||
| -   The task scheduler of paperless, which is used to execute periodic | ||||
|     tasks such as email checking and maintenance, requires a | ||||
|     [redis](https://redis.io/) message broker instance. The | ||||
|     Docker Compose route takes care of that. | ||||
| -   The layout of the folder structure for your documents and data | ||||
|     remains the same, so you can just plug your old docker volumes into | ||||
|     paperless-ngx and expect it to find everything where it should be. | ||||
| 
 | ||||
| Migration to paperless-ngx is then performed in a few simple steps: | ||||
| 
 | ||||
| @ -763,30 +763,30 @@ Paperless runs on Raspberry Pi. However, some things are rather slow on | ||||
| the Pi and configuring some options in paperless can help improve | ||||
| performance immensely: | ||||
| 
 | ||||
| - Stick with SQLite to save some resources. | ||||
| - Consider setting [`PAPERLESS_OCR_PAGES`](configuration.md#PAPERLESS_OCR_PAGES) to 1, so that paperless will | ||||
|   only OCR the first page of your documents. In most cases, this page | ||||
|   contains enough information to be able to find it. | ||||
| - [`PAPERLESS_TASK_WORKERS`](configuration.md#PAPERLESS_TASK_WORKERS) and [`PAPERLESS_THREADS_PER_WORKER`](configuration.md#PAPERLESS_THREADS_PER_WORKER) are | ||||
|   configured to use all cores. The Raspberry Pi models 3 and up have 4 | ||||
|   cores, meaning that paperless will use 2 workers and 2 threads per | ||||
|   worker. This may result in sluggish response times during | ||||
|   consumption, so you might want to lower these settings (example: 2 | ||||
|   workers and 1 thread to always have some computing power left for | ||||
|   other tasks). | ||||
| - Keep [`PAPERLESS_OCR_MODE`](configuration.md#PAPERLESS_OCR_MODE) at its default value `skip` and consider | ||||
|   OCR'ing your documents before feeding them into paperless. Some | ||||
|   scanners are able to do this! | ||||
| - Set [`PAPERLESS_OCR_SKIP_ARCHIVE_FILE`](configuration.md#PAPERLESS_OCR_SKIP_ARCHIVE_FILE) to `with_text` to skip archive | ||||
|   file generation for already ocr'ed documents, or `always` to skip it | ||||
|   for all documents. | ||||
| - If you want to perform OCR on the device, consider using | ||||
|   `PAPERLESS_OCR_CLEAN=none`. This will speed up OCR times and use | ||||
|   less memory at the expense of slightly worse OCR results. | ||||
| - If using docker, consider setting [`PAPERLESS_WEBSERVER_WORKERS`](configuration.md#PAPERLESS_WEBSERVER_WORKERS) to 1. This will save some memory. | ||||
| - Consider setting [`PAPERLESS_ENABLE_NLTK`](configuration.md#PAPERLESS_ENABLE_NLTK) to false, to disable the | ||||
|   more advanced language processing, which can take more memory and | ||||
|   processing time. | ||||
| -   Stick with SQLite to save some resources. | ||||
| -   Consider setting [`PAPERLESS_OCR_PAGES`](configuration.md#PAPERLESS_OCR_PAGES) to 1, so that paperless will | ||||
|     only OCR the first page of your documents. In most cases, this page | ||||
|     contains enough information to be able to find it. | ||||
| -   [`PAPERLESS_TASK_WORKERS`](configuration.md#PAPERLESS_TASK_WORKERS) and [`PAPERLESS_THREADS_PER_WORKER`](configuration.md#PAPERLESS_THREADS_PER_WORKER) are | ||||
|     configured to use all cores. The Raspberry Pi models 3 and up have 4 | ||||
|     cores, meaning that paperless will use 2 workers and 2 threads per | ||||
|     worker. This may result in sluggish response times during | ||||
|     consumption, so you might want to lower these settings (example: 2 | ||||
|     workers and 1 thread to always have some computing power left for | ||||
|     other tasks). | ||||
| -   Keep [`PAPERLESS_OCR_MODE`](configuration.md#PAPERLESS_OCR_MODE) at its default value `skip` and consider | ||||
|     OCR'ing your documents before feeding them into paperless. Some | ||||
|     scanners are able to do this! | ||||
| -   Set [`PAPERLESS_OCR_SKIP_ARCHIVE_FILE`](configuration.md#PAPERLESS_OCR_SKIP_ARCHIVE_FILE) to `with_text` to skip archive | ||||
|     file generation for already ocr'ed documents, or `always` to skip it | ||||
|     for all documents. | ||||
| -   If you want to perform OCR on the device, consider using | ||||
|     `PAPERLESS_OCR_CLEAN=none`. This will speed up OCR times and use | ||||
|     less memory at the expense of slightly worse OCR results. | ||||
| -   If using docker, consider setting [`PAPERLESS_WEBSERVER_WORKERS`](configuration.md#PAPERLESS_WEBSERVER_WORKERS) to 1. This will save some memory. | ||||
| -   Consider setting [`PAPERLESS_ENABLE_NLTK`](configuration.md#PAPERLESS_ENABLE_NLTK) to false, to disable the | ||||
|     more advanced language processing, which can take more memory and | ||||
|     processing time. | ||||
| 
 | ||||
| For details, refer to [configuration](configuration.md). | ||||
| 
 | ||||
|  | ||||
| @ -4,27 +4,27 @@ | ||||
| 
 | ||||
| Check for the following issues: | ||||
| 
 | ||||
| - Ensure that the directory you're putting your documents in is the | ||||
|   folder paperless is watching. With docker, this setting is performed | ||||
|   in the `docker-compose.yml` file. Without Docker, look at the | ||||
|   `CONSUMPTION_DIR` setting. Don't adjust this setting if you're | ||||
|   using docker. | ||||
| -   Ensure that the directory you're putting your documents in is the | ||||
|     folder paperless is watching. With docker, this setting is performed | ||||
|     in the `docker-compose.yml` file. Without Docker, look at the | ||||
|     `CONSUMPTION_DIR` setting. Don't adjust this setting if you're | ||||
|     using docker. | ||||
| 
 | ||||
| - Ensure that redis is up and running. Paperless does its task | ||||
|   processing asynchronously, and for documents to arrive at the task | ||||
|   processor, it needs redis to run. | ||||
| -   Ensure that redis is up and running. Paperless does its task | ||||
|     processing asynchronously, and for documents to arrive at the task | ||||
|     processor, it needs redis to run. | ||||
| 
 | ||||
| - Ensure that the task processor is running. Docker does this | ||||
|   automatically. Manually invoke the task processor by executing | ||||
| -   Ensure that the task processor is running. Docker does this | ||||
|     automatically. Manually invoke the task processor by executing | ||||
| 
 | ||||
|   ```shell-session | ||||
|   $ celery --app paperless worker | ||||
|   ``` | ||||
|     ```shell-session | ||||
|     $ celery --app paperless worker | ||||
|     ``` | ||||
| 
 | ||||
| - Look at the output of paperless and inspect it for any errors. | ||||
| -   Look at the output of paperless and inspect it for any errors. | ||||
| 
 | ||||
| - Go to the admin interface, and check if there are failed tasks. If | ||||
|   so, the tasks will contain an error message. | ||||
| -   Go to the admin interface, and check if there are failed tasks. If | ||||
|     so, the tasks will contain an error message. | ||||
| 
 | ||||
| ## Consumer warns `OCR for XX failed` | ||||
| 
 | ||||
| @ -78,12 +78,12 @@ Ensure that `chown` is possible on these directories. | ||||
| This indicates that the Auto matching algorithm found no documents to | ||||
| learn from. This may have two reasons: | ||||
| 
 | ||||
| - You don't use the Auto matching algorithm: The error can be safely | ||||
|   ignored in this case. | ||||
| - You are using the Auto matching algorithm: The classifier explicitly | ||||
|   excludes documents with Inbox tags. Verify that there are documents | ||||
|   in your archive without inbox tags. The algorithm will only learn | ||||
|   from documents not in your inbox. | ||||
| -   You don't use the Auto matching algorithm: The error can be safely | ||||
|     ignored in this case. | ||||
| -   You are using the Auto matching algorithm: The classifier explicitly | ||||
|     excludes documents with Inbox tags. Verify that there are documents | ||||
|     in your archive without inbox tags. The algorithm will only learn | ||||
|     from documents not in your inbox. | ||||
| 
 | ||||
| ## UserWarning in sklearn on every single document | ||||
| 
 | ||||
| @ -127,10 +127,10 @@ change in the `docker-compose.yml` file: | ||||
| # The gotenberg chromium route is used to convert .eml files. We do not | ||||
| # want to allow external content like tracking pixels or even javascript. | ||||
| command: | ||||
|   - 'gotenberg' | ||||
|   - '--chromium-disable-javascript=true' | ||||
|   - '--chromium-allow-list=file:///tmp/.*' | ||||
|   - '--api-timeout=60' | ||||
|     - 'gotenberg' | ||||
|     - '--chromium-disable-javascript=true' | ||||
|     - '--chromium-allow-list=file:///tmp/.*' | ||||
|     - '--api-timeout=60' | ||||
| ``` | ||||
| 
 | ||||
| ## Permission denied errors in the consumption directory | ||||
|  | ||||
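The same mechanical change repeats across these hunks. Each `-` bullet gains two extra spaces after the marker, so its text starts four columns in. Continuation lines, nested bullets and code fences inside the item then move so they stay aligned with the item's new text column.

The commit gets this result from Prettier: the `.prettierrc` override now applies `tabWidth: 4` to `docs/*.md`. The Python function below is only a rough, hypothetical approximation of the pattern, for illustration. It is not Prettier's algorithm, and it makes these assumptions:

- bullets use `-` with a two-space content offset;
- ordered lists such as `7.  ` are not tracked;
- lines inside fenced code are shifted but not reformatted, so it does not reproduce Prettier's re-indentation of the embedded YAML in the Gotenberg example.

```python
def reindent_bullets(markdown: str) -> str:
    """Hypothetical helper: move '-' bullets from a 2-space to a 4-space
    content offset, keeping continuation lines, nested bullets and fenced
    code inside an item aligned with the item's new text column.

    Sketch only: ordered lists are not tracked, tabs are not handled, and
    lines inside fenced code are shifted but never treated as bullets.
    """
    out = []
    # Each open list item is (old_content_col, new_content_col).
    stack: list[tuple[int, int]] = []
    in_fence = False
    for line in markdown.splitlines():
        text = line.lstrip(" ")
        if not text:
            out.append("")  # blank lines keep the current items open
            continue
        old_indent = len(line) - len(text)
        # Close items whose content column lies to the right of this line.
        while stack and stack[-1][0] > old_indent:
            stack.pop()
        if stack:
            old_col, new_col = stack[-1]
            new_indent = new_col + (old_indent - old_col)
        else:
            new_indent = old_indent  # outside any tracked item: unchanged
        if text.startswith("```"):
            in_fence = not in_fence
            out.append(" " * new_indent + text)
        elif not in_fence and text.startswith("- "):
            # Marker stays put; the text now starts 4 columns after it.
            out.append(" " * new_indent + "-   " + text[2:])
            stack.append((old_indent + 2, new_indent + 4))
        else:
            out.append(" " * new_indent + text)
    return "\n".join(out)
```

For example, `reindent_bullets("- **Any:** Looks for any\n  the PDF.")` returns `"-   **Any:** Looks for any\n    the PDF."`, which matches the shape of the new-style lines in this diff.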
338
docs/usage.md
| @ -10,37 +10,37 @@ and provides many utilities for finding and managing your documents. | ||||
| Paperless essentially consists of two different parts for managing your | ||||
| documents: | ||||
| 
 | ||||
| - The _consumer_ watches a specified folder and adds all documents in | ||||
|   that folder to paperless. | ||||
| - The _web server_ provides a UI that you use to manage and search for | ||||
|   your scanned documents. | ||||
| -   The _consumer_ watches a specified folder and adds all documents in | ||||
|     that folder to paperless. | ||||
| -   The _web server_ provides a UI that you use to manage and search for | ||||
|     your scanned documents. | ||||
| 
 | ||||
| Each document has a couple of fields that you can assign to them: | ||||
| 
 | ||||
| - A _Document_ is a piece of paper that sometimes contains valuable | ||||
|   information. | ||||
| - The _correspondent_ of a document is the person, institution or | ||||
|   company that a document either originates from, or is sent to. | ||||
| - A _tag_ is a label that you can assign to documents. Think of labels | ||||
|   as more powerful folders: Multiple documents can be grouped together | ||||
|   with a single tag, however, a single document can also have multiple | ||||
|   tags. This is not possible with folders. The reason folders are not | ||||
|   implemented in paperless is simply that tags are much more versatile | ||||
|   than folders. | ||||
| - A _document type_ is used to demarcate the type of a document such | ||||
|   as letter, bank statement, invoice, contract, etc. It is used to | ||||
|   identify what a document is about. | ||||
| - The _date added_ of a document is the date the document was scanned | ||||
|   into paperless. You cannot and should not change this date. | ||||
| - The _date created_ of a document is the date the document was | ||||
|   initially issued. This can be the date you bought a product, the | ||||
|   date you signed a contract, or the date a letter was sent to you. | ||||
| - The _archive serial number_ (short: ASN) of a document is the | ||||
|   identifier of the document in your physical document binders. See | ||||
|   [recommended workflow](#usage-recommended-workflow) below. | ||||
| - The _content_ of a document is the text that was OCR'ed from the | ||||
|   document. This text is fed into the search engine and is used for | ||||
|   matching tags, correspondents and document types. | ||||
| -   A _Document_ is a piece of paper that sometimes contains valuable | ||||
|     information. | ||||
| -   The _correspondent_ of a document is the person, institution or | ||||
|     company that a document either originates from, or is sent to. | ||||
| -   A _tag_ is a label that you can assign to documents. Think of labels | ||||
|     as more powerful folders: Multiple documents can be grouped together | ||||
|     with a single tag, however, a single document can also have multiple | ||||
|     tags. This is not possible with folders. The reason folders are not | ||||
|     implemented in paperless is simply that tags are much more versatile | ||||
|     than folders. | ||||
| -   A _document type_ is used to demarcate the type of a document such | ||||
|     as letter, bank statement, invoice, contract, etc. It is used to | ||||
|     identify what a document is about. | ||||
| -   The _date added_ of a document is the date the document was scanned | ||||
|     into paperless. You cannot and should not change this date. | ||||
| -   The _date created_ of a document is the date the document was | ||||
|     initially issued. This can be the date you bought a product, the | ||||
|     date you signed a contract, or the date a letter was sent to you. | ||||
| -   The _archive serial number_ (short: ASN) of a document is the | ||||
|     identifier of the document in your physical document binders. See | ||||
|     [recommended workflow](#usage-recommended-workflow) below. | ||||
| -   The _content_ of a document is the text that was OCR'ed from the | ||||
|     document. This text is fed into the search engine and is used for | ||||
|     matching tags, correspondents and document types. | ||||
| 
 | ||||
| ## Adding documents to paperless | ||||
| 
 | ||||
| @ -142,21 +142,21 @@ patterns can include wildcards and multiple patterns separated by a comma. | ||||
| The actions all ensure that the same mail is not consumed twice by | ||||
| different means. These are as follows: | ||||
| 
 | ||||
| - **Delete:** Immediately deletes mail that paperless has consumed | ||||
|   documents from. Use with caution. | ||||
| - **Mark as read:** Mark consumed mail as read. Paperless will not | ||||
|   consume documents from already read mails. If you read a mail before | ||||
|   paperless sees it, it will be ignored. | ||||
| - **Flag:** Sets the 'important' flag on mails with consumed | ||||
|   documents. Paperless will not consume flagged mails. | ||||
| - **Move to folder:** Moves consumed mails out of the way so that | ||||
|   paperless won't consume them again. | ||||
| - **Add custom Tag:** Adds a custom tag to mails with consumed | ||||
|   documents (the IMAP standard calls these "keywords"). Paperless | ||||
|   will not consume mails already tagged. Not all mail servers support | ||||
|   this feature! | ||||
| -   **Delete:** Immediately deletes mail that paperless has consumed | ||||
|     documents from. Use with caution. | ||||
| -   **Mark as read:** Mark consumed mail as read. Paperless will not | ||||
|     consume documents from already read mails. If you read a mail before | ||||
|     paperless sees it, it will be ignored. | ||||
| -   **Flag:** Sets the 'important' flag on mails with consumed | ||||
|     documents. Paperless will not consume flagged mails. | ||||
| -   **Move to folder:** Moves consumed mails out of the way so that | ||||
|     paperless won't consume them again. | ||||
| -   **Add custom Tag:** Adds a custom tag to mails with consumed | ||||
|     documents (the IMAP standard calls these "keywords"). Paperless | ||||
|     will not consume mails already tagged. Not all mail servers support | ||||
|     this feature! | ||||
| 
 | ||||
|   - **Apple Mail support:** Apple Mail clients allow differently colored tags. For this to work use `apple:<color>` (e.g. _apple:green_) as a custom tag. Available colors are _red_, _orange_, _yellow_, _blue_, _green_, _violet_ and _grey_. | ||||
|     -   **Apple Mail support:** Apple Mail clients allow differently colored tags. For this to work use `apple:<color>` (e.g. _apple:green_) as a custom tag. Available colors are _red_, _orange_, _yellow_, _blue_, _green_, _violet_ and _grey_. | ||||
| 
 | ||||
| !!! warning | ||||
| 
 | ||||
| @ -360,32 +360,32 @@ flowchart TD | ||||
| 
 | ||||
| Workflows allow you to filter by: | ||||
| 
 | ||||
| - Source, e.g. documents uploaded via consume folder, API (& the web UI) and mail fetch | ||||
| - File name, including wildcards e.g. \*.pdf will apply to all pdfs | ||||
| - File path, including wildcards. Note that enabling `PAPERLESS_CONSUMER_RECURSIVE` would allow, for | ||||
|   example, automatically assigning documents to different owners based on the upload directory. | ||||
| - Mail rule. Choosing this option will force 'mail fetch' to be the workflow source. | ||||
| - Content matching (`Added` and `Updated` triggers only). Filter document content using the matching settings. | ||||
| - Tags (`Added` and `Updated` triggers only). Filter for documents with any of the specified tags | ||||
| - Document type (`Added` and `Updated` triggers only). Filter documents with this doc type | ||||
| - Correspondent (`Added` and `Updated` triggers only). Filter documents with this correspondent | ||||
| -   Source, e.g. documents uploaded via consume folder, API (& the web UI) and mail fetch | ||||
| -   File name, including wildcards e.g. \*.pdf will apply to all pdfs | ||||
| -   File path, including wildcards. Note that enabling `PAPERLESS_CONSUMER_RECURSIVE` would allow, for | ||||
|     example, automatically assigning documents to different owners based on the upload directory. | ||||
| -   Mail rule. Choosing this option will force 'mail fetch' to be the workflow source. | ||||
| -   Content matching (`Added` and `Updated` triggers only). Filter document content using the matching settings. | ||||
| -   Tags (`Added` and `Updated` triggers only). Filter for documents with any of the specified tags | ||||
| -   Document type (`Added` and `Updated` triggers only). Filter documents with this doc type | ||||
| -   Correspondent (`Added` and `Updated` triggers only). Filter documents with this correspondent | ||||
| 
 | ||||
| ### Workflow Actions | ||||
| 
 | ||||
| There are currently two types of workflow actions, "Assignment", which can assign: | ||||
| 
 | ||||
| - Title, see [title placeholders](usage.md#title-placeholders) below | ||||
| - Tags, correspondent, document type and storage path | ||||
| - Document owner | ||||
| - View and / or edit permissions to users or groups | ||||
| - Custom fields. Note that no value for the field will be set | ||||
| -   Title, see [title placeholders](usage.md#title-placeholders) below | ||||
| -   Tags, correspondent, document type and storage path | ||||
| -   Document owner | ||||
| -   View and / or edit permissions to users or groups | ||||
| -   Custom fields. Note that no value for the field will be set | ||||
| 
 | ||||
| and "Removal" actions, which can remove either all of or specific sets of the following: | ||||
| 
 | ||||
| - Tags, correspondents, document types or storage paths | ||||
| - Document owner | ||||
| - View and / or edit permissions | ||||
| - Custom fields | ||||
| -   Tags, correspondents, document types or storage paths | ||||
| -   Document owner | ||||
| -   View and / or edit permissions | ||||
| -   Custom fields | ||||
| 
 | ||||
| #### Title placeholders | ||||
| 
 | ||||
| @ -393,29 +393,29 @@ Workflow titles can include placeholders but the available options differ depend | ||||
| workflow trigger. This is because at the time of consumption (when the title is to be set), no automatic tags etc. have been | ||||
| applied. You can use the following placeholders with any trigger type: | ||||
| 
 | ||||
| - `{correspondent}`: assigned correspondent name | ||||
| - `{document_type}`: assigned document type name | ||||
| - `{owner_username}`: assigned owner username | ||||
| - `{added}`: added datetime | ||||
| - `{added_year}`: added year | ||||
| - `{added_year_short}`: added year | ||||
| - `{added_month}`: added month | ||||
| - `{added_month_name}`: added month name | ||||
| - `{added_month_name_short}`: added month short name | ||||
| - `{added_day}`: added day | ||||
| - `{added_time}`: added time in HH:MM format | ||||
| - `{original_filename}`: original file name without extension | ||||
| -   `{correspondent}`: assigned correspondent name | ||||
| -   `{document_type}`: assigned document type name | ||||
| -   `{owner_username}`: assigned owner username | ||||
| -   `{added}`: added datetime | ||||
| -   `{added_year}`: added year | ||||
| -   `{added_year_short}`: added year | ||||
| -   `{added_month}`: added month | ||||
| -   `{added_month_name}`: added month name | ||||
| -   `{added_month_name_short}`: added month short name | ||||
| -   `{added_day}`: added day | ||||
| -   `{added_time}`: added time in HH:MM format | ||||
| -   `{original_filename}`: original file name without extension | ||||
| 
 | ||||
| The following placeholders are only available for "added" or "updated" triggers | ||||
| 
 | ||||
| - `{created}`: created datetime | ||||
| - `{created_year}`: created year | ||||
| - `{created_year_short}`: created year | ||||
| - `{created_month}`: created month | ||||
| - `{created_month_name}`: created month name | ||||
| - `{created_month_name_short}`: created month short name | ||||
| - `{created_day}`: created day | ||||
| - `{created_time}`: created time in HH:MM format | ||||
| -   `{created}`: created datetime | ||||
| -   `{created_year}`: created year | ||||
| -   `{created_year_short}`: created year | ||||
| -   `{created_month}`: created month | ||||
| -   `{created_month_name}`: created month name | ||||
| -   `{created_month_name_short}`: created month short name | ||||
| -   `{created_day}`: created day | ||||
| -   `{created_time}`: created time in HH:MM format | ||||
| 
 | ||||
| ### Workflow permissions | ||||
| 
 | ||||
| @ -450,24 +450,24 @@ Multiple fields may be attached to a document but the same field name cannot be | ||||
| 
 | ||||
| The following custom field types are supported: | ||||
| 
 | ||||
| - `Text`: any text | ||||
| - `Boolean`: true / false (check / unchecked) field | ||||
| - `Date`: date | ||||
| - `URL`: a valid url | ||||
| - `Integer`: integer number e.g. 12 | ||||
| - `Number`: float number e.g. 12.3456 | ||||
| - `Monetary`: [ISO 4217 currency code](https://en.wikipedia.org/wiki/ISO_4217#List_of_ISO_4217_currency_codes) and a number with exactly two decimals, e.g. USD12.30 | ||||
| - `Document Link`: reference(s) to other document(s) displayed as links, automatically creates a symmetrical link in reverse | ||||
| - `Select`: a pre-defined list of strings from which the user can choose | ||||
| -   `Text`: any text | ||||
| -   `Boolean`: true / false (check / unchecked) field | ||||
| -   `Date`: date | ||||
| -   `URL`: a valid url | ||||
| -   `Integer`: integer number e.g. 12 | ||||
| -   `Number`: float number e.g. 12.3456 | ||||
| -   `Monetary`: [ISO 4217 currency code](https://en.wikipedia.org/wiki/ISO_4217#List_of_ISO_4217_currency_codes) and a number with exactly two decimals, e.g. USD12.30 | ||||
| -   `Document Link`: reference(s) to other document(s) displayed as links, automatically creates a symmetrical link in reverse | ||||
| -   `Select`: a pre-defined list of strings from which the user can choose | ||||
| 
 | ||||
| ## Share Links | ||||
| 
 | ||||
| Paperless-ngx added the ability to create shareable links to files in version 2.0. You can find the button for this on the document detail screen. | ||||
| 
 | ||||
| - Share links do not require a user to login and thus link directly to a file. | ||||
| - Links are unique and are of the form `{paperless-url}/share/{randomly-generated-slug}`. | ||||
| - Links can optionally have an expiration time set. | ||||
| - After a link expires or is deleted users will be redirected to the regular paperless-ngx login. | ||||
| -   Share links do not require a user to login and thus link directly to a file. | ||||
| -   Links are unique and are of the form `{paperless-url}/share/{randomly-generated-slug}`. | ||||
| -   Links can optionally have an expiration time set. | ||||
| -   After a link expires or is deleted users will be redirected to the regular paperless-ngx login. | ||||
| 
 | ||||
| !!! tip | ||||
| 
 | ||||
@@ -477,10 +477,10 @@ Paperless-ngx added the ability to create shareable links to files in version 2.
 
 Paperless-ngx supports four basic editing operations for PDFs (these operations currently cannot be performed on non-PDF files):
 
-- Merging documents: available when selecting multiple documents for 'bulk editing'.
-- Rotating documents: available when selecting multiple documents for 'bulk editing' and from an individual document's details page.
-- Splitting documents: available from an individual document's details page.
-- Deleting pages: available from an individual document's details page.
+-   Merging documents: available when selecting multiple documents for 'bulk editing'.
+-   Rotating documents: available when selecting multiple documents for 'bulk editing' and from an individual document's details page.
+-   Splitting documents: available from an individual document's details page.
+-   Deleting pages: available from an individual document's details page.
 
 !!! important
 
@@ -558,18 +558,18 @@ the system.
 Here are a couple examples of tags and types that you could use in your
 collection.
 
-- An `inbox` tag for newly added documents that you haven't manually
-  edited yet.
-- A tag `car` for everything car related (repairs, registration,
-  insurance, etc)
-- A tag `todo` for documents that you still need to do something with,
-  such as reply, or perform some task online.
-- A tag `bank account x` for all bank statement related to that
-  account.
-- A tag `mail` for anything that you added to paperless via its mail
-  processing capabilities.
-- A tag `missing_metadata` when you still need to add some metadata to
-  a document, but can't or don't want to do this right now.
+-   An `inbox` tag for newly added documents that you haven't manually
+    edited yet.
+-   A tag `car` for everything car related (repairs, registration,
+    insurance, etc)
+-   A tag `todo` for documents that you still need to do something with,
+    such as reply, or perform some task online.
+-   A tag `bank account x` for all bank statement related to that
+    account.
+-   A tag `mail` for anything that you added to paperless via its mail
+    processing capabilities.
+-   A tag `missing_metadata` when you still need to add some metadata to
+    a document, but can't or don't want to do this right now.
 
 ## Searching {#basic-usage_searching}
 
@@ -658,8 +658,8 @@ The following diagram shows how easy it is to manage your documents.
 
 ### Preparations in paperless
 
-- Create an inbox tag that gets assigned to all new documents.
-- Create a TODO tag.
+-   Create an inbox tag that gets assigned to all new documents.
+-   Create a TODO tag.
 
 ### Processing of the physical documents
 
@@ -733,78 +733,78 @@ Some documents require attention and require you to act on the document.
 You may take two different approaches to handle these documents based on
 how regularly you intend to scan documents and use paperless.
 
-- If you scan and process your documents in paperless regularly,
-  assign a TODO tag to all scanned documents that you need to process.
-  Create a saved view on the dashboard that shows all documents with
-  this tag.
-- If you do not scan documents regularly and use paperless solely for
-  archiving, create a physical todo box next to your physical inbox
-  and put documents you need to process in the TODO box. When you
-  performed the task associated with the document, move it to the
-  inbox.
+-   If you scan and process your documents in paperless regularly,
+    assign a TODO tag to all scanned documents that you need to process.
+    Create a saved view on the dashboard that shows all documents with
+    this tag.
+-   If you do not scan documents regularly and use paperless solely for
+    archiving, create a physical todo box next to your physical inbox
+    and put documents you need to process in the TODO box. When you
+    performed the task associated with the document, move it to the
+    inbox.
 
 ## Architecture
 
 Paperless-ngx consists of the following components:
 
-- **The webserver:** This serves the administration pages, the API,
-  and the new frontend. This is the main tool you'll be using to interact
-  with paperless. You may start the webserver directly with
+-   **The webserver:** This serves the administration pages, the API,
+    and the new frontend. This is the main tool you'll be using to interact
+    with paperless. You may start the webserver directly with
 
-  ```shell-session
-  $ cd /path/to/paperless/src/
-  $ gunicorn -c ../gunicorn.conf.py paperless.wsgi
-  ```
+    ```shell-session
+    $ cd /path/to/paperless/src/
+    $ gunicorn -c ../gunicorn.conf.py paperless.wsgi
+    ```
 
-  or by any other means such as Apache `mod_wsgi`.
+    or by any other means such as Apache `mod_wsgi`.
 
-- **The consumer:** This is what watches your consumption folder for
-  documents. However, the consumer itself does not really consume your
-  documents. Now it notifies a task processor that a new file is ready
-  for consumption. I suppose it should be named differently. This was
-  also used to check your emails, but that's now done elsewhere as
-  well.
+-   **The consumer:** This is what watches your consumption folder for
+    documents. However, the consumer itself does not really consume your
+    documents. Now it notifies a task processor that a new file is ready
+    for consumption. I suppose it should be named differently. This was
+    also used to check your emails, but that's now done elsewhere as
+    well.
 
-  Start the consumer with the management command `document_consumer`:
+    Start the consumer with the management command `document_consumer`:
 
-  ```shell-session
-  $ cd /path/to/paperless/src/
-  $ python3 manage.py document_consumer
-  ```
+    ```shell-session
+    $ cd /path/to/paperless/src/
+    $ python3 manage.py document_consumer
+    ```
 
-- **The task processor:** Paperless relies on [Celery - Distributed
-  Task Queue](https://docs.celeryq.dev/en/stable/index.html) for doing
-  most of the heavy lifting. This is a task queue that accepts tasks
-  from multiple sources and processes these in parallel. It also comes
-  with a scheduler that executes certain commands periodically.
+-   **The task processor:** Paperless relies on [Celery - Distributed
+    Task Queue](https://docs.celeryq.dev/en/stable/index.html) for doing
+    most of the heavy lifting. This is a task queue that accepts tasks
+    from multiple sources and processes these in parallel. It also comes
+    with a scheduler that executes certain commands periodically.
 
-  This task processor is responsible for:
+    This task processor is responsible for:
 
-  - Consuming documents. When the consumer finds new documents, it
-    notifies the task processor to start a consumption task.
-  - The task processor also performs the consumption of any
-    documents you upload through the web interface.
-  - Consuming emails. It periodically checks your configured
-    accounts for new emails and notifies the task processor to
-    consume the attachment of an email.
-  - Maintaining the search index and the automatic matching
-    algorithm. These are things that paperless needs to do from time
-    to time in order to operate properly.
+    -   Consuming documents. When the consumer finds new documents, it
+        notifies the task processor to start a consumption task.
+    -   The task processor also performs the consumption of any
+        documents you upload through the web interface.
+    -   Consuming emails. It periodically checks your configured
+        accounts for new emails and notifies the task processor to
+        consume the attachment of an email.
+    -   Maintaining the search index and the automatic matching
+        algorithm. These are things that paperless needs to do from time
+        to time in order to operate properly.
 
-  This allows paperless to process multiple documents from your
-  consumption folder in parallel! On a modern multi core system, this
-  makes the consumption process with full OCR blazingly fast.
+    This allows paperless to process multiple documents from your
+    consumption folder in parallel! On a modern multi core system, this
+    makes the consumption process with full OCR blazingly fast.
 
-  The task processor comes with a built-in admin interface that you
-  can use to check whenever any of the tasks fail and inspect the
-  errors (i.e., wrong email credentials, errors during consuming a
-  specific file, etc).
+    The task processor comes with a built-in admin interface that you
+    can use to check whenever any of the tasks fail and inspect the
+    errors (i.e., wrong email credentials, errors during consuming a
+    specific file, etc).
 
-- A [redis](https://redis.io/) message broker: This is a really
-  lightweight service that is responsible for getting the tasks from
-  the webserver and the consumer to the task scheduler. These run in a
-  different process (maybe even on different machines!), and
-  therefore, this is necessary.
+-   A [redis](https://redis.io/) message broker: This is a really
+    lightweight service that is responsible for getting the tasks from
+    the webserver and the consumer to the task scheduler. These run in a
+    different process (maybe even on different machines!), and
+    therefore, this is necessary.
 
-- Optional: A database server. Paperless supports PostgreSQL, MariaDB
-  and SQLite for storing its data.
+-   Optional: A database server. Paperless supports PostgreSQL, MariaDB
+    and SQLite for storing its data.
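Every hunk in this commit applies the same mechanical rule. Each list marker `- ` becomes `-   `, so item text starts at column 4. Each 2-space indentation step inside a list item becomes 4 spaces, in line with the `tabWidth: 4` override for `docs/*.md` at the top of the commit. This presumably matters because the docs are rendered with Python-Markdown (the `!!! tip` admonition syntax points to MkDocs), which expects content nested under a list item to be indented by four spaces.

The following is a minimal Python sketch of that rule. It is not part of paperless-ngx or its tooling. The `reindent` helper is hypothetical, and it assumes the input lines come from lists that were consistently indented in 2-space steps.

```python
import re

# Leading spaces, an optional "- " bullet marker, then the rest of the line.
LINE = re.compile(r"^( *)(- )?(.*)$")


def reindent(line: str) -> str:
    """Convert one line (without its trailing newline) of a 2-space-indented
    Markdown list to the 4-space style used in the commit (hypothetical helper)."""
    m = LINE.match(line)
    indent, bullet, rest = m.group(1), m.group(2), m.group(3)
    # Each 2-space nesting level becomes a 4-space level.
    new_indent = " " * (2 * len(indent))
    if bullet:
        # Pad the marker so the item text starts on the next 4-column stop.
        return new_indent + "-   " + rest
    return new_indent + rest


if __name__ == "__main__":
    old = [
        "- A tag `car` for everything car related (repairs, registration,",
        "  insurance, etc)",
    ]
    for line in old:
        print(reindent(line))
    # -   A tag `car` for everything car related (repairs, registration,
    #     insurance, etc)
```

Unindented lines such as headings pass through unchanged. An indented code block that sits outside any list would still have its indentation doubled, so the sketch should only be applied to list content.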