mirror of
https://github.com/paperless-ngx/paperless-ngx.git
synced 2025-11-09 00:03:59 -05:00
# Troubleshooting

## No files are added by the consumer

Check for the following issues:

- Ensure that the directory you're putting your documents in is the
  folder paperless is watching. With docker, this setting is performed
  in the `docker-compose.yml` file. Without docker, look at the
  `PAPERLESS_CONSUMPTION_DIR` setting. Don't adjust this setting if you're
  using docker.

- Ensure that redis is up and running. Paperless does its task
  processing asynchronously, and for documents to arrive at the task
  processor, it needs redis to run.

- Ensure that the task processor is running. Docker does this
  automatically. Without docker, start the task processor manually by
  running:

  ```shell-session
  $ celery --app paperless worker
  ```

- Look at the output of paperless and inspect it for any errors.

- Go to the admin interface, and check if there are failed tasks. If
  so, the tasks will contain an error message.

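To check the redis item above, a quick reachability test can help. The following is a minimal sketch that only verifies a TCP connection can be opened; it does not confirm that redis itself is healthy. The helper name `port_is_open` is hypothetical, and `localhost:6379` (the redis default port) is an assumption that may differ in your setup.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_is_open("localhost", 6379)` returning `False` suggests that nothing is listening where paperless expects redis.
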
## Consumer warns `OCR for XX failed`

If you find the OCR accuracy to be too low, and/or the document consumer
warns that
`OCR for XX failed, but we're going to stick with what we've got since FORGIVING_OCR is enabled`,
then you might need to install the [Tesseract language
files](https://packages.ubuntu.com/search?keywords=tesseract-ocr)
matching your document's languages.

As an example, if you are running Paperless-ngx from any Ubuntu or
Debian box, and your documents are written in Spanish, you may need to
run:

    apt-get install -y tesseract-ocr-spa

## Consumer fails to pick up any new files

If you notice that the consumer only picks up files that are in the
consumption directory at startup, but doesn't find files added
later, you will need to enable filesystem polling with the configuration
option `PAPERLESS_CONSUMER_POLLING`, see
[here](/configuration#polling).

This will disable listening to filesystem changes with inotify and
paperless will manually check the consumption directory for changes
instead.

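To illustrate the difference, polling means rescanning the directory at intervals instead of waiting for change notifications from the operating system. The sketch below shows the idea only; it is not Paperless's implementation, and `scan_for_new_files` is a hypothetical helper.

```python
from pathlib import Path


def scan_for_new_files(directory: Path, seen: set) -> list:
    """Return files in `directory` that were not seen before, and remember them."""
    new_files = []
    for entry in sorted(directory.iterdir()):
        if entry.is_file() and entry.name not in seen:
            seen.add(entry.name)
            new_files.append(entry)
    return new_files
```

Calling this function repeatedly with the same `seen` set, with a sleep between calls, finds new files even on file systems that never deliver inotify events, such as many network shares.
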
## Paperless always redirects to /admin

You probably had the old paperless installed at some point. It
installed a permanent redirect to /admin in your browser, and you need
to clear your browsing data / cache to fix that.

## Operation not permitted

You might see errors such as:

```shell-session
chown: changing ownership of '../export': Operation not permitted
```

The container tries to set file ownership on the listed directories.
This is required so that the user running paperless inside docker has
write permissions to these folders. The error occurs when ownership
cannot be changed, for example when these directories point to NFS
shares.

Ensure that `chown` is possible on these directories.

## Classifier error: No training data available

This indicates that the Auto matching algorithm found no documents to
learn from. There are two possible reasons:

- You don't use the Auto matching algorithm: The error can be safely
  ignored in this case.
- You are using the Auto matching algorithm: The classifier explicitly
  excludes documents with Inbox tags. Verify that there are documents
  in your archive without inbox tags. The algorithm will only learn
  from documents not in your inbox.

## UserWarning in sklearn on every single document

You may encounter warnings like this:

```
/usr/local/lib/python3.7/site-packages/sklearn/base.py:315:
UserWarning: Trying to unpickle estimator CountVectorizer from version 0.23.2 when using version 0.24.0.
This might lead to breaking code or invalid results. Use at your own risk.
```

This happens when certain dependencies of paperless that are responsible
for the auto matching algorithm are updated. After updating these, your
current training data _might_ not be compatible anymore. This can be
ignored in most cases. This warning will disappear automatically when
paperless updates the training data.

If you want to get rid of the warning or actually experience issues with
automatic matching, delete the file `classification_model.pickle` in the
data directory and let paperless recreate it.

## 504 Server Error: Gateway Timeout when adding Office documents

You may experience these errors when using the optional Tika
integration:

```
requests.exceptions.HTTPError: 504 Server Error: Gateway Timeout for url: http://gotenberg:3000/forms/libreoffice/convert
```

Gotenberg is a server that converts Office documents into PDF documents
and has a default timeout of 30 seconds. When conversion takes longer,
Gotenberg raises this error.

You can increase the timeout by configuring a command flag for Gotenberg
(see also [here](https://gotenberg.dev/docs/modules/api#properties)). If
using docker-compose, this is achieved by the following configuration
change in the `docker-compose.yml` file:

```yaml
# The gotenberg chromium route is used to convert .eml files. We do not
# want to allow external content like tracking pixels or even javascript.
command:
  - 'gotenberg'
  - '--chromium-disable-javascript=true'
  - '--chromium-allow-list=file:///tmp/.*'
  - '--api-timeout=60'
```

## Permission denied errors in the consumption directory

You might encounter errors such as:

```shell-session
The following error occured while consuming document.pdf: [Errno 13] Permission denied: '/usr/src/paperless/src/../consume/document.pdf'
```

This happens when paperless does not have permission to delete files
inside the consumption directory. Ensure that `USERMAP_UID` and
`USERMAP_GID` are set to the user id and group id you use on the host
operating system, if these are different from `1000`. See [Docker setup](/setup#docker_hub).

Also ensure that you are able to read and write to the consumption
directory on the host.

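To find the ids to compare against `USERMAP_UID` and `USERMAP_GID`, you can inspect the directory from the host. This is a minimal sketch, assuming it is run on the host as the user who drops files into the consumption directory; `describe_access` is a hypothetical helper.

```python
import os
from pathlib import Path


def describe_access(directory: Path) -> dict:
    """Report owner ids and whether the current user can read and write."""
    info = os.stat(directory)
    return {
        "owner_uid": info.st_uid,
        "owner_gid": info.st_gid,
        # Listing a directory needs read + execute permission on it.
        "readable": os.access(directory, os.R_OK | os.X_OK),
        # Creating or deleting files needs write + execute permission on it.
        "writable": os.access(directory, os.W_OK | os.X_OK),
    }
```

Deleting a file requires write permission on the containing directory, which is why the `writable` flag is computed for the directory itself.
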
## OSError: \[Errno 19\] No such device when consuming files

If you experience errors such as:

```shell-session
  File "/usr/local/lib/python3.7/site-packages/whoosh/codec/base.py", line 570, in open_compound_file
    return CompoundStorage(dbfile, use_mmap=storage.supports_mmap)
  File "/usr/local/lib/python3.7/site-packages/whoosh/filedb/compound.py", line 75, in __init__
    self._source = mmap.mmap(fileno, 0, access=mmap.ACCESS_READ)
OSError: [Errno 19] No such device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/django_q/cluster.py", line 436, in worker
    res = f(*task["args"], **task["kwargs"])
  File "/usr/src/paperless/src/documents/tasks.py", line 73, in consume_file
    override_tag_ids=override_tag_ids)
  File "/usr/src/paperless/src/documents/consumer.py", line 271, in try_consume_file
    raise ConsumerError(e)
```

Paperless uses a search index to provide better and faster full text
searching. This search index is stored inside the `data` folder. The
search index uses memory-mapped files (mmap). The above error indicates
that paperless was unable to create and open these files.

This happens when you're trying to store the data directory on certain
file systems (mostly network shares) that don't support memory-mapped
files.

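To test whether a directory's file system supports memory-mapped files, you can attempt the same `mmap.mmap(fileno, 0, access=mmap.ACCESS_READ)` call that appears in the traceback. The sketch below does this with a temporary probe file; `supports_mmap` is a hypothetical helper, and it must be run in the same environment (for example inside the container) that paperless runs in.

```python
import mmap
import os
import tempfile


def supports_mmap(directory: str) -> bool:
    """Return True if a file inside `directory` can be memory-mapped."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, b"probe")  # an empty file cannot be mapped
        try:
            with mmap.mmap(fd, 0, access=mmap.ACCESS_READ):
                return True
        except OSError:
            # For example: [Errno 19] No such device
            return False
    finally:
        # Always clean up the probe file.
        os.close(fd)
        os.remove(path)
```

If `supports_mmap` returns `False` for your data directory, moving that directory to a local file system should avoid the error.
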
## Web-UI stuck at "Loading\..."

There are several possible causes.

1.  If you built the docker image yourself or deployed using the bare
    metal route, make sure that there are files in
    `<paperless-root>/static/frontend/<lang-code>/`. If there are no
    files, make sure that you executed `collectstatic` successfully,
    either manually or as part of the docker image build.

    If the front end is still missing, make sure that the front end is
    compiled (files present in `src/documents/static/frontend`). If it
    is not, you need to compile the front end yourself or download the
    release archive instead of cloning the repository.

2.  Check the output of the web server. You might see errors like this:

    ```
    [2021-01-25 10:08:04 +0000] [40] [ERROR] Socket error processing request.
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 134, in handle
        self.handle_request(listener, req, client, addr)
      File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 190, in handle_request
        util.reraise(*sys.exc_info())
      File "/usr/local/lib/python3.7/site-packages/gunicorn/util.py", line 625, in reraise
        raise value
      File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 178, in handle_request
        resp.write_file(respiter)
      File "/usr/local/lib/python3.7/site-packages/gunicorn/http/wsgi.py", line 396, in write_file
        if not self.sendfile(respiter):
      File "/usr/local/lib/python3.7/site-packages/gunicorn/http/wsgi.py", line 386, in sendfile
        sent += os.sendfile(sockno, fileno, offset + sent, count)
    OSError: [Errno 22] Invalid argument
    ```

    To fix this issue, add

    ```
    SENDFILE=0
    ```

    to your `docker-compose.env` file.

## Error while reading metadata

You might find messages like these in your log files:

```
[WARNING] [paperless.parsing.tesseract] Error while reading metadata
```

This indicates that paperless failed to read PDF metadata from one of
your documents. This happens when you open the affected documents in
paperless for editing. Paperless will continue to work, and will simply
not show the invalid metadata.

## Consumer fails with a FileNotFoundError

You might find messages like these in your log files:

```
[ERROR] [paperless.consumer] Error while consuming document SCN_0001.pdf: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ocrmypdf.io.yhk3zbv0/origin.pdf'
Traceback (most recent call last):
  File "/app/paperless/src/paperless_tesseract/parsers.py", line 261, in parse
    ocrmypdf.ocr(**args)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/api.py", line 337, in ocr
    return run_pipeline(options=options, plugin_manager=plugin_manager, api=True)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py", line 385, in run_pipeline
    exec_concurrent(context, executor)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py", line 302, in exec_concurrent
    pdf = post_process(pdf, context, executor)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_sync.py", line 235, in post_process
    pdf_out = metadata_fixup(pdf_out, context)
  File "/usr/local/lib/python3.8/dist-packages/ocrmypdf/_pipeline.py", line 798, in metadata_fixup
    with pikepdf.open(context.origin) as original, pikepdf.open(working_file) as pdf:
  File "/usr/local/lib/python3.8/dist-packages/pikepdf/_methods.py", line 923, in open
    pdf = Pdf._open(
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ocrmypdf.io.yhk3zbv0/origin.pdf'
```

This probably indicates paperless tried to consume the same file twice.
This can happen for a number of reasons, depending on how documents are
placed into the consume folder. If paperless is using inotify (the
default) to check for documents, try adjusting the
[inotify configuration](/configuration#inotify). If polling is enabled, try adjusting the
[polling configuration](/configuration#polling).

## Consumer fails waiting for file to remain unmodified.

You might find messages like these in your log files:

```
[ERROR] [paperless.management.consumer] Timeout while waiting on file /usr/src/paperless/src/../consume/SCN_0001.pdf to remain unmodified.
```

This indicates paperless timed out while waiting for the file to be
completely written to the consume folder. Adjusting
[polling configuration](/configuration#polling) values should resolve the issue.

!!! note

    The user will need to manually move the file out of the consume folder
    and back in, for the initial failing file to be consumed.

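The kind of check that produces this timeout can be sketched as follows: a file counts as completely written once its size and modification time stop changing between two consecutive checks. This is an illustration under that assumption, not Paperless's actual code; `wait_until_unmodified` and its `checks` and `delay` parameters are hypothetical.

```python
import os
import time


def wait_until_unmodified(path: str, checks: int = 5, delay: float = 1.0) -> bool:
    """Return True once size and mtime stay the same between two checks."""
    previous = None
    for _ in range(checks):
        stat = os.stat(path)
        current = (stat.st_size, stat.st_mtime)
        if current == previous:
            return True  # nothing changed since the last look
        previous = current
        time.sleep(delay)
    return False  # still changing after all checks: the timeout case
```

A slow scanner or network copy that keeps appending to the file for longer than `checks * delay` seconds would make this return `False`, which is why allowing more or longer checks helps.
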
## Consumer fails reporting "OS reports file as busy still".

You might find messages like these in your log files:

```
[WARNING] [paperless.management.consumer] Not consuming file /usr/src/paperless/src/../consume/SCN_0001.pdf: OS reports file as busy still
```

This indicates paperless was unable to open the file, as the OS reported
the file as still being in use. To prevent a crash, paperless did not
try to consume the file. If paperless is using inotify (the default) to
check for documents, try adjusting the
[inotify configuration](/configuration#inotify). If polling is enabled, try adjusting the
[polling configuration](/configuration#polling).

!!! note

    The user will need to manually move the file out of the consume folder
    and back in, for the initial failing file to be consumed.

## Log reports "Creating PaperlessTask failed".

You might find messages like these in your log files:

```
[ERROR] [paperless.management.consumer] Creating PaperlessTask failed: db locked
```

You are likely using an SQLite-based installation with an increased
number of workers, and are running into SQLite's concurrency
limitations. Uploading or consuming multiple files at once results in
many workers attempting to access the database simultaneously.

If you often process many documents at once, consider switching to
PostgreSQL. Otherwise, try tweaking the
`PAPERLESS_DB_TIMEOUT` setting to allow more time for the database to
unlock. This may have minor performance implications.

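The concurrency limitation can be reproduced with Python's standard `sqlite3` module. In this minimal sketch, one connection holds the write lock while a second one tries to write and gives up after `timeout` seconds. It assumes that `PAPERLESS_DB_TIMEOUT` plays a role similar to the `timeout` argument used here; `second_writer_blocked` is a hypothetical helper.

```python
import os
import sqlite3
import tempfile


def second_writer_blocked(timeout: float) -> bool:
    """Return True if a second writer gives up with 'database is locked'."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "demo.sqlite3")
        # isolation_level=None lets us issue BEGIN/ROLLBACK ourselves.
        first = sqlite3.connect(path, isolation_level=None)
        second = sqlite3.connect(path, timeout=timeout, isolation_level=None)
        try:
            first.execute("CREATE TABLE task (id INTEGER)")
            first.execute("BEGIN IMMEDIATE")  # first writer takes the write lock
            first.execute("INSERT INTO task VALUES (1)")
            try:
                # Waits up to `timeout` seconds for the lock, then fails.
                second.execute("BEGIN IMMEDIATE")
            except sqlite3.OperationalError as exc:
                return "locked" in str(exc)
            second.execute("ROLLBACK")
            return False
        finally:
            first.close()
            second.close()
```

A longer timeout gives the first writer more time to finish, at the cost of the second writer waiting longer, which matches the minor performance implication mentioned above.
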
## gunicorn fails to start with "is not a valid port number"

You are likely running on Kubernetes, which automatically creates an
environment variable named `${serviceName}_PORT` for each service. If
your service is named `paperless`, that variable is `PAPERLESS_PORT`,
the same environment variable that Paperless uses to optionally
change the port gunicorn listens on. Kubernetes sets it to a URL rather
than a plain port number, which gunicorn rejects.

To fix this, explicitly set `PAPERLESS_PORT` to your desired port, or to
the default of 8000.

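To see why the Kubernetes-generated value is rejected, consider a strict port parser. The sketch below is a hypothetical helper, not gunicorn's code, and the address `tcp://10.0.0.5:8000` is only an illustrative example of the URL form that Kubernetes uses for service variables.

```python
def parse_port(value: str) -> int:
    """Parse a port number, rejecting anything outside 1-65535."""
    try:
        port = int(value)
    except ValueError:
        raise ValueError(f"{value!r} is not a valid port number") from None
    if not 1 <= port <= 65535:
        raise ValueError(f"{value!r} is not a valid port number")
    return port
```

Here `parse_port("8000")` succeeds, while `parse_port("tcp://10.0.0.5:8000")` raises a `ValueError`, mirroring the startup failure described above.
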