.. Mirror of https://github.com/paperless-ngx/paperless-ngx.git, synced 2025-11-03 19:17:13 -05:00


***************
Troubleshooting
***************


No files are added by the consumer
##################################

Check for the following issues:

*   Ensure that the directory you're putting your documents in is the folder
    Paperless is watching. With Docker, this is configured in the
    ``docker-compose.yml`` file. Without Docker, look at the ``CONSUMPTION_DIR``
    setting. Don't adjust this setting if you're using Docker.
*   Ensure that Redis is up and running. Paperless processes tasks
    asynchronously, and documents only reach the task processor if Redis is
    running.
*   Ensure that the task processor is running. With Docker, this happens
    automatically. Without Docker, start the task processor manually by
    executing:

    .. code:: shell-session

        $ python3 manage.py qcluster

*   Look at the output of Paperless and inspect it for any errors.
*   Go to the admin interface and check whether there are failed tasks. If so,
    the tasks will contain an error message.

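
The first two checks in the list above can be run from a shell. The following
is a minimal sketch and not part of Paperless: the function names and the
``/path/to/consume`` placeholder are assumptions made for illustration, and
``redis-cli`` has to be installed on the machine where you run it.

.. code:: bash

    #!/usr/bin/env bash

    # Succeeds only if the given path is an existing, writable directory.
    check_consume_dir() {
        [ -d "$1" ] && [ -w "$1" ]
    }

    # Succeeds only if Redis answers PING with PONG (default host and port).
    check_redis() {
        [ "$(redis-cli ping 2>/dev/null)" = "PONG" ]
    }

    # Placeholder path: replace it with your actual consumption directory.
    consume_dir="${1:-/path/to/consume}"

    if check_consume_dir "$consume_dir"; then
        echo "consumption directory looks fine: $consume_dir"
    else
        echo "missing or not writable: $consume_dir"
    fi

    if check_redis; then
        echo "redis is reachable"
    else
        echo "redis did not answer PING"
    fi

Run the script as the same user that runs Paperless, so that the write check
reflects the permissions the consumer actually has.
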

Consumer fails to pick up any new files
#######################################

If you notice that the consumer only picks up files that are already in the
consumption directory at startup, but does not find files added later, check
the configuration file and enable filesystem polling with the
``PAPERLESS_CONSUMER_POLLING`` setting.

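
As a sketch, assuming the setting takes a polling interval in seconds (check
the configuration reference for your version), the line in the configuration
file could look like the following. The value ``10`` is only an illustrative
choice.

.. code:: bash

    # Poll the consumption directory every 10 seconds instead of relying
    # on filesystem change notifications.
    PAPERLESS_CONSUMER_POLLING=10

Polling tends to be needed when the consumption directory lives on a network
share, where change notifications from other machines usually do not arrive.
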

Consumer warns ``OCR for XX failed``
####################################

If you find the OCR accuracy to be too low, or the document consumer warns
that ``OCR for XX failed, but we're going to stick with what we've got since
FORGIVING_OCR is enabled``, then you might need to install the
`Tesseract language files <http://packages.ubuntu.com/search?keywords=tesseract-ocr>`_
matching your documents' languages.

As an example, if you are running Paperless on an Ubuntu or Debian box and
your documents are written in Spanish, you may need to run::

    apt-get install -y tesseract-ocr-spa

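
Before installing anything, you can list the languages Tesseract already
knows about. This sketch assumes the ``tesseract`` binary is on your ``PATH``;
the ``has_lang`` helper is a hypothetical name, and ``spa`` is the Spanish
language code from the example above.

.. code:: bash

    #!/usr/bin/env bash

    # has_lang CODE: reads a language list on stdin and succeeds only if
    # CODE appears as a whole line.
    has_lang() {
        grep -qx -- "$1"
    }

    # "tesseract --list-langs" prints a header line followed by one
    # language code per line. Depending on the version, the list goes to
    # stdout or stderr, so both streams are merged here.
    if tesseract --list-langs 2>&1 | has_lang spa; then
        echo "Spanish language data is installed"
    else
        echo "Spanish language data is missing (or tesseract is not installed)"
    fi
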

Consumer dies with ``convert: unable to extend pixel cache``
############################################################

During the consumption process, Paperless invokes ImageMagick's ``convert``
program to translate the source document into something that the OCR engine
can understand. This can burn a Very Large amount of memory if the original
document is rather long. Similarly, if your system doesn't have a lot of
memory to begin with (e.g. a Raspberry Pi), then this can happen even for
medium-sized documents.

The solution is to tell ImageMagick *not* to Use All The RAM, as is its
default, and instead tell it to use a fixed amount. ``convert`` will then
break up the job into hundreds of individual files and use them to slowly
compile the finished image. Simply set ``PAPERLESS_CONVERT_MEMORY_LIMIT`` in
``/etc/paperless.conf`` to something like ``32000000`` and you'll limit
``convert`` to 32 MB. Fiddle with this value as you like.

**HOWEVER**: Simply setting this value may not be enough on systems where
``/tmp`` is mounted as tmpfs, as this is where ``convert`` will write its
temporary files. In these cases (most systemd machines), you need to tell
ImageMagick to use a different space for its scratch work. You do this by
setting ``PAPERLESS_CONVERT_TMPDIR`` in ``/etc/paperless.conf`` to somewhere
that's actually on a physical disk (and writable by the user running
Paperless), like ``/var/tmp/paperless`` or ``/home/my_user/tmp`` in a pinch.

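
To find out whether the tmpfs caveat applies to your machine, check which
filesystem backs ``/tmp``. This is a minimal sketch that assumes ``findmnt``
from util-linux is available; the ``fstype`` variable name is arbitrary.

.. code:: bash

    #!/usr/bin/env bash

    # Ask findmnt for the type of the filesystem that contains /tmp.
    # The result is empty if findmnt is missing or the lookup fails.
    fstype="$(findmnt -n -o FSTYPE -T /tmp 2>/dev/null || true)"

    if [ "$fstype" = "tmpfs" ]; then
        echo "/tmp is tmpfs: point PAPERLESS_CONVERT_TMPDIR at a directory on disk"
    else
        echo "/tmp filesystem type: ${fstype:-unknown}"
    fi

If the script reports tmpfs, create the replacement directory first and make
sure the user running Paperless can write to it.
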

DecompressionBombWarning and/or no text in the OCR output
#########################################################

Some users have had issues using Paperless to consume PDFs that were created
by merging Very Large Scanned Images into one PDF. If this happens to you,
it's likely because the PDF you've created contains some very large pages
(millions of pixels) and the process of converting the PDF to an OCR-friendly
image is exploding.

Typically, this happens because the scanned images are created with a high
DPI and then rolled into the PDF with an assumed DPI of 72 (the default).
The best solution then is to specify the DPI used for the scan in the
conversion-to-PDF step. For example, if you scanned the original images
with a DPI of 300, then merging them into a single PDF with ``convert``
should look like this:

.. code:: bash

    $ convert -density 300 *.jpg finished.pdf

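
To see why the recorded DPI matters, consider an A4 page scanned at 300 DPI,
which is about 2480 x 3508 pixels. If the PDF records it at 72 DPI, the page
is treated as roughly 34 x 49 inches, and rendering that page at 300 DPI for
OCR yields an image of about 151 million pixels. That is well above the
roughly 89 million pixels at which Pillow emits ``DecompressionBombWarning``
by default. The sketch below does this arithmetic; the 300 DPI rendering
resolution and the ``rendered_pixels`` helper name are assumptions made for
illustration, not values read from Paperless.

.. code:: bash

    #!/usr/bin/env bash

    # rendered_pixels SCAN_PIXELS RECORDED_DPI RENDER_DPI
    # Pixels along one edge after the page is re-rendered at RENDER_DPI,
    # when the PDF recorded the scan at RECORDED_DPI (integer division).
    rendered_pixels() {
        echo $(( $1 * $3 / $2 ))
    }

    # Scan recorded with the default of 72 DPI: the page balloons.
    width=$(rendered_pixels 2480 72 300)
    height=$(rendered_pixels 3508 72 300)
    echo "DPI recorded as 72:  ${width} x ${height} = $(( width * height )) pixels"

    # Scan recorded with its true 300 DPI: the page keeps its size.
    width=$(rendered_pixels 2480 300 300)
    height=$(rendered_pixels 3508 300 300)
    echo "DPI recorded as 300: ${width} x ${height} = $(( width * height )) pixels"
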

For more information on this and situations like it, take a look at
`Issue #118`_, as that's where this tip originated.

.. _Issue #118: https://github.com/the-paperless-project/paperless/issues/118