***************
Troubleshooting
***************

No files are added by the consumer
##################################

Check for the following issues:

*   Ensure that the directory you're putting your documents in is the folder
    paperless is watching. With docker, this is configured in the
    ``docker-compose.yml`` file. Without docker, look at the
    ``PAPERLESS_CONSUMPTION_DIR`` setting. Don't adjust this setting if you're
    using docker.
*   Ensure that redis is up and running. Paperless processes its tasks
    asynchronously, and documents only reach the task processor if redis is
    running.
*   Ensure that the task processor is running. Docker does this automatically.
    Without docker, start the task processor manually by executing:

    .. code:: shell-session

        $ python3 manage.py qcluster

*   Look at the output of paperless and inspect it for any errors.
*   Go to the admin interface and check if there are failed tasks. If so, the
    tasks will contain an error message.
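
The first two checks can be scripted. The following is a minimal sketch for an
installation without docker, assuming ``redis-cli`` is installed and redis
listens on its default address. The function names ``check_consume_dir`` and
``check_redis`` are hypothetical helpers and not part of paperless:

.. code:: shell

    # Hypothetical helper: report whether the watched folder exists and is
    # writable by the current user.
    check_consume_dir() {
        if [ ! -d "$1" ]; then
            echo "missing: $1"
            return 1
        fi
        if [ ! -w "$1" ]; then
            echo "not writable: $1"
            return 1
        fi
        echo "ok: $1"
    }

    # Hypothetical helper: redis-cli prints PONG when the server answers.
    check_redis() {
        if [ "$(redis-cli ping 2>/dev/null)" = "PONG" ]; then
            echo "redis: up"
        else
            echo "redis: not reachable"
            return 1
        fi
    }

Call ``check_consume_dir`` with the path of your consumption directory as the
user that runs paperless, then call ``check_redis``.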


Consumer fails to pick up any new files
#######################################

If you notice that the consumer only picks up files that are already in the
consumption directory at startup, but does not find files added later, enable
filesystem polling with the setting ``PAPERLESS_CONSUMER_POLLING`` in the
configuration file.
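
As a sketch, assuming the setting takes a polling interval in seconds (check
the configuration reference for your version), the entry in the configuration
file could look like this. The value ``10`` is only an illustration::

    PAPERLESS_CONSUMER_POLLING=10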


Consumer warns ``OCR for XX failed``
####################################

If you find the OCR accuracy to be too low, and/or the document consumer warns
that ``OCR for XX failed, but we're going to stick with what we've got since
FORGIVING_OCR is enabled``, then you might need to install the
`Tesseract language files <http://packages.ubuntu.com/search?keywords=tesseract-ocr>`_
matching your document's languages.

As an example, if you are running Paperless on an Ubuntu or Debian machine and
your documents are written in Spanish, you may need to run::

    apt-get install -y tesseract-ocr-spa
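
To check which language packs Tesseract can already see,
``tesseract --list-langs`` prints the installed languages. A minimal sketch:
``has_ocr_language`` is a hypothetical helper name, and ``spa`` is the language
code from the example above.

.. code:: shell

    # Hypothetical helper: succeed if the given language code is installed.
    # Some Tesseract releases print the list on stderr, hence the 2>&1.
    has_ocr_language() {
        tesseract --list-langs 2>&1 | grep -qx "$1"
    }

    if has_ocr_language spa; then
        echo "Spanish language data is installed"
    else
        echo "Spanish language data is missing"
    fi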


Consumer dies with ``convert: unable to extend pixel cache``
############################################################

During the consumption process, Paperless invokes ImageMagick's ``convert``
program to translate the source document into something that the OCR engine can
understand. This can consume a very large amount of memory if the original
document is long. Similarly, if your system doesn't have much memory to begin
with (e.g. a Raspberry Pi), this can happen even for medium-sized documents.

The solution is to tell ImageMagick *not* to use all the RAM, as is its
default, and instead to use a fixed amount. ``convert`` will then break up the
job into hundreds of individual files and use them to slowly compile the
finished image. Set ``PAPERLESS_CONVERT_MEMORY_LIMIT`` in
``/etc/paperless.conf`` to something like ``32000000`` and you'll limit
``convert`` to 32 MB. Adjust this value as you like.

**HOWEVER**: Setting this value may not be enough on systems where ``/tmp`` is
mounted as tmpfs, as this is where ``convert`` writes its temporary files. In
these cases (most systemd machines), you need to tell ImageMagick to use a
different location for its scratch work. You do this by setting
``PAPERLESS_CONVERT_TMPDIR`` in ``/etc/paperless.conf`` to a directory that is
actually on a physical disk (and writable by the user running Paperless), like
``/var/tmp/paperless`` or, in a pinch, ``/home/my_user/tmp``.
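
To find out whether ``/tmp`` is a tmpfs mount, ``findmnt`` from util-linux can
report the filesystem type. A minimal sketch, assuming a Linux system:
``is_tmpfs`` is a hypothetical helper name.

.. code:: shell

    # Hypothetical helper: succeed if the given path lives on a tmpfs mount.
    is_tmpfs() {
        [ "$(findmnt -n -o FSTYPE -T "$1" 2>/dev/null)" = "tmpfs" ]
    }

    if is_tmpfs /tmp; then
        echo "/tmp is tmpfs; point PAPERLESS_CONVERT_TMPDIR at a directory on disk"
    else
        echo "/tmp is not reported as tmpfs"
    fi

With both settings applied, the relevant lines in ``/etc/paperless.conf`` would
read as follows. The values are the examples from the text, and the directory
must exist and be writable by the user running Paperless::

    PAPERLESS_CONVERT_MEMORY_LIMIT=32000000
    PAPERLESS_CONVERT_TMPDIR=/var/tmp/paperless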


DecompressionBombWarning and/or no text in the OCR output
#########################################################

Some users have had issues using Paperless to consume PDFs that were created
by merging very large scanned images into one PDF. If this happens to you,
it's likely because the PDF you've created contains some very large pages
(millions of pixels), and the process of converting the PDF to an OCR-friendly
image breaks down under their size.

Typically, this happens because the scanned images are created with a high
DPI and then rolled into the PDF with an assumed DPI of 72 (the default).
The best solution is to specify the DPI used for the scan in the
conversion-to-PDF step. For example, if you scanned the original images with a
DPI of 300, then merging the images into a single PDF with ``convert`` should
look like this:

.. code:: bash

    $ convert -density 300 *.jpg finished.pdf
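
The effect of the assumed DPI can be seen with a little arithmetic. The sketch
below assumes a US Letter page (8.5 x 11 inches) scanned at 300 DPI, which
gives 2550 x 3300 pixels, and assumes the PDF page is later rendered back to an
image at 300 DPI. ``rendered_pixels`` is a hypothetical helper name.

.. code:: shell

    # rendered_pixels SCAN_PIXELS RENDER_DPI ASSUMED_DPI
    # A side of SCAN_PIXELS pixels embedded at ASSUMED_DPI is taken to be
    # SCAN_PIXELS / ASSUMED_DPI inches long. Rendering it at RENDER_DPI
    # yields SCAN_PIXELS * RENDER_DPI / ASSUMED_DPI pixels.
    rendered_pixels() {
        echo $(( $1 * $2 / $3 ))
    }

    # Embedded with the assumed default of 72 DPI:
    rendered_pixels 2550 300 72    # width
    rendered_pixels 3300 300 72    # height

    # Embedded with the correct density of 300 DPI:
    rendered_pixels 2550 300 300   # width
    rendered_pixels 3300 300 300   # height

Under these assumptions the 72 DPI page grows to 10625 x 13750 pixels, roughly
146 million pixels, compared with about 8.4 million pixels at the correct
density.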

For more information on this and similar situations, take a look at
`Issue #118`_, where this tip originated.

.. _Issue #118: https://github.com/the-paperless-project/paperless/issues/118