mirror of
				https://github.com/paperless-ngx/paperless-ngx.git
				synced 2025-10-26 08:12:34 -04:00 
			
		
		
		
	
		
			
				
	
	
		
			218 lines
		
	
	
		
			6.5 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
			
		
		
	
	
			218 lines
		
	
	
		
			6.5 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
| .. _utilities:
 | |
| 
 | |
| Utilities
 | |
| =========
 | |
| 
 | |
| There's basically three utilities to Paperless: the webserver, consumer, and
 | |
| if needed, the exporter.  They're all detailed here.
 | |
| 
 | |
| 
 | |
| .. _utilities-webserver:
 | |
| 
 | |
| The Webserver
 | |
| -------------
 | |
| 
 | |
| At the heart of it, Paperless is a simple Django webservice, and the entire
 | |
| interface is based on Django's standard admin interface.  Once running, visiting
 | |
| the URL for your service delivers the admin, through which you can get a
 | |
| detailed listing of all available documents, search for specific files, and
 | |
| download whatever it is you're looking for.
 | |
| 
 | |
| 
 | |
| .. _utilities-webserver-howto:
 | |
| 
 | |
| How to Use It
 | |
| .............
 | |
| 
 | |
| The webserver is started via the ``manage.py`` script:
 | |
| 
 | |
| .. code-block:: shell-session
 | |
| 
 | |
|     $ /path/to/paperless/src/manage.py runserver
 | |
| 
 | |
| By default, the server runs on localhost, port 8000, but you can change this
 | |
| with a few arguments, run ``manage.py --help`` for more information.
 | |
| 
 | |
| Add the option ``--noreload`` to reduce resource usage. Otherwise, the server
 | |
| continuously polls all source files for changes to auto-reload them.
 | |
| 
 | |
| Note that when exiting this command your webserver will disappear.
 | |
| If you want to run this full-time (which is kind of the point)
 | |
| you'll need to have it start in the background -- something you'll need to
 | |
| figure out for your own system.  To get you started though, there are Systemd
 | |
| service files in the ``scripts`` directory.
 | |
| 
 | |
| 
 | |
| .. _utilities-consumer:
 | |
| 
 | |
| The Consumer
 | |
| ------------
 | |
| 
 | |
| The consumer script runs in an infinite loop, constantly looking at a directory
 | |
| for PDF files to parse and index.  The process is pretty straightforward:
 | |
| 
 | |
| 1. Look in ``CONSUMPTION_DIR`` for a PDF.  If one is found, go to #2.  If not,
 | |
|    wait 10 seconds and try again.
 | |
| 2. Parse the PDF with Tesseract
 | |
| 3. Create a new record in the database with the OCR'd text
 | |
| 4. Attempt to automatically assign document attributes by doing some guesswork.
 | |
|    Read up on the :ref:`guesswork documentation<guesswork>` for more
 | |
|    information about this process.
 | |
| 5. Encrypt the PDF and store it in the ``media`` directory under
 | |
|    ``documents/pdf``.
 | |
| 6. Go to #1.
 | |
| 
 | |
| 
 | |
| .. _utilities-consumer-howto:
 | |
| 
 | |
| How to Use It
 | |
| .............
 | |
| 
 | |
| The consumer is started via the ``manage.py`` script:
 | |
| 
 | |
| .. code-block:: shell-session
 | |
| 
 | |
|     $ /path/to/paperless/src/manage.py document_consumer
 | |
| 
 | |
| This starts the service that will run in a loop, consuming PDF files as they
 | |
| appear in ``CONSUMPTION_DIR``.
 | |
| 
 | |
| Note that this command runs continuously, so exiting it will mean your webserver
 | |
| disappears.  If you want to run this full-time (which is kind of the point)
 | |
| you'll need to have it start in the background -- something you'll need to
 | |
| figure out for your own system.  To get you started though, there are Systemd
 | |
| service files in the ``scripts`` directory.
 | |
| 
 | |
| Some command line arguments are available to customize the behavior of the
 | |
| consumer. By default it will use ``/etc/paperless.conf`` values. Display the
 | |
| help with:
 | |
| 
 | |
| .. code-block:: shell-session
 | |
| 
 | |
|     $ /path/to/paperless/src/manage.py document_consumer --help
 | |
| 
 | |
| .. _utilities-exporter:
 | |
| 
 | |
| The Exporter
 | |
| ------------
 | |
| 
 | |
| Tired of fiddling with Paperless, or just want to do something stupid and are
 | |
| afraid of accidentally damaging your files?  You can export all of your PDFs
 | |
| into neatly named, dated, and unencrypted.
 | |
| 
 | |
| 
 | |
| .. _utilities-exporter-howto:
 | |
| 
 | |
| How to Use It
 | |
| .............
 | |
| 
 | |
| This too is done via the ``manage.py`` script:
 | |
| 
 | |
| .. code-block:: shell-session
 | |
| 
 | |
|     $ /path/to/paperless/src/manage.py document_exporter /path/to/somewhere/
 | |
| 
 | |
| This will dump all of your unencrypted PDFs into ``/path/to/somewhere`` for you
 | |
| to do with as you please.  The files are accompanied with a special file,
 | |
| ``manifest.json`` which can be used to
 | |
| :ref:`import the files <utilities-importer>` at a later date if you wish.
 | |
| 
 | |
| 
 | |
| .. _utilities-exporter-howto-docker:
 | |
| 
 | |
| Docker
 | |
| ______
 | |
| 
 | |
| If you are :ref:`using Docker <setup-installation-docker>`, running the
 | |
| expoorter is almost as easy.  To mount a volume for exports, follow the
 | |
| instructions in the ``docker-compose.yml.example`` file for the ``/export``
 | |
| volume (making the changes in your own ``docker-compose.yml`` file, of course).
 | |
| Once you have the volume mounted, the command to run an export is:
 | |
| 
 | |
| .. code-block:: shell-session
 | |
| 
 | |
|    $ docker-compose run --rm consumer document_exporter /export
 | |
| 
 | |
| If you prefer to use ``docker run`` directly, supplying the necessary commandline
 | |
| options:
 | |
| 
 | |
| .. code-block:: shell-session
 | |
| 
 | |
|    $ # Identify your containers
 | |
|    $ docker-compose ps
 | |
|            Name                       Command                State     Ports
 | |
|    -------------------------------------------------------------------------
 | |
|    paperless_consumer_1    /sbin/docker-entrypoint.sh ...   Exit 0
 | |
|    paperless_webserver_1   /sbin/docker-entrypoint.sh ...   Exit 0
 | |
| 
 | |
|    $ # Make sure to replace your passphrase and remove or adapt the id mapping
 | |
|    $ docker run --rm \
 | |
|        --volumes-from paperless_data_1 \
 | |
|        --volume /path/to/arbitrary/place:/export \
 | |
|        -e PAPERLESS_PASSPHRASE=YOUR_PASSPHRASE \
 | |
|        -e USERMAP_UID=1000 -e USERMAP_GID=1000 \
 | |
|        paperless document_exporter /export
 | |
| 
 | |
| 
 | |
| .. _utilities-importer:
 | |
| 
 | |
| The Importer
 | |
| ------------
 | |
| 
 | |
| Looking to transfer Paperless data from one instance to another, or just want
 | |
| to restore from a backup?  This is your go-to toy.
 | |
| 
 | |
| 
 | |
| .. _utilities-importer-howto:
 | |
| 
 | |
| How to Use It
 | |
| .............
 | |
| 
 | |
| The importer works just like the exporter.  You point it at a directory, and
 | |
| the script does the rest of the work:
 | |
| 
 | |
| .. code-block:: shell-session
 | |
| 
 | |
|     $ /path/to/paperless/src/manage.py document_importer /path/to/somewhere/
 | |
| 
 | |
| Docker
 | |
| ______
 | |
| 
 | |
| Assuming that you've already gone through the steps above in the
 | |
| :ref:`export <utilities-exporter-howto-docker>` section, then the easiest thing
 | |
| to do is just re-use the ``/export`` path you already setup:
 | |
| 
 | |
| .. code-block:: shell-session
 | |
| 
 | |
|    $ docker-compose run --rm consumer document_importer /export
 | |
| 
 | |
| Similarly, if you're not using docker-compose, you can adjust the export
 | |
| instructions above to do the import.
 | |
| 
 | |
| 
 | |
| .. _utilities-retagger:
 | |
| 
 | |
| The Re-tagger
 | |
| -------------
 | |
| 
 | |
| Say you've imported a few hundred documents and now want to introduce a tag
 | |
| and apply its matching to all of the currently-imported docs.  This problem is
 | |
| common enough that there's a tool for it.
 | |
| 
 | |
| 
 | |
| .. _utilities-retagger-howto:
 | |
| 
 | |
| How to Use It
 | |
| .............
 | |
| 
 | |
| This too is done via the ``manage.py`` script:
 | |
| 
 | |
| .. code:: bash
 | |
| 
 | |
|     $ /path/to/paperless/src/manage.py document_retagger
 | |
| 
 | |
| That's it.  It'll loop over all of the documents in your database and attempt
 | |
| to match all of your tags to them.  If one matches, it'll be applied.  And
 | |
| don't worry, you can run this as often as you like, it' won't double-tag
 | |
| a document.
 |