.. _utilities:

Utilities
=========

*Paperless* comes with a handful of utilities: the webserver, the consumer, and,
if needed, the exporter, the importer, and the re-tagger.  They're all detailed
here.


.. _utilities-webserver:

The Webserver
-------------

At the heart of it, *Paperless* is a simple Django webservice, and the entire
interface is based on Django's standard admin interface.  Once running, visiting
the URL for your service delivers the admin, through which you can get a
detailed listing of all available documents, search for specific files, and
download whatever it is you're looking for.


.. _utilities-webserver-howto:

How to Use It
.............

The webserver is started via the ``manage.py`` script:

.. code-block:: shell-session

    $ /path/to/paperless/src/manage.py runserver

By default, the server listens on localhost, port 8000.  You can change this by
passing an address and port, such as ``0.0.0.0:8080``; run
``manage.py runserver --help`` for the full list of options.

Note that this command runs continuously, so exiting it means your webserver
disappears.  If you want to run it full-time (which is kind of the point),
you'll need to start it in the background, and how you do that depends on your
own system.  To get you started, there are systemd service files in the
``scripts`` directory.


.. _utilities-consumer:

The Consumer
------------

The consumer script runs in an infinite loop, constantly looking at a directory
for PDF files to parse and index.  The process is pretty straightforward:

1. Look in ``CONSUMPTION_DIR`` for a PDF.  If one is found, go to #2.  If not,
   wait 10 seconds and try again.
2. Run OCR on the PDF with Tesseract.
3. Create a new record in the database with the OCR'd text.
4. Encrypt the PDF and store it in the ``media`` directory under
   ``documents/pdf``.
5. Go to #1.


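As a rough illustration of the loop above, here is a shell sketch of its
control flow.  It is hypothetical and is not how the consumer is actually
implemented: ``next_pdf``, ``process_pdf`` and ``consume_forever`` are made-up
names, and ``process_pdf`` simply deletes the file where the real consumer would
OCR, index, encrypt and store it (steps 2 to 4).

.. code-block:: shell

    #!/bin/sh
    # Hypothetical sketch of the consumer's polling loop, not Paperless's code.
    # CONSUMPTION_DIR is read from the environment as a stand-in for the
    # Paperless setting of the same name.

    # Step 1: print the path of one PDF in directory $1, or fail if none exists.
    next_pdf() {
        for f in "$1"/*.pdf; do
            # With no match the glob stays literal, so check the file exists.
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
                return 0
            fi
        done
        return 1
    }

    # Stand-in for steps 2-4 (OCR, database record, encrypt and store).
    # This sketch only reports the file and deletes it so it isn't seen twice.
    process_pdf() {
        printf 'consuming %s\n' "$1"
        rm -f -- "$1"
    }

    # Steps 1-5: consume a PDF if there is one, otherwise wait 10 seconds,
    # then go back to the start.  Runs until interrupted.
    consume_forever() {
        while :; do
            if pdf=$(next_pdf "$CONSUMPTION_DIR"); then
                process_pdf "$pdf"
            else
                sleep 10
            fi
        done
    }

Because the wait only happens when no PDF is found, a backlog of files is
processed back to back, and the 10-second pause applies only once the
directory has been drained.

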
.. _utilities-consumer-howto:

How to Use It
.............

The consumer is started via the ``manage.py`` script:

.. code-block:: shell-session

    $ /path/to/paperless/src/manage.py document_consumer

This starts the service that will run in a loop, consuming PDF files as they
appear in ``CONSUMPTION_DIR``.

Note that this command runs continuously, so exiting it means your consumer
stops.  If you want to run it full-time (which is kind of the point), you'll
need to start it in the background, and how you do that depends on your own
system.  To get you started, there are systemd service files in the
``scripts`` directory.


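Whichever way you run it, the consumer picks up PDFs as they appear in
``CONSUMPTION_DIR``, so when copying large files in, it may be worth making sure
it never sees one that is only half written.  A common trick, sketched below, is
to copy under a temporary name and then rename, since a rename within one
filesystem is atomic.  ``drop_pdf`` is a hypothetical helper, not part of
*Paperless*, and the sketch assumes (without having checked) that the consumer
ignores files whose names don't end in ``.pdf``.

.. code-block:: shell

    #!/bin/sh
    # Hypothetical helper, not part of Paperless: copy a PDF into the
    # consumption directory without exposing a partially written file.
    # ASSUMPTION: the consumer ignores names that do not end in ".pdf".
    drop_pdf() {
        src=$1
        dest_dir=$2
        name=${src##*/}                 # file name without its directory
        tmp="$dest_dir/.$name.part"     # temporary name, same filesystem
        # Copy under the temporary name first, cleaning up on failure...
        cp -- "$src" "$tmp" || { rm -f -- "$tmp"; return 1; }
        # ...then rename it into place in a single atomic step.
        mv -- "$tmp" "$dest_dir/$name"
    }

For example, ``drop_pdf ~/scans/invoice.pdf /path/to/consumption/dir``.  The
temporary file is created inside the destination directory on purpose: a move
across filesystems is really a copy followed by a delete, and would not be
atomic.

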
.. _utilities-exporter:

The Exporter
------------

Tired of fiddling with *Paperless*, or just want to do something stupid and are
afraid of accidentally damaging your files?  You can export all of your PDFs
into neatly named, dated, and unencrypted files.


.. _utilities-exporter-howto:

How to Use It
.............

This too is done via the ``manage.py`` script:

.. code-block:: shell-session

    $ /path/to/paperless/src/manage.py document_exporter /path/to/somewhere/

This will dump all of your PDFs, unencrypted, into ``/path/to/somewhere`` for
you to do with as you please.  The files are accompanied by a special file,
``manifest.json``, which can be used to
:ref:`import the files <utilities-importer>` at a later date if you wish.


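Because an export is just a directory of files plus ``manifest.json``, it also
makes a convenient backup.  The sketch below writes each export into its own
timestamped subdirectory so that earlier snapshots are kept.  ``MANAGE_PY``,
``BACKUP_ROOT`` and ``export_snapshot`` are hypothetical names to adapt to your
installation; only the ``document_exporter`` invocation comes from the command
shown above.

.. code-block:: shell

    #!/bin/sh
    # Hypothetical backup helper built around the exporter command.
    # MANAGE_PY and BACKUP_ROOT are illustrative; point them at your install.
    MANAGE_PY=${MANAGE_PY:-/path/to/paperless/src/manage.py}
    BACKUP_ROOT=${BACKUP_ROOT:-/path/to/backups}

    # Export everything into a fresh, timestamped directory and print its path.
    export_snapshot() {
        dest="$BACKUP_ROOT/$(date +%Y-%m-%d_%H%M%S)"
        mkdir -p -- "$dest" || return 1
        "$MANAGE_PY" document_exporter "$dest" || return 1
        printf '%s\n' "$dest"
    }

Calling ``export_snapshot`` (for example from a nightly cron job) prints the
directory it wrote, which you can later hand to the
:ref:`importer <utilities-importer>`.

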
.. _utilities-exporter-howto-docker:

Docker
______

If you are :ref:`using Docker <setup-installation-docker>`, running the
exporter is almost as easy.  To mount a volume for exports, follow the
instructions in the ``docker-compose.yml.example`` file for the ``/export``
volume (making the changes in your own ``docker-compose.yml`` file, of course).
Once you have the volume mounted, the command to run an export is:

.. code-block:: shell-session

    $ docker-compose run --rm consumer document_exporter /export

If you prefer to use ``docker run`` directly, you'll need to supply the
necessary command-line options yourself:

.. code-block:: shell-session

    $ # Identify your containers
    $ docker-compose ps
            Name                       Command                State     Ports
    -------------------------------------------------------------------------
    paperless_consumer_1    /sbin/docker-entrypoint.sh ...   Exit 0
    paperless_webserver_1   /sbin/docker-entrypoint.sh ...   Exit 0

    $ # Make sure to replace your passphrase and remove or adapt the id mapping
    $ docker run --rm \
        --volumes-from paperless_data_1 \
        --volume /path/to/arbitrary/place:/export \
        -e PAPERLESS_PASSPHRASE=YOUR_PASSPHRASE \
        -e USERMAP_UID=1000 -e USERMAP_GID=1000 \
        paperless document_exporter /export


.. _utilities-importer:

The Importer
------------

Looking to transfer Paperless data from one instance to another, or just want
to restore from a backup?  This is your go-to toy.


.. _utilities-importer-howto:

How to Use It
.............

The importer works just like the exporter.  You point it at a directory
produced by the exporter (the one containing ``manifest.json``), and the script
does the rest of the work:

.. code-block:: shell-session

    $ /path/to/paperless/src/manage.py document_importer /path/to/somewhere/

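Since the importer relies on the ``manifest.json`` that the exporter wrote
alongside the files, a quick check that the directory actually contains one can
save you from pointing it at the wrong place.  ``check_export_dir`` is a
hypothetical helper; it only tests that a non-empty ``manifest.json`` exists and
does not validate its contents.

.. code-block:: shell

    #!/bin/sh
    # Hypothetical pre-flight check before running document_importer:
    # succeed only if directory $1 holds a non-empty manifest.json.
    check_export_dir() {
        if [ -s "$1/manifest.json" ]; then
            return 0
        fi
        printf 'no manifest.json in %s; was it made by document_exporter?\n' \
            "$1" >&2
        return 1
    }

For example: ``check_export_dir /path/to/somewhere/ &&
/path/to/paperless/src/manage.py document_importer /path/to/somewhere/``.
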
Docker
______

Assuming that you've already gone through the steps above in the
:ref:`export <utilities-exporter-howto-docker>` section, the easiest thing to
do is to re-use the ``/export`` path you already set up:

.. code-block:: shell-session

    $ docker-compose run --rm consumer document_importer /export

Similarly, if you're not using docker-compose, you can adapt the ``docker run``
export instructions above to do the import.


.. _utilities-retagger:

The Re-tagger
-------------

Say you've imported a few hundred documents and now want to introduce a new tag
and apply its matching rules to all of the documents already imported.  This
problem is common enough that there's a tool for it.


.. _utilities-retagger-howto:

How to Use It
.............

This too is done via the ``manage.py`` script:

.. code-block:: shell-session

    $ /path/to/paperless/src/manage.py document_retagger

That's it.  It'll loop over all of the documents in your database and attempt
to match all of your tags to them.  If one matches, it'll be applied.  And
don't worry: you can run this as often as you like, and it won't double-tag
a document.
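
The "won't double-tag" guarantee is a matter of idempotence: applying a tag to a
document that already carries it changes nothing, so repeated runs end in the
same state.  The shell sketch below models only that property, using a plain
text file with one tag per line; it is an illustration, not how *Paperless*
stores tags or decides whether a tag matches, and ``apply_tag`` is a
hypothetical name.

.. code-block:: shell

    #!/bin/sh
    # Hypothetical model of idempotent tagging, not Paperless code:
    # a document's tags live in a text file, one tag per line.

    # Add tag $2 to tags file $1 unless that exact line is already present.
    apply_tag() {
        touch -- "$1" || return 1
        if ! grep -qxF -- "$2" "$1"; then
            printf '%s\n' "$2" >> "$1"
        fi
    }

Running ``apply_tag doc.tags invoices`` twice leaves a single ``invoices`` line,
which is the same behaviour you get from re-running the re-tagger.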