.. _utilities:

Utilities
=========

Paperless comes with a handful of utilities: the webserver, the consumer, and,
if needed, the exporter, the importer, and the re-tagger.  They're all detailed
here.


.. _utilities-webserver:

The Webserver
-------------

At the heart of it, Paperless is a simple Django webservice, and the entire
interface is based on Django's standard admin interface.  Once running, visiting
the URL for your service delivers the admin, through which you can get a
detailed listing of all available documents, search for specific files, and
download whatever it is you're looking for.


.. _utilities-webserver-howto:

How to Use It
.............

The webserver is started via the ``manage.py`` script:

.. code-block:: shell-session

    $ /path/to/paperless/src/manage.py runserver

By default, the server runs on localhost, port 8000, but you can change this
with a few arguments.  Run ``manage.py runserver --help`` for more information.

Note that this command runs continuously, so exiting it will mean your webserver
disappears.  If you want to run this full-time (which is kind of the point)
you'll need to have it start in the background -- something you'll need to
figure out for your own system.  To get you started though, there are Systemd
service files in the ``scripts`` directory.


.. _utilities-consumer:

The Consumer
------------

The consumer script runs in an infinite loop, constantly looking at a directory
for PDF files to parse and index.  The process is pretty straightforward:

1. Look in ``CONSUMPTION_DIR`` for a PDF.  If one is found, go to #2.  If not,
   wait 10 seconds and try again.
2. Parse the PDF with Tesseract
3. Create a new record in the database with the OCR'd text
4. Attempt to automatically assign document attributes by doing some guesswork.
   Read up on the :ref:`guesswork documentation<guesswork>` for more
   information about this process.
5. Encrypt the PDF and store it in the ``media`` directory under
   ``documents/pdf``.
6. Go to #1.
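
If it helps to see the loop written down, here is a minimal sketch of the steps
above.  It is not the actual consumer code: ``consume_once``, ``run_consumer``
and the ``handle_pdf`` callback are hypothetical names, and the callback stands
in for steps 2 through 5 (OCR, database record, guesswork, encryption).  The
``max_idle_polls`` argument is also an invention so the sketch can stop instead
of looping forever.

.. code-block:: python

    import time
    from pathlib import Path


    def consume_once(consumption_dir, handle_pdf):
        """Handle the first PDF found, if any.  Returns True if one was handled."""
        for path in sorted(Path(consumption_dir).glob("*.pdf")):
            # Hypothetical callback standing in for steps 2-5.
            handle_pdf(path)
            # Assumption: the source file leaves the directory once it has been
            # stored, so it is not picked up again on the next pass.
            path.unlink(missing_ok=True)
            return True
        return False


    def run_consumer(consumption_dir, handle_pdf, poll_seconds=10,
                     max_idle_polls=None):
        """Poll the directory; sleep between passes when nothing is waiting."""
        idle_polls = 0
        while max_idle_polls is None or idle_polls < max_idle_polls:
            if consume_once(consumption_dir, handle_pdf):
                idle_polls = 0           # found something, look again right away
            else:
                idle_polls += 1
                time.sleep(poll_seconds)  # step 1: wait and try again

Calling ``run_consumer("/path/to/consume", print)`` would print each PDF path
as it is picked up and keep polling every 10 seconds until interrupted.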


.. _utilities-consumer-howto:

How to Use It
.............

The consumer is started via the ``manage.py`` script:

.. code-block:: shell-session

    $ /path/to/paperless/src/manage.py document_consumer

This starts the service that will run in a loop, consuming PDF files as they
appear in ``CONSUMPTION_DIR``.

Note that this command runs continuously, so exiting it will mean your consumer
disappears.  If you want to run this full-time (which is kind of the point)
you'll need to have it start in the background -- something you'll need to
figure out for your own system.  To get you started though, there are Systemd
service files in the ``scripts`` directory.


.. _utilities-exporter:

The Exporter
------------

Tired of fiddling with Paperless, or just want to do something stupid and are
afraid of accidentally damaging your files?  You can export all of your PDFs
as neatly named, dated, and unencrypted files.


.. _utilities-exporter-howto:

How to Use It
.............

This too is done via the ``manage.py`` script:

.. code-block:: shell-session

    $ /path/to/paperless/src/manage.py document_exporter /path/to/somewhere/

This will dump unencrypted copies of all of your PDFs into
``/path/to/somewhere`` for you to do with as you please.  The files are
accompanied by a special file, ``manifest.json``, which can be used to
:ref:`import the files <utilities-importer>` at a later date if you wish.
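
If you'd like to sanity-check an export before relying on it as a backup, a
few lines of Python will do.  The sketch below is only an illustration:
``summarize_export`` is a hypothetical helper, and it assumes that
``manifest.json`` holds a top-level JSON array with one entry per exported
document.  Check your own manifest before trusting that assumption.

.. code-block:: python

    import json
    from pathlib import Path


    def summarize_export(export_dir):
        """Count manifest entries and list the PDFs sitting next to the manifest."""
        export_dir = Path(export_dir)
        with (export_dir / "manifest.json").open(encoding="utf-8") as fh:
            # Assumption: the manifest is a JSON array, one entry per document.
            entries = json.load(fh)
        pdf_files = sorted(p.name for p in export_dir.glob("*.pdf"))
        return {"manifest_entries": len(entries), "pdf_files": pdf_files}

Running ``summarize_export("/path/to/somewhere")`` after an export gives you
two numbers to compare: the number of manifest entries and the number of PDFs
actually on disk.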


.. _utilities-exporter-howto-docker:

Docker
______

If you are :ref:`using Docker <setup-installation-docker>`, running the
exporter is almost as easy.  To mount a volume for exports, follow the
instructions in the ``docker-compose.yml.example`` file for the ``/export``
volume (making the changes in your own ``docker-compose.yml`` file, of course).
Once you have the volume mounted, the command to run an export is:

.. code-block:: shell-session

   $ docker-compose run --rm consumer document_exporter /export

If you prefer to use ``docker run`` directly, you'll need to supply the
necessary command-line options yourself:

.. code-block:: shell-session

   $ # Identify your containers
   $ docker-compose ps
           Name                       Command                State     Ports
   -------------------------------------------------------------------------
   paperless_consumer_1    /sbin/docker-entrypoint.sh ...   Exit 0
   paperless_webserver_1   /sbin/docker-entrypoint.sh ...   Exit 0

   $ # Make sure to replace your passphrase and remove or adapt the id mapping
   $ docker run --rm \
       --volumes-from paperless_data_1 \
       --volume /path/to/arbitrary/place:/export \
       -e PAPERLESS_PASSPHRASE=YOUR_PASSPHRASE \
       -e USERMAP_UID=1000 -e USERMAP_GID=1000 \
       paperless document_exporter /export


.. _utilities-importer:

The Importer
------------

Looking to transfer Paperless data from one instance to another, or just want
to restore from a backup?  This is your go-to toy.


.. _utilities-importer-howto:

How to Use It
.............

The importer works just like the exporter.  You point it at a directory, and
the script does the rest of the work:

.. code-block:: shell-session

    $ /path/to/paperless/src/manage.py document_importer /path/to/somewhere/

Docker
______

Assuming that you've already gone through the steps above in the
:ref:`export <utilities-exporter-howto-docker>` section, then the easiest thing
to do is just re-use the ``/export`` path you already set up:

.. code-block:: shell-session

   $ docker-compose run --rm consumer document_importer /export

Similarly, if you're not using docker-compose, you can adjust the export
instructions above to do the import.


.. _utilities-retagger:

The Re-tagger
-------------

Say you've imported a few hundred documents and now want to introduce a tag
and apply its matching to all of the currently-imported docs.  This problem is
common enough that there's a tool for it.


.. _utilities-retagger-howto:

How to Use It
.............

This too is done via the ``manage.py`` script:

.. code-block:: shell-session

    $ /path/to/paperless/src/manage.py document_retagger

That's it.  It'll loop over all of the documents in your database and attempt
to match all of your tags to them.  If one matches, it'll be applied.  And
don't worry, you can run this as often as you like: it won't double-tag
a document.
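
To see why repeated runs are harmless, here is a small sketch of the idea.  It
is not the retagger's real code: ``retag`` is a hypothetical function,
documents are plain dictionaries rather than database records, and matching is
reduced to a case-insensitive substring test as a stand-in for whatever
matching your tags are configured with.

.. code-block:: python

    def retag(documents, tags):
        """Apply every matching tag to every document.

        documents: list of {"text": str, "tags": set} dictionaries
        tags: mapping of tag name to the string that should trigger it
        """
        for document in documents:
            for tag_name, match in tags.items():
                # Stand-in matching rule: case-insensitive substring search.
                if match.lower() in document["text"].lower():
                    # A set ignores duplicates, so re-running changes nothing.
                    document["tags"].add(tag_name)
        return documents

Because each document's tags are kept as a set, applying a tag that is already
present is a no-op, which is what makes a second or third run safe.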