.. Mirror of https://github.com/paperless-ngx/paperless-ngx.git, synced 2025-11-03 11:07:13 -05:00.

.. _utilities:

Utilities
=========

Paperless comes with several utilities: the webserver, the consumer, and, for
when you need them, the exporter, importer, and re-tagger.  They're all
detailed here.

.. _utilities-webserver:

The Webserver
-------------

At the heart of it, Paperless is a simple Django webservice, and the entire
interface is based on Django's standard admin interface.  Once running, visiting
the URL for your service delivers the admin, through which you can get a
detailed listing of all available documents, search for specific files, and
download whatever it is you're looking for.

.. _utilities-webserver-howto:

How to Use It
.............

The webserver is started via the ``manage.py`` script:

.. code-block:: shell-session

    $ /path/to/paperless/src/manage.py runserver

By default, the server runs on localhost, port 8000, but you can change this
with a few arguments.  Run ``manage.py runserver --help`` for more information.

Add the option ``--noreload`` to reduce resource usage.  Otherwise, the server
continuously polls all source files for changes so it can reload itself.

Note that the webserver stops as soon as this command exits.  If you want to
run it full-time (which is kind of the point), you'll need to have it start in
the background -- something you'll need to figure out for your own system.  To
get you started though, there are systemd service files in the ``scripts``
directory.

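If you end up writing your own launcher for a process supervisor, the
``runserver`` invocation can be assembled programmatically.  The following is
only a sketch: the helper name ``runserver_command`` and its defaults are made
up for illustration and are not part of Paperless.  ``--noreload`` and the
``addr:port`` argument are the standard options of Django's ``runserver``.

```python
import sys


def runserver_command(manage_py, addrport="127.0.0.1:8000", reload=False):
    """Build the argument list for Django's ``runserver`` command.

    ``manage_py`` is the path to Paperless' manage.py (hypothetical helper,
    not a Paperless API).  The auto-reloader is disabled unless asked for.
    """
    command = [sys.executable, manage_py, "runserver"]
    if not reload:
        # Skip the source-file polling described above.
        command.append("--noreload")
    command.append(addrport)
    return command


# Example (not executed here): pass the list to subprocess.run() from a
# wrapper script that your service manager starts in the background.
#   subprocess.run(runserver_command("/path/to/paperless/src/manage.py"))
```

Building a list rather than a single shell string means paths containing
spaces need no extra quoting.
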
.. _utilities-consumer:

The Consumer
------------

The consumer script runs in an infinite loop, constantly looking at a directory
for documents to parse and index.  The process is pretty straightforward:

1. Look in ``CONSUMPTION_DIR`` for a document.  If one is found, go to #2.
   If not, wait 10 seconds and try again.  On Linux, new documents are detected
   instantly via inotify, so there's no waiting involved.
2. Parse the document with Tesseract.
3. Create a new record in the database with the OCR'd text.
4. Attempt to automatically assign document attributes by doing some guesswork.
   Read up on the :ref:`guesswork documentation<guesswork>` for more
   information about this process.
5. Encrypt the document (if you have a passphrase set) and store it in the
   ``media`` directory under ``documents/originals``.
6. Go to #1.

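The consumer loop can be sketched in a few lines of Python.  This is a
simplified illustration, not the actual consumer code: ``consume_pending``,
``run_consumer``, and the ``process`` callback are hypothetical names.
``process`` stands in for steps 2 through 5 and is assumed to move or delete
each file it handles, and only the polling fallback is shown (no inotify).

```python
import time
from pathlib import Path


def consume_pending(consumption_dir, process):
    """Hand every regular file in the directory to ``process``.

    Returns the number of files handled.  ``process`` must remove each file
    from the directory, otherwise it would be picked up again next cycle.
    """
    handled = 0
    # sorted() materialises the listing first, so files may be removed safely.
    for path in sorted(Path(consumption_dir).iterdir()):
        if path.is_file():
            process(path)
            handled += 1
    return handled


def run_consumer(consumption_dir, process, delay=10, max_cycles=None):
    """Poll the directory forever, or for ``max_cycles`` rounds when given."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        if consume_pending(consumption_dir, process) == 0:
            # Nothing found: wait before looking again (step 1).
            time.sleep(delay)
        cycles += 1
```

The ``max_cycles`` parameter exists only so the sketch can be tried out
without running forever.
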
.. _utilities-consumer-howto:

How to Use It
.............

The consumer is started via the ``manage.py`` script:

.. code-block:: shell-session

    $ /path/to/paperless/src/manage.py document_consumer

This starts the service that will consume documents as they appear in
``CONSUMPTION_DIR``.

Note that this command runs continuously, so exiting it will mean your consumer
disappears.  If you want to run this full-time (which is kind of the point)
you'll need to have it start in the background -- something you'll need to
figure out for your own system.  To get you started though, there are systemd
service files in the ``scripts`` directory.

Some command line arguments are available to customize the behavior of the
consumer.  By default it will use the values from ``/etc/paperless.conf``.
Display the help with:

.. code-block:: shell-session

    $ /path/to/paperless/src/manage.py document_consumer --help

.. _utilities-exporter:

The Exporter
------------

Tired of fiddling with Paperless, or just want to do something stupid and are
afraid of accidentally damaging your files?  You can export all of your
documents into neatly named, dated, and unencrypted files.

.. _utilities-exporter-howto:

How to Use It
.............

This too is done via the ``manage.py`` script:

.. code-block:: shell-session

    $ /path/to/paperless/src/manage.py document_exporter /path/to/somewhere/

This will dump all of your documents, unencrypted, into ``/path/to/somewhere``
for you to do with as you please.  The files are accompanied by a special
file, ``manifest.json``, which can be used to :ref:`import the files
<utilities-importer>` at a later date if you wish.

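Since the manifest is plain JSON, you can inspect an export before importing
it elsewhere.  What follows is a rough sketch, assuming the top level of
``manifest.json`` is a JSON array with one entry per exported record (the
exact layout is not described here, so treat that as an assumption).
``load_manifest`` and ``summarize_export`` are illustrative helpers, not
Paperless APIs.

```python
import json
from pathlib import Path


def load_manifest(export_dir):
    """Return the parsed contents of ``manifest.json`` in an export directory."""
    manifest_path = Path(export_dir) / "manifest.json"
    with manifest_path.open(encoding="utf-8") as handle:
        return json.load(handle)


def summarize_export(export_dir):
    """Count manifest entries and the files sitting beside the manifest.

    Assumes the manifest is a JSON array; adjust if your export differs.
    """
    export_path = Path(export_dir)
    entries = load_manifest(export_path)
    files = [
        p for p in export_path.iterdir()
        if p.is_file() and p.name != "manifest.json"
    ]
    return {"manifest_entries": len(entries), "exported_files": len(files)}
```

Calling ``summarize_export("/path/to/somewhere")`` on a finished export gives a
quick sanity check that the directory is not empty and that the manifest
parses.
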
.. _utilities-exporter-howto-docker:

Docker
______

If you are :ref:`using Docker <setup-installation-docker>`, running the
exporter is almost as easy.  To mount a volume for exports, follow the
instructions in the ``docker-compose.yml.example`` file for the ``/export``
volume (making the changes in your own ``docker-compose.yml`` file, of course).
Once you have the volume mounted, the command to run an export is:

.. code-block:: shell-session

   $ docker-compose run --rm consumer document_exporter /export

If you prefer to use ``docker run`` directly, supply the necessary command-line
options yourself:

.. code-block:: shell-session

   $ # Identify your containers
   $ docker-compose ps
           Name                       Command                State     Ports
   -------------------------------------------------------------------------
   paperless_consumer_1    /sbin/docker-entrypoint.sh ...   Exit 0
   paperless_webserver_1   /sbin/docker-entrypoint.sh ...   Exit 0

   $ # Make sure to replace your passphrase and remove or adapt the id mapping
   $ docker run --rm \
       --volumes-from paperless_data_1 \
       --volume /path/to/arbitrary/place:/export \
       -e PAPERLESS_PASSPHRASE=YOUR_PASSPHRASE \
       -e USERMAP_UID=1000 -e USERMAP_GID=1000 \
       paperless document_exporter /export

.. _utilities-importer:

The Importer
------------

Looking to transfer Paperless data from one instance to another, or just want
to restore from a backup?  This is your go-to toy.

.. _utilities-importer-howto:

How to Use It
.............

The importer works just like the exporter.  You point it at a directory, and
the script does the rest of the work:

.. code-block:: shell-session

    $ /path/to/paperless/src/manage.py document_importer /path/to/somewhere/

Docker
______

Assuming that you've already gone through the steps above in the
:ref:`export <utilities-exporter-howto-docker>` section, the easiest thing
to do is just re-use the ``/export`` path you already set up:

.. code-block:: shell-session

   $ docker-compose run --rm consumer document_importer /export

Similarly, if you're not using docker-compose, you can adjust the export
instructions above to do the import.

.. _utilities-retagger:

The Re-tagger
-------------

Say you've imported a few hundred documents and now want to introduce a tag
and apply its matching to all of the currently-imported docs.  This problem is
common enough that there's a tool for it.

.. _utilities-retagger-howto:

How to Use It
.............

This too is done via the ``manage.py`` script:

.. code-block:: shell-session

    $ /path/to/paperless/src/manage.py document_retagger

That's it.  It'll loop over all of the documents in your database and attempt
to match all of your tags to them.  If one matches, it'll be applied.  And
don't worry, you can run this as often as you like; it won't double-tag
a document.

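Repeated runs are harmless because the tags on a document behave like a set:
adding a tag that is already there changes nothing.  Here is a toy model of
that idea, assuming plain case-insensitive substring matching as a stand-in
for Paperless' real matching rules.  ``retag`` and the dictionary layout are
invented for this example.

```python
def retag(documents, tags):
    """Apply every matching tag to every document.

    ``documents`` maps a document name to {"text": str, "tags": set};
    ``tags`` maps a tag name to the lowercase keyword that triggers it.
    Returns how many tags were newly added.
    """
    added = 0
    for document in documents.values():
        text = document["text"].lower()
        for name, keyword in tags.items():
            # Only count tags the document does not carry yet.
            if keyword in text and name not in document["tags"]:
                document["tags"].add(name)
                added += 1
    return added
```

Running ``retag`` twice over the same data adds tags the first time and
reports zero additions the second time, which mirrors the behaviour
described above.
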