Mirror of https://github.com/paperless-ngx/paperless-ngx.git, synced 2025-10-26 08:12:34 -04:00

This commit adds a `Dockerfile` to the root of the project, accompanied
by a `docker-compose.yml.example` for simplified deployment. The
`Dockerfile` is agnostic to whether it will be the webserver, the
consumer, or run for a one-off command (e.g. creating a superuser,
migrating the database, exporting documents, ...).
The container's entrypoint is the `scripts/docker-entrypoint.sh` script.
This script verifies that the required permissions are set, remaps the
IDs of the default user and/or group if required, and installs additional
OCR languages if the user requests them.
After initialization, it analyzes the command the user supplied:
  - If the command starts with a slash, it is expected that the user
    wants to execute a binary file and the command will be executed
    without further intervention. (It uses `exec` to replace the running
    shell script, so there are no reaping issues.)
  - If the command does not start with a slash, the command will be
    passed directly to the `manage.py` script without further
    modification. (Again using `exec`.)
The default command is set to `--help`.
If the user wants to execute a command that is not meant for `manage.py`
but doesn't start with a slash, the Docker `--entrypoint` parameter can
be used to circumvent the mechanics of `docker-entrypoint.sh`.
Further information can be found in `docs/setup.rst` and in
`docs/migrating.rst`.
For additional convenience, a `Dockerfile` has been added to the `docs/`
directory which allows for easy building and serving of the
documentation. This is documented in `docs/requirements.rst`.
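
A minimal sketch of the command dispatch described above may help, written as a POSIX
shell function. It is not the actual ``scripts/docker-entrypoint.sh``. The
``dispatch`` function name and the ``MANAGE_PY`` path are hypothetical
placeholders chosen for illustration.

.. code-block:: bash

   # Hypothetical sketch of the entrypoint's command dispatch; the real
   # scripts/docker-entrypoint.sh may differ. MANAGE_PY is an assumed path.
   MANAGE_PY="${MANAGE_PY:-/usr/src/paperless/src/manage.py}"

   dispatch() {
       # No command given: fall back to --help, the default command.
       if [ "$#" -eq 0 ]; then
           set -- --help
       fi
       case "$1" in
           /*)
               # Starts with a slash: run the binary unchanged. exec replaces
               # the shell, so no wrapper process is left behind.
               exec "$@"
               ;;
           *)
               # Anything else is passed to manage.py without modification.
               exec "$MANAGE_PY" "$@"
               ;;
       esac
   }

An entrypoint built this way would end with ``dispatch "$@"``. Given
``createsuperuser``, it would run ``manage.py createsuperuser``. Given
``/bin/bash``, it would start a shell directly.
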
		
	
			
		
			
				
	
	
		
286 lines, 12 KiB, ReStructuredText

.. _setup:

Setup
=====

Paperless isn't a very complicated app, but there are a few components, so some
basic documentation is in order.  If you follow along in this document and
still have trouble, please open an `issue on GitHub`_ so I can fill in the gaps.

.. _issue on GitHub: https://github.com/danielquinn/paperless/issues


.. _setup-download:

Download
--------

The source is currently only available via GitHub, so grab it from there, either
by using ``git``:

.. code:: bash

    $ git clone https://github.com/danielquinn/paperless.git
    $ cd paperless

or just download the zip archive and go that route:

.. code:: bash

    $ wget https://github.com/danielquinn/paperless/archive/master.zip
    $ unzip master.zip
    $ cd paperless-master


.. _setup-installation:

Installation & Configuration
----------------------------

There are several ways to set up and run Paperless. The `Vagrant route`_ is
quick and easy, but it means running a VM, with the extra memory consumption
that comes with it. We also `support Docker`_, which you can use natively under
Linux and in a VM with `Docker Machine`_ (this guide was written for native
Docker usage under Linux; you might have to adapt it for Docker Machine).
Alternatively, the standard `bare metal`_ approach is a little more complicated.

.. _Vagrant route: setup-installation-vagrant_
.. _support Docker: setup-installation-docker_
.. _bare metal: setup-installation-standard_

.. _Docker Machine: https://docs.docker.com/machine/

.. _setup-installation-standard:

Standard (Bare Metal)
.....................

1. Install the requirements as per the :ref:`requirements <requirements>` page.
2. Change to the ``src`` directory in this repo.
3. Edit ``paperless/settings.py`` and be sure to set the values for:

   * ``CONSUMPTION_DIR``: this is where your documents will be dumped to be
     consumed by Paperless.
   * ``PASSPHRASE``: this is the passphrase Paperless uses to encrypt/decrypt
     the original document.  The default value attempts to source the
     passphrase from the environment, so if you don't set it to a static value
     here, you must set ``PAPERLESS_PASSPHRASE=some-secret-string`` on the
     command line whenever invoking the consumer or webserver.
   * ``OCR_THREADS``: this is the number of threads the OCR process will spawn
     to process document pages in parallel. The default value is read from the
     ``PAPERLESS_OCR_THREADS`` environment variable, which must be an integer.
     If the variable is not set, Python determines the core count of your CPU
     and uses that value.

4. Initialise the database with ``./manage.py migrate``.
5. Create a user for your Paperless instance with
   ``./manage.py createsuperuser``. Follow the prompts to create your user.
6. Start the webserver with ``./manage.py runserver <IP>:<PORT>``.
   If no specific IP or port is given, the default is ``127.0.0.1:8000``.
   You should now be able to visit your (empty) `Paperless webserver`_ at
   ``127.0.0.1:8000`` (or whatever you chose).  You can log in with the
   username and password you created in step 5.
7. In a separate window, change to the ``src`` directory in this repo again,
   and this time start the consumer script with
   ``./manage.py document_consumer``.
8. Scan something.  Put it in the ``CONSUMPTION_DIR``.
9. Wait a few minutes.
10. Visit the document list on your webserver, and it should be there, indexed
    and downloadable.
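
Step 3 describes how ``PASSPHRASE`` and ``OCR_THREADS`` are read from the
environment. You can satisfy both by exporting the variables in each shell
before running the webserver or the consumer. The sketch below assumes a Linux
host with GNU coreutils for ``nproc``. The ``ocr_threads`` helper is
hypothetical and only approximates the documented fallback: ``nproc`` counts
the CPUs available to the process, which can be fewer than the core count
Python reports.

.. code-block:: bash

   # Placeholder secret: replace it with your own passphrase.
   export PAPERLESS_PASSPHRASE="some-secret-string"

   # Hypothetical helper: print PAPERLESS_OCR_THREADS if it is set, or the
   # number of available CPUs if it is not.
   ocr_threads() {
       case "${PAPERLESS_OCR_THREADS:-}" in
           '')
               # Not set: fall back to the CPU count.
               nproc
               ;;
           *[!0-9]*)
               # Paperless expects an integer, so reject anything else early.
               echo "PAPERLESS_OCR_THREADS must be an integer" >&2
               return 1
               ;;
           *)
               echo "$PAPERLESS_OCR_THREADS"
               ;;
       esac
   }

   threads="$(ocr_threads)" && export PAPERLESS_OCR_THREADS="$threads"

Run these lines in both windows (steps 6 and 7), or keep them in a file that
you ``source`` first. That way the webserver and the consumer see the same
values.
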

.. _Paperless webserver: http://127.0.0.1:8000


.. _setup-installation-vagrant:

Vagrant Method
..............

1. Install `Vagrant`_.  How you do that is really between you and your OS.
2. Run ``vagrant up``.  An instance will start up for you.  When it's ready and
   provisioned...
3. Run ``vagrant ssh`` and once inside your new vagrant box, edit
   ``/opt/paperless/src/paperless/settings.py`` and set the values for:

   * ``CONSUMPTION_DIR``: this is where your documents will be dumped to be
     consumed by Paperless.
   * ``PASSPHRASE``: this is the passphrase Paperless uses to encrypt/decrypt
     the original document.  The default value attempts to source the
     passphrase from the environment, so if you don't set it to a static value
     here, you must set ``PAPERLESS_PASSPHRASE=some-secret-string`` on the
     command line whenever invoking the consumer or webserver.

4. Initialise the database with ``/opt/paperless/src/manage.py migrate``.
5. Still inside your vagrant box, create a user for your Paperless instance with
   ``/opt/paperless/src/manage.py createsuperuser``. Follow the prompts to
   create your user.
6. Start the webserver with
   ``/opt/paperless/src/manage.py runserver 0.0.0.0:8000``.
   You should now be able to visit your (empty) `Paperless server`_ at
   ``172.28.128.4:8000``.  You can log in with the username and password you
   created in step 5.
7. In a separate window, run ``vagrant ssh`` again, and this time, once inside
   your vagrant instance, start the consumer script with
   ``/opt/paperless/src/manage.py document_consumer``.
8. Scan something.  Put it in the ``CONSUMPTION_DIR``.
9. Wait a few minutes.
10. Visit the document list on your webserver, and it should be there, indexed
    and downloadable.

.. _Vagrant: https://vagrantup.com/
.. _Paperless server: http://172.28.128.4:8000


.. _setup-installation-docker:

Docker Method
.............

1. Install `Docker`_.

   .. caution::

      As mentioned earlier, this guide assumes that you use Docker natively
      under Linux. If you are using `Docker Machine`_ under Mac OS X or Windows,
      you will have to adapt IP addresses, volume mounting, command execution
      and possibly more.

2. Install `docker-compose`_. [#compose]_

   .. caution::

       If you want to use the included ``docker-compose.yml.example`` file, you
       need to have at least Docker version **1.10.0** and docker-compose
       version **1.6.0**.

       See the `Docker installation guide`_ for how to install the current
       version of Docker for your operating system or Linux distribution of
       choice. To get an up-to-date version of docker-compose, follow the
       `docker-compose installation guide`_ if your package repository doesn't
       include it.

       .. _Docker installation guide: https://docs.docker.com/engine/installation/
       .. _docker-compose installation guide: https://docs.docker.com/compose/install/

3. Create a copy of ``docker-compose.yml.example`` as ``docker-compose.yml``.
4. Modify ``docker-compose.env`` and adapt the following environment variables:

   ``PAPERLESS_PASSPHRASE``
     This is the passphrase Paperless uses to encrypt/decrypt the original
     document.

   ``PAPERLESS_OCR_THREADS``
     This is the number of threads the OCR process will spawn to process
     document pages in parallel. If the variable is not set, Python determines
     the core count of your CPU and uses that value.

   ``PAPERLESS_OCR_LANGUAGES``
     If you want the OCR to recognize other languages in addition to the default
     English, set this parameter to a space-separated list of three-letter
     language codes as defined by `ISO 639-2/T`_. For a list of available
     languages (including their three-letter codes), see the
     `Debian packagelist`_.

   ``USERMAP_UID`` and ``USERMAP_GID``
     If you want to mount the consumption volume (directory ``/consume`` within
     the containers) to a host directory, which you probably want to do,
     access rights might be an issue. The default user and group ``paperless``
     in the containers have an ID of 1000. The containers will enforce that the
     owning group of the consumption directory is ``paperless``, so that
     consumed documents can be deleted. If your host system has a group with
     an ID of 1000 and you don't want this group to have access rights to the
     consumption directory, you can use ``USERMAP_GID`` to change the group ID
     in the container, and thus the owning group of the consumption directory.
     Furthermore, you can change the ID of the default user as well using
     ``USERMAP_UID``.

5. Run ``docker-compose up -d``. This will create and start the necessary
   containers.
6. To be able to log in, you will need a superuser. To create it, execute the
   following command:

   .. code-block:: shell-session

       $ docker-compose run --rm webserver createsuperuser

   This will prompt you to set a username (default ``paperless``), an optional
   e-mail address and finally a password.
7. The default ``docker-compose.yml`` exports the webserver on your local
   port 8000. If you haven't adapted this, you should now be able to visit your
   `Paperless webserver`_ at ``http://127.0.0.1:8000``. You can log in with the
   username and password you just created.
8. Add files to the consumption directory in whichever way you prefer. Here are
   two possible options:

   1. Mount the consumption directory to a local host path by modifying your
      ``docker-compose.yml``:

      .. code-block:: diff

         diff --git a/docker-compose.yml b/docker-compose.yml
         --- a/docker-compose.yml
         +++ b/docker-compose.yml
         @@ -17,4 +17,4 @@ services:
                  volumes:
                      - paperless-data:/usr/src/paperless/data
                      - paperless-media:/usr/src/paperless/media
         -            - /consume
         +            - /local/path/you/choose:/consume

      .. danger::

          While the consumption container will ensure at startup that it can
          **delete** a consumed file from a host-mounted directory, it might not
          be able to **read** the document in the first place if the access
          rights to the file are incorrect.

          Make sure that the documents you put into the consumption directory
          will either be readable by everyone (``chmod o+r file.pdf``) or
          readable by the default user or group ID 1000 (or the one you have set
          with ``USERMAP_UID`` or ``USERMAP_GID`` respectively).

   2. Use ``docker cp`` to copy your files directly into the container:

      .. code-block:: shell-session

         $ # Identify your containers
         $ docker-compose ps
                 Name                       Command                State     Ports
         -------------------------------------------------------------------------
         paperless_consumer_1    /sbin/docker-entrypoint.sh ...   Exit 0
         paperless_webserver_1   /sbin/docker-entrypoint.sh ...   Exit 0

         $ docker cp /path/to/your/file.pdf paperless_consumer_1:/consume

      ``docker cp`` is a one-shot command, just like ``cp``. This means that
      every time you want to consume a new document, you will have to execute
      ``docker cp`` again. You can automate this process, of course, but
      option 1 is generally preferable.

      .. danger::

          ``docker cp`` will change the owning user and group of a copied file
          to the acting user at the destination, which will be ``root``.

          You therefore need to ensure that the documents you want to copy into
          the container are readable by everyone (``chmod o+r file.pdf``) before
          copying them.
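
Option 2 notes that the ``docker cp`` step can be automated. Here is a minimal
sketch of one way to do it. It assumes the container name
``paperless_consumer_1`` from the ``docker-compose ps`` output above (yours may
differ) and uses a hypothetical helper called ``copy_to_consumer``. It applies
the ``chmod o+r`` from the danger note before copying, which changes the
permissions of the original files on the host.

.. code-block:: bash

   # Hypothetical helper: copy every PDF in a host directory into the
   # consumer container. The default container name is an assumption.
   copy_to_consumer() {
       local src_dir="$1"
       local container="${2:-paperless_consumer_1}"
       local f
       for f in "$src_dir"/*.pdf; do
           # With no PDFs, the glob stays unexpanded; skip it.
           [ -e "$f" ] || continue
           # docker cp makes root the owner inside the container, so the
           # file must be readable by everyone before it is copied.
           chmod o+r "$f" || return 1
           docker cp "$f" "$container:/consume/" || return 1
       done
   }

For example, ``copy_to_consumer ~/scans`` copies the PDFs in ``~/scans``. The
files are copied rather than moved, so running the helper again copies them
again. Moving processed files elsewhere afterwards avoids duplicates.
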


.. _Docker: https://www.docker.com/
.. _docker-compose: https://docs.docker.com/compose/install/
.. _ISO 639-2/T: https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
.. _Debian packagelist: https://packages.debian.org/search?suite=jessie&searchon=names&keywords=tesseract-ocr-
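
The ``PAPERLESS_OCR_LANGUAGES`` value from step 4 is a space-separated list of
three-letter codes. The Debian package list linked above names Tesseract
language data ``tesseract-ocr-<code>``. The hypothetical helper below assumes
that naming scheme and shows how such a list could be validated and turned into
package names. It is an illustration, not code taken from the entrypoint
script. A few packages use longer suffixes than a plain three-letter code, and
this simple check would reject those, so check the package list for the codes
you need.

.. code-block:: bash

   # Hypothetical helper: map PAPERLESS_OCR_LANGUAGES (e.g. "deu fra") to
   # Debian package names, assuming the tesseract-ocr-<code> naming scheme.
   ocr_language_packages() {
       local code packages=""
       for code in ${PAPERLESS_OCR_LANGUAGES:-}; do
           case "$code" in
               [a-z][a-z][a-z])
                   packages="$packages tesseract-ocr-$code"
                   ;;
               *)
                   echo "not a three-letter language code: $code" >&2
                   return 1
                   ;;
           esac
       done
       # Drop the leading space added by the loop.
       echo "${packages# }"
   }

With ``PAPERLESS_OCR_LANGUAGES="deu fra"``, the helper prints
``tesseract-ocr-deu tesseract-ocr-fra``. That list could then be passed to
``apt-get install`` inside a Debian-based image.
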

.. [#compose] You of course don't have to use docker-compose, but it
   simplifies deployment immensely. If you know your way around Docker, feel
   free to tinker around without using compose!


.. _making-things-a-little-more-permanent:

Making Things a Little More Permanent
-------------------------------------

Once you've tested things and are happy with the workflow, you can automate
starting the webserver and consumer.  If you're running on a bare metal system
that uses systemd, you can use the service unit files in the ``scripts``
directory to set this up.  If you're on another init system or are using a
Vagrant box, then you're currently on your own. If you are using Docker, you can
set a restart-policy_ in the ``docker-compose.yml`` to have the containers start
automatically with the Docker daemon.

.. _restart-policy: https://docs.docker.com/engine/reference/commandline/run/#restart-policies-restart