mirror of
				https://github.com/paperless-ngx/paperless-ngx.git
				synced 2025-11-03 19:17:13 -05:00 
			
		
		
		
	Merge branch 'master' into issue/81
This commit is contained in:
		
						commit
						49b56425e8
					
				@ -24,8 +24,11 @@ How it Works
 | 
				
			|||||||
 | 
					
 | 
				
			||||||
1. Buy a document scanner like `this one`_.
 | 
					1. Buy a document scanner like `this one`_.
 | 
				
			||||||
2. Set it up to "scan to FTP" or something similar. It should be able to push
 | 
					2. Set it up to "scan to FTP" or something similar. It should be able to push
 | 
				
			||||||
   scanned images to a server without you having to do anything.
 | 
					   scanned images to a server without you having to do anything.  If your
 | 
				
			||||||
3. Have the target server run the *Paperless* consumption script to OCR the PDF
 | 
					   scanner doesn't know how to automatically upload the file somewhere, you can
 | 
				
			||||||
 | 
					   always do that manually.  Paperless doesn't care how the documents get into
 | 
				
			||||||
 | 
					   its local consumption directory.
 | 
				
			||||||
 | 
					3. Have the target server run the Paperless consumption script to OCR the PDF
 | 
				
			||||||
   and index it into a local database.
 | 
					   and index it into a local database.
 | 
				
			||||||
4. Use the web frontend to sift through the database and find what you want.
 | 
					4. Use the web frontend to sift through the database and find what you want.
 | 
				
			||||||
5. Download the PDF you need/want via the web interface and do whatever you
 | 
					5. Download the PDF you need/want via the web interface and do whatever you
 | 
				
			||||||
@ -56,7 +59,7 @@ powerful tools.
 | 
				
			|||||||
 | 
					
 | 
				
			||||||
* `ImageMagick`_ converts the images between colour and greyscale.
 | 
					* `ImageMagick`_ converts the images between colour and greyscale.
 | 
				
			||||||
* `Tesseract`_ does the character recognition.
 | 
					* `Tesseract`_ does the character recognition.
 | 
				
			||||||
* `Unpaper`_ despeckles and and deskews the scanned image.
 | 
					* `Unpaper`_ despeckles and deskews the scanned image.
 | 
				
			||||||
* `GNU Privacy Guard`_ is used as the encryption backend.
 | 
					* `GNU Privacy Guard`_ is used as the encryption backend.
 | 
				
			||||||
* `Python 3`_ is the language of the project.
 | 
					* `Python 3`_ is the language of the project.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
				
			|||||||
@ -11,6 +11,10 @@ services:
 | 
				
			|||||||
            - data:/usr/src/paperless/data
 | 
					            - data:/usr/src/paperless/data
 | 
				
			||||||
            - media:/usr/src/paperless/media
 | 
					            - media:/usr/src/paperless/media
 | 
				
			||||||
        env_file: docker-compose.env
 | 
					        env_file: docker-compose.env
 | 
				
			||||||
 | 
					        # This line is here so that the webserver, which doesn't do any text
 | 
				
			||||||
 | 
					        # recognition, doesn't have to install unnecessary languages the user
 | 
				
			||||||
 | 
					        # might have set in the env-file: it overwrites the value with
 | 
				
			||||||
 | 
					        # nothing.
 | 
				
			||||||
        environment:
 | 
					        environment:
 | 
				
			||||||
            - PAPERLESS_OCR_LANGUAGES=
 | 
					            - PAPERLESS_OCR_LANGUAGES=
 | 
				
			||||||
        command: ["runserver", "0.0.0.0:8000"]
 | 
					        command: ["runserver", "0.0.0.0:8000"]
 | 
				
			||||||
 | 
				
			|||||||
@ -1,6 +1,15 @@
 | 
				
			|||||||
Changelog
 | 
					Changelog
 | 
				
			||||||
#########
 | 
					#########
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* 0.2.0
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					  * Added support for guessing the date from the file name along with the
 | 
				
			||||||
 | 
					    correspondent, title, and tags.  Thanks to `Tikitu de Jager`_ for his pull
 | 
				
			||||||
 | 
					    request that I took forever to merge and to `Pit`_ for his efforts on the
 | 
				
			||||||
 | 
					    regex front.
 | 
				
			||||||
 | 
					  * `#94`_: Restored support for changing the created date in the UI.  Thanks
 | 
				
			||||||
 | 
					    to `Martin Honermeyer`_ and `Tim White`_ for working with me on this.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* 0.1.1
 | 
					* 0.1.1
 | 
				
			||||||
 | 
					
 | 
				
			||||||
  * Potentially **Breaking Change**: All references to "sender" in the code
 | 
					  * Potentially **Breaking Change**: All references to "sender" in the code
 | 
				
			||||||
@ -86,6 +95,8 @@ Changelog
 | 
				
			|||||||
.. _Wayne Werner: https://github.com/waynew
 | 
					.. _Wayne Werner: https://github.com/waynew
 | 
				
			||||||
.. _darkmatter: https://github.com/darkmatter
 | 
					.. _darkmatter: https://github.com/darkmatter
 | 
				
			||||||
.. _zedster: https://github.com/zedster
 | 
					.. _zedster: https://github.com/zedster
 | 
				
			||||||
 | 
					.. _Martin Honermeyer: https://github.com/djmaze
 | 
				
			||||||
 | 
					.. _Tim White: https://github.com/timwhite
 | 
				
			||||||
 | 
					
 | 
				
			||||||
.. _#20: https://github.com/danielquinn/paperless/issues/20
 | 
					.. _#20: https://github.com/danielquinn/paperless/issues/20
 | 
				
			||||||
.. _#44: https://github.com/danielquinn/paperless/issues/44
 | 
					.. _#44: https://github.com/danielquinn/paperless/issues/44
 | 
				
			||||||
@ -99,3 +110,4 @@ Changelog
 | 
				
			|||||||
.. _#67: https://github.com/danielquinn/paperless/issues/67
 | 
					.. _#67: https://github.com/danielquinn/paperless/issues/67
 | 
				
			||||||
.. _#68: https://github.com/danielquinn/paperless/issues/68
 | 
					.. _#68: https://github.com/danielquinn/paperless/issues/68
 | 
				
			||||||
.. _#71: https://github.com/danielquinn/paperless/issues/71
 | 
					.. _#71: https://github.com/danielquinn/paperless/issues/71
 | 
				
			||||||
 | 
					.. _#94: https://github.com/danielquinn/paperless/issues/94
 | 
				
			||||||
 | 
				
			|||||||
@ -45,19 +45,27 @@ you name the file right, it'll automatically set some values in the database
 | 
				
			|||||||
for you.  This is the logic the consumer follows:
 | 
					for you.  This is the logic the consumer follows:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
1. Try to find the correspondent, title, and tags in the file name following
 | 
					1. Try to find the correspondent, title, and tags in the file name following
 | 
				
			||||||
 | 
					   the pattern: ``Date - Correspondent - Title - tag,tag,tag.pdf``.  Note that
 | 
				
			||||||
 | 
					   the format of the date is **rigidly defined** as ``YYYYMMDDHHMMSSZ`` or
 | 
				
			||||||
 | 
					   ``YYYYMMDDZ``.  The ``Z`` is for "Zulu time" AKA "UTC".
 | 
				
			||||||
 | 
					2. If that doesn't work, we skip the date and look for
 | 
				
			||||||
   the pattern: ``Correspondent - Title - tag,tag,tag.pdf``.
 | 
					   the pattern: ``Correspondent - Title - tag,tag,tag.pdf``.
 | 
				
			||||||
2. If that doesn't work, try to find the correspondent and title in the file
 | 
					3. If that doesn't work, we try to find the correspondent and title in the file
 | 
				
			||||||
   name following the pattern:  ``Correspondent - Title.pdf``.
 | 
					   name following the pattern:  ``Correspondent - Title.pdf``.
 | 
				
			||||||
3. If that doesn't work, just assume that the name of the file is the title.
 | 
					4. If that doesn't work, just assume that the name of the file is the title.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
So given the above, the following examples would work as you'd expect:
 | 
					So given the above, the following examples would work as you'd expect:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* ``20150314000700Z - Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
 | 
				
			||||||
 | 
					* ``20150314Z - Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
 | 
				
			||||||
* ``Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
 | 
					* ``Some Company Name - Invoice 2016-01-01 - money,invoices.pdf``
 | 
				
			||||||
* ``Another Company - Letter of Reference.jpg``
 | 
					* ``Another Company - Letter of Reference.jpg``
 | 
				
			||||||
* ``Dad's Recipe for Pancakes.png``
 | 
					* ``Dad's Recipe for Pancakes.png``
 | 
				
			||||||
 | 
					
 | 
				
			||||||
These however wouldn't work:
 | 
					These however wouldn't work:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* ``2015-03-14 00:07:00 UTC - Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
 | 
				
			||||||
 | 
					* ``2015-03-14 - Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
 | 
				
			||||||
* ``Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
 | 
					* ``Some Company Name, Invoice 2016-01-01, money, invoices.pdf``
 | 
				
			||||||
* ``Another Company- Letter of Reference.jpg``
 | 
					* ``Another Company- Letter of Reference.jpg``
 | 
				
			||||||
 | 
					
 | 
				
			||||||
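To make the four steps above concrete, here's a rough Python sketch of the matching order.  The regular expressions, the ``parse_file_name`` helper, the lower-case-only tag rule, and the short list of extensions are all assumptions made for illustration, not the consumer's actual code:

```python
import re
from datetime import datetime, timezone

# Hypothetical regexes mirroring the four documented steps, tried from the
# most specific to the least specific. The project's real patterns may differ.
_DATE = r"(?P<created>\d{14}Z|\d{8}Z)"
_TAGS = r"(?P<tags>[a-z0-9,]+)"
_EXT = r"\.(?P<ext>pdf|jpe?g|png)"
PATTERNS = [
    re.compile(rf"^{_DATE} - (?P<correspondent>.+?) - (?P<title>.+) - {_TAGS}{_EXT}$"),
    re.compile(rf"^(?P<correspondent>.+?) - (?P<title>.+) - {_TAGS}{_EXT}$"),
    re.compile(rf"^(?P<correspondent>.+?) - (?P<title>.+){_EXT}$"),
    re.compile(rf"^(?P<title>.+){_EXT}$"),
]


def parse_file_name(name):
    """Return the values guessed from ``name`` using the first matching pattern."""
    for pattern in PATTERNS:
        match = pattern.match(name)
        if match is None:
            continue
        found = match.groupdict()
        created = found.get("created")
        if created is not None:
            # 14 digits plus "Z" is the long form, 8 digits plus "Z" the short one.
            fmt = "%Y%m%d%H%M%SZ" if len(created) == 15 else "%Y%m%dZ"
            created = datetime.strptime(created, fmt).replace(tzinfo=timezone.utc)
        tags = found.get("tags")
        return {
            "created": created,
            "correspondent": found.get("correspondent"),
            "title": found["title"],
            "tags": tags.split(",") if tags else [],
        }
    return None  # extension not covered by this sketch
```

With this sketch, ``Another Company- Letter of Reference.jpg`` falls through to the last pattern, so the whole name becomes the title, which is why it doesn't work as you'd expect.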
@ -128,7 +136,7 @@ following name/value pairs:
 | 
				
			|||||||
  don't start uploading stuff to your server.  The means of generating this
 | 
					  don't start uploading stuff to your server.  The means of generating this
 | 
				
			||||||
  signature is defined below.
 | 
					  signature is defined below.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Specify ``enctype="multipart/form-data"``, and then POST your file with:::
 | 
					Specify ``enctype="multipart/form-data"``, and then POST your file with::
 | 
				
			||||||
 | 
					
 | 
				
			||||||
    Content-Disposition: form-data; name="document"; filename="whatever.pdf"
 | 
					    Content-Disposition: form-data; name="document"; filename="whatever.pdf"
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
				
			|||||||
@ -33,4 +33,5 @@ Contents
 | 
				
			|||||||
   api
 | 
					   api
 | 
				
			||||||
   utilities
 | 
					   utilities
 | 
				
			||||||
   migrating
 | 
					   migrating
 | 
				
			||||||
 | 
					   troubleshooting 
 | 
				
			||||||
   changelog
 | 
					   changelog
 | 
				
			||||||
 | 
				
			|||||||
@ -8,7 +8,7 @@ should work) that has the following software installed on it:
 | 
				
			|||||||
 | 
					
 | 
				
			||||||
* `Python3`_ (with development libraries, pip and virtualenv)
 | 
					* `Python3`_ (with development libraries, pip and virtualenv)
 | 
				
			||||||
* `GNU Privacy Guard`_
 | 
					* `GNU Privacy Guard`_
 | 
				
			||||||
* `Tesseract`_
 | 
					* `Tesseract`_, plus its language files matching your document base.
 | 
				
			||||||
* `Imagemagick`_
 | 
					* `Imagemagick`_
 | 
				
			||||||
* `unpaper`_
 | 
					* `unpaper`_
 | 
				
			||||||
 | 
					
 | 
				
			||||||
@ -52,6 +52,7 @@ well as ImageMagick:
 | 
				
			|||||||
 | 
					
 | 
				
			||||||
    $ brew install ghostscript
 | 
					    $ brew install ghostscript
 | 
				
			||||||
    $ brew install imagemagick
 | 
					    $ brew install imagemagick
 | 
				
			||||||
 | 
					    $ brew install libmagic
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
.. _requirements-baremetal:
 | 
					.. _requirements-baremetal:
 | 
				
			||||||
 | 
				
			|||||||
							
								
								
									
										207
									
								
								docs/setup.rst
									
									
									
									
									
								
							
							
						
						
									
							@ -5,7 +5,8 @@ Setup
 | 
				
			|||||||
 | 
					
 | 
				
			||||||
Paperless isn't a very complicated app, but there are a few components, so some
 | 
					Paperless isn't a very complicated app, but there are a few components, so some
 | 
				
			||||||
basic documentation is in order.  If you go follow along in this document and
 | 
					basic documentation is in order.  If you go follow along in this document and
 | 
				
			||||||
still have trouble, please open an `issue on GitHub`_ so I can fill in the gaps.
 | 
					still have trouble, please open an `issue on GitHub`_ so I can fill in the
 | 
				
			||||||
 | 
					gaps.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
.. _issue on GitHub: https://github.com/danielquinn/paperless/issues
 | 
					.. _issue on GitHub: https://github.com/danielquinn/paperless/issues
 | 
				
			||||||
 | 
					
 | 
				
			||||||
@ -15,8 +16,8 @@ still have trouble, please open an `issue on GitHub`_ so I can fill in the gaps.
 | 
				
			|||||||
Download
 | 
					Download
 | 
				
			||||||
--------
 | 
					--------
 | 
				
			||||||
 | 
					
 | 
				
			||||||
The source is currently only available via GitHub, so grab it from there, either
 | 
					The source is currently only available via GitHub, so grab it from there,
 | 
				
			||||||
by using ``git``:
 | 
					either by using ``git``:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
.. code:: bash
 | 
					.. code:: bash
 | 
				
			||||||
 | 
					
 | 
				
			||||||
@ -42,15 +43,16 @@ route`_ is quick & easy, but means you're running a VM which comes with memory
 | 
				
			|||||||
consumption etc. We also `support Docker`_, which you can use natively under
 | 
					consumption etc. We also `support Docker`_, which you can use natively under
 | 
				
			||||||
Linux and in a VM with `Docker Machine`_ (this guide was written for native
 | 
					Linux and in a VM with `Docker Machine`_ (this guide was written for native
 | 
				
			||||||
Docker usage under Linux, you might have to adapt it for Docker Machine.)
 | 
					Docker usage under Linux, you might have to adapt it for Docker Machine.)
 | 
				
			||||||
Alternatively the standard, `bare metal`_ approach is a little more complicated,
 | 
					Alternatively the standard, `bare metal`_ approach is a little more
 | 
				
			||||||
but worth it because it makes it easier should you want to contribute some
 | 
					complicated, but worth it because it makes it easier should you want to
 | 
				
			||||||
code back.
 | 
					contribute some code back.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
.. _Vagrant route: setup-installation-vagrant_
 | 
					.. _Vagrant route: setup-installation-vagrant_
 | 
				
			||||||
.. _support Docker: setup-installation-docker_
 | 
					.. _support Docker: setup-installation-docker_
 | 
				
			||||||
.. _bare metal: setup-installation-standard_
 | 
					.. _bare metal: setup-installation-standard_
 | 
				
			||||||
.. _Docker Machine: https://docs.docker.com/machine/
 | 
					.. _Docker Machine: https://docs.docker.com/machine/
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
.. _setup-installation-standard:
 | 
					.. _setup-installation-standard:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Standard (Bare Metal)
 | 
					Standard (Bare Metal)
 | 
				
			||||||
@ -58,19 +60,16 @@ Standard (Bare Metal)
 | 
				
			|||||||
 | 
					
 | 
				
			||||||
1. Install the requirements as per the :ref:`requirements <requirements>` page.
 | 
					1. Install the requirements as per the :ref:`requirements <requirements>` page.
 | 
				
			||||||
2. Change to the ``src`` directory in this repo.
 | 
					2. Change to the ``src`` directory in this repo.
 | 
				
			||||||
3. Edit ``paperless/settings.py`` and be sure to set the values for:
 | 
					3. Copy ``paperless.conf.example`` to ``/etc/paperless.conf`` and open it in
 | 
				
			||||||
    * ``CONSUMPTION_DIR``: this is where your documents will be dumped to be
 | 
					   your favourite editor.  Set the values for:
 | 
				
			||||||
      consumed by Paperless.
 | 
					
 | 
				
			||||||
    * ``PASSPHRASE``: this is the passphrase Paperless uses to encrypt/decrypt
 | 
					    * ``PAPERLESS_CONSUMPTION_DIR``: this is where your documents will be
 | 
				
			||||||
      the original document.  The default value attempts to source the
 | 
					      dumped to be consumed by Paperless.
 | 
				
			||||||
      passphrase from the environment, so if you don't set it to a static value
 | 
					    * ``PAPERLESS_PASSPHRASE``: this is the passphrase Paperless uses to
 | 
				
			||||||
      here, you must set ``PAPERLESS_PASSPHRASE=some-secret-string`` on the
 | 
					      encrypt/decrypt the original document.
 | 
				
			||||||
      command line whenever invoking the consumer or webserver.
 | 
					    * ``PAPERLESS_OCR_THREADS``: this is the number of threads the OCR process
 | 
				
			||||||
    * ``OCR_THREADS``: this is the number of threads the OCR process will spawn
 | 
					      will spawn to process document pages in parallel.
 | 
				
			||||||
      to process document pages in parallel. The default value gets sourced from
 | 
					
 | 
				
			||||||
      the environment-variable ``PAPERLESS_OCR_THREADS`` and expects it to be an
 | 
					 | 
				
			||||||
      integer. If the variable is not set, Python determines the core-count of
 | 
					 | 
				
			||||||
      your CPU and uses that value.
 | 
					 | 
				
			||||||
4. Initialise the database with ``./manage.py migrate``.
 | 
					4. Initialise the database with ``./manage.py migrate``.
 | 
				
			||||||
5. Create a user for your Paperless instance with
 | 
					5. Create a user for your Paperless instance with
 | 
				
			||||||
   ``./manage.py createsuperuser``. Follow the prompts to create your user.
 | 
					   ``./manage.py createsuperuser``. Follow the prompts to create your user.
 | 
				
			||||||
@ -79,8 +78,8 @@ Standard (Bare Metal)
 | 
				
			|||||||
   You should now be able to visit your (empty) `Paperless webserver`_ at
 | 
					   You should now be able to visit your (empty) `Paperless webserver`_ at
 | 
				
			||||||
   ``127.0.0.1:8000`` (or whatever you chose).  You can login with the
 | 
					   ``127.0.0.1:8000`` (or whatever you chose).  You can login with the
 | 
				
			||||||
   user/pass you created in #5.
 | 
					   user/pass you created in #5.
 | 
				
			||||||
7. In a separate window, change to the ``src`` directory in this repo again, but
 | 
					7. In a separate window, change to the ``src`` directory in this repo again,
 | 
				
			||||||
   this time, you should start the consumer script with
 | 
					   but this time, you should start the consumer script with
 | 
				
			||||||
   ``./manage.py document_consumer``.
 | 
					   ``./manage.py document_consumer``.
 | 
				
			||||||
8. Scan something.  Put it in the ``CONSUMPTION_DIR``.
 | 
					8. Scan something.  Put it in the ``CONSUMPTION_DIR``.
 | 
				
			||||||
9. Wait a few minutes
 | 
					9. Wait a few minutes
 | 
				
			||||||
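The ``PAPERLESS_OCR_THREADS`` behaviour described in step 3 (use the integer from the environment, otherwise fall back to the CPU's core-count) can be sketched as below.  The ``ocr_threads`` helper is hypothetical, and treating an empty value like an unset one is an assumption:

```python
import multiprocessing
import os


def ocr_threads(environ=os.environ):
    """Return the number of OCR threads to spawn.

    Hypothetical helper: reads ``PAPERLESS_OCR_THREADS`` and falls back to
    the number of CPU cores when the variable is missing or empty.
    """
    raw = environ.get("PAPERLESS_OCR_THREADS")
    if not raw:
        return multiprocessing.cpu_count()
    # The value is expected to be an integer; int() raises ValueError otherwise.
    return int(raw)
```

Passing a plain dictionary instead of ``os.environ`` makes it easy to try out, for example ``ocr_threads({"PAPERLESS_OCR_THREADS": "4"})``.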
@ -100,6 +99,7 @@ Vagrant Method
 | 
				
			|||||||
   provisioned...
 | 
					   provisioned...
 | 
				
			||||||
3. Run ``vagrant ssh`` and once inside your new vagrant box, edit
 | 
					3. Run ``vagrant ssh`` and once inside your new vagrant box, edit
 | 
				
			||||||
   ``/etc/paperless.conf`` and set the values for:
 | 
					   ``/etc/paperless.conf`` and set the values for:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
    * ``PAPERLESS_CONSUMPTION_DIR``: this is where your documents will be
 | 
					    * ``PAPERLESS_CONSUMPTION_DIR``: this is where your documents will be
 | 
				
			||||||
      dumped to be consumed by Paperless.
 | 
					      dumped to be consumed by Paperless.
 | 
				
			||||||
    * ``PAPERLESS_PASSPHRASE``: this is the passphrase Paperless uses to
 | 
					    * ``PAPERLESS_PASSPHRASE``: this is the passphrase Paperless uses to
 | 
				
			||||||
@ -107,6 +107,7 @@ Vagrant Method
 | 
				
			|||||||
    * ``PAPERLESS_SHARED_SECRET``: this is the "magic word" used when consuming
 | 
					    * ``PAPERLESS_SHARED_SECRET``: this is the "magic word" used when consuming
 | 
				
			||||||
      documents from mail or via the API.  If you don't use either, leaving it
 | 
					      documents from mail or via the API.  If you don't use either, leaving it
 | 
				
			||||||
      blank is just fine.
 | 
					      blank is just fine.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
4. Exit the vagrant box and re-enter it with ``vagrant ssh`` again.  This
 | 
					4. Exit the vagrant box and re-enter it with ``vagrant ssh`` again.  This
 | 
				
			||||||
   updates the environment to make use of the changes you made to the config
 | 
					   updates the environment to make use of the changes you made to the config
 | 
				
			||||||
   file.
 | 
					   file.
 | 
				
			||||||
@ -140,9 +141,9 @@ Docker Method
 | 
				
			|||||||
   .. caution::
 | 
					   .. caution::
 | 
				
			||||||
 | 
					
 | 
				
			||||||
      As mentioned earlier, this guide assumes that you use Docker natively
 | 
					      As mentioned earlier, this guide assumes that you use Docker natively
 | 
				
			||||||
      under Linux. If you are using `Docker Machine`_ under Mac OS X or Windows,
 | 
					      under Linux. If you are using `Docker Machine`_ under Mac OS X or
 | 
				
			||||||
      you will have to adapt IP addresses, volume-mounting, command execution
 | 
					      Windows, you will have to adapt IP addresses, volume-mounting, command
 | 
				
			||||||
      and maybe more.
 | 
					      execution and maybe more.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
2. Install `docker-compose`_. [#compose]_
 | 
					2. Install `docker-compose`_. [#compose]_
 | 
				
			||||||
 | 
					
 | 
				
			||||||
@ -161,14 +162,14 @@ Docker Method
 | 
				
			|||||||
       .. _Docker installation guide: https://docs.docker.com/engine/installation/
 | 
					       .. _Docker installation guide: https://docs.docker.com/engine/installation/
 | 
				
			||||||
       .. _docker-compose installation guide: https://docs.docker.com/compose/install/
 | 
					       .. _docker-compose installation guide: https://docs.docker.com/compose/install/
 | 
				
			||||||
 | 
					
 | 
				
			||||||
3. Create a copy of ``docker-compose.yml.example`` as ``docker-compose.yml`` and
 | 
					3. Create a copy of ``docker-compose.yml.example`` as ``docker-compose.yml``
 | 
				
			||||||
   a copy of ``docker-compose.env.example`` as ``docker-compose.env``. You'll be
 | 
					   and a copy of ``docker-compose.env.example`` as ``docker-compose.env``.
 | 
				
			||||||
   editing both these files: taking a copy ensures that you can ``git pull`` to
 | 
					   You'll be editing both these files: taking a copy ensures that you can
 | 
				
			||||||
   receive updates without risking merge conflicts with your modified versions
 | 
					   ``git pull`` to receive updates without risking merge conflicts with your
 | 
				
			||||||
   of the configuration files.
 | 
					   modified versions of the configuration files.
 | 
				
			||||||
4. Modify ``docker-compose.yml`` to your preferences, following the instructions
 | 
					4. Modify ``docker-compose.yml`` to your preferences, following the
 | 
				
			||||||
   in comments in the file. The only change that is a hard requirement is to
 | 
					   instructions in comments in the file. The only change that is a hard
 | 
				
			||||||
   specify where the consumption directory should mount.
 | 
					   requirement is to specify where the consumption directory should mount.
 | 
				
			||||||
5. Modify ``docker-compose.env`` and adapt the following environment variables:
 | 
					5. Modify ``docker-compose.env`` and adapt the following environment variables:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
   ``PAPERLESS_PASSPHRASE``
 | 
					   ``PAPERLESS_PASSPHRASE``
 | 
				
			||||||
@ -181,10 +182,11 @@ Docker Method
 | 
				
			|||||||
     the core-count of your CPU and uses that value.
 | 
					     the core-count of your CPU and uses that value.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
   ``PAPERLESS_OCR_LANGUAGES``
 | 
					   ``PAPERLESS_OCR_LANGUAGES``
 | 
				
			||||||
     If you want the OCR to recognize other languages in addition to the default
 | 
					     If you want the OCR to recognize other languages in addition to the
 | 
				
			||||||
     English, set this parameter to a space separated list of three-letter
 | 
					     default English, set this parameter to a space separated list of
 | 
				
			||||||
     language-codes after `ISO 639-2/T`_. For a list of available languages --
 | 
					     three-letter language-codes after `ISO 639-2/T`_. For a list of available
 | 
				
			||||||
     including their three letter codes -- see the `Debian packagelist`_.
 | 
					     languages -- including their three letter codes -- see the
 | 
				
			||||||
 | 
					     `Debian packagelist`_.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
   ``USERMAP_UID`` and ``USERMAP_GID``
 | 
					   ``USERMAP_UID`` and ``USERMAP_GID``
 | 
				
			||||||
     If you want to mount the consumption volume (directory ``/consume`` within
 | 
					     If you want to mount the consumption volume (directory ``/consume`` within
 | 
				
			||||||
@ -192,11 +194,11 @@ Docker Method
 | 
				
			|||||||
     access rights might be an issue. The default user and group ``paperless``
 | 
					     access rights might be an issue. The default user and group ``paperless``
 | 
				
			||||||
     in the containers have an id of 1000. The containers will enforce that the
 | 
					     in the containers have an id of 1000. The containers will enforce that the
 | 
				
			||||||
     owning group of the consumption directory will be ``paperless`` to be able
 | 
					     owning group of the consumption directory will be ``paperless`` to be able
 | 
				
			||||||
     to delete consumed documents. If your host-system has a group with an id of
 | 
					     to delete consumed documents. If your host-system has a group with an ID
 | 
				
			||||||
     1000 and you don't want this group to have access rights to the consumption
 | 
					     of 1000 and you don't want this group to have access rights to the
 | 
				
			||||||
     directory, you can use ``USERMAP_GID`` to change the id in the container
 | 
					     consumption directory, you can use ``USERMAP_GID`` to change the id in the
 | 
				
			||||||
     and thus the one of the consumption directory. Furthermore, you can change
 | 
					     container and thus the one of the consumption directory. Furthermore, you
 | 
				
			||||||
     the id of the default user as well using ``USERMAP_UID``.
 | 
					     can change the id of the default user as well using ``USERMAP_UID``.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
6. Run ``docker-compose up -d``. This will create and start the necessary
 | 
					6. Run ``docker-compose up -d``. This will create and start the necessary
 | 
				
			||||||
   containers.
 | 
					   containers.
 | 
				
			||||||
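As a rough illustration of how the ``PAPERLESS_OCR_LANGUAGES`` value from step 5 could be interpreted, the sketch below splits the space separated list and maps each code to a Debian package name of the form ``tesseract-ocr-<code>``.  The ``language_packages`` helper is hypothetical, and the mapping is an assumption based on the Debian package list, not the container's actual start-up code:

```python
def language_packages(value):
    """Map a space separated list of language codes to Debian package names.

    Hypothetical helper: assumes every code corresponds to a package called
    ``tesseract-ocr-<code>``.
    """
    return [f"tesseract-ocr-{code}" for code in value.split()]
```

An empty value yields no packages at all, which is the effect the webserver service gets by overwriting the variable with nothing in ``docker-compose.yml``.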
@ -234,14 +236,14 @@ Docker Method
 | 
				
			|||||||
      .. danger::
 | 
					      .. danger::
 | 
				
			||||||
 | 
					
 | 
				
			||||||
          While the consumption container will ensure at startup that it can
 | 
					          While the consumption container will ensure at startup that it can
 | 
				
			||||||
          **delete** a consumed file from a host-mounted directory, it might not
 | 
					          **delete** a consumed file from a host-mounted directory, it might
 | 
				
			||||||
          be able to **read** the document in the first place if the access
 | 
					          not be able to **read** the document in the first place if the access
 | 
				
			||||||
          rights to the file are incorrect.
 | 
					          rights to the file are incorrect.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
          Make sure that the documents you put into the consumption directory
 | 
					          Make sure that the documents you put into the consumption directory
 | 
				
			||||||
          will either be readable by everyone (``chmod o+r file.pdf``) or
 | 
					          will either be readable by everyone (``chmod o+r file.pdf``) or
 | 
				
			||||||
          readable by the default user or group id 1000 (or the one you have set
 | 
					          readable by the default user or group id 1000 (or the one you have
 | 
				
			||||||
          with ``USERMAP_UID`` or ``USERMAP_GID`` respectively).
 | 
					          set with ``USERMAP_UID`` or ``USERMAP_GID`` respectively).
 | 
				
			||||||
 | 
					
 | 
				
			||||||
   2. Use ``docker cp`` to copy your files directly into the container:
 | 
					   2. Use ``docker cp`` to copy your files directly into the container:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
@ -258,8 +260,8 @@ Docker Method
 | 
				
			|||||||
 | 
					
 | 
				
			||||||
      ``docker cp`` is a one-shot-command, just like ``cp``. This means that
 | 
					      ``docker cp`` is a one-shot-command, just like ``cp``. This means that
 | 
				
			||||||
      every time you want to consume a new document, you will have to execute
 | 
					      every time you want to consume a new document, you will have to execute
 | 
				
			||||||
      ``docker cp`` again. You can of course automate this process, but option 1
 | 
					      ``docker cp`` again. You can of course automate this process, but option
 | 
				
			||||||
      is generally the preferred one.
 | 
					      1 is generally the preferred one.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
      .. danger::
 | 
					      .. danger::
 | 
				
			||||||
 | 
					
 | 
				
			||||||
@ -267,8 +269,8 @@ Docker Method
 | 
				
			|||||||
          to the acting user at the destination, which will be ``root``.
 | 
					          to the acting user at the destination, which will be ``root``.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
          You therefore need to ensure that the documents you want to copy into
 | 
					          You therefore need to ensure that the documents you want to copy into
 | 
				
			||||||
          the container are readable by everyone (``chmod o+r file.pdf``) before
 | 
					          the container are readable by everyone (``chmod o+r file.pdf``)
 | 
				
			||||||
          copying them.
 | 
					          before copying them.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
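If you'd rather fix the permissions from a script than run ``chmod o+r`` by hand, a minimal Python sketch might look like this.  The ``ensure_world_readable`` helper is a made-up name; it only uses the standard ``os`` and ``stat`` modules:

```python
import os
import stat


def ensure_world_readable(path):
    """Add the read bit for "others", like ``chmod o+r``, and return the mode."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if not mode & stat.S_IROTH:
        mode |= stat.S_IROTH
        os.chmod(path, mode)
    return mode
```

Run it on each document before copying it into the container or dropping it into the consumption directory.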
.. _Docker: https://www.docker.com/
 | 
					.. _Docker: https://www.docker.com/
 | 
				
			||||||
@ -281,17 +283,108 @@ Docker Method
 | 
				
			|||||||
   free to tinker around without using compose!
 | 
					   free to tinker around without using compose!
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
.. _making-things-a-little-more-permanent:
 | 
					.. _setup-permanent:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Making Things a Little more Permanent
 | 
					Making Things a Little more Permanent
 | 
				
			||||||
-------------------------------------
 | 
					-------------------------------------
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Once you've tested things and are happy with the work flow, you can automate the
 | 
					Once you've tested things and are happy with the work flow, you can automate
 | 
				
			||||||
process of starting the webserver and consumer.  If you're running
 | 
					the process of starting the webserver and consumer.
 | 
				
			||||||
on a bare metal system that's using Systemd, you can use the service unit files
 | 
					
 | 
				
			||||||
in the ``scripts`` directory to set this up.  If you're on another startup
 | 
					
 | 
				
			||||||
system or are using a Vagrant box, then you're currently on your own. If you are
 | 
					.. _setup-permanent-standard-systemd:
 | 
				
			||||||
using Docker, you can set a restart-policy_ in the ``docker-compose.yml`` to
 | 
					
 | 
				
			||||||
have the containers automatically start with the Docker daemon.
 | 
					Standard (Bare Metal, Systemd)
 | 
				
			||||||
 | 
					..............................
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					If you're running on a bare metal system that's using Systemd, you can use the
 | 
				
			||||||
 | 
					service unit files in the ``scripts`` directory to set this up.  You'll need to
 | 
				
			||||||
 | 
					create a user called ``paperless`` and set up Paperless to be in a place that
 | 
				
			||||||
 | 
					this new user can read and write to.  Then, you can just tell Systemd to enable
 | 
				
			||||||
 | 
					the two ``.service`` files::
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    # systemctl enable /path/to/paperless/scripts/paperless-consumer.service
 | 
				
			||||||
 | 
					    # systemctl enable /path/to/paperless/scripts/paperless-webserver.service
 | 
				
			||||||
 | 
					    # systemctl start /path/to/paperless/scripts/paperless-consumer.service
 | 
				
			||||||
 | 
					    # systemctl start /path/to/paperless/scripts/paperless-webserver.service
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. _setup-permanent-standard-ubuntu14:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Ubuntu 14.04 (Bare Metal, Upstart)
 | 
				
			||||||
 | 
					..................................
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Ubuntu 14.04 and earlier use the `Upstart`_ init system to start services
 | 
				
			||||||
 | 
					during the boot process. To configure Upstart to run Paperless automatically
 | 
				
			||||||
 | 
					after restarting your system:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					1. Change to the directory where Upstart's configuration files are kept:
 | 
				
			||||||
 | 
					   ``cd /etc/init``
 | 
				
			||||||
 | 
					2. Create a new file: ``sudo nano paperless-server.conf``
 | 
				
			||||||
 | 
					3. In the newly-created file enter::
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    start on (local-filesystems and net-device-up IFACE=eth0)
 | 
				
			||||||
 | 
					    stop on shutdown
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    respawn
 | 
				
			||||||
 | 
					    respawn limit 10 5
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    script
 | 
				
			||||||
 | 
					      exec /srv/paperless/src/manage.py runserver 0.0.0.0:80
 | 
				
			||||||
 | 
					    end script
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					   Note that you'll need to replace ``/srv/paperless/src/manage.py`` with the
 | 
				
			||||||
 | 
					   path to the ``manage.py`` script in your installation directory.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					  If you are using a network interface other than ``eth0``, you will have to
 | 
				
			||||||
 | 
					  change ``IFACE=eth0``. For example, if you are connected via WiFi, you will
 | 
				
			||||||
 | 
					  likely need to replace ``eth0`` above with ``wlan0``. To see all interfaces,
 | 
				
			||||||
 | 
					  run ``ifconfig``.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					  Save the file.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					4. Create a new file: ``sudo nano paperless-consumer.conf``
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					5. In the newly-created file enter::
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    start on (local-filesystems and net-device-up IFACE=eth0)
 | 
				
			||||||
 | 
					    stop on shutdown
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    respawn
 | 
				
			||||||
 | 
					    respawn limit 10 5
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    script
 | 
				
			||||||
 | 
					      exec /srv/paperless/src/manage.py document_consumer
 | 
				
			||||||
 | 
					    end script
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					  Replace ``/srv/paperless/src/manage.py`` with the same values as in step 3
 | 
				
			||||||
 | 
					  above and replace ``eth0`` with the appropriate value, if necessary. Save the
 | 
				
			||||||
 | 
					  file.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					These two configuration files together will start both the Paperless webserver
 | 
				
			||||||
 | 
					and document consumer processes when the file system and network interface
 | 
				
			||||||
 | 
					specified are available after boot. Furthermore, if either process ever exits
 | 
				
			||||||
 | 
					unexpectedly, Upstart will try to restart it a maximum of 10 times within a 5
 | 
				
			||||||
 | 
					second period.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. _Upstart: http://upstart.ubuntu.com/
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. _setup-permanent-vagrant:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Vagrant
 | 
				
			||||||
 | 
					.......
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					You're currently on your own, but the Ubuntu explanation above may be enough.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. _setup-permanent-docker:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Docker
 | 
				
			||||||
 | 
					......
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					If you're using Docker, you can set a restart-policy_ in the
 | 
				
			||||||
 | 
					``docker-compose.yml`` to have the containers automatically start with the
 | 
				
			||||||
 | 
					Docker daemon.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
.. _restart-policy: https://docs.docker.com/engine/reference/commandline/run/#restart-policies-restart
 | 
					.. _restart-policy: https://docs.docker.com/engine/reference/commandline/run/#restart-policies-restart
 | 
				
			||||||
 | 
				
			|||||||
							
								
								
									
docs/troubleshooting.rst (new file, 19 lines)
							@ -0,0 +1,19 @@
 | 
				
			|||||||
 | 
					.. _troubleshooting:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Troubleshooting
 | 
				
			||||||
 | 
					===============
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. _troubleshooting_ocr_language_files_missing:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Consumer warns ``OCR for XX failed``
 | 
				
			||||||
 | 
					------------------------------------
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					If you find the OCR accuracy to be too low, and/or the document consumer warns that ``OCR for
 | 
				
			||||||
 | 
					XX failed, but we're going to stick with what we've got since FORGIVING_OCR is enabled``, then you
 | 
				
			||||||
 | 
					might need to install the `Tesseract language files
 | 
				
			||||||
 | 
					<http://packages.ubuntu.com/search?keywords=tesseract-ocr>`_ matching your documents' languages.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					As an example, if you are running Paperless from the Vagrant setup provided (or from any Ubuntu or Debian
 | 
				
			||||||
 | 
					box), and your documents are written in Spanish you may need to run::
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    apt-get install -y tesseract-ocr-spa
 | 
				
			||||||
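Before installing anything, it can help to compare the language packs Tesseract already has with the ones your documents need. The sketch below only parses the text printed by `tesseract --list-langs`. The helper name `missing_languages` and the sample listing are illustrative assumptions and are not part of Paperless.

```python
def missing_languages(listing, wanted):
    """Return the codes in `wanted` that are absent from a
    `tesseract --list-langs` listing (passed in as plain text)."""
    installed = set()
    for line in listing.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages (N):" header.
        if not line or line.lower().startswith("list of"):
            continue
        installed.add(line)
    return [code for code in wanted if code not in installed]


# Hypothetical sample; the real listing depends on your installation.
SAMPLE = "List of available languages (2):\neng\nosd\n"
print(missing_languages(SAMPLE, ["eng", "spa"]))
```

On a real system the listing could be captured with `subprocess.run(["tesseract", "--list-langs"], ...)`. Some Tesseract versions write it to stderr, so it is safer to capture both streams.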
@ -20,7 +20,7 @@ PAPERLESS_CONSUME_MAIL_PASS=""
 | 
				
			|||||||
#
 | 
					#
 | 
				
			||||||
# The passphrase you use here will be used when storing your documents in
 | 
					# The passphrase you use here will be used when storing your documents in
 | 
				
			||||||
# Paperless, but you can always export them in an unencrypted format by using
 | 
					# Paperless, but you can always export them in an unencrypted format by using
 | 
				
			||||||
# document exporter.  See the documentaiton for more information.
 | 
					# document exporter.  See the documentation for more information.
 | 
				
			||||||
#
 | 
					#
 | 
				
			||||||
# One final note about the passphrase.  Once you've consumed a document with
 | 
					# One final note about the passphrase.  Once you've consumed a document with
 | 
				
			||||||
# one passphrase, DON'T CHANGE IT.  Paperless assumes this to be a constant and
 | 
					# one passphrase, DON'T CHANGE IT.  Paperless assumes this to be a constant and
 | 
				
			||||||
@ -31,3 +31,8 @@ PAPERLESS_PASSPHRASE="secret"
 | 
				
			|||||||
# If you intend to consume documents either via HTTP POST or by email, you must
 | 
					# If you intend to consume documents either via HTTP POST or by email, you must
 | 
				
			||||||
# have a shared secret here.
 | 
					# have a shared secret here.
 | 
				
			||||||
PAPERLESS_SHARED_SECRET=""
 | 
					PAPERLESS_SHARED_SECRET=""
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					# By default, Paperless will attempt to use all available CPU cores to process
 | 
				
			||||||
 | 
					# a document, but if you would like to limit that, you can set this value to
 | 
				
			||||||
 | 
					# an integer:
 | 
				
			||||||
 | 
					#PAPERLESS_OCR_THREADS=1
 | 
				
			||||||
 | 
				
			|||||||
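The comment added above says that Paperless uses every core unless `PAPERLESS_OCR_THREADS` is set to an integer. A minimal sketch of how such a setting could be resolved follows. The function name `resolve_ocr_threads` and the validation rules are assumptions made for illustration, not the project's actual settings code.

```python
import os


def resolve_ocr_threads(environ=None):
    """Return the worker count implied by PAPERLESS_OCR_THREADS.

    Assumption: an unset or empty value means "use every available core".
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("PAPERLESS_OCR_THREADS", "").strip()
    if not raw:
        # os.cpu_count() may return None, so fall back to one worker.
        return os.cpu_count() or 1
    threads = int(raw)  # raises ValueError for non-integer input
    if threads < 1:
        raise ValueError("PAPERLESS_OCR_THREADS must be a positive integer")
    return threads


print(resolve_ocr_threads({"PAPERLESS_OCR_THREADS": "1"}))
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.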
							
								
								
									
										
presentation/img/kitten.jpg (new binary file, 92 KiB, not shown)
@ -148,12 +148,12 @@
 | 
				
			|||||||
 | 
					
 | 
				
			||||||
        <section data-background="img/pony.png">
 | 
					        <section data-background="img/pony.png">
 | 
				
			||||||
          <h2>Demo!</h2>
 | 
					          <h2>Demo!</h2>
 | 
				
			||||||
          <p>(Time to sacrifice a kitten)</p>
 | 
					          <img src="img/kitten.jpg" style="width: 50%;" />
 | 
				
			||||||
        </section>
 | 
					        </section>
 | 
				
			||||||
 | 
					
 | 
				
			||||||
        <section>
 | 
					        <section>
 | 
				
			||||||
          <h2>TODO</h2>
 | 
					          <h2>TODO</h2>
 | 
				
			||||||
          <p>It works, but it could use polish</p>
 | 
					          <p>It works, but it needs polish</p>
 | 
				
			||||||
          <ul>
 | 
					          <ul>
 | 
				
			||||||
            <li>The UI is the Django admin</li>
 | 
					            <li>The UI is the Django admin</li>
 | 
				
			||||||
            <li>Mail consumption is really raw</li>
 | 
					            <li>Mail consumption is really raw</li>
 | 
				
			||||||
@ -163,11 +163,11 @@
 | 
				
			|||||||
          <aside class="notes">
 | 
					          <aside class="notes">
 | 
				
			||||||
            <ul>
 | 
					            <ul>
 | 
				
			||||||
              <li>
 | 
					              <li>
 | 
				
			||||||
                <strong>Plugin architecture</strong>: there've been requests for
 | 
					                <strong>Plugin architecture</strong>: there've been requests
 | 
				
			||||||
                some overly custom stuff to happen before and after consumption,
 | 
					                for some overly custom stuff to happen before and after
 | 
				
			||||||
                but in the UNIX spirit of "do one job well", I think this sort
 | 
					                consumption, but in the UNIX spirit of "do one job well", I
 | 
				
			||||||
                of thing is better written as a plugin -- which means I need to
 | 
					                think this sort of thing is better written as a plugin -- which
 | 
				
			||||||
                figure out a best practise for that.
 | 
					                means I need to figure out a best practise for that.
 | 
				
			||||||
              </li>
 | 
					              </li>
 | 
				
			||||||
            </ul>
 | 
					            </ul>
 | 
				
			||||||
          </aside>
 | 
					          </aside>
 | 
				
			||||||
 | 
				
			|||||||
@ -1,4 +1,4 @@
 | 
				
			|||||||
Django==1.9.2
 | 
					Django==1.9.4
 | 
				
			||||||
Pillow==3.1.1
 | 
					Pillow==3.1.1
 | 
				
			||||||
django-crispy-forms==1.6.0
 | 
					django-crispy-forms==1.6.0
 | 
				
			||||||
django-extensions==1.6.1
 | 
					django-extensions==1.6.1
 | 
				
			||||||
 | 
				
			|||||||
@ -19,12 +19,11 @@ from PIL import Image
 | 
				
			|||||||
 | 
					
 | 
				
			||||||
from django.conf import settings
 | 
					from django.conf import settings
 | 
				
			||||||
from django.utils import timezone
 | 
					from django.utils import timezone
 | 
				
			||||||
from django.template.defaultfilters import slugify
 | 
					 | 
				
			||||||
from pyocr.tesseract import TesseractError
 | 
					from pyocr.tesseract import TesseractError
 | 
				
			||||||
 | 
					
 | 
				
			||||||
from paperless.db import GnuPG
 | 
					from paperless.db import GnuPG
 | 
				
			||||||
 | 
					
 | 
				
			||||||
from .models import Correspondent, Tag, Document, Log
 | 
					from .models import Tag, Document, Log, FileInfo
 | 
				
			||||||
from .languages import ISO639
 | 
					from .languages import ISO639
 | 
				
			||||||
from .signals import (
 | 
					from .signals import (
 | 
				
			||||||
    document_consumption_started, document_consumption_finished)
 | 
					    document_consumption_started, document_consumption_finished)
 | 
				
			||||||
@ -56,19 +55,6 @@ class Consumer(object):
 | 
				
			|||||||
 | 
					
 | 
				
			||||||
    DEFAULT_OCR_LANGUAGE = settings.OCR_LANGUAGE
 | 
					    DEFAULT_OCR_LANGUAGE = settings.OCR_LANGUAGE
 | 
				
			||||||
 | 
					
 | 
				
			||||||
    REGEX_TITLE = re.compile(
 | 
					 | 
				
			||||||
        r"^.*/(.*)\.(pdf|jpe?g|png|gif|tiff)$",
 | 
					 | 
				
			||||||
        flags=re.IGNORECASE
 | 
					 | 
				
			||||||
    )
 | 
					 | 
				
			||||||
    REGEX_CORRESPONDENT_TITLE = re.compile(
 | 
					 | 
				
			||||||
        r"^.*/(.+) - (.*)\.(pdf|jpe?g|png|gif|tiff)$",
 | 
					 | 
				
			||||||
        flags=re.IGNORECASE
 | 
					 | 
				
			||||||
    )
 | 
					 | 
				
			||||||
    REGEX_CORRESPONDENT_TITLE_TAGS = re.compile(
 | 
					 | 
				
			||||||
        r"^.*/(.*) - (.*) - ([a-z0-9\-,]*)\.(pdf|jpe?g|png|gif|tiff)$",
 | 
					 | 
				
			||||||
        flags=re.IGNORECASE
 | 
					 | 
				
			||||||
    )
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
    def __init__(self):
 | 
					    def __init__(self):
 | 
				
			||||||
 | 
					
 | 
				
			||||||
        self.logger = logging.getLogger(__name__)
 | 
					        self.logger = logging.getLogger(__name__)
 | 
				
			||||||
@ -107,7 +93,7 @@ class Consumer(object):
 | 
				
			|||||||
            if not os.path.isfile(doc):
 | 
					            if not os.path.isfile(doc):
 | 
				
			||||||
                continue
 | 
					                continue
 | 
				
			||||||
 | 
					
 | 
				
			||||||
            if not re.match(self.REGEX_TITLE, doc):
 | 
					            if not re.match(FileInfo.REGEXES["title"], doc):
 | 
				
			||||||
                continue
 | 
					                continue
 | 
				
			||||||
 | 
					
 | 
				
			||||||
            if doc in self._ignore:
 | 
					            if doc in self._ignore:
 | 
				
			||||||
@ -282,72 +268,20 @@ class Consumer(object):
 | 
				
			|||||||
        # Strip out excess white space to allow matching to go smoother
 | 
					        # Strip out excess white space to allow matching to go smoother
 | 
				
			||||||
        return re.sub(r"\s+", " ", r)
 | 
					        return re.sub(r"\s+", " ", r)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
    def _guess_attributes_from_name(self, parseable):
 | 
					 | 
				
			||||||
        """
 | 
					 | 
				
			||||||
        We use a crude naming convention to make handling the correspondent,
 | 
					 | 
				
			||||||
        title, and tags easier:
 | 
					 | 
				
			||||||
          "<correspondent> - <title> - <tags>.<suffix>"
 | 
					 | 
				
			||||||
          "<correspondent> - <title>.<suffix>"
 | 
					 | 
				
			||||||
          "<title>.<suffix>"
 | 
					 | 
				
			||||||
        """
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
        def get_correspondent(correspondent_name):
 | 
					 | 
				
			||||||
            return Correspondent.objects.get_or_create(
 | 
					 | 
				
			||||||
                name=correspondent_name,
 | 
					 | 
				
			||||||
                defaults={"slug": slugify(correspondent_name)}
 | 
					 | 
				
			||||||
            )[0]
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
        def get_tags(tags):
 | 
					 | 
				
			||||||
            r = []
 | 
					 | 
				
			||||||
            for t in tags.split(","):
 | 
					 | 
				
			||||||
                r.append(
 | 
					 | 
				
			||||||
                    Tag.objects.get_or_create(slug=t, defaults={"name": t})[0])
 | 
					 | 
				
			||||||
            return tuple(r)
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
        def get_suffix(suffix):
 | 
					 | 
				
			||||||
            suffix = suffix.lower()
 | 
					 | 
				
			||||||
            if suffix == "jpeg":
 | 
					 | 
				
			||||||
                return "jpg"
 | 
					 | 
				
			||||||
            return suffix
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
        # First attempt: "<correspondent> - <title> - <tags>.<suffix>"
 | 
					 | 
				
			||||||
        m = re.match(self.REGEX_CORRESPONDENT_TITLE_TAGS, parseable)
 | 
					 | 
				
			||||||
        if m:
 | 
					 | 
				
			||||||
            return (
 | 
					 | 
				
			||||||
                get_correspondent(m.group(1)),
 | 
					 | 
				
			||||||
                m.group(2),
 | 
					 | 
				
			||||||
                get_tags(m.group(3)),
 | 
					 | 
				
			||||||
                get_suffix(m.group(4))
 | 
					 | 
				
			||||||
            )
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
        # Second attempt: "<correspondent> - <title>.<suffix>"
 | 
					 | 
				
			||||||
        m = re.match(self.REGEX_CORRESPONDENT_TITLE, parseable)
 | 
					 | 
				
			||||||
        if m:
 | 
					 | 
				
			||||||
            return (
 | 
					 | 
				
			||||||
                get_correspondent(m.group(1)),
 | 
					 | 
				
			||||||
                m.group(2),
 | 
					 | 
				
			||||||
                (),
 | 
					 | 
				
			||||||
                get_suffix(m.group(3))
 | 
					 | 
				
			||||||
            )
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
        # That didn't work, so we assume correspondent and tags are None
 | 
					 | 
				
			||||||
        m = re.match(self.REGEX_TITLE, parseable)
 | 
					 | 
				
			||||||
        return None, m.group(1), (), get_suffix(m.group(2))
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
    def _store(self, text, doc, thumbnail):
 | 
					    def _store(self, text, doc, thumbnail):
 | 
				
			||||||
 | 
					
 | 
				
			||||||
        sender, title, tags, file_type = self._guess_attributes_from_name(doc)
 | 
					        file_info = FileInfo.from_path(doc)
 | 
				
			||||||
        relevant_tags = set(list(Tag.match_all(text)) + list(tags))
 | 
					        relevant_tags = set(list(Tag.match_all(text)) + list(file_info.tags))
 | 
				
			||||||
 | 
					
 | 
				
			||||||
        stats = os.stat(doc)
 | 
					        stats = os.stat(doc)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
        self.log("debug", "Saving record to database")
 | 
					        self.log("debug", "Saving record to database")
 | 
				
			||||||
 | 
					
 | 
				
			||||||
        document = Document.objects.create(
 | 
					        document = Document.objects.create(
 | 
				
			||||||
            correspondent=sender,
 | 
					            correspondent=file_info.correspondent,
 | 
				
			||||||
            title=title,
 | 
					            title=file_info.title,
 | 
				
			||||||
            content=text,
 | 
					            content=text,
 | 
				
			||||||
            file_type=file_type,
 | 
					            file_type=file_info.extension,
 | 
				
			||||||
            created=timezone.make_aware(
 | 
					            created=timezone.make_aware(
 | 
				
			||||||
                datetime.datetime.fromtimestamp(stats.st_mtime)),
 | 
					                datetime.datetime.fromtimestamp(stats.st_mtime)),
 | 
				
			||||||
            modified=timezone.make_aware(
 | 
					            modified=timezone.make_aware(
 | 
				
			||||||
 | 
				
			|||||||
@ -96,11 +96,16 @@ class Command(Renderable, BaseCommand):
 | 
				
			|||||||
 | 
					
 | 
				
			||||||
    @staticmethod
 | 
					    @staticmethod
 | 
				
			||||||
    def _get_legacy_file_name(doc):
 | 
					    def _get_legacy_file_name(doc):
 | 
				
			||||||
        if doc.correspondent and doc.title:
 | 
					
 | 
				
			||||||
            tags = ",".join([t.slug for t in doc.tags.all()])
 | 
					        if not doc.correspondent and not doc.title:
 | 
				
			||||||
            if tags:
 | 
					            return os.path.basename(doc.source_path)
 | 
				
			||||||
                return "{} - {} - {}.{}".format(
 | 
					
 | 
				
			||||||
                    doc.correspondent, doc.title, tags, doc.file_type)
 | 
					        created = doc.created.strftime("%Y%m%d%H%M%SZ")
 | 
				
			||||||
            return "{} - {}.{}".format(
 | 
					        tags = ",".join([t.slug for t in doc.tags.all()])
 | 
				
			||||||
                doc.correspondent, doc.title, doc.file_type)
 | 
					
 | 
				
			||||||
        return os.path.basename(doc.source_path)
 | 
					        if tags:
 | 
				
			||||||
 | 
					            return "{} - {} - {} - {}.{}".format(
 | 
				
			||||||
 | 
					                created, doc.correspondent, doc.title, tags, doc.file_type)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					        return "{} - {} - {}.{}".format(
 | 
				
			||||||
 | 
					            created, doc.correspondent, doc.title, doc.file_type)
 | 
				
			||||||
 | 
				
			|||||||
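The exporter hunk above prefixes exported names with a `%Y%m%d%H%M%SZ` timestamp, and `FileInfo._get_created` pads a date-only stamp with zeros before parsing it. The standalone sketch below shows that round trip. It is an approximation: the function names are hypothetical, and it uses `datetime.strptime` from the standard library instead of `dateutil`, so it returns a naive datetime.

```python
from datetime import datetime


def legacy_file_name(created, correspondent, title, tags, file_type):
    """Build "<created> - <correspondent> - <title>[ - <tags>].<ext>"."""
    stamp = created.strftime("%Y%m%d%H%M%SZ")
    parts = [stamp, correspondent, title]
    if tags:
        parts.append(",".join(tags))
    return "{}.{}".format(" - ".join(parts), file_type)


def parse_created(stamp):
    """Parse "YYYYMMDDZ" or "YYYYMMDDHHMMSSZ"; a missing time becomes 00:00:00."""
    digits = "{:0<14}".format(stamp[:-1])  # drop the trailing "Z", pad to 14 digits
    return datetime.strptime(digits, "%Y%m%d%H%M%S")


name = legacy_file_name(
    datetime(2016, 3, 5, 14, 30, 0), "Acme", "Invoice", ["tax", "2016"], "pdf")
print(name)
print(parse_created("20160305Z"))
```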
@ -1,8 +1,11 @@
 | 
				
			|||||||
 | 
					import dateutil.parser
 | 
				
			||||||
import logging
 | 
					import logging
 | 
				
			||||||
import os
 | 
					import os
 | 
				
			||||||
import re
 | 
					import re
 | 
				
			||||||
import uuid
 | 
					import uuid
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					from collections import OrderedDict
 | 
				
			||||||
 | 
					
 | 
				
			||||||
from django.conf import settings
 | 
					from django.conf import settings
 | 
				
			||||||
from django.core.urlresolvers import reverse
 | 
					from django.core.urlresolvers import reverse
 | 
				
			||||||
from django.db import models
 | 
					from django.db import models
 | 
				
			||||||
@ -152,7 +155,7 @@ class Document(models.Model):
 | 
				
			|||||||
    )
 | 
					    )
 | 
				
			||||||
    tags = models.ManyToManyField(
 | 
					    tags = models.ManyToManyField(
 | 
				
			||||||
        Tag, related_name="documents", blank=True)
 | 
					        Tag, related_name="documents", blank=True)
 | 
				
			||||||
    created = models.DateTimeField(default=timezone.now, editable=False)
 | 
					    created = models.DateTimeField(default=timezone.now)
 | 
				
			||||||
    modified = models.DateTimeField(auto_now=True, editable=False)
 | 
					    modified = models.DateTimeField(auto_now=True, editable=False)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
    class Meta(object):
 | 
					    class Meta(object):
 | 
				
			||||||
@ -250,3 +253,136 @@ class Log(models.Model):
 | 
				
			|||||||
            self.group = uuid.uuid4()
 | 
					            self.group = uuid.uuid4()
 | 
				
			||||||
 | 
					
 | 
				
			||||||
        models.Model.save(self, *args, **kwargs)
 | 
					        models.Model.save(self, *args, **kwargs)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					class FileInfo(object):
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    # This epic regex *almost* worked for our needs, so I'm keeping it here for
 | 
				
			||||||
 | 
					    # posterity, in the hopes that we might find a way to make it work one day.
 | 
				
			||||||
 | 
					    ALMOST_REGEX = re.compile(
 | 
				
			||||||
 | 
					        r"^((?P<date>\d\d\d\d\d\d\d\d\d\d\d\d\d\dZ){separator})?"
 | 
				
			||||||
 | 
					        r"((?P<correspondent>{non_separated_word}+){separator})??"
 | 
				
			||||||
 | 
					        r"(?P<title>{non_separated_word}+)"
 | 
				
			||||||
 | 
					        r"({separator}(?P<tags>[a-z,0-9-]+))?"
 | 
				
			||||||
 | 
					        r"\.(?P<extension>[a-zA-Z.-]+)$".format(
 | 
				
			||||||
 | 
					            separator=r"\s+-\s+",
 | 
				
			||||||
 | 
					            non_separated_word=r"([\w,. ]|([^\s]-))"
 | 
				
			||||||
 | 
					        )
 | 
				
			||||||
 | 
					    )
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    REGEXES = OrderedDict([
 | 
				
			||||||
 | 
					        ("created-correspondent-title-tags", re.compile(
 | 
				
			||||||
 | 
					            r"^(?P<created>\d\d\d\d\d\d\d\d(\d\d\d\d\d\d)?Z) - "
 | 
				
			||||||
 | 
					            r"(?P<correspondent>.*) - "
 | 
				
			||||||
 | 
					            r"(?P<title>.*) - "
 | 
				
			||||||
 | 
					            r"(?P<tags>[a-z0-9\-,]*)"
 | 
				
			||||||
 | 
					            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff)$",
 | 
				
			||||||
 | 
					            flags=re.IGNORECASE
 | 
				
			||||||
 | 
					        )),
 | 
				
			||||||
 | 
					        ("created-title-tags", re.compile(
 | 
				
			||||||
 | 
					            r"^(?P<created>\d\d\d\d\d\d\d\d(\d\d\d\d\d\d)?Z) - "
 | 
				
			||||||
 | 
					            r"(?P<title>.*) - "
 | 
				
			||||||
 | 
					            r"(?P<tags>[a-z0-9\-,]*)"
 | 
				
			||||||
 | 
					            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff)$",
 | 
				
			||||||
 | 
					            flags=re.IGNORECASE
 | 
				
			||||||
 | 
					        )),
 | 
				
			||||||
 | 
					        ("created-correspondent-title", re.compile(
 | 
				
			||||||
 | 
					            r"^(?P<created>\d\d\d\d\d\d\d\d(\d\d\d\d\d\d)?Z) - "
 | 
				
			||||||
 | 
					            r"(?P<correspondent>.*) - "
 | 
				
			||||||
 | 
					            r"(?P<title>.*)"
 | 
				
			||||||
 | 
					            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff)$",
 | 
				
			||||||
 | 
					            flags=re.IGNORECASE
 | 
				
			||||||
 | 
					        )),
 | 
				
			||||||
 | 
					        ("created-title", re.compile(
 | 
				
			||||||
 | 
					            r"^(?P<created>\d\d\d\d\d\d\d\d(\d\d\d\d\d\d)?Z) - "
 | 
				
			||||||
 | 
					            r"(?P<title>.*)"
 | 
				
			||||||
 | 
					            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff)$",
 | 
				
			||||||
 | 
					            flags=re.IGNORECASE
 | 
				
			||||||
 | 
					        )),
 | 
				
			||||||
 | 
					        ("correspondent-title-tags", re.compile(
 | 
				
			||||||
 | 
					            r"(?P<correspondent>.*) - "
 | 
				
			||||||
 | 
					            r"(?P<title>.*) - "
 | 
				
			||||||
 | 
					            r"(?P<tags>[a-z0-9\-,]*)"
 | 
				
			||||||
 | 
					            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff)$",
 | 
				
			||||||
 | 
					            flags=re.IGNORECASE
 | 
				
			||||||
 | 
					        )),
 | 
				
			||||||
 | 
					        ("correspondent-title", re.compile(
 | 
				
			||||||
 | 
					            r"(?P<correspondent>.*) - "
 | 
				
			||||||
 | 
					            r"(?P<title>.*)?"
 | 
				
			||||||
 | 
					            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff)$",
 | 
				
			||||||
 | 
					            flags=re.IGNORECASE
 | 
				
			||||||
 | 
					        )),
 | 
				
			||||||
 | 
					        ("title", re.compile(
 | 
				
			||||||
 | 
					            r"(?P<title>.*)"
 | 
				
			||||||
 | 
					            r"\.(?P<extension>pdf|jpe?g|png|gif|tiff)$",
 | 
				
			||||||
 | 
					            flags=re.IGNORECASE
 | 
				
			||||||
 | 
					        ))
 | 
				
			||||||
 | 
					    ])
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    def __init__(self, created=None, correspondent=None, title=None, tags=(),
 | 
				
			||||||
 | 
					                 extension=None):
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					        self.created = created
 | 
				
			||||||
 | 
					        self.title = title
 | 
				
			||||||
 | 
					        self.extension = extension
 | 
				
			||||||
 | 
					        self.correspondent = correspondent
 | 
				
			||||||
 | 
					        self.tags = tags
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    @classmethod
 | 
				
			||||||
 | 
					    def _get_created(cls, created):
 | 
				
			||||||
 | 
					        return dateutil.parser.parse("{:0<14}Z".format(created[:-1]))
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    @classmethod
 | 
				
			||||||
 | 
					    def _get_correspondent(cls, name):
 | 
				
			||||||
 | 
					        if not name:
 | 
				
			||||||
 | 
					            return None
 | 
				
			||||||
 | 
					        return Correspondent.objects.get_or_create(name=name, defaults={
 | 
				
			||||||
 | 
					            "slug": slugify(name)
 | 
				
			||||||
 | 
					        })[0]
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    @classmethod
 | 
				
			||||||
 | 
					    def _get_title(cls, title):
 | 
				
			||||||
 | 
					        return title
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    @classmethod
 | 
				
			||||||
 | 
					    def _get_tags(cls, tags):
 | 
				
			||||||
 | 
					        r = []
 | 
				
			||||||
 | 
					        for t in tags.split(","):
 | 
				
			||||||
 | 
					            r.append(
 | 
				
			||||||
 | 
					                Tag.objects.get_or_create(slug=t, defaults={"name": t})[0])
 | 
				
			||||||
 | 
					        return tuple(r)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    @classmethod
 | 
				
			||||||
 | 
					    def _get_extension(cls, extension):
 | 
				
			||||||
 | 
					        r = extension.lower()
 | 
				
			||||||
 | 
					        if r == "jpeg":
 | 
				
			||||||
 | 
					            return "jpg"
 | 
				
			||||||
 | 
					        return r
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    @classmethod
 | 
				
			||||||
 | 
					    def _mangle_property(cls, properties, name):
 | 
				
			||||||
 | 
					        if name in properties:
 | 
				
			||||||
 | 
					            properties[name] = getattr(cls, "_get_{}".format(name))(
 | 
				
			||||||
 | 
					                properties[name]
 | 
				
			||||||
 | 
					            )
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    @classmethod
 | 
				
			||||||
 | 
					    def from_path(cls, path):
 | 
				
			||||||
 | 
					        """
 | 
				
			||||||
 | 
					        We use a crude naming convention to make handling the correspondent,
 | 
				
			||||||
 | 
					        title, and tags easier:
 | 
				
			||||||
 | 
					          "<correspondent> - <title> - <tags>.<suffix>"
 | 
				
			||||||
 | 
					          "<correspondent> - <title>.<suffix>"
 | 
				
			||||||
 | 
					          "<title>.<suffix>"
 | 
				
			||||||
 | 
					        """
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					        for regex in cls.REGEXES.values():
 | 
				
			||||||
 | 
					            m = regex.match(os.path.basename(path))
 | 
				
			||||||
 | 
					            if m:
 | 
				
			||||||
 | 
					                properties = m.groupdict()
 | 
				
			||||||
 | 
					                cls._mangle_property(properties, "created")
 | 
				
			||||||
 | 
					                cls._mangle_property(properties, "correspondent")
 | 
				
			||||||
 | 
					                cls._mangle_property(properties, "title")
 | 
				
			||||||
 | 
					                cls._mangle_property(properties, "tags")
 | 
				
			||||||
 | 
					                cls._mangle_property(properties, "extension")
 | 
				
			||||||
 | 
					                return cls(**properties)
 | 
				
			||||||
 | 
				
			|||||||
@ -1,29 +1,36 @@
 | 
				
			|||||||
from django.test import TestCase
 | 
					from django.test import TestCase
 | 
				
			||||||
 | 
					
 | 
				
			||||||
from ..consumer import Consumer
 | 
					from ..models import Document, FileInfo
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
class TestAttachment(TestCase):
 | 
					class TestAttachment(TestCase):
 | 
				
			||||||
 | 
					
 | 
				
			||||||
    TAGS = ("tag1", "tag2", "tag3")
 | 
					    TAGS = ("tag1", "tag2", "tag3")
 | 
				
			||||||
    CONSUMER = Consumer()
 | 
					    EXTENSIONS = (
 | 
				
			||||||
    SUFFIXES = (
 | 
					 | 
				
			||||||
        "pdf", "png", "jpg", "jpeg", "gif",
 | 
					        "pdf", "png", "jpg", "jpeg", "gif",
 | 
				
			||||||
        "PDF", "PNG", "JPG", "JPEG", "GIF",
 | 
					        "PDF", "PNG", "JPG", "JPEG", "GIF",
 | 
				
			||||||
        "PdF", "PnG", "JpG", "JPeG", "GiF",
 | 
					        "PdF", "PnG", "JpG", "JPeG", "GiF",
 | 
				
			||||||
    )
 | 
					    )
 | 
				
			||||||
 | 
					
 | 
				
			||||||
    def _test_guess_attributes_from_name(self, path, sender, title, tags):
 | 
					    def _test_guess_attributes_from_name(self, path, sender, title, tags):
 | 
				
			||||||
        for suffix in self.SUFFIXES:
 | 
					
 | 
				
			||||||
            f = path.format(suffix)
 | 
					        for extension in self.EXTENSIONS:
 | 
				
			||||||
            results = self.CONSUMER._guess_attributes_from_name(f)
 | 
					
 | 
				
			||||||
            self.assertEqual(results[0].name, sender, f)
 | 
					            f = path.format(extension)
 | 
				
			||||||
            self.assertEqual(results[1], title, f)
 | 
					            file_info = FileInfo.from_path(f)
 | 
				
			||||||
            self.assertEqual(tuple([t.slug for t in results[2]]), tags, f)
 | 
					
 | 
				
			||||||
            if suffix.lower() == "jpeg":
 | 
					            if sender:
 | 
				
			||||||
                self.assertEqual(results[3], "jpg", f)
 | 
					                self.assertEqual(file_info.correspondent.name, sender, f)
 | 
				
			||||||
            else:
 | 
					            else:
 | 
				
			||||||
                self.assertEqual(results[3], suffix.lower(), f)
 | 
					                self.assertIsNone(file_info.correspondent, f)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					            self.assertEqual(file_info.title, title, f)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					            self.assertEqual(tuple([t.slug for t in file_info.tags]), tags, f)
 | 
				
			||||||
 | 
					            if extension.lower() == "jpeg":
 | 
				
			||||||
 | 
					                self.assertEqual(file_info.extension, "jpg", f)
 | 
				
			||||||
 | 
					            else:
 | 
				
			||||||
 | 
					                self.assertEqual(file_info.extension, extension.lower(), f)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
    def test_guess_attributes_from_name0(self):
 | 
					    def test_guess_attributes_from_name0(self):
 | 
				
			||||||
        self._test_guess_attributes_from_name(
 | 
					        self._test_guess_attributes_from_name(
 | 
				
			||||||
@ -92,3 +99,206 @@ class TestAttachment(TestCase):
 | 
				
			|||||||
            "Τιτλε",
 | 
					            "Τιτλε",
 | 
				
			||||||
            self.TAGS
 | 
					            self.TAGS
 | 
				
			||||||
        )
 | 
					        )
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    def test_guess_attributes_from_name_when_correspondent_empty(self):
 | 
				
			||||||
 | 
					        self._test_guess_attributes_from_name(
 | 
				
			||||||
 | 
					            '/path/to/ - weird empty correspondent but should not break.{}',
 | 
				
			||||||
 | 
					            None,
 | 
				
			||||||
 | 
					            'weird empty correspondent but should not break',
 | 
				
			||||||
 | 
					            ()
 | 
				
			||||||
 | 
					        )
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    def test_guess_attributes_from_name_when_title_starts_with_dash(self):
 | 
				
			||||||
 | 
					        self._test_guess_attributes_from_name(
 | 
				
			||||||
 | 
					            '/path/to/- weird but should not break.{}',
 | 
				
			||||||
 | 
					            None,
 | 
				
			||||||
 | 
					            '- weird but should not break',
 | 
				
			||||||
 | 
					            ()
 | 
				
			||||||
 | 
					        )
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    def test_guess_attributes_from_name_when_title_ends_with_dash(self):
 | 
				
			||||||
 | 
					        self._test_guess_attributes_from_name(
 | 
				
			||||||
 | 
					            '/path/to/weird but should not break -.{}',
 | 
				
			||||||
 | 
					            None,
 | 
				
			||||||
 | 
					            'weird but should not break -',
 | 
				
			||||||
 | 
					            ()
 | 
				
			||||||
 | 
					        )
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    def test_guess_attributes_from_name_when_title_is_empty(self):
 | 
				
			||||||
 | 
					        self._test_guess_attributes_from_name(
 | 
				
			||||||
 | 
					            '/path/to/weird correspondent but should not break - .{}',
 | 
				
			||||||
 | 
					            'weird correspondent but should not break',
 | 
				
			||||||
 | 
					            '',
 | 
				
			||||||
 | 
					            ()
 | 
				
			||||||
 | 
					        )
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					class Permutations(TestCase):
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    valid_dates = (
 | 
				
			||||||
 | 
					        "20150102030405Z",
 | 
				
			||||||
 | 
					        "20150102Z",
 | 
				
			||||||
 | 
					    )
 | 
				
			||||||
 | 
					    valid_correspondents = [
 | 
				
			||||||
 | 
					        "timmy",
 | 
				
			||||||
 | 
					        "Dr. McWheelie",
 | 
				
			||||||
 | 
					        "Dash Gor-don",
 | 
				
			||||||
 | 
					        "ο Θερμαστής",
 | 
				
			||||||
 | 
					        ""
 | 
				
			||||||
 | 
					    ]
 | 
				
			||||||
 | 
					    valid_titles = ["title", "Title w Spaces", "Title a-dash", "Τίτλος", ""]
 | 
				
			||||||
 | 
					    valid_tags = ["tag", "tig,tag", "tag1,tag2,tag-3"]
 | 
				
			||||||
 | 
					    valid_extensions = ["pdf", "png", "jpg", "jpeg", "gif"]
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    def _test_guessed_attributes(self, filename, created=None,
 | 
				
			||||||
 | 
					                                 correspondent=None, title=None,
 | 
				
			||||||
 | 
					                                 extension=None, tags=None):
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					        # print(filename)
 | 
				
			||||||
 | 
					        info = FileInfo.from_path(filename)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					        # Created
 | 
				
			||||||
 | 
					        if created is None:
 | 
				
			||||||
 | 
					            self.assertIsNone(info.created, filename)
 | 
				
			||||||
 | 
					        else:
 | 
				
			||||||
 | 
					            self.assertEqual(info.created.year, int(created[:4]), filename)
 | 
				
			||||||
 | 
					            self.assertEqual(info.created.month, int(created[4:6]), filename)
 | 
				
			||||||
 | 
					            self.assertEqual(info.created.day, int(created[6:8]), filename)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					        # Correspondent
 | 
				
			||||||
 | 
					        if correspondent:
 | 
				
			||||||
 | 
					            self.assertEqual(info.correspondent.name, correspondent, filename)
 | 
				
			||||||
 | 
					        else:
 | 
				
			||||||
 | 
					            self.assertEqual(info.correspondent, None, filename)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					        # Title
 | 
				
			||||||
 | 
					        self.assertEqual(info.title, title, filename)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					        # Tags
 | 
				
			||||||
 | 
					        if tags is None:
 | 
				
			||||||
 | 
					            self.assertEqual(info.tags, (), filename)
 | 
				
			||||||
 | 
					        else:
 | 
				
			||||||
 | 
					            self.assertEqual(
 | 
				
			||||||
 | 
					                [t.slug for t in info.tags], tags.split(','),
 | 
				
			||||||
 | 
					                filename
 | 
				
			||||||
 | 
					            )
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					        # Extension
 | 
				
			||||||
 | 
					        if extension == 'jpeg':
 | 
				
			||||||
 | 
					            extension = 'jpg'
 | 
				
			||||||
 | 
					        self.assertEqual(info.extension, extension, filename)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    def test_just_title(self):
 | 
				
			||||||
 | 
					        template = '/path/to/{title}.{extension}'
 | 
				
			||||||
 | 
					        for title in self.valid_titles:
 | 
				
			||||||
 | 
					            for extension in self.valid_extensions:
 | 
				
			||||||
 | 
					                spec = dict(title=title, extension=extension)
 | 
				
			||||||
 | 
					                filename = template.format(**spec)
 | 
				
			||||||
 | 
					                self._test_guessed_attributes(filename, **spec)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    def test_title_and_correspondent(self):
 | 
				
			||||||
 | 
					        template = '/path/to/{correspondent} - {title}.{extension}'
 | 
				
			||||||
 | 
					        for correspondent in self.valid_correspondents:
 | 
				
			||||||
 | 
					            for title in self.valid_titles:
 | 
				
			||||||
 | 
					                for extension in self.valid_extensions:
 | 
				
			||||||
 | 
					                    spec = dict(correspondent=correspondent, title=title,
 | 
				
			||||||
 | 
					                                extension=extension)
 | 
				
			||||||
 | 
					                    filename = template.format(**spec)
 | 
				
			||||||
 | 
					                    self._test_guessed_attributes(filename, **spec)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    def test_title_and_correspondent_and_tags(self):
 | 
				
			||||||
 | 
					        template = '/path/to/{correspondent} - {title} - {tags}.{extension}'
 | 
				
			||||||
 | 
					        for correspondent in self.valid_correspondents:
 | 
				
			||||||
 | 
					            for title in self.valid_titles:
 | 
				
			||||||
 | 
					                for tags in self.valid_tags:
 | 
				
			||||||
 | 
					                    for extension in self.valid_extensions:
 | 
				
			||||||
 | 
					                        spec = dict(correspondent=correspondent, title=title,
 | 
				
			||||||
 | 
					                                    tags=tags, extension=extension)
 | 
				
			||||||
 | 
					                        filename = template.format(**spec)
 | 
				
			||||||
 | 
					                        self._test_guessed_attributes(filename, **spec)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    def test_created_and_correspondent_and_title_and_tags(self):
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					        template = ("/path/to/{created} - "
 | 
				
			||||||
 | 
					                    "{correspondent} - "
 | 
				
			||||||
 | 
					                    "{title} - "
 | 
				
			||||||
 | 
					                    "{tags}"
 | 
				
			||||||
 | 
					                    ".{extension}")
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					        for created in self.valid_dates:
 | 
				
			||||||
 | 
					            for correspondent in self.valid_correspondents:
 | 
				
			||||||
 | 
					                for title in self.valid_titles:
 | 
				
			||||||
 | 
					                    for tags in self.valid_tags:
 | 
				
			||||||
 | 
					                        for extension in self.valid_extensions:
 | 
				
			||||||
 | 
					                            spec = {
 | 
				
			||||||
 | 
					                                "created": created,
 | 
				
			||||||
 | 
					                                "correspondent": correspondent,
 | 
				
			||||||
 | 
					                                "title": title,
 | 
				
			||||||
 | 
					                                "tags": tags,
 | 
				
			||||||
 | 
					                                "extension": extension
 | 
				
			||||||
 | 
					                            }
 | 
				
			||||||
 | 
					                            self._test_guessed_attributes(
 | 
				
			||||||
 | 
					                                template.format(**spec), **spec)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    def test_created_and_correspondent_and_title(self):
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					        template = ("/path/to/{created} - "
 | 
				
			||||||
 | 
					                    "{correspondent} - "
 | 
				
			||||||
 | 
					                    "{title}"
 | 
				
			||||||
 | 
					                    ".{extension}")
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					        for created in self.valid_dates:
 | 
				
			||||||
 | 
					            for correspondent in self.valid_correspondents:
 | 
				
			||||||
 | 
					                for title in self.valid_titles:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					                    # Skip cases where title looks like a tag as we can't
 | 
				
			||||||
 | 
					                    # accommodate such cases.
 | 
				
			||||||
 | 
					                    if title.lower() == title:
 | 
				
			||||||
 | 
					                        continue
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					                    for extension in self.valid_extensions:
 | 
				
			||||||
 | 
					                        spec = {
 | 
				
			||||||
 | 
					                            "created": created,
 | 
				
			||||||
 | 
					                            "correspondent": correspondent,
 | 
				
			||||||
 | 
					                            "title": title,
 | 
				
			||||||
 | 
					                            "extension": extension
 | 
				
			||||||
 | 
					                        }
 | 
				
			||||||
 | 
					                        self._test_guessed_attributes(
 | 
				
			||||||
 | 
					                            template.format(**spec), **spec)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    def test_created_and_title(self):
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					        template = ("/path/to/{created} - "
 | 
				
			||||||
 | 
					                    "{title}"
 | 
				
			||||||
 | 
					                    ".{extension}")
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					        for created in self.valid_dates:
 | 
				
			||||||
 | 
					            for title in self.valid_titles:
 | 
				
			||||||
 | 
					                for extension in self.valid_extensions:
 | 
				
			||||||
 | 
					                    spec = {
 | 
				
			||||||
 | 
					                        "created": created,
 | 
				
			||||||
 | 
					                        "title": title,
 | 
				
			||||||
 | 
					                        "extension": extension
 | 
				
			||||||
 | 
					                    }
 | 
				
			||||||
 | 
					                    self._test_guessed_attributes(
 | 
				
			||||||
 | 
					                        template.format(**spec), **spec)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    def test_created_and_title_and_tags(self):
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					        template = ("/path/to/{created} - "
 | 
				
			||||||
 | 
					                    "{title} - "
 | 
				
			||||||
 | 
					                    "{tags}"
 | 
				
			||||||
 | 
					                    ".{extension}")
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					        for created in self.valid_dates:
 | 
				
			||||||
 | 
					            for title in self.valid_titles:
 | 
				
			||||||
 | 
					                for tags in self.valid_tags:
 | 
				
			||||||
 | 
					                    for extension in self.valid_extensions:
 | 
				
			||||||
 | 
					                        spec = {
 | 
				
			||||||
 | 
					                            "created": created,
 | 
				
			||||||
 | 
					                            "title": title,
 | 
				
			||||||
 | 
					                            "tags": tags,
 | 
				
			||||||
 | 
					                            "extension": extension
 | 
				
			||||||
 | 
					                        }
 | 
				
			||||||
 | 
					                        self._test_guessed_attributes(
 | 
				
			||||||
 | 
					                            template.format(**spec), **spec)
 | 
				
			||||||
 | 
				
			|||||||
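The naming convention described in the `from_path` docstring in this diff can be illustrated with a small standalone sketch. This is not the project's implementation: the `PATTERNS` tuple and the `parse_name` helper are hypothetical names introduced here, the regular expressions are assumptions rather than the contents of `FileInfo.REGEXES`, and the sketch skips both the `created` prefix and the database lookups for correspondents and tags.

```python
import os
import re

# Hypothetical patterns (assumed, not taken from FileInfo.REGEXES),
# tried from the most specific form to the least specific one.
PATTERNS = (
    # "<correspondent> - <title> - <tags>.<suffix>"
    re.compile(
        r"^(?P<correspondent>.*) - (?P<title>.*) - "
        r"(?P<tags>[a-z0-9\-,]*)\.(?P<extension>\w+)$"
    ),
    # "<correspondent> - <title>.<suffix>"
    re.compile(r"^(?P<correspondent>.*) - (?P<title>.*)\.(?P<extension>\w+)$"),
    # "<title>.<suffix>"
    re.compile(r"^(?P<title>.*)\.(?P<extension>\w+)$"),
)


def parse_name(path):
    """Return a dict of guessed attributes, or None if nothing matches."""
    name = os.path.basename(path)
    for pattern in PATTERNS:
        m = pattern.match(name)
        if not m:
            continue
        info = m.groupdict()
        # Normalise the extension the same way the tests expect:
        # lower-case it and fold "jpeg" into "jpg".
        ext = info["extension"].lower()
        info["extension"] = "jpg" if ext == "jpeg" else ext
        # Tags are a comma-separated list; absent tags become an empty tuple.
        info["tags"] = tuple(t for t in (info.get("tags") or "").split(",") if t)
        # An empty or absent correspondent is reported as None.
        info["correspondent"] = info.get("correspondent") or None
        return info
    return None


if __name__ == "__main__":
    print(parse_name("/path/to/timmy - Title w Spaces - tag1,tag2.JPEG"))
```

Trying the most specific pattern first matters here, because the title-only pattern would otherwise match every file name and swallow the correspondent and tags into the title.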