mirror of
				https://github.com/paperless-ngx/paperless-ngx.git
				synced 2025-10-24 23:39:05 -04:00 
			
		
		
		
	Documented consumption
		
							parent
							
								
									330dfa544b
								
							
						
					
					
						commit
						cec9968cdb
					
				
							
								
								
									
										154
									
								
								docs/consumption.rst
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
| @@ -0,0 +1,154 @@ | |||||||
|  | .. _consumption: | ||||||
|  | 
 | ||||||
|  | Consumption | ||||||
|  | ########### | ||||||
|  | 
 | ||||||
|  | Once you've got *Paperless* set up, you need to start feeding documents into it. | ||||||
|  | Currently, there are three options: the consumption directory, IMAP (email), and | ||||||
|  | HTTP POST. | ||||||
|  | 
 | ||||||
|  | 
 | ||||||
|  | .. _consumption-directory: | ||||||
|  | 
 | ||||||
|  | The Consumption Directory | ||||||
|  | ========================= | ||||||
|  | 
 | ||||||
|  | The primary method of getting documents into your database is by putting them in | ||||||
|  | the consumption directory.  The ``document_consumer`` script runs in an infinite | ||||||
|  | loop looking for new additions to this directory.  When it finds one, it parses | ||||||
|  | the file with OCR, indexes the extracted text, encrypts the PDF, and stores it | ||||||
|  | in the media directory. | ||||||
|  | 
 | ||||||
|  | Getting stuff into this directory is up to you.  If you're running *Paperless* | ||||||
|  | on your local computer, you might just want to drag and drop files there, but if | ||||||
|  | you're running this on a server and want your scanner to automatically push | ||||||
|  | files to this directory, you'll need to set up some sort of service to accept | ||||||
|  | the files from the scanner.  Typically, you're looking at an FTP server like | ||||||
|  | `Proftpd`_ or an SMB file share served by `Samba`_. | ||||||
|  | 
 | ||||||
|  | .. _Proftpd: http://www.proftpd.org/ | ||||||
|  | .. _Samba: http://www.samba.org/ | ||||||
|  | 
 | ||||||
|  | So where is this consumption directory?  It's wherever you define it.  Look for | ||||||
|  | the ``CONSUMPTION_DIR`` value in ``settings.py``.  Set that to somewhere | ||||||
|  | appropriate for your use and put some documents in there.  When you're ready, | ||||||
|  | follow the :ref:`consumer <utilities-consumer>` instructions to get it running. | ||||||
|  | 
 | ||||||
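The polling behaviour described in this section can be pictured with a short
Python sketch.  It is not the ``document_consumer`` code itself: the names
``find_new_files`` and ``watch``, the ``handle`` callback, and the 10 second
interval are all assumptions made for illustration.

.. code:: python

    import time
    from pathlib import Path


    def find_new_files(directory, seen):
        """Return files in ``directory`` that are not in ``seen`` yet."""
        new_files = sorted(
            p for p in Path(directory).iterdir() if p.is_file() and p not in seen
        )
        seen.update(new_files)  # remember them so they are reported only once
        return new_files


    def watch(directory, handle, interval=10):
        """Poll ``directory`` forever and pass each new file to ``handle``."""
        seen = set()
        while True:
            for path in find_new_files(directory, seen):
                handle(path)  # e.g. OCR, index, encrypt, store
            time.sleep(interval)

Here ``directory`` would be whatever ``CONSUMPTION_DIR`` points to, and
``handle`` stands in for the parsing and storing steps.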
|  | 
 | ||||||
|  | .. _consumption-directory-naming: | ||||||
|  | 
 | ||||||
|  | A Note on File Naming | ||||||
|  | --------------------- | ||||||
|  | 
 | ||||||
|  | Any document you put into the consumption directory will be consumed, but if you | ||||||
|  | name the file right, it'll automatically set some values in the database for | ||||||
|  | you.  This is the logic the consumer follows: | ||||||
|  | 
 | ||||||
|  | 1. Try to find the sender, title, and tags in the file name following the | ||||||
|  |    pattern: ``Sender - Title - tag,tag,tag.pdf``. | ||||||
|  | 2. If that doesn't work, try to find the sender and title in the file name | ||||||
|  |    following the pattern:  ``Sender - Title.pdf``. | ||||||
|  | 3. If that doesn't work, just assume that the name of the file is the title. | ||||||
|  | 
 | ||||||
|  | So given the above, the following examples would work as you'd expect: | ||||||
|  | 
 | ||||||
|  | * ``Some Company Name - Invoice 2016-01-01 - money,invoices.pdf`` | ||||||
|  | * ``Another Company - Letter of Reference.jpg`` | ||||||
|  | * ``Dad's Recipe for Pancakes.png`` | ||||||
|  | 
 | ||||||
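|  | These, however, wouldn't be split the way you might hope.  The parts must be | ||||||
|  | separated by a dash with a space on either side, so here the whole file name | ||||||
|  | ends up as the title: | ||||||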
|  | 
 | ||||||
|  | * ``Some Company Name, Invoice 2016-01-01, money, invoices.pdf`` | ||||||
|  | * ``Another Company- Letter of Reference.jpg`` | ||||||
|  | 
 | ||||||
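The three rules above can be sketched in a few lines of Python.  This is only
an illustration, not the consumer's actual code: the function name
``guess_attributes`` and both regular expressions are assumptions made for this
example.

.. code:: python

    import re
    from pathlib import Path

    # Hypothetical patterns; the parts must be separated by " - ".
    SENDER_TITLE_TAGS = re.compile(
        r"^(?P<sender>.+?) - (?P<title>.+) - (?P<tags>\S+)$"
    )
    SENDER_TITLE = re.compile(r"^(?P<sender>.+?) - (?P<title>.+)$")


    def guess_attributes(file_name):
        """Apply the three naming rules in order and return what was found."""
        stem = Path(file_name).stem  # drop the extension
        match = SENDER_TITLE_TAGS.match(stem)
        if match:  # rule 1: sender, title, and tags
            return {
                "sender": match.group("sender"),
                "title": match.group("title"),
                "tags": match.group("tags").split(","),
            }
        match = SENDER_TITLE.match(stem)
        if match:  # rule 2: sender and title only
            return {
                "sender": match.group("sender"),
                "title": match.group("title"),
                "tags": [],
            }
        # Rule 3: the whole name is the title.
        return {"sender": None, "title": stem, "tags": []}

With this sketch, ``Another Company- Letter of Reference.jpg`` falls through to
the third rule, because the dash has no space in front of it.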
|  | 
 | ||||||
|  | .. _consumption-imap: | ||||||
|  | 
 | ||||||
|  | IMAP (Email) | ||||||
|  | ============ | ||||||
|  | 
 | ||||||
|  | Another handy way to get documents into your database is to email them to | ||||||
|  | yourself.  A typical use case: you're out for lunch and want to send a copy of | ||||||
|  | the receipt back to your system at home.  *Paperless* can be taught to pull | ||||||
|  | emails down from an arbitrary account and dump them into the consumption | ||||||
|  | directory, where the process :ref:`above <consumption-directory>` consumes them | ||||||
|  | as usual. | ||||||
|  | 
 | ||||||
|  | Some things you need to know about this feature: | ||||||
|  | 
 | ||||||
|  | * It's disabled by default.  Setting the values below enables it. | ||||||
|  | * It's been tested in a limited environment, so it may not work for you (please | ||||||
|  |   submit a pull request if you can!) | ||||||
|  | * It's designed to **delete mail from the server once consumed**.  So don't go | ||||||
|  |   pointing this to your personal email account and wonder where all your stuff | ||||||
|  |   went. | ||||||
|  | * Currently, only one photo (attachment) per email will work. | ||||||
|  | 
 | ||||||
|  | So, with all that in mind, here's what you do to get it running: | ||||||
|  | 
 | ||||||
|  | 1. Set up a new email account somewhere, or if you're feeling daring, create a | ||||||
|  |    folder in an existing email box and note the path to that folder. | ||||||
|  | 2. In ``settings.py`` set all of the appropriate values in ``MAIL_CONSUMPTION``. | ||||||
|  |    If you decided to use a subfolder of an existing account, then make sure you | ||||||
|  |    set ``INBOX`` accordingly here. | ||||||
|  | 3. Restart the :ref:`consumer <utilities-consumer>`.  The consumer will check | ||||||
|  |    the configured email account every 10 minutes for something new and pull down | ||||||
|  |    whatever it finds. | ||||||
|  | 4. Send yourself an email!  Note that the subject is treated as the file name, | ||||||
|  |    so if you set the subject to ``Sender - Title - tag,tag,tag``, you'll get | ||||||
|  |    what you expect. | ||||||
|  | 5. After a few minutes, the consumer will poll your mailbox, pull down the | ||||||
|  |    message, and place the attachment in the consumption directory with the | ||||||
|  |    appropriate name.  A few minutes later, the consumer will import it like any | ||||||
|  |    other file. | ||||||
|  | 
 | ||||||
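To make steps 4 and 5 concrete, here is a rough sketch of how a message's
subject and attachment could be turned into a file for the consumption
directory.  It uses only Python's standard ``email`` package.  The helper name
``attachment_to_file`` is hypothetical, and reusing the attachment's extension
is an assumption; this is not the consumer's actual implementation.

.. code:: python

    from email import message_from_bytes, policy
    from pathlib import Path


    def attachment_to_file(raw_message):
        """Return (file_name, content) for the first attachment, or None."""
        message = message_from_bytes(raw_message, policy=policy.default)
        subject = str(message["Subject"] or "")
        for part in message.iter_attachments():
            # Name the file after the subject, keeping the attachment's extension.
            suffix = Path(part.get_filename() or "").suffix
            return subject + suffix, part.get_payload(decode=True)
        return None  # no attachment, nothing to consume

Only the first attachment is returned, which matches the one-attachment limit
noted above.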
|  | 
 | ||||||
|  | .. _consumption-http: | ||||||
|  | 
 | ||||||
|  | HTTP POST | ||||||
|  | ========= | ||||||
|  | 
 | ||||||
|  | Currently, the API only handles file uploads.  It doesn't do tags yet, and the | ||||||
|  | URL schema isn't final, but it's a start.  It's also not much of a real API; | ||||||
|  | it's just a URL that accepts an HTTP POST. | ||||||
|  | 
 | ||||||
|  | To push your document to *Paperless*, send an HTTP POST to the server with the | ||||||
|  | following name/value pairs: | ||||||
|  | 
 | ||||||
|  | * ``sender``: The name of the document's sender.  Note that there are | ||||||
|  |   restrictions on what characters you can use here.  Specifically, alphanumeric | ||||||
|  |   characters, ``-``, ``,``, ``.``, and ``'`` are OK; everything else is out.  You | ||||||
|  |   also can't use the sequence ``" - "`` (space, dash, space). | ||||||
|  | * ``title``: The title of the document.  The rules for characters are the same | ||||||
|  |   here as for the sender. | ||||||
|  | * ``signature``: For security reasons, we have the sender send a signature using | ||||||
|  |   a "shared secret" method to make sure that random strangers don't start | ||||||
|  |   uploading stuff to your server.  The means of generating this signature is | ||||||
|  |   defined below. | ||||||
|  | 
 | ||||||
|  | Specify ``enctype="multipart/form-data"``, and then POST your file with:: | ||||||
|  | 
 | ||||||
|  |     Content-Disposition: form-data; name="document"; filename="whatever.pdf" | ||||||
|  | 
 | ||||||
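As a rough sketch, such a request can be assembled with nothing but the Python
standard library.  The function name ``build_upload_request`` is hypothetical,
and the upload URL is deliberately left as a parameter because this document
does not define it.  The sketch also assumes the file name contains no double
quotes.

.. code:: python

    import uuid
    from urllib.request import Request


    def build_upload_request(url, sender, title, signature, file_name, file_bytes):
        """Build a multipart/form-data POST with the three fields and the file."""
        boundary = uuid.uuid4().hex
        fields = (("sender", sender), ("title", title), ("signature", signature))
        parts = []
        for name, value in fields:
            text = (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            )
            parts.append(text.encode("utf-8"))
        head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="document"; filename="{file_name}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(head.encode("utf-8") + file_bytes + b"\r\n")
        parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
        headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
        return Request(url, data=b"".join(parts), headers=headers, method="POST")

Passing the returned object to ``urllib.request.urlopen`` would send it.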
|  | 
 | ||||||
|  | .. _consumption-http-signature: | ||||||
|  | 
 | ||||||
|  | Generating the Signature | ||||||
|  | ------------------------ | ||||||
|  | 
 | ||||||
|  | Generating a signature based on a shared secret is pretty simple: define a | ||||||
|  | secret, and store it on the server and the client.  Then use that secret, along | ||||||
|  | with the text you want to verify, to generate a string that you can use for | ||||||
|  | verification. | ||||||
|  | 
 | ||||||
|  | In the case of *Paperless*, you configure the server with the secret by setting | ||||||
|  | ``UPLOAD_SHARED_SECRET``.  Then on your client, you generate your signature by | ||||||
|  | concatenating the sender, title, and the secret, in that order, and then using | ||||||
|  | SHA-256 to generate a hex digest. | ||||||
|  | 
 | ||||||
|  | If you're using Python, this is what that looks like: | ||||||
|  | 
 | ||||||
|  | .. code:: python | ||||||
|  | 
 | ||||||
|  |     from hashlib import sha256 | ||||||
|  |     # sha256() needs bytes in Python 3; UTF-8 is an assumed encoding | ||||||
|  |     signature = sha256((sender + title + secret).encode("utf-8")).hexdigest() | ||||||
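For a slightly fuller picture, the sketch below wraps the same calculation in a
function and adds the matching check a receiver could perform.  Both function
names are hypothetical, the UTF-8 encoding is an assumption, and the use of
``hmac.compare_digest`` is a suggestion for this example rather than a
description of what the server does.

.. code:: python

    import hmac
    from hashlib import sha256


    def make_signature(sender, title, secret):
        """Hex SHA-256 digest of sender + title + secret, encoded as UTF-8."""
        return sha256((sender + title + secret).encode("utf-8")).hexdigest()


    def signature_is_valid(sender, title, signature, secret):
        """Recompute the signature and compare it in constant time."""
        expected = make_signature(sender, title, secret)
        return hmac.compare_digest(expected, signature)

The client would send the result of ``make_signature`` as the ``signature``
field, using the same secret that is stored in ``UPLOAD_SHARED_SECRET``.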
| @@ -29,6 +29,7 @@ Contents | |||||||
| 
 | 
 | ||||||
|    requirements |    requirements | ||||||
|    setup |    setup | ||||||
|  |    consumption | ||||||
|    utilities |    utilities | ||||||
|    migrating |    migrating | ||||||
|    changelog |    changelog | ||||||
|  | |||||||