Mirror of https://github.com/paperless-ngx/paperless-ngx.git (synced 2025-11-03 19:17:13 -05:00)

Improves the docs: OCRing files in languages other than English + fixes typos

parent 840626e571
commit 8115cf8905

@@ -59,7 +59,7 @@ powerful tools.
 
 * `ImageMagick`_ converts the images between colour and greyscale.
 * `Tesseract`_ does the character recognition.
-* `Unpaper`_ despeckles and and deskews the scanned image.
+* `Unpaper`_ despeckles and deskews the scanned image.
 * `GNU Privacy Guard`_ is used as the encryption backend.
 * `Python 3`_ is the language of the project.
 
@@ -128,7 +128,7 @@ following name/value pairs:
   don't start uploading stuff to your server.  The means of generating this
   signature is defined below.
 
-Specify ``enctype="multipart/form-data"``, and then POST your file with:::
+Specify ``enctype="multipart/form-data"``, and then POST your file with::
 
     Content-Disposition: form-data; name="document"; filename="whatever.pdf"
 
@@ -33,4 +33,5 @@ Contents
    api
    utilities
    migrating
+   troubleshooting
    changelog
@@ -8,7 +8,7 @@ should work) that has the following software installed on it:
 
 * `Python3`_ (with development libraries, pip and virtualenv)
 * `GNU Privacy Guard`_
-* `Tesseract`_
+* `Tesseract`_, plus its language files matching your document base.
 * `Imagemagick`_
 * `unpaper`_
 
							
								
								
									
docs/troubleshooting.rst (Normal file, 18 lines)
@@ -0,0 +1,18 @@
+.. _troubleshooting:
+
+Troubleshooting
+===============
+
+.. _troubleshooting_ocr_language_files_missing:
+
+Consumer warns ``OCR for XX failed``
+------------------------------------
+
+If you find the OCR accuracy to be too low, and/or the document consumer warns that ``OCR for
+XX failed, but we're going to stick with what we've got since FORGIVING_OCR is enabled``, then you
+might need to install the `Tesseract language files
+<http://packages.ubuntu.com/search?keywords=tesseract-ocr>`_ matching your documents' languages.
+
+As an example, if your documents are written in Spanish you may need to run::
+
+    apt-get install -y tesseract-ocr-spa
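
Editorial note, not part of the commit: before installing a language package, it can help to check which languages Tesseract already sees. The sketch below assumes a Tesseract build that supports the ``--list-langs`` option and uses ``spa`` (Spanish) only as an example code. ``has_lang`` is a hypothetical helper introduced here for illustration, not something Paperless provides.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the language code given as $1 appears as a
# whole line in the language list read from standard input.
has_lang() {
    grep -qx -- "$1"
}

# Only query Tesseract if it is actually installed on this machine.
if command -v tesseract >/dev/null 2>&1; then
    # --list-langs prints a header line followed by one language code per
    # line; stderr is merged in because some versions print the list there.
    if tesseract --list-langs 2>&1 | has_lang spa; then
        echo "Spanish language data is installed"
    else
        echo "Spanish language data is missing"
    fi
fi
```

If the check reports the language as missing, installing the matching package (``tesseract-ocr-spa`` in the example from the diff) would be the next step.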