Improves the docs: OCRing files in languages other than English + fixes typos

2025-12-23 13:27:24 -05:00 · 2016-03-21 21:57:36 +01:00
commit 8115cf8905
parent 840626e571
5 changed files with 22 additions and 3 deletions
--- a/README.rst
+++ b/README.rst
@@ -59,7 +59,7 @@ powerful tools.
 * `ImageMagick`_ converts the images between colour and greyscale.
 * `Tesseract`_ does the character recognition.
-* `Unpaper`_ despeckles and and deskews the scanned image.
+* `Unpaper`_ despeckles and deskews the scanned image.
 * `GNU Privacy Guard`_ is used as the encryption backend.
 * `Python 3`_ is the language of the project.
--- a/docs/consumption.rst
+++ b/docs/consumption.rst
@@ -128,7 +128,7 @@ following name/value pairs:
  don't start uploading stuff to your server.  The means of generating this
  signature is defined below.
-Specify ``enctype="multipart/form-data"``, and then POST your file with:::
+Specify ``enctype="multipart/form-data"``, and then POST your file with::
    Content-Disposition: form-data; name="document"; filename="whatever.pdf"
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -33,4 +33,5 @@ Contents
   api
   utilities
   migrating
+  troubleshooting
   changelog
--- a/docs/requirements.rst
+++ b/docs/requirements.rst
@@ -8,7 +8,7 @@ should work) that has the following software installed on it:
 * `Python3`_ (with development libraries, pip and virtualenv)
 * `GNU Privacy Guard`_
-* `Tesseract`_
+* `Tesseract`_, plus its language files matching your document base.
 * `Imagemagick`_
 * `unpaper`_
--- a/docs/troubleshooting.rst
+++ b/docs/troubleshooting.rst
@@ -0,0 +1,18 @@
 .. _troubleshooting:
 Troubleshooting
 ===============
 .. _troubleshooting_ocr_language_files_missing:
 Consumer warns ``OCR for XX failed``
 ------------------------------------
 If you find the OCR accuracy to be too low, and/or the document consumer warns that ``OCR for
 XX failed, but we're going to stick with what we've got since FORGIVING_OCR is enabled``, then you
 might need to install the `Tesseract language files
 <http://packages.ubuntu.com/search?keywords=tesseract-ocr>`_ matching your documents' languages.
 As an example, if your documents are written in Spanish you may need to run::
    apt-get install -y tesseract-ocr-spa
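Before installing anything, it may help to check which language files Tesseract already has. The following is a minimal sketch, not part of the project: the helper names ``parse_list_langs`` and ``installed_languages`` are hypothetical, and it assumes the ``tesseract`` binary is on your ``PATH`` and supports ``--list-langs``.

```python
import shutil
import subprocess


def parse_list_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Skip the header line, which begins with "List of available languages".
    return {line for line in lines if not line.lower().startswith("list of")}


def installed_languages():
    """Return the set of installed Tesseract language codes (empty if tesseract is absent)."""
    if shutil.which("tesseract") is None:
        return set()
    # Some Tesseract versions print the list to stderr, so merge both streams.
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_list_langs(result.stdout)
```

If ``"spa" in installed_languages()`` is false on a machine with Spanish documents, the ``tesseract-ocr-spa`` package above is probably the missing piece.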