
The program appears to support a certain set of languages:

Supported OCR languages

  • Chinese - Simplified
  • English
  • French
  • German
  • Italian
  • Spanish

How do I add other languages, e.g. Chinese Traditional?


1 Answer


You can add more languages by downloading the Tesseract OCR training data for the language you need:

https://github.com/tesseract-ocr/tessdata

Download the file for the language you're interested in (e.g. the Traditional Chinese file is called 'chi_tra.traineddata') and copy it to the folder:

C:\Program Files\Mythicsoft\FileLocator Pro\ocr\training_data
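If you'd rather script the download-and-copy step, here is a minimal Python sketch. It assumes that the training files are served from the `main` branch of the Tesseract repositories and that the Traditional Chinese file is named `chi_tra.traineddata`, as it is in those repositories. The helper names (`traineddata_url`, `target_path`, `install_language`) are made up for illustration and are not part of FileLocator Pro or Tesseract. Writing under `C:\Program Files` normally needs an elevated (administrator) prompt.

```python
from pathlib import PureWindowsPath
from urllib.request import urlretrieve

# Folder named in the answer above; adjust if the program is installed elsewhere.
TRAINING_DIR = PureWindowsPath(
    r"C:\Program Files\Mythicsoft\FileLocator Pro\ocr\training_data"
)


def traineddata_url(lang: str, fast: bool = False) -> str:
    """Build the download URL for a language code such as 'chi_tra'.

    Assumption: the files live on the 'main' branch of each repository.
    """
    repo = "tessdata_fast" if fast else "tessdata"
    return f"https://github.com/tesseract-ocr/{repo}/raw/main/{lang}.traineddata"


def target_path(lang: str, folder=TRAINING_DIR) -> PureWindowsPath:
    """Return where the downloaded file should end up."""
    return PureWindowsPath(folder) / f"{lang}.traineddata"


def install_language(lang: str, fast: bool = False, folder=TRAINING_DIR) -> PureWindowsPath:
    """Download one .traineddata file straight into the training_data folder."""
    dest = target_path(lang, folder)
    urlretrieve(traineddata_url(lang, fast), str(dest))
    return dest


# Example (run from an elevated prompt on the machine with FileLocator Pro):
# install_language("chi_tra")             # regular training data
# install_language("chi_tra", fast=True)  # smaller 'fast' training data
```

Nothing is downloaded until you call `install_language` yourself; the other two functions only build strings, so you can print the URL first and check it in a browser.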

If you then re-open the OCR Settings you should now see the new language available.

Chinese - Traditional now available as option


Additional information: The default list of languages is kept short because the training data files are large and would make the installer unnecessarily big. For example, the regular Traditional Chinese file alone is 50MB in size.

Smaller training data files are available in the 'fast' data set, which is the same set of files used in the default installation:

https://github.com/tesseract-ocr/tessdata_fast
