npTess plugin

About this plugin

This plugin allows you to use the Tesseract OCR (Optical Character Recognition) engine in your publications. It exposes several page segmentation modes and other options so you can get the best possible results from BMP, JPG, PNG, GIF and TIFF images.

You can create one or more Tess objects to do the work, and assign subroutines to several events to monitor and control the text recognition process. The plugin includes all the files that must be deployed along with your publication.

Third-party

This plugin would not be possible without the help of these people:

Thanks a lot!

Plugin actions index

npTessCreate

Create a new Tess object instance. Specify the path of the Tesseract DLL (included with the plugin) and the path of the Tesseract "tessdata" directory. The result variable stores a numeric identifier that you must pass to the other actions of this plugin.

npTessDestroy

Destroy a previously created Tess object instance. The result variable stores "True" on success or "False" if an error occurs; in the latter case the [LastError] variable contains information about the error.

npTessDestroyAll

Destroy all previously created instances of Tess objects.

npTessGetText

Extract the text from an image file using the Tesseract engine. Specify the path of the image file whose text you want to extract; the image can be in any of these formats: BMP, JPG, GIF, PNG or TIFF. The Page segment mode argument can be one of the following values:

Optionally, you can specify a language code for Tesseract to use. The plugin ships several well-tested languages for the included version of Tesseract: in the "tessdata" directory you will find files such as "eng.traineddata", "spa.traineddata" and "ita.traineddata". The language code is the part of the file name before the dot, here "eng", "spa" and "ita".

If you don't specify a language code, the plugin uses the default one: "eng" (English). Language files not included with the plugin can cause problems when Tesseract runs: use only language files made for the version of Tesseract included with the plugin (see Tesseract version), or better still, the language files installed with the plugin.
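
The file-naming rule above can be turned into a small helper, for example to fill a language list or to validate a code before calling npTessGetText. This is a minimal Python sketch, not part of the plugin (which is driven from NeoBook script); the function names are hypothetical, and skipping "osd" is an assumption based on that file holding orientation/script detection data rather than a recognition language.

```python
import os

DEFAULT_LANGUAGE = "eng"  # default the plugin uses when no code is given


def language_codes(filenames):
    """Return the sorted language codes found among tessdata file names.

    The code is the part of a "*.traineddata" name before the first dot,
    e.g. "spa.traineddata" -> "spa". "osd.traineddata" holds orientation and
    script detection data, not a recognition language, so it is skipped.
    """
    codes = {
        name.split(".", 1)[0]
        for name in filenames
        if name.endswith(".traineddata")
    }
    codes.discard("osd")
    return sorted(codes)


def installed_languages(tessdata_dir):
    """List the language codes available in an actual tessdata directory."""
    return language_codes(os.listdir(tessdata_dir))


def resolve_language(requested, available):
    """Return the code to pass to npTessGetText, defaulting to "eng".

    Raises ValueError for a code with no matching .traineddata file,
    instead of letting Tesseract fail later.
    """
    code = requested or DEFAULT_LANGUAGE
    if code not in available:
        raise ValueError(f"no {code}.traineddata file in tessdata")
    return code
```

For the files shipped with the plugin, `language_codes` yields codes such as "eng", "ita" and "spa", and `resolve_language("", ...)` falls back to "eng" just as the plugin does.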

Optionally, you can specify the path of a text file containing additional Tesseract configuration. There is not much documentation about these options on the Tesseract website, and, due to the nature of this plugin, some options may not be usable. For information about the available options, please refer directly to the Tesseract website.
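
Tesseract config files are plain text with one "variable value" pair per line. The Python sketch below writes such a file, whose path you could then pass to npTessGetText. The helper names are hypothetical, and `tessedit_char_whitelist` is used only as an illustration of a Tesseract 3.x variable; as noted above, whether a given variable takes effect through this plugin is not documented, so test each one.

```python
from pathlib import Path


def format_tess_config(variables):
    """Render Tesseract variables as config text, one "name value" per line."""
    return "".join(f"{name} {value}\n" for name, value in variables.items())


def write_tess_config(path, variables):
    """Write the config file whose path is then given to npTessGetText."""
    Path(path).write_text(format_tess_config(variables), encoding="utf-8")
```

For example, `write_tess_config("digits.cfg", {"tessedit_char_whitelist": "0123456789"})` produces a one-line file intended to restrict recognition to digits.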

Finally, the result variable stores the text recognized and returned by Tesseract OCR (*). Note that this string can be empty if Tesseract cannot recognize any text; in that case, try a different Page segment mode argument. The result variable can also store "False" if an error occurs, in which case the [LastError] variable contains information about the error.

* You can also retrieve the recognized text through the corresponding plugin event: npOnTessEnd.
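
Because the result variable can hold recognized text, an empty string, or "False", it helps to classify it explicitly. This Python sketch models that decision outside NeoBook; the names are hypothetical, and it assumes that [LastError] is empty (or cleared beforehand) when the call succeeds, so an image that literally reads "False" is not mistaken for a failure.

```python
from dataclasses import dataclass


@dataclass
class OcrOutcome:
    status: str      # "text", "empty" or "error"
    text: str = ""
    error: str = ""


def interpret_get_text(result, last_error):
    """Classify the value npTessGetText left in its result variable.

    Assumption: an error is reported as "False" together with a non-empty
    [LastError]; requiring both avoids treating recognized text "False"
    as a failure.
    """
    if result == "False" and last_error:
        return OcrOutcome("error", error=last_error)
    if not result.strip():
        # Nothing recognized: worth retrying with another Page segment mode.
        return OcrOutcome("empty")
    return OcrOutcome("text", text=result)
```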

npTessCancel

Cancel the text extraction operation of the specified Tess object. When the task is cancelled, the corresponding event is fired: npOnTessCancel. The result variable stores "True" on success or "False" if an error occurs; in the latter case the [LastError] variable contains information about the error.

npOnTessEnd

Set a publication subroutine to be executed when the end event of the specified Tess object is fired. This event is fired when the text recognition task finishes. The Instance ID variable stores the ID of the Tess object that raised the event, so you can share the same subroutine among several Tess object instances. The Text variable stores the recognized text returned by Tesseract OCR. The result variable stores "True" on success or "False" if an error occurs; in the latter case the [LastError] variable contains information about the error.
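
Sharing one subroutine among instances works because the Instance ID tells it which job finished. The Python sketch below models that bookkeeping; the class and method names are hypothetical, and in a publication the same idea would be kept in NeoBook variables keyed by instance ID.

```python
class SharedEndHandler:
    """One end-event handler shared by several Tess instances.

    Models a single subroutine assigned with npOnTessEnd to many instances:
    the Instance ID says which image the recognized text belongs to.
    """

    def __init__(self):
        self.pending = {}  # instance ID -> image path still being processed
        self.results = {}  # image path -> recognized text

    def start(self, instance_id, image_path):
        """Record which image an instance was asked to process."""
        self.pending[instance_id] = image_path

    def on_end(self, instance_id, text):
        """Handle the end event raised by instance_id."""
        image_path = self.pending.pop(instance_id)
        self.results[image_path] = text
```

Events may arrive in any order; keying by instance ID keeps each text matched to its image.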

npOnTessError

Set a publication subroutine to be executed when the error event of the specified Tess object is fired. The Instance ID variable stores the ID of the Tess object that raised the event, so you can share the same subroutine among several Tess object instances. The Error variable stores the Tesseract error code that caused the error. The result variable stores "True" on success or "False" if an error occurs; in the latter case the [LastError] variable contains information about the error.

npOnTessCancel

Set a publication subroutine to be executed when the cancel event of the specified Tess object is fired. This event is fired when the text recognition task is cancelled with the npTessCancel action.

The Instance ID variable stores the ID of the Tess object that raised the event, so you can share the same subroutine among several Tess object instances. The result variable stores "True" on success or "False" if an error occurs; in the latter case the [LastError] variable contains information about the error.

npOnTessProgress

Set a publication subroutine to be executed when the progress event of the specified Tess object is fired. This event is fired while a text recognition task is running. It does not report how much of the task is done; it only tells you that a task is still running. The Instance ID variable stores the ID of the Tess object that raised the event, so you can share the same subroutine among several Tess object instances. The result variable stores "True" on success or "False" if an error occurs; in the latter case the [LastError] variable contains information about the error.

Plugin deployment

There are two important things to remember when you deploy a publication that uses this plugin. First, since this plugin (and therefore your publication) uses the Tesseract OCR engine, which is distributed under the Apache License, you need to comply with the Apache License terms, which require you to:

  1. Include a copy of the license in any redistribution you may make that includes Apache software.
  2. Provide clear attribution to The Apache Software Foundation for any distributions that include Apache software.

So remember to include a copy of the Apache License file (as the license requires, the plugin includes one in the plugin samples directory, which you can reuse) and to provide an attribution to the Apache Software Foundation, as done in this help file (see below) and in the plugin's About dialog. Can you use this plugin and Tesseract OCR in commercial applications? Yes (as this plugin does), provided you comply with the above terms.

Second, you need to distribute the Tesseract DLL with your publication; the plugin installer also places it in the plugin samples directory. You must also distribute the Tesseract "tessdata" directory, which is included with the plugin and contains the language files and other files used by Tesseract.

Within "tessdata", only one language is always required: English, which corresponds to the file "eng.traineddata". The other language files included with the plugin are optional and need to be distributed only if you use those languages. The file "osd.traineddata" is also optional and is needed only if you use the "tpsmAutomaticWithOSD" Page segment mode value when running Tesseract.

Finally, remember that the plugin lets you place the Tesseract DLL and the "tessdata" directory wherever you want: in your publication directory or in any other location. Just specify the appropriate paths when you create Tess object instances with npTessCreate.
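
The deployment rules above amount to a short checklist that can be verified before release. This Python sketch is a hypothetical pre-release check run outside NeoBook, not a plugin feature: pass the same DLL and "tessdata" paths you give to npTessCreate, plus the path of your Apache License copy. The `exists` parameter is only there so the check can be tested without real files.

```python
import os


def missing_deployment_files(dll_path, tessdata_dir, license_path,
                             languages=(), uses_osd=False,
                             exists=os.path.isfile):
    """Return the required deployment files that are missing.

    Required: the Tesseract DLL, a copy of the Apache License,
    tessdata/eng.traineddata (always), one <code>.traineddata for every
    other language used, and tessdata/osd.traineddata only when the
    publication uses tpsmAutomaticWithOSD.
    """
    required = [dll_path, license_path,
                os.path.join(tessdata_dir, "eng.traineddata")]
    for code in languages:
        if code != "eng":
            required.append(os.path.join(tessdata_dir, f"{code}.traineddata"))
    if uses_osd:
        required.append(os.path.join(tessdata_dir, "osd.traineddata"))
    # dict.fromkeys drops duplicates while keeping the original order.
    return [path for path in dict.fromkeys(required) if not exists(path)]
```

An empty list means every file the publication needs is in place.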

Tesseract version

For your information, this plugin works with Tesseract version 3.01, the latest version available when the plugin was published.

Action errors subroutine

All NeoPlugins handle errors the same way NeoBook does: when a plugin action fails, the [LastError] variable stores information about the error, so you can check this variable after executing an action.

All NeoPlugins also offer a more advanced way to handle action errors. You can define a subroutine named OnNeoPluginActionError, which is executed whenever an action error occurs, and use the following variables inside it:

Note that this error-handling subroutine is shared by all NeoPlugins: you don't need a separate subroutine for each plugin in your publication, because every NeoPlugin recognizes and uses the same subroutine automatically. Here is a sample of this subroutine:

:OnNeoPluginActionError
  AlertBox "NeoPlugin Error" "Error [LastError] in plugin: [PluginName]"
Return

Using this NeoPlugins error-handling subroutine is completely optional: you can keep using the [LastError] variable as usual, or even use both methods at the same time.
