OCR

Performs Optical Character Recognition (OCR) using the Tesseract library. Make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your ‘tessdata’ directory. You can download the tessdata directory with: git clone https://github.com/tesseract-ocr/tessdata.git
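
Before running the node, you can check that the environment matches the setup described above. The Python sketch below is illustrative and not part of the node. The function name `find_tessdata_dir` is hypothetical. The sketch assumes the convention stated on this page, where TESSDATA_PREFIX names the parent of `tessdata`. It also assumes that a language file such as `eng.traineddata` from the cloned repository should be present.

```python
import os
from pathlib import Path


def find_tessdata_dir(lang: str = "eng") -> Path:
    """Hypothetical helper: return the tessdata directory implied by TESSDATA_PREFIX.

    Assumes the convention described on this page: TESSDATA_PREFIX points to
    the *parent* of the 'tessdata' directory. Raises RuntimeError otherwise.
    """
    prefix = os.environ.get("TESSDATA_PREFIX")
    if not prefix:
        raise RuntimeError("TESSDATA_PREFIX is not set")

    tessdata = Path(prefix) / "tessdata"
    if not tessdata.is_dir():
        raise RuntimeError(f"no 'tessdata' directory under {prefix}")

    # Each language model in the tessdata repository is a <lang>.traineddata file.
    model = tessdata / f"{lang}.traineddata"
    if not model.is_file():
        raise RuntimeError(f"missing language file {model}")

    return tessdata
```

For example, calling `find_tessdata_dir()` after exporting TESSDATA_PREFIX returns the path to `tessdata`. If the variable is unset, or the directory or language file is missing, it raises an error that names the problem.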

Type

transform

Class

fire.nodes.ocr.NodeOCRTesseract

Fields

| Name | Title | Description |
|---|---|---|
| imageNameCol | Image Name Column | input column containing the image name |
| imageCol | Image Column | input column containing the image data |
| outputCol | Output OCR Column | output column for the extracted text |

Details

Model OCR Extract Node

This node extracts text from images using an OCR (Optical Character Recognition) model. It reads the image in the column selected by imageCol and writes the extracted text, as a string, to the column named by outputCol.
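
Conceptually, the node applies a per-row transform: read the image column, run OCR on it, and add the result as a new column. The Python sketch below illustrates that idea and is not the node's implementation. The function `apply_ocr` and the list-of-dicts row format are hypothetical. The `ocr` argument stands in for the Tesseract call, so any function that maps image bytes to a string can be plugged in.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def apply_ocr(rows: Iterable[Row], image_col: str, output_col: str,
              ocr: Callable[[bytes], str]) -> List[Row]:
    """Hypothetical sketch: return copies of rows with OCR text under output_col.

    `ocr` stands in for the Tesseract engine. Input rows are not modified.
    """
    result = []
    for row in rows:
        out = dict(row)  # shallow copy keeps the other columns unchanged
        out[output_col] = ocr(row[image_col])  # extracted text as a string
        result.append(out)
    return result
```

Suppose the pytesseract and Pillow packages are installed. In that case, one possible `ocr` argument is `lambda b: pytesseract.image_to_string(Image.open(io.BytesIO(b)))`, which calls the Tesseract engine directly.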

Examples

Model OCR Extract Node Example

Given the following dataset:

| ImageColumn |
|---|
| [image data] |

If you configure the node with imageCol set to ImageColumn and outputCol set to OutputOCRColumn, the output looks like this:

| ImageColumn | OutputOCRColumn |
|---|---|
| [image data] | “This is the extracted text from the image.” |