OCR¶
Performs Optical Character Recognition (OCR) using the Tesseract library. Make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your ‘tessdata’ directory. You can download the tessdata directory with: git clone https://github.com/tesseract-ocr/tessdata.git
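Before running the node, it can help to confirm that the environment variable points where Tesseract expects. The following is a minimal sketch, not part of the node itself: the helper name check_tessdata_prefix is hypothetical, and it only checks that a ‘tessdata’ directory exists directly under the path stored in TESSDATA_PREFIX.

```python
import os
from pathlib import Path


def check_tessdata_prefix(env=None):
    """Return the tessdata directory implied by TESSDATA_PREFIX, or raise.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        raise RuntimeError("TESSDATA_PREFIX is not set")
    # TESSDATA_PREFIX should be the parent of the 'tessdata' directory.
    tessdata = Path(prefix) / "tessdata"
    if not tessdata.is_dir():
        raise RuntimeError(f"No 'tessdata' directory found under {prefix}")
    return tessdata
```

Calling check_tessdata_prefix() with no arguments reads the current process environment; passing a dictionary makes the check easy to try without changing your shell settings.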
Type¶
transform
Class¶
fire.nodes.ocr.NodeOCRTesseract
Fields¶
| Name | Title | Description |
|---|---|---|
| imageNameCol | Image Name Column | input image column name |
| imageCol | Image Column | input image column name |
| outputCol | Output OCR Column | output column name |
Details¶
Model OCR Extract Node¶
This node extracts text from images using an OCR (Optical Character Recognition) model. It takes an image as input and outputs the extracted text as a string.
Examples¶
Model OCR Extract Node Example¶
Given the following dataset:
| ImageColumn |
|---|
| [image data] |
If you configure the Model OCR Extract node to extract text from the ImageColumn, the output would look like this:
| ImageColumn | OutputOCRColumn |
|---|---|
| [image data] | “This is the extracted text from the image.” |
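The shape of this transform can be sketched in a few lines of Python. This is an illustration of the row-wise behavior only, not the node's actual implementation: add_ocr_column and fake_ocr are hypothetical names, and the OCR step is passed in as a plain function so the sketch runs without Tesseract installed. In practice that function would call an OCR engine.

```python
def add_ocr_column(rows, image_col, output_col, ocr_fn):
    """Return new rows with ocr_fn(row[image_col]) stored under output_col.

    Hypothetical helper; the input rows are left unmodified.
    """
    return [{**row, output_col: ocr_fn(row[image_col])} for row in rows]


def fake_ocr(image_bytes):
    # Stand-in for a real OCR call, for illustration only.
    return "This is the extracted text from the image."


rows = [{"ImageColumn": b"[image data]"}]
result = add_ocr_column(rows, "ImageColumn", "OutputOCRColumn", fake_ocr)
print(result[0]["OutputOCRColumn"])
```

Each output row keeps the original image column and gains one new column holding the extracted text, which mirrors the table above.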