PDF Image OCR
=============

Reads PDF files from a given path, extracts the images they contain, and converts those images to text with Tesseract.

Input
--------------
It reads a single PDF file or a directory containing PDF files.
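
A minimal sketch of how a file-or-directory path could be resolved, written in Python for illustration only. The helper name ``list_pdf_files`` is hypothetical, and the matching rule (non-recursive, ``.pdf`` extension, case-insensitive) is an assumption; the node's actual rules are not documented here.

.. code-block:: python

    from pathlib import Path


    def list_pdf_files(path):
        """Return the PDF files that a file-or-directory path refers to."""
        p = Path(path)
        if p.is_dir():
            # Assumption: only direct children are scanned, and the
            # extension is matched case-insensitively.
            return sorted(
                f for f in p.iterdir()
                if f.is_file() and f.suffix.lower() == ".pdf"
            )
        if p.is_file() and p.suffix.lower() == ".pdf":
            return [p]
        # Anything else (missing path, non-PDF file) yields no input.
        return []

Returning an empty list for a missing or non-PDF path is a design choice of this sketch; a real reader might raise an error instead.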

Output
--------------
It creates a DataFrame containing the file name and the OCR text of the PDF files read, and sends it to its output.

Type
--------- 

dataset

Class
--------- 

fire.nodes.dataset.NodeDatasetPDFImageOCR

Fields
--------- 

.. list-table::
      :widths: 10 5 10
      :header-rows: 1

      * - Name
        - Title
        - Description
      * - path
        - Path of the PDF files
        - Path to a PDF file or to a directory containing PDF files
      * - fileNameCol
        - File Name Column
        - Name of the column in the output DataFrame that holds the file name
      * - outputCol
        - Column Name which contains the result of OCR
        - Name of the column in the output DataFrame that holds the OCR text
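
The way these fields shape the output can be sketched in Python. This is an illustration, not the node's implementation. It assumes PyMuPDF (``fitz``), Pillow, and ``pytesseract`` are installed along with the Tesseract binary. All function names and the default column names are hypothetical, and emitting one row per PDF (rather than one per image) is also an assumption.

.. code-block:: python

    import io


    def extract_images(pdf_path):
        """Yield the raw bytes of every image embedded in a PDF."""
        import fitz  # PyMuPDF; assumed to be installed

        with fitz.open(pdf_path) as doc:
            for page in doc:
                for img in page.get_images(full=True):
                    xref = img[0]  # first tuple item is the image xref
                    yield doc.extract_image(xref)["image"]


    def ocr_image(image_bytes):
        """Run Tesseract on one image and return the recognized text."""
        import pytesseract  # assumed to be installed
        from PIL import Image

        return pytesseract.image_to_string(Image.open(io.BytesIO(image_bytes)))


    def build_rows(pdf_paths, file_name_col="fileName", output_col="ocrText",
                   extract=extract_images, ocr=ocr_image):
        """Build one row per PDF: its file name and the OCR text of its images."""
        rows = []
        for pdf_path in pdf_paths:
            # Assumption: text from all images in a PDF is joined with newlines.
            text = "\n".join(ocr(img) for img in extract(pdf_path))
            rows.append({file_name_col: str(pdf_path), output_col: text})
        return rows

The resulting list of dictionaries can be passed to ``pandas.DataFrame(rows)`` to obtain a table with the two configured columns. The extraction and OCR steps are passed in as callables so that either can be replaced without touching the row-building logic.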