Informatica OCR plugin is a PowerCenter-based tool that leverages the image processing capabilities of ABBYY FineReader and the parsing capabilities of Informatica DT Studio to convert and process image files.

The plugin consists of a simple PowerCenter workflow. The workflow contains a mapping that triggers a DT service. The DT code uses Java to invoke the ABBYY engine on the server, which performs the initial conversion of the source files from image to text. The text is kept in memory, where the DT service can parse it according to the business requirements and return the relevant data to PowerCenter.
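The step in which Java invokes the engine and keeps the resulting text in memory could look roughly like the sketch below. It is an illustration only, not the plugin's actual code: the class and method names are hypothetical, and the command line is supplied by the caller because the real ABBYY FineReader Engine CLI syntax is not described here. The sketch also assumes the engine can write the recognized text to standard output.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: runs an external OCR command and returns its
// standard output as an in-memory string (no intermediate text file).
public class OcrInvoker {

    // 'command' is the full command line, e.g. the OCR executable followed by
    // its arguments. The exact ABBYY CLI arguments are NOT assumed here.
    public static String runAndCapture(List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Pass stderr through to the parent so it cannot fill up and block
        // the child process, and so diagnostics stay out of the OCR text.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);

        Process process = builder.start();
        String text;
        try (InputStream out = process.getInputStream()) {
            // Read everything the command prints; the text stays in memory.
            text = new String(out.readAllBytes(), StandardCharsets.UTF_8);
        }

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("OCR command failed with exit code " + exitCode);
        }
        return text;
    }
}
```

The returned string can then be handed to whatever parsing logic follows, which mirrors the way the DT service parses the in-memory text before returning data to PowerCenter.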
Features
- Can read and parse scanned text images.
- The input can be a file list of the image files.
- The text is stored in memory, so other business transformations can be applied on the fly.
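As a rough illustration of file-list input, the sketch below reads a list of image paths. It assumes the file list is a plain text file with one image path per line, and the class and method names are hypothetical. It is not taken from the plugin itself.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: turns a file list (assumed format: one image path
// per line) into a clean list of paths to process.
public class FileListReader {

    // Trims each line and drops blank ones.
    public static List<String> parse(List<String> lines) {
        return lines.stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }

    // Reads the file list from disk and parses it.
    public static List<String> read(Path fileList) throws IOException {
        return parse(Files.readAllLines(fileList));
    }
}
```

Each returned path would then be converted in turn, with the recognized text kept in memory for further transformation.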
Current Version: 1.1
Release Date: April 27, 2012
System Requirements:
- Operating System: Red Hat Enterprise Linux 5.6 or later, or SUSE Linux Enterprise Server 10 or later.
- RAM: 2GB or more (recommended)
- Disk: 10 MB of free hard disk space.
- PowerCenter 8.5x or later (required for the PowerCenter ETL job).
- Data Transformation Studio
- ABBYY FineReader Engine 9.0 CLI for Linux.