Project: Capture Data from Scanned Forms

This scenario demonstrates how to capture data from scanned forms saved as image files using OCR (optical character recognition) technology.

For information about the OCR Picture object that will be used throughout this project, see Advanced Picture Object.

The procedure presented in this tutorial is appropriate only for forms that have been scanned as images, so that the data entered on the form is not available digitally. For a project that demonstrates capturing data from forms that were filled in electronically using a PDF reader, see Project: Write Data from PDF Forms into Excel.

Scenario Overview

In this scenario, a workflow is created to read data from PDF forms that were filled in by hand, then scanned and saved as image files (.jpg).

The workflow will capture the form data from all scanned forms located in a specific folder. It will then display the captured data in a callout.

Download Project Files

  1. Download the project files here.

  2. Unzip the downloaded file.

  3. Download the sample scanned forms here, unzip, and save them to c:/temp/forms.

Prerequisites

This project requires that NICE Advanced OCR be installed. See Install NICE Advanced OCR.

Recommended Implementation Approach

The solution demonstrated follows the steps below:

  1. Examine different methods of capturing text from forms saved as images.

  2. Load a template form into an Advanced Picture object using the Load Image from File method.

  3. Analyze the text returned when capturing all text from a form using the Get Text from Image method. Identify the static and dynamic texts around each field to be captured.

  4. Create a user-defined function to return the text between specified static and dynamic texts using regular expression functions.

  5. Create a list of all images in a folder using the Get File Information from Folder function from the File built-in service.

  6. Create a user-defined type called Contact to store all details from a single form.

  7. Cycle through those image files. For each file:

    1. Load the image into an Advanced Picture object using the Load Image from File method.

    2. Capture all text from the image using the Get Text from Image method.

    3. Remove unwanted codes from the text using the Replace Subtext function from the Text built-in service.

    4. Read all required fields from the form using the user-defined function created earlier.

    5. Populate a list variable of type Contact with all the data retrieved.

  8. Build a callout containing a dynamic table linked to the list variable of type Contact, to display the data retrieved from all forms.
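
The text-processing part of the steps above (cleaning the captured text, reading each field from between known static labels with a regular expression, and collecting the results in a list of Contact records) can be sketched outside the product in Python. This is only an illustrative sketch, not the product's implementation: the field labels ("First Name:", "Last Name:", "Email:", "Signature"), the Contact fields, and all function names are hypothetical, and the OCR step itself is assumed to have already produced the raw text of each form.

```python
import re
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Contact:
    """Details captured from a single form (field names are hypothetical)."""
    first_name: str
    last_name: str
    email: str


def text_between(text: str, before: str, after: str) -> str:
    """Return the text found between two static labels, or "" if not found."""
    pattern = re.escape(before) + r"\s*(.*?)\s*" + re.escape(after)
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1) if match else ""


def clean_ocr_text(raw: str) -> str:
    """Collapse line breaks and repeated whitespace into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


def parse_contact(raw: str) -> Contact:
    """Read every required field from the captured text of one form."""
    text = clean_ocr_text(raw)
    return Contact(
        first_name=text_between(text, "First Name:", "Last Name:"),
        last_name=text_between(text, "Last Name:", "Email:"),
        email=text_between(text, "Email:", "Signature"),
    )


def list_form_images(folder: str) -> list[Path]:
    """Return the .jpg files in a folder, sorted by name."""
    return sorted(Path(folder).glob("*.jpg"))


def process_forms(ocr_texts: list[str]) -> list[Contact]:
    """Build the list of contacts from the captured text of each form."""
    return [parse_contact(raw) for raw in ocr_texts]


# Hypothetical captured text, standing in for the output of the OCR step.
sample_ocr_text = (
    "Contact Form\r\n"
    "First Name: Ada\r\n"
    "Last Name: Lovelace\r\n"
    "Email: ada@example.com\r\n"
    "Signature"
)

contacts = process_forms([sample_ocr_text])
```

Escaping the labels with re.escape keeps punctuation in a label from being interpreted as regular expression syntax, and the lazy `(.*?)` group stops at the first occurrence of the closing label rather than the last.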

The procedure is demonstrated in the following video.

The video is divided as follows:

Time     Section
0:00     Start
0:32     Analysis & Alternatives
2:23     Analyze the Form
10:32    Create a User-Defined Function for Regex Searches
15:20    Prepare Variables
16:48    Build the Solution Workflow
22:50    Build a Callout