Using Form Recognizer To Create A Document Management System

Form Recognizer is one of Microsoft’s Azure Cognitive Services. It applies machine learning to extract text, key/value pairs, and table data from typed or handwritten documents.

Functions of Form Recognizer:

  • It analyzes many kinds of documents, including phone-captured images, scanned documents, and digital PDFs, whether typed or handwritten, and returns a structured data representation of each document.
  • It offers prebuilt models that extract key information from receipts, invoices, and identity documents. Beyond the prebuilt models, we can train custom models that extract information from the specific forms we care about, starting from as few as five sample documents.
  • Calling Form Recognizer through the REST API or SDK saves time and reduces the errors that come with manual data entry, and the structured output makes further analysis of the data easier.

With composed models, we can combine multiple custom models, each corresponding to a specific type of form, under a single model ID. When we submit a document to the composed model, the Form Recognizer service first performs a classification step to determine the form type and then routes the document to the corresponding custom model.
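
As a rough illustration of that routing, the sketch below reads back which custom model a composed model selected. It assumes the v2.1 analyze response shape, in which each entry of `documentResults` carries `docType`, `modelId`, and `docTypeConfidence`. The sample response, the model ID, and the 0.5 threshold are made up for illustration.

```python
from typing import Any, Dict, List

# Hypothetical analyze result from a composed model (shape assumed from the v2.1 REST API).
SAMPLE_RESULT: Dict[str, Any] = {
    "analyzeResult": {
        "documentResults": [
            {
                "docType": "custom:invoice-model",
                "modelId": "11111111-aaaa-bbbb-cccc-222222222222",
                "docTypeConfidence": 0.93,
                "fields": {"Total": {"text": "120.00"}},
            }
        ]
    }
}


def matched_models(result: Dict[str, Any], min_confidence: float = 0.5) -> List[Dict[str, Any]]:
    """Return the model chosen for each document, with a flag for low-confidence matches."""
    matches = []
    for doc in result.get("analyzeResult", {}).get("documentResults", []):
        confidence = doc.get("docTypeConfidence", 0.0)
        matches.append(
            {
                "doc_type": doc.get("docType"),
                "model_id": doc.get("modelId"),
                "confidence": confidence,
                # The threshold is an arbitrary choice for this sketch, not a service default.
                "trusted": confidence >= min_confidence,
            }
        )
    return matches


for match in matched_models(SAMPLE_RESULT):
    print(match["doc_type"], match["model_id"], match["trusted"])
```

Flagging low-confidence matches is one way an application could decide when a submitted form needs manual review instead of automatic indexing.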

Mesh 3.0, an employee experience platform, uses Form Recognizer to provide a Document Management System to which users can submit any type of form. The application categorizes the document, extracts the associated key information from the form, and lets users search for that information across documents.

Mesh Document Management System V2
Mesh Document Management System Upload V2

Form Recognizer Features:

Layout API:

The Layout API is used as part of the custom models to detect and identify text, tables, selection marks, and structural information in documents (PDF, TIFF) and images (JPG, PNG, BMP).

The JSON returned by the Layout API contains the following nodes:

  • “readResults” – Contains all the text, with the bounding box of each item on the page.
  • “selectionMarks” – Lists every selection mark (checkbox or radio button) and whether it is “selected” or “unselected”.
  • “pageResults” – Contains the extracted tables.
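
The nodes above can be read with a few lines of Python. This is a minimal sketch that assumes the v2.1 response shape, where `selectionMarks` is nested inside each page of `readResults` and each table in `pageResults` lists its cells with `rowIndex` and `columnIndex`. The sample payload is invented for illustration.

```python
from typing import Any, Dict, List

# Hypothetical, heavily trimmed Layout response (v2.1 shape assumed).
SAMPLE_LAYOUT: Dict[str, Any] = {
    "analyzeResult": {
        "readResults": [
            {
                "page": 1,
                "lines": [
                    {"text": "Leave Request", "boundingBox": [10, 10, 200, 10, 200, 40, 10, 40]},
                    {"text": "Name: Jane Doe", "boundingBox": [10, 50, 200, 50, 200, 80, 10, 80]},
                ],
                "selectionMarks": [
                    {"state": "selected", "boundingBox": [10, 90, 30, 90, 30, 110, 10, 110]},
                    {"state": "unselected", "boundingBox": [40, 90, 60, 90, 60, 110, 40, 110]},
                ],
            }
        ],
        "pageResults": [
            {
                "page": 1,
                "tables": [
                    {
                        "rows": 2,
                        "columns": 2,
                        "cells": [
                            {"rowIndex": 0, "columnIndex": 0, "text": "Day"},
                            {"rowIndex": 0, "columnIndex": 1, "text": "Hours"},
                            {"rowIndex": 1, "columnIndex": 0, "text": "Mon"},
                            {"rowIndex": 1, "columnIndex": 1, "text": "8"},
                        ],
                    }
                ],
            }
        ],
    }
}


def page_text(result: Dict[str, Any]) -> List[str]:
    """Collect the text of every line from readResults, in reading order."""
    pages = result.get("analyzeResult", {}).get("readResults", [])
    return [line["text"] for page in pages for line in page.get("lines", [])]


def selection_states(result: Dict[str, Any]) -> List[str]:
    """Collect the state ("selected" or "unselected") of every selection mark."""
    pages = result.get("analyzeResult", {}).get("readResults", [])
    return [mark["state"] for page in pages for mark in page.get("selectionMarks", [])]


def tables_as_grids(result: Dict[str, Any]) -> List[List[List[str]]]:
    """Rebuild each table in pageResults as a list of rows of cell text."""
    grids = []
    for page in result.get("analyzeResult", {}).get("pageResults", []):
        for table in page.get("tables", []):
            grid = [[""] * table["columns"] for _ in range(table["rows"])]
            for cell in table["cells"]:
                grid[cell["rowIndex"]][cell["columnIndex"]] = cell["text"]
            grids.append(grid)
    return grids


print(page_text(SAMPLE_LAYOUT))
print(selection_states(SAMPLE_LAYOUT))
print(tables_as_grids(SAMPLE_LAYOUT))
```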

Prebuilt models:

The prebuilt receipt model supports sales receipts from Australia, Canada, Great Britain, India, and the United States. Other prebuilt models cover business cards, invoices in various formats, and identity documents, extracting key information from passports worldwide and US driver’s licenses.

Let’s look at an example in which a prebuilt model extracts information from a US driver’s license.

1. Use the open-source labelling tool, part of the Form OCR Test Toolset (FOTT)

Form Recognizer 2.1 Ga Services

2. To work with prebuilt models, click on “Use prebuilt model to get data”

Use Prebuilt Model To Get Data

3. In the Azure portal, open the Form Recognizer resource and copy the service endpoint and API key from the Keys and Endpoint tab.

Keys And Endpoint Tab

4. Provide the Form Recognizer service endpoint, the API key, and the type of form to analyze (ID in our case), then choose the file to analyze.

Chose The File For Analysis Mesh

5. Click Run analysis. The extracted data is returned as key/value pairs.

Form Of Key Value Pairs Mesh
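
The same analysis can also be requested through the REST API instead of the labelling tool. The sketch below only builds the request and flattens a response into key/value pairs; nothing is sent over the network. It assumes the v2.1 route `/formrecognizer/v2.1/prebuilt/idDocument/analyze` and the `Ocp-Apim-Subscription-Key` header. The endpoint, key, and sample response values are placeholders invented for illustration.

```python
import urllib.request
from typing import Any, Dict, Optional


def build_id_analyze_request(
    endpoint: str, api_key: str, document: bytes, content_type: str = "image/jpeg"
) -> urllib.request.Request:
    """Build (but do not send) the POST request that submits an ID document for analysis."""
    url = endpoint.rstrip("/") + "/formrecognizer/v2.1/prebuilt/idDocument/analyze"
    return urllib.request.Request(
        url,
        data=document,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": api_key,  # key from the Keys and Endpoint tab
            "Content-Type": content_type,
        },
    )


def flatten_fields(result: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Reduce the extracted fields to simple name -> text pairs."""
    pairs: Dict[str, Optional[str]] = {}
    for doc in result.get("analyzeResult", {}).get("documentResults", []):
        for name, field in doc.get("fields", {}).items():
            # A field can be null when nothing was found for it.
            pairs[name] = field.get("text") if field else None
    return pairs


# Hypothetical, trimmed response for a driver's license (v2.1 shape assumed).
SAMPLE_ID_RESULT: Dict[str, Any] = {
    "status": "succeeded",
    "analyzeResult": {
        "documentResults": [
            {
                "docType": "prebuilt:idDocument:driverLicense",
                "fields": {
                    "FirstName": {"type": "string", "text": "JANE", "confidence": 0.99},
                    "LastName": {"type": "string", "text": "DOE", "confidence": 0.99},
                    "DocumentNumber": {"type": "string", "text": "D1234567", "confidence": 0.98},
                },
            }
        ]
    },
}

request = build_id_analyze_request(
    "https://example.cognitiveservices.azure.com/", "<api-key>", b"<image bytes>"
)
print(request.get_method(), request.full_url)
print(flatten_fields(SAMPLE_ID_RESULT))
```

In the v2.1 API the service answers the POST with an `Operation-Location` header, and the result is fetched by polling that URL with GET until `status` is `succeeded`. The polling loop is left out to keep the sketch short.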

Custom models

With custom models, we can analyze the forms that matter to us. We need only five sample forms of the same type to train a custom model, using either labelled or unlabelled data.

Train a Custom model

1. Open the open-source labelling tool, part of the Form OCR Test Toolset (FOTT), where we can label our custom forms, train a model, and analyze forms with that model.

Form Recognizer 2.1 Ga Services

2. Enable cross-origin resource sharing (CORS) for the storage account: open the CORS tab and fill in the values as follows.

The Cors Tab And Fill The Values As The Following

3. Create an Azure Blob Storage container, upload the forms, and generate a SAS URI for the container with all permissions selected, as shown.

Selecting All The Permissions As Shown
Shared Access Tokens

4. To connect to the Azure storage container, go to the Connections page and click the add icon. Provide a display name and the SAS URI generated in the previous step.

Mesh Connection Setting V1

5. Create a new project:

  • Display Name – The project display name.
  • Security Token – Each project generates a security token that is used to encrypt and decrypt sensitive project settings.
  • Source Connection – Select the connection to the Azure Blob Storage container created in the previous step.
  • Folder Path – “Optional” – If the forms are in a sub-folder of the blob container, specify the folder name here.
  • Form Recognizer Service Uri, API Key – Follow step 3 of the prebuilt model example above to get the service URI and API key.
  • Description – “Optional” – The project description.
Project Settings

6. Once the project is created, the forms are displayed in the Tags editor tab. Click Run OCR on all files on the left side to detect the text and tables in all the documents.

7. Label the forms by creating tags and assigning the associated text to each tag. Specify the tag type and format to get better results.

Format To Get Better Results

8. Train the custom model: navigate to the Train page and click the “Train” button. Give the model a name so that it is easy to identify in the Compose models tab, and examine the average accuracy. If it is low, train with additional forms.

Train Additional Forms

9. Compose a model: click the “Compose” icon, select the models you want to combine into a single model, click “Compose” in the upper left corner, and give the composed model a name.

Mesh Composed Model

When the operation completes, your newly composed model will appear in the list.

10. Any form submitted to the composed model goes through the classification step, which matches it to the corresponding model ID.

Tip: To reduce classification failures, label the forms with a greater number of tags.
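
Training (step 8) and composing (step 9) can also be driven through the REST API rather than the labelling tool. The sketch below only builds the JSON request bodies. It assumes the v2.1 routes `POST /formrecognizer/v2.1/custom/models` for training and `POST /formrecognizer/v2.1/custom/models/compose` for composing. The SAS URI, model names, and model IDs are placeholders, and the check for at least two model IDs is a sanity check of this sketch rather than a documented API rule.

```python
import json
from typing import List


def train_request_body(
    container_sas_uri: str, model_name: str, folder_prefix: str = "", use_labels: bool = True
) -> str:
    """JSON body for training a custom model from forms in a blob container."""
    body = {
        "source": container_sas_uri,  # SAS URI of the container holding the forms
        "sourceFilter": {"prefix": folder_prefix, "includeSubFolders": False},
        "useLabelFile": use_labels,  # True when the forms were labelled in the tool
        "modelName": model_name,
    }
    return json.dumps(body)


def compose_request_body(model_ids: List[str], model_name: str) -> str:
    """JSON body for composing several trained models under one model ID."""
    if len(model_ids) < 2:
        # Sanity check for this sketch: composing needs something to combine.
        raise ValueError("provide at least two model IDs to compose")
    return json.dumps({"modelIds": list(model_ids), "modelName": model_name})


# Placeholder values for illustration only.
print(train_request_body(
    "https://examplestorage.blob.core.windows.net/forms?<sas-token>",
    "leave-request-model",
    folder_prefix="leave-requests/",
))
print(compose_request_body(
    ["11111111-aaaa-bbbb-cccc-222222222222", "33333333-dddd-eeee-ffff-444444444444"],
    "mesh-composed-model",
))
```

Both bodies would be sent with the same `Ocp-Apim-Subscription-Key` header that the labelling tool uses behind the scenes.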

Steps to resume a project:

1. Click the share project icon at the top right.

Project Token

2. Create a connection to the same blob storage container to restore the project.

3. Go to the main page, click “Open Cloud Project”, and paste the shared project token.

Shared Project Token

Now that we have learnt the steps to create custom and composed models, we can use them in Form Recognizer applications to build automated data processing software.