Somewhere along the way, the intelligence in capture technologies became associated with which OCR engine the product uses…and this is not altogether accurate. Intelligent capture is much more than OCR. It’s about discovering and verifying data in documents of all types and formats, using technology to minimize the input required from the user.
So…What Does That Mean?
In the early days of document capture, the main emphasis of products was on scanning. This made sense, as the focus at that time was on converting paper into digital form. Capture products ultimately export documents into document management systems, which often use index data to classify documents so that users can search by specific data elements (Invoice No., Loan No., etc.). Assigning that data was a labor-intensive task that relied on barcode/separator sheets, database lookups, template-driven capture (fixed forms), and manual data entry. Intelligent capture goes well beyond this by using multiple methods for data extraction and verification, focusing on automation to reduce the user’s input.
The Push for Data
The rise of data analytics for making better business decisions has put the spotlight on documents as a useful source of data. This places pressure on capture vendors to deliver better solutions for dealing with unstructured content, along with better text extraction methods.
More Than Just Images
As paper moved to digital formats, the focus shifted towards managing electronic document formats:
TIFF and PDF Based Images
Text Files, JSON and XML Files
Adobe PDF Forms and PDF Documents
The Source of Capture Has Expanded
The move to digital also increased the scope of capture solutions. Documents are delivered via numerous channels:
Importing from File Shares and secure FTPs
Email and Fax Capture
Capturing Documents from Business Applications
Capturing from Desktop applications, such as Microsoft Office or Email
Via an Application Programming Interface (API) that allows capture to be “plugged in” to other processes
A New Approach
Increasing the amount of data extracted from documents meant that the traditional methods were no longer scalable, as they involved too much user intervention. Capture vendors introduced new methods to extract data from structured and unstructured documents.
Top 5 Components of an Intelligent Capture Solution
The following concepts help define intelligent capture:
Rules are applied to recognize text patterns and document structures in order to extract the data and classify documents. The rules are intelligent enough to extract data from any location in the document, not just from defined coordinates.
Validation rules can be applied to determine whether the data is correct in context (e.g. if the number of bathrooms is captured as 25 for a 2,100-square-foot house, then it is likely that the decimal point was missed and the value should be 2.5).
Verifying against the data source can identify data issues that need to be addressed.
Routing, or escalating, data issues to another user group for review so issues can be resolved.
Machine learning component to analyze historical document data trends and user intervention to increase the level of automation.
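To make the first two components more concrete, here is a minimal Python sketch of pattern-based extraction followed by a contextual validation rule. It is illustrative only and not how any particular capture product works: the field names, the regular expressions, the `extract_fields` and `validate_bathrooms` helpers, and the threshold of three bathrooms per 1,000 square feet are all assumptions made for this example.

```python
import re

# Hypothetical patterns: each one looks for a label and its value anywhere
# in the text, rather than at fixed page coordinates.
FIELD_PATTERNS = {
    "invoice_no": re.compile(r"Invoice\s*No\.?\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE),
    "bathrooms": re.compile(r"Bathrooms?\s*[:#]?\s*(\d+(?:\.\d+)?)", re.IGNORECASE),
    "square_feet": re.compile(r"([\d,]+)\s*(?:sq\.?\s*ft\.?|square\s+feet)", re.IGNORECASE),
}


def extract_fields(text):
    """Return the first match for each known field as a raw string."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1)
    return fields


def validate_bathrooms(fields, max_per_1000_sqft=3.0):
    """Flag a bathroom count that is implausible for the size of the house.

    The threshold is an assumed value for illustration, not an industry rule.
    """
    issues = []
    if "bathrooms" in fields and "square_feet" in fields:
        baths = float(fields["bathrooms"])
        sqft = float(fields["square_feet"].replace(",", ""))
        if baths > max_per_1000_sqft * sqft / 1000:
            issues.append(
                f"bathrooms={baths:g} is implausible for {sqft:g} sq ft; "
                f"possible missing decimal point (suggest {baths / 10:g})"
            )
    return issues


if __name__ == "__main__":
    sample = "Invoice No: INV-1042\nProperty size: 2,100 sq ft\nBathrooms: 25"
    extracted = extract_fields(sample)
    # Any issues found would be routed to a reviewer instead of being exported.
    print(extracted, validate_bathrooms(extracted))
```

In a fuller pipeline, a non-empty issue list is what would trigger the routing step described above, so that a reviewer confirms or corrects the suggested value before the data is exported.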
DocuNECT…A Solution for Intelligent Capture and More
DocuNECT is a great solution for extracting actionable data from documents, and it can also be used to manage the complete lifecycle of documents, not just data capture and verification. DocuNECT is web-based and scalable to manage all your intelligent capture needs.