Across all types of software needs, businesses face the same challenge: identifying the appropriate solution. Unfortunately, this identification process is often given short shrift, ending with someone from IT tasked with creating a list of potential solutions based on a long laundry list of features or technical requirements. The result is a list of candidate solutions that runs the gamut from feature-rich but very complex to more limited but easier to use, and everything in between.
When it comes to capture software, this scenario plays out on a weekly basis, with business representatives tasked with gathering a short list of potential solutions. Solution vendors are just as culpable: they accentuate their features but typically downplay the hidden costs of installation, configuration, and operational use, and they will almost certainly try to disguise the total price of the features they do tout. The result is a lot of frustrated users, in IT and the lines of business, stuck with a solution that is long on the feature list and short on the promise.
Here are some things to look at when considering a capture solution from the “hidden costs” side.
- Does the solution provide the ability to capture all types of document information in the same solution? This might seem like an odd question, but there are many well-known packages out there that tout machine print (OCR), ICR, and even unconstrained handwriting, but offer them in separate software packages that require additional setup, configuration, and cost.
- Does the solution handle structured, semi-structured, and unstructured capture in a single solution and within a single workflow? After reading question #1, this may not seem surprising. The reality is that many capture solutions separate their structured, semi-structured, and unstructured offerings, either by splitting them into separate applications or in order to charge extra for a solution that can handle all three.
- Does the offering allow configuration of document types/classes and workflows without having to move through multiple applications or user interfaces? The fewer screens or applications required, the easier it is to get things done in a comprehensive and accurate manner. Ask to see how a document type is set up and count how many steps are required.
- Does the offering provide a comprehensive way to test data extraction results? Many solutions have limited capabilities to see real-world results in the context of the document and to make refinements. Even fewer offer real-time statistics on document classification and field recognition results. Ask to see how results are verified and whether that process meets your needs. Easy, comprehensive access to results and statistics ensures that your capture project has all bases covered prior to production.
There are a lot of capture solutions out there, and while many have the features and capabilities businesses are searching for, these features are delivered in all sorts of packages. Be sure to find out if the solution is useful AND usable by your organization.
By: Don Dew, Director of Marketing
I spent a few days last week at DOCUMENT Strategy Forum in Greenwich, CT. Co-located with the BFMA (Business Forms Management Association), it was an interesting event with a lot of content focusing on forms design and usability.
At an event like this, one would think that business forms are the center of the universe. And while that is obviously a stretch, a form is often a lynchpin between business processes. At its very core, a paper form is essentially an intermediary between two entities or processes where an electronic alternative is not available or not practical. Often this might be a customer communication.
The process involved in managing a form is often overlooked by users and vendors alike. John Sharp of the Vancouver City Savings Credit Union did a fine job of reminding us just how extensive a form’s reach is by showing its impact on an organization in terms of a mind-map.
This approach, less linear than a typical PowerPoint deck, was very revealing about the impact a form has. A form permeates an organization in many ways. As such, inefficiencies in design, workflow, etc., can cause a large, undesirable ripple effect resulting in high costs. In healthcare, the unintended consequences can have a very human impact.
What’s the moral of the story? Effective form design has multiple audiences. We often get caught up in the most effective way to process the form, which might not be the most effective way to get the desired usability from the form itself. When designing a form, we need to step back and ask:
- Who will be completing the form, what information is mandatory and what is not (and if it is not, why is it there)?
- Does the form make sense to the person who is actually supposed to fill it out?
- How and where does the completed form get routed for processing? Is the form designed to be capture-friendly?
- How much of the processing can be automated?
- Do the form fields have map-able database fields in your business systems? Are the field types equivalent?
- Once the data is in the system, what happens to the paper document? Does it need to be retained?
I could dream up a dozen more questions but you probably get the point. If we fail to assess the larger impact of a form, then we are likely creating an inefficient process, missing the whole point entirely.
Would you like to learn more about form design as it relates to optimizing for capture? Here's our article: "7 Reasons Your Document may not be Suited for Unstructured OCR / ICR"
Parascript CheckPlus International enables check recognition capabilities in non-U.S. markets. Parascript offers versions for Argentina, Australia, Brazil, Canada, Chile, France, India, Italy, Malaysia, Portugal, and Puerto Rico.
CheckPlus International for India automates check courtesy and legal amount recognition (CAR/LAR), the MICR line (E13B), the date field, and the account number on both the front and back of the check. It provides amount verification as well.
The new version of CheckPlus International for India (3.5) offers the following new functionality and improvements:
- Automatic location and recognition of account number field on the front side of personal and business checks
- Changes to the CTS compliance methodology. CTS compliance is now determined by considering three factors: the currency symbol in the courtesy amount field; the date field format (presence of preprinted boxes); and the format of the Account Number Front field
Want to learn more? Please contact us.
A couple of months ago, Parascript released a short video featuring the benefits of automated signature verification and how it compares to human verification. You can watch it here. Now here's a cool infographic explaining the art of automated signature verification; click here to see it on Visual.ly.
Signatures are a unique biometric that belongs to each individual. Banks, businesses and governments rely on signatures to verify identification and authorize transactions. Signature verification software is fast, accurate and consistent, and has proven to outperform humans on even the most difficult types of forgery. This infographic highlights key characteristics of signatures that are compared, along with different comparison methods.
Do you want to learn more about automated signature verification? Download our White Paper: Automated Signature Verification—What you Need to Know.
Yesterday, Parascript announced FormXtra® Capture, a fully-functional IDR (Intelligent Document Recognition) solution, capable of capture, classification and recognition of virtually any data-type. FormXtra Capture has an aggressive and exciting roadmap, as mentioned in the press release, here.
We have seen a lot of excitement about it so far and wanted to take this opportunity to further address our new technology and its role here, on our blog.
What is FormXtra Capture?
FormXtra Capture is a heavily re-engineered version of FormXtra Enterprise, a capture product that Parascript has been selling for years. FormXtra Capture rolls in the major product updates announced in last year's FormXtra SDK 5.0, greatly broadening its usability with a highly capable API and semi-structured document processing. And it doesn't stop there. Over the course of this year we'll be consolidating many functions currently found in discrete toolkits into the FormXtra platform, meaning the ability to extract data from a check and an associated remittance will be rolled into one package, providing greater opportunities for our integrators and partners to better help their customers.
We changed the name from FormXtra Enterprise to FormXtra Capture to better describe and position the product. “Enterprise” is a somewhat elusive name with a lot of different meanings. “Capture” says a lot more about what the product is—a thorough, high-value IDR solution. It also says what the product isn’t—BPM, advanced workflow, etc. And while it will be able to extract critical check information, it doesn’t create all of the files and documentation (such as the X9.37 file) necessary for Check 21 processing. We rely on our partners to provide this and other application-specific functionality.
Why did we announce a capture solution?
At Parascript we are committed to advancing our technology and its applications to help address our partners’ needs. Over the last year, both existing and new partners have expressed interest in a different type of solution—one that would help them expand their capabilities and provide more document processing within existing accounts. In many cases this is a response to the need to process multiple types of documents without having to integrate multiple engines. In addition, we are receiving more requests from the SMB market. FormXtra Capture is intended to strike a balance between the needs and rigor of a process-intensive enterprise environment while keeping it straightforward enough to have realistic appeal to the SMB marketplace, better equipping our partners.
Isn’t the capture market saturated?
With all of the consolidation going on in the industry, it would seem that the capture market has little life left. Our research with AIIM suggests otherwise: Most organizations (55%) are rekeying their forms data. Only 32% are using OCR, and only 6% are recognizing unconstrained handwriting (advanced ICR). In the race to move up-market, we think that a gap has been left for improving how the capture process is managed. We look forward to exploring this niche, with advancements in our technology, and helping our partners provide greater value here.
What makes this product different?
On its May 15th release, FormXtra Capture will recognize all major data types (machine print, handprint, and cursive) on structured and semi-structured forms, from a single solution that is easy to implement. The unique GUI allows for easy setup and testing of form definitions. This can be previewed on our FormXtra microsite.
We’ve also incorporated a few twists inside the normal workflow process. Most notable are data validation workflows that can perform snippet-based validation, enabling higher security and efficiency (such as for HIPAA and PCI) by sending a field (such as a Social Security number) to a skilled validator devoid of any other context that could make the information usable.
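As a rough illustration of the snippet-based validation idea, consider splitting a document's fields into isolated tasks so that no single validator ever sees the full record. The field names, record layout, and task format below are invented for this sketch; this is not the FormXtra API.

```python
def make_validation_tasks(document):
    """Turn one document's fields into context-free validation tasks.

    Each task carries only the field type and the snippet value, so a
    validator keying the SSN never sees the name, date of birth, etc.
    """
    return [{"task_id": f"{document['doc_id']}-{i}",   # opaque in practice
             "field_type": name,
             "snippet": value}                          # only the snippet travels
            for i, (name, value) in enumerate(document["fields"].items())]
```

In a real deployment the task IDs would be opaque tokens so validators cannot re-link snippets back to the same document.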
In the coming months we’ll be updating the product with our check reading capabilities, and later this year, signature validation.
How will this product be sold?
As with all of Parascript’s products, FormXtra Capture will be available through our reseller and OEM partner network. If you’d like to learn more, send a note to firstname.lastname@example.org.
There’s a lot more to come. Be sure to subscribe to the blog to be notified of updates, and stop by the FormXtra microsite. We’ll be updating it with more capabilities found in FormXtra Capture. It will give you a good sense of what FormXtra is capable of.
Is there a capability you’d like to see rolled up into a forms solution? Comment below or drop us a line at email@example.com.
Government agencies, healthcare departments, and other organizations need to identify and protect information located randomly on documents to comply with federal and state laws. For example, the U.S. Health Insurance Portability and Accountability Act (HIPAA) requires that patient health information in medical records be protected, even when disclosure is authorized by patients. In the same way, the Payment Card Industry Data Security Standard requires protection of cardholder data, such as credit card numbers and Social Security numbers. Other applications include the Freedom of Information Act (FOIA) and other privacy acts, education, healthcare, and eDiscovery, where sensitive information is required to be redacted.
Redaction removes sensitive information by obscuring it, usually in black, to make documents secure for distribution. Manual redaction of sensitive information is time-consuming and labor-intensive. Traditionally, documents are copied, redacted using permanent markers, and re-copied to make sure no information is still legible.
Redaction has been a very hot topic lately, as several high-profile cases have resulted in million-dollar lawsuits. Obviously, government agencies and organizations need to ensure they are complying with requirements and that no sensitive information is left exposed on a document.
Automated redaction speeds up processes, provides greater accuracy, and reduces labor costs and manual errors. It locates and protects sensitive information in any field on a document, including credit card numbers, driver's license numbers, and Social Security numbers, among many others. Automated redaction software uses black or clear redaction. Clear redaction whites out the information, making it appear that the information was never included on the form. Black redaction blocks out the information, making it completely unreadable. Automated redaction can be applied as soon as a document enters the organization and is recognized, to increase security, or documents can be redacted only as needed.
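The two redaction styles can be sketched on a toy grayscale image (0 = black, 255 = white). This is only an illustration of the concept, not how any particular redaction product is implemented.

```python
def redact(image, box, style="black"):
    """Overwrite a rectangular region of a grayscale image in place.

    image: list of rows of pixel values (0-255)
    box:   (top, left, bottom, right), bottom/right exclusive
    style: "black" blocks the region out; "clear" whites it out
    """
    fill = 0 if style == "black" else 255
    top, left, bottom, right = box
    for r in range(top, bottom):
        for c in range(left, right):
            image[r][c] = fill
    return image
```

Applied as soon as a document is recognized, the same routine could black out a Social Security number field using the field's bounding box from the recognition step.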
Do you want to see automated redaction in action? Watch our FormXtra video demo to see it locate and identify content that needs to be protected to support compliance.
Visual signature verification is one of the most common fraud prevention methods that has remained unchanged for many decades. Although criticized, visual verification is still used as the final arbiter when automatic signature verification cannot make a reliable conclusion, or makes the wrong conclusion about signature authenticity. How reliable is visual verification? How does visual verification accuracy compare with the accuracy of automatic signature verification systems? Why can a human operator and an automatic system make different conclusions about the same signature?
Visual verification is an inexact science and depends significantly on human factors such as expertise, fatigue, mood, and working conditions, among many others. This is why a forensic handwriting expert, who has been verifying handwritten signatures for years and is able to look thoroughly, for as long as necessary, at every signature, will produce more accurate signature verification results than a merchant or bank teller who only makes an educated guess as to whether two signatures were, or were not, made by the same person. Similarly, a signature verification operator in a bank who has to look at 200-300 signatures per hour not only has lower accuracy than a forensic expert, but also makes more mistakes at the end of the day than in the morning. Sometimes the mistakes are so obvious that they do not require expertise or thorough examination to reach the right conclusion. For example:
1. Genuine signature and random forgery that an operator accepted as an authentic signature:
2. One of two genuine signatures was rejected as a forgery by an operator:
The way to reduce this type of human error is to rely on people with higher expertise and to reduce the number of signatures one operator has to verify within a limited time frame. This is an expensive approach; however, automatic systems help to make it feasible.
The most advanced signature verification software exploits powerful artificial intelligence mechanisms to imitate human analysis and combines this human-like approach with the strengths of computer systems. As a result, automatic systems can make definitive measurements and give more accurate appraisals of signature characteristics that some experts can only estimate using traditional techniques. Comparing these measurements between reference signatures and signatures submitted for verification allows the software to produce more accurate and consistent results in real-life applications, which usually require verification of a large number of signatures within a limited time frame.
Based on different principles and exploiting different strengths, a human operator and automatic signature verification software may reach different conclusions about one particular signature. In the fight of man vs. machine, even individual erroneous cases do not change the fact that automatic verification solutions have proven to be statistically more accurate than humans. However, the race ends in a tie: automatic signature verification systems should be used to verify the majority of images, sending only the smaller number of signatures considered suspect for human verification. The reduced burden on human operators allows them to scrutinize signatures more thoroughly and produce more accurate results, thus improving the overall accuracy of the verification process.
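The triage workflow described above can be sketched in a few lines. The confidence-score semantics and the threshold values here are assumptions for illustration, not values from any real verification engine.

```python
def triage(scored_signatures, accept_at=0.9, reject_at=0.1):
    """Route signatures by verifier confidence (1.0 = surely genuine).

    Only the middle band of uncertain "suspects" goes to a human,
    reducing the operator's workload to a small fraction of the volume.
    """
    accepted, rejected, suspects = [], [], []
    for sig_id, score in scored_signatures:
        if score >= accept_at:
            accepted.append(sig_id)      # confidently genuine
        elif score <= reject_at:
            rejected.append(sig_id)      # confidently forged
        else:
            suspects.append(sig_id)      # route to a human operator
    return accepted, rejected, suspects
```

Tuning `accept_at` and `reject_at` trades automation rate against the risk of an automatic decision being wrong.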
Want to learn more about the benefits of automated signature verification? Download our White Paper: Automated Signature Verification—What you Need to Know.
by Don Dew, Director of Marketing, Parascript
I spent the last 3 days at the 2013 AIIM Conference in New Orleans, which featured a multitude of visionaries and speakers, including Seth Godin, Thornton May, David Pogue, and others, on topics ranging from social organizations to information management to mobile device trends as they relate to business. The following 3 tidbits have little direct relevance to document classification and recognition, but for those of us who enjoy technology and trends, they should be fun.
1) This year is not last year
There are some conferences where you begin to feel like Bill Murray in Groundhog Day. Every year feels like the previous. This was not one of them. Last year’s overriding themes were Big Data and Social Enterprise; this year there was a lot more discussion on Information Governance—especially in the context of mobility and BYOD, or Bring Your Own Device.
Perhaps Big Data is just becoming the new normal. And that’s good. We’ll all see more clearly when we don’t let the definitions overwhelm us and can start dealing with the fact that how we leverage and manage the information in our organizations matters more than what we call it. And to that end, the data shows that organizations that can optimize their people’s interaction with information grow at astounding rates.
As to governance and BYOD, there are a lot of open questions. Organizations are struggling to control information that is taking on a life of its own, and can literally walk out of the building on less-than-secure consumer devices. Do we try to secure these devices? How do we do it without creating a layer of security so cumbersome that people circumvent it with the consumer tools they are already using?
2) 1 gram of DNA can store 700 terabytes
Vince Kellen, CIO of the University of Kentucky, presented the results of a Harvard science experiment finding that one gram of DNA can be manipulated to store an incredible 700 terabytes of data. While access and retrieval are noted as “slow”, the “medium” itself is expected to be stable for around 400,000 years (I think that qualifies as indefinitely). Sure beats tape! But what this incredible breakthrough really illustrates is that we are living at the absolute beginning of this era of information, and it is going to evolve quickly beyond any level of understanding we have today. Everything changes. Virtually everything can be recorded and stored.
3) Peak silicon is near, and will change how we manage information
In the same session, we learned that “peak silicon”, or the point where we can no longer cram more processing power into a square inch of silicon, is probably 5 to 10 years out. This could theoretically slow the progress of Moore’s Law, which states that the amount of processing power available (as a function of the number of transistors on a chip) doubles every 18 months. That law has basically held true since 1958, and has also applied to related technology, such as storage.
Peak silicon will be an interesting time. It may be some time before biological and quantum computing are realized, so we may enter a period of relative scarcity for processing power (assuming the demand for processing grows as much as the supply has). If this is the case, I would expect to see tremendous improvement in algorithms in order to compensate—effectively working smarter, not harder. Where does this take us? Will information start to create information on its own?
In the end, AIIM 2013 was a thought-provoking event, with some great speakers and visionaries doing their best to paint a murky picture of the information landscape that is in front of us. From Parascript’s little corner of the information landscape, we know that document capture is just the beginning. The use of paper is slowly waning, but with the bulk of new content created being image-based (i.e. video), the need for image-based information analysis is truly in its infancy.
Did you attend? What did you find most thought-provoking?
Over the years, recognition technology companies (including Parascript) have attempted to create acronyms to delineate the differences between OCR, ICR, and the technology needed to effectively read many types and styles of handwriting, including cursive. In the end, we haven’t run into anybody who asks about “natural handwriting recognition” and the like. People just ask about ICR. The following is a short overview of the differences between OCR, ICR, and unconstrained and cursive handwriting recognition, which is what you get with ICR from Parascript.
Evolution of Recognition Technology
Optical Character Recognition (OCR) examines scanned images of machine-printed text and translates the characters into ASCII text files. Though the most advanced systems are able to recognize almost any type of font, they deal only with machine-printed characters and reject all handwriting. Machine-printed letters are evenly spaced, both horizontally and vertically, on a given page, allowing OCR systems to read the text one character at a time. Once all characters in a word are recognized, the word is compared against a vocabulary of potential answers for the final result. Even the most sophisticated OCR systems suffer significant reductions in accuracy when processing degraded images. For example, when characters break apart due to poor image quality, or when multiple characters merge due to blurred or dark backgrounds between them, recognition accuracy may be reduced by as much as 20 percent.
Intelligent Character Recognition (ICR) tends to be used generically to refer to all types of handwriting recognition; however, from a technical perspective, ICR is the ability to recognize constrained hand-printed characters.
Not surprisingly, interpreting the patterns of human writing is far more complicated than converting simple machine print, because no two people ever write identical characters. Factors such as mood, environment, or stress all conspire to create variations in character writing, causing individuals to form characters differently each time they write. As with OCR, ICR engines execute recognition character-by-character and start by segmenting words into their component characters. Because ICR technology recognizes separate words or word combinations, such as form fields, letters cannot be written sloppily or stuck together.
While ICR is more robust than OCR in handling human writing, dictionaries are employed after the recognition process, not during it. Therefore, if a correct guess was not generated during the character segmentation and recognition process, validation against vocabulary lists does not improve the result, and accuracy suffers significantly.
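A toy sketch shows the limit of post-recognition dictionary validation: if the assembled word is close enough to a vocabulary entry, the lookup can rescue it, but a badly segmented word may fall outside any match and the error stands. The fuzzy matching used here is a generic stand-in, not Parascript's (or any vendor's) actual method.

```python
from difflib import get_close_matches

def recognize_then_validate(char_guesses, vocabulary):
    """Assemble per-character guesses into a word, then consult a
    vocabulary only afterward (the OCR/ICR pattern described above)."""
    word = "".join(char_guesses)                       # character-by-character result
    matches = get_close_matches(word, vocabulary, n=1, cutoff=0.8)
    # If segmentation went badly wrong, the dictionary cannot rescue it.
    return matches[0] if matches else word
```

A one-character slip like "invcice" is recoverable, but a segmentation error that mangles several characters (say, "irwoice") falls below the similarity cutoff and is returned uncorrected.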
Parascript ICR: unconstrained handprint and cursive recognition
Both OCR and ICR deliver high accuracy when analyzing constrained text but are ineffective when dealing with unconstrained or cursive writing, where letters are linked together and may be poorly written or even illegible. Parascript ICR technology recognizes that the features of handwriting form a dynamic pattern. Handwriting, reduced to its most basic element, is essentially the motion of a writing instrument, and certain basic strokes, describing those trajectories, embody the essence of all handwriting styles. Parascript calls these strokes XR elements – and they are found in all letters. Combined, XR elements form virtually all letter shapes.
Parascript's XR Elements
Parascript ICR technology focuses on the anatomy of a written word. Much like humans use context to read words that have been partially scrambled (yuo cna lkiley raed tihs wthiuot a pborlem), the Parascript ICR engine achieves similar recognition through a context-driven approach. By referencing results databases during the recognition process, Parascript ICR builds highly accurate answers which, in turn, lead to substantially higher recognition rates than engines that only validate answers at the end of the process.
This process is also helpful in achieving recognition of machine print that is too poor for an OCR engine to recognize.
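The scrambled-word effect can be imitated with a toy anagram lookup against a vocabulary. This is only a playful illustration of context-driven matching, not the XR-element approach itself, and real words that are anagrams of each other would collide in this naive index.

```python
def unscramble(text, vocabulary):
    """Match each token to a vocabulary word sharing the same letters.

    Keying on the sorted letters recovers words whose characters were
    jumbled, much as a reader uses word-level context to fix "tihs".
    """
    index = {"".join(sorted(w)): w for w in vocabulary}
    return " ".join(index.get("".join(sorted(tok)), tok)
                    for tok in text.split())
```

Tokens with no vocabulary match pass through unchanged, mirroring an engine falling back to its raw character-level guess.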
We hope this simplifies the differences between the types of OCR and ICR. What questions do you have? How have you heard the terms OCR and ICR referenced?
Learn more about adoption of OCR and ICR technologies from this AIIM Whitepaper:
In 1897 Abbé Jean-Hippolyte Michon, the founder of graphology, divided handwriting into seven fundamental elements: speed, pressure, form, dimension, continuity, direction, and order.
Since then, the anatomy of handwriting (including signatures) has not changed and these characteristics are still key in signature verification analysis. Forensic expertise looks at such features of handwriting as height, width, slant, regularity, typical shapes, strokes, and order of elements. When dynamic characteristics of handwriting are available, biometric features such as pressure, speed, constancy, characteristic gestures, and occupation of the space are analyzed in addition to static characteristics.
Today, signature analysis can be accomplished by powerful computers using sophisticated algorithms. Automatic comparison is executed by a combination of verifiers using fundamentally different algorithms and techniques. In particular, they combine a human-like holistic analysis of a signature and signature segmentation with a subsequent analysis of the signature elements.
The whole verification process can be described as the work of a group of highly skilled experts. Each of them has a favorite approach, looking at particular characteristics, which is especially efficient in some cases and “good-enough” in others. When they work together as a team, their areas of expertise complement each other resulting in excellent overall performance.
Here are just a few different elements used to create a human-like, holistic analysis of a signature.
1. A special descriptive language, consisting of a set of formative hieroglyphic elements that embody the essence of all styles of writing signatures. Suspect and reference signatures are presented as sequences of these elements and compared using multiple parameters. Linear transformation is used to allow correlation between elements belonging to different signatures. A system of estimates is built and passed through several neural-network-based learning and interpretation agents to execute a highly refined analysis and make a sophisticated conclusion about the similarity of two signatures.
XR-interpretation of two signatures.
2. Geometrical analysis in which the similar nodes that are distinctive elements of a signature are located on the suspect and reference signatures. Triads of these nodes are used to build triangles with apexes located in the selected nodes. The similarity of the triangles belonging to different signatures is analyzed and used to make a conclusion about signature genuineness.
Geometrical interpretation of signatures.
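The triangle comparison in the geometrical analysis can be sketched as follows. The similarity test used here (sorted side lengths agreeing up to a uniform scale, within a tolerance) is an assumption for illustration, not Parascript's actual algorithm.

```python
import math

def side_lengths(tri):
    """Sorted side lengths of a triangle given as three (x, y) nodes."""
    a, b, c = tri
    return sorted([math.dist(a, b), math.dist(b, c), math.dist(c, a)])

def triangles_similar(t1, t2, tol=0.05):
    """True if the triangles' shapes match up to a uniform scale.

    For similar triangles the three side-length ratios are equal, so we
    check that the ratios agree within a relative tolerance.
    """
    ratios = [a / b for a, b in zip(side_lengths(t1), side_lengths(t2))]
    return max(ratios) - min(ratios) <= tol * max(ratios)
```

Node triangles from a suspect signature that match triangles from the reference, despite the signature being written larger or smaller, count as evidence of genuineness.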
3. An analytical method based on signature segmentation and finding correlations between the fragments of reference and suspect signatures can also be applied. This method complements the holistic approach and is especially efficient in those cases where the holistic approach cannot ensure the required reliability of the result.
Signature Fragments Comparison.
These different approaches result in outstanding signature verification performance that even surpasses human verification. Want to learn more about the benefits of automated signature verification and how it compares to human verification? Download our White Paper: Automated Signature Verification—What you Need to Know.