Documents have become a burden for businesses of all sizes. Invoices, receipts, purchase orders, and financial statements all hold crucial information that has to be extracted and processed efficiently. Manual data entry is slow and error-prone, and it diverts resources away from more strategic tasks. To address this, UiPath, a leader in Robotic Process Automation (RPA), has introduced DocPath, a Large Language Model (LLM) designed specifically for extracting information from business documents.
The Bottleneck of Document Processing
Business processes depend heavily on documents, but the sheer volume and diversity of those documents (structured, semi-structured, and unstructured) create significant hurdles. Manual data entry is slow and prone to errors, and the resulting bottleneck hurts efficiency while causing inconsistencies and delays in downstream processes. Keeping pace with this volume calls for automated extraction methods rather than more manual effort.
UiPath DocPath: An LLM for Information Extraction
UiPath DocPath is aimed squarely at this problem. Unlike general-purpose AI models, DocPath is built for a single task: extracting information from documents. This narrow focus allows the model to be tuned specifically for extraction accuracy and efficiency.
How DocPath Works
DocPath is built on FLAN-T5 XL, an encoder-decoder LLM that UiPath fine-tuned for extraction. UiPath reports that this architecture performs especially well on tasks like information extraction, which deal with factual data and a limited set of possible answers. To train DocPath, UiPath researchers used a dataset of more than 100,000 labeled documents covering a wide range of business document formats.
One of DocPath's key design choices is its prompt format. Earlier methods treated extraction as token classification, assigning a label to each token in the document. DocPath instead uses a prompt-and-completion approach: the model receives a prompt and generates a completion that is directly structured JSON, which removes the need for complex post-processing steps.
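The prompt-and-completion idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DocPath's actual prompt: the template wording, the field names (`invoice_number`, `total_amount`), and the helpers `build_prompt` and `parse_completion` are all hypothetical, and the completion string is hard-coded where a real system would call the fine-tuned model.

```python
from __future__ import annotations

import json

# Hypothetical prompt template: DocPath's real prompt wording is not given
# in the article, so this format is an illustrative assumption.
PROMPT_TEMPLATE = (
    "Extract the following fields from the document and answer in JSON.\n"
    "Fields: {fields}\n"
    "Document:\n{document}\n"
)


def build_prompt(document_text: str, fields: list[str]) -> str:
    """Assemble the prompt: the document plus the fields to extract."""
    return PROMPT_TEMPLATE.format(fields=", ".join(fields), document=document_text)


def parse_completion(completion: str, fields: list[str]) -> dict[str, str | None]:
    """Parse the model's completion directly as JSON.

    Requested fields missing from the completion are set to None so that
    callers always receive the same keys.
    """
    data = json.loads(completion)
    return {field: data.get(field) for field in fields}


fields = ["invoice_number", "total_amount"]
prompt = build_prompt("Invoice INV-001\nTotal: 120.50", fields)
# Hard-coded stand-in for the fine-tuned model's generated completion.
completion = '{"invoice_number": "INV-001", "total_amount": "120.50"}'
print(parse_completion(completion, fields))
# → {'invoice_number': 'INV-001', 'total_amount': '120.50'}
```

Compared with token classification, the only "post-processing" left is a single `json.loads` call, which is the simplification the approach is meant to deliver.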
Positional Grounding
A critical part of DocPath's effectiveness is its use of positional tokens. Embedded in the prompts, these tokens encode where each piece of text appears in the document's layout. This lets DocPath attribute each extracted value back to its original location in the document, which supports both accuracy and traceability.
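The article does not specify DocPath's positional token format. The sketch below assumes one plausible scheme: each OCR word's normalized top-left coordinates are quantized onto a hypothetical 100 by 100 grid and written as a token such as `<pos_50_75>` in front of the word. Because the token is reversible, a value the model returns together with its token can be mapped back to a page region. `Word`, `position_token`, `serialize_with_positions`, `ground`, and `GRID_BINS` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical grid resolution: normalized page coordinates are quantized
# into this many bins per axis. DocPath's real token scheme may differ.
GRID_BINS = 100


@dataclass
class Word:
    text: str
    x: float  # left edge, normalized to [0, 1] of page width
    y: float  # top edge, normalized to [0, 1] of page height


def position_token(x: float, y: float, bins: int = GRID_BINS) -> str:
    """Map a normalized (x, y) point to a discrete positional token."""
    # Clamp so that a coordinate of exactly 1.0 lands in the last bin.
    bx = min(int(x * bins), bins - 1)
    by = min(int(y * bins), bins - 1)
    return f"<pos_{bx}_{by}>"


def serialize_with_positions(words: list[Word]) -> str:
    """Prefix every OCR word with its positional token for the prompt."""
    return " ".join(f"{position_token(w.x, w.y)} {w.text}" for w in words)


def ground(token: str, bins: int = GRID_BINS) -> tuple[float, float]:
    """Invert position_token: return the top-left corner of the grid cell."""
    _, bx, by = token.strip("<>").split("_")
    return int(bx) / bins, int(by) / bins


words = [Word("Total:", 0.5, 0.75), Word("120.50", 0.75, 0.75)]
print(serialize_with_positions(words))
# → <pos_50_75> Total: <pos_75_75> 120.50

# If the model returns "120.50" together with "<pos_75_75>", the value can be
# traced back to the grid cell starting at (0.75, 0.75) on the page.
print(ground("<pos_75_75>"))
# → (0.75, 0.75)
```

A coarser grid keeps the token vocabulary small at the cost of less precise grounding; the bin count is a tunable assumption here.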
Techniques for Efficiency
Efficiency matters in real-world deployments, so the DocPath team implemented several techniques to speed up inference, particularly for large tables and documents with many data points. One technique splits the list of fields to be extracted into buckets and processes the buckets in parallel, which significantly reduces processing time compared with extracting all fields sequentially.
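Bucketing helps because autoregressive decoding emits one token at a time, so a single request that produces every field yields one long, strictly sequential output; several shorter requests can instead run concurrently. The sketch below assumes a thread pool and a hypothetical `extract_bucket` stand-in for the model call. The bucket size and worker count are arbitrary illustrative defaults, not DocPath's settings.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def make_buckets(fields: list[str], bucket_size: int) -> list[list[str]]:
    """Split the field list into consecutive chunks of at most bucket_size."""
    if bucket_size < 1:
        raise ValueError("bucket_size must be at least 1")
    return [fields[i:i + bucket_size] for i in range(0, len(fields), bucket_size)]


def extract_bucket(document: str, bucket: list[str]) -> dict[str, str]:
    """Hypothetical stand-in for one model request covering a bucket of fields.

    A real version would prompt the model for just these fields and parse
    its JSON completion; this placeholder returns dummy values.
    """
    return {field: f"<{field}>" for field in bucket}


def extract_parallel(document: str, fields: list[str],
                     bucket_size: int = 4, max_workers: int = 4) -> dict[str, str]:
    """Issue one request per bucket concurrently and merge the results."""
    buckets = make_buckets(fields, bucket_size)
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in bucket order, so the merge is deterministic.
        for partial in pool.map(lambda b: extract_bucket(document, b), buckets):
            results.update(partial)
    return results


fields = ["invoice_number", "date", "vendor", "total", "tax"]
print(make_buckets(fields, 2))
# → [['invoice_number', 'date'], ['vendor', 'total'], ['tax']]
print(extract_parallel("...document text...", fields, bucket_size=2))
```

Threads suit the case where each call mostly waits on a separate inference service; a CPU-bound pure-Python model call would need processes instead to run in parallel.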
The team also uses CTranslate2, an inference engine for Transformer models. It offers high decoding throughput and integrates well with UiPath's existing codebase, so it fits smoothly into the larger automation platform.

In addition, DocPath assigns a confidence score to each extracted field. The scores are derived from the logit values of the tokens that make up the field's value. They help identify potentially inaccurate values and support appropriate quality-control measures, such as routing low-confidence fields to a human reviewer.
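The article says only that confidence scores come from the logits of the value tokens; how those logits are aggregated is not stated. The sketch below assumes one common choice: convert each decoding step's logits to a probability with a numerically stable softmax, then multiply the chosen tokens' probabilities to get the model's probability for the whole value. `token_log_prob`, `field_confidence`, `needs_review`, and the 0.8 threshold are hypothetical.

```python
from __future__ import annotations

import math


def token_log_prob(logits: list[float], chosen: int) -> float:
    """Log-softmax probability of the chosen token (stable via max shift)."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
    return logits[chosen] - log_norm


def field_confidence(step_logits: list[list[float]], chosen_ids: list[int]) -> float:
    """Confidence for one extracted value, assuming a product aggregation.

    step_logits[i] holds the vocabulary logits at decoding step i and
    chosen_ids[i] is the token emitted at that step; pass only the steps that
    generate this field's value. The result is the product of the chosen
    tokens' probabilities, i.e. the model's probability of the whole value.
    """
    if len(step_logits) != len(chosen_ids):
        raise ValueError("need exactly one chosen token per decoding step")
    total = sum(token_log_prob(logits, c) for logits, c in zip(step_logits, chosen_ids))
    return math.exp(total)


def needs_review(confidences: dict[str, float], threshold: float = 0.8) -> list[str]:
    """Names of fields scoring below a (hypothetical) review threshold."""
    return sorted(name for name, score in confidences.items() if score < threshold)


# Toy logits over a 3-token vocabulary for a two-token value.
score = field_confidence([[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]], [0, 1])
print(needs_review({"total_amount": score, "date": 0.4, "vendor": 0.95}))
# → ['date', 'total_amount']
```

Taking the minimum token probability or the geometric mean are equally plausible aggregations; the product penalizes longer values, which may or may not be desirable for a given field type.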
Continuous Improvement and Future Possibilities
UiPath acknowledges that DocPath is a work in progress, and the research team is exploring several improvements: larger versions of the FLAN-T5 model, which may improve accuracy, and decoder-only architectures, which may shorten processing times.
The team is also investigating feeding document image data directly into the model, which could improve DocPath's handling of complex document layouts and layout variations.