Invoice Intelligence System (Case Study)

Context

The client received invoices from 100+ different vendors, each with its own layout, terminology, and quirks. Their existing process was manual: an employee checked a shared Outlook inbox daily, downloaded every invoice PDF, renamed it to match an internal purchase order number, uploaded it to SharePoint, and then verified and updated the corresponding record in their Microsoft Business Central ERP by hand. That six-step loop repeated for every invoice that arrived, roughly 3,000 per month.

Problem

The manual process capped throughput, introduced transcription errors, and created staffing uncertainty. Invoice volume fluctuated, and a 30% spike in a given month could demand 40 or more additional hours with no notice.

Document processing at scale is also hard to get right. For this client, two specific failure modes dominated:

Layout drift. Vendors change templates, add surcharges, or restructure line items without warning. A rule-based parser breaks silently; you find out when someone notices the numbers are wrong.
Software limitations. Existing products in this space could either process documents in a fixed, structured format or integrate with an ERP system, but not both. None could handle unstructured, vendor-generated invoices and push the result directly into Business Central.

Goal

A system that could ingest an invoice in whatever form it arrived and transform it into structured data that could be validated against existing ERP records, only flagging genuine mismatches for human review.

Approach

I built the pipeline around LLM-driven document extraction feeding a structured-output stage constrained by explicit schemas. Rather than parsing each vendor layout with bespoke rules, the model reads the document and emits data shaped to a fixed contract.

The end-to-end flow runs automatically on a time-based trigger: each incoming invoice PDF is extracted and validated against Business Central, the renamed PDF is uploaded to SharePoint, and a success or failure record is written to a database for visibility. A daily digest email summarizes the run for the team. I also built a companion admin portal that gives the team a full audit trail, the ability to review flagged invoices, and job history exports.
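The outcome record and daily digest described above might look something like the following minimal sketch. It assumes a single SQLite table (SQLite is part of the stack; the table name, column names, and the functions `open_run_log`, `record_outcome`, and `build_digest` are hypothetical and not taken from the deployed system).

```python
import sqlite3
from collections import Counter


def open_run_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the run log. The schema here is an illustrative assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS invoice_runs (
               id INTEGER PRIMARY KEY,
               run_date TEXT NOT NULL,
               source_file TEXT NOT NULL,
               status TEXT NOT NULL CHECK (status IN ('success', 'failure')),
               detail TEXT
           )"""
    )
    return conn


def record_outcome(conn, run_date, source_file, status, detail=""):
    """Write one success or failure row per processed invoice."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO invoice_runs (run_date, source_file, status, detail) "
            "VALUES (?, ?, ?, ?)",
            (run_date, source_file, status, detail),
        )


def build_digest(conn, run_date) -> str:
    """Summarize one day's run as plain text suitable for an email body."""
    rows = conn.execute(
        "SELECT source_file, status, detail FROM invoice_runs "
        "WHERE run_date = ? ORDER BY id",
        (run_date,),
    ).fetchall()
    counts = Counter(status for _, status, _ in rows)
    lines = [
        f"Invoice run {run_date}: "
        f"{counts['success']} succeeded, {counts['failure']} need review"
    ]
    # List every failure explicitly so none is hidden behind a count.
    lines += [f"  - {name}: {detail}" for name, status, detail in rows if status == "failure"]
    return "\n".join(lines)


# Example with made-up file names and dates.
conn = open_run_log()
record_outcome(conn, "2024-03-01", "acme_1001.pdf", "success")
record_outcome(conn, "2024-03-01", "globex_77.pdf", "failure", "total does not match PO")
digest = build_digest(conn, "2024-03-01")
print(digest)
```

Keeping the digest as a pure function of the run log means the email and the admin portal can read from the same rows.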

Schema-constrained extraction

Each invoice is validated against a Pydantic schema describing the fields the business actually uses: vendor identity, line items, quantities, unit prices, totals, tax, and dates. The schema encodes business rules directly, making programmatic validation straightforward.
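As an illustration of a schema that carries a business rule, here is a minimal sketch assuming Pydantic v2. The field names, the one-cent tolerance, and the rule that the total must equal line items plus tax are assumptions chosen for the example, not the production schema.

```python
from datetime import date
from decimal import Decimal

from pydantic import BaseModel, Field, ValidationError, model_validator


class LineItem(BaseModel):
    description: str
    quantity: Decimal = Field(gt=0)
    unit_price: Decimal = Field(ge=0)

    @property
    def amount(self) -> Decimal:
        return self.quantity * self.unit_price


class Invoice(BaseModel):
    vendor_name: str
    po_number: str  # hypothetical field: the internal purchase order number
    invoice_date: date
    line_items: list[LineItem] = Field(min_length=1)
    tax: Decimal = Field(ge=0)
    total: Decimal = Field(ge=0)

    @model_validator(mode="after")
    def total_matches_lines(self) -> "Invoice":
        # Assumed business rule: total = sum of line amounts + tax, within one cent.
        expected = sum((li.amount for li in self.line_items), Decimal("0")) + self.tax
        if abs(expected - self.total) > Decimal("0.01"):
            raise ValueError(f"total {self.total} != line items + tax ({expected})")
        return self


# Made-up payload standing in for the model's structured output.
raw = {
    "vendor_name": "Acme Supply",
    "po_number": "PO-1001",
    "invoice_date": "2024-03-01",
    "line_items": [{"description": "Widget", "quantity": "2", "unit_price": "10.00"}],
    "tax": "1.60",
    "total": "21.60",
}
invoice = Invoice.model_validate(raw)

try:
    Invoice.model_validate({**raw, "total": "25.00"})
except ValidationError as exc:
    rejected = str(exc)  # an inconsistent total is rejected, not passed downstream
```

Using `Decimal` rather than `float` for money avoids rounding surprises when totals are compared.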

Adaptable by nature

Because the model works from the document's content rather than a per-vendor template, adding a new vendor format usually costs nothing. The system absorbed new layouts as they appeared without code changes for each one, which is what kept it viable across more than a hundred formats.

Results

The figures below are representative of the deployed system's steady-state operation six months after first release:

Roughly 3,000 invoices processed per month
Per-invoice processing cost of $0.02 (approx. $60 per month)
More than 100 distinct vendor layouts
Straight-through rate above 90% for received invoices
Over 2,000 working hours returned to the team per year
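
As a back-of-the-envelope cross-check, the figures above can be combined as follows. The constants come from this write-up; the per-invoice time and the spike estimate are derived values, not measurements, and they treat "roughly 3,000" and "over 2,000" as exact for the sake of the arithmetic.

```python
INVOICES_PER_MONTH = 3_000          # "roughly 3,000 per month"
COST_PER_INVOICE_CENTS = 2          # "$0.02" per invoice
HOURS_RETURNED_PER_YEAR = 2_000     # "over 2,000", so a lower bound

# Monthly processing cost in dollars: 3,000 x $0.02.
monthly_cost_usd = INVOICES_PER_MONTH * COST_PER_INVOICE_CENTS / 100

# Implied hours returned per invoice, expressed in minutes.
minutes_per_invoice = HOURS_RETURNED_PER_YEAR * 60 / (INVOICES_PER_MONTH * 12)

# Extra manual hours a 30% volume spike would have required at that rate.
spike_invoices = INVOICES_PER_MONTH * 30 // 100
spike_hours = spike_invoices * minutes_per_invoice / 60

print(f"${monthly_cost_usd:.2f} per month")
print(f"{minutes_per_invoice:.1f} minutes returned per invoice")
print(f"{spike_hours:.0f} extra hours for a 30% spike")
```

Under these assumptions a 30% spike works out to about 50 hours, in line with the "40 or more" estimate in the Problem section.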

The team's time shifted from manual transcription to reviewing genuine mismatches, the cases where human judgment actually matters.

Retrospective

The most important insight was about failure visibility. An invoice that fails silently is worse than one that fails loudly: it enters the system as bad data, surfaces weeks later during an audit, and takes longer to resolve. Every error path in the system routes to the review queue and the daily digest, so nothing goes unnoticed. That design decision, more than any accuracy improvement, is what made the system trustworthy enough to run in production.
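
The "every error path routes to the review queue" idea can be sketched as a single catch-all around per-invoice processing. This is an illustrative pattern, not the production code: `ReviewQueue`, `Mismatch`, `process_all`, and the file names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class ReviewQueue:
    """In-memory stand-in for the review queue surfaced in the admin portal."""
    items: list[tuple[str, str]] = field(default_factory=list)

    def flag(self, source_file: str, reason: str) -> None:
        self.items.append((source_file, reason))


class Mismatch(Exception):
    """Extracted data disagrees with the ERP record."""


def process_all(
    files: Iterable[str],
    process_one: Callable[[str], None],
    queue: ReviewQueue,
) -> int:
    """Run process_one on each file; any failure is flagged, never swallowed."""
    succeeded = 0
    for name in files:
        try:
            process_one(name)
        except Mismatch as exc:
            queue.flag(name, f"mismatch: {exc}")
        except Exception as exc:  # deliberately broad: unknown failures must surface too
            queue.flag(name, f"error: {type(exc).__name__}: {exc}")
        else:
            succeeded += 1
    return succeeded


def fake_process(name: str) -> None:
    # Stand-in for extract + validate + upload, with two made-up failure cases.
    if name == "b.pdf":
        raise Mismatch("total 120.00 vs PO 100.00")
    if name == "c.pdf":
        raise ValueError("no text layer in PDF")


queue = ReviewQueue()
ok = process_all(["a.pdf", "b.pdf", "c.pdf"], fake_process, queue)
```

One failing invoice does not stop the batch, and both expected and unexpected failures end up in the same place for a human to see.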

Tech Stack

Languages	Python (backend), TypeScript (frontend)
Tools	OpenAI SDK, Microsoft Graph API
Infra	AWS EC2, SQLite