From vendor workbooks to a trusted benchmark platform.
Six governed stages from source workbooks to analyst workbench, with a review gate between automated processing and human approval. Each stage is interactive — hover or tap to read more.
The problem:
The firm receives vendor pricing models as semi-structured Excel workbooks, each formatted differently by its provider, with varying column structures, layouts, and data conventions. Comparing models across providers required manual effort, and the institutional context tied to each model was not captured in any governed, reusable workflow.
What we built:
A governed benchmark data platform in Python, designed around reproducibility, traceability, and human oversight. The core is a six-stage workflow: source workbook intake, CLI-based onboarding and inspection, profile/extract/match processing, human review, controlled rebuild/apply into a governed benchmark database, and analyst consumption through a local Flask workbench. Source workbooks are inspected and registered through a centralized onboarding flow that issues stable model IDs and records provenance in a source registry, without ever bypassing human review.
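The onboarding idea can be sketched roughly as follows. This is an illustrative Python sketch, not the project's actual API: the function names, registry format, and the choice of content-hash-based IDs are all assumptions made for the example.

```python
"""Sketch of a centralized onboarding flow: stable model IDs plus a
provenance entry in a JSON source registry. Names are illustrative."""
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def issue_model_id(workbook: Path) -> str:
    """Derive a stable ID from the workbook's content hash, so
    re-onboarding the same file always yields the same ID."""
    digest = hashlib.sha256(workbook.read_bytes()).hexdigest()[:12]
    return f"model-{digest}"


def register_source(workbook: Path, provider: str, registry_path: Path) -> dict:
    """Record provenance in the source registry; the entry stays
    'pending' until a human reviewer approves it."""
    registry = json.loads(registry_path.read_text()) if registry_path.exists() else {}
    entry = {
        "model_id": issue_model_id(workbook),
        "provider": provider,
        "source_file": workbook.name,
        "registered_at": datetime.now(timezone.utc).isoformat(),
        "review_status": "pending",  # nothing ships until a human approves
    }
    registry[entry["model_id"]] = entry
    registry_path.write_text(json.dumps(registry, indent=2))
    return entry
```

Keying the ID off the file's content hash, rather than a counter, is one way to make re-registration idempotent: the same workbook always maps to the same model ID.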
On the production side, a deterministic rebuild/apply system produces governed benchmark database artifacts backed by release manifests, integrity reports, typed data contracts, diffing, backups, and preflight validation. Atomic write protections now cover the governed data tables as well as governance-adjacent artifacts such as the source registry, intake register, release manifest, and integrity report. A GitHub Actions CI workflow runs the non-browser validation suite on every change, and the project currently has 254 passing tests.
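The atomic-write protection mentioned above typically follows a write-temp-then-rename pattern; a minimal sketch of that pattern (the helper name is hypothetical, not the project's actual utility):

```python
"""Sketch of atomic writes for governed artifacts: write to a temp file
in the same directory, fsync, then atomically replace the target so
readers never observe a partially written file."""
import os
import tempfile
from pathlib import Path


def atomic_write_text(target: Path, text: str) -> None:
    # The temp file must live in the same directory (same filesystem)
    # as the target for os.replace() to be an atomic rename.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # ensure bytes reach disk before the swap
        os.replace(tmp_name, target)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp_name)  # clean up the temp file only on failure
        raise
```

With this pattern, a crash mid-write leaves the previous manifest or registry intact rather than a truncated file.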
The consumption layer is a six-page Flask workbench — Overview, Roles, Providers, Confidence, Quality, and Trends — with client-side filtering, AJAX detail panels, manifest-based freshness signaling, and governed CSV exports that include provenance headers. Historical trend analysis is governed by explicit comparability rules: only same-client, same-provider series are compared across model vintages.
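The comparability rule for trend analysis amounts to a grouping constraint, sketched here with illustrative record fields (the real schema is not shown in this write-up):

```python
"""Sketch of the trend-comparability rule: only same-client,
same-provider series are compared across model vintages."""
from itertools import groupby


def comparable_series(records: list[dict]) -> dict:
    """Group benchmark rows into trend series keyed by (client, provider),
    ordered by vintage; rows from different clients or providers are
    never mixed into one series."""
    rows = sorted(records, key=lambda r: (r["client"], r["provider"], r["vintage"]))
    return {
        key: list(group)
        for key, group in groupby(rows, key=lambda r: (r["client"], r["provider"]))
    }
```

Because `groupby` only merges adjacent items, the sort by (client, provider, vintage) both defines the series boundaries and puts each series in vintage order.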
Why it matters:
This is a system where automation handles schema alignment, validation, provenance capture, and drift detection while humans remain in control of every decision that affects benchmark truth. The pipeline can be rebuilt deterministically at any point, every published output is traceable to approved source decisions, and the platform is structured to accumulate benchmark history over time through a governed workflow rather than one-off manual analysis.