Principal Data Reliability Engineer (AI Process Control)
Own the quality math inside the bounded-synthesis pipeline: define tolerances, build telemetry, and set measurable thresholds for the control plane's governance gates.
Stack: TypeScript, Python, PostgreSQL, LLM evaluation pipelines, statistical telemetry
Why this role exists
Most AI teams inspect outputs at the end and hope for the best. We enforce quality at every stage transition. Our bounded-synthesis pipeline treats AI inference like a manufacturing process: if evidence is insufficient or if a claim can't be verified against source text, the Boundary Gate halts emission. No evidence, no output. Somebody has to define what "sufficient" means mathematically, instrument the pipeline to measure it, and diagnose where quality drifts. That's this role.
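The "no evidence, no output" rule can be sketched in a few lines. This is a minimal illustration, not the production gate: the `Claim` type, the `MIN_EVIDENCE_PASSAGES` constant, and the sufficiency rule itself are all placeholder assumptions standing in for the entailment tolerances this role would define.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    evidence: list[str]  # source passages retrieved for this claim

# Hypothetical threshold; the real definition of "sufficient" is the job.
MIN_EVIDENCE_PASSAGES = 2

def boundary_gate(claim: Claim) -> bool:
    """Halt emission unless the claim carries sufficient, non-empty evidence."""
    if len(claim.evidence) < MIN_EVIDENCE_PASSAGES:
        return False  # no evidence, no output
    return all(passage.strip() for passage in claim.evidence)

# A claim with no supporting passages never reaches emission.
assert boundary_gate(Claim(text="The deadline is 30 days.", evidence=[])) is False
```

The point of the sketch: the gate is a deterministic predicate evaluated at a stage transition, not a post-hoc review of finished output.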
What you'll do
- Define governance gate thresholds. Set the entailment tolerances that decide pass/fail at each stage — including zero-tolerance rules for numeric predicates like legal deadlines and dosage limits.
- Build verification algorithms. Write the deterministic extraction and comparison logic that checks LLM claims against source text. Root-clause polarity, numeric extraction, boundary matching — the math that makes "refuse when you should" concrete.
- Instrument telemetry. Track entailment variance across models and Kura corpora. Build control charts. When a model drifts outside control limits, diagnose the exact stage where the defect occurred.
- Tune retrieval math. Optimize the aggregation logic for multi-query retrieval so candidate evidence pools are mathematically sound before they reach the generation step.
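To make the zero-tolerance rule for numeric predicates concrete, here is one possible shape of a deterministic check: extract every numeric literal from the claim with a regex and require each one to appear verbatim in the source text. The function names and the regex are illustrative assumptions, not the pipeline's actual verifier.

```python
import re

def extract_numbers(text: str) -> list[float]:
    """Pull numeric literals out of text deterministically (regex, no LLM)."""
    return [float(m) for m in re.findall(r"-?\d+(?:\.\d+)?", text)]

def verify_numeric_predicate(claim: str, source: str) -> bool:
    """Zero-tolerance check: every number the claim asserts must appear
    in the source text. A single mismatch fails the gate."""
    source_nums = set(extract_numbers(source))
    return all(n in source_nums for n in extract_numbers(claim))

source = "Appeals must be filed within 30 days; the maximum dose is 4 mg."
assert verify_numeric_predicate("The deadline is 30 days.", source)
assert not verify_numeric_predicate("The deadline is 45 days.", source)
```

Because the comparison is exact set membership rather than a similarity score, there is no tolerance band to tune for deadlines or dosages: the claim either matches the source or emission halts.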
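The control-chart work can be pictured with a simplified Shewhart individuals chart over per-batch entailment rates. This sketch uses the plain sample standard deviation for the limits; a production chart would typically estimate sigma from moving ranges, and the baseline numbers here are invented for illustration.

```python
import statistics

def control_limits(samples: list[float], k: float = 3.0) -> tuple[float, float]:
    """Shewhart-style limits: mean +/- k * sigma over a baseline window."""
    mean = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    return mean - k * sigma, mean + k * sigma

def out_of_control(samples: list[float], new_point: float) -> bool:
    """Flag a new observation that falls outside the control limits."""
    lcl, ucl = control_limits(samples)
    return not (lcl <= new_point <= ucl)

# Baseline entailment rates from a stable model (illustrative values).
baseline = [0.91, 0.93, 0.92, 0.94, 0.92, 0.93, 0.91, 0.92]
assert out_of_control(baseline, 0.70) is True   # drift: diagnose the stage
assert out_of_control(baseline, 0.92) is False  # common-cause variation
```

An out-of-limits point is the signal to trace the defect back to a specific stage transition rather than to patch the output.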
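For the multi-query aggregation step, one common, order-invariant way to merge ranked candidate lists from several query rewrites is reciprocal rank fusion. RRF is an illustrative choice here, not necessarily the pipeline's actual aggregation method.

```python
from collections import defaultdict

def rrf_merge(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each list contributes 1/(k + rank) per doc,
    so documents ranked well across many query variants rise to the top."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Three query rewrites return overlapping candidate lists.
pooled = rrf_merge([["d1", "d2", "d3"], ["d2", "d4"], ["d2", "d1"]])
assert pooled[0] == "d2"  # consistently top-ranked across variants
```

The design property that matters for a governance pipeline: fusion depends only on ranks, so the pooled evidence set is reproducible regardless of incomparable raw scores across queries.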
What we're looking for
- Education: Industrial & Systems Engineering, Operations Research, Quality Engineering, or a related quantitative field.
- Process control expertise: Practical knowledge of SPC, Deming's principles, and loss functions applied to real systems — not just textbook knowledge. Experience in high-consequence or regulated environments (defense, healthcare, finance) is strongly preferred.
- Engineering proficiency: Comfortable in Python or TypeScript and able to write complex SQL for telemetry analysis. You can build the middleware that evaluates payloads and enforces the math.
- Deterministic thinking: You're comfortable implementing hard logic (regex, AST parsing, numeric extraction) to verify stochastic outputs.
- Quality mindset: Variation is the enemy. You build the system, not the fix.
Why Kenshiki?
You'll take industrial engineering principles and apply them to AI inference — proving that generative AI can be governed, predictable, and legally defensible. The tolerances you set become the enforcement boundary.