AI Systems Engineer

Wilson Gichuhi

I build deterministic agentic systems and evaluation infrastructure that turn model output into reliable business workflows. Finite state machines, grounded retrieval, and rigorous eval harnesses keep production behavior predictable.

Nairobi, Kenya · +254700652437 · GitHub · LinkedIn
Current Focus · Open to remote roles

Deterministic AI systems with audit ready traces

I work at the boundary between probabilistic models and strict business requirements, building the guardrails that make AI dependable.

  • WhatsApp to eTIMS order processing with immutable state transitions
  • Inference gateways with semantic caching and cold start tuning
  • Evaluation harnesses and adversarial testing for frontier models

Core Stack

TypeScript, Python, Go

Availability

Open to remote roles

Impact

  • 100%: Execution traceability for tax compliant workflows
  • 99%: Malformed outputs removed with schema guards
  • 40%: SKU matching precision gain on noisy text
  • Sub second: Inference gateway latency with semantic caching

Deterministic agents · Evaluation harnesses · Grounded retrieval · Schema validation · Hybrid search · Inference gateways · Finite state machines · Audit ready traces · Quantization · Circuit breakers

What I Build

Systems that stay reliable under pressure

Production focused
Deterministic Agentic Orchestration

Finite state machines and strict state transitions that turn model output into audit ready workflows.

  • XState and Apache Burr orchestration
  • State immutability and full execution traces
  • Tool use grounded by explicit schemas
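A minimal sketch of the idea behind strict state transitions and full execution traces, written in plain Python rather than XState or Apache Burr. The order states, event names, and the `OrderState` and `step` helpers are hypothetical and chosen only for illustration; they are not taken from the production system.

```python
from dataclasses import dataclass, replace
from typing import Tuple

# Hypothetical transition table: (current status, event) -> next status.
TRANSITIONS = {
    ("received", "validate"): "validated",
    ("validated", "invoice"): "invoiced",
    ("invoiced", "confirm"): "completed",
}


@dataclass(frozen=True)  # frozen: a state can never be mutated in place
class OrderState:
    status: str = "received"
    # Append-only trace of (from_status, event, to_status) tuples.
    trace: Tuple[Tuple[str, str, str], ...] = ()


def step(state: OrderState, event: str) -> OrderState:
    """Return a new state for a legal event and reject anything else."""
    key = (state.status, event)
    if key not in TRANSITIONS:
        raise ValueError(f"illegal transition: {event!r} from {state.status!r}")
    next_status = TRANSITIONS[key]
    entry = (state.status, event, next_status)
    return replace(state, status=next_status, trace=state.trace + (entry,))


start = OrderState()
done = step(step(step(start, "validate"), "invoice"), "confirm")
```

Because every step returns a new frozen object, `start` is left untouched and `done.trace` records each transition in order, which is the property an audit trail relies on.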
Evaluation and Governance

Evals as code with adversarial testing, reliability scoring, and production safeguards.

  • Bespoke MiniCheck and NLI based reliability
  • DeepEval and Curator for automated testing
  • LLM as a judge with oracle verification
Retrieval and Search

Hybrid retrieval pipelines that stay precise under noisy, real world inputs.

  • Vector plus BM25 hybrid search
  • Rerankers tuned for domain precision
  • Atomic fact extraction for grounding
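One common way to combine a BM25 ranking with a vector ranking is reciprocal rank fusion. The sketch below assumes that technique for illustration; the source does not say which fusion method is used. The SKU identifiers and the constant `k=60` are placeholder values.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) per list it appears in,
    so items ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword retriever and a vector retriever.
bm25_hits = ["sku-12", "sku-7", "sku-3"]
vector_hits = ["sku-7", "sku-3", "sku-99"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

In a full pipeline the fused list would typically be passed to a reranker, which rescores only the top candidates.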
Inference Infrastructure

Production grade model serving with cost controls and deterministic latency.

  • vLLM, Ollama, Nvidia Triton
  • Quantization with GGUF, EXL2, AWQ
  • Modal gateways with cold start tuning
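Semantic caching can be illustrated with a small in-memory sketch: a response is reused when a new request's embedding is close enough to a cached one. The `SemanticCache` class, the 0.95 threshold, and the two-dimensional toy embeddings are assumptions for illustration; a real gateway would use a proper embedding model and a vector index.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def get(self, embedding):
        """Return the closest cached response, or None below the threshold."""
        best = max(self.entries, key=lambda e: cosine(e[0], embedding), default=None)
        if best is not None and cosine(best[0], embedding) >= self.threshold:
            return best[1]
        return None

    def put(self, embedding, response):
        self.entries.append((list(embedding), response))


cache = SemanticCache(threshold=0.95)
cache.put([1.0, 0.0], "supported")

hit = cache.get([0.99, 0.05])   # near-duplicate request: served from cache
miss = cache.get([0.0, 1.0])    # unrelated request: falls through to the model
```

The threshold trades hit rate against the risk of returning an answer for a request that only looks similar, so it is worth tuning against an eval set.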

Selected Work

Case studies with measurable impact

Promco

Builder and AI Systems Engineer, Founder

2026 to Present

WhatsApp to eTIMS order processing engine built for deterministic execution and audit safe traces.

  • Replaced nondeterministic loops with finite state machines using XState and Apache Burr
  • Sub second inference gateway on Modal with semantic caching for Bespoke MiniCheck
  • Effect TS middleware with schema validation and circuit breakers that removes 99 percent of malformed outputs
  • Hybrid search with Voyage rerankers improving SKU matching precision by 40 percent
Agentic Orchestration · Inference · RAG · MCP
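The schema validation and circuit breaker pattern named in the bullets above can be sketched in plain Python. This is an illustration of the pattern only, not the Effect TS middleware itself; the order-line schema, `parse_order_line`, and the `CircuitBreaker` class are hypothetical.

```python
import json

# Hypothetical schema for one order line: field name -> required type.
REQUIRED = {"sku": str, "quantity": int}


def parse_order_line(raw: str) -> dict:
    """Parse model output and reject anything that does not match the schema."""
    data = json.loads(raw)  # invalid JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected in REQUIRED.items():
        value = data.get(field)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"field {field!r} missing or not {expected.__name__}")
    return data


class CircuitBreaker:
    """Stops calling a function after too many consecutive failures."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0

    @property
    def is_open(self):
        return self.failures >= self.max_failures

    def call(self, fn, *args):
        if self.is_open:
            raise RuntimeError("circuit open: refusing further calls")
        try:
            result = fn(*args)
        except ValueError:
            self.failures += 1
            raise
        self.failures = 0  # any success resets the failure count
        return result


breaker = CircuitBreaker(max_failures=2)
good = breaker.call(parse_order_line, '{"sku": "A-1", "quantity": 3}')
```

Malformed output never reaches downstream steps, and repeated failures trip the breaker so the caller can fall back instead of retrying indefinitely.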
AfterQuery

AI Training and Model Evaluation Engineer

Aug 2025 to Mar 25, 2026

Benchmarking frontier models with realistic eval harnesses and red team pipelines.

  • Evaluated frontier models for logic correctness and security
  • Built Project Anvil style environments with gold patch verification
  • Terminal Bench v2 tasks covering concurrency, security, and infra debugging
  • Automated red teaming for model generated system architectures
Evals · Benchmarking · Security
Alcora

Backend Engineer

Feb 2025 to Aug 2025

Headless ERP and FMCG operations platform for multi tenant revenue ops.

  • Real time stock and demand intelligence across distributed actors
  • Payment rails with automated debt reconciliation
  • Geospatial delivery tracking and dispatch automation
  • Tracking of promo cost versus incremental gross profit, cutting wastage by 40 percent
Fintech · ERP · Geospatial
AfyaTelemed

Founding Engineer

May 2024 to Jul 2025

HIPAA compliant telemedicine platform spanning triage, consults, pharmacy, and labs.

  • Secure video consultations with end to end encryption
  • Local and international payment processing
  • Stack built with Python, FastAPI, Next.js, WebRTC, AWS
HealthTech · WebRTC · Platform

Experience

A focused timeline

2026

Builder and AI Systems Engineer · Promco

Deterministic orchestration and inference systems for tax compliant workflows.

Aug 2025 to Mar 25, 2026

AI Training and Model Evaluation Engineer · AfterQuery

Frontier model benchmarks, eval harnesses, and adversarial testing.

Feb 2025 to Aug 2025

Backend Engineer · Alcora

Headless ERP, payment rails, and FMCG intelligence.

May 2024 to Jul 2025

Founding Engineer · AfyaTelemed

HIPAA compliant telemedicine and secure video workflows.

2024

Software Engineer · McSystems and Medical Inc.

Voice to CRM logging and offline first marketplace rebuild.

2023 to 2024

Fullstack Developer · JHUB Innovation Africa

Digital trade platforms and enterprise workflow integrations.

Operating Principles

Predictable by design

Determinism first. Systems are built with explicit states and traceable transitions.
Evaluation in the loop. Every agent is backed by tests, regression suites, and adversarial scenarios.
Grounding everywhere. Retrieval, schemas, and guardrails keep outputs aligned with reality.

Preferred Collaboration

Async friendly teams shipping reliable AI systems

Toolbox

Stack aligned to reliable AI

Production ready
AI Orchestration and Agents
XState · Apache Burr · Model Context Protocol · Agentic RAG · Tool use and function calling · AI primitives
Evals and Governance
Bespoke MiniCheck · RAGAS · DeepEval · Curator · Evals as code · LLM as a judge
Retrieval and Search
Hybrid search · Rerankers · Atomic fact extraction · Pinecone · Qdrant · pgvector
Inference and Reliability
vLLM · Ollama · Nvidia Triton · Quantization · Docker and Kubernetes · Event driven systems

Let us build

Delivery focused for remote roles

I work best with teams who value clarity, measurable outcomes, and systems that behave the same way every time. If you are scaling AI workflows or building evaluation infrastructure, I would love to help.

Grounded by data

Hybrid retrieval and rerankers that keep responses anchored to source truth.

Fast and reliable

Inference gateways designed for low latency and predictable cost.

Built for global teams

Async execution, clear docs, and collaboration habits.