Areas of Focus

Cody Champion — Dublin-based Applied AI Architect & GenAI Systems Lead
cody@bitsandbeakers.com · LinkedIn · GitHub

In short: I build production GenAI systems for regulated environments, covering RAG, agents, LLM evaluation, observability, security controls, and AI governance. Dublin-based, working across Ireland, the UK, the EU, and global remote teams.

Areas of strongest impact

Staff / Principal Applied AI Architect
Senior GenAI Solutions Architect
LLM Evaluation Lead
AI Governance Engineering Lead
Regulated AI Deployment Lead
Senior Manager, GenAI Deployment Engineering
Applied AI Safety / Evals Lead
Applied AI Research Engineer
Enterprise AI Technical Lead

Strongest environments

Regulated enterprise — financial services, pharma, telco, energy
Public sector and government AI — federal delivery, mission AI, governance-heavy programs
Cloud / SaaS AI platforms — building or scaling GenAI products
Consulting-to-product AI transformation — Deloitte, BCG, PwC, Big 4, boutique AI advisory
AI assurance and model governance — model risk, audit readiness, EU AI Act, responsible AI

Core skills

Applied AI Architect
GenAI Architect
Enterprise GenAI
RAG
Agents
LLMOps
LLM Evaluation
Model Evaluation
AI Safety Evals
Red Teaming
AI Governance
Responsible AI
EU AI Act
Regulated AI
AI Assurance
AI Observability
Prompt Injection Defense
Langfuse
MLflow
Azure AI Foundry
GCP
Python
Dublin
Ireland
EMEA

Four proof pillars

1. NSF AI Governance Foundations

Greenfield AI governance, technical review, security exercises, community, and data architecture inside a newly formed Chief AI Officer function.

2. 99.6% ML Cost Reduction

Hands-on production ML ownership: a research-origin geospatial pipeline rebuilt from approximately $26,200/month to approximately $90/month.

3. LLM Evaluation Workbench / Model Readiness Leaderboard

Public evaluation harness for capability, reliability, governance behavior, groundedness, cost, latency, and reviewable artifacts in regulated GenAI systems.

4. PAEF Contract Compliance Evaluation

Evaluation research comparing atomic microagent checks with monolithic auditing across 193 service contracts and 7,913 labeled policy checks.

Where this applies

Consulting firms & regulated-sector clients — Deloitte, BCG, PwC, banks, insurance, pharma

AI Governance, Assurance & Regulated GenAI

I have built and documented governance structures from scratch inside a federal agency — not advised on them from the outside. At NSF I created an AI deployment playbook, co-chaired the 100+ member AI Community of Practice, engineered the agency's first vector and graph database capabilities, and sat as a voting member on the engineering review board. The outputs were approval processes, risk classification frameworks, audit trails, AI literacy programs, and a governance stack that let the agency move fast without exposing itself to unacceptable risk.

At Accenture I translate that governance experience directly into delivery: clients in regulated sectors get systems with access control, prompt injection defense, human-in-the-loop checkpoints, cost/latency monitoring, and evidence artifacts ready for internal audit or regulatory review. I can map technical controls to governance requirements — not just describe the controls.

Keywords: AI assurance · model risk management · auditability · risk classification · governance-to-engineering · human review checkpoints · control mapping · EU AI Act readiness · AI deployment playbook · evidence artifacts · model monitoring · responsible AI · regulated GenAI deployment · AI transparency · access control · prompt injection defense

Cloud platforms & product companies — Azure, GCP, AWS, SaaS AI, AI infrastructure

Enterprise GenAI Architecture

I design and build end-to-end GenAI systems that run in production, not just demos that impress in a sandbox. The architecture decisions I own most often: retrieval stack design (hybrid vector, keyword, and graph with source attribution), agent and tool orchestration (MCP-based, least-privilege, auditable), evaluation loops (ground truth plus LLM-as-judge sampling), and observability (Langfuse where it fits, conventional telemetry where it does not).
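As a rough illustration of hybrid retrieval with source attribution, the sketch below merges ranked results from three retrievers and records which retriever surfaced each document. It is not code from any delivered system. Reciprocal rank fusion is swapped in here as one common merging technique, and the function name, retriever labels, document IDs, and the constant k=60 are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def fuse_rankings(
    rankings: Dict[str, List[str]], k: int = 60
) -> List[Tuple[str, float, List[str]]]:
    """Merge ranked doc-ID lists using reciprocal rank fusion (RRF).

    `rankings` maps a retriever label to its doc IDs, best first.
    Returns (doc_id, fused_score, contributing_retrievers), best first.
    All names here are hypothetical and chosen for illustration.
    """
    scores: Dict[str, float] = defaultdict(float)
    sources: Dict[str, List[str]] = defaultdict(list)
    for retriever, doc_ids in rankings.items():
        for rank, doc_id in enumerate(doc_ids, start=1):
            # Each retriever contributes 1 / (k + rank) for a document.
            scores[doc_id] += 1.0 / (k + rank)
            # Source attribution: remember who surfaced this document.
            sources[doc_id].append(retriever)
    # Sort by descending score; break ties by doc ID for determinism.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return [(d, scores[d], sources[d]) for d in ordered]


# Made-up result lists from three hypothetical retrievers.
results = fuse_rankings(
    {
        "vector": ["doc-a", "doc-b", "doc-c"],
        "keyword": ["doc-b", "doc-d"],
        "graph": ["doc-b", "doc-a"],
    }
)
for doc_id, score, found_by in results:
    print(f"{doc_id}  score={score:.4f}  found_by={found_by}")
```

Keeping the list of contributing retrievers next to each fused score is what makes the result auditable: a reviewer can see why a passage was retrieved, not only that it was.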

I work primarily across Azure AI Foundry and GCP, with practical LLMOps around quality, safety, latency, and token spend. The 99.6% ML infrastructure cost reduction I achieved at Accenture Federal was not a procurement decision — I personally rewrote the ML codebase and cloud architecture. I can design for scale, explain the cost model, and own the implementation.

Keywords: Applied AI Architect · GenAI Architect · Enterprise GenAI · RAG · agents · tool orchestration · LLMOps · Azure AI Foundry · GCP · cloud-native AI deployment · observability · Langfuse · MLflow · FastAPI · Docker · Kubernetes · token cost · latency optimization · AI evaluation harnesses · production GenAI · agentic workflows · embedding pipelines

AI evaluation and assurance teams — model readiness, governance behavior, AI reliability

LLM Evaluation & AI Assurance

My public LLM evaluation work (llm-eval-workbench and the PAEF Zenodo preprint) is built around regulated-enterprise model readiness. The harness covers capability, reliability, governance behavior, groundedness, security reasoning, cost and latency tracking, and a structured failure taxonomy. Its public scenarios are benign and designed for reviewable evidence rather than safeguard bypass. Every run produces structured artifacts, not just a score.

The PAEF preprint (Zenodo DOI: 10.5281/zenodo.19848867) formalizes a parallelized atomic evaluation framework for contract compliance across multiple models, with token-level margin analysis. A separate embedding benchmark evaluated 33 models across 1,700 arXiv papers, with a cross-domain, reproducible setup built to MLOps standards. I think about evaluation the way an engineer thinks about CI: it runs continuously, it catches regressions, and it produces evidence you can show to a skeptic.
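The CI analogy can be made concrete with a small regression gate. This is an illustrative sketch, not code from llm-eval-workbench: the function name, metric names, scores, and tolerance are all hypothetical, and it assumes every metric is scored so that higher is better.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.02,
) -> List[str]:
    """Return the metrics where the candidate run regressed.

    A metric regresses when it drops below the baseline by more than
    `tolerance`, or when the candidate run did not report it at all.
    Assumes higher scores are better (an assumption of this sketch).
    """
    regressions = []
    for name, base_score in sorted(baseline.items()):
        new_score = candidate.get(name)
        if new_score is None or base_score - new_score > tolerance:
            regressions.append(name)
    return regressions


# Made-up scores for two hypothetical evaluation runs.
baseline = {"groundedness": 0.91, "capability": 0.84, "governance_behavior": 0.95}
candidate = {"groundedness": 0.86, "capability": 0.85, "governance_behavior": 0.94}

regressed = find_regressions(baseline, candidate)
print("regressed metrics:", regressed)
```

In a pipeline, a non-empty result would fail the build, and the list of regressed metrics would be saved alongside the run as part of the evidence trail.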

On the operational side I have built prompt injection defenses, sandboxed tool execution, and access control inside live enterprise GenAI systems. I also have experience explaining AI safety tradeoffs to non-technical stakeholders — governance boards, federal oversight bodies, and clients who need to sign off on deployment risk.

Keywords: LLM evaluation · model readiness · model evaluation · governance behavior · RAG groundedness · access-control reasoning · failure taxonomy · cost/latency tracking · AI observability · prompt injection resilience · contract compliance · evaluation harness · AI reliability · PAEF · atomic evaluation

Outside my focus

Core LLM pretraining research
Low-level ML compiler or runtime work
Junior trust and safety operations
Generalist BI, dashboard, and data-analyst work

Location

Based in Dublin, Ireland, working across Ireland, the UK, EMEA, and global remote teams.

Key artifacts

Case study: NSF AI Governance Foundations - greenfield governance, technical review, community, security, and vector/graph architecture inside NSF's CAIO function
Case study: 99.6% ML infrastructure cost reduction
LLM Evaluation Workbench - regulated-enterprise model-readiness benchmark and harness
PAEF Contract Compliance Evaluation - published evaluation research and Zenodo DOI
Plain-text AI summary - machine-readable profile