Build an eval harness for 184 AI agent prompts with promptfoo

How to build an LLM-as-judge eval system that scores AI agent prompts on quality, identity, and safety.

March 30, 2026 · 9 min · Russell

Git hooks are your best defense against AI-generated mess

Git hooks have always enforced standards before code enters a repo. With AI agents writing commits autonomously, they’ve become essential.

March 16, 2026 · 4 min · Russell

Skills for applying codified context to your own codebase

Two Claude Code skills for applying and maintaining the three-tier codified context architecture — what they do, how they work, and how to get started.

March 14, 2026 · 8 min · Russell

Cold memory: specs, MCP tools, and on-demand context retrieval

How subsystem specs and MCP retrieval tools handle architectural knowledge too large for hot memory — and why stale specs are worse than no specs.

March 13, 2026 · 7 min · Russell

Domain specialist skills: teaching AI to think like your senior dev

What specialist skills are, why the 50% domain knowledge rule matters, and how waaseyaa’s spec-backed orchestration keeps AI consistent across a 29-package PHP monorepo.

March 12, 2026 · 6 min · Russell

Writing a CLAUDE.md that actually works

How to structure your CLAUDE.md as a routing layer so AI agents always know where to look.

March 11, 2026 · 6 min · Russell

Why AI agents lose their minds in complex codebases

Token limits aren’t the real problem with AI in large codebases — inconsistent context is. Here’s what breaks and why a three-tier architecture fixes it.

March 10, 2026 · 5 min · Russell