Build an eval harness for 184 AI agent prompts with promptfoo
How to build an LLM-as-judge eval system that scores AI agent prompts on quality, identity, and safety.
Git hooks have always enforced standards before code enters a repo. With AI agents writing commits autonomously, they’ve become essential.
Two Claude Code skills for applying and maintaining the three-tier codified context architecture — what they do, how they work, and how to get started.
How subsystem specs and MCP retrieval tools handle architectural knowledge too large for hot memory — and why stale specs are worse than no specs.
What specialist skills are, why the 50% domain knowledge rule matters, and how waaseyaa’s spec-backed orchestration keeps AI consistent across a 29-package PHP monorepo.
How to structure your CLAUDE.md as a routing layer so AI agents always know where to look.
Token limits aren’t the real problem with AI in large codebases — inconsistent context is. Here’s what breaks and why a three-tier architecture fixes it.