6 Lessons From Building AI Systems That Run in Production

What I learned building AI-powered developer tools that run 24/7 — from architecture decisions to failure modes to what actually matters.


I’ve spent the past year building AI-powered developer tooling that runs continuously in production. Not a chatbot. Not a wrapper around an API. Real infrastructure — services that coordinate work, anticipate needs, and recover from failures without human intervention.

Here’s what I learned.

Lesson 1: Local Models Need Tools, Not Just Prompts

The first version of my AI assistant was a local LLM used as a chatbot. It could answer questions and hold conversations, but it couldn’t do anything. It had no awareness of what I was working on, what failed, or what was coming next.

The breakthrough was giving the local model tools — function-calling capabilities that let it query real data sources and take real actions. Once the model could look up project state, search a knowledge base, and read activity logs, it went from generic chatbot to genuinely useful assistant.

The architectural insight: the model is the reasoning layer, not the knowledge layer. Keep the model small and fast for reasoning. Put the knowledge in structured, queryable systems that the model can access through tools.
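The tool layer can be sketched as a small registry that maps tool names to plain functions, with a dispatcher that executes whatever call the model emits. This is a minimal illustration, not the article's actual implementation — the tool names, return strings, and call format here are assumptions, and real function-calling formats vary by model and runtime.

```python
# Minimal sketch of a tool registry for a local model.
from typing import Callable

TOOLS: dict[str, Callable[..., str]] = {}

def tool(name: str):
    """Register a function so the model can call it by name."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return register

@tool("project_state")
def project_state(project: str) -> str:
    # In a real system this would query a database or service,
    # not return a canned string.
    return f"{project}: build green, 2 open tasks"

@tool("search_knowledge")
def search_knowledge(query: str) -> str:
    return f"3 entries matching '{query}'"

def dispatch(call: dict) -> str:
    """Execute a tool call emitted by the model, e.g.
    {"name": "project_state", "args": {"project": "api"}}."""
    fn = TOOLS.get(call["name"])
    if fn is None:
        return f"unknown tool: {call['name']}"
    return fn(**call.get("args", {}))
```

In use, the runtime would describe the registered tools in the model's prompt, parse the model's tool-call output into a `call` dict, run `dispatch`, and feed the result back into the context — the model only ever reasons over what the tools return.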

Lesson 2: Event-Driven Beats Polling

The first version of the monitoring system polled for changes every few seconds. File watchers checked for modifications. Cron jobs scraped state.

This approach was fragile, slow, and wasteful of resources. The fix was switching to an event-driven architecture: components emit structured events when something significant happens, and services subscribe to the events they care about.

The difference is dramatic. An event-driven system can react to a build failure in under 3 seconds — checking for known fixes, correlating with recent changes, and surfacing relevant context. A polling system might not even notice for another 5-10 seconds.
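The core of such a system can be sketched as a tiny in-process event bus — components publish to a topic, subscribers react immediately, and no polling loop exists anywhere. The topic name and event fields below are illustrative assumptions; a production version would add async delivery and error isolation per handler.

```python
# Minimal in-process event bus: emit structured events, react immediately.
from collections import defaultdict
from typing import Callable

class EventBus:
    def __init__(self) -> None:
        self._subs: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subs[topic].append(handler)

    def emit(self, topic: str, event: dict) -> None:
        # Deliver synchronously to every subscriber of this topic.
        for handler in self._subs[topic]:
            handler(event)

bus = EventBus()
reactions: list[str] = []
bus.subscribe("build.failed", lambda e: reactions.append(f"check known fixes for {e['target']}"))
bus.subscribe("build.failed", lambda e: reactions.append(f"correlate {e['target']} with recent changes"))

bus.emit("build.failed", {"target": "api-server"})
# Both subscribers react the moment the event is emitted.
```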

Lesson 3: Voice Interfaces Need Strict Gating

An always-listening voice interface that speaks unprompted sounds great in theory. In practice, it’s incredibly annoying if the gating isn’t right.

Effective voice systems need layered gates:

  1. Activity detection — is someone actually speaking?
  2. Intent recognition — is this directed at the system?
  3. Context awareness — is this a good time to respond?
  4. Throttling — don’t repeat the same notification within a short window
  5. Significance scoring — is this worth interrupting for?

The last gate matters most. Not every event deserves a spoken announcement. The system needs to evaluate whether the information is significant enough to justify the interruption, given what the user is currently doing.

Lesson 4: Memory Must Be Cross-Session

AI sessions are ephemeral. They start with a blank context and end when the window closes. Every decision, every correction, every lesson learned — gone.

Effective AI tooling needs persistent memory with multiple layers:

  • Session state — what’s happening right now across the environment
  • Knowledge base — structured entries for past decisions, mistakes, and patterns
  • Prediction — anticipating what the user will need based on historical patterns

When sessions start with context instead of starting cold, the experience transforms. The system knows what you were working on, what happened since your last session, and what’s likely coming next.
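Assembling startup context from those layers can be as simple as concatenating them into a block the session begins with. In this sketch, plain dicts and lists stand in for real persistent storage, and every field name and entry is an illustrative assumption.

```python
# Sketch: build a new session's starting context from persistent layers.
import json

# Stand-ins for persistent stores (a real system would read these from disk/DB).
session_state = {"active_project": "api-server", "last_command": "pytest"}
knowledge_base = [
    {"kind": "mistake", "text": "migrations must run before deploy"},
    {"kind": "decision", "text": "use an event bus instead of polling"},
]
predictions = ["likely next: rerun the failing test suite"]

def build_startup_context() -> str:
    """Concatenate the memory layers into one context block."""
    return "\n\n".join([
        "## Session state\n" + json.dumps(session_state),
        "## Knowledge base\n" + "\n".join(
            f"- [{e['kind']}] {e['text']}" for e in knowledge_base
        ),
        "## Predictions\n" + "\n".join(f"- {p}" for p in predictions),
    ])
```

The new session is then primed with `build_startup_context()` before the first user message, so it never starts blank.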

Lesson 5: Autonomous Services Need Circuit Breakers

When you have multiple services running continuously, failures cascade: one dependency goes down, a service can’t reach it, retries pile up, resources spike, and everything degrades.

Circuit breakers prevent this. Track consecutive failures per dependency:

  • Closed (normal): requests flow through
  • Open (after N failures): requests are immediately rejected with a fallback
  • Half-open (after timeout): one test request checks if recovery happened

This turns cascading failures into graceful degradation. Individual components can fail without taking down the whole system.

Lesson 6: Build for Yourself First

The most valuable lesson: be the user. Every feature in the system exists because I needed it — not because it seemed impressive or technically interesting. The features that sound cool but don’t solve a real problem? Those got cut.

Building for yourself eliminates the most dangerous failure mode in software: building something nobody wants.

What I’d Do Differently

If I were starting over:

  • Start with the data model, not the AI. The structured knowledge base and event system matter more than which model you use. Get the data right first.
  • Design for failure from day one. Circuit breakers, health checks, and graceful degradation aren’t features — they’re requirements for anything that runs continuously.
  • Keep the model layer thin. The AI should reason and orchestrate, not store state or manage data. That’s what databases and event systems are for.
  • Resist the urge to add complexity. A system that runs reliably on a single machine is worth more than a distributed architecture you can’t debug.

The best AI systems aren’t impressive demos. They’re quiet, reliable tools that make their users more effective — and stay running when things go wrong.