The Problem
Why standard chatbots and RAG pipelines fail for complex business workflows.
Rigid State Machines
Traditional chatbots use hardcoded finite state machines (FSMs) that break on multi-intent messages. When a user says "Wroclaw, women, laser hair removal" in a single message, an FSM can only consume one slot per turn, forcing the user through three separate prompts.
RAG Hallucinations
Standard Retrieval-Augmented Generation retrieves context but still lets the LLM fabricate prices, services, and availability that don't exist in the database.
Context Loss
Stateless chatbots forget user preferences between messages. A returning customer has to re-specify their studio, gender, and preferred device every session.
No Observability
When something goes wrong, there's no trace of which tool was called, what the LLM decided, or why the user got a wrong answer. Debugging is guesswork.
The Solution: Brain-First Architecture
A single fine-tuned LLM reads a comprehensive system prompt and decides which tools to call — no hardcoded state machines, no rigid flows.
User Message
Incoming message from any channel (Web, Telegram, Instagram, Facebook)
Load Session
Restore conversation context, preferences, cart, booking history from PostgreSQL
Build Context
Construct dynamic system prompt with current studio, gender, cart, pending questions
Brain LLM
Fine-tuned GPT-4.1 Mini decides which tools to call based on user intent
Tool Execution
Up to 5 sequential tool calls with context sync between each step via prepareStep
Response
Natural language answer grounded in database facts, with UI selectors when needed
Key Innovation: prepareStep
Before each LLM step, prepareStep rebuilds the system prompt with real-time state from tool executions. When the user says "Wroclaw, women, laser" — the LLM processes it in 4 sequential steps, with each step seeing the updated context from the previous tool call.
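The prepareStep idea can be sketched in a few lines: before every LLM step, rebuild the system prompt from whatever state the previous step's tools mutated. The names here (SessionState, buildSystemPrompt) are illustrative, not the production API.

```typescript
// Minimal sketch of the prepareStep pattern: the system prompt is a pure
// function of session state, so each step sees the previous step's results.
interface SessionState {
  studio?: string;
  gender?: string;
  cart: string[];
}

function buildSystemPrompt(state: SessionState): string {
  const parts = ["You are a booking assistant. Answer only from tool results."];
  if (state.studio) parts.push(`Current studio: ${state.studio}.`);
  if (state.gender) parts.push(`Client gender: ${state.gender}.`);
  if (state.cart.length > 0) parts.push(`Cart: ${state.cart.join(", ")}.`);
  return parts.join(" ");
}

// Called before every LLM step, mirroring the AI SDK's prepareStep hook:
function prepareStep(state: SessionState): { system: string } {
  return { system: buildSystemPrompt(state) };
}

// Step 1's tool call set the studio; step 2's prompt already reflects it.
const state: SessionState = { cart: [] };
state.studio = "Wroclaw";
console.log(prepareStep(state).system);
```

Because the prompt is derived from state rather than appended to, no step can drift out of sync with what the tools actually did.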
The Tech Stack
Production-grade infrastructure for a serverless AI agent at scale.
AI / LLM
Backend
Channels
Infrastructure
21 Orchestrated Tools
The Brain LLM has access to 21 specialized tools organized into 5 categories. Each tool validates its own context and returns typed results.
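The contract each tool follows can be sketched as below. The Tool interface and findServices tool are hypothetical stand-ins (the production code defines tools through the Vercel AI SDK), but the shape is the same: typed input, self-contained context validation, typed result.

```typescript
// Illustrative shape of one tool in the registry: it validates its own
// context before doing any work and returns a typed result either way.
interface ToolResult<T> {
  ok: boolean;
  data?: T;
  error?: string;
}

interface Tool<I, O> {
  name: string;
  execute(input: I, ctx: { studio?: string }): ToolResult<O>;
}

const findServices: Tool<{ query: string }, string[]> = {
  name: "findServices",
  execute(input, ctx) {
    // Each tool checks its required context instead of trusting the LLM.
    if (!ctx.studio) return { ok: false, error: "studio not selected" };
    const catalog = ["laser hair removal", "laser facial"]; // stand-in for a DB query
    return { ok: true, data: catalog.filter((s) => s.includes(input.query)) };
  },
};
```

Returning a structured error instead of throwing lets the Brain LLM read the failure and ask the user for the missing context.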
Near-Zero Hallucination Strategy
Six layers of protection ensure the agent never fabricates information.
Database-Only Policy
System prompt explicitly forbids inventing prices, services, studios, or availability. All factual queries must go through tools.
Tool-Level Validation
Each tool validates required context. addToCart checks serviceId against lastShownServices. getBookingLink validates time slots against lastAvailabilityTimes.
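The addToCart check can be sketched as follows, assuming a simplified session shape: the tool only accepts a serviceId that was actually shown to the user earlier in the session, so the LLM cannot add an invented service.

```typescript
// Sketch of tool-level validation: serviceId must come from the list of
// services the agent actually displayed (lastShownServices).
interface Session {
  lastShownServices: { id: string; name: string }[];
  cart: string[];
}

function addToCart(session: Session, serviceId: string): { ok: boolean; error?: string } {
  const known = session.lastShownServices.some((s) => s.id === serviceId);
  if (!known) {
    // Anything outside lastShownServices is rejected, hallucinated ids included.
    return { ok: false, error: `unknown serviceId: ${serviceId}` };
  }
  session.cart.push(serviceId);
  return { ok: true };
}
```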
Two-Stage Vector Search
Context-aware search with studio/gender/country filters first. Global fallback with lower threshold (0.25) only if contextual search returns nothing.
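The two-stage flow can be sketched like this. The searchVectors function stands in for the real embedding query, and the stage-one threshold of 0.5 is an assumption; only the 0.25 fallback threshold comes from the source.

```typescript
// Sketch of two-stage retrieval: strict filtered pass first, global
// fallback with a looser threshold only when the first pass is empty.
interface Doc {
  text: string;
  score: number;
  studio?: string;
  gender?: string;
}

function searchVectors(
  docs: Doc[],
  minScore: number,
  filter?: { studio: string; gender: string },
): Doc[] {
  return docs.filter(
    (d) =>
      d.score >= minScore &&
      (!filter || (d.studio === filter.studio && d.gender === filter.gender)),
  );
}

function twoStageSearch(docs: Doc[], ctx: { studio: string; gender: string }): Doc[] {
  const contextual = searchVectors(docs, 0.5, ctx); // stage 1: filtered, strict (assumed threshold)
  if (contextual.length > 0) return contextual;
  return searchVectors(docs, 0.25); // stage 2: global fallback, lower threshold
}
```

Running the strict pass first means a contextual hit always wins; the looser global pass can never shadow studio-specific content.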
FAQ Direct Answer
High-similarity FAQ matches (>= 0.85) return the stored answer directly without LLM rewriting, eliminating any chance of distortion.
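The short-circuit is simple enough to sketch directly; the FaqMatch shape is illustrative:

```typescript
// Sketch of the FAQ short-circuit: a match at or above 0.85 similarity
// returns the stored answer verbatim, bypassing the LLM entirely.
interface FaqMatch {
  question: string;
  answer: string;
  similarity: number;
}

function answerFromFaq(matches: FaqMatch[]): string | null {
  const best = [...matches].sort((a, b) => b.similarity - a.similarity)[0];
  if (best && best.similarity >= 0.85) return best.answer; // verbatim, no rewriting
  return null; // fall through to the normal LLM path
}
```

Skipping the LLM on high-confidence matches is also the cheapest path: no tokens are generated at all.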
Injection Detection
Pre-LLM guard layer with pattern matching for prompt injection attempts, 500-char message limit, and DB-based rate limiting for serverless.
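A minimal version of that guard might look like this. The patterns here are illustrative examples, and the DB-backed rate limiter is omitted; only the 500-character cap is taken from the source.

```typescript
// Sketch of the pre-LLM guard: cheap pattern checks plus a hard length
// cap, run before the message ever reaches the model.
const INJECTION_PATTERNS = [
  /ignore (all )?previous instructions/i,
  /you are now/i,
  /system prompt/i,
];

function guardMessage(text: string): { allowed: boolean; reason?: string } {
  if (text.length > 500) return { allowed: false, reason: "message too long" };
  for (const p of INJECTION_PATTERNS) {
    if (p.test(text)) return { allowed: false, reason: "possible prompt injection" };
  }
  return { allowed: true };
}
```

Running this before the LLM call means a blocked message costs zero tokens and never touches the tool layer.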
Incident Snapshots
On 3+ consecutive failures, user dislikes, or complaints — a SupportIncident is created with full conversation snapshot, independent from conversation lifecycle.
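The trigger logic can be sketched as below; the Message and SupportIncident shapes are illustrative, and the complaint path is folded into the dislike flag for brevity.

```typescript
// Sketch of the incident trigger: 3+ consecutive failures or a dislike
// snapshots the whole conversation into an independent record.
interface Message {
  role: "user" | "assistant";
  text: string;
}
interface SupportIncident {
  reason: string;
  snapshot: Message[];
  createdAt: Date;
}

function maybeCreateIncident(
  consecutiveFailures: number,
  disliked: boolean,
  history: Message[],
): SupportIncident | null {
  if (consecutiveFailures >= 3 || disliked) {
    return {
      reason: disliked ? "user_dislike" : "consecutive_failures",
      // Copy the messages so later conversation changes cannot mutate the snapshot.
      snapshot: history.map((m) => ({ ...m })),
      createdAt: new Date(),
    };
  }
  return null;
}
```

Copying rather than referencing the history is what makes the incident independent of the conversation lifecycle: the conversation can be reset or handed off without losing the evidence.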
Key Architectural Decisions
The reasoning behind critical design choices.
Brain-First over FSM
Single LLM agent decides all flows, enabling flexibility for multi-intent requests and natural conversations instead of rigid state transitions.
Native Multi-Step (Vercel AI SDK)
Uses generateText with stopWhen + prepareStep instead of custom loop. prepareStep rebuilds the system prompt before each LLM step with real-time state changes.
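The semantics of that loop can be simulated without the SDK. The fake model below stands in for a real LLM call; the step cap plays the role of a stopWhen condition like stepCountIs(5).

```typescript
// Minimal simulation of the generateText step loop: rebuild the prompt
// before each step (prepareStep), stop at the cap (stopWhen) or when the
// model answers without calling a tool.
type Step = { system: string; calledTool: boolean };

function runSteps(
  maxSteps: number,
  prepareStep: (stepIndex: number) => string,
  model: (system: string, stepIndex: number) => { calledTool: boolean },
): Step[] {
  const steps: Step[] = [];
  for (let i = 0; i < maxSteps; i++) {           // stopWhen: step count reached the cap
    const system = prepareStep(i);               // rebuild the prompt with fresh state
    const out = model(system, i);
    steps.push({ system, calledTool: out.calledTool });
    if (!out.calledTool) break;                  // model produced a final answer
  }
  return steps;
}
```

Using the SDK's native loop instead of a hand-rolled one like this keeps streaming, usage accounting, and tracing consistent across steps.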
Session vs Conversation split
Session is temporary (24h TTL) for cart/booking state. Conversation is permanent for preferences, studio, gender, and booking history that persists across sessions.
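The split can be sketched with two shapes and a TTL check. In production both live in PostgreSQL via Prisma; the names and fields here are illustrative.

```typescript
// Sketch of the Session/Conversation split: an expired Session is simply
// discarded on load, while the Conversation record is never reset.
interface Session {
  cart: string[];
  updatedAt: Date;
} // ephemeral: cart and in-flight booking state
interface Conversation {
  studio?: string;
  gender?: string;
  bookings: string[];
} // permanent: preferences and booking history

const SESSION_TTL_MS = 24 * 60 * 60 * 1000; // 24h

function loadSession(existing: Session | null, now: Date): Session {
  if (!existing || now.getTime() - existing.updatedAt.getTime() > SESSION_TTL_MS) {
    return { cart: [], updatedAt: now }; // expired or absent: start fresh
  }
  return existing;
}
```

A returning customer therefore gets an empty cart after a day but keeps their studio, gender, and booking history from the Conversation record.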
No LangChain / LlamaIndex
Direct Vercel AI SDK + Prisma for simpler, more transparent architecture. Full control over prompt construction, tool execution, and state management.
Fine-tuned model over prompting
Custom training data pipeline (merge, validate, audit, quality-check scripts) for consistent tool calling behavior and tone across 6 languages.
Permanent booking history
Never reset, even after handoff or session restart. Enables repeat booking, personalization, and analytics across the entire customer lifecycle.
The Outcome
$0.015 per Conversation
Fine-tuned GPT-4.1 Mini keeps the cost per dialog roughly 100x lower than a human operator's
~4s Average Response
Even with 2-5 sequential tool calls, prompt caching keeps the average reply around four seconds
Near-Zero Hallucinations
Brain-first architecture with tool-level validation ensures every fact comes from the database
18 Studios, 5 Countries
Live in production across the entire European chain with full localization in 6 languages
4 Channels, 1 Context
Website, Telegram, Instagram, Facebook — unified conversation and booking history
Complete Observability
Langfuse traces every tool call, token usage, and cost per conversation in real-time
Interested in building something similar?