Case Study

Building a Production AI Agent

Brain-First Architecture with 21 Orchestrated Tools

How I built an enterprise AI agent for a European beauty chain that handles booking, knowledge search, cart management, and customer support across 18 studios in 5 countries - with near-zero hallucinations.

AI Architect - Built from scratch
21
Orchestrated Tools
booking, cart, knowledge, support
5
Countries
18 studios, 6 languages
4
Channels
Web, Telegram, Instagram, FB
~0
Hallucinations
brain-first control flow

The Problem

Why standard chatbots and RAG pipelines fail for complex business workflows.

Problem 01

Rigid State Machines

Traditional chatbots use hardcoded FSMs that break on multi-intent messages. When a user says "Wroclaw, women, laser hair removal" in a single message, an FSM can only advance one step at a time and drops the rest.

Problem 02

RAG Hallucinations

Standard Retrieval-Augmented Generation retrieves context but still lets the LLM fabricate prices, services, and availability that don't exist in the database.

Problem 03

Context Loss

Stateless chatbots forget user preferences between messages. A returning customer has to re-specify their studio, gender, and preferred device every session.

Problem 04

No Observability

When something goes wrong, there's no trace of which tool was called, what the LLM decided, or why the user got a wrong answer. Debugging is guesswork.

The Solution: Brain-First Architecture

A single fine-tuned LLM reads a comprehensive system prompt and decides which tools to call - no hardcoded state machines, no rigid flows.

01

User Message

Incoming message from any channel (Web, Telegram, Instagram, Facebook)

02

Load Session

Restore conversation context, preferences, cart, booking history from PostgreSQL

03

Build Context

Construct dynamic system prompt with current studio, gender, cart, pending questions

04

Brain LLM

Fine-tuned GPT-4.1 Mini decides which tools to call based on user intent

05

Tool Execution

Up to 5 sequential tool calls with context sync between each step via prepareStep

06

Response

Natural language answer grounded in database facts, with UI selectors when needed

Key Innovation: prepareStep

Before each LLM step, prepareStep rebuilds the system prompt with real-time state from tool executions. When the user says "Wroclaw, women, laser", the LLM processes it in 4 sequential steps, each step seeing the updated context from the previous tool call.
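The idea can be sketched in a few lines. This is an illustrative mock, not the production code: the real agent uses the AI SDK's prepareStep hook, while here the tool executions and prompt fields are stand-ins showing how each step sees state updated by the previous one.

```typescript
// Mutable session state that tools update between LLM steps.
interface SessionState {
  studio?: string;
  gender?: string;
  cart: string[];
}

// Rebuilt before every step - this is what prepareStep would inject.
function buildSystemPrompt(state: SessionState): string {
  return [
    "You are the booking assistant.",
    `Studio: ${state.studio ?? "not selected"}`,
    `Gender: ${state.gender ?? "not selected"}`,
    `Cart: ${state.cart.length ? state.cart.join(", ") : "empty"}`,
  ].join("\n");
}

// Simulated tool calls for "Wroclaw, women, laser": each mutates state,
// and the next step's prompt is rebuilt from the updated state.
type Tool = (state: SessionState) => void;
const toolCalls: Tool[] = [
  (s) => { s.studio = "Wroclaw"; },              // studio finder
  (s) => { s.gender = "women"; },                // gender selector
  (s) => { s.cart.push("laser hair removal"); }, // add to cart
];

const state: SessionState = { cart: [] };
const prompts: string[] = [];
for (const runTool of toolCalls) {
  runTool(state);
  prompts.push(buildSystemPrompt(state)); // prompt the next step would see
}
```

Each entry in `prompts` reflects the state after the preceding tool call, which is exactly why a single multi-intent message can be resolved across sequential steps.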

The Tech Stack

Production-grade infrastructure for a serverless AI agent at scale.

AI / LLM

GPT-4.1 Mini - Fine-tuned
Vercel AI SDK - Native multi-step with generateText
Langfuse - LLM observability & tracing
OpenAI Embeddings - text-embedding-3-small

Backend

Next.js 16 - App Router, serverless on Vercel
PostgreSQL - With pgvector extension
Prisma 7 - Multi-schema + Accelerate edge
OpenTelemetry - Distributed tracing

Channels

Website Widget - CORS-protected API
Telegram Bot API - Webhook handler
Instagram / Facebook - Meta webhooks
Admin Panel - User management, debug UI

Infrastructure

Vercel - Serverless deployment + Cron
Resend - Transactional email delivery
React 19 - Admin panel & widget UI
Vitest - Tool & integration testing

21 Orchestrated Tools

The Brain LLM has access to 21 specialized tools organized into 5 categories. Each tool validates its own context and returns typed results.

Studio & Location

3 tools
Studio Finder

7-priority fuzzy search chain (handles misspellings, districts, cross-locale)

Studio Details

Address, hours, phone, Google Maps link, video directions

Country List

Countries with active studios

Services & Products

5 tools
Service Browser

Zones (legs, face, bikini) or procedures (cosmetology)

Price Lookup

Structured price overview with device comparison support

Subscription Browser

Subscription packages with calculated savings

Device Info

Full device specs, comparisons, contraindications

Upsell Prompt

"Want to add more zones?" with yes/no selector

Cart & Booking

5 tools
Add to Cart

Validates serviceId against lastShownServices

Remove from Cart

Clear specific item from cart

Availability Check

Available time slots for a specific date

Booking Link

Generates booking URL, validates time against availability

Repeat Booking

Re-book from permanent booking history

Knowledge & Support

3 tools
Knowledge Search

Two-stage semantic search: context-first, then global fallback

Human Handoff

Escalates to human support, creates incident snapshot

Consultation Offer

Proactive consultation offer before/after procedure selection

Flow Control

5 tools
Gender Selector

Gender selection with UI selector when unset

Category Picker

Category selection with automatic device inference

Device Resolver

Device inference from category via DEVICE_TO_CATEGORY mapping

Cart Overview

Display current cart with totals

Session Reset

Clear session after completing booking
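The device resolver above relies on a static mapping from device to category. A minimal sketch of that inference, with hypothetical device and category names (the real DEVICE_TO_CATEGORY entries are not shown in this case study):

```typescript
// Hypothetical entries - illustrative only, not the production mapping.
const DEVICE_TO_CATEGORY: Record<string, string> = {
  "alexandrite-laser": "laser-hair-removal",
  "diode-laser": "laser-hair-removal",
  "hydrafacial": "cosmetology",
};

// Invert the mapping: given a category the user picked, which devices apply?
function inferDevices(category: string): string[] {
  return Object.entries(DEVICE_TO_CATEGORY)
    .filter(([, cat]) => cat === category)
    .map(([device]) => device);
}
```

Because the mapping is data rather than LLM output, the category picker can resolve devices deterministically without an extra model call.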

Near-Zero Hallucination Strategy

Six layers of protection ensure the agent never fabricates information.

Database-Only Policy

System prompt explicitly forbids inventing prices, services, studios, or availability. All factual queries must go through tools.

Tool-Level Validation

Each tool validates required context. addToCart checks serviceId against lastShownServices. getBookingLink validates time slots against lastAvailabilityTimes.
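The addToCart check can be sketched as follows. The result shape and parameter names are assumptions based on the description; the point is that the tool rejects any serviceId the user was never shown, so the LLM cannot smuggle in an invented service.

```typescript
interface ToolResult {
  ok: boolean;
  error?: string;
}

// Only services from the last tool result (lastShownServices) are addable.
function addToCart(
  serviceId: string,
  lastShownServices: string[],
  cart: string[]
): ToolResult {
  if (!lastShownServices.includes(serviceId)) {
    return {
      ok: false,
      error: `serviceId "${serviceId}" was not among the last shown services`,
    };
  }
  cart.push(serviceId);
  return { ok: true };
}
```

The error result flows back to the LLM as a tool response, prompting it to re-show services instead of confirming a phantom item.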

Two-Stage Vector Search

Context-aware search with studio/gender/country filters first. Global fallback with lower threshold (0.25) only if contextual search returns nothing.
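A rough sketch of the two-stage logic over in-memory vectors. The 0.25 fallback threshold comes from the description above; the 0.5 contextual threshold and the document shape are assumptions for illustration (production uses pgvector, not an array scan).

```typescript
interface Doc {
  text: string;
  studio?: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function search(query: number[], docs: Doc[], studio: string): Doc[] {
  const above = (pool: Doc[], threshold: number) =>
    pool.filter((d) => cosine(query, d.embedding) >= threshold);
  // Stage 1: context-filtered search at the normal threshold (assumed 0.5).
  const contextual = above(docs.filter((d) => d.studio === studio), 0.5);
  if (contextual.length > 0) return contextual;
  // Stage 2: global fallback, lower threshold (0.25), only if stage 1 is empty.
  return above(docs, 0.25);
}
```

Running the stricter contextual stage first keeps answers grounded in the user's studio and country; the fallback only widens the net when the filtered pool has nothing relevant.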

FAQ Direct Answer

High-similarity FAQ matches (>= 0.85) return the stored answer directly without LLM rewriting, eliminating any chance of distortion.
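The short-circuit is simple: above the 0.85 threshold the stored answer is returned verbatim and the LLM never touches it. A sketch, with an assumed match shape:

```typescript
interface FaqMatch {
  question: string;
  answer: string;
  similarity: number; // cosine similarity of the user query to the FAQ entry
}

// Returns the stored answer verbatim when the best match clears 0.85,
// otherwise null (meaning: fall through to the normal LLM path).
function answerFromFaq(matches: FaqMatch[]): string | null {
  const best = [...matches].sort((a, b) => b.similarity - a.similarity)[0];
  return best && best.similarity >= 0.85 ? best.answer : null;
}
```

Skipping the rewrite step trades a little tonal polish for a guarantee that prices and policies in FAQ answers arrive exactly as stored.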

Injection Detection

Pre-LLM guard layer with pattern matching for prompt injection attempts, 500-char message limit, and DB-based rate limiting for serverless.
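A minimal sketch of the guard, assuming illustrative injection patterns (the production pattern list is not published here). The 500-character limit is from the description; rate limiting is omitted since it lives in the database.

```typescript
const MAX_MESSAGE_LENGTH = 500;

// Illustrative patterns only - a real guard list would be broader.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all |any )?(previous|prior) instructions/i,
  /you are now/i,
  /system prompt/i,
];

interface GuardVerdict {
  allowed: boolean;
  reason?: string;
}

// Runs before the LLM ever sees the message.
function guardMessage(message: string): GuardVerdict {
  if (message.length > MAX_MESSAGE_LENGTH) {
    return { allowed: false, reason: "message exceeds 500 characters" };
  }
  const hit = INJECTION_PATTERNS.find((p) => p.test(message));
  return hit
    ? { allowed: false, reason: "possible prompt injection" }
    : { allowed: true };
}
```

Blocked messages get a canned response and never reach the Brain, so the system prompt cannot be probed or overridden through the chat channel.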

Incident Snapshots

On 3+ consecutive failures, user dislikes, or complaints - a SupportIncident is created with full conversation snapshot, independent from conversation lifecycle.

Key Architectural Decisions

The reasoning behind critical design choices.

Brain-First over FSM

Single LLM agent decides all flows, enabling flexibility for multi-intent requests and natural conversations instead of rigid state transitions.

Native Multi-Step (Vercel AI SDK)

Uses generateText with stopWhen + prepareStep instead of custom loop. prepareStep rebuilds the system prompt before each LLM step with real-time state changes.

Session vs Conversation split

Session is temporary (24h TTL) for cart/booking state. Conversation is permanent for preferences, studio, gender, and booking history that persists across sessions.
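The split can be sketched with a TTL check on load. The 24h TTL is from the description; the field names are assumptions.

```typescript
const SESSION_TTL_MS = 24 * 60 * 60 * 1000; // 24h, per the session TTL above

// Temporary: cart and in-flight booking state.
interface Session {
  cart: string[];
  updatedAt: number; // epoch ms
}

// Permanent: survives session expiry and handoff.
interface Conversation {
  studio?: string;
  gender?: string;
  bookingHistory: string[];
}

// On each message: reuse the session if fresh, otherwise start a clean one.
// The Conversation is loaded separately and is never reset here.
function loadSession(stored: Session | null, now: number): Session {
  if (!stored || now - stored.updatedAt > SESSION_TTL_MS) {
    return { cart: [], updatedAt: now }; // expired or missing: fresh cart
  }
  return stored;
}
```

A returning customer therefore gets an empty cart after a day away, but their studio, gender, and booking history are rehydrated from the permanent Conversation record.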

No Langchain / LlamaIndex

Direct Vercel AI SDK + Prisma for simpler, more transparent architecture. Full control over prompt construction, tool execution, and state management.

Fine-tuned model over prompting

Custom training data pipeline (merge, validate, audit, quality-check scripts) for consistent tool calling behavior and tone across 6 languages.

Permanent booking history

Never reset, even after handoff or session restart. Enables repeat booking, personalization, and analytics across the entire customer lifecycle.

The Outcome

$0.015 per Conversation

Fine-tuned GPT-4.1 Mini keeps the cost per dialog roughly 100x lower than a human operator

~4s Average Response

Multi-step tool execution (2-5 calls) with prompt caching delivers near-instant replies

Near-Zero Hallucinations

Brain-first architecture with tool-level validation ensures every fact comes from the database

18 Studios, 5 Countries

Live in production across the entire European chain with full localization in 6 languages

4 Channels, 1 Context

Website, Telegram, Instagram, Facebook - unified conversation and booking history

Complete Observability

Langfuse traces every tool call, token usage, and cost per conversation in real-time

Interested in building something similar?

Try the Agent Live