Pikkuna — Multilingual RAG AI System

December 15, 2025

A RAG system built on OpenAI and Upstash Vector with support for 25+ languages. It includes a streaming chatbot, hybrid search (semantic + keyword), an AI ticket classifier, and incremental indexing with SHA-256 caching.

Tech Stack

AI/ML

  • OpenAI GPT-4o-mini
  • GPT-4.1-mini
  • text-embedding-3-large (3072 dim)
  • Vercel AI SDK 5

Vector Store

  • Upstash Vector
  • Hybrid search (semantic + keyword)
  • Cosine similarity

Frontend

  • @assistant-ui/react
  • @ai-sdk/react
  • Framer Motion
  • localStorage persistence

Data Pipeline

  • Custom TypeScript ingestion
  • SHA-256 incremental cache
  • Batch embedding (rate-limit friendly)

Key Results

  • 1,614 documents in knowledge base
  • 25+ languages (multilingual retrieval)
  • 3,072-dim vectors (text-embedding-3-large)
  • RAG retrieval <500ms (P95)

The Challenge

The e-commerce store received a steady stream of repetitive questions from customers in 35 countries, in many languages: delivery times, shipping costs, how to measure sizes, differences between products. Processing email tickets manually consumed significant time, and a static FAQ could not provide contextual answers.

The Solution

I built a RAG system with three entry points:

  1. Streaming chatbot on site for self-service support — customer gets answer in their language in seconds
  2. Hybrid search combining semantic similarity with keyword matching
  3. AI Ticket Classifier for Zoho Desk — auto-categorizes incoming emails, detects language, generates draft reply with confidence score

All three systems share a single vector store with 1,600+ documents, kept fresh through incremental indexing.

RAG Pipeline Architecture

A single retrieveContext function serves both the chatbot and the ticket classifier:

// src/lib/rag-chat.ts
export const retrieveContext = async (
  query: string,
  locale: string = "en"
): Promise<RetrievalResult[]> => {
  // 1. Detect product by keywords (multilingual dictionary)
  const detectedProducts = detectProductFromQuery(query);
  // → ['pikkuna'] | ['pikkuroof'] | ['general']

  // 2. Generate query embedding
  const { embedding } = await embed({
    model: openai.embedding("text-embedding-3-large"),
    value: query,
  });

  // 3. Vector search (no locale filter — all languages available)
  const results = await vectorIndex.query({
    vector: embedding,
    topK: 15, // candidates for filtering
    includeMetadata: true,
  });

  // 4. Re-ranking with product boost
  const reranked = results
    .filter((r) => r.score >= 0.7)
    .map((r) => {
      let score = r.score;
      const productTag = r.metadata?.productTag;

      // +30% boost if product matches query
      if (productTag && detectedProducts.includes(productTag)) {
        score = Math.min(1.0, score * 1.3);
      }
      // -40% penalty if product doesn't match (and not general)
      else if (productTag && productTag !== "general") {
        score *= 0.6;
      }

      return { ...r, score };
    })
    .sort((a, b) => b.score - a.score)
    .slice(0, 5);

  return reranked;
};

// Multilingual dictionary for product detection
const PRODUCT_KEYWORDS = {
  pikkuna: [
    "side curtain",
    "vertical",
    "pole",
    "post", // EN
    "боковые",
    "вертикальн",
    "столб", // RU
    "sivuverho",
    "pylväs",
    "tolppa", // FI
    "zijgordijn",
    "verticaal",
    "paal", // NL
    "seitenvorhang",
    "vertikal",
    "stütze", // DE
  ],
  pikkuroof: [
    "roof",
    "pergola",
    "horizontal",
    "canopy",
    "крыша",
    "перголы",
    "горизонтальн",
    "навес",
    "katto",
    "pergola",
    "vaakasuora",
  ],
};
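The detectProductFromQuery helper referenced in step 1 is not shown above; a minimal sketch of how it might work, assuming simple substring matching against the keyword dictionary (the trimmed dictionary and all names here are illustrative, not the project's actual code):

```typescript
type ProductTag = "pikkuna" | "pikkuroof" | "general";

// Trimmed copy of PRODUCT_KEYWORDS, for illustration only. Substring
// matching lets stemmed entries like "вертикальн" hit inflected forms.
const KEYWORDS: Record<Exclude<ProductTag, "general">, string[]> = {
  pikkuna: ["side curtain", "vertical", "pole", "post"],
  pikkuroof: ["roof", "pergola", "horizontal", "canopy"],
};

export function detectProductFromQuery(query: string): ProductTag[] {
  const q = query.toLowerCase();
  const hits = (Object.keys(KEYWORDS) as Array<keyof typeof KEYWORDS>).filter(
    (tag) => KEYWORDS[tag].some((kw) => q.includes(kw))
  );
  // Fall back to "general" when no product keyword matches,
  // so the re-ranking stage neither boosts nor penalizes anything
  return hits.length > 0 ? hits : ["general"];
}
```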

Streaming Chat with Vercel AI SDK

Integration with @assistant-ui/react provides a ready-made UI with persistence:

// src/app/api/chat/route.ts
export async function POST(req: Request) {
  const { messages, locale } = await req.json();

  // Handle follow-up questions (context enrichment)
  const query = buildContextualQuery(messages);
  // "And to Germany?" → "how much is shipping... And to Germany?"

  // RAG retrieval
  const relevantDocs = await retrieveContext(query, locale);

  // Build context for prompt
  const contextMessage = relevantDocs
    .map((doc, i) => `[${i + 1}] ${doc.metadata?.category}:\n${doc.text}`)
    .join('\n\n');

  // Streaming response
  const result = streamText({
    model: openai('gpt-4o-mini'),
    system: SYSTEM_PROMPT + `\n\nKnowledge base context:\n${contextMessage}`,
    messages,
    temperature: 0.7,
  });

  // Format compatible with @assistant-ui/react
  return result.toUIMessageStreamResponse();
}

// src/components/ChatBot/ChatBot.tsx
const chat = useChat({
  api: '/api/chat',
  body: { locale },
  initialMessages: loadFromLocalStorage(),  // persistence
});

const runtime = useAISDKRuntime(chat);

return (
  <AssistantRuntimeProvider runtime={runtime}>
    <AssistantModal />
  </AssistantRuntimeProvider>
);
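The buildContextualQuery call in the route handler enriches short follow-up questions with prior context before retrieval. A minimal sketch of one way to implement it; the message shape and the word-count heuristic are assumptions, not the project's actual logic:

```typescript
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

// Prepend the previous user message to short follow-ups like
// "And to Germany?" so vector retrieval sees the full topic.
export function buildContextualQuery(messages: ChatMessage[]): string {
  const userMessages = messages.filter((m) => m.role === "user");
  const latest = userMessages[userMessages.length - 1]?.content ?? "";

  // Heuristic: very short questions are probably follow-ups
  if (latest.split(/\s+/).length <= 4 && userMessages.length > 1) {
    const previous = userMessages[userMessages.length - 2].content;
    return `${previous} ${latest}`;
  }
  return latest;
}
```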

Hybrid Search (Semantic + Keyword)

Search endpoint combines vector similarity with fuzzy keyword matching:

// src/app/api/search/route.ts
export async function GET(req: Request) {
  const { query, locale, type } = parseParams(req);

  // 1. Query embedding
  const { embedding } = await embed({
    model: openai.embedding("text-embedding-3-large"),
    value: query,
  });

  // 2. Vector search WITH locale filter (unlike chat)
  const results = await vectorIndex.query({
    vector: embedding,
    topK: 50,
    filter: `locale = "${locale}"${type ? ` AND type = "${type}"` : ""}`,
    includeMetadata: true,
  });

  // 3. Fuzzy keyword matching
  const queryWords = normalizeForSearch(query).split(" ");
  // normalizeForSearch: й→и, ё→е, ä→a, ö→o, ü→u, ß→ss

  // 4. Hybrid scoring
  const scored = results.map((r) => {
    const titleMatches = countMatches(r.metadata?.question, queryWords);
    const contentMatches = countMatches(r.metadata?.text, queryWords);

    // Hybrid score = vector + keyword bonus
    const hybridScore = r.score + Math.max(titleMatches * 0.25, contentMatches * 0.08);

    return { ...r, hybridScore, hasKeywordMatch: titleMatches > 0 };
  });

  // 5. Smart filtering
  // With keyword matches → threshold 0.62
  // Without keyword matches → threshold 0.70 (stricter)
  const filtered = scored.filter((r) =>
    r.hasKeywordMatch ? r.hybridScore >= 0.62 : r.score >= 0.7
  );

  return Response.json({
    hits: filtered.slice(0, 5),
    totalHits: filtered.length,
  });
}
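The normalizeForSearch and countMatches helpers above are referenced but not shown. A sketch of what they might look like: the character map follows the normalizations listed in the comment (й→и, ё→е, ä→a, ö→o, ü→u, ß→ss); everything else, including the punctuation handling, is an assumption:

```typescript
// Cross-script character normalization for fuzzy matching
const CHAR_MAP: Record<string, string> = {
  "й": "и", "ё": "е", "ä": "a", "ö": "o", "ü": "u", "ß": "ss",
};

export function normalizeForSearch(text: string): string {
  return text
    .toLowerCase()
    .replace(/[йёäöüß]/g, (ch) => CHAR_MAP[ch])
    .replace(/[^\p{L}\p{N}\s]/gu, " ") // strip punctuation
    .replace(/\s+/g, " ")
    .trim();
}

// Count how many query words appear in the (normalized) text
export function countMatches(
  text: string | undefined,
  queryWords: string[]
): number {
  if (!text) return 0;
  const normalized = normalizeForSearch(text);
  return queryWords.filter((w) => w.length > 1 && normalized.includes(w)).length;
}
```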

AI Ticket Classifier with Two-Stage LLM Pipeline

Auto-categorization of Zoho Desk emails with RAG enrichment:

// src/app/api/zoho/ticket-classifier/route.ts
export async function POST(req: Request) {
  const { subject, description } = await req.json();
  const fullMessage = `${subject}\n\n${description}`;

  // === STAGE 1: Extract core question ===
  const extraction = await generateText({
    model: openai("gpt-4.1-mini"),
    temperature: 0.1, // strict for extraction
    system: EXTRACTION_PROMPT,
    prompt: fullMessage,
  });
  // "Hello! When will my order #12345 arrive? Thanks, John"
  // → "When will my order arrive?"

  // === STAGE 2: RAG retrieval ===
  const context = await retrieveContext(extraction.text, "en");

  // === STAGE 3: Classification + Reply Generation ===
  const classification = await generateText({
    model: openai("gpt-4.1-mini"),
    temperature: 0.3, // strict for JSON output
    system: CLASSIFIER_PROMPT, // 18 categories + confidence rules
    prompt: `
FULL MESSAGE: ${fullMessage}
CORE QUESTION: ${extraction.text}
KNOWLEDGE BASE: ${context.map((c) => c.text).join("\n\n")}
    `,
  });

  const result = JSON.parse(classification.text);
  // {
  //   language: "en",
  //   classification: "Order status inquiry",
  //   confidence: 0.85,
  //   reply_en: "Orders typically ship within 2-3 business days...",
  // }

  // If confidence <= 0.3 → reply empty (requires manager)
  return Response.json(result);
}

// 18 classification categories
const CATEGORIES = [
  "Order status inquiry",
  "Quote request",
  "Custom drawing",
  "Reclamation::Film defect",
  "Reclamation::Incorrect size",
  "Reclamation::Delivery Issue",
  "Partnership",
  "Spam",
  // ... 10 more
];
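The confidence gate mentioned in the handler ("confidence <= 0.3 → reply empty") could be factored out as a small helper. A hypothetical sketch; the result shape and function name are assumptions based on the JSON example above:

```typescript
interface ClassificationResult {
  language: string;
  classification: string;
  confidence: number;
  reply_en: string;
}

// Low-confidence classifications get no draft reply,
// so the ticket falls back to a human agent in Zoho Desk
export function applyConfidenceGate(
  result: ClassificationResult
): ClassificationResult {
  if (result.confidence <= 0.3) {
    return { ...result, reply_en: "" };
  }
  return result;
}
```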

Incremental Indexing with SHA-256 Cache

Only changed documents get re-indexed:

// scripts/ingest-locales-to-upstash.ts
async function ingestDocuments(options: { force?: boolean }) {
  // 1. Load hash cache
  const cache = await loadCache(); // .cache/rag-hashes.json

  // 2. Extract documents from all sources
  const documents = [
    ...extractFAQs(locales), // pages.support.content
    ...extractProducts(locales), // pages.products.*
    ...extractDeliveryTimes(config), // next-intl.config.js
    ...extractShippingCosts(config), // shipping_rates + VAT
    ...extractMarkdownDocs(docsDir), // docs/**/*.md
  ];

  // 3. Compute SHA-256 hashes
  const currentHashes = documents.reduce((acc, doc) => {
    acc[doc.id] = crypto.createHash("sha256").update(doc.text).digest("hex").slice(0, 16);
    return acc;
  }, {});

  // 4. Compare with cache
  const { added, changed, deleted } = compareWithCache(currentHashes, cache);

  if (!options.force && added.length === 0 && changed.length === 0) {
    console.log("No changes detected, skipping ingestion");
    return;
  }

  // 5. Delete stale vectors
  if (deleted.length > 0) {
    await vectorIndex.delete(deleted);
  }

  // 6. Generate embeddings (batch of 10, rate-limit friendly)
  const toUpdate = [...added, ...changed];
  const embeddings = await generateEmbeddingsBatch(toUpdate, 10);

  // 7. Upsert to Upstash (batch of 100)
  await upsertVectors(embeddings, 100);

  // 8. Save new cache
  await saveCache({ hashes: currentHashes, version: "2.0" });
}

// npm run ingest-rag          → incremental
// npm run ingest-rag --force  → full rebuild
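The compareWithCache step in the script above is a plain diff of two id→hash maps. A minimal sketch under that assumption (the cache shape and names are inferred from the surrounding code, not confirmed):

```typescript
type HashMap = Record<string, string>;

// Diff current document hashes against the cached ones:
// new ids are "added", differing hashes are "changed",
// ids present only in the cache are "deleted" (stale vectors).
export function compareWithCache(current: HashMap, cached: HashMap) {
  const added = Object.keys(current).filter((id) => !(id in cached));
  const changed = Object.keys(current).filter(
    (id) => id in cached && cached[id] !== current[id]
  );
  const deleted = Object.keys(cached).filter((id) => !(id in current));
  return { added, changed, deleted };
}
```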

System Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                    DATA INGESTION (Offline)                         │
│                                                                     │
│  Sources:                      Pipeline:                            │
│  ├─ src/locales/*.json        ┌─────────────┐    ┌──────────────┐  │
│  │  (25 langs)         ──────►│ Extract &   │───►│ SHA-256 Hash │  │
│  ├─ docs/**/*.md              │ Chunk       │    │ Cache Check  │  │
│  └─ next-intl.config.js       └─────────────┘    └──────┬───────┘  │
│                                                  Changed only       │
│                                ┌─────────────┐   ┌──────▼────────┐  │
│                                │ OpenAI      │◄──│ Generate      │  │
│                                │ Embedding   │   │ Embeddings    │  │
│                                │ 3-large     │   └───────────────┘  │
│                                └──────┬──────┘                      │
│                                ┌──────▼──────┐                      │
│                                │  Upstash    │                      │
│                                │  Vector     │                      │
│                                │  (1614 docs)│                      │
│                                └─────────────┘                      │
└─────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    RUNTIME (Online, per request)                    │
│                                                                     │
│  ┌───────────┐   ┌───────────┐   ┌─────────────────────────┐       │
│  │  ChatBot  │   │  Search   │   │  Ticket Classifier      │       │
│  │  (Stream) │   │  (Hybrid) │   │  (Zoho Desk webhook)    │       │
│  └─────┬─────┘   └─────┬─────┘   └───────────┬─────────────┘       │
│        └───────────────┼─────────────────────┘                      │
│                        ▼                                            │
│              ┌─────────────────────┐                                │
│              │  retrieveContext()  │                                │
│              │  1. Embed query     │                                │
│              │  2. Detect product  │                                │
│              │  3. Vector search   │                                │
│              │  4. Re-rank         │                                │
│              └──────────┬──────────┘                                │
│                         ▼                                           │
│              ┌─────────────────────┐                                │
│              │  OpenAI LLM         │                                │
│              │  gpt-4o-mini (chat) │                                │
│              │  gpt-4.1-mini (cls) │                                │
│              └─────────────────────┘                                │
└─────────────────────────────────────────────────────────────────────┘

Multilingual Support

  • Documents: indexed separately for each of the 25 locales
  • Retrieval (chat): no locale filter (cross-lingual search)
  • Retrieval (search): locale filter applied (results in the UI language)
  • Product detection: multilingual keyword dictionary (EN/RU/FI/NL/DE)
  • Fuzzy matching: character normalization (й→и, ё→е, ä→a, ö→o, ü→u, ß→ss)
  • LLM response: automatically switches to the user's language
  • Brand names: localized (Pikkuna / Пиккуна / Πίκκουνα / 皮库娜)

Results

  • Knowledge base: 1,614 documents
  • Languages: 25+ (multilingual retrieval)
  • Vector dimensions: 3,072 (text-embedding-3-large)
  • RAG retrieval: <500ms (P95)
  • Ingestion script: 1,843 lines
  • Incremental updates: SHA-256 cache (only changed docs re-indexed)

The chatbot now resolves 70%+ of support queries without human intervention and is available 24/7 in every supported language.

Iurii Rogulia
