Pikkuna — Multilingual RAG AI System
A RAG system built on OpenAI and Upstash Vector with support for 25+ languages. Includes a streaming chatbot, hybrid search (semantic + keyword), an AI ticket classifier, and incremental indexing with SHA-256 caching.
Tech Stack
- AI/ML
- Vector Store
- Frontend
- Data Pipeline
Key Results
- 1,614 documents in knowledge base
- 25+ languages (multilingual retrieval)
- 3,072-dim vectors (text-embedding-3-large)
- RAG retrieval <500ms (P95)
The Challenge
The e-commerce store received a steady stream of repetitive questions from customers in 35 countries, in many different languages: delivery times, shipping costs, how to measure sizes, product differences. Processing email tickets manually consumed significant time, and a static FAQ couldn't provide contextual answers.
The Solution
I built a RAG system with three entry points:
- Streaming chatbot on the site for self-service support — the customer gets an answer in their language within seconds
- Hybrid search combining semantic similarity with keyword matching
- AI Ticket Classifier for Zoho Desk — auto-categorizes incoming emails, detects language, generates draft reply with confidence score
All three entry points share a single vector store with 1,600+ documents and rely on incremental indexing to keep the data fresh.
RAG Pipeline Architecture
A single retrieveContext function serves both the chatbot and the ticket classifier:
// src/lib/rag-chat.ts
export const retrieveContext = async (
query: string,
locale: string = "en"
): Promise<RetrievalResult[]> => {
// 1. Detect product by keywords (multilingual dictionary)
const detectedProducts = detectProductFromQuery(query);
// → ['pikkuna'] | ['pikkuroof'] | ['general']
// 2. Generate query embedding
const { embedding } = await embed({
model: openai.embedding("text-embedding-3-large"),
value: query,
});
// 3. Vector search (no locale filter — all languages available)
const results = await vectorIndex.query({
vector: embedding,
topK: 15, // candidates for filtering
includeMetadata: true,
});
// 4. Re-ranking with product boost
const reranked = results
.filter((r) => r.score >= 0.7)
.map((r) => {
let score = r.score;
const productTag = r.metadata?.productTag;
// +30% boost if product matches query
if (productTag && detectedProducts.includes(productTag)) {
score = Math.min(1.0, score * 1.3);
}
// -40% penalty if product doesn't match (and not general)
else if (productTag && productTag !== "general") {
score *= 0.6;
}
return { ...r, score };
})
.sort((a, b) => b.score - a.score)
.slice(0, 5);
return reranked;
};
// Multilingual dictionary for product detection
const PRODUCT_KEYWORDS = {
pikkuna: [
"side curtain",
"vertical",
"pole",
"post", // EN
"боковые",
"вертикальн",
"столб", // RU
"sivuverho",
"pylväs",
"tolppa", // FI
"zijgordijn",
"verticaal",
"paal", // NL
"seitenvorhang",
"vertikal",
"stütze", // DE
],
pikkuroof: [
"roof",
"pergola",
"horizontal",
"canopy",
"крыша",
"перголы",
"горизонтальн",
"навес",
"katto",
"pergola",
"vaakasuora",
],
};
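The detectProductFromQuery helper referenced in step 1 isn't shown above. A minimal sketch of how it could work against the PRODUCT_KEYWORDS dictionary, assuming simple substring matching with a "general" fallback (the function body here is hypothetical, only the dictionary shape comes from the code above):

```typescript
// Hypothetical sketch of detectProductFromQuery; abbreviated keyword lists.
const PRODUCT_KEYWORDS: Record<string, string[]> = {
  pikkuna: ["side curtain", "vertical", "pole", "post", "боковые", "sivuverho"],
  pikkuroof: ["roof", "pergola", "horizontal", "canopy", "крыша", "katto"],
};

function detectProductFromQuery(query: string): string[] {
  const q = query.toLowerCase();
  const matches = Object.entries(PRODUCT_KEYWORDS)
    .filter(([, keywords]) => keywords.some((kw) => q.includes(kw)))
    .map(([product]) => product);
  // Fall back to "general" when no product keyword is found,
  // so the re-ranker neither boosts nor penalizes any document
  return matches.length > 0 ? matches : ["general"];
}
```

A keyword dictionary keeps product detection fast and deterministic, at the cost of maintaining translations per locale.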
Streaming Chat with Vercel AI SDK
Integration with @assistant-ui/react provides a ready-made UI with persistence:
// src/app/api/chat/route.ts
export async function POST(req: Request) {
const { messages, locale } = await req.json();
// Handle follow-up questions (context enrichment)
const query = buildContextualQuery(messages);
// "And to Germany?" → "how much is shipping... And to Germany?"
// RAG retrieval
const relevantDocs = await retrieveContext(query, locale);
// Build context for prompt
const contextMessage = relevantDocs
.map((doc, i) => `[${i + 1}] ${doc.metadata?.category}:\n${doc.text}`)
.join('\n\n');
// Streaming response
const result = streamText({
model: openai('gpt-4o-mini'),
system: SYSTEM_PROMPT + `\n\nKnowledge base context:\n${contextMessage}`,
messages,
temperature: 0.7,
});
// Format compatible with @assistant-ui/react
return result.toUIMessageStreamResponse();
}
// src/components/ChatBot/ChatBot.tsx
const chat = useChat({
api: '/api/chat',
body: { locale },
initialMessages: loadFromLocalStorage(), // persistence
});
const runtime = useAISDKRuntime(chat);
return (
<AssistantRuntimeProvider runtime={runtime}>
<AssistantModal />
</AssistantRuntimeProvider>
);
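The buildContextualQuery step above enriches short follow-up questions before retrieval. A minimal sketch under the assumption that "short last message" is the follow-up signal; the message type and length threshold are illustrative, not the actual implementation:

```typescript
// Hypothetical sketch of buildContextualQuery: a terse follow-up
// ("And to Germany?") alone embeds poorly, so prepend the previous
// user question to give the embedding enough context.
type ChatMessage = { role: "user" | "assistant"; content: string };

function buildContextualQuery(messages: ChatMessage[]): string {
  const userMessages = messages.filter((m) => m.role === "user");
  const last = userMessages[userMessages.length - 1]?.content ?? "";
  const previous = userMessages[userMessages.length - 2]?.content;
  // Heuristic threshold (assumption): very short queries are follow-ups
  if (previous && last.length < 25) {
    return `${previous} ${last}`;
  }
  return last;
}
```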
Hybrid Search (Semantic + Keyword)
The search endpoint combines vector similarity with fuzzy keyword matching:
// src/app/api/search/route.ts
export async function GET(req: Request) {
const { query, locale, type } = parseParams(req);
// 1. Query embedding
const { embedding } = await embed({
model: openai.embedding("text-embedding-3-large"),
value: query,
});
// 2. Vector search WITH locale filter (unlike chat)
const results = await vectorIndex.query({
vector: embedding,
topK: 50,
filter: `locale = "${locale}"${type ? ` AND type = "${type}"` : ""}`,
includeMetadata: true,
});
// 3. Fuzzy keyword matching
const queryWords = normalizeForSearch(query).split(" ");
// normalizeForSearch: й→и, ё→е, ä→a, ö→o, ü→u, ß→ss
// 4. Hybrid scoring
const scored = results.map((r) => {
const titleMatches = countMatches(r.metadata?.question, queryWords);
const contentMatches = countMatches(r.metadata?.text, queryWords);
// Hybrid score = vector + keyword bonus
const hybridScore = r.score + Math.max(titleMatches * 0.25, contentMatches * 0.08);
return { ...r, hybridScore, hasKeywordMatch: titleMatches > 0 };
});
// 5. Smart filtering
// With keyword matches → threshold 0.62
// Without keyword matches → threshold 0.70 (stricter)
const filtered = scored.filter((r) =>
r.hasKeywordMatch ? r.hybridScore >= 0.62 : r.score >= 0.7
);
return Response.json({
hits: filtered.slice(0, 5),
totalHits: filtered.length,
});
}
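The normalizeForSearch and countMatches helpers used in steps 3-4 aren't shown. A sketch assuming the character substitutions listed in the comment (й→и, ё→е, ä→a, ö→o, ü→u, ß→ss); the punctuation handling and the minimum word length are assumptions:

```typescript
// Hypothetical sketches of normalizeForSearch / countMatches.
const CHAR_MAP: Record<string, string> = {
  "й": "и", "ё": "е", "ä": "a", "ö": "o", "ü": "u", "ß": "ss",
};

function normalizeForSearch(text: string): string {
  return text
    .toLowerCase()
    .replace(/[йёäöüß]/g, (ch) => CHAR_MAP[ch] ?? ch)
    .replace(/[^\p{L}\p{N}\s]/gu, " ") // strip punctuation (assumption)
    .replace(/\s+/g, " ")
    .trim();
}

function countMatches(text: string | undefined, queryWords: string[]): number {
  if (!text) return 0;
  const normalized = normalizeForSearch(text);
  // Skip very short words to avoid noise matches (assumed threshold)
  return queryWords.filter((w) => w.length > 2 && normalized.includes(w)).length;
}
```

Normalizing both sides of the comparison lets "Größe" match "grosse" and Russian spelling variants match each other.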
AI Ticket Classifier with Three-Stage Pipeline
Auto-categorization of Zoho Desk emails with RAG enrichment:
// src/app/api/zoho/ticket-classifier/route.ts
export async function POST(req: Request) {
const { subject, description } = await req.json();
const fullMessage = `${subject}\n\n${description}`;
// === STAGE 1: Extract core question ===
const extraction = await generateText({
model: openai("gpt-4.1-mini"),
temperature: 0.1, // strict for extraction
system: EXTRACTION_PROMPT,
prompt: fullMessage,
});
// "Hello! When will my order #12345 arrive? Thanks, John"
// → "When will my order arrive?"
// === STAGE 2: RAG retrieval ===
const context = await retrieveContext(extraction.text, "en");
// === STAGE 3: Classification + Reply Generation ===
const classification = await generateText({
model: openai("gpt-4.1-mini"),
temperature: 0.3, // low, for stable JSON output
system: CLASSIFIER_PROMPT, // 18 categories + confidence rules
prompt: `
FULL MESSAGE: ${fullMessage}
CORE QUESTION: ${extraction.text}
KNOWLEDGE BASE: ${context.map((c) => c.text).join("\n\n")}
`,
});
const result = JSON.parse(classification.text);
// {
// language: "en",
// classification: "Order status inquiry",
// confidence: 0.85,
// reply_en: "Orders typically ship within 2-3 business days...",
// }
// If confidence <= 0.3 → reply empty (requires manager)
return Response.json(result);
}
// 18 classification categories
const CATEGORIES = [
"Order status inquiry",
"Quote request",
"Custom drawing",
"Reclamation::Film defect",
"Reclamation::Incorrect size",
"Reclamation::Delivery Issue",
"Partnership",
"Spam",
// ... 10 more
];
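The route parses the classifier's JSON and applies the confidence rule before responding. A sketch of that gating step, with field names taken from the example response above (the function itself is illustrative, not the actual code):

```typescript
// Hypothetical sketch: validate the classifier output and blank the
// draft reply when confidence is too low, so a manager handles the ticket.
type ClassificationResult = {
  language: string;
  classification: string;
  confidence: number;
  reply_en: string;
};

function gateByConfidence(raw: string): ClassificationResult {
  const parsed = JSON.parse(raw) as ClassificationResult;
  // Threshold from the route handler: confidence <= 0.3 → no auto-reply
  if (parsed.confidence <= 0.3) {
    return { ...parsed, reply_en: "" };
  }
  return parsed;
}
```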
Incremental Indexing with SHA-256 Cache
Only changed documents get re-indexed:
// scripts/ingest-locales-to-upstash.ts
async function ingestDocuments(options: { force?: boolean }) {
// 1. Load hash cache
const cache = await loadCache(); // .cache/rag-hashes.json
// 2. Extract documents from all sources
const documents = [
...extractFAQs(locales), // pages.support.content
...extractProducts(locales), // pages.products.*
...extractDeliveryTimes(config), // next-intl.config.js
...extractShippingCosts(config), // shipping_rates + VAT
...extractMarkdownDocs(docsDir), // docs/**/*.md
];
// 3. Compute SHA-256 hashes
const currentHashes = documents.reduce((acc, doc) => {
acc[doc.id] = crypto.createHash("sha256").update(doc.text).digest("hex").slice(0, 16);
return acc;
}, {});
// 4. Compare with cache
const { added, changed, deleted } = compareWithCache(currentHashes, cache);
if (!options.force && added.length === 0 && changed.length === 0) {
console.log("No changes detected, skipping ingestion");
return;
}
// 5. Delete stale vectors
if (deleted.length > 0) {
await vectorIndex.delete(deleted);
}
// 6. Generate embeddings (batch of 10, rate-limit friendly)
const toUpdate = [...added, ...changed];
const embeddings = await generateEmbeddingsBatch(toUpdate, 10);
// 7. Upsert to Upstash (batch of 100)
await upsertVectors(embeddings, 100);
// 8. Save new cache
await saveCache({ hashes: currentHashes, version: "2.0" });
}
// npm run ingest-rag → incremental
// npm run ingest-rag --force → full rebuild
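The compareWithCache step in the script above is a plain three-way diff of hash maps. A minimal sketch under the assumption that both caches map document IDs to hash strings:

```typescript
// Hypothetical sketch of compareWithCache: diff current document hashes
// against the cached ones to find what to embed and what to delete.
type HashMap = Record<string, string>;

function compareWithCache(current: HashMap, cached: HashMap) {
  const added = Object.keys(current).filter((id) => !(id in cached));
  const changed = Object.keys(current).filter(
    (id) => id in cached && cached[id] !== current[id]
  );
  const deleted = Object.keys(cached).filter((id) => !(id in current));
  return { added, changed, deleted };
}
```

Only added and changed documents go through the embedding API, which is what keeps incremental runs cheap.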
System Architecture
┌─────────────────────────────────────────────────────────────────────┐
│ DATA INGESTION (Offline) │
│ │
│ Sources: Pipeline: │
│ ├─ src/locales/*.json ┌─────────────┐ ┌──────────────┐ │
│ │ (25 langs) ──────►│ Extract & │───►│ SHA-256 Hash │ │
│ ├─ docs/**/*.md │ Chunk │ │ Cache Check │ │
│ └─ next-intl.config.js └─────────────┘ └──────┬───────┘ │
│ Changed only │
│ ┌─────────────┐ ┌──────▼────────┐ │
│ │ OpenAI │◄──│ Generate │ │
│ │ Embedding │ │ Embeddings │ │
│ │ 3-large │ └───────────────┘ │
│ └──────┬──────┘ │
│ ┌──────▼──────┐ │
│ │ Upstash │ │
│ │ Vector │ │
│ │ (1614 docs)│ │
│ └─────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ RUNTIME (Online, per request) │
│ │
│ ┌───────────┐ ┌───────────┐ ┌─────────────────────────┐ │
│ │ ChatBot │ │ Search │ │ Ticket Classifier │ │
│ │ (Stream) │ │ (Hybrid) │ │ (Zoho Desk webhook) │ │
│ └─────┬─────┘ └─────┬─────┘ └───────────┬─────────────┘ │
│ └───────────────┼─────────────────────┘ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ retrieveContext() │ │
│ │ 1. Embed query │ │
│ │ 2. Detect product │ │
│ │ 3. Vector search │ │
│ │ 4. Re-rank │ │
│ └──────────┬──────────┘ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ OpenAI LLM │ │
│ │ gpt-4o-mini (chat) │ │
│ │ gpt-4.1-mini (cls) │ │
│ └─────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Multilingual Support
| Aspect | Implementation |
|---|---|
| Documents | Indexed for each of 25 locales separately |
| Retrieval (chat) | No locale filter — cross-lingual search |
| Retrieval (search) | With locale filter — results in UI language |
| Product detection | Multilingual dictionary (EN/RU/FI/NL/DE) |
| Fuzzy matching | Normalization: й→и, ё→е, ä→a, ö→o, ü→u, ß→ss |
| LLM response | Auto-switch to user's language |
| Brand names | Localized: Pikkuna / Пиккуна / Πίκκουνα / 皮库娜 |
Results
| Metric | Value |
|---|---|
| Knowledge base | 1,614 documents |
| Languages | 25+ (multilingual retrieval) |
| Vector dimensions | 3,072 (text-embedding-3-large) |
| RAG retrieval | <500ms (P95) |
| Ingestion script | 1,843 lines |
| Incremental updates | SHA-256 cache (only changed docs) |
The chatbot now resolves 70%+ of support queries without human intervention and is available 24/7 in every supported language.
Need something similar?
I build custom solutions — from APIs to full products. Let's talk about your project.