31 changes: 31 additions & 0 deletions docs/concepts/routing.md
@@ -94,6 +94,37 @@ Scores models by how much of their associated budget is still available. Models

**Use when:** you have per-model spending limits and want Routerly to naturally prefer models with headroom.

### `semantic-intent`

Classifies each incoming request by semantic intent using embeddings, then restricts the candidate pool to the models you have mapped to that intent.

**How it works:**

1. You define **intents** — each intent has a name, a list of **example phrases** that represent it, and the **target models** that should handle requests of that type.
2. When a request arrives, Routerly embeds the user message and compares it against the centroid of each intent's examples using cosine similarity.
3. Based on the best match score and the gap between the top two intents, the policy produces one of three outcomes:

| Outcome | Condition | Effect |
|---|---|---|
| **Confident** | Top score ≥ threshold and margin ≥ ambiguity gap | Hard-filters candidates to the matched intent's model pool |
| **Ambiguous** | Top score ≥ threshold but gap is too small | Merges the top-2 intent pools |
| **Unknown** | Top score below threshold | No filtering — all candidates pass through |
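The three-way decision above can be sketched in a few lines. This is a minimal illustration under the documented thresholds, not Routerly's actual code; the name `classifyIntent` and the `Score` shape are hypothetical:

```typescript
interface Score { intent: string; score: number }

// Inputs are the cosine-similarity scores, sorted descending.
// Thresholds default to the documented values.
function classifyIntent(
  sorted: Score[],
  absoluteThreshold = 0.60,
  ambiguityThreshold = 0.08,
): { status: "confident" | "ambiguous" | "unknown"; intents: string[] } {
  const top = sorted[0];
  if (!top || top.score < absoluteThreshold) {
    // No intent clears the bar: pass all candidates through unfiltered.
    return { status: "unknown", intents: [] };
  }
  const second = sorted[1];
  const margin = top.score - (second?.score ?? 0);
  if (second && margin < ambiguityThreshold) {
    // Too close to call: merge the top-2 intent pools.
    return { status: "ambiguous", intents: [top.intent, second.intent] };
  }
  return { status: "confident", intents: [top.intent] };
}
```

Note that `unknown` is the safe default: a low-confidence classification never shrinks the pool.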

**Configuration:**

| Option | Default | Description |
|---|---|---|
| `embedding_provider` | _(required)_ | `openai` or `ollama` |
| `embedding_model` | _(required)_ | Model ID to use for embedding (must have the embedding capability) |
| `absolute_threshold` | `0.60` | Minimum cosine similarity score to consider a match |
| `ambiguity_threshold` | `0.08` | Minimum margin between top-2 scores to consider a match confident |

**Use when:** you have distinct request categories that should always be routed to specific models (e.g. billing questions → a fine-tuned model, code requests → a coding model).

:::tip Intent centroids are cached
Embeddings for intent examples are computed once and cached in memory for 1 hour. Changing an intent's examples automatically invalidates the cache.
:::

---

## Configuring Routing
85 changes: 85 additions & 0 deletions docs/service/routing-engine.md
@@ -186,6 +186,91 @@ No configuration options (reads limits from the project and model config).

---

### `semantic-intent`

**Type:** hard filter (pool narrowing)

Classifies the incoming request by semantic intent using embedding-based similarity, then restricts the candidate pool to the models mapped to that intent. Policies that run after it (e.g. `cheapest`, `performance`) operate only within the narrowed pool.

#### How it works

1. The last user message is extracted from the request.
2. It is embedded using the configured embedding model/provider.
3. Each intent's **centroid** — the element-wise mean of its example phrase embeddings — is computed (and cached for 1 hour).
4. Cosine similarity is computed between the request vector and every intent centroid.
5. The result is classified as `confident`, `ambiguous`, or `unknown`:

| Status | Condition | Candidate pool |
|---|---|---|
| `confident` | `topScore ≥ absolute_threshold` and `margin ≥ ambiguity_threshold` | Intent's `candidate_models` only |
| `ambiguous` | `topScore ≥ absolute_threshold` but `margin < ambiguity_threshold` | Union of top-2 intents' `candidate_models` |
| `unknown` | `topScore < absolute_threshold` | All candidates (no filtering) |
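Steps 3 and 4 can be sketched as follows. The helpers `centroid` and `cosineSimilarity` are illustrative, not the engine's internals:

```typescript
// Step 3: the centroid is the element-wise mean of the example embeddings.
function centroid(vectors: number[][]): number[] {
  const dim = vectors[0].length;
  const mean = new Array<number>(dim).fill(0);
  for (const v of vectors) {
    for (let i = 0; i < dim; i++) mean[i] += v[i] / vectors.length;
  }
  return mean;
}

// Step 4: cosine similarity between the request vector and a centroid.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```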

If the embedding call itself fails, the policy degrades gracefully and passes all candidates through unchanged.

#### Configuration

| Config key | Default | Description |
|------------|---------|-------------|
| `embedding_provider` | _(required)_ | `openai` or `ollama` |
| `embedding_model` | _(required)_ | Embedding model ID. The model must have `capabilities.embedding = true` |
| `embedding_endpoint` | — | Custom base URL (useful for self-hosted Ollama) |
| `embedding_api_key` | — | API key override (defaults to the provider's global key) |
| `absolute_threshold` | `0.60` | Minimum cosine similarity to recognise a match |
| `ambiguity_threshold` | `0.08` | Minimum margin between top-2 scores to resolve ambiguity |
| `intents` | _(required)_ | Map of intent name → `{ examples: string[], candidate_models: string[] }` |

#### Intent definition

```json
{
"intents": {
"billing": {
"examples": [
"I need an invoice",
"Can I change my payment method?",
"Refund request"
],
"candidate_models": ["gpt-4.1-mini"]
},
"code_review": {
"examples": [
"Review this pull request",
"Check my TypeScript code",
"What's wrong with this function?"
],
"candidate_models": ["claude-3-7-sonnet", "gpt-4.1"]
}
}
}
```

Intent names are normalised to `snake_case` (e.g. `"Customer Support"` → `customer_support`).
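One plausible implementation of this normalisation, matching the documented example (hypothetical; the actual rule may handle more edge cases):

```typescript
// Lower-case the name and collapse runs of spaces/hyphens into underscores.
function toSnakeCase(name: string): string {
  return name.trim().toLowerCase().replace(/[\s-]+/g, "_");
}
```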

#### Trace entries

The policy emits up to three kinds of trace entries, visible in the **Router Response** panel of the dashboard:

| Message | When | Details |
|---|---|---|
| `policy:semantic-intent:classification` | Always (when text is classified) | `topIntent`, `topScore`, `secondIntent`, `secondScore`, `margin`, `status` |
| `policy:semantic-intent:result` | After pool narrowing | `allowed`, `excluded`, `status` |
| `policy:semantic-intent:error` | If the embedding call fails | `error` message |

#### Centroid cache

Intent centroids are computed once — embedding all example phrases and averaging them — then stored in memory with a 1-hour TTL. The cache key includes a hash of the example phrases, so changing an intent's examples automatically invalidates it without a service restart.
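The caching scheme described above might look like the sketch below. The helper names, the SHA-256 choice, and the cache shape are assumptions for illustration only:

```typescript
import { createHash } from "node:crypto";

const TTL_MS = 60 * 60 * 1000; // 1-hour TTL, as documented

const cache = new Map<string, { vector: number[]; expiresAt: number }>();

// The key embeds a hash of the example phrases, so editing an intent's
// examples produces a different key and misses the cache naturally.
function cacheKey(intentName: string, examples: string[]): string {
  const digest = createHash("sha256").update(JSON.stringify(examples)).digest("hex");
  return `${intentName}:${digest}`;
}

function getCachedCentroid(intentName: string, examples: string[]): number[] | undefined {
  const entry = cache.get(cacheKey(intentName, examples));
  if (!entry || entry.expiresAt < Date.now()) return undefined;
  return entry.vector;
}

function setCachedCentroid(intentName: string, examples: string[], vector: number[]): void {
  cache.set(cacheKey(intentName, examples), { vector, expiresAt: Date.now() + TTL_MS });
}
```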

**Use when:** you have distinct request categories that must always reach specific models (support triage, multilingual routing, task-type segregation, etc.).

:::info Recommended pipeline position
Place `semantic-intent` **before** scoring policies (`cheapest`, `performance`, `llm`) so they score only within the already-narrowed pool. Place it **after** hard-filter policies (`health`, `context`, `capability`) so unhealthy or incapable models are excluded before intent matching.

Suggested order: `health` → `context` → `capability` → `budget-remaining` → `rate-limit` → **`semantic-intent`** → `llm` → `performance` → `fairness` → `cheapest`
:::

---

## Policy Ordering and Weights

Policies are applied in the order configured in the project. Their positional weight (`total − index`) means policies near the top of the list have more influence on the final score. Reorder policies via the dashboard (**Projects → your project → Routing**) or the CLI.
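As a worked example of the `total − index` rule (hypothetical helper): with four policies, the first gets weight 4 and the last gets weight 1.

```typescript
// Map each policy to its positional weight: total − index.
function positionalWeights(policies: string[]): Map<string, number> {
  const total = policies.length;
  return new Map(policies.map((p, index) => [p, total - index]));
}
```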
14 changes: 7 additions & 7 deletions package-lock.json


3 changes: 2 additions & 1 deletion packages/cli/src/commands/project.ts
@@ -187,11 +187,12 @@ Examples:
policyCmd.command('enable <project> <type>')
.description('Enable a routing policy (adds it if not present)')
.addHelpText('after', `
Policy types: health, context, capability, budget-remaining, rate-limit, llm, performance, fairness, cheapest
Policy types: health, context, capability, budget-remaining, rate-limit, llm, performance, fairness, cheapest, semantic-intent

Examples:
routerly project routing policy enable my-api health
routerly project routing policy enable my-api llm --config '{"memoryCount":3}'
routerly project routing policy enable my-api semantic-intent --config '{"embedding_provider":"openai","embedding_model":"text-embedding-3-small","absolute_threshold":0.60,"ambiguity_threshold":0.08,"intents":{"coding":{"examples":["write a python function"],"candidate_models":["qwen-coder"]}}}'
`)
.option('--config <json>', 'Policy-specific configuration as JSON')
.action(async (nameOrId: string, type: string, opts: { config?: string }) => {
2 changes: 1 addition & 1 deletion packages/dashboard/package.json
@@ -25,6 +25,6 @@
"@vitejs/plugin-react": "^5.2.0",
"sharp": "^0.34.5",
"typescript": "^5.4.5",
"vite": "^6.4.1"
"vite": "^6.4.2"
}
}
13 changes: 12 additions & 1 deletion packages/dashboard/src/api.ts
@@ -152,13 +152,22 @@ export interface PricingTier {
cachePerMillion?: number;
}

export interface ModelCapabilities {
thinking?: boolean;
vision?: boolean;
functionCalling?: boolean;
json?: boolean;
embedding?: boolean;
}

export interface Model {
id: string; name: string; provider: string; endpoint: string;
upstreamModelId?: string;
cost: { inputPerMillion: number; outputPerMillion: number; cachePerMillion?: number; pricingTiers?: PricingTier[] };
contextWindow?: number;
limits?: Limit[];
/** @deprecated use limits */ globalThresholds?: { daily?: number; weekly?: number; monthly?: number };
capabilities?: ModelCapabilities;
}

export const getModels = () => request<Model[]>('/models');
@@ -170,6 +179,7 @@ export const createModel = (data: {
contextWindow?: number;
pricingTiers?: PricingTier[];
limits?: Limit[];
capabilities?: ModelCapabilities;
}) => request<Model>('/models', { method: 'POST', body: JSON.stringify(data) });
export const updateModel = (id: string, data: {
id?: string;
@@ -180,11 +190,12 @@ export const updateModel = (id: string, data: {
contextWindow?: number;
pricingTiers?: PricingTier[];
limits?: Limit[];
capabilities?: ModelCapabilities;
}) => request<Model>(`/models/${encodeURIComponent(id)}`, { method: 'PUT', body: JSON.stringify(data) });
export const deleteModel = (id: string) => request<void>(`/models/${encodeURIComponent(id)}`, { method: 'DELETE' });

export interface RoutingPolicy {
type: 'context' | 'cheapest' | 'health' | 'performance' | 'llm' | 'capability' | 'rate-limit' | 'fairness' | 'budget-remaining';
type: 'context' | 'cheapest' | 'health' | 'performance' | 'llm' | 'capability' | 'rate-limit' | 'fairness' | 'budget-remaining' | 'semantic-intent';
enabled: boolean;
config?: any;
}
18 changes: 16 additions & 2 deletions packages/dashboard/src/components/TraceEntryRenderer.tsx
@@ -44,6 +44,7 @@ export function TraceEntryRenderer({ entry: e }: TraceEntryRendererProps) {
const isIntake = e.message === 'router:intake';

const labelColor = isError ? 'var(--danger)' : isThinking ? '#a78bfa' : isModelPrompt ? '#c4b5fd' : isRecap ? '#34d399' : 'var(--accent)';
const hasDetails = e.details != null && Object.keys(e.details).length > 0;

// Extract the "special" fields from the technical JSON so they aren't duplicated in the fallback
const { systemPrompt, responseText, responseJSON, ...baseDetails } = e.details ?? {};
@@ -58,6 +59,13 @@
whiteSpace: 'pre-wrap',
};

const rawDetails = hasDetails ? (
<details>
<summary style={{ fontSize: '0.8rem', color: 'var(--text-muted)', cursor: 'pointer', userSelect: 'none' }}>raw details</summary>
<pre style={{ ...preStyle, margin: '4px 0 0' }}>{JSON.stringify(e.details, null, 2)}</pre>
</details>
) : null;

return (
<div style={{ marginBottom: 8 }}>

@@ -132,6 +140,8 @@
</pre>
</details>
)}

{rawDetails}
</div>

) : isIntake ? (
@@ -163,6 +173,8 @@
</div>
</div>
)}

{rawDetails}
</div>

) : isRecap ? (
@@ -197,7 +209,7 @@
<div key={i} style={{ background: 'var(--bg-surface)', border: '1px solid var(--border)', borderRadius: 'var(--radius-sm)', padding: '6px 10px' }}>
<div style={{ display: 'flex', alignItems: 'center', justifyContent: 'space-between', marginBottom: p.scores?.length > 1 ? 4 : 0 }}>
<span style={{ fontSize: '0.875rem', fontWeight: 700, color: 'var(--text-primary)', textTransform: 'capitalize' }}>
{p.type === 'llm' ? 'AI Routing' : p.type === 'rate-limit' ? 'Rate Limit' : p.type === 'budget-remaining' ? 'Budget Remaining' : p.type}
{p.type === 'llm' ? 'AI Routing' : p.type === 'rate-limit' ? 'Rate Limit' : p.type === 'budget-remaining' ? 'Budget Remaining' : p.type === 'semantic-intent' ? 'Semantic Intent' : p.type}
</span>
<span style={{ fontSize: '0.75rem', color: 'var(--text-muted)' }}>weight {p.weight?.toFixed(2)}</span>
</div>
@@ -228,11 +240,13 @@
No policy data (record may be corrupted or from older version)
</div>
)}

{rawDetails}
</div>

) : (
<details>
<summary style={{ fontSize: '0.8rem', color: 'var(--text-muted)', cursor: 'pointer', userSelect: 'none' }}>details</summary>
<summary style={{ fontSize: '0.8rem', color: 'var(--text-muted)', cursor: 'pointer', userSelect: 'none' }}>raw details</summary>
<pre style={{ ...preStyle, margin: '4px 0 0' }}>{JSON.stringify(e.details, null, 2)}</pre>
</details>
)}
27 changes: 26 additions & 1 deletion packages/dashboard/src/pages/ModelFormPage.tsx
@@ -1,7 +1,7 @@
import React, { useEffect, useState } from 'react';
import { useNavigate, useParams, useSearchParams } from 'react-router-dom';
import { Plus, X, ChevronDown, EyeOff, Eye, ArrowLeft } from 'lucide-react';
import { getModels, createModel, updateModel, type Model, type PricingTier, type Limit, type LimitMetric, type LimitPeriod, type RollingUnit } from '../api';
import { getModels, createModel, updateModel, type Model, type ModelCapabilities, type PricingTier, type Limit, type LimitMetric, type LimitPeriod, type RollingUnit } from '../api';
import { providersConf } from '@routerly/shared';

type Provider = keyof typeof providersConf;
@@ -19,6 +19,7 @@ type ProviderModel = {
output: number;
cache?: number;
}>;
capabilities?: ModelCapabilities;
};

// ── Constants ──────────────────────────────────────────────────────────────────
@@ -157,6 +158,7 @@ export function ModelFormPage() {
const [err, setErr] = useState('');
const [showToken, setShowToken] = useState(false);
const [isCustomModel, setIsCustomModel] = useState(false);
const [isEmbeddingModel, setIsEmbeddingModel] = useState(false);

useEffect(() => {
async function init() {
@@ -201,8 +203,10 @@
if (!preset) {
setForm(f => ({ ...f, id: modelId, inputPerMillion: '', outputPerMillion: '', cachePerMillion: '', contextWindow: '' }));
setTierRows([]); setShowAdvanced(false);
setIsEmbeddingModel(false);
return;
}
setIsEmbeddingModel(preset.capabilities?.embedding === true);
setForm(f => ({
...f,
id: modelId,
@@ -284,6 +288,7 @@
const ctxWindow = model.contextWindow != null ? model.contextWindow : (preset?.contextWindow ?? null);

setIsCustomModel(customModel);
setIsEmbeddingModel(model.capabilities?.embedding === true);
setErr(''); setShowToken(false);

setForm(f => ({
@@ -384,6 +389,7 @@
limits: limitRows
.filter(l => l.value !== '' && !isNaN(parseFloat(l.value)))
.map(rowToLimit),
...(isEmbeddingModel ? { capabilities: { embedding: true } } : {}),
};

if (editingModelId) {
@@ -520,6 +526,25 @@
</div>
</div>

{/* ── Section: Capabilities ─────────────────────────── */}
<div className="form-section">
<h3 className="section-title">Capabilities</h3>
<p className="section-desc">Specify the type and capabilities of this model.</p>
<div className="form-group" style={{ display: 'flex', alignItems: 'center', gap: 10 }}>
<input
type="checkbox"
id="cap-embedding"
checked={isEmbeddingModel}
onChange={e => setIsEmbeddingModel(e.target.checked)}
style={{ width: 16, height: 16, cursor: 'pointer' }}
/>
<label htmlFor="cap-embedding" style={{ cursor: 'pointer', marginBottom: 0 }}>
Embedding model
<span style={{ marginLeft: 8, fontSize: '0.75rem', color: 'var(--text-muted)' }}>This model generates vector embeddings (not chat completions)</span>
</label>
</div>
</div>

{/* ── Section: Pricing ─────────────────────────────── */}
<div className="form-section">
<h3 className="section-title">Pricing & context</h3>