31 changes: 31 additions & 0 deletions docs/concepts/routing.md
@@ -94,6 +94,37 @@ Scores models by how much of their associated budget is still available. Models

**Use when:** you have per-model spending limits and want Routerly to naturally prefer models with headroom.

### `semantic-intent`

Classifies each incoming request by semantic intent using embeddings, then restricts the candidate pool to the models you have mapped to that intent.

**How it works:**

1. You define **intents** — each intent has a name, a list of **example phrases** that represent it, and the **target models** that should handle requests of that type.
2. When a request arrives, Routerly embeds the user message and compares it against the centroid of each intent's examples using cosine similarity.
3. Based on the best match score and the gap between the top two intents, the policy produces one of three outcomes:

| Outcome | Condition | Effect |
|---|---|---|
| **Confident** | Top score ≥ threshold and margin ≥ ambiguity gap | Hard-filters candidates to the matched intent's model pool |
| **Ambiguous** | Top score ≥ threshold but gap is too small | Merges the top-2 intent pools |
| **Unknown** | Top score below threshold | No filtering — all candidates pass through |
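The three-way decision above can be sketched in a few lines. This is a minimal illustration under the documented thresholds, not Routerly's actual code; the name `classifyIntent` and the `Score` shape are hypothetical:

```typescript
interface Score { intent: string; score: number }

// Inputs are the cosine-similarity scores, sorted descending.
// Thresholds default to the documented values.
function classifyIntent(
  sorted: Score[],
  absoluteThreshold = 0.60,
  ambiguityThreshold = 0.08,
): { status: "confident" | "ambiguous" | "unknown"; intents: string[] } {
  const top = sorted[0];
  if (!top || top.score < absoluteThreshold) {
    // No intent clears the bar: pass all candidates through unfiltered.
    return { status: "unknown", intents: [] };
  }
  const second = sorted[1];
  const margin = top.score - (second?.score ?? 0);
  if (second && margin < ambiguityThreshold) {
    // Too close to call: merge the top-2 intent pools.
    return { status: "ambiguous", intents: [top.intent, second.intent] };
  }
  return { status: "confident", intents: [top.intent] };
}
```

Note that `unknown` is the safe default: a low-confidence classification never shrinks the pool.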

**Configuration:**

| Option | Default | Description |
|---|---|---|
| `embedding_provider` | _(required)_ | `openai` or `ollama` |
| `embedding_model` | _(required)_ | Model ID to use for embedding (must have the embedding capability) |
| `absolute_threshold` | `0.60` | Minimum cosine similarity score to consider a match |
| `ambiguity_threshold` | `0.08` | Minimum margin between top-2 scores to consider a match confident |

**Use when:** you have distinct request categories that should always be routed to specific models (e.g. billing questions → a fine-tuned model, code requests → a coding model).

:::tip Intent centroids are cached
Embeddings for intent examples are computed once and cached in memory for 1 hour. Changing an intent's examples automatically invalidates the cache.
:::

---

## Configuring Routing
85 changes: 85 additions & 0 deletions docs/service/routing-engine.md
@@ -186,6 +186,91 @@ No configuration options (reads limits from the project and model config).

---

### `semantic-intent`

**Type:** hard filter (pool narrowing)

Classifies the incoming request by semantic intent using embedding-based similarity, then restricts the candidate pool to the models mapped to that intent. Policies that run after it (e.g. `cheapest`, `performance`) operate only within the narrowed pool.

#### How it works

1. The last user message is extracted from the request.
2. It is embedded using the configured embedding model/provider.
3. Each intent's **centroid** — the element-wise mean of its example phrase embeddings — is computed (and cached for 1 hour).
4. Cosine similarity is computed between the request vector and every intent centroid.
5. The result is classified as `confident`, `ambiguous`, or `unknown`:

| Status | Condition | Candidate pool |
|---|---|---|
| `confident` | `topScore ≥ absolute_threshold` and `margin ≥ ambiguity_threshold` | Intent's `candidate_models` only |
| `ambiguous` | `topScore ≥ absolute_threshold` but `margin < ambiguity_threshold` | Union of top-2 intents' `candidate_models` |
| `unknown` | `topScore < absolute_threshold` | All candidates (no filtering) |
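Steps 3 and 4 can be sketched as follows. The helpers `centroid` and `cosineSimilarity` are illustrative, not the engine's internals:

```typescript
// Step 3: the centroid is the element-wise mean of the example embeddings.
function centroid(vectors: number[][]): number[] {
  const dim = vectors[0].length;
  const mean = new Array<number>(dim).fill(0);
  for (const v of vectors) {
    for (let i = 0; i < dim; i++) mean[i] += v[i] / vectors.length;
  }
  return mean;
}

// Step 4: cosine similarity between the request vector and a centroid.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```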

If the embedding call itself fails, the policy degrades gracefully and passes all candidates through unchanged.

#### Configuration

| Config key | Default | Description |
|------------|---------|-------------|
| `embedding_provider` | _(required)_ | `openai` or `ollama` |
| `embedding_model` | _(required)_ | Embedding model ID. The model must have `capabilities.embedding = true` |
| `embedding_endpoint` | — | Custom base URL (useful for self-hosted Ollama) |
| `embedding_api_key` | — | API key override (defaults to the provider's global key) |
| `absolute_threshold` | `0.60` | Minimum cosine similarity to recognise a match |
| `ambiguity_threshold` | `0.08` | Minimum margin between top-2 scores to resolve ambiguity |
| `intents` | _(required)_ | Map of intent name → `{ examples: string[], candidate_models: string[] }` |

#### Intent definition

```json
{
"intents": {
"billing": {
"examples": [
"I need an invoice",
"Can I change my payment method?",
"Refund request"
],
"candidate_models": ["gpt-4.1-mini"]
},
"code_review": {
"examples": [
"Review this pull request",
"Check my TypeScript code",
"What's wrong with this function?"
],
"candidate_models": ["claude-3-7-sonnet", "gpt-4.1"]
}
}
}
```

Intent names are normalised to `snake_case` (e.g. `"Customer Support"` → `customer_support`).
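One plausible implementation of this normalisation, matching the documented example (hypothetical; the actual rule may handle more edge cases):

```typescript
// Lower-case the name and collapse runs of spaces/hyphens into underscores.
function toSnakeCase(name: string): string {
  return name.trim().toLowerCase().replace(/[\s-]+/g, "_");
}
```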

#### Trace entries

The policy emits up to three kinds of trace entries, visible in the **Router Response** panel of the dashboard:

| Message | When | Details |
|---|---|---|
| `policy:semantic-intent:classification` | Always (when text is classified) | `topIntent`, `topScore`, `secondIntent`, `secondScore`, `margin`, `status` |
| `policy:semantic-intent:result` | After pool narrowing | `allowed`, `excluded`, `status` |
| `policy:semantic-intent:error` | If the embedding call fails | `error` message |

#### Centroid cache

Intent centroids are computed once — embedding all example phrases and averaging them — then stored in memory with a 1-hour TTL. The cache key includes a hash of the example phrases, so changing an intent's examples automatically invalidates it without a service restart.
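The caching scheme described above might look like the sketch below. The helper names, the SHA-256 choice, and the cache shape are assumptions for illustration only:

```typescript
import { createHash } from "node:crypto";

const TTL_MS = 60 * 60 * 1000; // 1-hour TTL, as documented

const cache = new Map<string, { vector: number[]; expiresAt: number }>();

// The key embeds a hash of the example phrases, so editing an intent's
// examples produces a different key and misses the cache naturally.
function cacheKey(intentName: string, examples: string[]): string {
  const digest = createHash("sha256").update(JSON.stringify(examples)).digest("hex");
  return `${intentName}:${digest}`;
}

function getCachedCentroid(intentName: string, examples: string[]): number[] | undefined {
  const entry = cache.get(cacheKey(intentName, examples));
  if (!entry || entry.expiresAt < Date.now()) return undefined;
  return entry.vector;
}

function setCachedCentroid(intentName: string, examples: string[], vector: number[]): void {
  cache.set(cacheKey(intentName, examples), { vector, expiresAt: Date.now() + TTL_MS });
}
```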

**Use when:** you have distinct request categories that must always reach specific models (support triage, multilingual routing, task-type segregation, etc.).

:::info Recommended pipeline position
Place `semantic-intent` **before** scoring policies (`cheapest`, `performance`, `llm`) so they score only within the already-narrowed pool. Place it **after** hard-filter policies (`health`, `context`, `capability`) so unhealthy or incapable models are excluded before intent matching.

Suggested order: `health` → `context` → `capability` → `budget-remaining` → `rate-limit` → **`semantic-intent`** → `llm` → `performance` → `fairness` → `cheapest`
:::

---

## Policy Ordering and Weights

Policies are applied in the order configured in the project. Their positional weight (`total − index`) means policies near the top of the list have more influence on the final score. Reorder policies via the dashboard (**Projects → your project → Routing**) or the CLI.
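As a worked example of the `total − index` rule (hypothetical helper): with four policies, the first gets weight 4 and the last gets weight 1.

```typescript
// Map each policy to its positional weight: total − index.
function positionalWeights(policies: string[]): Map<string, number> {
  const total = policies.length;
  return new Map(policies.map((p, index) => [p, total - index]));
}
```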
14 changes: 7 additions & 7 deletions package-lock.json


3 changes: 2 additions & 1 deletion packages/cli/src/commands/project.ts
@@ -187,11 +187,12 @@ Examples:
policyCmd.command('enable <project> <type>')
.description('Enable a routing policy (adds it if not present)')
.addHelpText('after', `
Policy types: health, context, capability, budget-remaining, rate-limit, llm, performance, fairness, cheapest
Policy types: health, context, capability, budget-remaining, rate-limit, llm, performance, fairness, cheapest, semantic-intent

Examples:
routerly project routing policy enable my-api health
routerly project routing policy enable my-api llm --config '{"memoryCount":3}'
routerly project routing policy enable my-api semantic-intent --config '{"embedding_provider":"openai","embedding_model":"text-embedding-3-small","absolute_threshold":0.60,"ambiguity_threshold":0.08,"intents":{"coding":{"examples":["write a python function"],"candidate_models":["qwen-coder"]}}}'
`)
.option('--config <json>', 'Policy-specific configuration as JSON')
.action(async (nameOrId: string, type: string, opts: { config?: string }) => {
2 changes: 1 addition & 1 deletion packages/dashboard/package.json
@@ -25,6 +25,6 @@
"@vitejs/plugin-react": "^5.2.0",
"sharp": "^0.34.5",
"typescript": "^5.4.5",
"vite": "^6.4.1"
"vite": "^6.4.2"
}
}
13 changes: 12 additions & 1 deletion packages/dashboard/src/api.ts
@@ -152,13 +152,22 @@ export interface PricingTier {
cachePerMillion?: number;
}

export interface ModelCapabilities {
thinking?: boolean;
vision?: boolean;
functionCalling?: boolean;
json?: boolean;
embedding?: boolean;
}

export interface Model {
id: string; name: string; provider: string; endpoint: string;
upstreamModelId?: string;
cost: { inputPerMillion: number; outputPerMillion: number; cachePerMillion?: number; pricingTiers?: PricingTier[] };
contextWindow?: number;
limits?: Limit[];
/** @deprecated use limits */ globalThresholds?: { daily?: number; weekly?: number; monthly?: number };
capabilities?: ModelCapabilities;
}

export const getModels = () => request<Model[]>('/models');
@@ -170,6 +179,7 @@ export const createModel = (data: {
contextWindow?: number;
pricingTiers?: PricingTier[];
limits?: Limit[];
capabilities?: ModelCapabilities;
}) => request<Model>('/models', { method: 'POST', body: JSON.stringify(data) });
export const updateModel = (id: string, data: {
id?: string;
@@ -180,11 +190,12 @@ export const updateModel = (id: string, data: {
contextWindow?: number;
pricingTiers?: PricingTier[];
limits?: Limit[];
capabilities?: ModelCapabilities;
}) => request<Model>(`/models/${encodeURIComponent(id)}`, { method: 'PUT', body: JSON.stringify(data) });
export const deleteModel = (id: string) => request<void>(`/models/${encodeURIComponent(id)}`, { method: 'DELETE' });

export interface RoutingPolicy {
type: 'context' | 'cheapest' | 'health' | 'performance' | 'llm' | 'capability' | 'rate-limit' | 'fairness' | 'budget-remaining';
type: 'context' | 'cheapest' | 'health' | 'performance' | 'llm' | 'capability' | 'rate-limit' | 'fairness' | 'budget-remaining' | 'semantic-intent';
enabled: boolean;
config?: any;
}
18 changes: 16 additions & 2 deletions packages/dashboard/src/components/TraceEntryRenderer.tsx
@@ -44,6 +44,7 @@ export function TraceEntryRenderer({ entry: e }: TraceEntryRendererProps) {
const isIntake = e.message === 'router:intake';

const labelColor = isError ? 'var(--danger)' : isThinking ? '#a78bfa' : isModelPrompt ? '#c4b5fd' : isRecap ? '#34d399' : 'var(--accent)';
const hasDetails = e.details != null && Object.keys(e.details).length > 0;

// Extract the "special" fields from the technical JSON so they aren't duplicated in the fallback
const { systemPrompt, responseText, responseJSON, ...baseDetails } = e.details ?? {};
@@ -58,6 +59,13 @@
whiteSpace: 'pre-wrap',
};

const rawDetails = hasDetails ? (
<details>
<summary style={{ fontSize: '0.8rem', color: 'var(--text-muted)', cursor: 'pointer', userSelect: 'none' }}>raw details</summary>
<pre style={{ ...preStyle, margin: '4px 0 0' }}>{JSON.stringify(e.details, null, 2)}</pre>
</details>
) : null;

return (
<div style={{ marginBottom: 8 }}>

@@ -132,6 +140,8 @@
</pre>
</details>
)}

{rawDetails}
</div>

) : isIntake ? (
@@ -163,6 +173,8 @@
</div>
</div>
)}

{rawDetails}
</div>

) : isRecap ? (
@@ -197,7 +209,7 @@
<div key={i} style={{ background: 'var(--bg-surface)', border: '1px solid var(--border)', borderRadius: 'var(--radius-sm)', padding: '6px 10px' }}>
<div style={{ display: 'flex', alignItems: 'center', justifyContent: 'space-between', marginBottom: p.scores?.length > 1 ? 4 : 0 }}>
<span style={{ fontSize: '0.875rem', fontWeight: 700, color: 'var(--text-primary)', textTransform: 'capitalize' }}>
{p.type === 'llm' ? 'AI Routing' : p.type === 'rate-limit' ? 'Rate Limit' : p.type === 'budget-remaining' ? 'Budget Remaining' : p.type}
{p.type === 'llm' ? 'AI Routing' : p.type === 'rate-limit' ? 'Rate Limit' : p.type === 'budget-remaining' ? 'Budget Remaining' : p.type === 'semantic-intent' ? 'Semantic Intent' : p.type}
</span>
<span style={{ fontSize: '0.75rem', color: 'var(--text-muted)' }}>weight {p.weight?.toFixed(2)}</span>
</div>
@@ -228,11 +240,13 @@
No policy data (record may be corrupted or from older version)
</div>
)}

{rawDetails}
</div>

) : (
<details>
<summary style={{ fontSize: '0.8rem', color: 'var(--text-muted)', cursor: 'pointer', userSelect: 'none' }}>details</summary>
<summary style={{ fontSize: '0.8rem', color: 'var(--text-muted)', cursor: 'pointer', userSelect: 'none' }}>raw details</summary>
<pre style={{ ...preStyle, margin: '4px 0 0' }}>{JSON.stringify(e.details, null, 2)}</pre>
</details>
)}
27 changes: 26 additions & 1 deletion packages/dashboard/src/pages/ModelFormPage.tsx
@@ -1,7 +1,7 @@
import React, { useEffect, useState } from 'react';
import { useNavigate, useParams, useSearchParams } from 'react-router-dom';
import { Plus, X, ChevronDown, EyeOff, Eye, ArrowLeft } from 'lucide-react';
import { getModels, createModel, updateModel, type Model, type PricingTier, type Limit, type LimitMetric, type LimitPeriod, type RollingUnit } from '../api';
import { getModels, createModel, updateModel, type Model, type ModelCapabilities, type PricingTier, type Limit, type LimitMetric, type LimitPeriod, type RollingUnit } from '../api';
import { providersConf } from '@routerly/shared';

type Provider = keyof typeof providersConf;
@@ -19,6 +19,7 @@ type ProviderModel = {
output: number;
cache?: number;
}>;
capabilities?: ModelCapabilities;
};

// ── Constants ──────────────────────────────────────────────────────────────────
@@ -157,6 +158,7 @@ export function ModelFormPage() {
const [err, setErr] = useState('');
const [showToken, setShowToken] = useState(false);
const [isCustomModel, setIsCustomModel] = useState(false);
const [isEmbeddingModel, setIsEmbeddingModel] = useState(false);

useEffect(() => {
async function init() {
@@ -201,8 +203,10 @@
if (!preset) {
setForm(f => ({ ...f, id: modelId, inputPerMillion: '', outputPerMillion: '', cachePerMillion: '', contextWindow: '' }));
setTierRows([]); setShowAdvanced(false);
setIsEmbeddingModel(false);
return;
}
setIsEmbeddingModel(preset.capabilities?.embedding === true);
setForm(f => ({
...f,
id: modelId,
@@ -284,6 +288,7 @@
const ctxWindow = model.contextWindow != null ? model.contextWindow : (preset?.contextWindow ?? null);

setIsCustomModel(customModel);
setIsEmbeddingModel(model.capabilities?.embedding === true);
setErr(''); setShowToken(false);

setForm(f => ({
@@ -384,6 +389,7 @@
limits: limitRows
.filter(l => l.value !== '' && !isNaN(parseFloat(l.value)))
.map(rowToLimit),
...(isEmbeddingModel ? { capabilities: { embedding: true } } : {}),
};

if (editingModelId) {
@@ -520,6 +526,25 @@
</div>
</div>

{/* ── Section: Capabilities ─────────────────────────── */}
<div className="form-section">
<h3 className="section-title">Capabilities</h3>
<p className="section-desc">Specify the type and capabilities of this model.</p>
<div className="form-group" style={{ display: 'flex', alignItems: 'center', gap: 10 }}>
<input
type="checkbox"
id="cap-embedding"
checked={isEmbeddingModel}
onChange={e => setIsEmbeddingModel(e.target.checked)}
style={{ width: 16, height: 16, cursor: 'pointer' }}
/>
<label htmlFor="cap-embedding" style={{ cursor: 'pointer', marginBottom: 0 }}>
Embedding model
<span style={{ marginLeft: 8, fontSize: '0.75rem', color: 'var(--text-muted)' }}>This model generates vector embeddings (not chat completions)</span>
</label>
</div>
</div>

{/* ── Section: Pricing ─────────────────────────────── */}
<div className="form-section">
<h3 className="section-title">Pricing & context</h3>