diff --git a/docs/concepts/routing.md b/docs/concepts/routing.md
index 7e9486c..8de9f6d 100644
--- a/docs/concepts/routing.md
+++ b/docs/concepts/routing.md
@@ -94,6 +94,37 @@ Scores models by how much of their associated budget is still available. Models
 **Use when:** you have per-model spending limits and want Routerly to naturally prefer models with headroom.
 
+### `semantic-intent`
+
+Classifies each incoming request by semantic intent using embeddings, then restricts the candidate pool to the models you have mapped to that intent.
+
+**How it works:**
+
+1. You define **intents** — each intent has a name, a list of **example phrases** that represent it, and the **target models** that should handle requests of that type.
+2. When a request arrives, Routerly embeds the user message and compares it against the centroid of each intent's examples using cosine similarity.
+3. Based on the best match score and the gap between the top two intents, the policy produces one of three outcomes:
+
+| Outcome | Condition | Effect |
+|---|---|---|
+| **Confident** | Top score ≥ absolute threshold and margin ≥ ambiguity threshold | Hard-filters candidates to the matched intent's model pool |
+| **Ambiguous** | Top score ≥ absolute threshold but the margin is too small | Merges the top-2 intent pools |
+| **Unknown** | Top score below the absolute threshold | No filtering — all candidates pass through |
+
+**Configuration:**
+
+| Option | Default | Description |
+|---|---|---|
+| `embedding_provider` | _(required)_ | `openai` or `ollama` |
+| `embedding_model` | _(required)_ | Model ID to use for embedding (must have the embedding capability) |
+| `absolute_threshold` | `0.60` | Minimum cosine similarity score to consider a match |
+| `ambiguity_threshold` | `0.08` | Minimum margin between the top-2 scores to consider a match confident |
+
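+**Example:** a minimal configuration — the same JSON shape the CLI accepts via `--config`; the model ID and example phrases here are illustrative:
+
+```json
+{
+  "embedding_provider": "openai",
+  "embedding_model": "text-embedding-3-small",
+  "absolute_threshold": 0.60,
+  "ambiguity_threshold": 0.08,
+  "intents": {
+    "coding": {
+      "examples": ["write a python function", "fix this typescript bug"],
+      "candidate_models": ["qwen-coder"]
+    }
+  }
+}
+```
+
+Because `unknown` classifications apply no filtering, enabling the policy is safe even before the thresholds are tuned.
+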
+**Use when:** you have distinct request categories that should always be routed to specific models (e.g. billing questions → a fine-tuned model, code requests → a coding model).
+
+:::tip Intent centroids are cached
+Embeddings for intent examples are computed once and cached in memory for 1 hour. Changing an intent's examples automatically invalidates the cache.
+:::
+
 ---
 
 ## Configuring Routing
diff --git a/docs/service/routing-engine.md b/docs/service/routing-engine.md
index 1f9fa94..e3fdf49 100644
--- a/docs/service/routing-engine.md
+++ b/docs/service/routing-engine.md
@@ -186,6 +186,91 @@ No configuration options (reads limits from the project and model config).
 
 ---
 
+### `semantic-intent`
+
+**Type:** hard filter (pool narrowing)
+
+Classifies the incoming request by semantic intent using embedding-based similarity, then restricts the candidate pool to the models mapped to that intent. Policies that run after it (e.g. `cheapest`, `performance`) operate only within the narrowed pool.
+
+#### How it works
+
+1. The last user message is extracted from the request.
+2. It is embedded using the configured embedding model/provider.
+3. Each intent's **centroid** — the element-wise mean of its example phrase embeddings — is computed (and cached for 1 hour).
+4. Cosine similarity is computed between the request vector and every intent centroid.
+5. The result is classified as `confident`, `ambiguous`, or `unknown`:
+
+| Status | Condition | Candidate pool |
+|---|---|---|
+| `confident` | `topScore ≥ absolute_threshold` and `margin ≥ ambiguity_threshold` | Intent's `candidate_models` only |
+| `ambiguous` | `topScore ≥ absolute_threshold` but `margin < ambiguity_threshold` | Union of top-2 intents' `candidate_models` |
+| `unknown` | `topScore < absolute_threshold` | All candidates (no filtering) |
+
+If the embedding call itself fails, the policy degrades gracefully and passes all candidates through unchanged.
+
+#### Configuration
+
+| Config key | Default | Description |
+|------------|---------|-------------|
+| `embedding_provider` | _(required)_ | `openai` or `ollama` |
+| `embedding_model` | _(required)_ | Embedding model ID. The model must have `capabilities.embedding = true` |
+| `embedding_endpoint` | — | Custom base URL (useful for self-hosted Ollama) |
+| `embedding_api_key` | — | API key override (defaults to the provider's global key) |
+| `absolute_threshold` | `0.60` | Minimum cosine similarity to recognise a match |
+| `ambiguity_threshold` | `0.08` | Minimum margin between top-2 scores to resolve ambiguity |
+| `intents` | _(required)_ | Map of intent name → `{ examples: string[], candidate_models: string[] }` |
+
+#### Intent definition
+
+```json
+{
+  "intents": {
+    "billing": {
+      "examples": [
+        "I need an invoice",
+        "Can I change my payment method?",
+        "Refund request"
+      ],
+      "candidate_models": ["gpt-4.1-mini"]
+    },
+    "code_review": {
+      "examples": [
+        "Review this pull request",
+        "Check my TypeScript code",
+        "What's wrong with this function?"
+      ],
+      "candidate_models": ["claude-3-7-sonnet", "gpt-4.1"]
+    }
+  }
+}
+```
+
+Intent names are normalised to `snake_case` (e.g. `"Customer Support"` → `customer_support`).
+
+#### Trace entries
+
+The policy emits three trace entries visible in the **Router Response** panel of the dashboard:
+
+| Message | When | Details |
+|---|---|---|
+| `policy:semantic-intent:classification` | Always (when text is classified) | `topIntent`, `topScore`, `secondIntent`, `secondScore`, `margin`, `status` |
+| `policy:semantic-intent:result` | After pool narrowing | `allowed`, `excluded`, `status` |
+| `policy:semantic-intent:error` | If the embedding call fails | `error` message |
+
+#### Centroid cache
+
+Intent centroids are computed once — embedding all example phrases and averaging them — then stored in memory with a 1-hour TTL. The cache key includes a hash of the example phrases, so changing an intent's examples automatically invalidates it without a service restart.
+
+**Use when:** you have distinct request categories that must always reach specific models (support triage, multilingual routing, task-type segregation, etc.).
+
+:::info Recommended pipeline position
+Place `semantic-intent` **before** scoring policies (`cheapest`, `performance`, `llm`) so they score only within the already-narrowed pool. Place it **after** hard-filter policies (`health`, `context`, `capability`) so unhealthy or incapable models are excluded before intent matching.
+
+Suggested order: `health` → `context` → `capability` → `budget-remaining` → `rate-limit` → **`semantic-intent`** → `llm` → `performance` → `fairness` → `cheapest`
+:::
+
+---
+
 ## Policy Ordering and Weights
 
 Policies are applied in the order configured in the project. Their positional weight (`total − index`) means policies near the top of the list have more influence on the final score.
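+For example, with ten enabled policies the first is weighted 10 and the last 1, so scores from the top policy count ten times as heavily in the final ranking.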
Reorder policies via the dashboard (**Projects → your project → Routing**) or the CLI. diff --git a/package-lock.json b/package-lock.json index 28485f6..c431d30 100644 --- a/package-lock.json +++ b/package-lock.json @@ -9894,9 +9894,9 @@ } }, "node_modules/vite": { - "version": "7.3.1", - "resolved": "https://registry.npmjs.org/vite/-/vite-7.3.1.tgz", - "integrity": "sha512-w+N7Hifpc3gRjZ63vYBXA56dvvRlNWRczTdmCBBa+CotUzAPf5b7YMdMR/8CQoeYE5LX3W4wj6RYTgonm1b9DA==", + "version": "7.3.2", + "resolved": "https://registry.npmjs.org/vite/-/vite-7.3.2.tgz", + "integrity": "sha512-Bby3NOsna2jsjfLVOHKes8sGwgl4TT0E6vvpYgnAYDIF/tie7MRaFthmKuHx1NSXjiTueXH3do80FMQgvEktRg==", "dev": true, "license": "MIT", "dependencies": { @@ -10308,7 +10308,7 @@ "@vitejs/plugin-react": "^5.2.0", "sharp": "^0.34.5", "typescript": "^5.4.5", - "vite": "^6.4.1" + "vite": "^6.4.2" } }, "packages/dashboard/node_modules/@esbuild/aix-ppc64": { @@ -10827,9 +10827,9 @@ } }, "packages/dashboard/node_modules/vite": { - "version": "6.4.1", - "resolved": "https://registry.npmjs.org/vite/-/vite-6.4.1.tgz", - "integrity": "sha512-+Oxm7q9hDoLMyJOYfUYBuHQo+dkAloi33apOPP56pzj+vsdJDzr+j1NISE5pyaAuKL4A3UD34qd0lx5+kfKp2g==", + "version": "6.4.2", + "resolved": "https://registry.npmjs.org/vite/-/vite-6.4.2.tgz", + "integrity": "sha512-2N/55r4JDJ4gdrCvGgINMy+HH3iRpNIz8K6SFwVsA+JbQScLiC+clmAxBgwiSPgcG9U15QmvqCGWzMbqda5zGQ==", "dev": true, "license": "MIT", "dependencies": { diff --git a/packages/cli/src/commands/project.ts b/packages/cli/src/commands/project.ts index 639c480..d0b81fe 100644 --- a/packages/cli/src/commands/project.ts +++ b/packages/cli/src/commands/project.ts @@ -187,11 +187,12 @@ Examples: policyCmd.command('enable ') .description('Enable a routing policy (adds it if not present)') .addHelpText('after', ` -Policy types: health, context, capability, budget-remaining, rate-limit, llm, performance, fairness, cheapest +Policy types: health, context, capability, budget-remaining, rate-limit, llm, performance, fairness, cheapest, semantic-intent Examples: routerly project routing policy enable my-api health routerly project routing policy enable my-api llm --config '{"memoryCount":3}' + routerly project routing policy enable my-api semantic-intent --config '{"embedding_provider":"openai","embedding_model":"text-embedding-3-small","absolute_threshold":0.60,"ambiguity_threshold":0.08,"intents":{"coding":{"examples":["write a python function"],"candidate_models":["qwen-coder"]}}}' `) .option('--config ', 'Policy-specific configuration as JSON') .action(async (nameOrId: string, type: string, opts: { config?: string }) => { diff --git a/packages/dashboard/package.json b/packages/dashboard/package.json index 5c40783..ece9d48 100644 --- a/packages/dashboard/package.json +++ b/packages/dashboard/package.json @@ -25,6 +25,6 @@ "@vitejs/plugin-react": "^5.2.0", "sharp": "^0.34.5", "typescript": "^5.4.5", - "vite": "^6.4.1" + "vite": "^6.4.2" } } diff --git a/packages/dashboard/src/api.ts b/packages/dashboard/src/api.ts index 482b170..53468a8 100644 --- a/packages/dashboard/src/api.ts +++ b/packages/dashboard/src/api.ts @@ -152,6 +152,14 @@ export interface PricingTier { cachePerMillion?: number; } +export interface ModelCapabilities { + thinking?: boolean; + vision?: boolean; + functionCalling?: boolean; + json?: boolean; + embedding?: boolean; +} + export interface Model { id: string; name: string; provider: string; endpoint: string; upstreamModelId?: string; @@ -159,6 +167,7 @@ export interface Model { contextWindow?: number; limits?: 
 Limit[];
   /** @deprecated use limits */ globalThresholds?: { daily?: number; weekly?: number; monthly?: number };
+  capabilities?: ModelCapabilities;
 }
 
 export const getModels = () => request('/models');
@@ -170,6 +179,7 @@ export const createModel = (data: {
   contextWindow?: number;
   pricingTiers?: PricingTier[];
   limits?: Limit[];
+  capabilities?: ModelCapabilities;
 }) => request('/models', { method: 'POST', body: JSON.stringify(data) });
 export const updateModel = (id: string, data: {
   id?: string;
@@ -180,11 +190,12 @@ export const updateModel = (id: string, data: {
   contextWindow?: number;
   pricingTiers?: PricingTier[];
   limits?: Limit[];
+  capabilities?: ModelCapabilities;
 }) => request(`/models/${encodeURIComponent(id)}`, { method: 'PUT', body: JSON.stringify(data) });
 export const deleteModel = (id: string) => request(`/models/${encodeURIComponent(id)}`, { method: 'DELETE' });
 
 export interface RoutingPolicy {
-  type: 'context' | 'cheapest' | 'health' | 'performance' | 'llm' | 'capability' | 'rate-limit' | 'fairness' | 'budget-remaining';
+  type: 'context' | 'cheapest' | 'health' | 'performance' | 'llm' | 'capability' | 'rate-limit' | 'fairness' | 'budget-remaining' | 'semantic-intent';
   enabled: boolean;
   config?: any;
 }
diff --git a/packages/dashboard/src/components/TraceEntryRenderer.tsx b/packages/dashboard/src/components/TraceEntryRenderer.tsx
index f391214..5455a9d 100644
--- a/packages/dashboard/src/components/TraceEntryRenderer.tsx
+++ b/packages/dashboard/src/components/TraceEntryRenderer.tsx
@@ -44,6 +44,7 @@ export function TraceEntryRenderer({ entry: e }: TraceEntryRendererProps) {
   const isIntake = e.message === 'router:intake';
   const labelColor = isError ? 'var(--danger)' : isThinking ? '#a78bfa' : isModelPrompt ? '#c4b5fd' : isRecap ? '#34d399' : 'var(--accent)';
+  const hasDetails = e.details != null && Object.keys(e.details).length > 0;
 
   // Extract the "special" fields from the technical JSON so they are not duplicated in the fallback
   const { systemPrompt, responseText, responseJSON, ...baseDetails } = e.details ?? {};
@@ -58,6 +59,13 @@ export function TraceEntryRenderer({ entry: e }: TraceEntryRendererProps) {
     whiteSpace: 'pre-wrap',
   };
 
+  const rawDetails = hasDetails ? (
+
+ raw details +
{JSON.stringify(e.details, null, 2)}
+
+ ) : null; + return (
@@ -132,6 +140,8 @@ export function TraceEntryRenderer({ entry: e }: TraceEntryRendererProps) { )} + + {rawDetails}
) : isIntake ? ( @@ -163,6 +173,8 @@ export function TraceEntryRenderer({ entry: e }: TraceEntryRendererProps) { )} + + {rawDetails} ) : isRecap ? ( @@ -197,7 +209,7 @@ export function TraceEntryRenderer({ entry: e }: TraceEntryRendererProps) {
1 ? 4 : 0 }}> - {p.type === 'llm' ? 'AI Routing' : p.type === 'rate-limit' ? 'Rate Limit' : p.type === 'budget-remaining' ? 'Budget Remaining' : p.type} + {p.type === 'llm' ? 'AI Routing' : p.type === 'rate-limit' ? 'Rate Limit' : p.type === 'budget-remaining' ? 'Budget Remaining' : p.type === 'semantic-intent' ? 'Semantic Intent' : p.type} weight {p.weight?.toFixed(2)}
@@ -228,11 +240,13 @@ export function TraceEntryRenderer({ entry: e }: TraceEntryRendererProps) { No policy data (record may be corrupted or from older version)
)} + + {rawDetails} ) : (
- details + raw details
{JSON.stringify(e.details, null, 2)}
)} diff --git a/packages/dashboard/src/pages/ModelFormPage.tsx b/packages/dashboard/src/pages/ModelFormPage.tsx index 610afb2..4f4bac3 100644 --- a/packages/dashboard/src/pages/ModelFormPage.tsx +++ b/packages/dashboard/src/pages/ModelFormPage.tsx @@ -1,7 +1,7 @@ import React, { useEffect, useState } from 'react'; import { useNavigate, useParams, useSearchParams } from 'react-router-dom'; import { Plus, X, ChevronDown, EyeOff, Eye, ArrowLeft } from 'lucide-react'; -import { getModels, createModel, updateModel, type Model, type PricingTier, type Limit, type LimitMetric, type LimitPeriod, type RollingUnit } from '../api'; +import { getModels, createModel, updateModel, type Model, type ModelCapabilities, type PricingTier, type Limit, type LimitMetric, type LimitPeriod, type RollingUnit } from '../api'; import { providersConf } from '@routerly/shared'; type Provider = keyof typeof providersConf; @@ -19,6 +19,7 @@ type ProviderModel = { output: number; cache?: number; }>; + capabilities?: ModelCapabilities; }; // ── Constants ────────────────────────────────────────────────────────────────── @@ -157,6 +158,7 @@ export function ModelFormPage() { const [err, setErr] = useState(''); const [showToken, setShowToken] = useState(false); const [isCustomModel, setIsCustomModel] = useState(false); + const [isEmbeddingModel, setIsEmbeddingModel] = useState(false); useEffect(() => { async function init() { @@ -201,8 +203,10 @@ export function ModelFormPage() { if (!preset) { setForm(f => ({ ...f, id: modelId, inputPerMillion: '', outputPerMillion: '', cachePerMillion: '', contextWindow: '' })); setTierRows([]); setShowAdvanced(false); + setIsEmbeddingModel(false); return; } + setIsEmbeddingModel(preset.capabilities?.embedding === true); setForm(f => ({ ...f, id: modelId, @@ -284,6 +288,7 @@ export function ModelFormPage() { const ctxWindow = model.contextWindow != null ? model.contextWindow : (preset?.contextWindow ?? null); setIsCustomModel(customModel); + setIsEmbeddingModel(model.capabilities?.embedding === true); setErr(''); setShowToken(false); setForm(f => ({ @@ -384,6 +389,7 @@ export function ModelFormPage() { limits: limitRows .filter(l => l.value !== '' && !isNaN(parseFloat(l.value))) .map(rowToLimit), + ...(isEmbeddingModel ? { capabilities: { embedding: true } } : {}), }; if (editingModelId) { @@ -520,6 +526,25 @@ export function ModelFormPage() { + {/* ── Section: Capabilities ─────────────────────────── */} +
+

Capabilities

+

Specify the type and capabilities of this model.

+
+ setIsEmbeddingModel(e.target.checked)} + style={{ width: 16, height: 16, cursor: 'pointer' }} + /> + +
+
+ {/* ── Section: Pricing ─────────────────────────────── */}

Pricing & context

diff --git a/packages/dashboard/src/pages/OverviewPage.tsx b/packages/dashboard/src/pages/OverviewPage.tsx index 69c435f..8461e57 100644 --- a/packages/dashboard/src/pages/OverviewPage.tsx +++ b/packages/dashboard/src/pages/OverviewPage.tsx @@ -26,7 +26,7 @@ export function OverviewPage() { const pieData = Object.entries(stats.byModel).map(([name, v]) => ({ name, value: v.cost })); const timelineData = stats.timeline.map(([date, cost]) => ({ - date: date.slice(5), cost: Number(cost.toFixed(6)), + date: date.slice(5), cost: Number(cost.toFixed(8)), })); return ( @@ -80,7 +80,7 @@ export function OverviewPage() { [`$${(v as number).toFixed(6)}`, 'Cost']} + formatter={(v) => [`$${(v as number).toFixed(8)}`, 'Cost']} /> @@ -100,7 +100,7 @@ export function OverviewPage() { [`$${(v as number).toFixed(6)}`, 'Cost']} + formatter={(v) => [`$${(v as number).toFixed(8)}`, 'Cost']} /> diff --git a/packages/dashboard/src/pages/UsagePage.tsx b/packages/dashboard/src/pages/UsagePage.tsx index a5b32bd..2eb2c0a 100644 --- a/packages/dashboard/src/pages/UsagePage.tsx +++ b/packages/dashboard/src/pages/UsagePage.tsx @@ -27,7 +27,7 @@ export function UsagePage() { const [pollInterval, setPollInterval] = useFilterState({ key: 'usage-filters-pollInterval', defaultValue: 30_000 }); const [refreshing, setRefreshing] = useState(false); const [page, setPage] = useState(1); - const [pageSize] = useState(50); + const [pageSize] = useState(100); const navigate = useNavigate(); const POLL_OPTIONS: { label: string; value: number }[] = [ @@ -293,7 +293,7 @@ export function UsagePage() { 0 ? 'var(--danger)' : 'inherit' }}>{v.errors} {v.inputTokens.toLocaleString()} {v.outputTokens.toLocaleString()} - ${v.cost.toFixed(6)} + ${v.cost.toFixed(8)} ))} @@ -353,7 +353,7 @@ export function UsagePage() { {r.inputTokens} {r.outputTokens} - ${r.cost.toFixed(6)} + ${r.cost.toFixed(8)} {r.latencyMs}ms {r.ttftMs != null ? `${r.ttftMs}ms` : '—'} {r.tokensPerSec != null ? `${r.tokensPerSec}` : '—'} diff --git a/packages/dashboard/src/pages/project/ProjectLogsTab.tsx b/packages/dashboard/src/pages/project/ProjectLogsTab.tsx index 86b9443..ebb7f60 100644 --- a/packages/dashboard/src/pages/project/ProjectLogsTab.tsx +++ b/packages/dashboard/src/pages/project/ProjectLogsTab.tsx @@ -30,7 +30,7 @@ export function ProjectLogsTab() { const [pollInterval, setPollInterval] = useFilterState({ key: `project-${projectId}-filters-pollInterval`, defaultValue: 30_000 }); const [refreshing, setRefreshing] = useState(false); const [page, setPage] = useState(1); - const [pageSize] = useState(50); + const [pageSize] = useState(100); const POLL_OPTIONS: { label: string; value: number }[] = [ { label: 'Off', value: 0 }, @@ -313,7 +313,7 @@ export function ProjectLogsTab() { {r.inputTokens} {r.outputTokens} - ${r.cost.toFixed(6)} + ${r.cost.toFixed(8)} {r.latencyMs}ms {r.ttftMs != null ? `${r.ttftMs}ms` : '—'} {r.tokensPerSec != null ? 
`${r.tokensPerSec}` : '—'} diff --git a/packages/dashboard/src/pages/project/ProjectRoutingTab.tsx b/packages/dashboard/src/pages/project/ProjectRoutingTab.tsx index bdcfa09..5439f96 100644 --- a/packages/dashboard/src/pages/project/ProjectRoutingTab.tsx +++ b/packages/dashboard/src/pages/project/ProjectRoutingTab.tsx @@ -1,5 +1,5 @@ import React, { useEffect, useState } from 'react'; -import { Plus, Trash2, GripVertical } from 'lucide-react'; +import { Plus, Trash2, GripVertical, X, Check } from 'lucide-react'; import { updateProject, getModels, type Model, type Project, type RoutingPolicy } from '../../api'; import { useProject } from './ProjectLayout'; import { SearchableSelect } from '../../components/SearchableSelect'; @@ -25,6 +25,7 @@ const POLICY_DESCRIPTIONS: Record = { 'rate-limit': 'Penalizes models with a high recent call frequency to reduce the risk of hitting provider rate limits (HTTP 429). Supports a configurable hard threshold that forces the score to 0.', fairness: 'Distributes traffic evenly by penalizing models that received more successful calls recently. Acts as a soft round-robin to prevent load from concentrating on a single model.', 'budget-remaining': 'Scores models based on remaining budget headroom across all configured limits. Prefers models with more room before their thresholds are hit, spreading consumption proactively.', + 'semantic-intent': 'Classifies the request by semantic intent using embeddings, then restricts the candidate pool to the models mapped to that intent. Confident matches hard-filter the pool; ambiguous matches merge top-2 pools; unknown requests pass all candidates through.', }; export function ProjectRoutingTab() { @@ -32,6 +33,7 @@ export function ProjectRoutingTab() { const [availableModels, setAvailableModels] = useState([]); const [loading, setLoading] = useState(true); const [saving, setSaving] = useState(false); + const [saved, setSaved] = useState(false); const [err, setErr] = useState(''); const [policies, setPolicies] = useState([]); @@ -40,10 +42,23 @@ export function ProjectRoutingTab() { // Advanced section open state per policy index const [advancedOpen, setAdvancedOpen] = useState>(new Set()); + // Add-intent inline input state per policy index + const [addIntentInputs, setAddIntentInputs] = useState>({}); + + // Expanded intent state: which intents are open (keyed by intentName) + const [expandedIntents, setExpandedIntents] = useState>(new Set()); + + // Show-all-examples toggle: keyed by `${policyIdx}::${intentName}` + const [showAllExamples, setShowAllExamples] = useState>(new Set()); + + // Add-example inline input state: keyed by `${policyIdx}::${intentName}` + const [addExampleInputs, setAddExampleInputs] = useState>({}); + // Drag state const [draggedTargetIdx, setDraggedTargetIdx] = useState(null); const [draggedPolicyIdx, setDraggedPolicyIdx] = useState(null); const [draggedLlmModelIdx, setDraggedLlmModelIdx] = useState(null); + const [draggedSemModelIdx, setDraggedSemModelIdx] = useState(null); const [promptHoverIdx, setPromptHoverIdx] = useState(null); useEffect(() => { @@ -64,7 +79,8 @@ export function ProjectRoutingTab() { // budget-remaining → penalise models approaching budget exhaustion // rate-limit → penalise models approaching their rate limits // Phase 3 – semantic scoring: the core routing intelligence - // llm → AI-based relevance scoring + // semantic-intent → embedding-based intent classification + pool narrowing + // llm → AI-based relevance scoring // Phase 4 – cost/quality optimisation: tiebreakers 
// performance → favour low-latency, high-reliability models // fairness → balance load across models @@ -75,6 +91,7 @@ export function ProjectRoutingTab() { { internalId: mkId(), type: 'capability', enabled: false }, { internalId: mkId(), type: 'budget-remaining', enabled: false }, { internalId: mkId(), type: 'rate-limit', enabled: false }, + { internalId: mkId(), type: 'semantic-intent', enabled: false, config: { embedding_provider: 'openai', embedding_model: '', intents: {} } }, { internalId: mkId(), type: 'llm', enabled: false, config: { routingModelId: project.routingModelId || '', fallbackModelIds: project.fallbackRoutingModelIds || [], autoRouting: project.autoRouting ?? true } }, { internalId: mkId(), type: 'performance', enabled: false }, { internalId: mkId(), type: 'fairness', enabled: false }, @@ -134,6 +151,76 @@ export function ProjectRoutingTab() { return used; } + // --- Semantic Intent / Model association helpers --- + function getIntentsForModel(modelId: string): Set { + const result = new Set(); + const semPolicy = policies.find(p => p.type === 'semantic-intent' && p.enabled); + if (!semPolicy) return result; + const intents = (semPolicy.config?.intents ?? {}) as Record; + for (const [key, def] of Object.entries(intents)) { + if (def.candidate_models?.includes(modelId)) result.add(key); + } + return result; + } + + function toggleIntentForModel(modelId: string, intentKey: string) { + const semPolicyIdx = policies.findIndex(p => p.type === 'semantic-intent' && p.enabled); + if (semPolicyIdx === -1) return; + const semPolicy = policies[semPolicyIdx]!; + const intents = { ...((semPolicy.config?.intents ?? {}) as Record) }; + const def = intents[intentKey]; + if (!def) return; + const current = def.candidate_models ?? []; + const next = current.includes(modelId) + ? current.filter(id => id !== modelId) + : [...current, modelId]; + intents[intentKey] = { ...def, candidate_models: next }; + updatePolicyConfig(semPolicyIdx, { intents }); + } + + // --- Semantic Intent Embedding Model Helpers --- + function getSemModelIds(policy: PolicyItem): string[] { + const primary = policy.config?.embedding_model; + const fallbacks: string[] = policy.config?.embedding_fallback_models ?? []; + const ids = primary ? [primary, ...fallbacks] : fallbacks; + return ids.length === 0 ? [''] : ids; + } + + function setSemModelIds(policyIdx: number, newIds: string[]) { + updatePolicyConfig(policyIdx, { + embedding_model: newIds[0] ?? 
'', + embedding_fallback_models: newIds.slice(1), + }); + } + + function onDragStartSemModel(e: React.DragEvent, mIdx: number) { + setDraggedSemModelIdx(mIdx); + e.dataTransfer.effectAllowed = 'move'; + setTimeout(() => { + const el = document.getElementById(`sem-model-row-${mIdx}`); + if (el) el.style.opacity = '0.4'; + }, 0); + } + + function onDragEnterSemModel(e: React.DragEvent, policyIdx: number, targetIdx: number) { + e.preventDefault(); + if (draggedSemModelIdx === null || draggedSemModelIdx === targetIdx) return; + const pol = policies[policyIdx]!; + const ids = getSemModelIds(pol); + const copy = [...ids]; + const dragged = copy[draggedSemModelIdx]!; + copy.splice(draggedSemModelIdx, 1); + copy.splice(targetIdx, 0, dragged); + setSemModelIds(policyIdx, copy); + setDraggedSemModelIdx(targetIdx); + } + + function onDragEndSemModel(_e: React.DragEvent, mIdx: number) { + setDraggedSemModelIdx(null); + const el = document.getElementById(`sem-model-row-${mIdx}`); + if (el) el.style.opacity = '1'; + } + // --- LLM Routing Model Helpers --- function getLlmModelIds(policy: PolicyItem): string[] { const primary = policy.config?.routingModelId; @@ -263,8 +350,7 @@ export function ProjectRoutingTab() { // -------------------------------- - async function handleSubmit(e: React.FormEvent) { - e.preventDefault(); + async function doSave() { if (!project) return; setErr(''); @@ -291,6 +377,8 @@ export function ProjectRoutingTab() { }; const updated = await updateProject(project.id, payload); setProject(updated); + setSaved(true); + setTimeout(() => setSaved(false), 2500); } catch (err) { setErr(err instanceof Error ? err.message : 'Error saving project routing'); } finally { @@ -298,10 +386,21 @@ export function ProjectRoutingTab() { } } + function handleSubmit(e: React.FormEvent) { + e.preventDefault(); + void doSave(); + } + const isAiRoutingEnabled = policies.some(p => p.type === 'llm' && p.enabled); const isAutoRoutingEnabled = policies.find(p => p.type === 'llm')?.config?.autoRouting ?? true; const showPromptInput = isAiRoutingEnabled && !isAutoRoutingEnabled; + const semanticIntentPolicy = policies.find(p => p.type === 'semantic-intent' && p.enabled); + const isSemanticIntentEnabled = !!semanticIntentPolicy; + const semanticIntents = isSemanticIntentEnabled + ? (semanticIntentPolicy!.config?.intents ?? {}) as Record + : {}; + if (loading) return (
@@ -347,6 +446,7 @@ export function ProjectRoutingTab() { {policy.type === 'llm' ? 'AI Routing' : policy.type === 'rate-limit' ? 'Rate Limit' : policy.type === 'budget-remaining' ? 'Budget Remaining' + : policy.type === 'semantic-intent' ? 'Semantic Intent' : policy.type} Policy
{POLICY_DESCRIPTIONS[policy.type] && ( @@ -594,6 +694,360 @@ export function ProjectRoutingTab() {
)} + {policy.type === 'semantic-intent' && policy.enabled && ( +
+ + {/* --- Embedding Model --- */} +
+ +

The first model is the primary. The others are tried in order if the primary fails.

+
+ {getSemModelIds(policy).map((modelId, mIdx) => { + const embeddingModels = availableModels.filter(m => m.capabilities?.embedding === true); + const usedIds = new Set(getSemModelIds(policy).filter((_, i) => i !== mIdx)); + const opts = embeddingModels + .filter(m => !usedIds.has(m.id)) + .sort((a, b) => a.id.localeCompare(b.id)) + .map(m => ({ value: m.id, label: m.id })); + return ( +
onDragStartSemModel(e, mIdx)} + onDragEnter={e => onDragEnterSemModel(e, idx, mIdx)} + onDragEnd={e => onDragEndSemModel(e, mIdx)} + onDragOver={e => e.preventDefault()} + style={{ display: 'flex', alignItems: 'center', gap: 6, cursor: 'grab', transition: 'opacity 0.2s' }} + > +
+ { + const ids = getSemModelIds(policy); + const copy = [...ids]; + copy[mIdx] = val; + setSemModelIds(idx, copy); + }} + placeholder="Select model" + style={{ flex: 1 }} + /> + {mIdx === 0 && ( + primary + )} + +
+ ); + })} +
+ +
+ + {/* --- Intents --- */} +
+ +

+ Each intent groups example utterances that represent a category of requests. The closer a user message is to an intent's examples, the higher its score. +

+
+ {Object.entries((policy.config?.intents ?? {}) as Record).map(([intentName, intentDef], iIdx, arr) => { + const isExpanded = expandedIntents.has(intentName); + const exampleKey = `${idx}::${intentName}`; + return ( +
+ {/* Intent header row */} +
setExpandedIntents(prev => { + const next = new Set(prev); + next.has(intentName) ? next.delete(intentName) : next.add(intentName); + return next; + })} + > + +
+ + {intentName.replace(/_/g, ' ')} + + + {intentDef.examples?.length ?? 0} example{(intentDef.examples?.length ?? 0) !== 1 ? 's' : ''} + + +
+ + {/* Expanded: examples list */} + {isExpanded && ( +
+ {(intentDef.examples ?? []).length === 0 && ( +

+ No examples yet. Add representative phrases below. +

+ )} + {(() => { + const examples = intentDef.examples ?? []; + const PAGE = 5; + const showAll = showAllExamples.has(exampleKey); + const visible = showAll ? examples : examples.slice(0, PAGE); + const hidden = examples.length - PAGE; + return ( + <> + {visible.map((ex, exIdx) => ( +
{ e.currentTarget.style.background = 'rgba(128,128,128,0.06)'; }} onMouseLeave={e => { e.currentTarget.style.background = 'transparent'; }}> + {exIdx + 1}. + { + const intents = { ...((policy.config?.intents ?? {}) as Record) }; + const def = intents[intentName]; + if (!def) return; + const newExamples = [...def.examples]; + newExamples[exIdx] = e.target.value; + intents[intentName] = { ...def, examples: newExamples }; + updatePolicyConfig(idx, { intents }); + }} + onMouseDown={e => e.stopPropagation()} + onKeyDown={e => { if (e.key === 'Enter') e.preventDefault(); }} + style={{ + flex: 1, + fontSize: '0.8rem', + color: 'var(--text-primary)', + lineHeight: 1.4, + background: 'none', + border: '1px solid transparent', + borderRadius: 4, + outline: 'none', + padding: '3px 6px', + transition: 'border-color 0.15s ease', + }} + onFocus={e => { e.currentTarget.style.borderColor = 'var(--accent)'; }} + onBlur={e => { e.currentTarget.style.borderColor = 'transparent'; }} + /> + +
+ ))} + {!showAll && hidden > 0 && ( + + )} + {showAll && examples.length > PAGE && ( + + )} + + ); + })()} + {/* Add example input */} +
+ + setAddExampleInputs(prev => ({ ...prev, [exampleKey]: e.target.value }))} + onMouseDown={e => e.stopPropagation()} + onKeyDown={e => { + if (e.key === 'Enter') { + e.preventDefault(); + const text = addExampleInputs[exampleKey]?.trim(); + if (!text) return; + const intents = { ...((policy.config?.intents ?? {}) as Record) }; + const def = intents[intentName]; + if (!def) return; + intents[intentName] = { ...def, examples: [...def.examples, text] }; + updatePolicyConfig(idx, { intents }); + setAddExampleInputs(prev => ({ ...prev, [exampleKey]: '' })); + } else if (e.key === 'Escape') { + setAddExampleInputs(prev => ({ ...prev, [exampleKey]: '' })); + } + }} + style={{ flex: 1, background: 'none', border: '1px solid var(--border)', borderRadius: 4, outline: 'none', fontSize: '0.8rem', color: 'var(--text-primary)', padding: '4px 8px' }} + /> +
+
+ )} +
+ ); + })} +
+ + setAddIntentInputs(prev => ({ ...prev, [idx]: e.target.value }))} + onMouseDown={e => e.stopPropagation()} + onKeyDown={e => { + if (e.key === 'Enter') { + const raw = addIntentInputs[idx]; + if (!raw?.trim()) return; + const key = raw.trim().toLowerCase().replace(/\s+/g, '_'); + const intents = { ...((policy.config?.intents ?? {}) as Record) }; + if (!intents[key]) { + intents[key] = { examples: [], candidate_models: [] }; + updatePolicyConfig(idx, { intents }); + setExpandedIntents(prev => new Set(prev).add(key)); + } + setAddIntentInputs(prev => ({ ...prev, [idx]: '' })); + } else if (e.key === 'Escape') { + setAddIntentInputs(prev => ({ ...prev, [idx]: '' })); + } + }} + onBlur={() => { + const raw = addIntentInputs[idx]; + if (raw?.trim()) { + const key = raw.trim().toLowerCase().replace(/\s+/g, '_'); + const intents = { ...((policy.config?.intents ?? {}) as Record) }; + if (!intents[key]) { + intents[key] = { examples: [], candidate_models: [] }; + updatePolicyConfig(idx, { intents }); + setExpandedIntents(prev => new Set(prev).add(key)); + } + } + setAddIntentInputs(prev => ({ ...prev, [idx]: '' })); + }} + style={{ flex: 1, background: 'none', border: 'none', outline: 'none', fontSize: '0.82rem', color: 'var(--text-primary)', padding: 0 }} + /> +
+
+
+ Names are normalized automatically (e.g. "Customer Support" → customer_support) +
+
+ + {/* --- Advanced (thresholds) --- */} +
+
{ + const open = (e.currentTarget as HTMLDetailsElement).open; + setAdvancedOpen(prev => { + const next = new Set(prev); + open ? next.add(idx) : next.delete(idx); + return next; + }); + }} + > + + Advanced + +
+ +
+
+ + updatePolicyConfig(idx, { absolute_threshold: Number(e.target.value) })} + onMouseDown={e => e.stopPropagation()} + /> +
+

+ Minimum cosine similarity score to consider a classification valid. Below this, the result is "unknown" and no filtering is applied. +

+
+ +
+
+ + updatePolicyConfig(idx, { ambiguity_threshold: Number(e.target.value) })} + onMouseDown={e => e.stopPropagation()} + /> +
+

+ Minimum gap between the top and second-best intent. If the margin is smaller, the classification is "ambiguous" and the candidate pools of both intents are merged. +

+
+ +
+
+
+ +
+ )} + {policy.type === 'fairness' && policy.enabled && (
@@ -715,6 +1169,37 @@ export function ProjectRoutingTab() { />
)} + + {isSemanticIntentEnabled && Object.keys(semanticIntents).length > 0 && ( +
+ +
+ {Object.keys(semanticIntents).map(intentKey => { + const active = getIntentsForModel(item.modelId).has(intentKey); + return ( + + ); + })} +
+
+ )}
@@ -763,8 +1248,20 @@ export function ProjectRoutingTab() {
-
diff --git a/packages/dashboard/src/utils/traceUtils.ts b/packages/dashboard/src/utils/traceUtils.ts index 4e81ce5..811f8d4 100644 --- a/packages/dashboard/src/utils/traceUtils.ts +++ b/packages/dashboard/src/utils/traceUtils.ts @@ -102,7 +102,7 @@ export function formatCost(usd: number | null): string { if (usd == null) return '—'; if (usd === 0) return '$0.000'; if (usd < 0.000001) return '<$0.000001'; - if (usd < 0.01) return `$${usd.toFixed(6)}`; + if (usd < 0.01) return `$${usd.toFixed(8)}`; if (usd < 1) return `$${usd.toFixed(4)}`; return `$${usd.toFixed(2)}`; } diff --git a/packages/service/src/cost/calculator.ts b/packages/service/src/cost/calculator.ts index df08125..1e53c04 100644 --- a/packages/service/src/cost/calculator.ts +++ b/packages/service/src/cost/calculator.ts @@ -21,5 +21,5 @@ export function calculateCost( const cachedCost = (cachedInputTokens / 1_000_000) * (model.cost.cachePerMillion ?? model.cost.inputPerMillion); const cacheCreateCost = (cacheCreationInputTokens / 1_000_000) * (model.cost.cacheWritePerMillion ?? model.cost.inputPerMillion); const outputCost = (outputTokens / 1_000_000) * model.cost.outputPerMillion; - return Math.round((inputCost + cachedCost + cacheCreateCost + outputCost) * 1_000_000) / 1_000_000; + return Math.round((inputCost + cachedCost + cacheCreateCost + outputCost) * 1_000_000_000) / 1_000_000_000; } diff --git a/packages/service/src/cost/tracker.ts b/packages/service/src/cost/tracker.ts index 73e10eb..6c8092c 100644 --- a/packages/service/src/cost/tracker.ts +++ b/packages/service/src/cost/tracker.ts @@ -40,8 +40,8 @@ export async function trackUsage(params: TrackUsageParams): Promise { (plainInput / 1_000_000) * params.model.cost.inputPerMillion + ((params.cachedInputTokens ?? 0) / 1_000_000) * (params.model.cost.cachePerMillion ?? params.model.cost.inputPerMillion) + ((params.cacheCreationInputTokens ?? 0) / 1_000_000) * (params.model.cost.cacheWritePerMillion ?? 
params.model.cost.inputPerMillion) - ) * 1_000_000) / 1_000_000; - const costOutput = Math.round(((params.outputTokens / 1_000_000) * params.model.cost.outputPerMillion) * 1_000_000) / 1_000_000; + ) * 1_000_000_000) / 1_000_000_000; + const costOutput = Math.round(((params.outputTokens / 1_000_000) * params.model.cost.outputPerMillion) * 1_000_000_000) / 1_000_000_000; const record: UsageRecord = { id: uuidv4(), diff --git a/packages/service/src/embeddings/index.ts b/packages/service/src/embeddings/index.ts new file mode 100644 index 0000000..615823f --- /dev/null +++ b/packages/service/src/embeddings/index.ts @@ -0,0 +1,23 @@ +import type { EmbeddingProvider, EmbeddingProviderType } from './types.js'; +import { OpenAIEmbeddingProvider } from './openai.js'; +import { OllamaEmbeddingProvider } from './ollama.js'; + +export type { EmbeddingProvider, EmbeddingProviderType }; +export type { EmbeddingProviderConfig } from './types.js'; + +export function getEmbeddingProvider( + type: EmbeddingProviderType, + endpoint?: string, + apiKey?: string, +): EmbeddingProvider { + switch (type) { + case 'openai': + return new OpenAIEmbeddingProvider(endpoint, apiKey); + case 'ollama': + return new OllamaEmbeddingProvider(endpoint); + default: { + const _exhaustive: never = type; + throw new Error(`Unknown embedding provider type: "${String(_exhaustive)}"`); + } + } +} diff --git a/packages/service/src/embeddings/ollama.ts b/packages/service/src/embeddings/ollama.ts new file mode 100644 index 0000000..256a8e0 --- /dev/null +++ b/packages/service/src/embeddings/ollama.ts @@ -0,0 +1,31 @@ +import type { EmbeddingProvider, EmbedResult } from './types.js'; + +interface OllamaEmbedResponse { + embeddings: number[][]; +} + +export class OllamaEmbeddingProvider implements EmbeddingProvider { + private readonly baseURL: string; + + constructor(endpoint?: string) { + this.baseURL = (endpoint ?? 'http://localhost:11434').replace(/\/$/, ''); + } + + async embed(texts: string[], model: string): Promise { + const url = `${this.baseURL}/api/embed`; + const response = await fetch(url, { + method: 'POST', + headers: { 'Content-Type': 'application/json' }, + body: JSON.stringify({ model, input: texts }), + }); + + if (!response.ok) { + const body = await response.text(); + throw new Error(`Ollama embedding request failed (${response.status}): ${body}`); + } + + const json = (await response.json()) as OllamaEmbedResponse; + // Ollama does not report token counts for embedding calls. + return { embeddings: json.embeddings, inputTokens: 0 }; + } +} diff --git a/packages/service/src/embeddings/openai.test.ts b/packages/service/src/embeddings/openai.test.ts new file mode 100644 index 0000000..7ad745e --- /dev/null +++ b/packages/service/src/embeddings/openai.test.ts @@ -0,0 +1,65 @@ +import { describe, it, expect, vi, afterEach } from 'vitest'; +import { OpenAIEmbeddingProvider } from './openai.js'; + +// Mock the OpenAI SDK. 
+vi.mock('openai', () => { + const mockCreate = vi.fn(); + const MockOpenAI = vi.fn(() => ({ + embeddings: { create: mockCreate }, + })); + (MockOpenAI as any).__mockCreate = mockCreate; + return { default: MockOpenAI }; +}); + +import OpenAI from 'openai'; + +function getMockCreate() { + return (OpenAI as any).__mockCreate as ReturnType; +} + +describe('OpenAIEmbeddingProvider', () => { + afterEach(() => { + vi.clearAllMocks(); + }); + + it('calls embeddings.create with the correct parameters', async () => { + const mockVec = [0.1, 0.2, 0.3]; + getMockCreate().mockResolvedValue({ + data: [{ embedding: mockVec }], + usage: { prompt_tokens: 3, total_tokens: 3 }, + }); + + const provider = new OpenAIEmbeddingProvider(); + const result = await provider.embed(['hello world'], 'text-embedding-3-small'); + + expect(getMockCreate()).toHaveBeenCalledWith({ + model: 'text-embedding-3-small', + input: ['hello world'], + encoding_format: 'float', + }); + expect(result.embeddings).toEqual([mockVec]); + expect(result.inputTokens).toBe(3); + }); + + it('returns one vector per input text', async () => { + getMockCreate().mockResolvedValue({ + data: [{ embedding: [1, 0] }, { embedding: [0, 1] }], + usage: { prompt_tokens: 2, total_tokens: 2 }, + }); + + const provider = new OpenAIEmbeddingProvider(); + const result = await provider.embed(['text1', 'text2'], 'text-embedding-3-small'); + + expect(result.embeddings).toHaveLength(2); + expect(result.embeddings[0]).toEqual([1, 0]); + expect(result.embeddings[1]).toEqual([0, 1]); + expect(result.inputTokens).toBe(2); + }); + + it('propagates errors from the SDK', async () => { + getMockCreate().mockRejectedValue(new Error('api error')); + + const provider = new OpenAIEmbeddingProvider(); + await expect(provider.embed(['test'], 'model')).rejects.toThrow('api error'); + }); +}); diff --git a/packages/service/src/embeddings/openai.ts b/packages/service/src/embeddings/openai.ts new file mode 100644 index 0000000..e75657e --- /dev/null +++ b/packages/service/src/embeddings/openai.ts @@ -0,0 +1,26 @@ +import OpenAI from 'openai'; +import type { EmbeddingProvider, EmbedResult } from './types.js'; + +export class OpenAIEmbeddingProvider implements EmbeddingProvider { + private readonly client: OpenAI; + + constructor(endpoint?: string, apiKey?: string) { + this.client = new OpenAI({ + apiKey: apiKey ?? '', + baseURL: endpoint ?? 'https://api.openai.com/v1', + }); + } + + async embed(texts: string[], model: string): Promise { + const response = await this.client.embeddings.create({ + model, + input: texts, + encoding_format: 'float', + }); + return { + // The API returns results in the same order as inputs. + embeddings: response.data.map(d => d.embedding), + inputTokens: response.usage?.prompt_tokens ?? 0, + }; + } +} diff --git a/packages/service/src/embeddings/types.ts b/packages/service/src/embeddings/types.ts new file mode 100644 index 0000000..f89fd19 --- /dev/null +++ b/packages/service/src/embeddings/types.ts @@ -0,0 +1,28 @@ +export type EmbeddingProviderType = 'openai' | 'ollama'; + +export interface EmbeddingProviderConfig { + /** Provider type. */ + type: EmbeddingProviderType; + /** Embedding model ID (e.g. 'text-embedding-3-small', 'nomic-embed-text'). */ + model: string; + /** Base URL / endpoint override. */ + endpoint?: string; + /** API key (required for OpenAI). */ + apiKey?: string; +} + +export interface EmbedResult { + /** One float32 vector per input text, in the same order. 
*/ + embeddings: number[][]; + /** Total input tokens consumed by this call (0 when the provider does not report it). */ + inputTokens: number; +} + +/** Minimal interface every embedding backend must satisfy. */ +export interface EmbeddingProvider { + /** + * Embed a batch of texts. + * Returns the embedding vectors together with the reported input token count. + */ + embed(texts: string[], model: string): Promise; +} diff --git a/packages/service/src/routes/api.ts b/packages/service/src/routes/api.ts index 966dc52..e8ad7c2 100644 --- a/packages/service/src/routes/api.ts +++ b/packages/service/src/routes/api.ts @@ -7,7 +7,7 @@ import { randomBytes } from 'node:crypto'; import { readConfig, writeConfig } from '../config/loader.js'; import { CONFIG_PATHS } from '../config/paths.js'; import { createSessionToken, verifyToken, generateRawToken } from '../plugins/jwt.js'; -import type { ModelConfig, ProjectConfig, UserConfig, RoleConfig, Permission, Provider, PricingTier, RoutingPolicy, TokenModelRef, Settings, Limit } from '@routerly/shared'; +import type { ModelConfig, ProjectConfig, UserConfig, RoleConfig, Permission, Provider, PricingTier, RoutingPolicy, TokenModelRef, Settings, Limit, ModelCapabilities } from '@routerly/shared'; import { getTrace } from '../routing/traceStore.js'; import { sendTestNotification } from '../notifications/sender.js'; @@ -211,6 +211,7 @@ export const apiRoutes: FastifyPluginAsync = async (fastify) => { contextWindow?: number; pricingTiers?: PricingTier[]; limits?: Limit[]; + capabilities?: ModelCapabilities; /** @deprecated use limits */ dailyBudget?: number; /** @deprecated use limits */ weeklyBudget?: number; /** @deprecated use limits */ monthlyBudget?: number; @@ -252,6 +253,7 @@ export const apiRoutes: FastifyPluginAsync = async (fastify) => { ...(resolvedLimits?.length ? { limits: resolvedLimits } : {}), ...(req.body.contextWindow !== undefined ? { contextWindow: req.body.contextWindow } : {}), ...(req.body.upstreamModelId ? { upstreamModelId: req.body.upstreamModelId } : {}), + ...(req.body.capabilities ? { capabilities: req.body.capabilities } : {}), }; models.push(model); await writeConfig('models', models); @@ -269,6 +271,7 @@ export const apiRoutes: FastifyPluginAsync = async (fastify) => { contextWindow?: number; pricingTiers?: PricingTier[]; limits?: Limit[]; + capabilities?: ModelCapabilities; /** @deprecated use limits */ dailyBudget?: number; /** @deprecated use limits */ weeklyBudget?: number; /** @deprecated use limits */ monthlyBudget?: number; @@ -323,6 +326,8 @@ export const apiRoutes: FastifyPluginAsync = async (fastify) => { : req.body.upstreamModelId === undefined && _existingUpstreamModelId !== undefined ? { upstreamModelId: _existingUpstreamModelId } : {}), + // capabilities: if provided in body use it; if absent clear (unchecking the checkbox removes it) + ...(req.body.capabilities ? { capabilities: req.body.capabilities } : {}), }; models[index] = model; await writeConfig('models', models); @@ -720,7 +725,7 @@ export const apiRoutes: FastifyPluginAsync = async (fastify) => { const records = await readConfig('usage'); const { period = 'monthly', projectId, from, to } = req.query; const page = Math.max(1, parseInt(req.query.page ?? '1', 10) || 1); - const pageSize = Math.min(200, Math.max(1, parseInt(req.query.pageSize ?? '50', 10) || 50)); + const pageSize = Math.min(200, Math.max(1, parseInt(req.query.pageSize ?? 
'100', 10) || 100));
 
   const now = new Date();
   let since = new Date(0);
diff --git a/packages/service/src/routing/intent/cache.ts b/packages/service/src/routing/intent/cache.ts
new file mode 100644
index 0000000..65a9a6b
--- /dev/null
+++ b/packages/service/src/routing/intent/cache.ts
@@ -0,0 +1,57 @@
+import type { IntentDefinition } from '@routerly/shared';
+import type { EmbeddingProvider } from '../../embeddings/types.js';
+import { meanVector } from './similarity.js';
+
+interface CacheEntry {
+  centroid: number[];
+  expiresAt: number;
+}
+
+// Key: `${model}::${intentName}::${stable hash of examples}`
+const cache = new Map<string, CacheEntry>();
+
+const DEFAULT_TTL_MS = 60 * 60 * 1000; // 1 hour
+
+function hashExamples(examples: string[]): string {
+  // Simple deterministic hash: sort, join, then a 31-based rolling hash over the characters.
+  // Not cryptographic — just needs to detect config changes.
+  const joined = [...examples].sort().join('|');
+  let h = 0;
+  for (let i = 0; i < joined.length; i++) {
+    h = (Math.imul(31, h) + joined.charCodeAt(i)) >>> 0;
+  }
+  return h.toString(36);
+}
+
+function buildKey(model: string, intentName: string, examples: string[]): string {
+  return `${model}::${intentName}::${hashExamples(examples)}`;
+}
+
+/**
+ * Returns the centroid embedding for an intent's examples.
+ * Results are cached in-memory for `ttlMs` to avoid re-embedding on every request.
+ */
+export async function getIntentCentroid(
+  intentName: string,
+  intent: IntentDefinition,
+  provider: EmbeddingProvider,
+  model: string,
+  ttlMs: number = DEFAULT_TTL_MS,
+): Promise<number[]> {
+  const key = buildKey(model, intentName, intent.examples);
+  const now = Date.now();
+  const cached = cache.get(key);
+  if (cached !== undefined && cached.expiresAt > now) {
+    return cached.centroid;
+  }
+
+  const { embeddings } = await provider.embed(intent.examples, model);
+  const centroid = meanVector(embeddings);
+  cache.set(key, { centroid, expiresAt: now + ttlMs });
+  return centroid;
+}
+
+/** Clears the entire centroid cache (useful in tests). */
+export function clearIntentCache(): void {
+  cache.clear();
+}
diff --git a/packages/service/src/routing/intent/classifier.ts b/packages/service/src/routing/intent/classifier.ts
new file mode 100644
index 0000000..e9db157
--- /dev/null
+++ b/packages/service/src/routing/intent/classifier.ts
@@ -0,0 +1,102 @@
+import type { IntentClassification, SemanticIntentConfig } from '@routerly/shared';
+import { getEmbeddingProvider } from '../../embeddings/index.js';
+import { cosineSimilarity } from './similarity.js';
+import { getIntentCentroid } from './cache.js';
+
+const DEFAULT_ABSOLUTE_THRESHOLD = 0.60;
+const DEFAULT_AMBIGUITY_THRESHOLD = 0.08;
+
+/**
+ * Classifies a text against the intent definitions in `config`.
+ *
+ * Returns an `IntentClassification` describing the top intent, scores,
+ * margin, and a `status` of 'confident', 'ambiguous', or 'unknown'.
+ *
+ * Also returns the total `inputTokens` used by the embedding API call for
+ * the request text, so callers can track usage accurately.
+ */
+export interface ClassifyIntentResult {
+  classification: IntentClassification;
+  /** Input tokens consumed by embedding the request text (0 for providers that don't report it). */
+  inputTokens: number;
+}
+
+export async function classifyIntent(
+  text: string,
+  config: SemanticIntentConfig,
+): Promise<ClassifyIntentResult> {
+  const absoluteThreshold = config.absolute_threshold ?? DEFAULT_ABSOLUTE_THRESHOLD;
+  const ambiguityThreshold = config.ambiguity_threshold ?? 
DEFAULT_AMBIGUITY_THRESHOLD; + const intentNames = Object.keys(config.intents); + + const unknownResult: ClassifyIntentResult = { + classification: { + topIntent: null, + topScore: 0, + secondIntent: null, + secondScore: 0, + margin: 0, + status: 'unknown', + }, + inputTokens: 0, + }; + + if (intentNames.length === 0) return unknownResult; + + const provider = getEmbeddingProvider( + config.embedding_provider, + config.embedding_endpoint, + config.embedding_api_key, + ); + + // Embed the request text and all intent centroids in parallel. + // Only the request embedding reports token usage (centroids are cached). + const [requestEmbedResult, ...centroids] = await Promise.all([ + provider.embed([text], config.embedding_model), + ...intentNames.map(name => + getIntentCentroid(name, config.intents[name]!, provider, config.embedding_model), + ), + ]); + + const requestVec = requestEmbedResult.embeddings[0]; + if (requestVec === undefined || requestVec.length === 0) return unknownResult; + + // Score each intent against the request embedding. + const scores = intentNames.map((name, i) => ({ + name, + score: cosineSimilarity(requestVec, centroids[i] ?? []), + })); + + // Sort descending by score. + scores.sort((a, b) => b.score - a.score); + + const top = scores[0]; + const second = scores[1] ?? null; + + if (top === undefined) return unknownResult; + + const topScore = top.score; + const secondScore = second?.score ?? 0; + const margin = topScore - secondScore; + + let status: IntentClassification['status']; + if (topScore < absoluteThreshold) { + status = 'unknown'; + } else if (margin < ambiguityThreshold) { + status = 'ambiguous'; + } else { + status = 'confident'; + } + + return { + classification: { + topIntent: top.name, + topScore, + secondIntent: second?.name ?? 
null, + secondScore, + margin, + status, + }, + inputTokens: requestEmbedResult.inputTokens, + }; +} diff --git a/packages/service/src/routing/intent/similarity.test.ts b/packages/service/src/routing/intent/similarity.test.ts new file mode 100644 index 0000000..26f019f --- /dev/null +++ b/packages/service/src/routing/intent/similarity.test.ts @@ -0,0 +1,45 @@ +import { describe, it, expect } from 'vitest'; +import { cosineSimilarity, meanVector } from './similarity.js'; + +describe('cosineSimilarity', () => { + it('returns 1.0 for identical vectors', () => { + expect(cosineSimilarity([1, 0, 0], [1, 0, 0])).toBeCloseTo(1.0); + }); + + it('returns -1.0 for opposite vectors', () => { + expect(cosineSimilarity([1, 0, 0], [-1, 0, 0])).toBeCloseTo(-1.0); + }); + + it('returns 0.0 for orthogonal vectors', () => { + expect(cosineSimilarity([1, 0], [0, 1])).toBeCloseTo(0.0); + }); + + it('returns 0.0 when one vector is zero', () => { + expect(cosineSimilarity([0, 0, 0], [1, 2, 3])).toBe(0); + }); + + it('returns 0.0 when both vectors are zero', () => { + expect(cosineSimilarity([0, 0], [0, 0])).toBe(0); + }); + + it('computes correct similarity for known vectors', () => { + // [1,1] vs [1,0] → dot=1, |a|=√2, |b|=1 → 1/√2 ≈ 0.7071 + expect(cosineSimilarity([1, 1], [1, 0])).toBeCloseTo(0.7071, 4); + }); +}); + +describe('meanVector', () => { + it('returns [] for empty input', () => { + expect(meanVector([])).toEqual([]); + }); + + it('returns the vector itself for a single input', () => { + expect(meanVector([[1, 2, 3]])).toEqual([1, 2, 3]); + }); + + it('computes the correct element-wise mean', () => { + const result = meanVector([[1, 2], [3, 4]]); + expect(result[0]).toBeCloseTo(2); + expect(result[1]).toBeCloseTo(3); + }); +}); diff --git a/packages/service/src/routing/intent/similarity.ts b/packages/service/src/routing/intent/similarity.ts new file mode 100644 index 0000000..33e666f --- /dev/null +++ b/packages/service/src/routing/intent/similarity.ts @@ -0,0 +1,33 @@ +/** + * Cosine similarity between two equal-length float vectors. + * Returns a value in [-1, 1]. + */ +export function cosineSimilarity(a: number[], b: number[]): number { + let dot = 0; + let normA = 0; + let normB = 0; + for (let i = 0; i < a.length; i++) { + dot += (a[i] ?? 0) * (b[i] ?? 0); + normA += (a[i] ?? 0) * (a[i] ?? 0); + normB += (b[i] ?? 0) * (b[i] ?? 0); + } + const denom = Math.sqrt(normA) * Math.sqrt(normB); + if (denom === 0) return 0; + return dot / denom; +} + +/** + * Compute the element-wise mean of a list of vectors (centroid). + * All vectors must have the same length. + */ +export function meanVector(vectors: number[][]): number[] { + if (vectors.length === 0) return []; + const dim = vectors[0]?.length ?? 0; + const sum = new Array(dim).fill(0); + for (const v of vectors) { + for (let i = 0; i < dim; i++) { + sum[i] = (sum[i] ?? 0) + (v[i] ?? 
0);
+    }
+  }
+  return sum.map(s => s / vectors.length);
+}
diff --git a/packages/service/src/routing/policies/semantic-intent.test.ts b/packages/service/src/routing/policies/semantic-intent.test.ts
new file mode 100644
index 0000000..011cb3a
--- /dev/null
+++ b/packages/service/src/routing/policies/semantic-intent.test.ts
@@ -0,0 +1,213 @@
+import { describe, it, expect, vi, beforeEach } from 'vitest';
+import { clearIntentCache } from '../intent/cache.js';
+
+// ── Helpers ──────────────────────────────────────────────────────────────────
+
+function makeMockProvider(vectors: Record<string, number[]>) {
+  return {
+    embed: vi.fn(async (texts: string[], _model: string) => ({
+      embeddings: texts.map(t => vectors[t] ?? [0]),
+      inputTokens: 0,
+    })),
+  };
+}
+
+// Mock the embedding index so we can inject a fake provider.
+const mockProvider = makeMockProvider({});
+vi.mock('../../embeddings/index.js', () => ({
+  getEmbeddingProvider: vi.fn(() => mockProvider),
+}));
+
+import { semanticIntentPolicy } from './semantic-intent.js';
+import type { PolicyInput } from './types.js';
+import type { SemanticIntentConfig } from '@routerly/shared';
+
+// ── Fixtures ─────────────────────────────────────────────────────────────────
+
+function makeRequest(content: string): PolicyInput['request'] {
+  return {
+    model: 'auto',
+    messages: [{ role: 'user', content }],
+  } as PolicyInput['request'];
+}
+
+function makeCandidate(id: string): PolicyInput['candidates'][0] {
+  return {
+    model: {
+      id,
+      name: id,
+      provider: 'openai',
+      endpoint: 'https://api.openai.com/v1',
+      cost: { inputPerMillion: 1, outputPerMillion: 3 },
+    },
+  };
+}
+
+const baseConfig: SemanticIntentConfig = {
+  embedding_provider: 'openai',
+  embedding_model: 'text-embedding-3-small',
+  absolute_threshold: 0.60,
+  ambiguity_threshold: 0.08,
+  intents: {
+    coding: {
+      examples: ['write a python function', 'fix this typescript bug'],
+      candidate_models: ['coder-model'],
+    },
+    general_chat: {
+      examples: ['hello', 'how are you'],
+      candidate_models: ['chat-model'],
+    },
+  },
+};
+
+// ── Tests ────────────────────────────────────────────────────────────────────
+
+describe('semanticIntentPolicy', () => {
+  beforeEach(() => {
+    clearIntentCache();
+    vi.clearAllMocks();
+  });
+
+  it('passes all candidates when config is missing required fields', async () => {
+    const emitMock = vi.fn();
+    const result = await semanticIntentPolicy({
+      request: makeRequest('hello'),
+      candidates: [makeCandidate('any-model')],
+      config: {},
+      emit: emitMock,
+    });
+
+    expect(result.routing).toHaveLength(1);
+    expect(result.routing[0]!.point).toBe(1.0);
+    expect(emitMock).toHaveBeenCalledWith(expect.objectContaining({
+      message: 'policy:semantic-intent:misconfigured',
+    }));
+  });
+
+  it('passes all candidates when status is unknown (low score)', async () => {
+    // All similarities will be 0 (zero vectors) → below absolute_threshold → unknown
+    mockProvider.embed.mockResolvedValue({ embeddings: [[0, 0, 0]], inputTokens: 0 });
+
+    // Centroids also zero → cosineSimilarity = 0
+    const result = await semanticIntentPolicy({
+      request: makeRequest('some text'),
+      candidates: [makeCandidate('coder-model'), makeCandidate('chat-model')],
+      config: baseConfig,
+    });
+
+    expect(result.excludes).toBeUndefined();
+    expect(result.routing).toHaveLength(2);
+    expect(result.routing.every(r => r.point === 1.0)).toBe(true);
+  });
+
+  it('filters to intent pool when status is confident', async () => {
+    // coding intent examples → centroid ~[1,0,0]
+    // general_chat examples  → centroid ~[0,1,0]
+    // request → [1,0,0] close to coding
+    const codingVec = [1, 0, 0];
+    const chatVec = [0, 1, 0];
+    const requestVec = [0.99, 0.01, 0];
+
+    mockProvider.embed.mockImplementation(async (texts: string[]) => {
+      return {
+        embeddings: texts.map(t => {
+          if (t === 'some text') return requestVec;
+          if ((baseConfig.intents['coding']?.examples ?? []).includes(t)) return codingVec;
+          return chatVec;
+        }),
+        inputTokens: 0,
+      };
+    });
+
+    const result = await semanticIntentPolicy({
+      request: makeRequest('some text'),
+      candidates: [makeCandidate('coder-model'), makeCandidate('chat-model'), makeCandidate('other-model')],
+      config: baseConfig,
+    });
+
+    // coder-model is in coding pool → included
+    const coderEntry = result.routing.find(r => r.model === 'coder-model');
+    expect(coderEntry?.point).toBe(1.0);
+
+    // chat-model is not in coding pool → excluded
+    const chatEntry = result.routing.find(r => r.model === 'chat-model');
+    expect(chatEntry?.point).toBe(0.0);
+    expect(result.excludes).toContain('chat-model');
+  });
+
+  it('merges pools when status is ambiguous', async () => {
+    // Two intents with similar scores (margin < ambiguity_threshold)
+    const vec = [1, 0, 0]; // same vector for all → margin = 0
+
+    mockProvider.embed.mockResolvedValue({ embeddings: [vec], inputTokens: 0 });
+
+    const result = await semanticIntentPolicy({
+      request: makeRequest('ambiguous text'),
+      candidates: [makeCandidate('coder-model'), makeCandidate('chat-model')],
+      config: {
+        ...baseConfig,
+        absolute_threshold: 0.0,  // ensure score passes absolute threshold
+        ambiguity_threshold: 1.0, // very high → everything is ambiguous
+      },
+    });
+
+    // Both pools merged → both models allowed
+    expect(result.routing.every(r => r.point === 1.0)).toBe(true);
+    expect(result.excludes).toBeUndefined();
+  });
+
+  it('passes all candidates when user message is empty', async () => {
+    const result = await semanticIntentPolicy({
+      request: makeRequest('   '),
+      candidates: [makeCandidate('coder-model'), makeCandidate('chat-model')],
+      config: baseConfig,
+    });
+
+    expect(result.routing.every(r => r.point === 1.0)).toBe(true);
+    expect(result.excludes).toBeUndefined();
+  });
+
+  it('degrades gracefully when the embedding provider throws', async () => {
+    mockProvider.embed.mockRejectedValue(new Error('network error'));
+
+    const emit = vi.fn();
+    const result = await semanticIntentPolicy({
+      request: makeRequest('some text'),
+      candidates: [makeCandidate('coder-model'), makeCandidate('chat-model')],
+      config: baseConfig,
+      emit,
+    });
+
+    // All candidates pass through
+    expect(result.routing.every(r => r.point === 1.0)).toBe(true);
+    // An error trace entry is emitted
+    expect(emit).toHaveBeenCalledWith(expect.objectContaining({
+      message: 'policy:semantic-intent:error',
+    }));
+  });
+
+  it('falls back to all candidates when intent pool has no overlap with project candidates', async () => {
+    const codingVec = [1, 0, 0];
+    const requestVec = [0.99, 0, 0];
+
+    mockProvider.embed.mockImplementation(async (texts: string[]) => {
+      return {
+        embeddings: texts.map(t => {
+          if (t === 'some text') return requestVec;
+          return codingVec;
+        }),
+        inputTokens: 0,
+      };
+    });
+
+    // Project candidates don't include any model from the coding pool
+    const result = await semanticIntentPolicy({
+      request: makeRequest('some text'),
+      candidates: [makeCandidate('some-other-model')],
+      config: baseConfig,
+    });
+
+    // Falls back to all candidates
+    expect(result.routing.find(r => r.model === 'some-other-model')?.point).toBe(1.0);
+  });
+});
diff --git a/packages/service/src/routing/policies/semantic-intent.ts b/packages/service/src/routing/policies/semantic-intent.ts
new file mode 100644
index 0000000..b10e29d
--- /dev/null
+++ b/packages/service/src/routing/policies/semantic-intent.ts
@@ -0,0 +1,204 @@
+import type { SemanticIntentConfig } from '@routerly/shared';
+import { providersConf } from '@routerly/shared';
+import { classifyIntent } from '../intent/classifier.js';
+import { trackUsage } from '../../cost/tracker.js';
+import type { PolicyFn } from './types.js';
+
+/** Lookup input cost (per 1M tokens) for an embedding model from the static providers catalogue. */
+function getEmbeddingInputCost(provider: string, modelId: string): number {
+  const providerConf = (providersConf as Record<string, { models?: { id: string; input?: number }[] }>)[provider];
+  const modelConf = providerConf?.models?.find(m => m.id === modelId);
+  return modelConf?.input ?? 0;
+}
+
+/**
+ * Policy: semantic-intent
+ *
+ * Classifies the incoming request by semantic intent using embeddings,
+ * then restricts the candidate pool to the models mapped to that intent.
+ *
+ * Classification outcomes:
+ * - `confident` → hard-filter candidates to the matched intent's model pool.
+ * - `ambiguous` → merge the candidate pools of the top-2 intents.
+ * - `unknown`   → pass all candidates through unchanged (no filtering).
+ *
+ * The policy emits trace entries with classification details so that
+ * the routing decision is visible in the real-time trace stream.
+ *
+ * This policy is designed to run *before* scoring policies (cheapest,
+ * performance, etc.) so that they operate only within the narrowed pool.
+ */
+export const semanticIntentPolicy: PolicyFn = async ({
+  request,
+  candidates,
+  config,
+  log,
+  emit,
+  projectId,
+  traceId,
+}) => {
+  const cfg = config as SemanticIntentConfig | undefined;
+
+  // ── Validate config ────────────────────────────────────────────────────────
+  if (!cfg?.embedding_provider || !cfg?.embedding_model || !cfg?.intents) {
+    log?.warn({ cfg }, 'semantic-intent policy: misconfigured, passing all candidates through');
+    emit?.({
+      panel: 'router-response',
+      message: 'policy:semantic-intent:misconfigured',
+      details: {
+        reason: 'Missing embedding_provider, embedding_model, or intents — passing all candidates through.',
+      },
+    });
+    return {
+      routing: candidates.map(c => ({ model: c.model.id, point: 1.0 })),
+    };
+  }
+
+  // ── Extract user message text ──────────────────────────────────────────────
+  const messages = request.messages ?? [];
+  // Use the last user message for classification; fall back to a concatenation.
+  const userText = [...messages]
+    .reverse()
+    .find(m => m.role === 'user' && typeof m.content === 'string')
+    ?.content as string | undefined
+    ?? messages
+      .filter(m => typeof m.content === 'string')
+      .map(m => m.content as string)
+      .join(' ');
+
+  if (!userText || userText.trim().length === 0) {
+    // Nothing to classify — pass all candidates through.
+    return {
+      routing: candidates.map(c => ({ model: c.model.id, point: 1.0 })),
+    };
+  }
+
+  log?.info({
+    candidates: candidates.map(c => ({ id: c.model.id, provider: c.model.provider })),
+    intentCount: Object.keys(cfg.intents).length,
+    embeddingModel: cfg.embedding_model,
+  }, 'semantic-intent policy: input');
+
+  // ── Classify ───────────────────────────────────────────────────────────────
+  let classifyResult;
+  try {
+    const t0 = Date.now();
+    classifyResult = await classifyIntent(userText, cfg);
+    const latencyMs = Date.now() - t0;
+
+    // Track the embedding API call as a routing-type usage record so it appears
+    // in the dashboard alongside llm-policy routing calls.
+    // The embedding model is not in models.json, so we build a synthetic ModelConfig.
+    if (projectId) {
+      const inputPerMillion = getEmbeddingInputCost(cfg.embedding_provider, cfg.embedding_model);
+      await trackUsage({
+        projectId,
+        model: {
+          id: cfg.embedding_model,
+          name: cfg.embedding_model,
+          provider: cfg.embedding_provider as any,
+          endpoint: cfg.embedding_endpoint ?? '',
+          cost: { inputPerMillion, outputPerMillion: 0 },
+        },
+        inputTokens: classifyResult.inputTokens,
+        outputTokens: 0,
+        latencyMs,
+        outcome: 'success',
+        callType: 'routing',
+        ...(traceId !== undefined ? { traceId } : {}),
+      }).catch(() => {});
+    }
+  } catch (err) {
+    // Embedding call failed — degrade gracefully, pass all candidates.
+    log?.warn({ err: err instanceof Error ? err.message : String(err) }, 'semantic-intent policy: embedding error, passing all candidates through');
+    emit?.({
+      panel: 'router-response',
+      message: 'policy:semantic-intent:error',
+      details: { error: err instanceof Error ? err.message : String(err) },
+    });
+    return {
+      routing: candidates.map(c => ({ model: c.model.id, point: 1.0 })),
+    };
+  }
+
+  const classification = classifyResult.classification;
+
+  log?.info({
+    topIntent: classification.topIntent,
+    topScore: classification.topScore,
+    status: classification.status,
+  }, 'semantic-intent policy: classification');
+  emit?.({
+    panel: 'router-response',
+    message: 'policy:semantic-intent:classification',
+    details: {
+      topIntent: classification.topIntent,
+      topScore: classification.topScore,
+      secondIntent: classification.secondIntent,
+      secondScore: classification.secondScore,
+      margin: classification.margin,
+      status: classification.status,
+    },
+  });
+
+  // ── Build the allowed model set ────────────────────────────────────────────
+  const candidateIds = new Set(candidates.map(c => c.model.id));
+
+  const resolvePool = (intentName: string | null): Set<string> => {
+    if (!intentName) return candidateIds;
+    const intentDef = cfg.intents[intentName];
+    if (!intentDef) return candidateIds;
+    // Only keep models that are both in the intent pool AND in the project's candidate list.
+    const pool = new Set(intentDef.candidate_models.filter(id => candidateIds.has(id)));
+    // If the intent's candidate_models are all unknown (not in project), fall back to all.
+    return pool.size > 0 ? pool : candidateIds;
+  };
+
+  let allowedIds: Set<string>;
+
+  switch (classification.status) {
+    case 'confident': {
+      allowedIds = resolvePool(classification.topIntent);
+      break;
+    }
+    case 'ambiguous': {
+      // Merge top-2 intent pools.
+      const pool1 = resolvePool(classification.topIntent);
+      const pool2 = resolvePool(classification.secondIntent);
+      allowedIds = new Set([...pool1, ...pool2]);
+      break;
+    }
+    case 'unknown':
+    default: {
+      // No filtering.
+      allowedIds = candidateIds;
+      break;
+    }
+  }
+
+  const routing = candidates.map(c => ({
+    model: c.model.id,
+    point: allowedIds.has(c.model.id) ? 1.0 : 0.0,
+    intent: classification.topIntent,
+    intentStatus: classification.status,
+  }));
+
+  const excludes = routing.filter(r => r.point === 0.0).map(r => r.model);
+
+  log?.info({
+    allowed: [...allowedIds],
+    excluded: excludes,
+    status: classification.status,
+  }, 'semantic-intent policy: result');
+  emit?.({
+    panel: 'router-response',
+    message: 'policy:semantic-intent:result',
+    details: {
+      allowed: [...allowedIds],
+      excluded: excludes,
+      status: classification.status,
+    },
+  });
+
+  return { routing, ...(excludes.length > 0 ? { excludes } : {}) };
+};
diff --git a/packages/service/src/routing/router.ts b/packages/service/src/routing/router.ts
index bcdbaed..013e3de 100644
--- a/packages/service/src/routing/router.ts
+++ b/packages/service/src/routing/router.ts
@@ -12,6 +12,7 @@ import { capabilityPolicy } from './policies/capability.js';
 import { rateLimitPolicy } from './policies/rate-limit.js';
 import { fairnessPolicy } from './policies/fairness.js';
 import { budgetRemainingPolicy } from './policies/budget-remaining.js';
+import { semanticIntentPolicy } from './policies/semantic-intent.js';
 import type { PolicyFn } from './policies/types.js';
 import type { TraceEntry, TracePanel } from './traceStore.js';
 
@@ -40,6 +41,7 @@ const POLICY_MAP: Record<RoutingPolicyType, PolicyFn> = {
   'rate-limit': rateLimitPolicy,
   fairness: fairnessPolicy,
   'budget-remaining': budgetRemainingPolicy,
+  'semantic-intent': semanticIntentPolicy,
 };
 
 export async function routeRequest(
@@ -173,8 +175,19 @@
   }
 
   // ── Emit policy config right afterwards ───────────────────────────────────
+  // Redact sensitive fields (API keys, tokens, secrets) from the trace.
+  const redactConfig = (cfg: unknown): unknown => {
+    if (!cfg || typeof cfg !== 'object') return cfg;
+    return Object.fromEntries(
+      Object.entries(cfg as Record<string, unknown>).map(([k, v]) => {
+        if (/key|secret|token|password/i.test(k)) return [k, '***'];
+        if (v && typeof v === 'object') return [k, redactConfig(v)];
+        return [k, v];
+      }),
+    );
+  };
   const policiesEntry = te('router-request', 'router:policies', {
-    policies: policiesWithWeight.map(({ type, weight, config }) => ({ type, weight, config })),
+    policies: policiesWithWeight.map(({ type, weight, config }) => ({ type, weight, config: redactConfig(config) })),
     candidates: validCandidates.map(c => c.model.id),
   });
   emit?.(policiesEntry);
diff --git a/packages/shared/src/browser.ts b/packages/shared/src/browser.ts
index 6a01409..664f4bb 100644
--- a/packages/shared/src/browser.ts
+++ b/packages/shared/src/browser.ts
@@ -23,6 +23,9 @@ export type {
   TokenModelRef,
   RoutingPolicy,
   RoutingPolicyType,
+  IntentDefinition,
+  SemanticIntentConfig,
+  IntentClassification,
   UserConfig,
   RoleConfig,
   Permission,
diff --git a/packages/shared/src/conf/providers.json b/packages/shared/src/conf/providers.json
index 9f6eecf..ac24428 100644
--- a/packages/shared/src/conf/providers.json
+++ b/packages/shared/src/conf/providers.json
@@ -121,6 +121,30 @@
         "cache": 0.275,
         "contextWindow": 128000,
         "notes": "Reinforcement-tuned mini model"
+      },
+      {
+        "id": "text-embedding-3-large",
+        "input": 0.13,
+        "output": 0,
+        "contextWindow": 8191,
+        "notes": "OpenAI embedding model (3072-dim)",
+        "capabilities": { "embedding": true }
+      },
+      {
+        "id": "text-embedding-3-small",
+        "input": 0.02,
+        "output": 0,
+        "contextWindow": 8191,
+        "notes": "OpenAI embedding model (1536-dim, fastest)",
+        "capabilities": { "embedding": true }
+      },
+      {
+        "id": "text-embedding-ada-002",
+        "input": 0.1,
+        "output": 0,
+        "contextWindow": 8191,
+        "notes": "OpenAI legacy embedding model",
+        "capabilities": { "embedding": true }
       }
     ]
   },
@@ -477,6 +501,30 @@
         "output": 0,
         "contextWindow": 16384,
         "notes": "Microsoft Phi-4"
+      },
+      {
+        "id": "nomic-embed-text",
+        "input": 0,
+        "output": 0,
+        "contextWindow": 8192,
+        "notes": "Nomic embedding model for Ollama",
+        "capabilities": { "embedding": true }
+      },
+      {
+        "id": "mxbai-embed-large",
+        "input": 0,
+        "output": 0,
+        "contextWindow": 512,
+        "notes": "MixedBread large embedding model for Ollama",
+        "capabilities": { "embedding": true }
+      },
+      {
+        "id": "all-minilm",
+        "input": 0,
+        "output": 0,
+        "contextWindow": 512,
+        "notes": "all-MiniLM embedding model for Ollama",
+        "capabilities": { "embedding": true }
       }
     ]
   },
diff --git a/packages/shared/src/index.ts b/packages/shared/src/index.ts
index 39141e6..30c65c6 100644
--- a/packages/shared/src/index.ts
+++ b/packages/shared/src/index.ts
@@ -9,6 +9,7 @@ export type {
   LimitsMode,
   RollingUnit,
   BudgetThresholds,
+  ModelCapabilities,
   ModelConfig,
   ProjectModelRef,
   ProjectConfig,
@@ -18,6 +19,9 @@ export type {
   TokenModelRef,
   RoutingPolicy,
   RoutingPolicyType,
+  IntentDefinition,
+  SemanticIntentConfig,
+  IntentClassification,
   UserConfig,
   RoleConfig,
   Permission,
diff --git a/packages/shared/src/types/config.ts b/packages/shared/src/types/config.ts
index d03836a..7f8bf60 100644
--- a/packages/shared/src/types/config.ts
+++ b/packages/shared/src/types/config.ts
@@ -91,6 +91,8 @@ export interface ModelCapabilities {
   functionCalling?: boolean;
   /** Whether the model supports JSON-mode output (response_format: json_object) */
   json?: boolean;
+  /** Whether the model is an embedding model */
+  embedding?: boolean;
 }
 
 export interface ModelConfig {
@@ -130,7 +132,7 @@ export interface ProjectModelRef {
   thresholds?: BudgetThresholds;
 }
 
-export type RoutingPolicyType = 'context' | 'cheapest' | 'health' | 'performance' | 'llm' | 'capability' | 'rate-limit' | 'fairness' | 'budget-remaining';
+export type RoutingPolicyType = 'context' | 'cheapest' | 'health' | 'performance' | 'llm' | 'capability' | 'rate-limit' | 'fairness' | 'budget-remaining' | 'semantic-intent';
 
 export interface RoutingPolicy {
   type: RoutingPolicyType;
@@ -140,6 +142,63 @@
   config?: any;
 }
 
+/** One intent definition: example utterances and the models to consider for this intent. */
+export interface IntentDefinition {
+  /** Representative utterances used to compute the intent embedding (centroid). */
+  examples: string[];
+  /** Model IDs from the project's candidate pool to route to when this intent is matched. */
+  candidate_models: string[];
+}
+
+/** Configuration for the `semantic-intent` routing policy. */
+export interface SemanticIntentConfig {
+  /** Embedding provider to use: 'openai' or 'ollama'. */
+  embedding_provider: 'openai' | 'ollama';
+  /** Embedding model ID (e.g. 'text-embedding-3-small', 'nomic-embed-text'). */
+  embedding_model: string;
+  /** Fallback embedding model IDs tried in order if the primary fails. */
+  embedding_fallback_models?: string[];
+  /** API endpoint for the embedding provider. Defaults to provider's default. */
+  embedding_endpoint?: string;
+  /** API key for the embedding provider. Required for OpenAI. */
+  embedding_api_key?: string;
+  /**
+   * Minimum cosine similarity score for a classification to be considered `confident`.
+   * Requests scoring below this threshold are classified as `unknown` (no filtering applied).
+   * @default 0.60
+   */
+  absolute_threshold?: number;
+  /**
+   * Minimum margin between the top and second-best intent score to be considered `confident`.
+   * If the margin is below this value, the classification is `ambiguous`.
+   * @default 0.08
+   */
+  ambiguity_threshold?: number;
+  /**
+   * Name of the policy type to fall back to when the classification is `ambiguous` or `unknown`.
+   * If not set, all candidates are passed through unchanged.
+   */
+  fallback_policy?: RoutingPolicyType;
+  /** Map of intent name to its definition. */
+  intents: Record<string, IntentDefinition>;
+}
+
+/** Result of classifying a request against known intents. */
+export interface IntentClassification {
+  /** The name of the top-ranked intent (or null when status is 'unknown'). */
+  topIntent: string | null;
+  /** Cosine similarity score of the top intent. */
+  topScore: number;
+  /** The name of the second-ranked intent (or null when fewer than 2 intents). */
+  secondIntent: string | null;
+  /** Cosine similarity of the second-ranked intent. */
+  secondScore: number;
+  /** Difference between topScore and secondScore. */
+  margin: number;
+  /** Classification confidence status. */
+  status: 'confident' | 'ambiguous' | 'unknown';
+}
+
 export type ProjectRole = 'viewer' | 'editor' | 'admin';
 
 export interface ProjectMember {
diff --git a/test/setup_alibaba_test.py b/test/setup_alibaba_test.py
new file mode 100644
index 0000000..11012d4
--- /dev/null
+++ b/test/setup_alibaba_test.py
@@ -0,0 +1,195 @@
+#!/usr/bin/env python3
+"""
+Setup and integration test for Routerly <-> Alibaba DashScope.
+Usage: python3 test/setup_alibaba_test.py
+"""
+import json, urllib.request, urllib.error, base64, sys
+
+BASE = "http://localhost:3000"
+ADMIN_EMAIL = "info@routerly.ai"
+ADMIN_PASSWORD = "C4m4ll0!"
+DASHSCOPE_KEY = "sk-41c27074f7a54378a8cecd28763785cc"
+DASHSCOPE_ENDPOINT = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
+
+def req(method, path, body=None, token=None, bearer_type="session"):
+    url = BASE + path
+    data = json.dumps(body).encode() if body else None
+    headers = {"Content-Type": "application/json"}
+    if token:
+        headers["Authorization"] = f"Bearer {token}"
+    r = urllib.request.Request(url, data=data, headers=headers, method=method)
+    try:
+        with urllib.request.urlopen(r, timeout=20) as resp:
+            return json.loads(resp.read())
+    except urllib.error.HTTPError as e:
+        body = e.read().decode()
+        print(f"  [HTTP {e.code}] {method} {path}: {body}")
+        return None
+
+def sep(title):
+    print(f"\n{'='*60}")
+    print(f" {title}")
+    print('='*60)
+
+# ── 1. Login ─────────────────────────────────────────────────────────────────
+sep("1. Admin login")
+resp = req("POST", "/api/auth/login", {"email": ADMIN_EMAIL, "password": ADMIN_PASSWORD})
+if not resp or not resp.get("token"):
+    print("ERROR: login failed")
+    sys.exit(1)
+session = resp["token"]
+print(f"  OK token: {session[:30]}...")
+
+# ── 2. Create models ─────────────────────────────────────────────────────────
+sep("2. Create Alibaba DashScope models")
+
+models_to_create = [
+    {
+        "id": "alibaba/qwen-vl-plus",
+        "name": "Qwen VL Plus (Alibaba)",
+        "provider": "custom",
+        "endpoint": DASHSCOPE_ENDPOINT,
+        "apiKey": DASHSCOPE_KEY,
+        "upstreamModelId": "qwen-vl-plus",
+        "contextWindow": 32000,
+        "capabilities": {"vision": True, "functionCalling": True, "json": True},
+        "cost": {"input": 0.0004, "output": 0.0012},
+    },
+    {
+        "id": "alibaba/qwen-plus",
+        "name": "Qwen Plus (Alibaba)",
+        "provider": "custom",
+        "endpoint": DASHSCOPE_ENDPOINT,
+        "apiKey": DASHSCOPE_KEY,
+        "upstreamModelId": "qwen-plus",
+        "contextWindow": 131072,
+        "capabilities": {"vision": False, "functionCalling": True, "json": True},
+        "cost": {"input": 0.0004, "output": 0.0012},
+    },
+]
+
+created_model_ids = []
+for m in models_to_create:
+    # Check whether it already exists
+    existing = req("GET", f"/api/models/{m['id']}", token=session)
+    if existing:
+        print(f"  SKIP {m['id']} (already present)")
+        created_model_ids.append(m["id"])
+        continue
+    r = req("POST", "/api/models", m, token=session)
+    if r:
+        print(f"  OK {m['id']}")
+        created_model_ids.append(m["id"])
+    else:
+        print(f"  FAIL {m['id']}")
+
+# ── 3. Create project ────────────────────────────────────────────────────────
+sep("3. Create project 'alibaba-test'")
+
+PROJECT_NAME = "alibaba-test"
+existing_projects = req("GET", "/api/projects", token=session) or []
+project = next((p for p in existing_projects if p.get("name") == PROJECT_NAME), None)
+
+if project:
+    print(f"  SKIP project '{PROJECT_NAME}' already exists (id: {project['id']})")
+else:
+    r = req("POST", "/api/projects", {
+        "name": PROJECT_NAME,
+        "models": [{"modelId": mid} for mid in created_model_ids],
+        "timeoutMs": 30000,
+    }, token=session)
+    if r:
+        project = r.get("project") or r
+        print(f"  OK project created (id: {project.get('id', '?')})")
+        if r.get("token"):
+            print(f"  API token: {r['token']}")
+    else:
+        print("  FAIL project creation")
+        sys.exit(1)
+
+project_id = project.get("id") or project.get("projectId")
+
+# Update the project's models in case it already existed without them
+if project:
+    r = req("PUT", f"/api/projects/{project_id}", {
+        "name": PROJECT_NAME,
+        "models": [{"modelId": mid} for mid in created_model_ids],
+        "timeoutMs": 30000,
+    }, token=session)
+    if r: print(f"  OK project models updated")
+
+# ── 4. Create an API token for the test calls ────────────────────────────────
+sep("4. Create API token for the project")
+token_resp = req("POST", f"/api/projects/{project_id}/tokens", {"labels": ["alibaba-test"]}, token=session)
+if not token_resp or not token_resp.get("token"):
+    # Try the existing tokens
+    proj_detail = req("GET", f"/api/projects/{project_id}", token=session) or {}
+    tokens = proj_detail.get("tokens", [])
+    if tokens:
+        api_token = None  # We don't have the full token, only a snippet
+        print(f"  INFO existing token snippet: {tokens[0].get('tokenSnippet')}... (not recoverable, creating a new one)")
+        token_resp = req("POST", f"/api/projects/{project_id}/tokens", {"labels": ["test2"]}, token=session)
+
+if token_resp and token_resp.get("token"):
+    api_token = token_resp["token"]
+    print(f"  OK {api_token[:30]}...")
+else:
+    print("  FAIL could not create token")
+    sys.exit(1)
+
+# ── 5. Test: text call with qwen-plus ────────────────────────────────────────
+sep("5. Text test: qwen-plus via Routerly")
+r = req("POST", "/v1/chat/completions", {
+    "model": "alibaba/qwen-plus",
+    "messages": [{"role": "user", "content": "Just say: hello!"}],
+    "max_tokens": 10,
+}, token=api_token)
+if r and r.get("choices"):
+    content = r["choices"][0]["message"]["content"]
+    model_used = r.get("model", "?")
+    print(f"  OK response: {repr(content)}")
+    print(f"  model used: {model_used}")
+    usage = r.get("usage", {})
+    print(f"  tokens: {usage.get('total_tokens', '?')} ({usage.get('prompt_tokens','?')} in + {usage.get('completion_tokens','?')} out)")
+else:
+    print("  FAIL")
+
+# ── 6. Test: vision call with qwen-vl-plus ───────────────────────────────────
+sep("6. Vision test: qwen-vl-plus + base64 image via Routerly")
+
+# Download a test image
+try:
+    img_req = urllib.request.Request(
+        "https://images.unsplash.com/photo-1529778873920-4da4926a72c2?w=100",
+        headers={"User-Agent": "Mozilla/5.0"}
+    )
+    img_data = urllib.request.urlopen(img_req, timeout=10).read()
+    img_b64 = base64.b64encode(img_data).decode()
+    print(f"  image downloaded: {len(img_data)} bytes")
+except Exception as e:
+    print(f"  WARN: could not download image: {e}")
+    img_b64 = None
+
+if img_b64:
+    r = req("POST", "/v1/chat/completions", {
+        "model": "alibaba/qwen-vl-plus",
+        "messages": [{"role": "user", "content": [
+            {"type": "text", "text": "Describe this image in one short sentence."},
+            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{img_b64}"}}
+        ]}],
+        "max_tokens": 40,
+    }, token=api_token)
+    if r and r.get("choices"):
+        content = r["choices"][0]["message"]["content"]
+        model_used = r.get("model", "?")
+        print(f"  OK response: {repr(content)}")
+        print(f"  model used: {model_used}")
+        usage = r.get("usage", {})
+        print(f"  tokens: {usage.get('total_tokens','?')} ({usage.get('prompt_tokens','?')} in + {usage.get('completion_tokens','?')} out)")
+    else:
+        print("  FAIL")
+
+sep("DONE")
+print(f"  Project: {PROJECT_NAME} (id: {project_id})")
+print(f"  Models: {', '.join(created_model_ids)}")
+print(f"  Token: {api_token[:30]}...")
diff --git a/website/docusaurus.config.ts b/website/docusaurus.config.ts
index 98134c0..f31ae9e 100644
--- a/website/docusaurus.config.ts
+++ b/website/docusaurus.config.ts
@@ -78,6 +78,10 @@ const config: Config = {
           position: 'left',
           label: 'Docs',
         },
+        {
+          type: 'docsVersionDropdown',
+          position: 'right',
+        },
         {
           href: 'https://github.com/Inebrio/Routerly',
           label: 'GitHub',
diff --git a/website/versioned_docs/version-0.1.5/api/llm-proxy.md b/website/versioned_docs/version-0.1.5/api/llm-proxy.md
new file mode 100644
index 0000000..4fe7b80
--- /dev/null
+++ b/website/versioned_docs/version-0.1.5/api/llm-proxy.md
@@ -0,0 +1,170 @@
+---
+title: LLM Proxy
+sidebar_position: 2
+---
+
+# LLM Proxy API
+
+The LLM proxy exposes standard-compatible endpoints. Any client that speaks the OpenAI or Anthropic protocol can connect without modification.
+
+**Base URL:** `http://localhost:3000/v1`
+
+**Authentication:** `Authorization: Bearer sk-rt-YOUR_PROJECT_TOKEN`
+
+---
+
+## Chat Completions
+
+```
+POST /v1/chat/completions
+```
+
+OpenAI-compatible chat completions endpoint. Accepts the same request body as the OpenAI API.
+
+### Request
+
+```json
+{
+  "model": "gpt-5-mini",
+  "messages": [
+    { "role": "system", "content": "You are a helpful assistant." },
+    { "role": "user", "content": "Hello!" }
+  ],
+  "stream": false,
+  "temperature": 0.7,
+  "max_tokens": 1024
+}
+```
+
+The `model` field can be:
+- A specific model ID registered in Routerly (e.g. `gpt-5-mini`)
+- Any value — Routerly will use its routing policies to pick the best model regardless
+
+### Response (non-streaming)
+
+Standard OpenAI `ChatCompletion` object, with an additional header:
+
+```
+x-routerly-trace-id: 018f3c2a-4b5d-7e8f-9012-34567890abcd
+```
+
+### Response (streaming)
+
+When `"stream": true`, the response is a Server-Sent Events stream.
Each event has one of the following types: + +| SSE data prefix | Description | +|----------------|-------------| +| `data: {"type":"trace",...}` | Routing decision metadata (first event) | +| `data: {"type":"content",...}` | Token chunk from the model | +| `data: [DONE]` | End of stream | + +The `trace` event includes the selected model, policy scores, and request cost estimate. + +--- + +## Responses API + +``` +POST /v1/responses +``` + +OpenAI Responses API compatible endpoint. Supports stateful multi-turn conversations via `previous_response_id`. + +### Request + +```json +{ + "model": "gpt-5-mini", + "input": "Tell me a joke.", + "stream": false +} +``` + +### Response + +Standard OpenAI `Response` object structure. + +--- + +## Anthropic Messages + +``` +POST /v1/messages +``` + +Anthropic Messages API compatible endpoint. Use this with the Anthropic SDK by setting `base_url` to `http://localhost:3000`. + +### Request + +```json +{ + "model": "claude-haiku-4-5", + "max_tokens": 1024, + "messages": [ + { "role": "user", "content": "Hello!" } + ] +} +``` + +### Response + +Standard Anthropic `Message` object. + +--- + +## Count Tokens + +``` +POST /v1/messages/count_tokens +``` + +Anthropic-compatible token counting endpoint. Returns the number of input tokens for a given message set without making an inference call. + +### Request + +```json +{ + "model": "claude-haiku-4-5", + "messages": [ + { "role": "user", "content": "Hello!" } + ] +} +``` + +### Response + +```json +{ "input_tokens": 10 } +``` + +--- + +## Project-Scoped Proxy + +The same endpoints are available scoped to a specific project: + +``` +POST /projects/{slug}/v1/chat/completions +POST /projects/{slug}/v1/responses +POST /projects/{slug}/v1/messages +``` + +The project slug in the URL takes precedence over the slug inferred from the Bearer token. Use this when one token has access to multiple projects. + +--- + +## Streaming Protocol Details + +Routerly extends the standard SSE stream with a `trace` event at the start: + +``` +data: {"type":"trace","model":"gpt-5-mini","provider":"openai","policies":["health","cheapest"],"costEstimate":0.000025} + +data: {"type":"content","delta":"Hello"} + +data: {"type":"content","delta":" there"} + +data: [DONE] +``` + +Clients that only look for `data:` lines starting after the `trace` event will receive standard OpenAI delta chunks and will not need modification. diff --git a/website/versioned_docs/version-0.1.5/api/management.md b/website/versioned_docs/version-0.1.5/api/management.md new file mode 100644 index 0000000..bee7a71 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/api/management.md @@ -0,0 +1,454 @@ +--- +title: Management API +sidebar_position: 3 +--- + +# Management API + +The management API is used by the dashboard and CLI. All endpoints require a JWT session token. + +**Base URL:** `http://localhost:3000/api` + +**Authentication:** `Authorization: Bearer ` + +Obtain a JWT via [POST /api/auth/login](#login). + +--- + +## Authentication + +### Login + +``` +POST /api/auth/login +``` + +```json +{ "email": "admin@example.com", "password": "your-password" } +``` + +**Response:** +```json +{ + "token": "eyJ...", + "refreshToken": "a3f8c2...", + "user": { "id": "uuid", "email": "admin@example.com", "role": "admin", "permissions": [] } +} +``` + +- `token` — short-lived JWT (1 hour). Use as `Authorization: Bearer ` on all other endpoints. +- `refreshToken` — opaque token used to obtain new access tokens without re-entering credentials. 
Store securely; see [POST /api/auth/refresh](#refresh). Rotates on every use. + +### Refresh + +``` +POST /api/auth/refresh +``` + +This endpoint is **public** (no `Authorization` header required). + +```json +{ "refreshToken": "a3f8c2..." } +``` + +**Response:** +```json +{ + "token": "eyJ...", + "refreshToken": "b9d4e1...", + "user": { "id": "uuid", "email": "admin@example.com", "role": "admin", "permissions": [] } +} +``` + +Issues a new 1-hour access token **and a new refresh token** (rotation). The previous refresh token is immediately invalidated — replace it with the value returned in the response. Returns `401` if the token is invalid or has already been used/revoked. + +:::note +The CLI and dashboard perform this refresh automatically — the CLI tries silently when the token expires or is within 5 minutes of expiry; the dashboard retries on any `401` response. Both clients persist the new refresh token automatically. +::: + +--- + +## Setup + +### Check Setup Status + +``` +GET /api/setup/status +``` + +Returns `{ "configured": false }` if no admin account exists yet; `{ "configured": true }` otherwise. + +### Create First Admin + +``` +POST /api/setup/first-admin +``` + +Only available when `configured: false`. + +```json +{ "email": "admin@example.com", "password": "secure-password" } +``` + +--- + +## Me (Current User) + +### Get Profile + +``` +GET /api/me +``` + +### Update Profile + +``` +PUT /api/me +``` + +```json +{ "email": "new@example.com", "currentPassword": "old", "newPassword": "new" } +``` + +--- + +## Models + +### List Models + +``` +GET /api/models +``` + +### Create Model + +``` +POST /api/models +``` + +```json +{ + "id": "gpt-5-mini", + "provider": "openai", + "apiKey": "sk-...", + "inputPrice": 0.25, + "outputPrice": 2.0, + "contextWindow": 128000, + "capabilities": ["functionCalling", "json"] +} +``` + +### Get Model + +``` +GET /api/models/:id +``` + +### Update Model + +``` +PUT /api/models/:id +``` + +### Delete Model + +``` +DELETE /api/models/:id +``` + +### Rotate Model API Key + +``` +POST /api/models/:id/apikey +``` + +```json +{ "apiKey": "sk-NEW_KEY" } +``` + +--- + +## Projects + +### List Projects + +``` +GET /api/projects +``` + +### Create Project + +``` +POST /api/projects +``` + +```json +{ + "name": "My App", + "slug": "my-app", + "defaultTimeoutMs": 30000, + "models": ["gpt-5-mini"] +} +``` + +### Get Project + +``` +GET /api/projects/:slug +``` + +### Update Project + +``` +PUT /api/projects/:slug +``` + +### Delete Project + +``` +DELETE /api/projects/:slug +``` + +--- + +## Project Tokens + +### List Tokens + +``` +GET /api/projects/:slug/tokens +``` + +### Create Token + +``` +POST /api/projects/:slug/tokens +``` + +```json +{ + "name": "production", + "limits": [ + { + "metric": "cost", + "limit": 10.00, + "window": "monthly", + "mode": "extend" + } + ] +} +``` + +**Response includes the token value in plain text — returned once only.** + +### Update Token + +``` +PUT /api/projects/:slug/tokens/:tokenId +``` + +### Delete Token + +``` +DELETE /api/projects/:slug/tokens/:tokenId +``` + +--- + +## Project Members + +### List Members + +``` +GET /api/projects/:slug/members +``` + +### Add Member + +``` +POST /api/projects/:slug/members +``` + +```json +{ "userId": "user-uuid", "role": "viewer" } +``` + +### Update Member Role + +``` +PUT /api/projects/:slug/members/:userId +``` + +```json +{ "role": "editor" } +``` + +### Remove Member + +``` +DELETE /api/projects/:slug/members/:userId +``` + +--- + +## Users + +### List Users + +``` +GET 
/api/users
+```
+
+### Create User
+
+```
+POST /api/users
+```
+
+```json
+{ "email": "user@example.com", "password": "password", "role": "operator" }
+```
+
+### Get User
+
+```
+GET /api/users/:id
+```
+
+### Update User
+
+```
+PUT /api/users/:id
+```
+
+### Delete User
+
+```
+DELETE /api/users/:id
+```
+
+---
+
+## Roles
+
+### List Roles
+
+```
+GET /api/roles
+```
+
+### Create Role
+
+```
+POST /api/roles
+```
+
+```json
+{
+  "name": "billing_reviewer",
+  "permissions": ["project:read", "report:read"]
+}
+```
+
+### Update Role
+
+```
+PUT /api/roles/:name
+```
+
+### Delete Role
+
+```
+DELETE /api/roles/:name
+```
+
+---
+
+## Usage {#usage}
+
+### Query Usage Records
+
+```
+GET /api/usage
+```
+
+Query parameters:
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `from` | ISO date | Start of range |
+| `to` | ISO date | End of range |
+| `project` | string | Filter by project slug |
+| `model` | string | Filter by model ID |
+| `outcome` | string | `success`, `error`, `budget_exceeded` |
+| `limit` | number | Max records to return (default: 100) |
+| `offset` | number | Pagination offset |
+
+### Get Usage Record
+
+```
+GET /api/usage/:id
+```
+
+Returns the full record including the routing trace.
+
+---
+
+## Settings
+
+### Get Settings
+
+```
+GET /api/settings
+```
+
+### Update Settings
+
+```
+PUT /api/settings
+```
+
+```json
+{
+  "port": 3000,
+  "logLevel": "info",
+  "defaultTimeoutMs": 30000,
+  "publicUrl": "https://routerly.example.com"
+}
+```
+
+---
+
+## Notifications
+
+### Test a Notification Channel
+
+```
+POST /api/notifications/test
+```
+
+```json
+{ "channelName": "my-smtp" }
+```
+
+Returns `200 OK` on success or an error with details.
+
+---
+
+## System
+
+### Get System Info
+
+```
+GET /api/system/info
+```
+
+**Response:**
+```json
+{
+  "version": "1.2.3",
+  "uptime": 3600,
+  "node": "v22.0.0",
+  "platform": "darwin/arm64"
+}
+```
diff --git a/website/versioned_docs/version-0.1.5/api/overview.md b/website/versioned_docs/version-0.1.5/api/overview.md
new file mode 100644
index 0000000..d0059e9
--- /dev/null
+++ b/website/versioned_docs/version-0.1.5/api/overview.md
@@ -0,0 +1,106 @@
+---
+title: API Overview
+sidebar_position: 1
+---
+
+# API Overview
+
+Routerly exposes two groups of HTTP endpoints:
+
+| Group | Path prefix | Auth method | Purpose |
+|-------|-------------|-------------|---------|
+| **LLM Proxy** | `/v1/*` | Bearer project token | Forward requests to LLM providers |
+| **Management API** | `/api/*` | Bearer JWT (dashboard session) | Configure models, projects, users, etc. |
+
+---
+
+## Authentication
+
+### LLM Proxy (`/v1/*`)
+
+Pass your **project token** as a Bearer token:
+
+```http
+Authorization: Bearer sk-rt-YOUR_PROJECT_TOKEN
+```
+
+Project tokens start with `sk-rt-` and are created in the project's **Tokens** tab.
+
+### Management API (`/api/*`)
+
+First obtain a JWT by logging in:
+
+```bash
+curl -X POST http://localhost:3000/api/auth/login \
+  -H "Content-Type: application/json" \
+  -d '{"email":"admin@example.com","password":"your-password"}'
+```
+
+Response:
+```json
+{
+  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
+  "refreshToken": "a3f8c2...",
+  "user": { "id": "uuid", "email": "admin@example.com", "role": "admin", "permissions": [] }
+}
+```
+
+Then pass the JWT as a Bearer token:
+
+```http
+Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
+```
+
+Access tokens expire after 1 hour. Use the `refreshToken` returned at login to obtain a new one without re-entering credentials (see [POST /api/auth/refresh](./management.md#refresh) in the Management API reference).
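+
+For example, a minimal end-to-end sketch (it assumes `jq` is installed; the project-list endpoint shown is documented in the Management API reference):
+
+```bash
+# Log in and extract the access token
+TOKEN=$(curl -s -X POST http://localhost:3000/api/auth/login \
+  -H "Content-Type: application/json" \
+  -d '{"email":"admin@example.com","password":"your-password"}' | jq -r .token)
+
+# Call an authenticated management endpoint
+curl -s http://localhost:3000/api/projects \
+  -H "Authorization: Bearer $TOKEN"
+```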
+
+---
+
+## Error Format
+
+All errors return a JSON body:
+
+```json
+{
+  "error": "error_code",
+  "message": "Human-readable description"
+}
+```
+
+Common error codes:
+
+| HTTP Status | `error` value | Meaning |
+|-------------|--------------|---------|
+| `400` | `validation_error` | Request body is invalid |
+| `401` | `unauthorized` | Missing or invalid token |
+| `403` | `forbidden` | Valid token but insufficient permissions |
+| `404` | `not_found` | Resource does not exist |
+| `409` | `conflict` | Duplicate resource (e.g. slug already taken) |
+| `503` | `budget_exceeded` | Request blocked by a budget limit |
+| `503` | `no_model_available` | All routing candidates filtered out |
+
+---
+
+## Request Tracing
+
+Every proxied request includes the `x-routerly-trace-id` header in the response:
+
+```
+x-routerly-trace-id: 018f3c2a-4b5d-7e8f-9012-34567890abcd
+```
+
+Use this ID to look up the full request trace in the Usage page or via the [Usage API](./management.md#usage).
+
+---
+
+## Health Check
+
+```
+GET /health
+```
+
+Unauthenticated. Returns `200 OK` with:
+
+```json
+{ "status": "ok", "version": "1.2.3" }
+```
+
+Suitable for load-balancer health probes.
diff --git a/website/versioned_docs/version-0.1.5/assets/screenshot-models.png b/website/versioned_docs/version-0.1.5/assets/screenshot-models.png
new file mode 100644
index 0000000..23bb1d8
Binary files /dev/null and b/website/versioned_docs/version-0.1.5/assets/screenshot-models.png differ
diff --git a/website/versioned_docs/version-0.1.5/assets/screenshot-overview.png b/website/versioned_docs/version-0.1.5/assets/screenshot-overview.png
new file mode 100644
index 0000000..cf8cd3a
Binary files /dev/null and b/website/versioned_docs/version-0.1.5/assets/screenshot-overview.png differ
diff --git a/website/versioned_docs/version-0.1.5/assets/screenshot-projects.png b/website/versioned_docs/version-0.1.5/assets/screenshot-projects.png
new file mode 100644
index 0000000..12d5cf0
Binary files /dev/null and b/website/versioned_docs/version-0.1.5/assets/screenshot-projects.png differ
diff --git a/website/versioned_docs/version-0.1.5/assets/screenshot-usage.png b/website/versioned_docs/version-0.1.5/assets/screenshot-usage.png
new file mode 100644
index 0000000..3e27c0f
Binary files /dev/null and b/website/versioned_docs/version-0.1.5/assets/screenshot-usage.png differ
diff --git a/website/versioned_docs/version-0.1.5/cli/commands.md b/website/versioned_docs/version-0.1.5/cli/commands.md
new file mode 100644
index 0000000..d5ac6a0
--- /dev/null
+++ b/website/versioned_docs/version-0.1.5/cli/commands.md
@@ -0,0 +1,432 @@
+---
+title: Commands
+sidebar_position: 2
+---
+
+# CLI Commands
+
+Complete reference for all `routerly` CLI commands.
+
+---
+
+## `routerly auth`
+
+### `routerly auth login`
+
+Authenticate with a Routerly service and save credentials locally.
+
+```
+routerly auth login [options]
+```
+
+| Option | Description |
+|--------|-------------|
+| `--url <url>` | Service URL (default: value from installation) |
+| `--email <email>` | Your dashboard email address |
+| `--password <password>` | Your password (prompted interactively if omitted) |
+| `--alias <name>` | Friendly name for this account |
+
+If the email is already saved, you are asked whether to overwrite the existing entry or create a new one. The first account is automatically named `default`.
+
+On success, a permanent **refresh token** is saved alongside the session token so future sessions are renewed automatically.
+
+### `routerly auth refresh [alias]`
+
+Manually obtain a new access token using the saved refresh token. Useful after a long idle period.
+
+```
+routerly auth refresh [alias]
+```
+
+If `alias` is omitted, the currently active account is used. Fails if no refresh token is stored (run `auth login` to re-authenticate).
+
+### `routerly auth logout [alias]`
+
+```
+routerly auth logout [alias]
+```
+
+Removes the saved account (defaults to the active account), deleting its access token and refresh token from local storage.
+
+### `routerly auth ps`
+
+List all saved accounts.
+
+```
+routerly auth ps
+```
+
+The active account is marked with `*`.
+
+### `routerly auth switch <alias>`
+
+```
+routerly auth switch <alias>
+```
+
+Sets the active account for subsequent commands.
+
+### `routerly auth rename <alias> <new-alias>`
+
+```
+routerly auth rename <alias> <new-alias>
+```
+
+### `routerly auth whoami`
+
+```
+routerly auth whoami
+```
+
+Prints the active account alias, email, role, and server URL.
+
+---
+
+## `routerly model`
+
+### `routerly model list`
+
+```
+routerly model list [--json]
+```
+
+### `routerly model add`
+
+```
+routerly model add [options]
+```
+
+| Option | Description |
+|--------|-------------|
+| `--id <model-id>` | Model identifier (e.g. `gpt-5-mini`) |
+| `--provider <provider>` | Provider ID: `openai`, `anthropic`, `gemini`, `mistral`, `cohere`, `xai`, `ollama`, `custom` |
+| `--api-key <key>` | Provider API key |
+| `--base-url <url>` | Override provider endpoint |
+| `--input-price <usd>` | Input price per 1M tokens (USD) |
+| `--output-price <usd>` | Output price per 1M tokens (USD) |
+| `--context-window <tokens>` | Max context window tokens |
+
+Calling without options launches an interactive wizard.
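+
+For example (illustrative values — the pricing flags are optional for well-known model IDs, whose presets are pre-filled):
+
+```bash
+routerly model add \
+  --id gpt-5-mini \
+  --provider openai \
+  --api-key sk-YOUR_KEY \
+  --input-price 0.25 \
+  --output-price 2.0
+```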
+
+### `routerly model edit`
+
+```
+routerly model edit --id <model-id> [field options]
+```
+
+Same options as `add`. Only specified fields are updated.
+
+### `routerly model remove`
+
+```
+routerly model remove --id <model-id>
+```
+
+---
+
+## `routerly project`
+
+Project commands are organised into sub-groups. The first argument is always a **project name or ID**.
+
+### `routerly project list`
+
+```
+routerly project list [--json]
+```
+
+### `routerly project add`
+
+```
+routerly project add [options]
+```
+
+| Option | Description |
+|--------|-------------|
+| `--name <name>` | Project display name |
+| `--slug <slug>` | URL-safe identifier (must be unique) |
+| `--models <ids>` | Comma-separated list of model IDs to assign |
+| `--timeout <ms>` | Default request timeout in ms |
+
+### `routerly project remove`
+
+```
+routerly project remove <project>
+```
+
+---
+
+### Routing — `routerly project routing`
+
+#### `routerly project routing show <project>`
+
+Display the routing configuration (auto-routing flag, routing model, fallback models, and policy stack).
+
+#### `routerly project routing update <project>`
+
+```
+routerly project routing update <project> [options]
+```
+
+| Option | Description |
+|--------|-------------|
+| `--routing-model <model-id>` | Model ID used for LLM-based routing decisions |
+| `--fallback-models <ids>` | Comma-separated fallback routing model IDs |
+| `--auto-routing` / `--no-auto-routing` | Enable or disable auto-routing |
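+
+For example, to enable auto-routing and set the routing model (the model IDs are illustrative):
+
+```bash
+routerly project routing update my-api \
+  --auto-routing \
+  --routing-model gpt-5-mini \
+  --fallback-models gpt-4.1-mini
+```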
+
+#### `routerly project routing policy list <project>`
+
+List all routing policies with their priority order, enabled status, and configuration.
+
+#### `routerly project routing policy enable <project> <type>`
+
+Enable a policy type (adds it to the stack if not present). Optionally pass `--config <json>` for policy-specific settings.
+
+Available types: `health`, `context`, `capability`, `budget-remaining`, `rate-limit`, `llm`, `performance`, `fairness`, `cheapest`
+
+```bash
+routerly project routing policy enable my-api health
+routerly project routing policy enable my-api llm --config '{"memoryCount":3}'
+```
+
+#### `routerly project routing policy disable <project> <type>`
+
+Disable a policy without removing it from the stack.
+
+#### `routerly project routing policy reorder <project> <order>`
+
+Reorder the policy stack. Provide a comma-separated list of types in the desired evaluation order; any unlisted policies are appended at the end.
+
+```bash
+routerly project routing policy reorder my-api health,context,budget-remaining,llm,cheapest
+```
+
+---
+
+### Models — `routerly project model`
+
+#### `routerly project model list <project>`
+
+List target models configured in the project, with their prompt hints.
+
+#### `routerly project model add <project> <model>`
+
+```bash
+routerly project model add my-api openai/gpt-5.2
+routerly project model add my-api anthropic/claude-opus-4-6 --prompt "Use for complex reasoning"
+```
+
+| Option | Description |
+|--------|-------------|
+| `--prompt <text>` | System prompt hint used when this model is selected |
+
+#### `routerly project model remove <project> <model>`
+
+Remove a target model from the project.
+
+#### `routerly project model set-prompt <project> <model>`
+
+Update (or clear) the system prompt hint for a model.
+
+```bash
+routerly project model set-prompt my-api openai/gpt-5.2 --prompt "Fast tasks only"
+routerly project model set-prompt my-api openai/gpt-5.2 --prompt ""   # clear
+```
+
+---
+
+### Tokens — `routerly project token`
+
+#### `routerly project token list <project>`
+
+List all API tokens for the project.
+
+#### `routerly project token create <project>`
+
+Create a new project API token. The token value is shown **once only**.
+
+```bash
+routerly project token create my-api
+routerly project token create my-api --labels "production,backend"
+```
+
+| Option | Description |
+|--------|-------------|
+| `--labels <labels>` | Comma-separated labels for the token |
+
+Optionally add spending limits inline:
+
+| Option | Description |
+|--------|-------------|
+| `--limit <spec>` | Limit spec: `<model>:<metric>:<type>:<window>:<amount>` (repeatable) |
+
+Limit spec examples:
+- `openai/gpt-5.2:cost:period:monthly:10` — $10/month cap
+- `openai/gpt-5.2:calls:rolling:24:hours:500` — 500 calls per rolling 24 h
+
+#### `routerly project token edit <project> <token-id>`
+
+Add or remove limits on an existing token.
+
+| Option | Description |
+|--------|-------------|
+| `--add-limit <spec>` | Add a limit (repeatable) |
+| `--remove-limit <spec>` | Remove a limit matching model+metric+window (repeatable) |
+
+#### `routerly project token remove <project> <token-id>`
+
+Revoke and delete an API token.
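+
+Putting the token options together — create a labelled production token capped at $10/month on one model (the limit spec string is copied from the examples above):
+
+```bash
+routerly project token create my-api \
+  --labels "production" \
+  --limit "openai/gpt-5.2:cost:period:monthly:10"
+```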
+
+---
+
+### Members — `routerly project member`
+
+#### `routerly project member list <project>`
+
+List project members with their role.
+
+#### `routerly project member add <project>`
+
+```bash
+routerly project member add my-api --email user@example.com --role viewer
+```
+
+| Option | Description |
+|--------|-------------|
+| `--email <email>` | Member's email address |
+| `--role <role>` | Role to assign (`admin`, `editor`, `viewer`, or a custom role) |
+
+#### `routerly project member set-role <project>`
+
+```bash
+routerly project member set-role my-api --email user@example.com --role editor
+```
+
+#### `routerly project member remove <project>`
+
+```bash
+routerly project member remove my-api --email user@example.com
+```
+
+---
+
+## `routerly user`
+
+### `routerly user list`
+
+```
+routerly user list [--json]
+```
+
+### `routerly user add`
+
+```
+routerly user add --email <email> --role <role>
+```
+
+You will be prompted for the new user's password.
+
+### `routerly user remove`
+
+```
+routerly user remove --email <email>
+```
+
+---
+
+## `routerly role`
+
+### `routerly role list`
+
+```
+routerly role list [--json]
+```
+
+### `routerly role add`
+
+```
+routerly role add --name <name> --permissions <permissions>
+```
+
+Available permissions: `project:read`, `project:write`, `model:read`, `model:write`, `user:read`, `user:write`, `report:read`.
+
+### `routerly role edit`
+
+```
+routerly role edit --name <name> --permissions <permissions>
+```
+
+### `routerly role remove`
+
+```
+routerly role remove --name <name>
+```
+
+---
+
+## `routerly report`
+
+### `routerly report usage`
+
+Aggregated usage summary grouped by model.
+
+```
+routerly report usage [options]
+```
+
+| Option | Description |
+|--------|-------------|
+| `--period <period>` | `daily`, `weekly`, `monthly` (default: `monthly`) |
+| `--project <slug>` | Filter to one project |
+| `--json` | JSON output |
+
+### `routerly report calls`
+
+Recent request log.
+
+```
+routerly report calls [options]
+```
+
+| Option | Description |
+|--------|-------------|
+| `--limit <n>` | Number of records to return (default: 20) |
+| `--project <slug>` | Filter to one project |
+| `--json` | JSON output |
+
+---
+
+## `routerly service`
+
+### `routerly service status`
+
+```
+routerly service status [--json]
+```
+
+Same as `routerly status`.
+
+### `routerly service configure`
+
+```
+routerly service configure [options]
+```
+
+| Option | Description |
+|--------|-------------|
+| `--port <port>` | Service port |
+| `--host <host>` | Bind address |
+| `--dashboard <true|false>` | Enable/disable web dashboard |
+| `--log-level <level>` | `trace` / `debug` / `info` / `warn` / `error` |
+| `--timeout <ms>` | Global default request timeout |
+| `--public-url <url>` | External URL of the service |
+
+---
+
+## `routerly status`
+
+```
+routerly status [--json]
+```
+
+Check whether the active Routerly service is reachable. Prints URL, version, and uptime. Exit code `0` if the service is up, `1` otherwise.
+
diff --git a/website/versioned_docs/version-0.1.5/cli/overview.md b/website/versioned_docs/version-0.1.5/cli/overview.md
new file mode 100644
index 0000000..07dbc1f
--- /dev/null
+++ b/website/versioned_docs/version-0.1.5/cli/overview.md
@@ -0,0 +1,110 @@
+---
+title: CLI Overview
+sidebar_position: 1
+---
+
+# CLI Overview
+
+The `routerly` CLI lets you manage all aspects of a Routerly instance from the terminal — models, projects, users, usage reports, and service configuration. It communicates with the running Routerly service via its management API.
+
+---
+
+## Installation
+
+The CLI is installed automatically by the Routerly installer. Verify it is available:
+
+```bash
+routerly --version
+```
+
+---
+
+## Authentication
+
+The CLI needs credentials to connect to a Routerly service. Authenticate with:
+
+```bash
+routerly auth login --url http://localhost:3000 --email admin@example.com
+```
+
+You will be prompted for your password. On success, a JWT session token is saved in `~/.routerly/cli/auth.json`.
+
+### Multiple Accounts
+
+You can manage multiple Routerly instances with named aliases:
+
+```bash
+routerly auth login \
+  --url https://routerly.example.com \
+  --email admin@example.com \
+  --alias production
+
+routerly auth login \
+  --url http://localhost:3000 \
+  --email dev@example.com \
+  --alias local
+```
+
+Switch between accounts with:
+
+```bash
+routerly auth switch production
+```
+
+Rename an alias:
+
+```bash
+routerly auth rename local dev
+```
+
+List all saved accounts:
+
+```bash
+routerly auth ps
+```
+
+Log out:
+
+```bash
+routerly auth logout              # logs out the active account
+routerly auth logout production   # logs out a specific account
+```
+
+---
+
+## Current Service Status
+
+```bash
+routerly status
+```
+
+Prints: service URL, version, uptime, and whether the service is reachable. Add `--json` for machine-readable output.
+
+---
+
+## Global Options
+
+| Option | Description |
+|--------|-------------|
+| `--alias <name>` | Use a specific saved account instead of the active one |
+| `--url <url>` | Connect to this service URL (overrides the saved account) |
+| `--json` | Output as JSON (available on most commands) |
+| `--help` | Show help |
+| `--version` | Print CLI version |
+
+---
+
+## Command Groups
+
+| Group | Description |
+|-------|-------------|
+| `routerly auth` | Authentication and account management |
+| `routerly model` | Register and manage LLM models |
+| `routerly project` | Create and manage projects |
+| `routerly user` | Manage dashboard users |
+| `routerly role` | Manage RBAC roles |
+| `routerly report` | Usage and billing reports |
+| `routerly service` | Service configuration |
+| `routerly status` | Check service reachability |
+
+See [CLI: Commands](./commands.md) for the full reference.
diff --git a/website/versioned_docs/version-0.1.5/concepts/architecture.md b/website/versioned_docs/version-0.1.5/concepts/architecture.md
new file mode 100644
index 0000000..a74099d
--- /dev/null
+++ b/website/versioned_docs/version-0.1.5/concepts/architecture.md
@@ -0,0 +1,97 @@
+---
+title: Architecture
+sidebar_position: 1
+---
+
+# Architecture
+
+Routerly is a self-hosted API gateway that sits between your application and one or more LLM providers. It exposes standard-compatible endpoints (`/v1/chat/completions`, `/v1/responses`, `/v1/messages`) so existing SDKs work without modification.
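+
+For instance, the official OpenAI SDK can be pointed at Routerly by changing only the base URL and key — a minimal sketch in which the project token and model ID are placeholders:
+
+```typescript
+import OpenAI from 'openai';
+
+// Talk to Routerly instead of api.openai.com — no other client changes needed.
+const client = new OpenAI({
+  baseURL: 'http://localhost:3000/v1',
+  apiKey: 'sk-rt-YOUR_PROJECT_TOKEN',
+});
+
+const completion = await client.chat.completions.create({
+  model: 'gpt-5-mini',
+  messages: [{ role: 'user', content: 'Hello!' }],
+});
+console.log(completion.choices[0].message.content);
+```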
+ +--- + +## Component Overview + +``` +┌────────────────────────────────────────────────────────────────┐ +│ Any Client │ +│ │ +│ Your App │ OpenAI / Anthropic SDK │ Cursor │ Open WebUI│ +│ │ LibreChat │ OpenClaw │ LangChain / LlamaIndex│ +└───────────────────────┬────────────────────────────────────────┘ + │ Bearer sk-rt- + │ POST /v1/chat/completions (OpenAI) + │ POST /v1/messages (Anthropic) + ▼ +┌─────────────────────────────────────────────────────┐ +│ Routerly Service │ +│ ┌────────────┐ ┌────────────┐ ┌──────────────┐ │ +│ │ Auth Guard │ │ Router │ │ Budget Guard │ │ +│ └────────────┘ └─────┬──────┘ └──────────────┘ │ +│ │ │ +│ ┌─────────────────────▼────────────────────────┐ │ +│ │ Provider Adapters │ │ +│ │ OpenAI · Anthropic · Gemini · Mistral · … │ │ +│ └─────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────┘ + │ + ┌──────────┴──────────┐ + ▼ ▼ + ┌─────────────┐ ┌─────────────┐ + │ OpenAI API │ … │ Ollama API │ + └─────────────┘ └─────────────┘ +``` + +--- + +## Packages + +Routerly is a monorepo composed of four packages: + +| Package | Description | +|---------|-------------| +| `packages/service` | The core Fastify HTTP server, routing engine, and provider adapters | +| `packages/dashboard` | The React + Vite web UI served at `/dashboard` | +| `packages/cli` | The `routerly` CLI tool (Commander.js) | +| `packages/shared` | Shared TypeScript types, provider definitions, and utilities | + +--- + +## Request Lifecycle + +When your application sends a chat request to Routerly: + +1. **Authentication** — The Bearer token is validated against the list of project tokens. +2. **Project resolution** — The project's routing configuration and budget are loaded. +3. **Budget pre-check** — If the project or any parent budget is exhausted, Routerly returns `503` immediately. +4. **Routing** — The configured routing policies are applied in priority order to select a model. Each policy can score or filter the candidate set. +5. **Provider dispatch** — The request is translated to the target provider's wire format (OpenAI, Anthropic Messages, Gemini, …) and forwarded. +6. **Streaming or buffering** — If `stream: true`, Routerly SSE-proxies the provider stream. Otherwise it buffers and returns a standard response. +7. **Cost accounting** — Token counts and cost are computed and appended to `usage.json`. +8. **Budget update** — All applicable budget windows (token, project, global) are incremented. +9. **Notifications** — If any budget threshold was crossed, alert channels (email, webhook) are triggered. + +--- + +## Configuration Storage + +All state is stored as JSON files on disk under `~/.routerly/` (override with `$ROUTERLY_HOME`). There is no external database dependency. 
+ +| File | Contents | +|------|----------| +| `config/settings.json` | Service settings | +| `config/models.json` | Registered LLM models (API keys AES-encrypted) | +| `config/projects.json` | Projects, routing, tokens, member roles | +| `config/users.json` | Dashboard users (passwords bcrypt-hashed) | +| `config/roles.json` | Custom RBAC roles | +| `data/usage.json` | Per-request usage records (append-only) | + +--- + +## Ports and Protocols + +| Endpoint prefix | Protocol | Purpose | +|----------------|----------|---------| +| `/v1/*` | HTTP/1.1 + SSE | LLM proxy — authenticated with project tokens | +| `/api/*` | HTTP/1.1 | Management API — authenticated with JWT session | +| `/dashboard` | HTTP/1.1 | React SPA | +| `/health` | HTTP/1.1 | Health check (unauthenticated) | diff --git a/website/versioned_docs/version-0.1.5/concepts/budgets-and-limits.md b/website/versioned_docs/version-0.1.5/concepts/budgets-and-limits.md new file mode 100644 index 0000000..ed376b2 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/concepts/budgets-and-limits.md @@ -0,0 +1,134 @@ +--- +title: Budgets & Limits +sidebar_position: 6 +--- + +# Budgets & Limits + +Routerly has a three-level budget hierarchy that lets you control spending at the platform, project, and individual-token levels. Budgets can be configured for any metric — cost, call count, or token usage — over a rolling or calendar window. + +--- + +## Budget Hierarchy + +``` +Global budget +└── Project budget + └── Per-token budget +``` + +A request must pass **all** applicable budget checks before Routerly forwards it to a provider. If any budget is exhausted, Routerly returns `503 Service Unavailable` with a descriptive message. + +--- + +## Budget Levels + +### Global budget + +Applies to all requests across all projects. Useful for setting a hard ceiling on total platform spending. + +Configure via **Dashboard → Settings → Budgets** or in `settings.json`. + +### Project budget + +Applies to all requests through a specific project. Configure per project via the **General** tab in the project settings. + +### Per-token budget + +Applies to requests made with a specific project token. Configured in the project's **Tokens** tab. Per-token limits are useful when different applications share a project and you want to isolate their spending. + +--- + +## Metrics + +Each budget limit tracks one metric: + +| Metric | Description | +|--------|-------------| +| `cost` | USD cost calculated from token prices | +| `calls` | Total number of API requests | +| `input_tokens` | Total input tokens consumed | +| `output_tokens` | Total output tokens generated | +| `total_tokens` | Sum of input and output tokens | + +--- + +## Window Types + +Budgets reset based on the configured window type. 
+
+### Period windows
+
+Reset at the start of each calendar period:
+
+| Window | Resets |
+|--------|--------|
+| `hourly` | Top of each hour |
+| `daily` | Midnight (UTC) |
+| `weekly` | Monday midnight (UTC) |
+| `monthly` | 1st of each month |
+| `yearly` | January 1st |
+
+### Rolling windows
+
+Track usage over a sliding time window:
+
+| Window | Period |
+|--------|--------|
+| `rolling_second` | Last 1 second |
+| `rolling_minute` | Last 60 seconds |
+| `rolling_hour` | Last 3,600 seconds |
+| `rolling_day` | Last 86,400 seconds |
+| `rolling_week` | Last 7 days |
+| `rolling_month` | Last 30 days |
+
+---
+
+## Limit Modes (per-token budgets)
+
+When a per-token budget is configured, the `mode` field controls how it interacts with the parent project budget:
+
+| Mode | Behaviour |
+|------|-----------|
+| `replace` | The per-token limit overrides the project limit entirely for this token |
+| `extend` | The per-token limit stacks on top of the project limit (both must pass) |
+| `disable` | No budget limit for this token, regardless of project limits |
+
+---
+
+## Configuring Budgets
+
+### Dashboard
+
+- **Project budget:** Open the project → **General** tab → Budget section.
+- **Per-token budget:** Open the project → **Tokens** tab → click the edit icon next to a token.
+- **Global budget:** Open **Settings → Budgets**.
+
+### CLI
+
+You can set a project-level daily or monthly cost budget when adding a model to a project:
+
+```bash
+routerly project add-model \
+  --slug my-app \
+  --model gpt-5-mini \
+  --daily-budget 5.00 \
+  --monthly-budget 50.00
+```
+
+---
+
+## What Happens When a Budget Is Exhausted
+
+- Routerly returns **HTTP 503** with a JSON error body:
+  ```json
+  {"error":"budget_exceeded","message":"Monthly cost limit for project 'my-app' reached ($50.00)"}
+  ```
+- The response is immediate — no provider API call is made.
+- Once the budget window resets (e.g. at the start of next month), requests are accepted again automatically.
+
+---
+
+## Notifications
+
+You can configure Routerly to send an alert when a budget reaches a configured threshold (e.g. 80% used) or when it is exhausted. See [Concepts: Notifications](./notifications.md) for setup.
diff --git a/website/versioned_docs/version-0.1.5/concepts/models.md b/website/versioned_docs/version-0.1.5/concepts/models.md
new file mode 100644
index 0000000..e99c86d
--- /dev/null
+++ b/website/versioned_docs/version-0.1.5/concepts/models.md
@@ -0,0 +1,127 @@
+---
+title: Models
+sidebar_position: 3
+---
+
+# Models
+
+A **model** in Routerly is a registered entry that maps a model identifier to a provider, its API credentials, pricing, and capabilities. You register each model once, and it becomes available to all projects.
+
+---
+
+## Registering a Model
+
+### CLI
+
+```bash
+routerly model add \
+  --id gpt-5-mini \
+  --provider openai \
+  --api-key sk-YOUR_KEY
+```
+
+For well-known model IDs, pricing and capabilities are pre-filled from built-in presets. You can override them with additional flags.
+
+```bash
+routerly model add \
+  --id my-fine-tune \
+  --provider openai \
+  --api-key sk-YOUR_KEY \
+  --input-price 0.5 \
+  --output-price 2.0 \
+  --context-window 128000
+```
+
+Use `routerly model list` to see all registered models and `routerly model remove --id <model-id>` to delete one.
+
+### Dashboard
+
+1. Open **Models** in the sidebar
+2. Click **+ New Model**
+3. Fill in: **Model ID**, **Provider**, **API Key**
+4. Pricing fields are pre-filled for known models — adjust if needed
+5. 
Set **Capabilities** (vision, function calling, thinking, JSON mode) for use by the capability routing policy +6. Click **Save** + +You can also **Clone** an existing model entry to register a fine-tune or variant quickly. + +--- + +## Model Configuration Fields + +| Field | Description | +|-------|-------------| +| **Model ID** | The identifier sent to the provider API (e.g. `gpt-5-mini`) | +| **Provider** | One of `openai`, `anthropic`, `gemini`, `mistral`, `cohere`, `xai`, `ollama`, `custom` | +| **API Key** | Provider API key (AES-256 encrypted at rest) | +| **Base URL** | Override the provider endpoint (useful for proxies and custom providers) | +| **Input price** | Price per 1 million input tokens in USD | +| **Output price** | Price per 1 million output tokens in USD | +| **Cache price** | Price per 1 million cached input tokens (if the provider supports it) | +| **Context window** | Maximum number of tokens the model accepts | +| **Pricing tiers** | Optional: higher prices above a token-count threshold (see below) | +| **Capabilities** | `vision`, `functionCalling`, `thinking`, `json` | +| **Enabled** | Toggle to temporarily disable a model without deleting it | + +--- + +## Pricing Tiers + +Some models have different prices for long-context requests. You can configure a pricing tier that applies above a token threshold. + +**Example — Anthropic claude-opus-4-6:** + +| Range | Input | Output | +|-------|-------|--------| +| ≤ 200k tokens | $5 / 1M | $25 / 1M | +| > 200k tokens | $10 / 1M | $37.5 / 1M | + +When Routerly calculates cost for a request, it checks the total token count and applies the appropriate tier automatically. + +--- + +## Capabilities + +Capabilities control which models the **capability** routing policy will select. Set them accurately to get correct routing behaviour. + +| Capability | Description | +|-----------|-------------| +| `vision` | Model can process image inputs | +| `functionCalling` | Model supports tool/function call format | +| `thinking` | Model exposes chain-of-thought (e.g. o1, o3, Claude with extended thinking) | +| `json` | Model reliably generates valid JSON with `response_format: {type: "json_object"}` | + +--- + +## Assigning Models to Projects + +A model is not usable in a project until it is assigned. When creating a project via CLI, use `--models`: + +```bash +routerly project add --name "My App" --slug my-app --models gpt-5-mini,claude-haiku-4-5 +``` + +From the dashboard, open the project → **Routing** tab, then drag and drop models into the routing configuration. + +--- + +## Cloning a Model + +Cloning is useful when you have a fine-tuned variant of a base model and want to reuse its pricing configuration: + +1. Click the **Clone** icon next to a model in the list +2. Change the Model ID and API Key +3. Adjust pricing if your fine-tune has different rates +4. Save + +--- + +## Removing a Model + +:::warning +Removing a model that is assigned to active projects will cause routing failures for those projects. Remove the model from all project configurations first. 
+
+:::
+
+```bash
+routerly model remove --id gpt-5-mini
+```
diff --git a/website/versioned_docs/version-0.1.5/concepts/notifications.md b/website/versioned_docs/version-0.1.5/concepts/notifications.md
new file mode 100644
index 0000000..dedfb19
--- /dev/null
+++ b/website/versioned_docs/version-0.1.5/concepts/notifications.md
@@ -0,0 +1,162 @@
+---
+title: Notifications
+sidebar_position: 7
+---
+
+# Notifications
+
+Routerly can send notifications when a budget limit is reached or is approaching a configured threshold. Notifications are delivered via one or more **channels** — email providers or webhooks.
+
+---
+
+## Configuring Notification Channels
+
+Configure channels in **Dashboard → Settings → Notifications**, or by editing the `notifications` array in `settings.json`.
+
+Each channel has:
+- A `type` (the provider identifier)
+- Connection settings specific to that provider
+- A `name` label used in logs
+
+After saving, use the **Send Test** button to verify the channel works before a real alert is triggered.
+
+You can also test a channel via the API:
+
+```bash
+curl -X POST http://localhost:3000/api/notifications/test \
+  -H "Authorization: Bearer $ADMIN_JWT" \
+  -H "Content-Type: application/json" \
+  -d '{"channelName": "my-smtp"}'
+```
+
+---
+
+## Channel Types
+
+### SMTP
+
+Sends email via any SMTP server. Routerly auto-detects whether to use SSL (port 465) or STARTTLS (port 587 / 25).
+
+```jsonc
+{
+  "type": "smtp",
+  "name": "my-smtp",
+  "host": "smtp.example.com",
+  "port": 587,
+  "user": "alerts@example.com",
+  "password": "secret",
+  "from": "Routerly <alerts@example.com>",
+  "to": "admin@example.com"
+}
+```
+
+### Amazon SES
+
+Uses Amazon SES via its regional SMTP endpoint. Authentication is the standard SES SMTP username + password (not your AWS credentials).
+
+```jsonc
+{
+  "type": "ses",
+  "name": "ses-us-east",
+  "region": "us-east-1",
+  "user": "AKIAIOSFODNN7EXAMPLE",
+  "password": "ses_smtp_password",
+  "from": "alerts@example.com",
+  "to": "admin@example.com"
+}
+```
+
+### SendGrid
+
+Uses SendGrid's SMTP relay at `smtp.sendgrid.net:587`. The username is always `apikey` and the password is your SendGrid API key.
+
+```jsonc
+{
+  "type": "sendgrid",
+  "name": "sendgrid",
+  "apiKey": "SG.xxxx",
+  "from": "alerts@example.com",
+  "to": "admin@example.com"
+}
+```
+
+### Azure Communication Services
+
+Sends email via Azure Communication Services. Authentication uses HMAC-SHA256 with your connection string's access key.
+
+```jsonc
+{
+  "type": "azure",
+  "name": "azure-email",
+  "connectionString": "endpoint=https://....communication.azure.com;accesskey=BASE64KEY==",
+  "from": "alerts@yourdomain.com",
+  "to": "admin@example.com"
+}
+```
+
+### Google (Gmail / Google Workspace)
+
+Uses the Gmail API via OAuth 2.0. Requires a Google Cloud project with the Gmail API enabled and a refresh token.
+
+```jsonc
+{
+  "type": "google",
+  "name": "gmail",
+  "clientId": "123456789.apps.googleusercontent.com",
+  "clientSecret": "GOCSPX-xxxx",
+  "refreshToken": "1//xxxx",
+  "from": "alerts@gmail.com",
+  "to": "admin@example.com"
+}
+```
+
+### Webhook
+
+Sends an HTTP POST request to any URL. An optional HMAC-SHA256 signature is included in the `X-Routerly-Signature` header when a `secret` is configured.
+ +```jsonc +{ + "type": "webhook", + "name": "slack-webhook", + "url": "https://hooks.slack.com/services/xxx/yyy/zzz", + "secret": "optional_signing_secret" +} +``` + +**Webhook payload:** +```json +{ + "event": "budget.exhausted", + "budget": { + "level": "project", + "name": "my-app", + "metric": "cost", + "window": "monthly", + "limit": 50.00, + "current": 50.12 + }, + "timestamp": "2025-01-15T14:30:00Z" +} +``` + +**Signature verification (Node.js):** +```javascript +import { createHmac } from 'crypto'; + +const signature = req.headers['x-routerly-signature']; +const body = req.rawBody; // raw request body as string +const expected = createHmac('sha256', secret).update(body).digest('hex'); +const isValid = signature === `sha256=${expected}`; +``` + +--- + +## Notification Events + +| Event | Description | +|-------|-------------| +| `budget.threshold` | Budget reached the configured warning threshold (e.g. 80%) | +| `budget.exhausted` | Budget reached its limit | +| `budget.reset` | Budget window reset (optional) | + +Threshold percentage is configurable per budget limit. diff --git a/website/versioned_docs/version-0.1.5/concepts/projects.md b/website/versioned_docs/version-0.1.5/concepts/projects.md new file mode 100644 index 0000000..89e19fe --- /dev/null +++ b/website/versioned_docs/version-0.1.5/concepts/projects.md @@ -0,0 +1,98 @@ +--- +title: Projects +sidebar_position: 4 +--- + +# Projects + +A **project** is an isolated workspace inside Routerly. Each project has: + +- Its own **API tokens** that your applications use to authenticate +- Its own **routing configuration** (which models to use and in what order) +- Its own **budget limits** (optional) +- Its own **usage logs** +- A set of **members** with specific roles (for dashboard access) + +--- + +## Creating a Project + +### CLI + +```bash +routerly project add \ + --name "My App" \ + --slug my-app \ + --models gpt-5-mini,claude-haiku-4-5 +``` + +`--slug` is the URL-safe identifier used in logs and the dashboard. It must be unique. + +### Dashboard + +1. Open **Projects** in the sidebar +2. Click **+ New Project** +3. Fill in Name, Slug, and optionally an initial model list +4. Click **Create** + +--- + +## Project Tabs + +Each project in the dashboard has five tabs: + +### General + +Shows the project name, slug, default request timeout, and the connection snippet (base URL and a masked token) ready to copy into your code. + +### Routing + +Configure which models the project can use and in what order. Drag routing policies into the list and set their parameters. See [Concepts: Routing](./routing.md) for details. + +### Tokens + +Manage the Bearer tokens used to authenticate API calls. Each token can have per-token budget limits that stack on top of the project-level limits. + +**Creating a token:** + +1. Click **+ New Token** +2. Give it a name (e.g. `production`, `staging`, `ci`) +3. Optionally configure per-token limits +4. Click **Create** — the token value is shown **once only** + +**Per-token limits** allow you to cap spending for individual applications or environments independently of the project-level budget. + +### Users + +Assign dashboard users to this project and control what they can see and do. Available project-scoped roles: `viewer`, `editor`, `admin`. See [Dashboard: Users & Roles](../dashboard/users-and-roles.md). + +### Logs + +Live view of recent requests routed through this project. Columns: timestamp, model used, status, input tokens, output tokens, cost. Click any row to see the full routing trace. 
+ +The log table auto-refreshes at a configurable interval (5 s / 15 s / 30 s / 1 min / 5 min). + +--- + +## Project Slugs + +Slugs are used in the scoped proxy URL: + +``` +POST http://localhost:3000/projects/{slug}/v1/chat/completions +``` + +Using the scoped URL is optional — you can also use the generic `/v1/chat/completions` with a project token that is already bound to the project. + +--- + +## Listing and Removing Projects + +```bash +routerly project list +routerly project remove --slug my-app +``` + +:::warning +Removing a project deletes all its tokens and budget configuration. Usage records in `usage.json` are preserved for historical reporting. +::: diff --git a/website/versioned_docs/version-0.1.5/concepts/providers.md b/website/versioned_docs/version-0.1.5/concepts/providers.md new file mode 100644 index 0000000..eb40635 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/concepts/providers.md @@ -0,0 +1,148 @@ +--- +title: Providers +sidebar_position: 2 +--- + +# Providers + +A **provider** is an LLM platform that Routerly knows how to communicate with. Each provider has its own wire protocol, authentication scheme, and model catalogue. + +--- + +## Supported Providers + +| Provider | ID | Authentication | Notes | +|----------|----|----------------|-------| +| OpenAI | `openai` | API key | Chat completions + Responses API + token counting | +| Anthropic | `anthropic` | API key | Messages API + token counting | +| Google Gemini | `gemini` | API key | OpenAI-compatible endpoint | +| Mistral | `mistral` | API key | OpenAI-compatible endpoint | +| Cohere | `cohere` | API key | OpenAI-compatible endpoint | +| xAI (Grok) | `xai` | API key | OpenAI-compatible endpoint | +| Ollama | `ollama` | None | Local inference; set `baseUrl` to your Ollama host | +| Custom | `custom` | Optional | Any OpenAI-compatible endpoint | + +--- + +## OpenAI + +| Model ID | Context | Input price | Output price | Capabilities | +|----------|---------|-------------|--------------|--------------| +| `gpt-5.2` | 128k | $1.75 / 1M | $14 / 1M | Vision, function calling, JSON | +| `gpt-5.1` | 128k | $1.25 / 1M | $10 / 1M | Vision, function calling, JSON | +| `gpt-5` | 128k | $1.25 / 1M | $10 / 1M | Vision, function calling, JSON | +| `gpt-5-mini` | 128k | $0.25 / 1M | $2 / 1M | Vision, function calling, JSON | +| `gpt-5-nano` | 128k | $0.05 / 1M | $0.4 / 1M | Function calling, JSON | +| `gpt-4.1` | 1M | $2 / 1M | $8 / 1M | Vision, function calling, JSON | +| `gpt-4.1-mini` | 1M | $0.40 / 1M | $1.6 / 1M | Vision, function calling, JSON | +| `gpt-4.1-nano` | 1M | $0.10 / 1M | $0.4 / 1M | Function calling, JSON | +| `gpt-4o` | 128k | $2.50 / 1M | $10 / 1M | Vision, function calling, JSON | +| `gpt-4o-mini` | 128k | $0.15 / 1M | $0.6 / 1M | Vision, function calling, JSON | +| `o1` | 200k | $15 / 1M | $60 / 1M | Thinking, function calling, JSON | +| `o3` | 200k | $2 / 1M | $8 / 1M | Thinking, function calling, JSON | +| `o4-mini` | 200k | $1.10 / 1M | $4.4 / 1M | Thinking, function calling, JSON | + +Prices are per 1 million tokens unless otherwise noted. 
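+
+For example, a request that sends 10,000 input tokens to `gpt-5-mini` and generates 2,000 output tokens costs 10,000 ÷ 1M × $0.25 + 2,000 ÷ 1M × $2 = $0.0065.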
+ +--- + +## Anthropic + +| Model ID | Context | Input price | Output price | Notes | +|----------|---------|-------------|--------------|-------| +| `claude-opus-4-6` | 200k | $5 / 1M | $25 / 1M | Tier >200k tokens: $10 / $37.5 | +| `claude-sonnet-4-6` | 200k | $3 / 1M | $15 / 1M | | +| `claude-sonnet-4-5` | 200k | $3 / 1M | $15 / 1M | Tier >200k tokens: $6 / $22.5 | +| `claude-haiku-4-5` | 200k | $1 / 1M | $5 / 1M | | +| `claude-opus-4-1` | 200k | $15 / 1M | $75 / 1M | Vision, function calling, JSON | +| `claude-sonnet-4-1` | 200k | $3 / 1M | $15 / 1M | Vision, function calling, JSON | + +--- + +## Google Gemini + +| Model ID | Context | Input price | Output price | Notes | +|----------|---------|-------------|--------------|-------| +| `gemini-2.5-pro` | 2M | $1.25 / 1M | $10 / 1M | Tier >200k: $2.5 / $15 | +| `gemini-2.5-flash` | 1M | $0.30 / 1M | $2.5 / 1M | | +| `gemini-2.5-flash-lite` | 1M | $0.10 / 1M | $0.4 / 1M | | +| `gemini-3.1-pro-preview` | 2M | $2 / 1M | $12 / 1M | Tier >200k: higher | +| `gemini-3-pro-preview` | 2M | — | — | Experimental | +| `gemini-3-flash-preview` | 1M | — | — | Experimental | +| `gemini-2.0-flash` | 1M | $0.10 / 1M | $0.4 / 1M | | +| `gemini-2.0-flash-lite` | 1M | $0.075 / 1M | $0.3 / 1M | | + +--- + +## Mistral + +| Model ID | Notes | +|----------|-------| +| `mistral-large-latest` | Flagship model | +| `mistral-small-latest` | Efficient, low cost | +| `mistral-nemo` | Open-weight, 12B | +| `codestral-latest` | Code specialised | +| `ministral-8b-latest` | Ultra-small | + +--- + +## Cohere + +| Model ID | Notes | +|----------|-------| +| `command-r-plus` | Best quality | +| `command-r` | Balanced | +| `command-a-03-2025` | Latest generation | +| `command-nightly` | Bleeding edge | +| `c4ai-aya-expanse-8b` | Multilingual, 8B | +| `c4ai-aya-expanse-32b` | Multilingual, 32B | +| `embed-english-v3.0` | Embeddings | + +--- + +## xAI (Grok) + +| Model ID | Notes | +|----------|-------| +| `grok-3` | Latest flagship | +| `grok-3-fast` | Optimised for speed | +| `grok-3-mini` | Efficient | +| `grok-3-mini-fast` | Smallest / fastest | + +--- + +## Ollama (Local) + +| Model ID | Notes | +|----------|-------| +| `ollama/llama3.2` | Meta Llama 3.2, 3B | +| `ollama/llama3.1:8b` | Meta Llama 3.1, 8B | +| `ollama/qwen3:4b` | Qwen3, 4B | +| `ollama/qwen3:8b` | Qwen3, 8B | +| `ollama/mistral` | Mistral 7B | +| `ollama/phi4-mini` | Microsoft Phi-4 Mini | +| `ollama/gemma3:4b` | Google Gemma 3, 4B | +| `ollama/deepseek-r1:7b` | DeepSeek R1, 7B | + +Ollama models require a running Ollama server. The default base URL is `http://localhost:11434`. Override it per-model in the dashboard with the **Base URL** field. + +--- + +## Custom / Self-hosted + +Use provider ID `custom` for any OpenAI-compatible endpoint (vLLM, LM Studio, LocalAI, etc.): + +```bash +routerly model add \ + --id my-custom-model \ + --provider custom \ + --base-url http://192.168.1.50:8000/v1 \ + --input-price 0 \ + --output-price 0 +``` + +--- + +## Adding a Provider Model + +All models must be registered in Routerly before they can be used. See [Concepts: Models](./models.md) for registration details. 
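+
+Once registered, you can verify that a model shows up with its pricing and capabilities:
+
+```bash
+routerly model list
+```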
diff --git a/website/versioned_docs/version-0.1.5/concepts/routing.md b/website/versioned_docs/version-0.1.5/concepts/routing.md new file mode 100644 index 0000000..7e9486c --- /dev/null +++ b/website/versioned_docs/version-0.1.5/concepts/routing.md @@ -0,0 +1,156 @@ +--- +title: Routing +sidebar_position: 5 +--- + +# Routing + +Routerly's router selects which model to use for each request by running a configurable stack of **routing policies**. Policies are applied in priority order; each policy can score, filter, or directly pick a model from the candidate set. + +:::tip Benchmarks +Reproducible routing benchmarks — latency overhead, cost savings, and failover behaviour — are published at **[github.com/Inebrio/routerly-benchmark](https://github.com/Inebrio/routerly-benchmark)**. +::: + +--- + +## How Routing Works + +1. The project's configured models are loaded as the candidate set. +2. Policies run in the order they appear in the routing configuration. +3. Each policy either **filters** some models out or **scores** them. At the end, the model with the highest combined score is selected. +4. If no model passes all filters, Routerly returns a `503` error with a descriptive message. + +### Positional Scoring + +Each model's position in the routing list contributes a base score: + +``` +weight = total_models - index +``` + +So a model at position 0 gets `weight = N`, the one at position 1 gets `weight = N-1`, etc. This creates a natural preference order even when no other scoring policies are active. + +--- + +## Available Policies + +### `cheapest` + +Selects the model with the lowest estimated cost for the current request. Estimation is based on registered pricing and the input token count. Output tokens are estimated at a configurable multiplier. + +**Use when:** cost control is the primary concern. + +### `health` + +Filters out models that have had a high error rate in the recent window, or that failed the last health check. Keeps Routerly routing away from degraded providers automatically. + +**Use when:** you want automatic failover. + +### `performance` + +Scores models by their recent p95 latency. Faster models receive higher scores. + +**Use when:** response time matters more than cost. + +### `capability` + +Filters models by required capabilities (`vision`, `functionCalling`, `thinking`, `json`). Only models that have all required capabilities remain as candidates. + +**Use when:** the request requires a specific capability (e.g. image input). + +### `context` + +Filters out models whose context window is smaller than the current request's estimated token count. + +**Use when:** you send long documents or long conversations, and some of your models have smaller context windows. + +### `llm` + +Uses a separate LLM call to decide which model to route to, based on request content. This policy is experimental and introduces an extra API call per request. + +**Config options:** + +| Option | Type | Default | Description | +|--------|------|---------|-------------| +| `thinking` | `boolean` | `false` | If `true` and the routing model supports extended thinking (e.g. Claude with thinking capability), the routing call uses it for more accurate decisions. **Warning:** this increases routing latency significantly. | + +**Use when:** you want dynamic model selection based on request semantics. + +### `rate-limit` + +Filters out models that are currently rate-limited (i.e. received a 429 response recently). The cooldown period is configurable per model. 
+ +**Use when:** your usage volume can hit provider rate limits. + +### `fairness` + +Distributes requests across models to balance load, or ensures that cheaper models are only used up to a configured share of traffic. + +**Use when:** you have multiple capable models and want to spread load. + +### `budget-remaining` + +Scores models by how much of their associated budget is still available. Models with more remaining budget get higher scores. + +**Use when:** you have per-model spending limits and want Routerly to naturally prefer models with headroom. + +--- + +## Configuring Routing + +### Dashboard (recommended) + +1. Open the project → **Routing** tab +2. Drag a policy from the left panel into the active list +3. Configure the policy's parameters in the settings panel on the right +4. Drag to reorder — policies at the top have higher priority +5. Add target models below the policies + +### CLI + +```bash +# Add a model to a project with a monthly budget +routerly project add-model \ + --slug my-app \ + --model gpt-5-mini \ + --monthly-budget 10.00 + +# Remove a model +routerly project remove-model --slug my-app --model gpt-5-mini +``` + +--- + +## Example: Cost-first with Health Failover + +This configuration tries the cheapest available healthy model: + +``` +Policies (in order): + 1. health — remove unhealthy models + 2. cheapest — prefer lowest cost + +Models (in priority order): + 1. gpt-5-nano + 2. gpt-5-mini + 3. gpt-5 +``` + +If `gpt-5-nano` is unhealthy, `health` removes it from candidates, and `cheapest` picks `gpt-5-mini`. + +--- + +## Example: Capability Routing + +Route vision requests to a capable model while serving text-only requests with a cheaper model: + +``` +Policies: + 1. capability — requires: vision (if the request includes an image) + +Models: + 1. gpt-4.1 (has vision) + 2. gpt-5-nano (no vision) +``` + +Text-only requests → both are candidates → positional scoring picks `gpt-4.1`. Vision requests → `gpt-5-nano` is filtered out → `gpt-4.1` is used. If no vision model is available, Routerly returns `503`. diff --git a/website/versioned_docs/version-0.1.5/dashboard/models.md b/website/versioned_docs/version-0.1.5/dashboard/models.md new file mode 100644 index 0000000..7e84021 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/dashboard/models.md @@ -0,0 +1,75 @@ +--- +title: Models +sidebar_position: 3 +--- + +# Dashboard: Models + +The Models page lets you register, edit, clone, and remove LLM models. All models registered here become available for use in project routing configurations. + +--- + +## Model List + +The list shows all registered models with the following columns: + +| Column | Description | +|--------|-------------| +| **Model ID** | Provider model identifier | +| **Provider** | OpenAI, Anthropic, Gemini, etc. | +| **Input Price** | USD per 1M input tokens | +| **Output Price** | USD per 1M output tokens | +| **Context Window** | Maximum tokens accepted | +| **Capabilities** | Icons for vision, function calling, thinking, JSON | +| **Enabled** | Toggle on/off without deleting | + +Click any column header to sort. + +--- + +## Adding a Model + +1. Click **+ New Model** +2. Fill in the form: + - **Model ID** — the identifier sent to the provider (e.g. 
`gpt-5-mini`) + - **Provider** — select from the dropdown + - **API Key** — encrypted at rest; leave blank for Ollama / custom models without auth + - **Base URL** — optional override (useful for proxies or self-hosted models) + - **Context Window** — pre-filled for known models + - **Pricing** — input/output/cache prices per 1M tokens; pre-filled for known models + - **Pricing Tiers** — add a tier for long-context pricing (e.g. Anthropic above 200k tokens) + - **Capabilities** — check all that apply + +3. Click **Save** + +--- + +## Editing a Model + +Click the **Edit** (pencil) icon next to a model. All fields except the Model ID are editable. + +To update the API key, enter a new value — Routerly re-encrypts it immediately. + +--- + +## Cloning a Model + +Click the **Clone** icon to create a copy of a model entry. Useful when registering a fine-tuned variant that shares the same provider and pricing as a base model. + +Change the **Model ID** and **API Key** as needed, then save. + +--- + +## Disabling a Model + +Toggle the **Enabled** switch to `off` to temporarily remove a model from routing without deleting it. Disabled models are visible in the list but are excluded from all routing decisions. + +--- + +## Removing a Model + +Click the **Delete** (trash) icon. You will be asked to confirm. + +:::warning +Removing a model that is assigned to active project routing configurations will cause routing failures for those projects. Remove the model from all project routing configs before deleting it. +::: diff --git a/website/versioned_docs/version-0.1.5/dashboard/overview.md b/website/versioned_docs/version-0.1.5/dashboard/overview.md new file mode 100644 index 0000000..4bbcf31 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/dashboard/overview.md @@ -0,0 +1,49 @@ +--- +title: Overview +sidebar_position: 2 +--- + +# Overview + +The Overview page is the dashboard home screen. It provides a snapshot of activity across all projects for the selected time period. + +--- + +## Summary Cards + +The top row shows aggregate stats for the selected period: + +| Card | Description | +|------|-------------| +| **Total Cost** | Sum of all LLM costs in USD | +| **Total Calls** | Number of API requests routed | +| **Success Rate** | Percentage of requests that returned a successful response | +| **Errors** | Number of failed requests (provider errors, budget exceeded, etc.) | +| **Active Models** | Number of models that received at least one call | +| **Active Projects** | Number of projects with at least one call | + +--- + +## Daily Cost Chart + +The bar chart below the cards shows cost per day (or per period, depending on the selected window). Hover over a bar to see the breakdown by project or model. + +--- + +## Period Selector + +Use the selector in the top-right to change the reporting window: + +| Option | Description | +|--------|-------------| +| Daily | Today only | +| Weekly | Current calendar week | +| Monthly | Current calendar month | +| All | All time | + +--- + +## Navigating to Details + +- Click a project name in the chart legend to go directly to that project's detail view. +- Click the **Usage** item in the sidebar for the full analytics page with filtering and drill-down. 
diff --git a/website/versioned_docs/version-0.1.5/dashboard/playground.md b/website/versioned_docs/version-0.1.5/dashboard/playground.md new file mode 100644 index 0000000..00c77f0 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/dashboard/playground.md @@ -0,0 +1,70 @@ +--- +title: Playground +sidebar_position: 8 +--- + +# Dashboard: Playground + +The Playground lets you send chat requests directly from the browser without writing any code. It is useful for testing models, verifying routing behaviour, and debugging prompt changes. + +--- + +## Getting Started + +1. Open **Playground** from the sidebar +2. Select a **Project Token** from the dropdown — only tokens from projects you have access to are listed +3. Type a message in the input box and press **Send** (or `Enter`) + +--- + +## Interface + +### Message History + +Messages are displayed in a conversational thread. Each message shows: +- The role (`user` / `assistant`) +- The content (with Markdown rendering for assistant responses) +- For image inputs: a thumbnail + +The conversation history is maintained for the duration of the browser session and sent with each request as `messages` context. + +### Model Display + +The model actually used is shown above each assistant response. If routing assigned a different model than expected, this is where you'll see it. + +### Streaming + +Responses stream in real time when the selected project's routing configuration supports streaming. A stop button (⏹) appears while a response is in progress — click it to abort. + +### Image Attachments + +Click the **Attach Image** button (or paste an image) to include image content in your message. This requires the routed model to have the `vision` capability. + +--- + +## Debug Panels + +Below the conversation, four collapsible panels show the complete request lifecycle: + +| Panel | Contents | +|-------|----------| +| **Router Request** | Project slug, requested model (if any), active routing policies | +| **Router Response** | Policy scores, candidate models, selected model and reason | +| **Model Request** | Exact payload sent to the provider | +| **Model Response** | Raw provider response including token counts and finish reason | + +These panels are invaluable for understanding why the router chose a specific model, or for diagnosing provider errors. + +--- + +## Clearing the Conversation + +Click **Clear** to reset the message history. The system prompt (if any) is preserved. + +--- + +## Limitations + +- The Playground does not save conversations after a page refresh. +- Function-calling / tool-use responses are shown as raw JSON. +- Audio and document inputs are not supported via the Playground UI. diff --git a/website/versioned_docs/version-0.1.5/dashboard/profile.md b/website/versioned_docs/version-0.1.5/dashboard/profile.md new file mode 100644 index 0000000..f8805a4 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/dashboard/profile.md @@ -0,0 +1,38 @@ +--- +title: Profile +sidebar_position: 9 +--- + +# Dashboard: Profile + +The Profile page lets each logged-in user manage their own account settings. Access it by clicking your avatar or email address in the top-right corner of the dashboard. + +--- + +## Changing Your Email + +1. Enter your new email address in the **Email** field +2. Click **Save** + +The change takes effect immediately. You will need to use the new email address on your next login. + +--- + +## Changing Your Password + +1. Enter your **Current Password** +2. 
Enter your **New Password** (minimum 8 characters) +3. Re-enter the new password in **Confirm New Password** +4. Click **Update Password** + +Routerly hashes passwords with bcrypt. Your current session remains active after a password change. + +--- + +## Account Information + +The profile page also shows: + +- Your **email address** +- Your **role** (admin, operator, viewer, or a custom role name) +- The **date your account was created** diff --git a/website/versioned_docs/version-0.1.5/dashboard/projects.md b/website/versioned_docs/version-0.1.5/dashboard/projects.md new file mode 100644 index 0000000..ef70ae8 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/dashboard/projects.md @@ -0,0 +1,104 @@ +--- +title: Projects +sidebar_position: 4 +--- + +# Dashboard: Projects + +The Projects page gives you an overview of all projects and provides access to each project's configuration. + +--- + +## Projects List + +The list shows each project's name, slug, number of tokens, assigned models, and a summary of today's cost and call count. + +Click any project to open its detail view, which has five tabs. + +--- + +## General Tab + +Shows and lets you edit: + +- **Name** — display name for the project +- **Slug** — URL-safe identifier (read-only after creation) +- **Default Timeout** — per-request timeout in milliseconds (overrides the global `defaultTimeoutMs`) +- **Connection Info** — base URL and masked token snippet ready to copy into your SDK configuration +- **Budget** — project-level cost limit (daily / monthly) + +--- + +## Routing Tab + +Configure which models this project can use and how to select between them. + +### Adding Models + +Use the **+ Add Model** button to pick from registered models. Models appear in a numbered list — their order determines the default routing priority (position 0 is highest). + +Drag and drop to reorder. + +### Adding Policies + +Drag policies from the policy panel on the right into the active-policies list on the left. Each policy can be expanded to configure its parameters. + +Available policies: `cheapest`, `health`, `performance`, `capability`, `context`, `llm`, `rate-limit`, `fairness`, `budget-remaining`. + +See [Concepts: Routing](../concepts/routing.md) for each policy's behaviour and parameters. + +--- + +## Tokens Tab + +Manage Bearer tokens for this project. + +### Creating a Token + +1. Click **+ New Token** +2. Enter a **Name** (e.g. `production`, `staging`, `ci`) +3. Optionally configure per-token limits (metric, limit value, window type, mode) +4. Click **Create** + +The token value (`sk-rt-…`) is shown **once**. Copy it immediately. + +### Per-Token Limits + +Per-token limits let you cap spending for individual applications sharing the same project. The `mode` field controls how the per-token limit interacts with the project-level limit: + +| Mode | Behaviour | +|------|-----------| +| `replace` | Per-token limit overrides the project limit for this token | +| `extend` | Both per-token and project limits must pass | +| `disable` | No budget check for this token | + +### Rolling or Regenerating a Token + +Click the **Re-generate** icon to invalidate the current token and issue a new one. The previous token stops working immediately. + +--- + +## Users Tab + +Assign dashboard users to this project. A user assigned here can see and manage the project based on their role's permissions. + +Available roles: `viewer`, `editor`, `admin` (or any custom role defined in [Users & Roles](./users-and-roles.md)). 
+ +--- + +## Logs Tab + +A live log of recent requests routed through this project. + +| Column | Description | +|--------|-------------| +| Timestamp | When the request arrived | +| Model | Provider model that handled the request | +| Status | `success`, `error`, `budget_exceeded`, etc. | +| Input Tokens | Number of input tokens | +| Output Tokens | Number of output tokens generated | +| Cost | Estimated USD cost | + +Click any row to open the **Trace view** which shows the full routing decision: which policies ran, which models were considered, and why the final model was chosen. + +The table auto-refreshes at a configurable interval. Use the interval selector (5 s / 15 s / 30 s / 1 min / 5 min / Off) to control polling. diff --git a/website/versioned_docs/version-0.1.5/dashboard/settings.md b/website/versioned_docs/version-0.1.5/dashboard/settings.md new file mode 100644 index 0000000..418773a --- /dev/null +++ b/website/versioned_docs/version-0.1.5/dashboard/settings.md @@ -0,0 +1,65 @@ +--- +title: Settings +sidebar_position: 7 +--- + +# Dashboard: Settings + +The Settings page allows admins to configure the Routerly service, notification channels, and view system information. It is only accessible to users with the `admin` role. + +Open Settings from the **Settings** item in the sidebar. + +--- + +## General Tab + +### Service Configuration + +| Field | Description | +|-------|-------------| +| **Port** | The port the service listens on (read-only — change via CLI or environment variable) | +| **Host** | The bind address (read-only) | +| **Public URL** | The externally accessible URL of this Routerly instance. Shown in project connection snippets | +| **Default Timeout** | Per-request timeout in milliseconds (applies to all projects unless overridden per-project) | +| **Log Level** | `trace` / `debug` / `info` / `warn` / `error` | +| **Dashboard Enabled** | Toggle the web dashboard on or off | + +Changes are saved immediately and take effect without a restart (except Port and Host, which require a restart). + +--- + +## Notifications Tab + +Configure one or more notification channels for budget alerts. + +### Adding a Channel + +1. Click **+ Add Channel** +2. Select the channel type: `SMTP`, `SES`, `SendGrid`, `Azure`, `Google`, `Webhook` +3. Fill in the connection details for the selected type +4. Click **Save** +5. Click **Send Test** to verify the channel delivers a message correctly + +See [Concepts: Notifications](../concepts/notifications.md) for the configuration fields required by each provider. + +### Testing a Channel + +Click **Send Test** next to a channel. Routerly sends a test message immediately. Check for a success toast or an error with details. + +### Removing a Channel + +Click the **Delete** icon next to a channel. 
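+
+Channels configured here are persisted to the `notifications` array in `settings.json`. A minimal stored webhook channel looks like this (same fields as documented in [Concepts: Notifications](../concepts/notifications.md)):
+
+```jsonc
+{
+  "notifications": [
+    {
+      "type": "webhook",
+      "name": "slack-webhook",
+      "url": "https://hooks.slack.com/services/xxx/yyy/zzz"
+    }
+  ]
+}
+```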
+ +--- + +## About Tab + +Read-only system information: + +| Field | Description | +|-------|-------------| +| **Version** | Routerly version string | +| **Uptime** | How long the service has been running since last start | +| **Node.js** | Node.js runtime version | +| **Platform** | OS and architecture | +| **Config Directory** | Path to `~/.routerly/config/` (or `$ROUTERLY_HOME/config/`) | diff --git a/website/versioned_docs/version-0.1.5/dashboard/setup.md b/website/versioned_docs/version-0.1.5/dashboard/setup.md new file mode 100644 index 0000000..6720b32 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/dashboard/setup.md @@ -0,0 +1,70 @@ +--- +title: Setup +sidebar_position: 1 +--- + +# Setup + +The Setup screen appears the first time you open the Routerly dashboard at `http://localhost:3000/dashboard`. It guides you through creating the first admin account. + +--- + +## First-Time Setup + +1. Open `http://localhost:3000/dashboard` in your browser. +2. You will be redirected to the **Setup** page automatically if no admin account exists yet. +3. Enter an **email address** and a **password** for the admin account. +4. Click **Create Account**. + +After account creation you are redirected to the login page. Log in with the credentials you just created. + +--- + +## Setup API + +The setup state can be checked programmatically: + +```bash +# Check whether setup has been completed +GET /api/setup/status +``` + +Response when setup is not yet done: +```json +{ "configured": false } +``` + +Response after setup: +```json +{ "configured": true } +``` + +The first-admin endpoint is only available when `configured: false`: + +```bash +POST /api/setup/first-admin +Content-Type: application/json + +{ + "email": "admin@example.com", + "password": "your-secure-password" +} +``` + +Once an admin account exists, this endpoint returns `403 Forbidden`. + +--- + +## Subsequent Admin Accounts + +After setup is complete, additional admin users can be created from **Users** in the dashboard sidebar. Only users with the `user:write` permission can create new users. + +--- + +## Resetting the Admin Password + +If you lose access to the admin account, stop the service and delete `~/.routerly/config/users.json`. On the next start, the setup page will be available again. + +:::warning +Deleting `users.json` removes all dashboard users. API keys and project tokens are not affected. +::: diff --git a/website/versioned_docs/version-0.1.5/dashboard/usage.md b/website/versioned_docs/version-0.1.5/dashboard/usage.md new file mode 100644 index 0000000..2595d6d --- /dev/null +++ b/website/versioned_docs/version-0.1.5/dashboard/usage.md @@ -0,0 +1,97 @@ +--- +title: Usage +sidebar_position: 5 +--- + +# Dashboard: Usage + +The Usage page provides aggregate analytics and per-request logs across all projects. Use it to understand spending patterns, investigate errors, and drill into individual request traces. + +--- + +## Summary Statistics + +The top row shows aggregated totals for the selected filter set: + +- Total cost (USD) +- Total call count +- Success rate +- Error count +- Average latency + +--- + +## Filters + +| Filter | Description | +|--------|-------------| +| **Date range** | Start and end date/time picker | +| **Project** | Filter to one or more projects | +| **Model** | Filter to specific model IDs | +| **Type** | `chat`, `responses`, `messages` | +| **Outcome** | `success`, `error`, `budget_exceeded`, `timeout` | + +Filters are applied immediately; the page updates in real time. 
+ +--- + +## Usage Table + +The table lists individual requests with: + +| Column | Description | +|--------|-------------| +| Timestamp | When the request arrived | +| Project | The project the request belonged to | +| Model | Provider model used | +| Type | API type (`chat`, `responses`, `messages`) | +| Status | Outcome | +| Input Tokens | Input token count | +| Output Tokens | Output token count | +| Cost | Estimated cost in USD | +| Latency | Time to first byte / total response time | + +Click any row to open the full **Trace view**. + +--- + +## Trace View + +The trace view shows the complete lifecycle of a single request: + +1. **Router Request** — the routing engine's input: the project slug, requested model (if any), and active policies +2. **Router Response** — which model was selected and why (policy scores listed) +3. **Model Request** — the actual payload sent to the provider +4. **Model Response** — the raw provider response including all tokens and finish reason + +This detail is useful for debugging unexpected model selections, routing failures, or provider errors. + +--- + +## Live Polling + +The usage table can auto-refresh to show new requests as they arrive. Use the interval selector in the top-right: + +| Interval | Meaning | +|----------|---------| +| 5 s | Refresh every 5 seconds | +| 15 s | Refresh every 15 seconds | +| 30 s | Refresh every 30 seconds | +| 1 min | Refresh every minute | +| 5 min | Refresh every 5 minutes | +| Now | Manual refresh only | + +--- + +## Exporting Usage Data + +Usage data is stored in `~/.routerly/data/usage.json` as newline-delimited JSON. You can process it with any standard tool: + +```bash +# Total cost this month +cat ~/.routerly/data/usage.json | \ + jq -r 'select(.timestamp | startswith("2025-07")) | .cost' | \ + awk '{sum+=$1} END {printf "Total: $%.4f\n", sum}' +``` + +For programmatic access, use the [Usage API](../api/management.md#usage). diff --git a/website/versioned_docs/version-0.1.5/dashboard/users-and-roles.md b/website/versioned_docs/version-0.1.5/dashboard/users-and-roles.md new file mode 100644 index 0000000..0b149b2 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/dashboard/users-and-roles.md @@ -0,0 +1,105 @@ +--- +title: Users & Roles +sidebar_position: 6 +--- + +# Dashboard: Users & Roles + +Routerly has a role-based access control (RBAC) system for the dashboard. Users are granted permissions via roles. Three built-in roles are provided; you can create additional custom roles with any combination of permissions. + +--- + +## Permissions + +| Permission | Description | +|-----------|-------------| +| `project:read` | View projects and their configuration | +| `project:write` | Create, edit, and delete projects | +| `model:read` | View registered models | +| `model:write` | Create, edit, and delete models | +| `user:read` | View dashboard users | +| `user:write` | Create, edit, and delete users, assign roles | +| `report:read` | View usage analytics and request logs | + +--- + +## Built-in Roles + +| Role | Permissions | +|------|-------------| +| `admin` | All permissions | +| `operator` | `project:read`, `project:write`, `model:read`, `model:write`, `report:read` | +| `viewer` | `project:read`, `model:read`, `report:read` | + +--- + +## Managing Users + +### Adding a User + +1. Open **Users** in the sidebar +2. Click **+ New User** +3. Enter the email address, password, and assign a role +4. Click **Create** + +The new user can log in immediately. 
+ +### Editing a User + +Click the **Edit** icon to change the user's email, password, or role. + +### Removing a User + +Click the **Delete** icon and confirm. The user's session is invalidated immediately. + +--- + +## Custom Roles + +Custom roles let you define granular permission sets for specific team members. + +### Creating a Custom Role + +1. Open **Roles** in the sidebar +2. Click **+ New Role** +3. Give the role a name (e.g. `billing_reviewer`) +4. Check the permissions this role should have +5. Click **Save** + +Custom roles can be assigned to users the same way as built-in roles. + +### Editing / Deleting a Custom Role + +Click **Edit** to change the role's permissions. Users with this role are affected immediately. + +Click **Delete** to remove the role. Users who had this role will lose their dashboard access. Reassign them first. + +--- + +## Project-Level User Assignment + +Users can also be assigned to specific projects (from the project's **Users** tab). This limits their access to that project without changing their global role. + +--- + +## CLI Management + +```bash +# List all users +routerly user list + +# Add a user +routerly user add --email ops@example.com --role operator + +# Remove a user +routerly user remove --email ops@example.com + +# List roles +routerly role list + +# Add a custom role +routerly role add --name billing_reviewer --permissions report:read + +# Remove a custom role +routerly role remove --name billing_reviewer +``` diff --git a/website/versioned_docs/version-0.1.5/examples/dotnet.md b/website/versioned_docs/version-0.1.5/examples/dotnet.md new file mode 100644 index 0000000..cf00409 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/examples/dotnet.md @@ -0,0 +1,109 @@ +--- +title: C# / .NET +sidebar_label: C# / .NET +--- + +# C# / .NET + +--- + +## OpenAI .NET SDK + +The official [OpenAI .NET SDK](https://github.com/openai/openai-dotnet) supports custom endpoints. + +```bash +dotnet add package OpenAI +``` + +```csharp +using OpenAI; +using OpenAI.Chat; +using System.ClientModel; + +var client = new OpenAIClient( + new ApiKeyCredential("sk-rt-YOUR_PROJECT_TOKEN"), + new OpenAIClientOptions { Endpoint = new Uri("http://localhost:3000/v1") } +); + +ChatClient chat = client.GetChatClient("gpt-5-mini"); + +// Non-streaming +ChatCompletion response = await chat.CompleteChatAsync( + new UserChatMessage("Hello from .NET!") +); +Console.WriteLine(response.Content[0].Text); + +// Streaming +await foreach (StreamingChatCompletionUpdate update in + chat.CompleteChatStreamingAsync(new UserChatMessage("Tell me a story."))) +{ + foreach (ChatMessageContentPart part in update.ContentUpdate) + { + Console.Write(part.Text); + } +} +``` + +--- + +## Raw HTTP (HttpClient) + +```csharp +using System.Net.Http; +using System.Net.Http.Json; +using System.Text.Json; + +using var http = new HttpClient(); +http.DefaultRequestHeaders.Authorization = + new System.Net.Http.Headers.AuthenticationHeaderValue("Bearer", "sk-rt-YOUR_PROJECT_TOKEN"); + +var payload = new +{ + model = "gpt-5-mini", + messages = new[] { new { role = "user", content = "Hello from HttpClient!" 
} } +}; + +var response = await http.PostAsJsonAsync( + "http://localhost:3000/v1/chat/completions", + payload +); + +using var doc = await JsonDocument.ParseAsync(await response.Content.ReadAsStreamAsync()); +Console.WriteLine(doc.RootElement + .GetProperty("choices")[0] + .GetProperty("message") + .GetProperty("content") + .GetString()); +``` + +--- + +## Streaming (HttpClient + SSE) + +```csharp +using var request = new HttpRequestMessage(HttpMethod.Post, + "http://localhost:3000/v1/chat/completions"); +request.Headers.Authorization = + new System.Net.Http.Headers.AuthenticationHeaderValue("Bearer", "sk-rt-YOUR_PROJECT_TOKEN"); +request.Content = JsonContent.Create(new +{ + model = "gpt-5-mini", + messages = new[] { new { role = "user", content = "Tell me a story." } }, + stream = true +}); + +using var response = await http.SendAsync(request, + HttpCompletionOption.ResponseHeadersRead); +using var stream = await response.Content.ReadAsStreamAsync(); +using var reader = new StreamReader(stream); + +while (await reader.ReadLineAsync() is string line) +{ + if (!line.StartsWith("data: ") || line == "data: [DONE]") continue; + var data = JsonDocument.Parse(line[6..]).RootElement; + Console.Write(data.GetProperty("choices")[0] + .GetProperty("delta") + .GetProperty("content") + .GetString()); +} +``` diff --git a/website/versioned_docs/version-0.1.5/examples/go.md b/website/versioned_docs/version-0.1.5/examples/go.md new file mode 100644 index 0000000..9de1461 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/examples/go.md @@ -0,0 +1,119 @@ +--- +title: Go +sidebar_label: Go +--- + +# Go + +--- + +## go-openai + +[go-openai](https://github.com/sashabaranov/go-openai) is the most popular OpenAI client for Go. + +```bash +go get github.com/sashabaranov/go-openai +``` + +```go +package main + +import ( + "context" + "fmt" + + openai "github.com/sashabaranov/go-openai" +) + +func main() { + config := openai.DefaultConfig("sk-rt-YOUR_PROJECT_TOKEN") + config.BaseURL = "http://localhost:3000/v1" + client := openai.NewClientWithConfig(config) + + // Non-streaming + resp, err := client.CreateChatCompletion( + context.Background(), + openai.ChatCompletionRequest{ + Model: "gpt-5-mini", + Messages: []openai.ChatCompletionMessage{ + {Role: openai.ChatMessageRoleUser, Content: "Hello from Go!"}, + }, + }, + ) + if err != nil { + panic(err) + } + fmt.Println(resp.Choices[0].Message.Content) +} +``` + +### Streaming + +```go +stream, err := client.CreateChatCompletionStream( + context.Background(), + openai.ChatCompletionRequest{ + Model: "gpt-5-mini", + Messages: []openai.ChatCompletionMessage{ + {Role: openai.ChatMessageRoleUser, Content: "Tell me a story."}, + }, + Stream: true, + }, +) +if err != nil { + panic(err) +} +defer stream.Close() + +for { + response, err := stream.Recv() + if err != nil { + break + } + fmt.Print(response.Choices[0].Delta.Content) +} +``` + +--- + +## Raw HTTP (net/http) + +No external dependencies. 
+ +```go +package main + +import ( + "bytes" + "encoding/json" + "fmt" + "io" + "net/http" +) + +func main() { + payload, _ := json.Marshal(map[string]any{ + "model": "gpt-5-mini", + "messages": []map[string]string{ + {"role": "user", "content": "Hello from net/http!"}, + }, + }) + + req, _ := http.NewRequest( + "POST", + "http://localhost:3000/v1/chat/completions", + bytes.NewBuffer(payload), + ) + req.Header.Set("Content-Type", "application/json") + req.Header.Set("Authorization", "Bearer sk-rt-YOUR_PROJECT_TOKEN") + + resp, err := http.DefaultClient.Do(req) + if err != nil { + panic(err) + } + defer resp.Body.Close() + + body, _ := io.ReadAll(resp.Body) + fmt.Println(string(body)) +} +``` diff --git a/website/versioned_docs/version-0.1.5/examples/java.md b/website/versioned_docs/version-0.1.5/examples/java.md new file mode 100644 index 0000000..971dee0 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/examples/java.md @@ -0,0 +1,129 @@ +--- +title: Java +sidebar_label: Java +--- + +# Java + +Java does not have an official OpenAI SDK, but the standard `java.net.http` client (Java 11+) covers all use cases. + +--- + +## Raw HTTP (java.net.http) + +No external dependencies required. + +```java +import java.net.URI; +import java.net.http.HttpClient; +import java.net.http.HttpRequest; +import java.net.http.HttpResponse; + +public class RouterlyExample { + + private static final String BASE_URL = "http://localhost:3000/v1"; + private static final String API_KEY = "sk-rt-YOUR_PROJECT_TOKEN"; + + public static void main(String[] args) throws Exception { + HttpClient client = HttpClient.newHttpClient(); + + String body = """ + { + "model": "gpt-5-mini", + "messages": [{"role": "user", "content": "Hello from Java!"}] + } + """; + + HttpRequest request = HttpRequest.newBuilder() + .uri(URI.create(BASE_URL + "/chat/completions")) + .header("Content-Type", "application/json") + .header("Authorization", "Bearer " + API_KEY) + .POST(HttpRequest.BodyPublishers.ofString(body)) + .build(); + + HttpResponse response = + client.send(request, HttpResponse.BodyHandlers.ofString()); + + System.out.println(response.body()); + // Parse with Jackson / Gson to extract choices[0].message.content + } +} +``` + +--- + +## With OkHttp + +If you already use OkHttp in your project: + +```java +// build.gradle +// implementation("com.squareup.okhttp3:okhttp:4.12.0") + +import okhttp3.*; +import java.io.IOException; + +public class RouterlyOkHttp { + + private static final MediaType JSON = MediaType.get("application/json"); + + public static void main(String[] args) throws IOException { + OkHttpClient client = new OkHttpClient(); + + String json = """ + { + "model": "gpt-5-mini", + "messages": [{"role": "user", "content": "Hello from OkHttp!"}] + } + """; + + Request request = new Request.Builder() + .url("http://localhost:3000/v1/chat/completions") + .header("Authorization", "Bearer sk-rt-YOUR_PROJECT_TOKEN") + .post(RequestBody.create(json, JSON)) + .build(); + + try (Response response = client.newCall(request).execute()) { + System.out.println(response.body().string()); + } + } +} +``` + +--- + +## Streaming (SSE) + +For streaming responses, consume the `InputStream` line by line: + +```java +HttpRequest streamRequest = HttpRequest.newBuilder() + .uri(URI.create(BASE_URL + "/chat/completions")) + .header("Content-Type", "application/json") + .header("Authorization", "Bearer " + API_KEY) + .POST(HttpRequest.BodyPublishers.ofString(""" + { + "model": "gpt-5-mini", + "messages": [{"role": "user", "content": "Tell 
me a story."}], + "stream": true + } + """)) + .build(); + +HttpResponse streamResponse = + client.send(streamRequest, HttpResponse.BodyHandlers.ofInputStream()); + +try (var reader = new java.io.BufferedReader( + new java.io.InputStreamReader(streamResponse.body()))) { + String line; + while ((line = reader.readLine()) != null) { + if (line.startsWith("data: ") && !line.equals("data: [DONE]")) { + System.out.println(line.substring(6)); // parse JSON delta here + } + } +} +``` + +:::tip +Use [Jackson](https://github.com/FasterXML/jackson) or [Gson](https://github.com/google/gson) to deserialise the JSON response body into a typed object. +::: diff --git a/website/versioned_docs/version-0.1.5/examples/javascript.md b/website/versioned_docs/version-0.1.5/examples/javascript.md new file mode 100644 index 0000000..ee07e61 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/examples/javascript.md @@ -0,0 +1,115 @@ +--- +title: JavaScript / TypeScript +sidebar_label: JavaScript / TypeScript +--- + +# JavaScript / TypeScript + +--- + +## OpenAI SDK + +```bash +npm install openai +``` + +```typescript +import OpenAI from "openai"; + +const client = new OpenAI({ + baseURL: "http://localhost:3000/v1", + apiKey: "sk-rt-YOUR_PROJECT_TOKEN", +}); + +// Non-streaming +const response = await client.chat.completions.create({ + model: "gpt-5-mini", + messages: [{ role: "user", content: "Hello!" }], +}); +console.log(response.choices[0].message.content); + +// Streaming +const stream = await client.chat.completions.create({ + model: "gpt-5-mini", + messages: [{ role: "user", content: "Tell me a story." }], + stream: true, +}); +for await (const chunk of stream) { + process.stdout.write(chunk.choices[0]?.delta?.content ?? ""); +} +``` + +--- + +## Anthropic SDK + +```bash +npm install @anthropic-ai/sdk +``` + +```typescript +import Anthropic from "@anthropic-ai/sdk"; + +const client = new Anthropic({ + baseURL: "http://localhost:3000", + apiKey: "sk-rt-YOUR_PROJECT_TOKEN", +}); + +const message = await client.messages.create({ + model: "claude-haiku-4-5", + max_tokens: 1024, + messages: [{ role: "user", content: "Hello, Claude!" }], +}); +console.log(message.content[0].text); +``` + +--- + +## Raw HTTP (fetch) + +No dependencies needed — works in Node.js 18+, Deno, Bun, and the browser. + +```typescript +// Non-streaming +const response = await fetch("http://localhost:3000/v1/chat/completions", { + method: "POST", + headers: { + "Content-Type": "application/json", + Authorization: "Bearer sk-rt-YOUR_PROJECT_TOKEN", + }, + body: JSON.stringify({ + model: "gpt-5-mini", + messages: [{ role: "user", content: "Hello!" }], + }), +}); +const data = await response.json(); +console.log(data.choices[0].message.content); + +// Streaming (Server-Sent Events) +const streamRes = await fetch("http://localhost:3000/v1/chat/completions", { + method: "POST", + headers: { + "Content-Type": "application/json", + Authorization: "Bearer sk-rt-YOUR_PROJECT_TOKEN", + }, + body: JSON.stringify({ + model: "gpt-5-mini", + messages: [{ role: "user", content: "Tell me a story." 
}], + stream: true, + }), +}); + +const reader = streamRes.body!.getReader(); +const decoder = new TextDecoder(); +while (true) { + const { done, value } = await reader.read(); + if (done) break; + const lines = decoder.decode(value).split("\n"); + for (const line of lines) { + if (!line.startsWith("data: ") || line === "data: [DONE]") continue; + const json = JSON.parse(line.slice(6)); + if (json.type === "content") process.stdout.write(json.delta); // Routerly SSE + else process.stdout.write(json.choices?.[0]?.delta?.content ?? ""); // standard SSE + } +} +``` diff --git a/website/versioned_docs/version-0.1.5/examples/overview.md b/website/versioned_docs/version-0.1.5/examples/overview.md new file mode 100644 index 0000000..682fe3c --- /dev/null +++ b/website/versioned_docs/version-0.1.5/examples/overview.md @@ -0,0 +1,55 @@ +--- +title: Examples +sidebar_label: Overview +sidebar_position: 1 +--- + +# Examples + +Ready-to-run code snippets for calling Routerly from any language. Every example connects to the same two endpoints: + +| Protocol | Endpoint | When to use | +|----------|----------|-------------| +| **OpenAI** | `http://localhost:3000/v1/chat/completions` | Default — use with any OpenAI-compatible SDK | +| **Anthropic** | `http://localhost:3000/v1/messages` | Use with the Anthropic SDK or when targeting Claude models | + +--- + +## Common pattern + +Every integration needs two values: + +- **Base URL** — `http://localhost:3000/v1` (OpenAI) or `http://localhost:3000` (Anthropic) +- **API Key** — your project token: `sk-rt-YOUR_PROJECT_TOKEN` + +Change these two lines in your existing code and nothing else needs to change. + +--- + +## Languages + +| Language | OpenAI SDK | Anthropic SDK | Raw HTTP | +|----------|-----------|---------------|----------| +| [JavaScript / TypeScript](./javascript) | `openai` npm | `@anthropic-ai/sdk` npm | `fetch` | +| [Python](./python) | `openai` pip | `anthropic` pip | `httpx` | +| [Java](./java) | — | — | `java.net.http` | +| [Go](./go) | `go-openai` | — | `net/http` | +| [C# / .NET](./dotnet) | `Azure.AI.OpenAI` | — | `HttpClient` | +| [PHP](./php) | `openai-php/client` | — | `Guzzle` | +| [Ruby](./ruby) | `ruby-openai` gem | — | `Net::HTTP` | +| [Rust](./rust) | `async-openai` crate | — | `reqwest` | + +--- + +## Getting a project token + +Create a token in the dashboard: **Projects** → select your project → **Tokens** → **New Token**. + +Tokens start with `sk-rt-` and are shown once. Copy it before closing the dialog. + +--- + +## Next steps + +- Read the full [LLM Proxy API reference](../api/llm-proxy) to see all supported request parameters. +- See [Integrations](../integrations/overview) for ready-made setup guides for tools like Cursor, Open WebUI, and LangChain. diff --git a/website/versioned_docs/version-0.1.5/examples/php.md b/website/versioned_docs/version-0.1.5/examples/php.md new file mode 100644 index 0000000..5537c4a --- /dev/null +++ b/website/versioned_docs/version-0.1.5/examples/php.md @@ -0,0 +1,113 @@ +--- +title: PHP +sidebar_label: PHP +--- + +# PHP + +--- + +## openai-php/client + +[openai-php/client](https://github.com/openai-php/client) is a community-maintained PHP SDK with full OpenAI API support. 
+
+```bash
+composer require openai-php/client
+```
+
+```php
+<?php
+
+require 'vendor/autoload.php';
+
+$client = OpenAI::factory()
+    ->withBaseUri('http://localhost:3000/v1')
+    ->withApiKey('sk-rt-YOUR_PROJECT_TOKEN')
+    ->make();
+
+// Non-streaming
+$response = $client->chat()->create([
+    'model' => 'gpt-5-mini',
+    'messages' => [
+        ['role' => 'user', 'content' => 'Hello from PHP!'],
+    ],
+]);
+
+echo $response->choices[0]->message->content;
+
+// Streaming
+$stream = $client->chat()->createStreamed([
+    'model' => 'gpt-5-mini',
+    'messages' => [
+        ['role' => 'user', 'content' => 'Tell me a story.'],
+    ],
]);
+
+foreach ($stream as $response) {
+    echo $response->choices[0]->delta->content;
+}
+```
+
+---
+
+## Raw HTTP (Guzzle)
+
+```bash
+composer require guzzlehttp/guzzle
+```
+
+```php
+<?php
+
+require 'vendor/autoload.php';
+
+$client = new \GuzzleHttp\Client();
+
+$response = $client->post('http://localhost:3000/v1/chat/completions', [
+    'headers' => [
+        'Authorization' => 'Bearer sk-rt-YOUR_PROJECT_TOKEN',
+        'Content-Type' => 'application/json',
+    ],
+    'json' => [
+        'model' => 'gpt-5-mini',
+        'messages' => [
+            ['role' => 'user', 'content' => 'Hello from Guzzle!'],
+        ],
+    ],
+]);
+
+$data = json_decode($response->getBody(), true);
+echo $data['choices'][0]['message']['content'];
+```
+
+---
+
+## Raw HTTP (curl — no dependencies)
+
+```php
+<?php
+
+$payload = json_encode([
+    'model' => 'gpt-5-mini',
+    'messages' => [['role' => 'user', 'content' => 'Hello!']],
+]);
+
+$ch = curl_init('http://localhost:3000/v1/chat/completions');
+curl_setopt_array($ch, [
+    CURLOPT_POST => true,
+    CURLOPT_POSTFIELDS => $payload,
+    CURLOPT_RETURNTRANSFER => true,
+    CURLOPT_HTTPHEADER => [
+        'Content-Type: application/json',
+        'Authorization: Bearer sk-rt-YOUR_PROJECT_TOKEN',
+    ],
+]);
+
+$body = curl_exec($ch);
+curl_close($ch);
+
+$data = json_decode($body, true);
+echo $data['choices'][0]['message']['content'];
+```
diff --git a/website/versioned_docs/version-0.1.5/examples/python.md b/website/versioned_docs/version-0.1.5/examples/python.md
new file mode 100644
index 0000000..bb12b41
--- /dev/null
+++ b/website/versioned_docs/version-0.1.5/examples/python.md
@@ -0,0 +1,131 @@
+---
+title: Python
+sidebar_label: Python
+---
+
+# Python
+
+---
+
+## OpenAI SDK
+
+```bash
+pip install openai
+```
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:3000/v1",
+    api_key="sk-rt-YOUR_PROJECT_TOKEN",
+)
+
+# Non-streaming
+response = client.chat.completions.create(
+    model="gpt-5-mini",
+    messages=[{"role": "user", "content": "Hello!"}],
+)
+print(response.choices[0].message.content)
+
+# Streaming
+stream = client.chat.completions.create(
+    model="gpt-5-mini",
+    messages=[{"role": "user", "content": "Tell me a story."}],
+    stream=True,
+)
+for chunk in stream:
+    print(chunk.choices[0].delta.content or "", end="", flush=True)
+```
+
+---
+
+## Anthropic SDK
+
+```bash
+pip install anthropic
+```
+
+```python
+import anthropic
+
+client = anthropic.Anthropic(
+    base_url="http://localhost:3000",
+    api_key="sk-rt-YOUR_PROJECT_TOKEN",
+)
+
+# Non-streaming
+message = client.messages.create(
+    model="claude-haiku-4-5",
+    max_tokens=1024,
+    messages=[{"role": "user", "content": "Hello, Claude!"}],
+)
+print(message.content[0].text)
+
+# Streaming
+with client.messages.stream(
+    model="claude-haiku-4-5",
+    max_tokens=1024,
+    messages=[{"role": "user", "content": "Tell me a story."}],
+) as stream:
+    for text in stream.text_stream:
+        print(text, end="", flush=True)
+```
+
+---
+
+## Raw HTTP (httpx)
+
+```bash
+pip install httpx
+```
+
+```python
+import httpx
+import json
+
+HEADERS = {
+    "Content-Type": "application/json",
+    "Authorization": "Bearer sk-rt-YOUR_PROJECT_TOKEN",
+}
+
+# 
Non-streaming +with httpx.Client() as client: + r = client.post( + "http://localhost:3000/v1/chat/completions", + headers=HEADERS, + json={ + "model": "gpt-5-mini", + "messages": [{"role": "user", "content": "Hello!"}], + }, + ) + r.raise_for_status() + print(r.json()["choices"][0]["message"]["content"]) + +# Streaming +with httpx.Client() as client: + with client.stream( + "POST", + "http://localhost:3000/v1/chat/completions", + headers=HEADERS, + json={ + "model": "gpt-5-mini", + "messages": [{"role": "user", "content": "Tell me a story."}], + "stream": True, + }, + ) as r: + for line in r.iter_lines(): + if not line.startswith("data: ") or line == "data: [DONE]": + continue + data = json.loads(line[6:]) + delta = data.get("choices", [{}])[0].get("delta", {}).get("content", "") + print(delta, end="", flush=True) +``` + +:::tip +Use `ROUTERLY_API_KEY` as an environment variable instead of hardcoding the token: +```python +import os +client = OpenAI(base_url="http://localhost:3000/v1", api_key=os.environ["ROUTERLY_API_KEY"]) +``` +::: diff --git a/website/versioned_docs/version-0.1.5/examples/ruby.md b/website/versioned_docs/version-0.1.5/examples/ruby.md new file mode 100644 index 0000000..b434523 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/examples/ruby.md @@ -0,0 +1,105 @@ +--- +title: Ruby +sidebar_label: Ruby +--- + +# Ruby + +--- + +## ruby-openai gem + +[ruby-openai](https://github.com/alexrudall/ruby-openai) is the most widely used OpenAI SDK for Ruby. + +```bash +gem install ruby-openai +# or add to Gemfile: gem "ruby-openai" +``` + +```ruby +require "openai" + +client = OpenAI::Client.new( + access_token: "sk-rt-YOUR_PROJECT_TOKEN", + uri_base: "http://localhost:3000/v1/", +) + +# Non-streaming +response = client.chat( + parameters: { + model: "gpt-5-mini", + messages: [{ role: "user", content: "Hello from Ruby!" }], + } +) +puts response.dig("choices", 0, "message", "content") + +# Streaming +client.chat( + parameters: { + model: "gpt-5-mini", + messages: [{ role: "user", content: "Tell me a story." }], + stream: proc { |chunk, _bytesize| + print chunk.dig("choices", 0, "delta", "content") + }, + } +) +``` + +--- + +## Raw HTTP (Net::HTTP — no dependencies) + +```ruby +require "net/http" +require "uri" +require "json" + +uri = URI("http://localhost:3000/v1/chat/completions") +http = Net::HTTP.new(uri.host, uri.port) + +request = Net::HTTP::Post.new(uri.path) +request["Content-Type"] = "application/json" +request["Authorization"] = "Bearer sk-rt-YOUR_PROJECT_TOKEN" +request.body = JSON.generate( + model: "gpt-5-mini", + messages: [{ role: "user", content: "Hello from Net::HTTP!" }] +) + +response = http.request(request) +data = JSON.parse(response.body) +puts data.dig("choices", 0, "message", "content") +``` + +--- + +## Streaming (Net::HTTP) + +```ruby +require "net/http" +require "uri" +require "json" + +uri = URI("http://localhost:3000/v1/chat/completions") +http = Net::HTTP.new(uri.host, uri.port) + +request = Net::HTTP::Post.new(uri.path) +request["Content-Type"] = "application/json" +request["Authorization"] = "Bearer sk-rt-YOUR_PROJECT_TOKEN" +request.body = JSON.generate( + model: "gpt-5-mini", + messages: [{ role: "user", content: "Tell me a story." }], + stream: true +) + +http.request(request) do |response| + response.read_body do |chunk| + chunk.split("\n").each do |line| + next unless line.start_with?("data: ") + data = line[6..] 
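+      # line[6..] drops the 6-char "data: " prefix; "[DONE]" is the SSE stream terminator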
+      next if data == "[DONE]"
+      parsed = JSON.parse(data) rescue next
+      print parsed.dig("choices", 0, "delta", "content").to_s
+    end
+  end
+end
+```
diff --git a/website/versioned_docs/version-0.1.5/examples/rust.md b/website/versioned_docs/version-0.1.5/examples/rust.md
new file mode 100644
index 0000000..1024964
--- /dev/null
+++ b/website/versioned_docs/version-0.1.5/examples/rust.md
@@ -0,0 +1,124 @@
+---
+title: Rust
+sidebar_label: Rust
+---
+
+# Rust
+
+---
+
+## async-openai
+
+[async-openai](https://github.com/64bit/async-openai) is the most popular async OpenAI client for Rust.
+
+```toml
+# Cargo.toml
+[dependencies]
+async-openai = "0.27"
+futures = "0.3"
+tokio = { version = "1", features = ["full"] }
+```
+
+```rust
+use async_openai::{
+    config::OpenAIConfig,
+    types::{ChatCompletionRequestUserMessageArgs, CreateChatCompletionRequestArgs},
+    Client,
+};
+
+#[tokio::main]
+async fn main() -> Result<(), Box<dyn std::error::Error>> {
+    let config = OpenAIConfig::new()
+        .with_api_base("http://localhost:3000/v1")
+        .with_api_key("sk-rt-YOUR_PROJECT_TOKEN");
+
+    let client = Client::with_config(config);
+
+    // Non-streaming
+    let request = CreateChatCompletionRequestArgs::default()
+        .model("gpt-5-mini")
+        .messages([ChatCompletionRequestUserMessageArgs::default()
+            .content("Hello from Rust!")
+            .build()?
+            .into()])
+        .build()?;
+
+    let response = client.chat().create(request).await?;
+    println!("{}", response.choices[0].message.content.as_deref().unwrap_or(""));
+
+    Ok(())
+}
+```
+
+### Streaming
+
+```rust
+use async_openai::types::{ChatCompletionRequestUserMessageArgs, CreateChatCompletionRequestArgs};
+use futures::StreamExt;
+
+let request = CreateChatCompletionRequestArgs::default()
+    .model("gpt-5-mini")
+    .messages([ChatCompletionRequestUserMessageArgs::default()
+        .content("Tell me a story.")
+        .build()?
+        .into()])
+    .build()?;
+
+let mut stream = client.chat().create_stream(request).await?;
+
+while let Some(result) = stream.next().await {
+    match result {
+        Ok(response) => {
+            for choice in &response.choices {
+                if let Some(ref content) = choice.delta.content {
+                    print!("{}", content);
+                }
+            }
+        }
+        Err(e) => eprintln!("Stream error: {e}"),
+    }
+}
+```
+
+---
+
+## Raw HTTP (reqwest)
+
+```toml
+# Cargo.toml
+[dependencies]
+reqwest = { version = "0.12", features = ["json"] }
+serde_json = "1"
+tokio = { version = "1", features = ["full"] }
+```
+
+```rust
+use serde_json::json;
+
+#[tokio::main]
+async fn main() -> Result<(), Box<dyn std::error::Error>> {
+    let client = reqwest::Client::new();
+
+    let body = json!({
+        "model": "gpt-5-mini",
+        "messages": [{"role": "user", "content": "Hello from reqwest!"}]
+    });
+
+    let response = client
+        .post("http://localhost:3000/v1/chat/completions")
+        .header("Authorization", "Bearer sk-rt-YOUR_PROJECT_TOKEN")
+        .json(&body)
+        .send()
+        .await?;
+
+    let data: serde_json::Value = response.json().await?;
+    println!(
+        "{}",
+        data["choices"][0]["message"]["content"]
+            .as_str()
+            .unwrap_or("")
+    );
+
+    Ok(())
+}
+```
diff --git a/website/versioned_docs/version-0.1.5/getting-started/configuration.md b/website/versioned_docs/version-0.1.5/getting-started/configuration.md
new file mode 100644
index 0000000..b52e876
--- /dev/null
+++ b/website/versioned_docs/version-0.1.5/getting-started/configuration.md
@@ -0,0 +1,104 @@
+---
+title: Configuration
+sidebar_position: 3
+---
+
+# Configuration
+
+Routerly stores all configuration as JSON files under `~/.routerly/`. This page explains the layout, the main settings, and all environment variables. 
+ +--- + +## Directory Structure + +``` +~/.routerly/ +├── config/ +│ ├── settings.json # Port, log level, notifications, … +│ ├── models.json # Registered LLM models (API keys encrypted) +│ ├── projects.json # Projects, routing policies, tokens, members +│ ├── users.json # Dashboard users (passwords bcrypt-hashed) +│ ├── roles.json # Custom RBAC role definitions +│ └── secret # AES-256 encryption key (auto-generated) +└── data/ + └── usage.json # Append-only usage records +``` + +Override the base directory with the `ROUTERLY_HOME` environment variable — useful for Docker volumes or multi-instance setups. + +```bash +# Docker example +docker run -e ROUTERLY_HOME=/data -v routerly_data:/data ... +``` + +--- + +## Service Settings (`settings.json`) + +### Configure via CLI + +```bash +routerly service configure \ + --port 3000 \ + --host 0.0.0.0 \ + --dashboard true \ + --log-level info \ + --timeout 30000 \ + --public-url http://localhost:3000 +``` + +### Configure via Dashboard + +Open **Settings → General** in the dashboard. Changes take effect immediately (no restart required except for port/host changes). + +### Settings reference + +| Field | Default | Description | +|-------|---------|-------------| +| `port` | `3000` | TCP port the service listens on | +| `host` | `"0.0.0.0"` | Bind address (`"127.0.0.1"` for local-only) | +| `dashboardEnabled` | `true` | Whether to serve the web dashboard | +| `defaultTimeoutMs` | `30000` | Per-request timeout in milliseconds | +| `logLevel` | `"info"` | Log verbosity: `trace` / `debug` / `info` / `warn` / `error` | +| `publicUrl` | `"http://localhost:3000"` | External URL shown in the dashboard connection snippets | +| `notifications` | `[]` | Array of notification channel configurations | + +--- + +## Environment Variables + +| Variable | Description | +|----------|-------------| +| `ROUTERLY_HOME` | Config/data directory (default: `~/.routerly`) | +| `ROUTERLY_PORT` | Service port (overrides `settings.json`) | +| `ROUTERLY_SCOPE` | Installation scope: `user` or `system` | +| `ROUTERLY_PUBLIC_URL` | External URL of the service | +| `NODE_ENV` | Set to `production` to disable pretty-printed logs | + +--- + +## Security Notes + +- **API keys** (in `models.json`) and **project tokens** (in `projects.json`) are AES-256 encrypted using the key stored in the `secret` file. +- **User passwords** (in `users.json`) are bcrypt-hashed and never stored in plain text. +- The `secret` file is generated automatically on first run. + +:::warning Back up the `secret` file +If you lose the `secret` file, all API keys and project tokens become unreadable. Always include it in your backups alongside the rest of `~/.routerly/config/`. +::: + +--- + +## Notification Channels + +Routerly can send budget alerts and other notifications via email or webhook. Configure channels from **Settings → Notifications** in the dashboard, or by editing the `notifications` array in `settings.json`. + +Supported providers: `smtp`, `ses`, `sendgrid`, `azure`, `google`, `webhook`. + +See [Concepts: Notifications](../concepts/notifications.md) for per-provider configuration details. + +--- + +## Full Config File Schemas + +For the complete JSON schema of each configuration file, see [Reference: Config Files](../reference/config-files.md). 
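+
+For orientation, a `settings.json` populated entirely with the defaults from the settings reference above would look like this (an illustrative sketch; the file on disk may contain additional fields):
+
+```json
+{
+  "port": 3000,
+  "host": "0.0.0.0",
+  "dashboardEnabled": true,
+  "defaultTimeoutMs": 30000,
+  "logLevel": "info",
+  "publicUrl": "http://localhost:3000",
+  "notifications": []
+}
+```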
diff --git a/website/versioned_docs/version-0.1.5/getting-started/installation.md b/website/versioned_docs/version-0.1.5/getting-started/installation.md new file mode 100644 index 0000000..fff38c0 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/getting-started/installation.md @@ -0,0 +1,290 @@ +--- +title: Installation +sidebar_position: 1 +--- + +# Installation + +Routerly can be installed with a one-line script on macOS, Linux, and Windows. Docker is also supported for containerised deployments. + +--- + +## One-line Installer (Recommended) + +### macOS / Linux + +```bash +curl -fsSL https://www.routerly.ai/install.sh | bash +``` + +The installer will: +1. Detect your OS and architecture +2. Download the latest Routerly release +3. Install the service and CLI binaries +4. Optionally configure a system daemon (systemd on Linux, launchd on macOS) +5. Start the service + +#### Installer flags + +You can customise the installation by passing flags after `--`: + +```bash +curl -fsSL https://www.routerly.ai/install.sh | bash -s -- \ + --yes # Non-interactive; accept all defaults + --scope user # Install for current user only (default) + --scope system # System-wide install (requires sudo) + --port 8080 # Use a custom port (default: 3000) + --public-url https://routerly.example.com # External URL of the service + --no-service # Skip service installation (CLI only) + --no-daemon # Skip auto-start setup +``` + +#### Installation scopes + +| Scope | App directory | CLI binary | +|-------|--------------|------------| +| `user` (default) | `~/.routerly/app/` | `~/.local/bin/routerly` | +| `system` | `/opt/routerly/` | `/usr/local/bin/routerly` | + +The **service config and data directory** depends on both scope and platform: + +| Scope | Linux | macOS | Windows | +|-------|-------|-------|---------| +| `user` | `~/.routerly/` | `~/.routerly/` | `%USERPROFILE%\.routerly\` | +| `system` | `/var/lib/routerly/` | `/Library/Application Support/Routerly/` | `C:\ProgramData\Routerly\` | + +:::note +System scope requires `sudo`. The installer sets `ROUTERLY_HOME` in the daemon unit file so the service always reads the correct directory automatically. +::: + +:::info CLI auth tokens are always per-user +Regardless of scope, each user's CLI credentials (JWT tokens, refresh tokens) are stored in `~/.routerly/cli/config.json` with mode `0600`. They are never placed in the system config directory. +::: + +### Windows + +```powershell +powershell -c "irm https://www.routerly.ai/install.ps1 | iex" +``` + +This installs Routerly as a Windows Service and adds the CLI to your PATH. + +--- + +## Docker + +Two options are available: pull the pre-built image from Docker Hub (recommended), or build it yourself from source. + +### Option 1 — Pre-built image (Docker Hub) + +The official image is published on [Docker Hub](https://hub.docker.com/r/inebrio/routerly) for `linux/amd64` and `linux/arm64`. 
+ +#### docker-compose (recommended) + +Create a `docker-compose.yml`: + +```yaml +services: + routerly: + image: inebrio/routerly:latest + ports: + - "3000:3000" + volumes: + - routerly_data:/data + environment: + - ROUTERLY_HOME=/data + restart: unless-stopped + healthcheck: + test: ["CMD", "curl", "-f", "http://localhost:3000/health"] + interval: 30s + timeout: 5s + retries: 3 + +volumes: + routerly_data: +``` + +```bash +docker compose up -d +``` + +#### docker run + +```bash +docker run -d \ + --name routerly \ + -p 3000:3000 \ + -v routerly_data:/data \ + -e ROUTERLY_HOME=/data \ + --restart unless-stopped \ + inebrio/routerly:latest +``` + +### Option 2 — Build from source + +Use this if you want to run a local branch or a customised build. + +```bash +git clone https://github.com/Inebrio/Routerly.git +cd Routerly +docker build -t routerly:local . +docker run -d \ + --name routerly \ + -p 3000:3000 \ + -v routerly_data:/data \ + -e ROUTERLY_HOME=/data \ + --restart unless-stopped \ + routerly:local +``` + +Or with docker-compose, add a `docker-compose.override.yml` next to your existing `docker-compose.yml`: + +```yaml +services: + routerly: + build: ./Routerly + image: routerly:local +``` + +```bash +docker compose up -d --build +``` + +--- + +## Manual Installation (from source) + +Requirements: **Node.js ≥ 20**, **npm ≥ 10** + +```bash +git clone https://github.com/Inebrio/Routerly.git +cd Routerly +npm install +npm run build +npm run start --workspace=packages/service +``` + +The CLI is available via: + +```bash +node packages/cli/dist/index.js +``` + +--- + +## Updating Routerly + +Run the installer again — it detects an existing installation and presents a menu: + +```bash +# macOS / Linux +curl -fsSL https://www.routerly.ai/install.sh | bash + +# Windows (PowerShell) +powershell -c "irm https://www.routerly.ai/install.ps1 | iex" +``` + +When an existing install is found you will see: + +``` + Existing installation detected + + What would you like to do? + + 1 Update — download & rebuild latest code, keep all settings + 2 Reinstall — change components or settings (user data preserved) + 3 Uninstall — remove Routerly from this machine + 0 Cancel +``` + +Select **1** (or press Enter to accept the default) to download and rebuild the latest release. All configuration and user data are preserved. + +To update without prompts (e.g. in a script or CI): + +```bash +curl -fsSL https://www.routerly.ai/install.sh | bash -s -- --yes +``` + +--- + +## Reinstalling + +Reinstalling lets you change installed components (service, CLI, dashboard) or reconfigure settings (port, scope, daemon) while keeping all user data intact. + +Run the installer and select **2** at the menu: + +```bash +# macOS / Linux +curl -fsSL https://www.routerly.ai/install.sh | bash + +# Windows (PowerShell) +powershell -c "irm https://www.routerly.ai/install.ps1 | iex" +``` + +The wizard will walk you through the same questions as a fresh install, pre-filling your existing answers. All accounts, projects, models and usage history are preserved. + +--- + +## Uninstalling + +Run the installer and select **3** at the menu: + +```bash +# macOS / Linux +curl -fsSL https://www.routerly.ai/install.sh | bash + +# Windows (PowerShell) +powershell -c "irm https://www.routerly.ai/install.ps1 | iex" +``` + +The uninstall flow will: + +1. Stop and remove the system daemon (systemd / launchd / Windows Service) +2. Remove the application files and CLI binary +3. 
Ask whether to also delete user data (`~/.routerly/` — accounts, settings, usage history) + +:::note Data preservation +If you answer **No** to the data removal prompt, all accounts and history are kept. Running the installer again will detect the existing data and offer to resume from where you left off. +::: + +--- + +## Auto-start Configuration + +The installer can configure Routerly to start automatically on boot: + +| OS | Scope | Method | Location | +|----|-------|--------|----------| +| Linux | user | systemd user service | `~/.config/systemd/user/routerly.service` | +| Linux | system | systemd system service | `/etc/systemd/system/routerly.service` | +| macOS | user | launchd LaunchAgent | `~/Library/LaunchAgents/ai.routerly.service.plist` | +| macOS | system | launchd LaunchDaemon | `/Library/LaunchDaemons/ai.routerly.service.plist` | +| Windows | — | Windows Service | via `sc.exe` | + +To start/stop manually: + +```bash +# Linux (user scope) +systemctl --user start routerly +systemctl --user stop routerly + +# macOS (user scope) +launchctl load ~/Library/LaunchAgents/ai.routerly.service.plist +launchctl unload ~/Library/LaunchAgents/ai.routerly.service.plist +``` + +--- + +## Verifying the Installation + +```bash +routerly status +``` + +You should see the service URL, version, and a reachability check. Then open the dashboard: + +``` +http://localhost:3000/dashboard +``` + +→ Continue to [Quick Start](./quick-start.md) to set up your first model and project. diff --git a/website/versioned_docs/version-0.1.5/getting-started/quick-start.md b/website/versioned_docs/version-0.1.5/getting-started/quick-start.md new file mode 100644 index 0000000..a2a1c13 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/getting-started/quick-start.md @@ -0,0 +1,150 @@ +--- +title: Quick Start +sidebar_position: 2 +--- + +# Quick Start + +From zero to your first routed AI response in under 5 minutes. + +--- + +## Step 1: Install Routerly + +```bash +# macOS / Linux +curl -fsSL https://www.routerly.ai/install.sh | bash + +# Windows (PowerShell) +powershell -c "irm https://www.routerly.ai/install.ps1 | iex" +``` + +After the installer finishes, start the service if it isn't already running: + +```bash +routerly start +``` + +--- + +## Step 2: Create Your Admin Account + +Open the dashboard: + +``` +http://localhost:3000/dashboard +``` + +On first launch you will see the **Setup** screen. Enter an email address and password to create the admin account. This account has full control over all settings. + +--- + +## Step 3: Register a Model + +A **model** is a specific LLM available through a provider. You register it once with its API key; Routerly reuses it across all projects. + +**Via CLI:** + +```bash +# OpenAI +routerly model add --id gpt-5-mini --provider openai --api-key sk-YOUR_KEY + +# Anthropic +routerly model add --id claude-haiku-4-5 --provider anthropic --api-key sk-ant-YOUR_KEY + +# Ollama (local — no API key needed) +routerly model add --id ollama/qwen3:4b --provider ollama +``` + +Built-in pricing presets are available for well-known model IDs. If a preset is found, you do not need to specify `--input-price` / `--output-price`. + +**Via Dashboard** → **Models** → **+ New Model**: fill in the Model ID, Provider, and API Key. Pricing fields are pre-filled automatically for known models. + +--- + +## Step 4: Create a Project + +A **project** is an isolated workspace. It gets its own Bearer token and its own routing configuration. 
+ +```bash +routerly project add \ + --name "My App" \ + --slug my-app \ + --models gpt-5-mini +``` + +The command prints your **project token** — a string starting with `sk-rt-`. Save it; you'll use it in your application. + +:::warning Token visibility +The project token is shown **only once** after creation. Store it securely. You can generate a new token from the dashboard at any time. +::: + +--- + +## Step 5: Make Your First API Call + +Point any OpenAI-compatible SDK at Routerly and use your project token as the API key. + +### Python + +```python +from openai import OpenAI + +client = OpenAI( + base_url="http://localhost:3000/v1", + api_key="sk-rt-YOUR_PROJECT_TOKEN", +) + +response = client.chat.completions.create( + model="gpt-5-mini", + messages=[{"role": "user", "content": "Hello!"}], +) +print(response.choices[0].message.content) +``` + +### TypeScript / Node.js + +```typescript +import OpenAI from 'openai'; + +const client = new OpenAI({ + baseURL: 'http://localhost:3000/v1', + apiKey: 'sk-rt-YOUR_PROJECT_TOKEN', +}); + +const response = await client.chat.completions.create({ + model: 'gpt-5-mini', + messages: [{ role: 'user', content: 'Hello!' }], +}); +console.log(response.choices[0].message.content); +``` + +### curl + +```bash +curl http://localhost:3000/v1/chat/completions \ + -H "Authorization: Bearer sk-rt-YOUR_PROJECT_TOKEN" \ + -H "Content-Type: application/json" \ + -d '{"model":"gpt-5-mini","messages":[{"role":"user","content":"Hello!"}]}' +``` + +--- + +## Step 6: Check Usage + +Open **Usage** in the dashboard or use the CLI: + +```bash +routerly report usage # aggregated by model, this month +routerly report calls --limit 10 # last 10 request records +``` + +--- + +## Next Steps + +- Add more models → [Concepts: Models](../concepts/models.md) +- Configure routing policies → [Concepts: Routing](../concepts/routing.md) +- Set spending limits → [Concepts: Budgets & Limits](../concepts/budgets-and-limits.md) +- Invite team members → [Dashboard: Users & Roles](../dashboard/users-and-roles.md) +- Try requests in the browser → [Dashboard: Playground](../dashboard/playground.md) diff --git a/website/versioned_docs/version-0.1.5/guides/self-hosting.md b/website/versioned_docs/version-0.1.5/guides/self-hosting.md new file mode 100644 index 0000000..b775929 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/guides/self-hosting.md @@ -0,0 +1,226 @@ +--- +title: Self-Hosting +sidebar_position: 1 +--- + +# Self-Hosting + +This guide covers production deployment options for Routerly: Docker, systemd, launchd (macOS), and Windows Service. It also includes reverse-proxy configuration and a production readiness checklist. + +--- + +## Docker (Recommended) + +Docker is the easiest way to run Routerly in production. Data persists in a named volume. + +Two options are available: pull the pre-built image from Docker Hub, or build locally from source. + +### Option 1 — Pre-built image (Docker Hub) + +The official image is published on [`inebrio/routerly`](https://hub.docker.com/r/inebrio/routerly) for `linux/amd64` and `linux/arm64`. 
+
+### docker-compose.yml
+
+```yaml
+services:
+  routerly:
+    image: inebrio/routerly:latest
+    container_name: routerly
+    ports:
+      - "3000:3000"
+    volumes:
+      - routerly_data:/data
+    environment:
+      - ROUTERLY_HOME=/data
+      - NODE_ENV=production
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
+      interval: 30s
+      timeout: 5s
+      retries: 3
+
+volumes:
+  routerly_data:
+```
+
+```bash
+docker compose up -d
+docker compose logs -f   # follow logs
+```
+
+### Option 2 — Build from source
+
+Use this if you want to run a local branch or a customised build.
+
+```bash
+git clone https://github.com/Inebrio/Routerly.git
+cd Routerly
+docker build -t routerly:local .
+docker run -d \
+  --name routerly \
+  -p 3000:3000 \
+  -v routerly_data:/data \
+  -e ROUTERLY_HOME=/data \
+  --restart unless-stopped \
+  routerly:local
+```
+
+### Backup
+
+```bash
+# Backup config and data
+docker run --rm \
+  -v routerly_data:/data \
+  -v $(pwd):/backup \
+  alpine tar czf /backup/routerly-backup-$(date +%Y%m%d).tar.gz -C /data .
+```
+
+---
+
+## systemd (Linux)
+
+### User service (no root required)
+
+Create `~/.config/systemd/user/routerly.service`:
+
+```ini
+[Unit]
+Description=Routerly LLM Gateway
+After=network.target
+
+[Service]
+ExecStart=/home/USERNAME/.routerly/app/routerly-service
+WorkingDirectory=/home/USERNAME/.routerly
+Restart=on-failure
+RestartSec=5
+StandardOutput=journal
+StandardError=journal
+
+[Install]
+WantedBy=default.target
+```
+
+Enable and start:
+
+```bash
+systemctl --user daemon-reload
+systemctl --user enable routerly
+systemctl --user start routerly
+systemctl --user status routerly
+```
+
+### System service (root)
+
+Create `/etc/systemd/system/routerly.service`:
+
+```ini
+[Unit]
+Description=Routerly LLM Gateway
+After=network.target
+
+[Service]
+User=routerly
+Group=routerly
+ExecStart=/opt/routerly/app/routerly-service
+WorkingDirectory=/opt/routerly
+Restart=on-failure
+RestartSec=5
+StandardOutput=journal
+StandardError=journal
+
+[Install]
+WantedBy=multi-user.target
+```
+
+```bash
+useradd --system --home /opt/routerly routerly
+systemctl daemon-reload
+systemctl enable --now routerly
+```
+
+---
+
+## launchd (macOS)
+
+Create `~/Library/LaunchAgents/ai.routerly.service.plist`:
+
+```xml
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+    <key>Label</key>
+    <string>ai.routerly.service</string>
+    <key>ProgramArguments</key>
+    <array>
+        <string>/Users/USERNAME/.routerly/app/routerly-service</string>
+    </array>
+    <key>RunAtLoad</key>
+    <true/>
+    <key>KeepAlive</key>
+    <true/>
+    <key>StandardOutPath</key>
+    <string>/Users/USERNAME/.routerly/logs/output.log</string>
+    <key>StandardErrorPath</key>
+    <string>/Users/USERNAME/.routerly/logs/error.log</string>
+</dict>
+</plist>
+```
+
+```bash
+mkdir -p ~/.routerly/logs
+launchctl load ~/Library/LaunchAgents/ai.routerly.service.plist
+```
+
+---
+
+## Reverse Proxy
+
+**Always** place Routerly behind a reverse proxy in production to handle TLS termination, rate limiting, and access control. 
+ +### nginx + +```nginx +server { + listen 443 ssl http2; + server_name routerly.example.com; + + ssl_certificate /etc/letsencrypt/live/routerly.example.com/fullchain.pem; + ssl_certificate_key /etc/letsencrypt/live/routerly.example.com/privkey.pem; + + location / { + proxy_pass http://127.0.0.1:3000; + proxy_http_version 1.1; + proxy_set_header Upgrade $http_upgrade; + proxy_set_header Connection "upgrade"; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_buffering off; # Required for SSE streaming + proxy_read_timeout 120s; + } +} +``` + +### Caddy + +```caddyfile +routerly.example.com { + reverse_proxy localhost:3000 { + flush_interval -1 # Required for SSE streaming + } +} +``` + +--- + +## Production Checklist + +- [ ] Routerly is behind a reverse proxy with TLS +- [ ] `publicUrl` in Settings is set to the external HTTPS URL +- [ ] `host` is set to `127.0.0.1` (bind only to localhost, let the proxy handle external traffic) +- [ ] `logLevel` is set to `warn` or `error` to reduce log volume +- [ ] `~/.routerly/config/secret` and `config/*.json` are backed up +- [ ] Notification channels are configured for budget alerts +- [ ] A global budget limit is set to prevent unbounded spending +- [ ] Dashboard access is restricted to internal network or behind auth if the public URL is externally accessible diff --git a/website/versioned_docs/version-0.1.5/integrations/cline.md b/website/versioned_docs/version-0.1.5/integrations/cline.md new file mode 100644 index 0000000..e93d127 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/integrations/cline.md @@ -0,0 +1,43 @@ +--- +title: Cline +sidebar_label: Cline +--- + +# Cline + +[Cline](https://github.com/cline/cline) is an autonomous coding agent that can read files, write code, run terminal commands, and browse the web. It runs inside VS Code and uses the OpenAI or Anthropic API for its reasoning model. + +--- + +## Install + +Install the [Cline extension](https://marketplace.visualstudio.com/items?itemName=saoudrizwan.claude-dev) from the VS Code Marketplace. + +--- + +## Configure + +1. Open the Cline extension panel and click the **Settings** gear. +2. Set **API Provider** to `OpenAI Compatible`. +3. Fill in: + - **Base URL** → `http://localhost:3000/v1` + - **API Key** → `sk-rt-YOUR_PROJECT_TOKEN` + - **Model** → any model registered in your Routerly project (e.g. `gpt-5-mini`) +4. Set **Context Window** to at least `32000` — agentic tasks require a large context. + +To use Anthropic via Routerly: + +1. Set **API Provider** to `Anthropic`. +2. Set **Base URL** to `http://localhost:3000` (no `/v1`). +3. Set **API Key** to `sk-rt-YOUR_PROJECT_TOKEN`. +4. Pick a model (e.g. `claude-haiku-4-5`). + +:::note +Agentic tasks consume many tokens per step. Set a [budget limit](../concepts/budgets-and-limits) on your project token to cap spending automatically. +::: + +--- + +## Usage + +Open the Cline panel and describe the task. Cline will plan, write code, and execute steps autonomously. Every LLM call is routed through Routerly — costs and traces are visible in the dashboard. 
diff --git a/website/versioned_docs/version-0.1.5/integrations/continue.md b/website/versioned_docs/version-0.1.5/integrations/continue.md new file mode 100644 index 0000000..30b4379 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/integrations/continue.md @@ -0,0 +1,62 @@ +--- +title: Continue.dev +sidebar_label: Continue.dev +--- + +# Continue.dev + +[Continue](https://continue.dev) is an open-source AI coding assistant for VS Code and JetBrains. It supports any OpenAI-compatible backend and has first-class support for custom endpoints. + +--- + +## Install + +Install the [Continue extension](https://marketplace.visualstudio.com/items?itemName=Continue.continue) from the VS Code Marketplace, or the [JetBrains plugin](https://plugins.jetbrains.com/plugin/22707-continue) from the JetBrains Marketplace. + +--- + +## Configure + +Open `~/.continue/config.json` and add Routerly as a model provider: + +```json +{ + "models": [ + { + "title": "Routerly", + "provider": "openai", + "model": "gpt-5-mini", + "apiBase": "http://localhost:3000/v1", + "apiKey": "sk-rt-YOUR_PROJECT_TOKEN" + } + ] +} +``` + +To use the Anthropic Messages API instead: + +```json +{ + "models": [ + { + "title": "Routerly (Anthropic)", + "provider": "anthropic", + "model": "claude-haiku-4-5", + "apiBase": "http://localhost:3000", + "apiKey": "sk-rt-YOUR_PROJECT_TOKEN" + } + ] +} +``` + +Save the file — Continue reloads configuration automatically. + +:::tip +You can add multiple entries pointing at Routerly with different model names. Continue will show them as separate options in its model picker, while Routerly routes them all through the same policy engine. +::: + +--- + +## Usage + +Click the Continue icon in the sidebar, select your Routerly model from the picker, and use Chat, Edit, or Autocomplete as normal. Every request is routed through Routerly's engine. diff --git a/website/versioned_docs/version-0.1.5/integrations/cursor.md b/website/versioned_docs/version-0.1.5/integrations/cursor.md new file mode 100644 index 0000000..de9767e --- /dev/null +++ b/website/versioned_docs/version-0.1.5/integrations/cursor.md @@ -0,0 +1,33 @@ +--- +title: Cursor +sidebar_label: Cursor +--- + +# Cursor + +[Cursor](https://cursor.com) is an AI-first code editor built on VS Code. Its AI features — inline completions, Composer, and Chat — all use the OpenAI API under the hood and can be redirected to Routerly. + +--- + +## Install + +Download Cursor from [cursor.com](https://cursor.com). + +--- + +## Configure + +1. Open Cursor → **Settings** → **Cursor Settings** → **Models**. +2. Scroll to **OpenAI API Key** and enter: `sk-rt-YOUR_PROJECT_TOKEN` +3. Enable **Override OpenAI Base URL** and set it to: `http://localhost:3000/v1` +4. Click **Verify** to confirm the connection. + +:::note +Cursor sends model names from its own list. Because Routerly's router ignores the model field and selects the best candidate from your routing policy, this works correctly even if the model name does not match any registered model exactly. To pin a specific model, register it in your project and set the routing policy to `pinned`. +::: + +--- + +## Usage + +Use Cursor as normal. All AI requests — Tab completions, Chat, Composer — are proxied through Routerly. Cost data and routing traces appear in **Usage** in the Routerly dashboard. 
diff --git a/website/versioned_docs/version-0.1.5/integrations/haystack.md b/website/versioned_docs/version-0.1.5/integrations/haystack.md new file mode 100644 index 0000000..f51fc88 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/integrations/haystack.md @@ -0,0 +1,73 @@ +--- +title: Haystack +sidebar_label: Haystack +--- + +# Haystack + +[Haystack](https://haystack.deepset.ai) by deepset is an open-source NLP framework for building production-ready RAG and Question Answering pipelines. Its `OpenAIChatGenerator` and `OpenAITextEmbedder` components accept a custom endpoint, making them fully compatible with Routerly. + +--- + +## Install + +```bash +pip install haystack-ai +``` + +--- + +## Configure + +Pass the Routerly endpoint when constructing any OpenAI-based component: + +```python +from haystack.components.generators.chat import OpenAIChatGenerator +from haystack.dataclasses import ChatMessage +from haystack.utils import Secret + +generator = OpenAIChatGenerator( + model="gpt-5-mini", + api_base_url="http://localhost:3000/v1", + api_key=Secret.from_token("sk-rt-YOUR_PROJECT_TOKEN"), +) +``` + +For embeddings: + +```python +from haystack.components.embedders import OpenAITextEmbedder + +embedder = OpenAITextEmbedder( + model="text-embedding-3-small", + api_base_url="http://localhost:3000/v1", + api_key=Secret.from_token("sk-rt-YOUR_PROJECT_TOKEN"), +) +``` + +--- + +## Usage + +Use the components in a Haystack `Pipeline` as usual: + +```python +from haystack import Pipeline + +pipeline = Pipeline() +pipeline.add_component("generator", generator) + +result = pipeline.run({ + "generator": { + "messages": [ChatMessage.from_user("Summarise the French Revolution in two sentences.")] + } +}) + +print(result["generator"]["replies"][0].content) +``` + +All inference calls in the pipeline are routed through Routerly's engine. + +:::tip +If your pipeline uses both a generator and an embedder, you can register both models in the same Routerly project and control costs and limits centrally. +::: diff --git a/website/versioned_docs/version-0.1.5/integrations/jupyter.md b/website/versioned_docs/version-0.1.5/integrations/jupyter.md new file mode 100644 index 0000000..e107259 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/integrations/jupyter.md @@ -0,0 +1,85 @@ +--- +title: Jupyter +sidebar_label: Jupyter +--- + +# Jupyter + +[Jupyter](https://jupyter.org) notebooks are the standard environment for Python-based data science and AI experimentation. Because Routerly speaks the OpenAI API, you can use the `openai` Python package inside any notebook without extra dependencies. 
+ +--- + +## Install + +Install the OpenAI SDK in your notebook environment: + +```python +%pip install openai +``` + +--- + +## Configure + +Create the client once at the top of your notebook: + +```python +from openai import OpenAI + +client = OpenAI( + base_url="http://localhost:3000/v1", + api_key="sk-rt-YOUR_PROJECT_TOKEN", +) +``` + +--- + +## Usage + +### Chat completion + +```python +response = client.chat.completions.create( + model="gpt-5-mini", + messages=[{"role": "user", "content": "Explain gradient descent in two sentences."}], +) +print(response.choices[0].message.content) +``` + +### Streaming + +```python +stream = client.chat.completions.create( + model="gpt-5-mini", + messages=[{"role": "user", "content": "Write a haiku about neural networks."}], + stream=True, +) +for chunk in stream: + print(chunk.choices[0].delta.content or "", end="", flush=True) +``` + +### Using Anthropic SDK + +```python +%pip install anthropic +``` + +```python +import anthropic + +client = anthropic.Anthropic( + base_url="http://localhost:3000", + api_key="sk-rt-YOUR_PROJECT_TOKEN", +) + +message = client.messages.create( + model="claude-haiku-4-5", + max_tokens=256, + messages=[{"role": "user", "content": "What is backpropagation?"}], +) +print(message.content[0].text) +``` + +:::tip +Store your project token in a Jupyter environment variable rather than hardcoding it. Add `export ROUTERLY_TOKEN=sk-rt-…` to your shell profile and read it with `os.environ["ROUTERLY_TOKEN"]`. +::: diff --git a/website/versioned_docs/version-0.1.5/integrations/langchain.md b/website/versioned_docs/version-0.1.5/integrations/langchain.md new file mode 100644 index 0000000..c664da4 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/integrations/langchain.md @@ -0,0 +1,75 @@ +--- +title: LangChain +sidebar_label: LangChain +--- + +# LangChain + +[LangChain](https://langchain.com) is a framework for building LLM-powered applications — chains, agents, RAG pipelines, and more. Both the Python and JavaScript versions use the OpenAI or Anthropic client under the hood, so they work with Routerly without any framework-specific changes. + +--- + +## Install + +```bash +# Python +pip install langchain langchain-openai langchain-anthropic + +# JavaScript / TypeScript +npm install langchain @langchain/openai +``` + +--- + +## Configure + +Pass the Routerly base URL and project token when initialising the ChatOpenAI model: + +```python title="Python" +from langchain_openai import ChatOpenAI + +llm = ChatOpenAI( + model="gpt-5-mini", + base_url="http://localhost:3000/v1", + api_key="sk-rt-YOUR_PROJECT_TOKEN", +) +``` + +```typescript title="JavaScript / TypeScript" +import { ChatOpenAI } from "@langchain/openai"; + +const llm = new ChatOpenAI({ + model: "gpt-5-mini", + configuration: { + baseURL: "http://localhost:3000/v1", + apiKey: "sk-rt-YOUR_PROJECT_TOKEN", + }, +}); +``` + +To use Anthropic models via the Anthropic SDK + LangChain: + +```python title="Python (Anthropic)" +from langchain_anthropic import ChatAnthropic + +llm = ChatAnthropic( + model="claude-haiku-4-5", + anthropic_api_url="http://localhost:3000", + api_key="sk-rt-YOUR_PROJECT_TOKEN", +) +``` + +--- + +## Usage + +Use `llm` in any LangChain chain, agent, or LCEL expression as you normally would: + +```python +from langchain_core.messages import HumanMessage + +response = llm.invoke([HumanMessage(content="Explain LangChain in one sentence.")]) +print(response.content) +``` + +Every call goes through Routerly. 
Retries, failover, and cost tracking are handled transparently. diff --git a/website/versioned_docs/version-0.1.5/integrations/librechat.md b/website/versioned_docs/version-0.1.5/integrations/librechat.md new file mode 100644 index 0000000..73bb50f --- /dev/null +++ b/website/versioned_docs/version-0.1.5/integrations/librechat.md @@ -0,0 +1,74 @@ +--- +title: LibreChat +sidebar_position: 10 +--- + +# LibreChat + +[LibreChat](https://www.librechat.ai/) is a self-hosted, open-source chat interface that supports custom OpenAI-compatible endpoints. Routerly integrates as a named endpoint so all your registered models appear automatically inside LibreChat. + +--- + +## Setup + +### 1. Configure the endpoint + +In your LibreChat `librechat.yaml` configuration file, add a custom endpoint entry: + +```yaml +endpoints: + custom: + - name: "Routerly" + apiKey: "sk-rt-YOUR_PROJECT_TOKEN" + baseURL: "http://localhost:3000/v1" + models: + default: ["gpt-5-mini"] + fetch: false + titleConvo: true + titleModel: "gpt-5-mini" + summarize: false + summaryModel: "gpt-5-mini" + forcePrompt: false + dropParams: [] +``` + +Replace `sk-rt-YOUR_PROJECT_TOKEN` with a valid project token from your Routerly dashboard (**Projects → your project → Tokens**). + +### 2. Restart LibreChat + +```bash +docker compose restart +# or +npm run start +``` + +**Routerly** will appear as a selectable endpoint in the LibreChat UI. Any model listed under `models.default` (or dynamically fetched if `fetch: true`) will be available to users. + +--- + +## Fetching models dynamically + +Set `fetch: true` to let LibreChat query Routerly's model list at startup: + +```yaml +models: + fetch: true +``` + +Routerly returns the models registered in the project associated with the token. Users will see exactly the models you have configured in that project. + +--- + +## Tips + +- **Multiple projects**: Add one `custom` entry per Routerly project token, each with a distinct `name`. +- **Routing transparency**: Each request goes through Routerly's full routing stack — routing policies, budget enforcement, and cost tracking all apply. +- **API compatibility**: LibreChat uses the OpenAI `/v1/chat/completions` format. Anthropic-format models registered in Routerly are also available because Routerly normalises all requests internally. + +--- + +## Related + +- [Open WebUI](./open-webui) — another self-hosted chat UI that works the same way +- [API — LLM Proxy](../api/llm-proxy) — full endpoint reference +- [Concepts — Projects](../concepts/projects) — how project tokens work diff --git a/website/versioned_docs/version-0.1.5/integrations/llamaindex.md b/website/versioned_docs/version-0.1.5/integrations/llamaindex.md new file mode 100644 index 0000000..4384fed --- /dev/null +++ b/website/versioned_docs/version-0.1.5/integrations/llamaindex.md @@ -0,0 +1,77 @@ +--- +title: LlamaIndex +sidebar_label: LlamaIndex +--- + +# LlamaIndex + +[LlamaIndex](https://llamaindex.ai) is a data framework for building LLM applications over custom data — RAG, document parsing, agent workflows, and structured data extraction. It supports OpenAI-compatible endpoints natively. 
+ +--- + +## Install + +```bash +# Python +pip install llama-index llama-index-llms-openai + +# TypeScript +npm install llamaindex +``` + +--- + +## Configure + +```python title="Python" +from llama_index.llms.openai import OpenAI + +llm = OpenAI( + model="gpt-5-mini", + api_base="http://localhost:3000/v1", + api_key="sk-rt-YOUR_PROJECT_TOKEN", +) +``` + +To set Routerly as the global default so every index and query engine uses it automatically: + +```python +from llama_index.core import Settings + +Settings.llm = OpenAI( + model="gpt-5-mini", + api_base="http://localhost:3000/v1", + api_key="sk-rt-YOUR_PROJECT_TOKEN", +) +``` + +```typescript title="TypeScript" +import { OpenAI, Settings } from "llamaindex"; + +Settings.llm = new OpenAI({ + model: "gpt-5-mini", + additionalSessionOptions: { + baseURL: "http://localhost:3000/v1", + apiKey: "sk-rt-YOUR_PROJECT_TOKEN", + }, +}); +``` + +--- + +## Usage + +Build your index and query normally: + +```python +from llama_index.core import VectorStoreIndex, SimpleDirectoryReader + +documents = SimpleDirectoryReader("./data").load_data() +index = VectorStoreIndex.from_documents(documents) +query_engine = index.as_query_engine() + +response = query_engine.query("What is the main topic of these documents?") +print(response) +``` + +All LLM calls from LlamaIndex — retrieval, synthesis, re-ranking — flow through Routerly. diff --git a/website/versioned_docs/version-0.1.5/integrations/make.md b/website/versioned_docs/version-0.1.5/integrations/make.md new file mode 100644 index 0000000..b7198ca --- /dev/null +++ b/website/versioned_docs/version-0.1.5/integrations/make.md @@ -0,0 +1,48 @@ +--- +title: Make +sidebar_label: Make +--- + +# Make + +[Make](https://make.com) (formerly Integromat) is a visual workflow automation platform. Use its **HTTP module** to call Routerly from any scenario. + +--- + +## Configure + +Make does not have a native Routerly module, but its **HTTP → Make a request** action handles it cleanly. + +1. Add a **HTTP → Make a request** module to your scenario. +2. Configure it: + +| Field | Value | +|-------|-------| +| **URL** | `http://localhost:3000/v1/chat/completions` | +| **Method** | `POST` | +| **Headers** | `Authorization: Bearer sk-rt-YOUR_PROJECT_TOKEN` / `Content-Type: application/json` | +| **Body type** | Raw | +| **Content type** | JSON (application/json) | +| **Request content** | See below | + +```json +{ + "model": "gpt-5-mini", + "messages": [ + { "role": "user", "content": "{{1.text}}" } + ] +} +``` + +3. Map `{{1.text}}` (or any other variable) to the prompt you want to send. +4. Parse the response: the reply is at `choices[0].message.content`. + +:::note +Replace `localhost:3000` with your production Routerly URL if Make is connecting over the internet. Make sure the endpoint is accessible from Make's cloud infrastructure. +::: + +--- + +## Usage + +Run the scenario. The response body contains a standard OpenAI `ChatCompletion` object. Use the Make **JSON → Parse JSON** module to extract the reply text. diff --git a/website/versioned_docs/version-0.1.5/integrations/marimo.md b/website/versioned_docs/version-0.1.5/integrations/marimo.md new file mode 100644 index 0000000..d555343 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/integrations/marimo.md @@ -0,0 +1,63 @@ +--- +title: marimo +sidebar_label: marimo +--- + +# marimo + +[marimo](https://marimo.io) is a reactive Python notebook where cells re-run automatically when their inputs change. 
It has a built-in AI cell assistant that calls an OpenAI-compatible endpoint — point it at Routerly to route those calls through your policies. + +--- + +## Install + +```bash +pip install marimo +``` + +--- + +## Configure + +### AI cell assistant + +marimo's built-in AI features read from environment variables. Set them before launching: + +```bash +export OPENAI_API_KEY="sk-rt-YOUR_PROJECT_TOKEN" +export OPENAI_BASE_URL="http://localhost:3000/v1" +marimo edit notebook.py +``` + +### SDK calls inside cells + +You can also use the OpenAI or Anthropic SDK directly inside marimo cells: + +```python +import marimo as mo +from openai import OpenAI + +client = OpenAI( + base_url="http://localhost:3000/v1", + api_key="sk-rt-YOUR_PROJECT_TOKEN", +) + +prompt = mo.ui.text(placeholder="Ask something…") +prompt +``` + +```python +# Runs reactively whenever `prompt` changes +if prompt.value: + response = client.chat.completions.create( + model="gpt-5-mini", + messages=[{"role": "user", "content": prompt.value}], + ) + mo.md(response.choices[0].message.content) +``` + +--- + +## Usage + +Launch your notebook with `marimo edit notebook.py`. The AI cell assistant button and any direct SDK calls will route through Routerly. Usage data appears in the Routerly dashboard under **Usage**. diff --git a/website/versioned_docs/version-0.1.5/integrations/n8n.md b/website/versioned_docs/version-0.1.5/integrations/n8n.md new file mode 100644 index 0000000..e3af2fd --- /dev/null +++ b/website/versioned_docs/version-0.1.5/integrations/n8n.md @@ -0,0 +1,57 @@ +--- +title: n8n +sidebar_label: n8n +--- + +# n8n + +[n8n](https://n8n.io) is a self-hostable workflow automation platform. Its **OpenAI node** and generic **HTTP Request node** both connect to Routerly out of the box. + +--- + +## Install + +Run n8n via Docker: + +```bash +docker run -it --rm -p 5678:5678 n8nio/n8n +``` + +Or follow the [full n8n installation guide](https://docs.n8n.io/hosting/). + +--- + +## Configure + +### Option A — OpenAI node (recommended) + +1. In n8n, go to **Credentials** → **New** → **OpenAI API**. +2. Set: + - **API Key** → `sk-rt-YOUR_PROJECT_TOKEN` + - **Base URL** → `http://localhost:3000/v1` +3. Save the credential with a recognisable name (e.g. *Routerly*). +4. Add an **OpenAI** node to your workflow and select the new credential. + +### Option B — HTTP Request node + +For full control over the request body: + +1. Add an **HTTP Request** node. +2. Set **Method** to `POST` and **URL** to `http://localhost:3000/v1/chat/completions`. +3. Add a header: `Authorization: Bearer sk-rt-YOUR_PROJECT_TOKEN`. +4. Use **JSON Body**: + +```json +{ + "model": "gpt-5-mini", + "messages": [ + { "role": "user", "content": "{{ $json.prompt }}" } + ] +} +``` + +--- + +## Usage + +Connect the Routerly credential to any OpenAI node in your workflows. Cost data for each call is visible in the Routerly dashboard under **Usage**. diff --git a/website/versioned_docs/version-0.1.5/integrations/open-webui.md b/website/versioned_docs/version-0.1.5/integrations/open-webui.md new file mode 100644 index 0000000..c203a17 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/integrations/open-webui.md @@ -0,0 +1,39 @@ +--- +title: Open WebUI +sidebar_label: Open WebUI +--- + +# Open WebUI + +[Open WebUI](https://openwebui.com) is a self-hosted chat interface that supports multiple OpenAI-compatible backends. Connect it to Routerly to route every conversation through your configured models and policies. 
+ +--- + +## Install + +Follow the [Open WebUI installation guide](https://docs.openwebui.com/getting-started/) — Docker is the quickest path: + +```bash +docker run -d -p 3001:8080 --name open-webui ghcr.io/open-webui/open-webui:main +``` + +--- + +## Configure + +1. Open Open WebUI → **Admin Panel** → **Settings** → **Connections**. +2. Under **OpenAI API**, set: + - **API Base URL** → `http://localhost:3000/v1` + - **API Key** → `sk-rt-YOUR_PROJECT_TOKEN` +3. Click **Save**. +4. Open **Admin Panel** → **Settings** → **Models** and click the sync button to import your Routerly model list. + +:::note +If Routerly is running in Docker alongside Open WebUI, use the container name or host IP instead of `localhost` — for example `http://routerly:3000/v1`. +::: + +--- + +## Usage + +Start a new chat. Select any model from the model picker — the list is fetched live from Routerly's `/v1/models` endpoint. All requests pass through Routerly's routing engine, so budget enforcement and cost tracking apply automatically. diff --git a/website/versioned_docs/version-0.1.5/integrations/openclaw.md b/website/versioned_docs/version-0.1.5/integrations/openclaw.md new file mode 100644 index 0000000..c785a42 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/integrations/openclaw.md @@ -0,0 +1,56 @@ +--- +title: OpenClaw +sidebar_label: OpenClaw +--- + +# OpenClaw + +[OpenClaw](https://openclaw.ai) is a personal AI agent that runs on your machine and lets you interact with it via Telegram, WhatsApp, Discord, iMessage, and other channels. It supports Custom Provider (OpenAI-compatible) endpoints, so you can point it at Routerly during setup or at any time afterwards. + +--- + +## Install + +```bash +curl -fsSL https://openclaw.ai/install.sh | bash +``` + +The script detects your OS, installs Node.js if needed, and launches the onboarding wizard. + +--- + +## Configure + +### During onboarding + +When `openclaw onboard` asks you to choose a model provider, select **Custom Provider (OpenAI-compatible)** and enter: + +- **Base URL** → `http://localhost:3000/v1` +- **API Key** → `sk-rt-YOUR_PROJECT_TOKEN` +- **Default model** → any model configured in your Routerly project (e.g. `gpt-4o`, `claude-opus-4-5`) + +### After onboarding + +To switch an existing installation to Routerly, re-run the model configuration step: + +```bash +openclaw configure --section model +``` + +Follow the same prompts: choose **Custom Provider (OpenAI-compatible)** and enter the Routerly base URL and project token above. + +--- + +## Usage + +Once the Gateway is running (`openclaw gateway status`), every message you send through your configured channel (Telegram, WhatsApp, etc.) goes through Routerly's routing engine. Cost tracking and budget enforcement apply automatically. + +Open the dashboard to verify: + +```bash +openclaw dashboard +``` + +:::tip +If OpenClaw and Routerly are running on the same machine, use `http://localhost:3000/v1` as the base URL. If Routerly is on a different host or in Docker, replace `localhost` with the appropriate address. +::: diff --git a/website/versioned_docs/version-0.1.5/integrations/overview.md b/website/versioned_docs/version-0.1.5/integrations/overview.md new file mode 100644 index 0000000..fd1341f --- /dev/null +++ b/website/versioned_docs/version-0.1.5/integrations/overview.md @@ -0,0 +1,95 @@ +--- +title: Integrations +sidebar_label: Overview +sidebar_position: 1 +--- + +# Integrations + +Routerly is compatible with any tool that speaks the OpenAI or Anthropic API. 
Point the tool at your Routerly instance, set your project token as the API key, and everything works — routing, budget enforcement, and cost tracking included. + +--- + +## Chat UI + +Chat interfaces that let you talk to your models directly. + +| Tool | Description | +|------|-------------| +| [Open WebUI](./open-webui) | Full-featured chat UI with multi-model support | +| [OpenClaw](./openclaw) | Personal AI agent with Telegram, WhatsApp, Discord support | +| [LibreChat](./librechat) | Self-hosted open-source chat interface with multi-endpoint support | + +--- + +## IDE & Editor + +Native integrations for popular development environments. + +| Tool | Description | +|------|-------------| +| [Cursor](./cursor) | AI-first code editor with inline completions and chat | +| [Continue.dev](./continue) | Open-source AI code assistant for VS Code and JetBrains | +| [Cline](./cline) | Autonomous coding agent with file and terminal access | +| [VS Code](./vscode) | Custom-endpoint AI extensions for VS Code (GitHub Copilot itself cannot be rerouted) | + +--- + +## Frameworks + +AI application frameworks that call the OpenAI or Anthropic API. + +| Tool | Description | +|------|-------------| +| [LangChain](./langchain) | Composable LLM application framework | +| [LlamaIndex](./llamaindex) | Data framework for LLM-powered applications | +| [Haystack](./haystack) | NLP and RAG pipeline framework | + +--- + +## Automation + +Workflow automation platforms with built-in AI nodes. + +| Tool | Description | +|------|-------------| +| [n8n](./n8n) | Self-hostable workflow automation with an HTTP/AI node | +| [Make](./make) | Visual workflow automation with HTTP actions | + +--- + +## Notebooks + +Interactive computing environments. + +| Tool | Description | +|------|-------------| +| [Jupyter](./jupyter) | Classic interactive notebook via the OpenAI Python SDK | +| [marimo](./marimo) | Reactive Python notebook with built-in AI cell support | + +--- + +## How it works + +Every integration follows the same two-step pattern: + +1. **Set the base URL** to your Routerly instance — `http://localhost:3000/v1` for local, or your production URL. +2. **Set the API key** to your project token — `sk-rt-YOUR_PROJECT_TOKEN`. + +Routerly looks like OpenAI or Anthropic to any client. No SDK patches, no plugins. + +:::tip +If a tool asks for an **OpenAI API key** or **base URL**, those are the two fields to change. +For tools with an **Anthropic** mode, the base URL is `http://localhost:3000` (without `/v1`). +::: + +--- + +## Not seeing your tool? + +If a tool exposes a configurable OpenAI base URL, it will work with Routerly. Check the tool's documentation for terms like: + +- *custom base URL* +- *API endpoint* +- *OpenAI-compatible* +- *self-hosted* diff --git a/website/versioned_docs/version-0.1.5/integrations/vscode.md b/website/versioned_docs/version-0.1.5/integrations/vscode.md new file mode 100644 index 0000000..92cb9d3 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/integrations/vscode.md @@ -0,0 +1,37 @@ +--- +title: VS Code +sidebar_label: VS Code +--- + +# VS Code + +[VS Code](https://code.visualstudio.com) offers two main paths to bring LLM features into the editor: **GitHub Copilot** (built-in, OAuth-only) and **third-party AI extensions** such as Continue and Cline that support custom OpenAI-compatible endpoints. + +--- + +## GitHub Copilot + +GitHub Copilot authenticates exclusively via **GitHub OAuth** — it sends a GitHub-issued token to the upstream server, not an `sk-rt-*` project token.
Routerly expects a Bearer project token and returns **401** for any other credential, so routing Copilot through Routerly is not possible. The authentication schemes are fundamentally incompatible. + +:::info Recommended alternative +Use [Continue](./continue) or [Cline](./cline) in VS Code. Both extensions support a custom OpenAI-compatible base URL and authenticate with a project token exactly as Routerly expects. +::: + +--- + +## Custom-endpoint extensions + +Extensions that use the OpenAI API and accept a custom base URL work with Routerly without modification. Two popular choices: + +| Extension | Install | Guide | +|-----------|---------|-------| +| Continue | [Marketplace](https://marketplace.visualstudio.com/items?itemName=Continue.continue) | [Continue → Routerly](./continue) | +| Cline | [Marketplace](https://marketplace.visualstudio.com/items?itemName=saoudrizwan.claude-dev) | [Cline → Routerly](./cline) | + +Both follow the same pattern: set the **base URL** to `http://localhost:3000/v1` and the **API key** to your project token. + +--- + +## Language Model API (vscode.lm) + +VS Code extensions that use the built-in [Language Model API](https://code.visualstudio.com/api/extension-guides/language-model) route requests through the Copilot backend and are therefore subject to the same OAuth limitation described above. Custom-endpoint extensions such as Continue and Cline bypass this API and call Routerly directly. diff --git a/website/versioned_docs/version-0.1.5/intro.md b/website/versioned_docs/version-0.1.5/intro.md new file mode 100644 index 0000000..b28852e --- /dev/null +++ b/website/versioned_docs/version-0.1.5/intro.md @@ -0,0 +1,126 @@ +--- +slug: / +title: Introduction +sidebar_position: 1 +--- + +# Routerly + +**One gateway. Any AI model. Total control.** + +Routerly is a self-hosted LLM API gateway with intelligent routing, cost tracking, and budget enforcement. It sits between your application and your LLM providers — OpenAI, Anthropic, Google Gemini, Ollama, and more — and decides which model to use for every request based on configurable policies. + +**Compatible with OpenAI and Anthropic SDKs out of the box.** Change the base URL in your client, nothing else. + +--- + +## Key Features + +| Feature | Description | +|---------|-------------| +| **Intelligent routing** | 9 configurable policies score every request in parallel — cheapest, fastest, healthiest, most capable, or LLM-powered | +| **Budget enforcement** | Hard limits per model, project, and token.
Requests are blocked before being sent to providers | +| **Full cost tracking** | Every API call logged with tokens, USD cost, latency, and routing trace | +| **Multi-project isolation** | Each project has its own API token, model list, routing config, and budget envelope | +| **OpenAI + Anthropic compatible** | Drop-in replacement for both APIs — `/v1/chat/completions` and `/v1/messages` | +| **Zero infrastructure** | No database, no Redis, no PostgreSQL — config lives in JSON files | +| **Web dashboard** | Register models, configure projects, monitor usage, manage users | +| **Admin CLI** | Full management from the terminal with `routerly` commands | +| **RBAC** | Role-based access control with 7 granular permissions and custom roles | +| **Notifications** | Email and webhook alerts via SMTP, SES, SendGrid, Azure, Google, or custom webhook | + +--- + +## How It Works + +Point your app — or any AI tool like Cursor, Open WebUI, OpenClaw, or LangChain — at `localhost:3000` instead of the provider's URL. From that moment, Routerly takes over: it picks the best available model for each request, enforces your budget, tracks every token spent, and automatically reroutes if a provider fails. + +``` +Any Client Routerly Providers +────────── ──────── ───────── +Your App ──▶ Authenticate OpenAI +Cursor Policy scoring ──▶ Anthropic +Open WebUI POST /v1/ Select model Gemini +OpenClaw chat/ Forward request Ollama +LangChain completions Track cost ◀── ... + ◀── Return response +``` + +1. Your app sends a request to Routerly using a **project token** as the Bearer header +2. Routerly authenticates the token and resolves the project +3. Enabled routing policies run in parallel and score each candidate model +4. The highest-scoring model within budget receives the request +5. Response is forwarded back; tokens, cost, and latency are recorded +6. On error or timeout, the next candidate is tried automatically + +--- + +## Use Cases + +| Scenario | How Routerly helps | +|----------|--------------------| +| **SaaS & multi-tenant** | One project per tenant, hard spend caps, automatic cheapest-model routing | +| **Local-first development** | Develop against Ollama locally, promote to GPT-4o in production — same API, same code | +| **Automatic cost reduction** | Route simple tasks to cheap models, complex ones to capable models | +| **Resilience & failover** | Register the same capability across OpenAI, Claude, and Gemini — Routerly detects failures and reroutes in real time | +| **AI tools & chat UIs** | Cursor, Open WebUI, OpenClaw, LibreChat — any tool that speaks OpenAI or Anthropic format | +| **Model evaluation / A/B testing** | Split traffic with the fairness policy; compare cost, latency, and error rate in the dashboard | + +--- + +## Quick Start + +**macOS / Linux:** + +```bash +curl -fsSL https://www.routerly.ai/install.sh | bash +``` + +**Windows (PowerShell):** + +```powershell +powershell -c "irm https://www.routerly.ai/install.ps1 | iex" +``` + +Then register a model, create a project, and start: + +```bash +routerly model add --id gpt-5-mini --provider openai --api-key sk-YOUR_KEY +routerly project add --name "My App" --slug my-app --models gpt-5-mini +routerly start +``` + +Open `http://localhost:3000/dashboard` to manage everything from the web UI. + +→ See the [Installation guide](./getting-started/installation.md) and [Quick Start tutorial](./getting-started/quick-start.md) for full details. 
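+ +Before moving on, you can sanity-check the whole pipeline with a single request (a sketch; substitute a project token created under **Projects → Tokens**): + +```bash +# Send a test completion through the gateway; the routing engine picks the actual model +curl http://localhost:3000/v1/chat/completions \ + -H "Authorization: Bearer sk-rt-YOUR_PROJECT_TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ "model": "gpt-5-mini", "messages": [{ "role": "user", "content": "Hello!" }] }' +``` + +The response uses the standard OpenAI envelope, and the call shows up immediately in the dashboard under **Usage**.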
+ +--- + +## Comparison + +> Routerly is the only gateway that combines self-hosting, native Anthropic support, LLM-powered routing, and zero external dependencies. + +| | **Routerly** | **LiteLLM** | **OpenRouter** | +|---|:---:|:---:|:---:| +| Self-hosted | ✅ | ✅ | ❌ cloud-only | +| OpenAI-compatible API | ✅ | ✅ | ✅ | +| Native Anthropic API format | ✅ | ❌ | ❌ | +| Local model support (Ollama) | ✅ | ✅ | ❌ | +| LLM-powered smart routing | ✅ | ❌ | ❌ | +| Deterministic routing policies | ✅ | ⚠️ limited | ❌ | +| Budget enforcement | ✅ | ✅ | ✅ | +| Database required | **None** | SQLite / PostgreSQL | N/A | +| External infrastructure | **None** | Optional Redis | N/A | +| Web dashboard | ✅ built-in | ✅ | ✅ | +| Admin CLI | ✅ | ✅ | ❌ | +| Data privacy | ✅ stays local | ✅ | ❌ transits cloud | + +--- + +## Next Steps + +- [Install Routerly](./getting-started/installation.md) +- [Run through the Quick Start](./getting-started/quick-start.md) +- [Understand routing policies](./concepts/routing.md) +- [Configure budgets and limits](./concepts/budgets-and-limits.md) +- [Explore the API reference](./api/overview.md) diff --git a/website/versioned_docs/version-0.1.5/logo.svg b/website/versioned_docs/version-0.1.5/logo.svg new file mode 100644 index 0000000..5c31eb1 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/logo.svg diff --git a/website/versioned_docs/version-0.1.5/reference/config-files.md b/website/versioned_docs/version-0.1.5/reference/config-files.md new file mode 100644 index 0000000..f6b077e --- /dev/null +++ b/website/versioned_docs/version-0.1.5/reference/config-files.md @@ -0,0 +1,289 @@ +--- +title: Config Files +sidebar_position: 1 +--- + +# Config Files + +Routerly stores all configuration as JSON files. The exact location depends on the **installation scope** chosen at install time. Files are written atomically and are human-readable. + +The service reads the root directory from the `ROUTERLY_HOME` environment variable (set automatically by the installer in the daemon unit). If the variable is not set, it falls back to `~/.routerly/`. + +## Directory Layout + +### User scope (default) + +Everything lives under the installing user's home directory. + +``` +~/.routerly/ +├── app/ # Service binary (managed by installer) +├── config/ +│ ├── settings.json # Global settings +│ ├── models.json # Registered LLM models +│ ├── projects.json # Projects, routing, budgets, tokens +│ ├── users.json # User accounts +│ ├── roles.json # Custom roles and permissions +│ └── secret # JWT signing key (mode 0600, keep safe) +└── data/ + └── usage.json # Usage records +``` + +### System scope + +
+ +| Platform | Service config & data directory | +|----------|---------------------------------| +| Linux | `/var/lib/routerly/` | +| macOS | `/Library/Application Support/Routerly/` | +| Windows | `C:\ProgramData\Routerly\` | + +``` +/var/lib/routerly/ # (Linux example; see table above for other platforms) +├── config/ +│ ├── settings.json +│ ├── models.json +│ ├── projects.json +│ ├── users.json +│ ├── roles.json +│ └── secret # JWT signing key (mode 0600) +└── data/ + └── usage.json +``` + +### CLI auth tokens (always per-user) + +Regardless of install scope, each user's CLI credentials are stored in their own home directory, never in the system directory: + +``` +~/.routerly/ +└── cli/ + └── config.json # Saved accounts, JWT tokens, refresh tokens (mode 0600) +``` + +--- + +## settings.json + +Global service configuration. + +```json +{ + "port": 3000, + "host": "0.0.0.0", + "dashboardEnabled": true, + "defaultTimeoutMs": 30000, + "logLevel": "info", + "publicUrl": "http://localhost:3000", + "notifications": [] +} +``` + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `port` | `number` | `3000` | TCP port the service listens on | +| `host` | `string` | `"0.0.0.0"` | Bind address. Use `127.0.0.1` behind a reverse proxy | +| `dashboardEnabled` | `boolean` | `true` | Enable or disable the web dashboard | +| `defaultTimeoutMs` | `number` | `30000` | Default provider request timeout in milliseconds | +| `logLevel` | `string` | `"info"` | Log verbosity: `"error"`, `"warn"`, `"info"`, `"debug"` | +| `publicUrl` | `string` | `"http://localhost:3000"` | Externally reachable URL, used for notification links | +| `notifications` | `array` | `[]` | Notification channel configurations — see [Notifications](../concepts/notifications.md) | + +--- + +## models.json + +Array of registered LLM model configurations. + +```json +[ + { + "id": "gpt-5-mini", + "provider": "openai", + "apiKey": "ENCRYPTED:...", + "inputPrice": 0.00015, + "outputPrice": 0.0006, + "cachePrice": 0.000075, + "contextWindow": 128000, + "capabilities": ["chat", "vision"], + "enabled": true + } +] +``` + +| Field | Type | Description | +|-------|------|-------------| +| `id` | `string` | Unique model identifier within Routerly | +| `provider` | `string` | Provider name: `openai`, `anthropic`, `gemini`, `mistral`, `cohere`, `xai`, `ollama`, `custom` | +| `apiKey` | `string` | API key — stored AES-256 encrypted with the value from `secret` | +| `inputPrice` | `number` | Cost per 1,000 input tokens in USD | +| `outputPrice` | `number` | Cost per 1,000 output tokens in USD | +| `cachePrice` | `number` | Cost per 1,000 cached/read tokens in USD (optional) | +| `contextWindow` | `number` | Maximum context window in tokens | +| `capabilities` | `string[]` | Supported capabilities: `"chat"`, `"vision"`, `"tools"`, `"json_mode"` | +| `pricingTiers` | `array` | Volume-based pricing tiers (optional) | +| `enabled` | `boolean` | Whether the model is available for routing | +| `baseUrl` | `string` | Custom base URL — required for `custom` provider, used for non-default Ollama hosts | + +:::caution +Never edit `apiKey` values manually. Use the dashboard or CLI to manage API keys; they are encrypted using the `secret` file. +::: + +--- + +## projects.json + +Array of project configurations including routing policies, budgets, tokens, and members. 
+ +```json +[ + { + "id": "proj_abc123", + "name": "My App", + "slug": "my-app", + "defaultTimeoutMs": 30000, + "policies": ["random"], + "models": ["gpt-5-mini", "claude-haiku-4-5"], + "tokens": [ + { + "id": "tok_xyz", + "token": "HASHED:...", + "description": "Production token", + "createdAt": "2024-01-15T10:00:00Z" + } + ], + "members": [ + { "userId": "usr_abc", "role": "admin" } + ], + "budgets": [ + { + "metric": "cost", + "limit": 10.00, + "windowType": "period", + "windowSize": "monthly", + "onExhausted": "block" + } + ] + } +] +``` + +### Project Fields + +| Field | Type | Description | +|-------|------|-------------| +| `id` | `string` | Internal project ID (`proj_…`) | +| `name` | `string` | Human-readable project name | +| `slug` | `string` | URL-safe identifier, used in scoped proxy path `/projects/{slug}/v1/*` | +| `defaultTimeoutMs` | `number` | Per-project request timeout override | +| `policies` | `string[]` | Routing policies in priority order | +| `models` | `string[]` | Model IDs assigned to the project | + +### Token Fields + +| Field | Type | Description | +|-------|------|-------------| +| `id` | `string` | Token ID (`tok_…`) | +| `token` | `string` | Token value, stored as a bcrypt hash — the plain `sk-rt-…` value is only shown once on creation | +| `description` | `string` | Optional label | +| `createdAt` | `string` | ISO 8601 creation timestamp | + +### Budget Fields + +| Field | Type | Description | +|-------|------|-------------| +| `metric` | `string` | `"cost"`, `"calls"`, `"input_tokens"`, `"output_tokens"`, `"total_tokens"` | +| `limit` | `number` | Maximum allowed value for the metric | +| `windowType` | `string` | `"period"` (fixed calendar window) or `"rolling"` (sliding window) | +| `windowSize` | `string` | For period: `"hourly"`, `"daily"`, `"weekly"`, `"monthly"`, `"yearly"`. For rolling: `"second"`, `"minute"`, `"hour"`, `"day"`, `"week"`, `"month"` | +| `onExhausted` | `string` | `"block"` — return HTTP 503 when budget is reached | + +--- + +## users.json + +Array of user accounts. + +```json +[ + { + "id": "usr_abc123", + "email": "admin@example.com", + "passwordHash": "$2b$10$...", + "role": "admin", + "createdAt": "2024-01-01T00:00:00Z" + } +] +``` + +| Field | Type | Description | +|-------|------|-------------| +| `id` | `string` | User ID (`usr_…`) | +| `email` | `string` | Login email | +| `passwordHash` | `string` | bcrypt hash of the password | +| `role` | `string` | Global role name: `"admin"`, `"member"`, `"viewer"`, or a custom role | +| `createdAt` | `string` | ISO 8601 creation timestamp | + +--- + +## roles.json + +Array of custom role definitions. The three built-in roles (`admin`, `member`, `viewer`) are not stored here and cannot be modified. + +```json +[ + { + "name": "billing-viewer", + "permissions": ["usage:read"] + } +] +``` + +| Field | Type | Description | +|-------|------|-------------| +| `name` | `string` | Unique role name | +| `permissions` | `string[]` | List of permission strings | + +Available permissions: `models:write`, `projects:write`, `users:write`, `roles:write`, `settings:write`, `usage:read`, `proxy:use`. + +--- + +## data/usage.json + +Array of usage records, one per LLM request. Written by the service after each completed call. 
+ +```json +[ + { + "id": "use_abc123", + "timestamp": "2024-01-15T10:30:00Z", + "projectId": "proj_abc", + "projectSlug": "my-app", + "modelId": "gpt-5-mini", + "provider": "openai", + "inputTokens": 150, + "outputTokens": 42, + "cacheTokens": 0, + "totalTokens": 192, + "cost": 0.000048, + "durationMs": 1234, + "status": "success", + "traceId": "trc_xyz" + } +] +``` + +This file grows continuously. Routerly does not currently rotate or archive it automatically — back it up and truncate as needed. + +--- + +## secret + +A single-line file containing the 32-byte AES-256 encryption key used to encrypt API keys in `models.json`. + +``` +a1b2c3d4e5f6... +``` + +**Never share or commit this file.** If lost, all stored API keys must be re-entered. diff --git a/website/versioned_docs/version-0.1.5/reference/environment-variables.md b/website/versioned_docs/version-0.1.5/reference/environment-variables.md new file mode 100644 index 0000000..74208a8 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/reference/environment-variables.md @@ -0,0 +1,48 @@ +--- +title: Environment Variables +sidebar_position: 2 +--- + +# Environment Variables + +Environment variables override corresponding settings from `settings.json` and are useful for container and CI/CD deployments. + +## Runtime Variables + +| Variable | Default | Description | +|----------|---------|-------------| +| `ROUTERLY_HOME` | `~/.routerly` | Root directory for the **service** config and data (set automatically by the installer in the daemon unit). CLI auth tokens are always stored in `~/.routerly/cli/` regardless of this value. | +| `ROUTERLY_PORT` | `3000` (or from `settings.json`) | TCP port the service listens on. Overrides `port` in settings | +| `ROUTERLY_HOST` | `0.0.0.0` (or from `settings.json`) | Bind address. Overrides `host` in settings | +| `ROUTERLY_PUBLIC_URL` | `http://localhost:3000` | Externally reachable URL. Overrides `publicUrl` in settings | +| `ROUTERLY_LOG_LEVEL` | `info` | Log verbosity. Overrides `logLevel` in settings. Values: `error`, `warn`, `info`, `debug` | +| `NODE_ENV` | `development` | Set to `production` for production deployments (affects error verbosity and logging format) | + +## Installer Variables + +These variables are only used during the install/update process (`install.sh`, `install.ps1`, `install.mjs`) and have no effect at runtime. + +| Variable | Values | Description | +|----------|--------|-------------| +| `ROUTERLY_SCOPE` | `user` (default), `system` | Install scope. `user` keeps all service config in `~/.routerly/`; `system` moves service config and data to the platform system directory (`/var/lib/routerly/` on Linux, `/Library/Application Support/Routerly/` on macOS, `C:\ProgramData\Routerly\` on Windows) and requires root/sudo. CLI auth tokens remain per-user in both cases. | +| `ROUTERLY_DAEMON` | `true`, `false` | Register as a background service after installation. Defaults to `true` | +| `ROUTERLY_INSTALL_DIR` | _(path)_ | Override the installation directory | + +## Docker / Container Usage + +In Docker deployments, set `ROUTERLY_HOME` to the path of your mounted volume: + +```yaml +environment: + - ROUTERLY_HOME=/data + - NODE_ENV=production + - ROUTERLY_PORT=3000 +``` + +## Precedence + +Environment variables always take precedence over values in `settings.json`. The lookup order is: + +1. Environment variable (highest priority) +2. `settings.json` value +3. 
Built-in default (lowest priority) diff --git a/website/versioned_docs/version-0.1.5/reference/troubleshooting.md b/website/versioned_docs/version-0.1.5/reference/troubleshooting.md new file mode 100644 index 0000000..66bded0 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/reference/troubleshooting.md @@ -0,0 +1,175 @@ +--- +title: Troubleshooting +sidebar_position: 3 +--- + +# Troubleshooting + +## Service Won't Start + +### Port already in use + +**Symptom:** `Error: listen EADDRINUSE :::3000` + +**Fix:** + +```bash +# Find what is using port 3000 +lsof -i :3000 + +# Change Routerly's port in settings +routerly service configure --port 3001 +# or edit ~/.routerly/config/settings.json directly +``` + +### Config file has invalid JSON + +**Symptom:** Service exits immediately with a JSON parse error in the logs. + +**Fix:** + +```bash +# Validate all config files +for f in ~/.routerly/config/*.json; do + node -e "JSON.parse(require('fs').readFileSync('$f','utf8'))" && echo "OK: $f" || echo "INVALID: $f" +done +``` + +Fix the reported file by hand or restore from a backup. + +--- + +## Authentication Errors + +### 401 on the LLM proxy — "Invalid or missing Authorization header" + +**Likely causes:** +- Missing `Authorization: Bearer sk-rt-…` header +- The project token was rotated or deleted +- The token was typed incorrectly + +**Fix:** Verify the token in the dashboard under **Projects → Tokens**. Generate a new token if needed; old tokens cannot be recovered. + +### 401 on the management API — "JWT expired" + +**Symptom:** Dashboard shows a login prompt; CLI returns `Unauthorized`. + +**Fix:** + +```bash +# Re-authenticate +routerly auth login +``` + +Dashboard sessions expire after 24 hours. The CLI persists credentials and prompts for re-login automatically. + +--- + +## Model and Provider Errors + +### Model unreachable / 502 from provider + +**Symptom:** Requests return HTTP 502 with a message like `"provider error: …"`. + +**Fixes:** +1. Verify the API key is correct in the dashboard under **Models → Edit**. +2. Check that the model ID matches the provider's exact ID (e.g. `gpt-5-mini`, not `gpt5mini`). +3. Test connectivity to the provider from the server: + ```bash + curl https://api.openai.com/v1/models \ + -H "Authorization: Bearer YOUR_API_KEY" + ``` + +### Ollama unreachable + +**Symptom:** Requests to Ollama models fail with a connection error. + +**Fixes:** +1. Ensure the Ollama process is running: `ollama serve` +2. Confirm the `baseUrl` for your Ollama model points to the correct host and port (default: `http://localhost:11434`) +3. If Routerly runs in Docker and Ollama runs on the host, use `http://host.docker.internal:11434` as `baseUrl` + +--- + +## Budget and Limit Errors + +### 503 — "Budget exceeded" + +**Symptom:** Requests return HTTP 503 with `"Budget limit reached"`. + +**Fixes:** +1. Open the dashboard → **Projects → [Project] → Tokens** to see which budget was hit and when it resets. +2. Increase the budget limit or wait for the current window to reset. +3. If a **Global** budget is the issue, an admin must adjust it in the dashboard → **Overview** or **Settings**. + +--- + +## Routing Issues + +### All requests use the same model (ignoring policies) + +**Symptom:** The routing policy is set to `random` or `round-robin`, but the same model is always selected. + +**Fix:** Check that more than one model is assigned to the project. A project with only one model always routes to that model regardless of policy. 
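+ +One quick check is to list the models actually visible to your project token (assuming the service runs on the default port): + +```bash +# Returns the project's model list in the OpenAI `GET /v1/models` format +curl http://localhost:3000/v1/models \ + -H "Authorization: Bearer sk-rt-YOUR_PROJECT_TOKEN" +``` + +If only one model comes back, assign more models to the project before expecting the policy to distribute traffic.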
+ +### Preferred model is never selected + +**Symptom:** The `preferred` or `priority` policy is configured, but a different model is chosen. + +**Fix:** Verify that the preferred model is **enabled** (not disabled) and assigned to the project. + +--- + +## Dashboard Issues + +### Dashboard shows blank page after login + +**Symptom:** URL changes to `/overview` but the page is empty. + +**Fixes:** +- Hard-refresh the browser (`Cmd+Shift+R` / `Ctrl+Shift+R`) to clear the cached JS bundle. +- Check the browser console for JS errors. + +### Setup page appears even after completing setup + +**Symptom:** Visiting the dashboard always redirects to the setup wizard. + +**Fix:** Check that at least one user with the `admin` role exists in `~/.routerly/config/users.json`. If the file is empty or was accidentally deleted, the service treats itself as unconfigured. + +--- + +## Getting More Information + +### Enable debug logging + +```bash +# Temporarily (environment variable) +ROUTERLY_LOG_LEVEL=debug routerly-service + +# Persistently +routerly service configure --log-level debug +``` + +### View service logs + +```bash +# systemd user service +journalctl --user -u routerly -f + +# systemd system service +sudo journalctl -u routerly -f + +# Docker +docker compose logs -f routerly + +# launchd (macOS) +tail -f ~/.routerly/logs/output.log +``` + +### Check service status + +```bash +routerly status +``` + +This prints the service URL, version, uptime, and a summary of loaded configuration. diff --git a/website/versioned_docs/version-0.1.5/service/endpoints.md b/website/versioned_docs/version-0.1.5/service/endpoints.md new file mode 100644 index 0000000..4fee2d5 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/service/endpoints.md @@ -0,0 +1,190 @@ +--- +title: HTTP Endpoints +sidebar_position: 2 +--- + +# HTTP Endpoints + +The service exposes four groups of HTTP endpoints on the same port (default: `3000`): + +| Group | Path prefix | Auth | Purpose | +|-------|-------------|------|---------| +| [LLM Proxy](#llm-proxy) | `/v1/*` | Bearer project token (`sk-rt-…`) | Forward requests to LLM providers | +| [Management API](#management-api) | `/api/*` | Bearer JWT (dashboard session) | Configure models, projects, users | +| [Dashboard](#dashboard) | `/dashboard/*` | Browser session (cookie) | Serve the React web UI | +| [Health](#health-check) | `/health` | None | Liveness probe | + +For the full request/response schemas of each route, see [API — LLM Proxy](../api/llm-proxy) and [API — Management](../api/management). + +--- + +## LLM Proxy + +These routes accept the same request bodies as the original provider APIs. Authentication is via a **project token** (`Authorization: Bearer sk-rt-…`). + +Every request goes through the full routing and budget stack before being forwarded to a provider. + +### `POST /v1/chat/completions` + +OpenAI Chat Completions format. Supports both streaming (`"stream": true`) and non-streaming responses. + +```http +POST /v1/chat/completions +Authorization: Bearer sk-rt-YOUR_PROJECT_TOKEN +Content-Type: application/json + +{ + "model": "gpt-5-mini", + "messages": [{ "role": "user", "content": "Hello!" }], + "stream": false +} +``` + +The `model` field is the model ID registered in your project. Routerly ignores it as an upstream model directive — the routing engine picks the actual provider model based on your policies. + +### `POST /v1/responses` + +OpenAI Responses API format (newer API surface). Uses `input` instead of `messages` and always streams.
Routerly normalises it to the `chat/completions` shape internally before routing. + +```http +POST /v1/responses +Authorization: Bearer sk-rt-YOUR_PROJECT_TOKEN +Content-Type: application/json + +{ + "model": "gpt-5-mini", + "input": [{ "role": "user", "content": "Hello!" }] +} +``` + +### `POST /v1/messages` + +Anthropic Messages API format. The request body matches the Anthropic SDK wire format exactly. + +```http +POST /v1/messages +Authorization: Bearer sk-rt-YOUR_PROJECT_TOKEN +Content-Type: application/json + +{ + "model": "claude-haiku-4-5", + "max_tokens": 1024, + "messages": [{ "role": "user", "content": "Hello!" }] +} +``` + +Routerly proxies this to the Anthropic provider adapter. If the selected model is an OpenAI model, the adapter translates the request format automatically. + +### `GET /v1/models` + +Returns the list of models available in the project associated with the token, in the OpenAI `GET /v1/models` response format. + +```http +GET /v1/models +Authorization: Bearer sk-rt-YOUR_PROJECT_TOKEN +``` + +### Error format + +All LLM Proxy errors follow the OpenAI error envelope: + +```json +{ + "error": { + "message": "Budget exceeded for model gpt-5-mini", + "type": "budget_exceeded", + "code": "budget_exceeded" + } +} +``` + +Common status codes: + +| Code | Cause | +|------|-------| +| `401` | Missing or invalid project token | +| `503` | No model passed all routing filters (all excluded or over budget) | +| `503` | Budget exhausted for the project or token | +| `504` | Provider timeout | + +--- + +## Management API + +The Management API is used by the dashboard and the CLI. Authentication is via a **JWT** obtained from `POST /api/auth/login`. + +Full endpoint catalogue: [API — Management](../api/management). + +### Key routes + +| Method | Path | Description | +|--------|------|-------------| +| `POST` | `/api/auth/login` | Obtain a JWT | +| `GET` | `/api/models` | List registered models | +| `POST` | `/api/models` | Register a new model | +| `PUT` | `/api/models/:id` | Update a model | +| `DELETE` | `/api/models/:id` | Remove a model | +| `GET` | `/api/projects` | List projects | +| `POST` | `/api/projects` | Create a project | +| `GET` | `/api/usage` | Query usage records | +| `GET` | `/api/settings` | Read service settings | +| `PUT` | `/api/settings` | Update service settings | +| `GET` | `/api/users` | List users (admin only) | + +--- + +## Dashboard + +When `dashboardEnabled: true` (default), the service bundles and serves the React web UI as static files. + +| Path | Behaviour | +|------|-----------| +| `GET /dashboard/` | Serves `index.html` (React app entry point) | +| `GET /dashboard/*` | Static assets (JS, CSS, icons) — falls back to `index.html` for client-side routes | +| `GET /dashboard` | Redirects to `/dashboard/` | +| `GET /` | Redirects to `/dashboard/` | + +To disable the dashboard (e.g. in a headless production deployment): + +```json +// settings.json +{ "dashboardEnabled": false } +``` + +--- + +## Health Check + +```http +GET /health +``` + +No authentication required. Returns HTTP 200 with a JSON body: + +```json +{ + "status": "ok", + "version": "0.1.5", + "timestamp": "2026-03-27T12:00:00.000Z" +} +``` + +Suitable for Docker `HEALTHCHECK`, Kubernetes liveness probes, and load balancer checks. + +--- + +## Trace Header + +Every LLM Proxy response includes an `x-routerly-trace-id` header containing a UUID that identifies the routing trace for that request. 
You can use this ID to look up the routing decision in the dashboard's Playground trace viewer. + +```http +x-routerly-trace-id: 3fa85f64-5717-4562-b3fc-2c963f66afa6 +``` + +--- + +## Related + +- [API — LLM Proxy](../api/llm-proxy) — full request/response schemas +- [API — Management](../api/management) — full management endpoint catalogue +- [Service — Routing Engine](./routing-engine) — how the model is selected for each request diff --git a/website/versioned_docs/version-0.1.5/service/overview.md b/website/versioned_docs/version-0.1.5/service/overview.md new file mode 100644 index 0000000..4aca410 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/service/overview.md @@ -0,0 +1,124 @@ +--- +title: Overview +sidebar_position: 1 +--- + +# Service + +`packages/service` is the core of Routerly. It is a [Fastify](https://fastify.dev/) HTTP server that handles authentication, intelligent model routing, provider dispatch, cost accounting, and budget enforcement. + +The CLI and the dashboard both communicate with the running service over HTTP — there is no other inter-process communication. + +--- + +## Startup Sequence + +When the service starts it performs the following steps in order: + +1. **`initConfigDirs`** — Creates `~/.routerly/config/` and `~/.routerly/data/` if they do not exist. +2. **`loadSecret`** — Reads the AES-256 encryption key from `~/.routerly/config/secret`, generating and persisting it automatically if missing. API keys are encrypted with this secret at rest. +3. **`readConfig('settings')`** — Loads `settings.json`. Missing file → defaults are written and used. +4. **`buildServer`** — Registers Fastify plugins, routes, and middleware in this order: + - CORS (`@fastify/cors`) + - Dashboard static files (`@fastify/static` at `/dashboard`, only if `dashboardEnabled: true`) + - Management API routes (`/api/*`) + - Auth guard plugin (validates `Bearer sk-rt-*` tokens for `/v1/*` routes) + - LLM Proxy routes (`/v1/*`) + - Root redirect (`/` → `/dashboard/`) + - Health check (`/health`) +5. **`server.listen`** — Binds to `host:port` from settings (defaults: `0.0.0.0:3000`). + +--- + +## Running Manually + +The service is normally managed as a background daemon by the installer. You can also start it directly: + +```bash +# From the monorepo root (development) +npm run dev --workspace packages/service + +# Standalone binary (after installation) +~/.routerly/app/routerly-service + +# Specify a custom config/data directory +ROUTERLY_HOME=/opt/routerly ~/.routerly/app/routerly-service +``` + +Environment variables override `settings.json` — see [Environment Variables](../reference/environment-variables) for the full list. + +--- + +## Health Check + +The service exposes a lightweight health endpoint at `GET /health`: + +```bash +curl http://localhost:3000/health +``` + +```json +{ + "status": "ok", + "version": "0.1.5", + "timestamp": "2026-03-27T12:00:00.000Z" +} +``` + +Returns HTTP 200 when the process is up and accepting connections. This endpoint does **not** require authentication and is suitable for load balancer and container health probes. + +--- + +## Configuration Storage + +All state is stored as JSON files on disk — there is no external database. The base directory defaults to `~/.routerly/` and can be overridden with `$ROUTERLY_HOME`. 
+ +``` +~/.routerly/ +├── config/ +│ ├── settings.json # Port, log level, dashboard toggle, timeout, public URL +│ ├── models.json # Registered LLM models (API keys AES-256 encrypted) +│ ├── projects.json # Projects, routing policies, tokens, members, budgets +│ ├── users.json # Dashboard users (passwords bcrypt-hashed) +│ ├── roles.json # Custom RBAC role definitions +│ └── secret # AES-256 encryption key (auto-generated, never commit) +└── data/ + └── usage.json # Append-only call records (tokens, cost, latency, outcome) +``` + +All writes to config files use a file lock (`proper-lockfile`) to prevent concurrent corruption. Missing files are auto-created with their defaults on first read. + +--- + +## Process Signals + +| Signal | Behaviour | +|--------|-----------| +| `SIGTERM` | Graceful shutdown — Fastify drains in-flight requests before exiting | +| `SIGINT` | Same as `SIGTERM` (Ctrl-C in a terminal) | +| `SIGHUP` | Not specially handled — restart the process to reload config | + +Config changes made via the CLI or dashboard API take effect immediately (the service re-reads files on each request) without a restart, except for `port` and `host` which require a restart. + +--- + +## Logging + +The service uses [pino](https://getpino.io/) via Fastify's built-in logger. + +| `NODE_ENV` | Format | Default level | +|------------|--------|---------------| +| development | pretty-printed with colours (`pino-pretty`) | `info` | +| production | JSON (one object per line) | `info` | + +Change the log level in `settings.json`, via the dashboard (**Settings → General → Log Level**), or with the `ROUTERLY_LOG_LEVEL` environment variable. + +--- + +## Related + +- [Service — HTTP Endpoints](./endpoints) — all routes the service exposes +- [Service — Routing Engine](./routing-engine) — how model selection works +- [Service — Provider Adapters](./providers) — how requests are forwarded to providers +- [Reference — Configuration Files](../reference/config-files) +- [Reference — Environment Variables](../reference/environment-variables) diff --git a/website/versioned_docs/version-0.1.5/service/providers.md b/website/versioned_docs/version-0.1.5/service/providers.md new file mode 100644 index 0000000..fb49c2a --- /dev/null +++ b/website/versioned_docs/version-0.1.5/service/providers.md @@ -0,0 +1,167 @@ +--- +title: Provider Adapters +sidebar_position: 4 +--- + +# Provider Adapters + +Each LLM provider has a different HTTP API, authentication scheme, and wire format. Routerly bridges those differences through **provider adapters** — thin classes that translate a normalised internal request into the provider's specific format and translate the response back. + +Adapters are selected automatically based on the `provider` field in a model's configuration. + +--- + +## Adapter Overview + +| Provider ID | Class | Protocol | Notes | +|-------------|-------|----------|-------| +| `openai` | `OpenAIAdapter` | OpenAI Chat Completions | Native SDK; also handles `/v1/responses` | +| `anthropic` | `AnthropicAdapter` | Anthropic Messages API | Full message conversion, prompt caching | +| `gemini` | `GeminiAdapter` | OpenAI-compatible endpoint | Uses OpenAI SDK pointed at Google's OpenAI-compatible base URL | +| `ollama` | `OllamaAdapter` | OpenAI-compatible endpoint | Uses OpenAI SDK pointed at local Ollama host | +| `custom` | `CustomAdapter` | OpenAI-compatible endpoint | Any endpoint that speaks `/v1/chat/completions` | + +--- + +## OpenAI Adapter + +Uses the official `openai` Node.js SDK. 
**Model ID resolution** — If the registered model ID contains a slash (e.g. `openai/gpt-4o`), the adapter strips the prefix and sends only the part after the slash (`gpt-4o`) to the provider. This lets you namespace model IDs within Routerly without confusing OpenAI. + +**Endpoint override** — If `endpoint` is set in the model config, the adapter uses it instead of `https://api.openai.com/v1`. This lets you point to Azure OpenAI, local OpenAI proxies, or compatible services. + +**Streaming** — Uses the SDK's native async iterator. Chunks are forwarded to the client as Server-Sent Events (SSE) as they arrive. + +```json +// Example model config +{ + "id": "gpt-5-mini", + "provider": "openai", + "apiKey": "", + "endpoint": null +} +``` + +--- + +## Anthropic Adapter + +Uses the official `@anthropic-ai/sdk` Node.js SDK. + +**Message format conversion** — The Anthropic Messages API differs from OpenAI's Chat Completions format in several ways. The adapter handles all conversions automatically: + +| OpenAI format | Anthropic format | +|---------------|-----------------| +| `messages[].role = "tool"` | Converted to `role: "user"` with a `tool_result` content block | +| `messages[].role = "assistant"` with `tool_calls` | Content blocks of type `tool_use` | +| Consecutive tool result messages | Merged into a single `user` message (Anthropic requirement) | +| `content` as an array of content parts | Mapped block-by-block, preserving `cache_control` fields | +| `system` message in `messages[]` | Extracted and passed as Anthropic's top-level `system` field | +| `image_url` content parts (data URI) | Converted to Anthropic base64 image blocks | +| `image_url` content parts (URL) | Converted to Anthropic URL image source | + +**Prompt caching** — The adapter preserves any `cache_control` fields present in message content parts, enabling Anthropic's prompt caching feature to work end-to-end. + +**Streaming** — Uses the SDK's streaming API. SSE chunks are translated back to OpenAI-compatible format for `/v1/chat/completions` requests, or forwarded as-is for `/v1/messages` requests. + +```json +// Example model config +{ + "id": "claude-haiku-4-5", + "provider": "anthropic", + "apiKey": "", + "endpoint": null +} +``` + +--- + +## Gemini Adapter + +Uses the `openai` SDK pointed at Google's OpenAI-compatible base URL (`https://generativelanguage.googleapis.com/v1beta/openai`). No format conversion is needed — Gemini's compatibility layer handles it. + +```json +// Example model config +{ + "id": "gemini-2.5-flash", + "provider": "gemini", + "apiKey": "" +} +``` + +--- + +## Ollama Adapter + +Uses the `openai` SDK pointed at the Ollama host. No API key is required (Ollama has no auth by default). + +Set the `endpoint` field in the model config to your Ollama host: + +```json +// Example model config +{ + "id": "llama3", + "provider": "ollama", + "endpoint": "http://localhost:11434/v1", + "apiKey": null +} +``` + +If you run Ollama on a different machine, change the `endpoint` to match. Routerly treats Ollama models as zero-cost (`inputPerMillion: 0, outputPerMillion: 0`) by default unless you configure explicit pricing. + +--- + +## Custom Adapter + +For any provider that exposes an OpenAI-compatible `/v1/chat/completions` endpoint. Uses the `openai` SDK with a custom `baseURL`. + +**Required field:** `endpoint` must be set to the provider's base URL.
```json +// Example model config +{ + "id": "my-local-llm", + "provider": "custom", + "endpoint": "http://192.168.1.50:8080/v1", + "apiKey": "optional-key-if-required" +} +``` + +This adapter works with LM Studio, llama.cpp server, vLLM, LocalAI, and any other service that implements the OpenAI `/v1/chat/completions` interface. + +--- + +## Mistral, Cohere, xAI + +These providers use the OpenAI-compatible protocol. Register them using the `custom` adapter with the appropriate `endpoint` and `apiKey`: + +| Provider | Endpoint | +|----------|----------| +| Mistral | `https://api.mistral.ai/v1` | +| Cohere | `https://api.cohere.com/compatibility/v1` | +| xAI (Grok) | `https://api.x.ai/v1` | + +```json +// Example: Mistral +{ + "id": "mistral-large", + "provider": "custom", + "endpoint": "https://api.mistral.ai/v1", + "apiKey": "" +} +``` + +--- + +## Timeout Handling + +Each adapter respects the `timeout` field in the model config (in milliseconds). If not set, it falls back to the service-wide `defaultTimeoutMs` setting (`30000` ms). Timed-out requests are recorded as `outcome: "timeout"` in usage records and the `health` policy will penalise the model accordingly. + +--- + +## Related + +- [Concepts — Providers](../concepts/providers) — provider catalogue with model lists and pricing +- [Service — Routing Engine](./routing-engine) — how adapters are invoked after model selection +- [Dashboard — Models](../dashboard/models) — how to register a model with a provider diff --git a/website/versioned_docs/version-0.1.5/service/routing-engine.md b/website/versioned_docs/version-0.1.5/service/routing-engine.md new file mode 100644 index 0000000..1f9fa94 --- /dev/null +++ b/website/versioned_docs/version-0.1.5/service/routing-engine.md @@ -0,0 +1,214 @@ +--- +title: Routing Engine +sidebar_position: 3 +--- + +# Routing Engine + +The routing engine is the component responsible for selecting which model receives each request. It runs a configurable stack of **policies** that each score or filter the candidate model set. The highest-scoring model that passes all filters wins. + +--- + +## Request Selection Lifecycle + +For each incoming request the engine performs the following steps: + +1. **Load candidates** — Read the project's model list. Each candidate carries its `ModelConfig` (id, provider, cost, context window, capabilities, limits) plus any per-model routing guidance (`prompt`) configured on the project. + +2. **Pre-filter: budget limits** — Any model that has already exceeded one of its configured spending or token limits is excluded before the policies run. This is a hard pre-check so exhausted models never consume policy computation. + +3. **Run policies in priority order** — Enabled policies execute in the order they appear in the project's policy list. Each policy receives the full candidate set and returns: + - A **score** for each model (`0.0` – `1.0`) + - Optionally an **excludes** set (hard filters: excluded models are dropped from the candidate set entirely) + +4. **Positional scoring** — A base weight is derived from each policy's position in the project's policy list: + ``` + weight = totalPolicies - policyIndex + ``` + With 3 enabled policies: position 0 → weight 3, position 1 → weight 2, position 2 → weight 1. This creates a natural preference ordering even when policies produce equal scores. + +5. **Aggregate scores** — For each model: `totalScore = sum(policy.score × policy.weight)`. + +6.
**Select winner** — The model with the highest `totalScore` among non-excluded candidates is chosen. On a tie the first in the project list wins. + +7. **Fallback** — If the winning model returns a provider error or timeout, the engine retries with the next-highest scoring candidate. This continues until a model succeeds or the candidate set is exhausted (→ `503`). + +--- + +## Available Policies + +### `context` + +**Type:** soft filter + scoring + +Estimates the token count of the request and checks it against each model's `contextWindow`. Models that cannot fit the request are excluded (score `0.0`). Models in the "danger zone" (>80% of their window consumed) receive a linear penalty down to a minimum of `0.1`. + +No configuration options. + +**Use when:** your project mixes models with different context window sizes. + +--- + +### `cheapest` + +**Type:** scoring + +Scores models by cost efficiency. The cheapest model gets `1.0`; others receive a proportional score (`minCost / theirCost`). Free models (e.g. Ollama) always get `1.0` and paid models are capped at `0.5` when free models coexist — ensuring a meaningful, visible gap between free and paid options. + +No configuration options. + +**Use when:** cost control is the primary goal. + +--- + +### `health` + +**Type:** scoring with circuit breaker + +Evaluates the weighted error rate for each model in a recent time window using exponential decay (recent errors weigh more than old ones). A Bayesian prior (`pseudoCounts`) prevents over-penalising models with little data. + +When the weighted error rate exceeds `circuitBreaker`, the model's score drops to `0.0` (effectively excluded). + +| Config key | Default | Description | +|------------|---------|-------------| +| `windowMinutes` | `20` | Look-back window for usage records | +| `halfLifeMinutes` | `5` | Exponential decay half-life — smaller values weight recent errors more | +| `pseudoCounts` | `2` | Bayesian smoothing counts (prior successes) | +| `circuitBreaker` | `0.9` | Weighted error rate threshold that trips the circuit breaker | + +Models with no recent records get score `1.0` (optimistic exploration). + +**Use when:** you want automatic failover when a provider degrades. + +--- + +### `performance` + +**Type:** scoring + +Scores models by their recent weighted average latency. The fastest model gets `1.0`; others get `minLatency / theirLatency`. Uses the same exponential decay window as `health`. + +Only successful calls (`outcome !== 'error' && outcome !== 'timeout'`) contribute to the average. Models without enough samples (`minSamples`) get `1.0`. + +| Config key | Default | Description | +|------------|---------|-------------| +| `windowMinutes` | `20` | Look-back window | +| `halfLifeMinutes` | `5` | Decay half-life (set to `0` for unweighted average) | +| `minSamples` | `1` | Minimum sample count to use the model's data | + +**Use when:** response time matters more than cost. + +--- + +### `llm` + +**Type:** scoring (uses an LLM to score) + +Sends the candidate list and the request to a small "routing LLM" and asks it to score each model's fit for the task. Scores are returned as JSON (`0.0`–`1.0`). The system prompt instructs the routing LLM to match task complexity to model capability (simple tasks → smaller models; complex tasks → stronger models). + +Per-model `prompt` guidance (set on the project model entry) is included in the system prompt to give the routing LLM operator-defined hints (e.g. "prefer this model for code tasks"). 
| Config key | Default | Description | +|------------|---------|-------------| +| `modelId` | _(required)_ | ID of the model to use as the routing LLM | +| `additionalPrompt` | — | Extra instructions injected into the routing system prompt | + +**Use when:** you want semantic, task-aware routing without hand-crafting rules. + +:::caution Cost +The `llm` policy itself makes an LLM call, which incurs cost and adds latency to every proxied request. Use a small, fast model as the routing LLM. +::: + +--- + +### `capability` + +**Type:** hard filter + +Inspects the request body and excludes models that explicitly declare they do not support a required capability. Capability mismatches result in score `0.0`. + +Detected capabilities: + +| Capability | Trigger | +|------------|---------| +| `vision` | Request contains a message with an `image_url` content part | +| `functionCalling` | Request contains `tools` or `functions` | +| `json` | `response_format.type === 'json_object'` | + +Models that do not declare a capability (i.e. the field is absent) are assumed compatible — only an explicit `false` triggers exclusion. + +No configuration options. + +**Use when:** your project includes a mix of models with different capability sets. + +--- + +### `rate-limit` + +**Type:** scoring with optional hard threshold + +Counts recent calls per model and penalises heavily-used ones to reduce the risk of hitting provider-side rate limits (HTTP 429). Uses proportional scoring: the least-used model gets `1.0`. + +Only models that have a `calls` limit configured are scored by this policy; models without a `calls` limit always get `1.0`. + +| Config key | Default | Description | +|------------|---------|-------------| +| `windowMinutes` | `1` | Look-back window | +| `maxCallsPerWindow` | — | Hard threshold — models over this are excluded | + +**Use when:** you have multiple projects sharing a provider API key with a strict RPM limit. + +--- + +### `fairness` + +**Type:** scoring + +Distributes traffic evenly across candidates by penalising models that have received a disproportionate share of recent successful calls. Score = `1 - (myCalls / totalCalls)`. A model that monopolises all traffic scores `0.0`; a perfectly balanced distribution across N models gives each model `1 - 1/N`. + +| Config key | Default | Description | +|------------|---------|-------------| +| `windowMinutes` | `60` | Look-back window for call counts | + +**Use when:** you want round-robin-like load distribution across equivalent models. + +--- + +### `budget-remaining` + +**Type:** scoring + +Scores models by how much budget headroom they have left across all configured limits (global thresholds, project budgets, token budgets). The score is the minimum headroom ratio across all active limits: `(limit - used) / limit`. A model with 80% budget remaining scores `0.8`; a fully exhausted model scores `0.0`. + +No configuration options (reads limits from the project and model config). + +**Use when:** you want to spread spending across multiple models before any single one runs dry. + +--- + +## Policy Ordering and Weights + +Policies are applied in the order configured in the project. Their positional weight (`total − index`) means policies near the top of the list have more influence on the final score. Reorder policies via the dashboard (**Projects → your project → Routing**) or the CLI.
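+ +The aggregation is simple enough to sketch in a few lines of Python (an illustrative sketch, not the service's actual code); the numbers reproduce the worked example below: + +```python +# One model's scores from three enabled policies, listed in the order +# they are configured on the project (hypothetical values). +policy_scores = [("health", 0.9), ("cheapest", 0.6), ("performance", 0.8)] + +total = len(policy_scores) +total_score = sum( + score * (total - index) # positional weight: total - index + for index, (_, score) in enumerate(policy_scores) +) +print(round(total_score, 2)) # 4.7 +```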
+ +**Example** — 3 policies enabled (health, cheapest, performance): + +| Policy | Position | Weight | Score for model A | Weighted score | +|--------|----------|--------|-------------------|----------------| +| health | 0 | 3 | 0.9 | 2.70 | +| cheapest | 1 | 2 | 0.6 | 1.20 | +| performance | 2 | 1 | 0.8 | 0.80 | +| **Total** | | | | **4.70** | + +--- + +## Routing Trace + +Every request produces a routing trace that records each policy's scores and decisions. The trace is accessible in the dashboard's Playground view and is identified by the `x-routerly-trace-id` response header. + +--- + +## Related + +- [Concepts — Routing](../concepts/routing) — conceptual overview for end users +- [Service — Provider Adapters](./providers) — what happens after a model is selected +- [Dashboard — Playground](../dashboard/playground) — interactive trace viewer diff --git a/website/versioned_sidebars/version-0.1.5-sidebars.json b/website/versioned_sidebars/version-0.1.5-sidebars.json new file mode 100644 index 0000000..c1bbe37 --- /dev/null +++ b/website/versioned_sidebars/version-0.1.5-sidebars.json @@ -0,0 +1,152 @@ +{ + "docsSidebar": [ + "intro", + { + "type": "category", + "label": "Getting Started", + "collapsed": false, + "items": [ + "getting-started/installation", + "getting-started/quick-start", + "getting-started/configuration" + ] + }, + { + "type": "category", + "label": "Concepts", + "items": [ + "concepts/architecture", + "concepts/providers", + "concepts/models", + "concepts/projects", + "concepts/routing", + "concepts/budgets-and-limits", + "concepts/notifications" + ] + }, + { + "type": "category", + "label": "Service", + "items": [ + "service/overview", + "service/endpoints", + "service/routing-engine", + "service/providers" + ] + }, + { + "type": "category", + "label": "Dashboard", + "items": [ + "dashboard/setup", + "dashboard/overview", + "dashboard/models", + "dashboard/projects", + "dashboard/usage", + "dashboard/users-and-roles", + "dashboard/settings", + "dashboard/playground", + "dashboard/profile" + ] + }, + { + "type": "category", + "label": "CLI", + "items": [ + "cli/overview", + "cli/commands" + ] + }, + { + "type": "category", + "label": "API Reference", + "items": [ + "api/overview", + "api/llm-proxy", + "api/management" + ] + }, + { + "type": "category", + "label": "Guides", + "items": [ + "guides/self-hosting" + ] + }, + { + "type": "category", + "label": "Integrations", + "items": [ + "integrations/overview", + { + "type": "category", + "label": "Chat UI", + "items": [ + "integrations/open-webui", + "integrations/openclaw", + "integrations/librechat" + ] + }, + { + "type": "category", + "label": "IDE & Editor", + "items": [ + "integrations/cursor", + "integrations/continue", + "integrations/cline", + "integrations/vscode" + ] + }, + { + "type": "category", + "label": "Frameworks", + "items": [ + "integrations/langchain", + "integrations/llamaindex", + "integrations/haystack" + ] + }, + { + "type": "category", + "label": "Automation", + "items": [ + "integrations/n8n", + "integrations/make" + ] + }, + { + "type": "category", + "label": "Notebooks", + "items": [ + "integrations/jupyter", + "integrations/marimo" + ] + } + ] + }, + { + "type": "category", + "label": "Examples", + "items": [ + "examples/overview", + "examples/javascript", + "examples/python", + "examples/java", + "examples/go", + "examples/dotnet", + "examples/php", + "examples/ruby", + "examples/rust" + ] + }, + { + "type": "category", + "label": "Reference", + "items": [ + "reference/config-files", + 
"reference/environment-variables", + "reference/troubleshooting" + ] + } + ] +} diff --git a/website/versions.json b/website/versions.json new file mode 100644 index 0000000..dbc8d60 --- /dev/null +++ b/website/versions.json @@ -0,0 +1,3 @@ +[ + "0.1.5" +]