LLM Backends
LLM backends are registered inference endpoints. Alexandria is backend-agnostic; any OpenAI-compatible endpoint can be registered. Admin authentication is required for all write operations.
Backend object
{
  "id": "01HXZ...",
  "name": "gpt4o",
  "url": "https://api.openai.com/v1",
  "kind": "open-ai",
  "model": "gpt-4o",
  "has_api_key": true,
  "is_default": true,
  "enabled": true,
  "backend_type": "external",
  "status": "active",
  "created_at": "2026-01-10T09:00:00Z",
  "updated_at": "2026-01-10T09:00:00Z",
  "source_type": "external",
  "capabilities": ["chat"],
  "protocol": "chat",
  "cache_strategy": "native",
  "cache_ttl_seconds": 0
}
Kinds: open-ai, claude, llama-cpp, custom
Capabilities: chat, embed, voice, vision, rerank
- chat and embed have live runtime paths
- voice, vision, rerank are reserved for future runtimes (registering is allowed for pre-provisioning)
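A client may want to warn when a backend is registered with capabilities that have no runtime path yet. The following is a minimal sketch of that check; the helper name `split_capabilities` and the two constant names are hypothetical and not part of Alexandria.

```python
# Capability names are taken from the list above. Grouping them into two
# sets is only a client-side convenience, not something the API exposes.
LIVE_CAPABILITIES = frozenset({"chat", "embed"})
RESERVED_CAPABILITIES = frozenset({"voice", "vision", "rerank"})


def split_capabilities(capabilities):
    """Return (live, reserved) lists, preserving the input order.

    Raises ValueError for any name outside the documented set.
    """
    live, reserved = [], []
    for cap in capabilities:
        if cap in LIVE_CAPABILITIES:
            live.append(cap)
        elif cap in RESERVED_CAPABILITIES:
            reserved.append(cap)
        else:
            raise ValueError(f"unknown capability: {cap!r}")
    return live, reserved


live, reserved = split_capabilities(["chat", "vision"])
if reserved:
    print("pre-provisioned only, no runtime yet:", reserved)
```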
Cache strategies:
- native: delegate to the upstream API's native caching (default for claude, open-ai)
- alexandria: use Alexandria's distributed Memcached cache (requires memcached_cache entitlement)
- none: no caching (default for custom)
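The defaults and the entitlement rule above can be mirrored on the client to catch mistakes before a request is sent. This is a sketch only: the function name `resolve_cache_strategy` and the `entitlements` argument are hypothetical, and since no default is documented for llama-cpp the sketch returns `None` for it and leaves the choice to the server.

```python
# Defaults as documented: native for claude and open-ai, none for custom.
# No default is stated for llama-cpp, so it is deliberately absent here.
DEFAULT_CACHE_STRATEGY = {"claude": "native", "open-ai": "native", "custom": "none"}
VALID_CACHE_STRATEGIES = frozenset({"native", "alexandria", "none"})


def resolve_cache_strategy(kind, requested=None, entitlements=()):
    """Pick the cache strategy to send for a backend of the given kind.

    Returns None when nothing was requested and no default is documented.
    """
    if requested is None:
        return DEFAULT_CACHE_STRATEGY.get(kind)
    if requested not in VALID_CACHE_STRATEGIES:
        raise ValueError(f"unknown cache strategy: {requested!r}")
    # The server answers 403 in this case; failing early saves a round trip.
    if requested == "alexandria" and "memcached_cache" not in entitlements:
        raise PermissionError("cache_strategy 'alexandria' requires memcached_cache")
    return requested


print(resolve_cache_strategy("open-ai"))
print(resolve_cache_strategy("custom", "alexandria", entitlements=["memcached_cache"]))
```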
GET /admin/llm
List all registered backends (enabled and disabled).
POST /admin/llm
Register a backend. After creation, the config is synced to the orchestrator.
External backend
{
  "name": "gpt4o",
  "url": "https://api.openai.com/v1",
  "kind": "open-ai",
  "model": "gpt-4o",
  "api_key": "sk-...",
  "is_default": true,
  "capabilities": ["chat"],
  "cache_strategy": "native"
}
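For scripted registration, the same body can be sent from Python. The sketch below only builds the request with the standard library and does not send it; the helper name `build_register_request` is hypothetical, and the bearer-token header is taken from the curl examples in this page.

```python
import json
import urllib.request


def build_register_request(base_url, admin_token, backend):
    """Build (but do not send) a POST /admin/llm request for one backend."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/admin/llm",
        data=json.dumps(backend).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
    )


req = build_register_request(
    "http://localhost:8080",
    "example-token",  # placeholder, use a real admin token
    {
        "name": "gpt4o",
        "url": "https://api.openai.com/v1",
        "kind": "open-ai",
        "model": "gpt-4o",
        "api_key": "sk-...",
        "is_default": True,
    },
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```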
Managed backend (OCI image)
Only available in the k8s_enabled build with model_controller.enabled = true.
{
  "name": "llama-local",
  "kind": "llama-cpp",
  "source_type": "oci",
  "image": "registry.example.com/llama:3.2",
  "image_digest": "sha256:abc...",
  "engine": "llama-cpp",
  "container_port": 8080,
  "replicas": 1,
  "capabilities": ["chat"],
  "resources": { "cpu": "2", "memory": "8Gi" }
}
Managed backend (HuggingFace)
{
  "name": "mistral",
  "kind": "open-ai",
  "source_type": "hf",
  "hf_repo": "mistralai/Mistral-7B-Instruct-v0.2",
  "hf_revision": "abc1234",
  "engine": "vllm",
  "replicas": 2
}
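The two managed payloads differ mainly in `source_type` and the fields that locate the model. A small builder can keep them consistent. This is a sketch under assumptions: the helper name `managed_backend_payload` is hypothetical, and the rule that exactly one of `image` or `hf_repo` is given is inferred from the examples rather than stated by the API.

```python
def managed_backend_payload(name, kind, engine, *, image=None, image_digest=None,
                            hf_repo=None, hf_revision=None, **extra):
    """Build a managed-backend body shaped like the OCI and HuggingFace examples.

    Assumption: a backend is sourced from either an OCI image or a
    HuggingFace repo, never both.
    """
    if (image is None) == (hf_repo is None):
        raise ValueError("pass exactly one of image (OCI) or hf_repo (HuggingFace)")
    payload = {"name": name, "kind": kind, "engine": engine}
    if image is not None:
        payload["source_type"] = "oci"
        payload["image"] = image
        if image_digest is not None:
            payload["image_digest"] = image_digest
    else:
        payload["source_type"] = "hf"
        payload["hf_repo"] = hf_repo
        if hf_revision is not None:
            payload["hf_revision"] = hf_revision
    # Remaining fields (replicas, capabilities, resources, ...) pass through as-is.
    payload.update(extra)
    return payload


print(managed_backend_payload(
    "mistral", "open-ai", "vllm",
    hf_repo="mistralai/Mistral-7B-Instruct-v0.2", hf_revision="abc1234", replicas=2,
))
```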
Errors
- 409: backend name already exists
- 403: cache_strategy="alexandria" requires memcached_cache entitlement
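A client can turn these two status codes into distinct exceptions so callers can react to each case. The class and function names below are hypothetical; the success status is not documented here, so the sketch treats any status below 400 as success.

```python
class BackendNameConflict(Exception):
    """POST /admin/llm returned 409: the name is already registered."""


class EntitlementRequired(Exception):
    """POST /admin/llm returned 403: a required entitlement is missing."""


def raise_for_register_status(status):
    """Raise a specific exception for the documented registration errors."""
    if status == 409:
        raise BackendNameConflict("backend name already exists")
    if status == 403:
        raise EntitlementRequired(
            "cache_strategy 'alexandria' requires the memcached_cache entitlement"
        )
    if status >= 400:
        # Not listed in the reference; surfaced generically.
        raise RuntimeError(f"unexpected status {status}")


try:
    raise_for_register_status(409)
except BackendNameConflict as exc:
    print("pick another name:", exc)
```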
GET /admin/llm/{name}
Get a single backend by name.
PATCH /admin/llm/{name}
Partial update. URL changes are SSRF-validated.
{
  "enabled": true,
  "model": "gpt-4o-mini",
  "cache_strategy": "native"
}
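Because the update is partial, a client only needs to send the fields it wants to change. The sketch below builds such a request without sending it; the helper name `build_patch_request` is hypothetical, and dropping `None` values is a client-side choice rather than documented API behavior.

```python
import json
import urllib.request
from urllib.parse import quote


def build_patch_request(base_url, admin_token, name, **changes):
    """Build (but do not send) a PATCH /admin/llm/{name} request."""
    # Only fields the caller actually set are included in the body.
    fields = {key: value for key, value in changes.items() if value is not None}
    if not fields:
        raise ValueError("no fields to update")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/admin/llm/{quote(name, safe='')}",
        data=json.dumps(fields).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
    )


req = build_patch_request(
    "http://localhost:8080", "example-token", "gpt4o",
    model="gpt-4o-mini", url=None,
)
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```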
DELETE /admin/llm/{name}
Deletes the backend and removes the associated API key from the secret store. Returns 204.
POST /admin/llm/{name}/default
Mark a backend as the default for routing; it is used when a query does not specify a backend explicitly. The config is then synced to the orchestrator.
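The routing rule (an explicit backend wins, otherwise the default is used) can be illustrated on a list of backend objects. This is not Alexandria's router: `pick_backend` is a hypothetical helper, and a real router would likely also consider fields such as `enabled` and `status`.

```python
def pick_backend(backends, requested=None):
    """Return the backend a query would use under the default-routing rule."""
    if requested is not None:
        for backend in backends:
            if backend["name"] == requested:
                return backend
        raise LookupError(f"no backend named {requested!r}")
    # No explicit backend in the query: fall back to the one marked default.
    for backend in backends:
        if backend.get("is_default"):
            return backend
    raise LookupError("no default backend is set")


backends = [
    {"name": "gpt4o", "is_default": True},
    {"name": "llama-local", "is_default": False},
]
print(pick_backend(backends)["name"])
print(pick_backend(backends, "llama-local")["name"])
```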
POST /admin/llm/{name}/ping
Check backend reachability with SSRF protection. The raw error is not echoed to the client — internal IPs and DNS results are never exposed.
Response 200
{
  "name": "gpt4o",
  "url": "https://api.openai.com/v1",
  "healthy": true,
  "status_code": 200
}
On failure:
{
  "name": "gpt4o",
  "url": "https://api.openai.com/v1",
  "healthy": false,
  "error": "backend unreachable"
}
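The two response shapes differ in which field accompanies `healthy`: `status_code` on success, `error` on failure. A small sketch of how a client might summarize either one follows; `describe_ping` is a hypothetical helper name.

```python
def describe_ping(result):
    """Summarize a ping response body (already decoded from JSON)."""
    if result.get("healthy"):
        return f"{result['name']}: healthy (HTTP {result['status_code']})"
    # The error text is a generic message; raw upstream errors are not exposed.
    return f"{result['name']}: unhealthy ({result.get('error', 'no error given')})"


print(describe_ping({"name": "gpt4o", "healthy": True, "status_code": 200}))
print(describe_ping({"name": "gpt4o", "healthy": False, "error": "backend unreachable"}))
```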
POST /admin/llm/sync
Explicitly sync all backend configs to the orchestrator's config file and trigger a reload.
Response 200
{
  "synced": true,
  "backends_loaded": 3
}
Managed backend deployment (k8s)
Build-tag note: the /deploy, /undeploy, and /logs endpoints are only available in the k8s_enabled build. Non-k8s builds return 503 for these routes.
The /deploy and /undeploy endpoints both require the backend_autoscaling license entitlement and return 402 if it is not licensed.
POST /admin/llm/{name}/deploy
Enables the managed backend. alex-model-controller picks it up on the next reconcile and creates a k8s Deployment and Service.
POST /admin/llm/{name}/undeploy
Disables the backend. The controller then tears down the Deployment and Service.
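Since both routes share a path shape and the same failure statuses, a client can handle them together. The sketch below is illustrative only: `managed_action_path` and `explain_managed_status` are hypothetical names, and the status meanings are the ones given in this section (402 for a missing entitlement, 503 on non-k8s builds).

```python
from urllib.parse import quote

MANAGED_ACTIONS = ("deploy", "undeploy")


def managed_action_path(name, action):
    """Return the path for POST /admin/llm/{name}/deploy or /undeploy."""
    if action not in MANAGED_ACTIONS:
        raise ValueError(f"action must be one of {MANAGED_ACTIONS}")
    return f"/admin/llm/{quote(name, safe='')}/{action}"


def explain_managed_status(status):
    """Map the documented failure statuses to a short explanation."""
    if status == 402:
        return "license lacks the backend_autoscaling entitlement"
    if status == 503:
        return "route unavailable, likely a non-k8s build"
    if status >= 400:
        return f"unexpected error status {status}"
    return "accepted"


print(managed_action_path("llama-local", "deploy"))
print(explain_managed_status(402))
```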
GET /admin/llm/{name}/logs
Streams Pod logs for debugging stuck deploys.
Query params
- tail: last N lines (default 200, max 5000)
- container: container name (default engine)
Picks the most recently created Pod matching the backend label.
Errors
- 400: external backend (no Pod to log)
- 503: k8s client unavailable
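The query parameters and their limits can be applied when building the logs URL. This is a minimal sketch: `logs_url` is a hypothetical helper, and clamping `tail` to the maximum on the client is a design choice here, since the reference does not say whether the server clamps or rejects larger values.

```python
from urllib.parse import quote, urlencode

DEFAULT_TAIL = 200   # documented default
MAX_TAIL = 5000      # documented maximum


def logs_url(base_url, name, tail=DEFAULT_TAIL, container="engine"):
    """Build the GET /admin/llm/{name}/logs URL with explicit query params."""
    if tail < 1:
        raise ValueError("tail must be a positive number of lines")
    query = urlencode({"tail": min(tail, MAX_TAIL), "container": container})
    return f"{base_url.rstrip('/')}/admin/llm/{quote(name, safe='')}/logs?{query}"


print(logs_url("http://localhost:8080", "llama-local", tail=9999))
```

Sending the defaults explicitly keeps the request self-describing, at the cost of a slightly longer URL.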
curl examples
# Register OpenAI backend
curl -s -X POST http://localhost:8080/admin/llm \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{
    "name":"gpt4o",
    "url":"https://api.openai.com/v1",
    "kind":"open-ai",
    "model":"gpt-4o",
    "api_key":"sk-...",
    "is_default":true
  }' | jq .

# Register local llama.cpp backend
curl -s -X POST http://localhost:8080/admin/llm \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{
    "name":"llama-local",
    "url":"http://localhost:8080",
    "kind":"llama-cpp",
    "capabilities":["chat","embed"]
  }' | jq .

# Ping
curl -s -X POST http://localhost:8080/admin/llm/gpt4o/ping \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq .

# Set as default
curl -s -X POST http://localhost:8080/admin/llm/gpt4o/default \
  -H "Authorization: Bearer $ADMIN_TOKEN" | jq .