OpenAI-compatible API
Alexandria exposes a subset of the OpenAI API shape so that tools built against the OpenAI SDK can be routed through Alexandria. The surface is agent-centric: the model field names an Alexandria agent, not a raw backend.
All routes require a human access token.
GET /v1/models
Lists agents the caller can invoke, in OpenAI model shape. Standard OAI fields are present; Alexandria adds extra fields that standard OAI parsers will ignore.
| Field | Source |
|---|---|
| id | agent name (canonical handle for model in chat completions) |
| object | "model" |
| created | unix timestamp of agent creation |
| owned_by | "alexandria" |
| agent | true (always; distinguishes from backend entries elsewhere) |
| backend | the backend the agent is pinned to, or null |
| underlying_model | the model name the agent passes through, or null |
| capability_role | the agent's role string |
Response 200
{
"object": "list",
"data": [
{
"id": "researcher",
"object": "model",
"created": 1715000000,
"owned_by": "alexandria",
"agent": true,
"backend": "anthropic",
"underlying_model": "claude-sonnet-4-6",
"capability_role": "default"
}
]
}
For backend introspection (admin UI, internal tooling), use GET /v1/backends instead.
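As a minimal sketch of how a client might consume the /v1/models listing, the Python below maps each agent to the backend and underlying model it reports. The function name summarize_agents is hypothetical, the sample payload is copied from the response above, and no network call is made.

```python
import json

# Sample payload copied from the GET /v1/models response shown above.
SAMPLE_MODELS = json.loads("""
{
  "object": "list",
  "data": [
    {
      "id": "researcher",
      "object": "model",
      "created": 1715000000,
      "owned_by": "alexandria",
      "agent": true,
      "backend": "anthropic",
      "underlying_model": "claude-sonnet-4-6",
      "capability_role": "default"
    }
  ]
}
""")


def summarize_agents(models_response):
    """Map each agent id to its (backend, underlying_model) pair.

    Entries without "agent": true are skipped. The Alexandria-specific
    fields are read with .get() because they may be null or absent.
    """
    summary = {}
    for entry in models_response.get("data", []):
        if entry.get("agent") is True:
            summary[entry["id"]] = (
                entry.get("backend"),
                entry.get("underlying_model"),
            )
    return summary
```

Calling summarize_agents(SAMPLE_MODELS) on the sample yields a single entry for researcher. A plain OpenAI client would only read id and ignore the rest.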
GET /v1/backends
Lists the registered LLM and .amodel backends. The body has the same shape that /v1/models used to return.
| Field | Source |
|---|---|
| id | backend name |
| object | "model" |
| owned_by | "alexandria" |
| capabilities | array of capabilities ("chat", "embed", etc.) |
| protocol | wire protocol |
| kind | backend kind (llama-cpp, open-ai, claude, etc.) |
| category | "registered" or "installed" |
| providers | provider sub-variants (for .amodel backends) |
Use this when you need backend-level metadata; use /v1/models to discover what an OpenAI client can invoke.
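For internal tooling, a common need is to filter the backend listing by capability. The sketch below assumes the response uses the same {"object": "list", "data": [...]} envelope as /v1/models. The sample entries, including the backend name local-llama, are hypothetical and only illustrate the fields from the table above.

```python
# Hypothetical payload: the envelope shape and the entries are assumptions,
# built only from the field names documented in the table above.
HYPOTHETICAL_BACKENDS = {
    "object": "list",
    "data": [
        {
            "id": "local-llama",
            "object": "model",
            "owned_by": "alexandria",
            "capabilities": ["chat"],
            "kind": "llama-cpp",
            "category": "registered",
        },
        {
            "id": "text-embed-3",
            "object": "model",
            "owned_by": "alexandria",
            "capabilities": ["embed"],
            "kind": "open-ai",
            "category": "registered",
        },
    ],
}


def backends_with_capability(backends_response, capability):
    """Return the ids of backends whose capabilities include `capability`."""
    return [
        backend["id"]
        for backend in backends_response.get("data", [])
        if capability in (backend.get("capabilities") or [])
    ]
```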
POST /v1/chat/completions
OpenAI-compatible chat completions. The model field resolves in this order:
- Strip the workflow/ prefix silently if present (back-compat; the bare agent name is now canonical).
- Match agent by name → invoke as that agent. Audit row: agent.invoke. This is the supported path.
- Match backend by name → wrap in a synthetic __passthrough__ agent and forward. Response carries Warning: 299 - "Deprecated: invoke agents by name; raw backend prompting will be removed". Audit row: agent.invoke with via_passthrough: true. Soft-deprecated; this path will be removed once telemetry shows zero usage.
- Neither found → 404 with {"error": {"message": "model '<x>' not found ...", "type": "not_found"}}.
Empty model → 400.
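The resolution order above can be sketched as a pure function. This is an illustration under stated assumptions, not the server's implementation: the names Resolution and resolve_model are hypothetical, agents and backends are modeled as plain sets of names, and checking for an empty model before stripping the prefix is an assumption about ordering.

```python
from dataclasses import dataclass
from typing import Optional

# Warning header value quoted from the resolution rules above.
DEPRECATION_WARNING = (
    '299 - "Deprecated: invoke agents by name; '
    'raw backend prompting will be removed"'
)


@dataclass
class Resolution:
    error_status: Optional[int] = None  # 400 or 404 when resolution fails
    agent: Optional[str] = None         # agent to invoke on success
    via_passthrough: bool = False       # True for the deprecated backend path
    warning: Optional[str] = None       # Warning header value, if any


def resolve_model(model, agents, backends):
    """Resolve the `model` field following the documented order."""
    if not model:
        return Resolution(error_status=400)
    # 1. Strip the legacy "workflow/" prefix if present.
    name = model[len("workflow/"):] if model.startswith("workflow/") else model
    # 2. An agent match is the supported path.
    if name in agents:
        return Resolution(agent=name)
    # 3. A backend match goes through the synthetic passthrough agent.
    if name in backends:
        return Resolution(
            agent="__passthrough__",
            via_passthrough=True,
            warning=DEPRECATION_WARNING,
        )
    # 4. Nothing matched.
    return Resolution(error_status=404)
```

Because agents are checked before backends, an agent and a backend that share a name resolve to the agent in this sketch.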
Session header: set X-Alexandria-Session: <session-id> (or session_id/conversation_id in the body) to control conversation continuity and cache affinity. Defaults to principal:<jwt-subject>.
Request (non-streaming)
{
"model": "researcher",
"messages": [
{ "role": "user", "content": "What is 2+2?" }
],
"stream": false
}
Response 200 (non-streaming)
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1715000000,
"model": "researcher",
"choices": [
{
"index": 0,
"message": { "role": "assistant", "content": "4" },
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 0,
"completion_tokens": 0,
"total_tokens": 0
}
}
Note: Token usage counts are not yet populated (reported as 0). This is a known gap.
Streaming request
{
"model": "researcher",
"messages": [{ "role": "user", "content": "Tell me a story." }],
"stream": true
}
Streaming returns SSE chunks in the OpenAI chat.completion.chunk format:
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1715000000,"model":"researcher","choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}
data: [DONE]
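A client can reassemble the streamed text by reading the data: lines until the [DONE] sentinel. The sketch below is a minimal parser over an iterable of lines. The function name iter_deltas is hypothetical, and it assumes each event fits on a single data: line, as in the sample above.

```python
import json

# The two SSE lines shown above, used here as sample input.
SAMPLE_STREAM = [
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk",'
    '"created":1715000000,"model":"researcher","choices":[{"index":0,'
    '"delta":{"content":"Once"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]


def iter_deltas(lines):
    """Yield the content fragments carried by chat.completion.chunk events."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content
```

Joining the fragments, for example "".join(iter_deltas(SAMPLE_STREAM)), reconstructs the assistant message as it arrives.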
curl example
# Non-streaming, by agent name
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Authorization: Bearer $ACCESS" \
-H 'Content-Type: application/json' \
-H 'X-Alexandria-Session: my-session' \
-d '{"model":"researcher","messages":[{"role":"user","content":"Hello"}]}' | jq .
# Streaming
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Authorization: Bearer $ACCESS" \
-H 'Content-Type: application/json' \
-d '{"model":"researcher","messages":[{"role":"user","content":"Hello"}],"stream":true}' \
--no-buffer
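For callers not using curl, the same non-streaming request can be built with the Python standard library. This is a sketch: build_chat_request is a hypothetical helper, and the base URL and token are placeholders matching the curl examples above. It only constructs the request, so nothing is sent until it is passed to urllib.request.urlopen.

```python
import json
import urllib.request


def build_chat_request(base_url, access_token, agent, messages,
                       session_id=None, stream=False):
    """Build a POST /v1/chat/completions request addressed to a named agent."""
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    if session_id is not None:
        # Optional: controls conversation continuity and cache affinity.
        headers["X-Alexandria-Session"] = session_id
    body = {"model": agent, "messages": messages, "stream": stream}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Usage (requires a running server and a valid token):
# req = build_chat_request("http://localhost:8080", token, "researcher",
#                          [{"role": "user", "content": "Hello"}],
#                          session_id="my-session")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```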
Deprecation: raw backend prompting
POST /v1/chat/completions historically accepted a backend name in model (e.g. claude-sonnet-4-6). That path still works — it routes through a synthetic __passthrough__ agent — but emits a Warning: 299 response header and a via_passthrough: true audit row. Migrate clients to invoke a named agent instead. The fallback will be removed in a future release.
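To find clients still on the deprecated path, one option is to check responses for the Warning: 299 header. The helper below is a hypothetical sketch that takes response headers as a plain dict and matches the header name case-insensitively. How headers are obtained depends on the HTTP client in use.

```python
from typing import Optional


def passthrough_warning(headers) -> Optional[str]:
    """Return the Warning header value if it carries warn-code 299, else None."""
    for name, value in headers.items():
        if name.lower() == "warning" and value.strip().startswith("299"):
            return value
    return None
```

A non-None result suggests the request named a backend rather than an agent and should be migrated.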
POST /v1/embeddings
OpenAI-compatible embeddings. Backend-addressable — embeddings are not routed through agents. Emits a model.embed audit entry per call.
Backend resolution:
- If model is set and matches a registered backend by name → use that backend.
- Otherwise → auto-select first backend with "embed" capability (prefer is_default = true).
- If the resolved backend lacks "embed" capability → 400.
- If no embed backend is registered → 503.
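The backend resolution rules above can be sketched as follows. The Backend record and resolve_embed_backend are hypothetical names used for illustration, and the function returns an (HTTP status, backend) pair rather than raising. One reading is assumed here: a model that names a registered backend without "embed" capability yields 400 even when other embed backends exist.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Backend:
    name: str
    capabilities: Tuple[str, ...]
    is_default: bool = False


def resolve_embed_backend(
    model: Optional[str], backends: Sequence[Backend]
) -> Tuple[int, Optional[Backend]]:
    """Pick the backend for an embeddings call, following the documented rules."""
    chosen = None
    if model:
        # Rule 1: an explicit model that names a registered backend wins.
        chosen = next((b for b in backends if b.name == model), None)
    if chosen is None:
        # Rule 2: otherwise auto-select among embed-capable backends,
        # preferring one flagged as default.
        candidates = [b for b in backends if "embed" in b.capabilities]
        if not candidates:
            return 503, None  # Rule 4: no embed backend is registered.
        chosen = next((b for b in candidates if b.is_default), candidates[0])
    if "embed" not in chosen.capabilities:
        return 400, None  # Rule 3: the resolved backend cannot embed.
    return 200, chosen
```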
Request
{
"input": "Hello world",
"model": "text-embed-3"
}
Or an array:
{
"input": ["sentence one", "sentence two"]
}
Response 200
{
"object": "list",
"data": [
{
"object": "embedding",
"embedding": [0.012, -0.034, 0.078, "..."],
"index": 0
}
],
"model": "text-embed-3"
}
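When input is an array, each returned item carries an index field. A small sketch of pairing vectors with their inputs is shown below. The helper name embeddings_in_input_order and the short sample vectors are hypothetical, and the sketch assumes index refers to the position in the input array, as in the OpenAI embeddings format.

```python
# Hypothetical response with shortened vectors, deliberately out of order.
HYPOTHETICAL_RESPONSE = {
    "object": "list",
    "data": [
        {"object": "embedding", "embedding": [0.5, 0.25], "index": 1},
        {"object": "embedding", "embedding": [0.012, -0.034], "index": 0},
    ],
    "model": "text-embed-3",
}


def embeddings_in_input_order(response):
    """Return the embedding vectors sorted by their index field."""
    items = sorted(response.get("data", []), key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

Zipping the result with the original input list, for example zip(inputs, embeddings_in_input_order(resp)), then gives (text, vector) pairs.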
Errors
- 400: backend lacks embed capability
- 503: no embedding backend registered
- 502: upstream embedding request failed
curl -s -X POST http://localhost:8080/v1/embeddings \
-H "Authorization: Bearer $ACCESS" \
-H 'Content-Type: application/json' \
-d '{"input":"Hello world"}' | jq '.data[0].embedding | length'