Backends

Alexandria is backend-agnostic. You register any running inference endpoint; Alexandria forwards requests to it and manages conversation history.

Add a backend

alexandria llm add <name> <url> [options]

Option	Description
`--kind`	Wire format: `llama-cpp`, `open-ai` (default), `claude`, `custom`
`--model`	Model name sent in request body (required for OpenAI/Claude endpoints)
`--api-key`	Bearer token / API key
`--completions-path`	Completions path override (only used with `--kind custom`)

Examples

# llama-server (llama.cpp)
alexandria llm add local http://localhost:8080 --kind llama-cpp

# Ollama
alexandria llm add ollama http://localhost:11434 --kind open-ai --model llama3

# vLLM
alexandria llm add vllm http://localhost:8000 --kind open-ai --model meta-llama/Llama-3.1-8B-Instruct

# OpenAI
alexandria llm add openai https://api.openai.com --kind open-ai --model gpt-4o --api-key $OPENAI_API_KEY

# Anthropic Claude
alexandria llm add claude https://api.anthropic.com --kind claude --model claude-opus-4-6 --api-key $ANTHROPIC_API_KEY

# Custom endpoint
alexandria llm add custom http://my-host:9000 --kind custom --completions-path /api/generate

Manage backends

alexandria llm list          # table: name | url | kind | default | health
alexandria llm use <name>    # set default backend
alexandria llm ping [name]   # health check one or all backends
alexandria llm remove <name> # unregister

Wire formats

Kind	Path	Request body
`llama-cpp`	`POST /completion`	`{prompt, n_predict, temperature}`
`open-ai`	`POST /v1/chat/completions`	`{model, messages, max_tokens, temperature}`
`claude`	`POST /v1/messages`	`{model, max_tokens, messages}` + `x-api-key` / `anthropic-version` headers
`custom`	`POST <completions-path>`	OpenAI format

Health checks use GET /health for llama-cpp/custom and GET /v1/models for open-ai/claude.

Add a backend​

Examples​

Manage backends​

Wire formats​

Add a backend

Examples

Manage backends

Wire formats