Backends
Alexandria is backend-agnostic. You register any running inference endpoint; Alexandria forwards requests to it and manages conversation history.
Add a backend
alexandria llm add <name> <url> [options]
| Option | Description |
|---|---|
--kind | Wire format: llama-cpp, open-ai (default), claude, custom |
--model | Model name sent in request body (required for OpenAI/Claude endpoints) |
--api-key | Bearer token / API key |
--completions-path | Completions path override (only used with --kind custom) |
Examples
# llama-server (llama.cpp)
alexandria llm add local http://localhost:8080 --kind llama-cpp
# Ollama
alexandria llm add ollama http://localhost:11434 --kind open-ai --model llama3
# vLLM
alexandria llm add vllm http://localhost:8000 --kind open-ai --model meta-llama/Llama-3.1-8B-Instruct
# OpenAI
alexandria llm add openai https://api.openai.com --kind open-ai --model gpt-4o --api-key $OPENAI_API_KEY
# Anthropic Claude
alexandria llm add claude https://api.anthropic.com --kind claude --model claude-opus-4-6 --api-key $ANTHROPIC_API_KEY
# Custom endpoint
alexandria llm add custom http://my-host:9000 --kind custom --completions-path /api/generate
Manage backends
alexandria llm list # table: name | url | kind | default | health
alexandria llm use <name> # set default backend
alexandria llm ping [name] # health check one or all backends
alexandria llm remove <name> # unregister
Wire formats
| Kind | Path | Request body |
|---|---|---|
llama-cpp | POST /completion | {prompt, n_predict, temperature} |
open-ai | POST /v1/chat/completions | {model, messages, max_tokens, temperature} |
claude | POST /v1/messages | {model, max_tokens, messages} + x-api-key / anthropic-version headers |
custom | POST <completions-path> | OpenAI format |
Health checks use GET /health for llama-cpp/custom and GET /v1/models for open-ai/claude.