Skip to main content

Backends

Alexandria is backend-agnostic. You register any running inference endpoint; Alexandria forwards requests to it and manages conversation history.

Add a backend

alexandria llm add <name> <url> [options]
OptionDescription
--kindWire format: llama-cpp, open-ai (default), claude, custom
--modelModel name sent in request body (required for OpenAI/Claude endpoints)
--api-keyBearer token / API key
--completions-pathCompletions path override (only used with --kind custom)

Examples

# llama-server (llama.cpp)
alexandria llm add local http://localhost:8080 --kind llama-cpp

# Ollama
alexandria llm add ollama http://localhost:11434 --kind open-ai --model llama3

# vLLM
alexandria llm add vllm http://localhost:8000 --kind open-ai --model meta-llama/Llama-3.1-8B-Instruct

# OpenAI
alexandria llm add openai https://api.openai.com --kind open-ai --model gpt-4o --api-key $OPENAI_API_KEY

# Anthropic Claude
alexandria llm add claude https://api.anthropic.com --kind claude --model claude-opus-4-6 --api-key $ANTHROPIC_API_KEY

# Custom endpoint
alexandria llm add custom http://my-host:9000 --kind custom --completions-path /api/generate

Manage backends

alexandria llm list # table: name | url | kind | default | health
alexandria llm use <name> # set default backend
alexandria llm ping [name] # health check one or all backends
alexandria llm remove <name> # unregister

Wire formats

KindPathRequest body
llama-cppPOST /completion{prompt, n_predict, temperature}
open-aiPOST /v1/chat/completions{model, messages, max_tokens, temperature}
claudePOST /v1/messages{model, max_tokens, messages} + x-api-key / anthropic-version headers
customPOST <completions-path>OpenAI format

Health checks use GET /health for llama-cpp/custom and GET /v1/models for open-ai/claude.