
Gateway API

The gateway is a metered, OpenAI-compatible LLM proxy for chat completions and embeddings. Every call is priced per model against a versioned PAX rate card, debited from the caller's per-user PAX credit ledger, and forwarded to the upstream provider.

Routes

Method  Path                   Purpose
GET     /healthz               Liveness.
POST    /v1/chat/completions   OpenAI-compatible chat completion (metered).
POST    /v1/embeddings         Embeddings (input-only pricing; metered).
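The page shows no embeddings call, so here is a minimal Python sketch (standard library only) that builds, without sending, a POST /v1/embeddings request and a GET /healthz probe. It assumes the embeddings route accepts the OpenAI request body (a "model" and an "input" field) and the same bearer-token auth as the chat example. The base URL is the placeholder from that example, the helper names are hypothetical, and the model id in the usage below is a stand-in because the page does not name an embeddings model.

```python
import json
import urllib.request

# Placeholder host from the curl example on this page; substitute your deployment.
BASE_URL = "https://gateway.example"


def build_embeddings_request(model: str, texts: list[str], token: str) -> urllib.request.Request:
    """Build (but do not send) a metered POST /v1/embeddings request in the OpenAI body shape."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/embeddings",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def build_health_request() -> urllib.request.Request:
    """Build a GET /healthz liveness probe."""
    return urllib.request.Request(f"{BASE_URL}/healthz", method="GET")
```

To send either request, pass it to urllib.request.urlopen and json.load the response; the token would normally come from the MATRIX_TOKEN environment variable used in the curl example.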

Metering model

1. Price

The model id is looked up in the versioned rate card (RateTableVersion). PAX rates are derived from USD provider prices, converted at a fixed USD/PAX reference rate.
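The pricing step can be sketched in Python as follows. Only the RateTableVersion name comes from this page; the per-million-token USD units, the ModelPrice fields, the pax_cost method, and the PAX_PER_USD value are hypothetical. Decimal is used instead of float so that identical inputs always produce identical amounts.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical fixed reference rate: PAX per 1 USD. The real value is not documented here.
PAX_PER_USD = Decimal("1000")
MILLION = Decimal(1_000_000)


@dataclass(frozen=True)
class ModelPrice:
    # Provider list prices in USD per million tokens (assumed unit).
    usd_in_per_m: Decimal
    usd_out_per_m: Decimal


@dataclass(frozen=True)
class RateTableVersion:
    version: int
    prices: dict[str, ModelPrice]

    def pax_cost(self, model: str, prompt_tokens: int, completion_tokens: int = 0) -> Decimal:
        """Price one call in PAX; raises KeyError if the model is not on this rate card."""
        p = self.prices[model]
        usd = (prompt_tokens * p.usd_in_per_m + completion_tokens * p.usd_out_per_m) / MILLION
        return usd * PAX_PER_USD
```

Because embeddings use input-only pricing, an embeddings call would be priced with completion_tokens left at its default of 0.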

2. Gate

Free-tier callers are restricted to a per-slot model whitelist and a daily PAX cap. Calls to any other model require the X-Matrix-BYO-API-Key header.
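A minimal sketch of the free-tier gate, assuming the slot's whitelist and the caller's PAX spend for the day are already loaded. The whitelist, the daily cap, and X-Matrix-BYO-API-Key come from this page; the function and its parameters are hypothetical. Because token counts are unknown before the upstream call, the sketch only rejects once the cap has already been reached. The page does not say whether BYO-key calls count against the cap, so this sketch does not apply the cap to them.

```python
from collections.abc import Mapping
from decimal import Decimal
from typing import Optional

BYO_HEADER = "X-Matrix-BYO-API-Key"  # header name from the docs


def gate_free_tier(
    model: str,
    headers: Mapping[str, str],
    whitelist: set[str],
    spent_today_pax: Decimal,
    daily_cap_pax: Decimal,
) -> Optional[str]:
    """Return None if a free-tier call may proceed, else a rejection reason (hypothetical helper)."""
    # HTTP header names are case-insensitive, so compare them lowercased; an empty value does not count.
    has_byo_key = any(
        name.lower() == BYO_HEADER.lower() and value for name, value in headers.items()
    )
    if model not in whitelist:
        # Off-whitelist models are allowed only when the caller brings their own key.
        return None if has_byo_key else f"model {model!r} requires {BYO_HEADER}"
    # Whitelisted models are limited by the daily PAX cap.
    if spent_today_pax >= daily_cap_pax:
        return "daily PAX cap reached"
    return None
```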

3. Debit

Token usage is priced and written to the credit_ledger together with the rate_table_v that priced it. Because every ledger row records its rate-table version, historical costs stay auditable and replay byte-identically even after the rate card is repriced.
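The debit step and the replay guarantee can be sketched together in Python. credit_ledger and rate_table_v are the names used on this page; the LedgerRow fields, the price, debit, and replay helpers, the PAX_PER_USD value, and the rate-card contents are all hypothetical. The point of the sketch is that replay reprices a row with the version it recorded, so a later reprice (version 2 here) leaves historical amounts unchanged.

```python
from dataclasses import dataclass
from decimal import Decimal

PAX_PER_USD = Decimal("1000")  # hypothetical fixed reference rate
# Hypothetical rate cards keyed by version: model -> (USD in, USD out) per million tokens.
RATE_CARDS: dict[int, dict[str, tuple[Decimal, Decimal]]] = {
    1: {"m": (Decimal("0.15"), Decimal("0.60"))},
    2: {"m": (Decimal("0.20"), Decimal("0.80"))},  # a later reprice
}


@dataclass(frozen=True)
class LedgerRow:
    user_id: str
    model: str
    prompt_tokens: int
    completion_tokens: int
    pax_debited: Decimal
    rate_table_v: int  # the rate-card version that priced this row


def price(version: int, model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    usd_in, usd_out = RATE_CARDS[version][model]
    usd = (prompt_tokens * usd_in + completion_tokens * usd_out) / Decimal(1_000_000)
    return usd * PAX_PER_USD


def debit(credit_ledger: list[LedgerRow], user_id: str, model: str,
          prompt_tokens: int, completion_tokens: int, current_v: int) -> LedgerRow:
    """Price usage with the current rate card and append a row that records its version."""
    pax = price(current_v, model, prompt_tokens, completion_tokens)
    row = LedgerRow(user_id, model, prompt_tokens, completion_tokens, pax, current_v)
    credit_ledger.append(row)
    return row


def replay(row: LedgerRow) -> Decimal:
    """Reprice a historical row with the version it recorded, not the latest one."""
    return price(row.rate_table_v, row.model, row.prompt_tokens, row.completion_tokens)
```

Decimal arithmetic is deterministic, so replaying a row reproduces the stored amount exactly, including its string form.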

Chat completions

curl -X POST https://gateway.example/v1/chat/completions \
  -H "Authorization: Bearer $MATRIX_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "accounts/fireworks/models/gpt-oss-120b",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

The response is the standard OpenAI chat-completion shape. Cost accounting happens server-side; inspect the credit ledger for the debited PAX amount and the rate_table_v used.
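Because cost accounting stays server-side, a client can only read the token counts reported in the response. This sketch assumes the gateway passes through the usage object of the standard OpenAI chat-completion response (prompt_tokens, completion_tokens, total_tokens) unchanged; the helper names are hypothetical. Those counts can then be reconciled against the matching credit-ledger row.

```python
import json
from typing import NamedTuple


class Usage(NamedTuple):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


def usage_from_response(raw: str) -> Usage:
    """Pull token counts from an OpenAI-shaped chat-completion body; missing fields read as 0."""
    body = json.loads(raw)
    u = body.get("usage") or {}
    return Usage(u.get("prompt_tokens", 0), u.get("completion_tokens", 0), u.get("total_tokens", 0))


def reply_text(raw: str) -> str:
    """Return the assistant message text of the first choice."""
    return json.loads(raw)["choices"][0]["message"]["content"]
```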

See Pricing & PAX for how PAX, the rate card, and the free tier work.