AI API
Endpoints for interacting with AI models through a unified provider layer. These endpoints are served by the backend API service, not the web application. They are available at the backend base URL, which may differ from the web API base URL depending on your deployment.
These endpoints are internal backend endpoints and are not exposed through the web application’s /api routes. All /api/ai/* endpoints require bearer token (API key) authentication — the auth middleware is applied at the router mount level, so requests without a valid token receive a 401 response regardless of the endpoint. The chat and cost estimation endpoints additionally require a valid subscription plan. All POST requests must include the Content-Type: application/json header.
All /api/ai/* endpoints, not only the chat endpoint, share a rate limit of 30 requests per minute per IP.
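As an illustration of this policy, a fixed-window limiter could look like the sketch below. The function name and window strategy are assumptions; the actual middleware implementation is not shown in this document.

```typescript
// Hypothetical sketch of a 30-requests-per-minute, per-IP fixed-window limiter.
const WINDOW_MS = 60_000;
const LIMIT = 30;

interface Window { start: number; count: number; }
const windows = new Map<string, Window>();

function allowRequest(ip: string, now: number = Date.now()): boolean {
  const w = windows.get(ip);
  if (!w || now - w.start >= WINDOW_MS) {
    // First request of a new window for this IP.
    windows.set(ip, { start: now, count: 1 });
    return true;
  }
  w.count += 1;
  return w.count <= LIMIT; // the 31st request in a window is rejected
}
```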
Health check
Requires bearer token authentication. Returns the availability status of configured AI providers.
Response
{
  "status": "healthy",
  "providers": {
    "openrouter": true
  },
  "timestamp": "2026-03-19T00:00:00Z"
}
The status field is healthy when the OpenRouter provider is reachable and degraded when it is not. Only the OpenRouter provider is checked.
Error response
When the provider check fails, the response uses status: "error" and includes the error message:
{
  "status": "error",
  "error": "Provider connection failed"
}
| Code | Description |
|---|---|
| 503 | AI service unavailable |
List models
Requires bearer token authentication. Returns all available AI models across providers.
Response
{
  "models": [
    {
      "id": "anthropic/claude-sonnet-4-20250514",
      "name": "Claude Sonnet",
      "provider": "openrouter",
      "description": "Fast, intelligent model for everyday tasks",
      "tags": ["chat", "code"],
      "inputCost": 0.003,
      "outputCost": 0.015,
      "contextWindow": 200000,
      "available": true
    }
  ],
  "count": 1,
  "openrouter": 1,
  "timestamp": "2026-03-19T00:00:00Z"
}
Errors
| Code | Description |
|---|---|
| 500 | Failed to fetch models |
List models by provider
GET /api/ai/models/:provider
Requires bearer token authentication.
Path parameters
| Parameter | Type | Description |
|---|---|---|
| provider | string | Provider name (for example, openrouter) |
Response
{
  "provider": "openrouter",
  "models": [],
  "count": 0,
  "timestamp": "2026-03-19T00:00:00Z"
}
Select model
POST /api/ai/models/select
Requires bearer token authentication and a valid subscription plan.
Automatically selects the best model for a given task type.
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| taskType | string | No | Type of task (default: general) |
Response
{
  "model": {
    "id": "anthropic/claude-sonnet-4-20250514",
    "provider": "openrouter"
  },
  "taskType": "general",
  "timestamp": "2026-03-19T00:00:00Z"
}
Errors
| Code | Description |
|---|---|
| 401 | Unauthorized — missing or invalid bearer token |
| 402 | Valid subscription required |
| 404 | No models available |
Chat completion
Send a chat completion request through the unified AI provider layer. The model is auto-selected if not specified.
This endpoint requires a valid subscription plan. Requests without a recognized plan or active Stripe subscription receive a 402 response. The requested model must also be available on your plan; see plan-based model access below.
The chat endpoint uses header-based authentication. Access control is enforced through the x-user-plan and x-stripe-subscription-id headers rather than JWT verification. Admin emails (configured via ADMIN_EMAILS) bypass both plan and subscription requirements.
The following headers are required for plan enforcement:
| Header | Type | Required | Description |
|---|---|---|---|
| x-user-plan | string | Yes | Subscription plan name (label, solo, collective, or network) |
| x-user-email | string | No | User email. Admin emails bypass plan restrictions. |
| x-stripe-subscription-id | string | Yes | Active Stripe subscription ID |
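The header checks described above can be sketched as follows. The function name, the example admin address, and the return shape are illustrative assumptions; the real middleware is not shown in this document.

```typescript
// Sketch of plan enforcement: admins bypass everything, otherwise the plan
// header must name a known plan and a Stripe subscription ID must be present.
const VALID_PLANS = new Set(["label", "solo", "collective", "network"]);
const ADMIN_EMAILS = new Set(["admin@example.com"]); // example value for ADMIN_EMAILS

interface PlanHeaders {
  "x-user-plan"?: string;
  "x-user-email"?: string;
  "x-stripe-subscription-id"?: string;
}

function checkPlan(h: PlanHeaders): { status: number; code?: string } {
  const email = h["x-user-email"];
  if (email && ADMIN_EMAILS.has(email)) {
    return { status: 200 }; // admins bypass plan and subscription checks
  }
  const plan = h["x-user-plan"];
  if (!plan || !VALID_PLANS.has(plan)) {
    return { status: 402, code: "PLAN_REQUIRED" };
  }
  if (!h["x-stripe-subscription-id"]) {
    return { status: 402, code: "SUBSCRIPTION_REQUIRED" };
  }
  return { status: 200 };
}
```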
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| messages | array | Yes | Array of message objects with role (user, assistant, or system) and content |
| model | string | No | Model ID. Auto-selected based on taskType if omitted. Must be allowed by your plan. |
| taskType | string | No | Used for auto-selection when model is omitted |
| temperature | number | No | Sampling temperature |
| top_p | number | No | Nucleus sampling parameter |
| max_tokens | number | No | Maximum tokens in the response |
| algorithmMode | boolean | No | When true, injects the PAI Algorithm system prompt into the conversation. This enables a 7-phase structured problem-solving format (Observe, Think, Plan, Build, Execute, Verify, Learn) for the agent’s responses. The system prompt is prepended to the messages array only if no existing system message already contains the Algorithm phases. Defaults to false. |
Example request
{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello!" }
  ],
  "temperature": 0.7,
  "max_tokens": 1024
}
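The example above could be assembled and sent with fetch as sketched below. Note that this document does not state the chat endpoint's route; the path /api/ai/chat, the base URL, and the helper name are assumptions.

```typescript
// Builds a fetch-ready request for the chat endpoint. The /api/ai/chat path
// is an assumption; this document does not state the chat route explicitly.
function buildChatRequest(baseUrl: string, apiKey: string, plan: string, subId: string) {
  return {
    url: `${baseUrl}/api/ai/chat`, // assumed path
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${apiKey}`,
      "x-user-plan": plan,
      "x-stripe-subscription-id": subId,
    },
    body: JSON.stringify({
      messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Hello!" },
      ],
      temperature: 0.7,
      max_tokens: 1024,
    }),
  };
}
// Usage: const { url, ...init } = buildChatRequest("https://backend.example.com", key, "solo", "sub_123");
// await fetch(url, init);
```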
Example request with Algorithm mode
When algorithmMode is enabled, the agent responds using a structured 7-phase format for non-trivial tasks:
{
  "messages": [
    { "role": "user", "content": "Audit the authentication flow for security issues" }
  ],
  "algorithmMode": true
}
Response
Returns a structured response with the following shape:
{
  "id": "chatcmpl-abc123",
  "model": "anthropic/claude-sonnet-4-20250514",
  "provider": "openrouter",
  "message": {
    "role": "assistant",
    "content": "Hello! How can I help you today?"
  },
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 10,
    "total_tokens": 35
  },
  "timestamp": "2026-03-19T00:00:00Z"
}
Errors
| Code | Description |
|---|---|
| 400 | Messages array is required and must be non-empty |
| 402 | Valid subscription required. Returned when the plan header is missing or unrecognized (PLAN_REQUIRED), or when there is no active Stripe subscription (SUBSCRIPTION_REQUIRED). |
| 403 | Model not available on your plan (MODEL_RESTRICTED). The response includes an allowedModels array listing the models your plan supports. |
| 404 | No models available |
| 500 | AI provider error |
402 error example
{
  "success": false,
  "error": "Valid subscription required. Choose a plan at /pricing",
  "code": "PLAN_REQUIRED"
}
403 error example
{
  "error": "Model openai/gpt-4-turbo not available on your plan. Upgrade for more models.",
  "code": "MODEL_RESTRICTED",
  "allowedModels": ["openai/gpt-4o-mini", "google/gemini-2.0-flash"]
}
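The MODEL_RESTRICTED check can be sketched as below. The function name and return shape are illustrative; the solo model list mirrors the plan-based model access table in this document, and "all" stands in for the network plan's unrestricted access.

```typescript
// Sketch of plan-based model gating: a model is allowed if the plan grants
// "all" models or lists the model explicitly; otherwise a 403 payload with
// the plan's allowedModels is produced, as in the 403 example above.
const PLAN_MODELS: Record<string, string[] | "all"> = {
  solo: ["openai/gpt-4o-mini", "google/gemini-2.0-flash", "xiaomi/mimo-v2-pro"],
  network: "all",
};

function checkModelAccess(plan: string, model: string) {
  const allowed = PLAN_MODELS[plan];
  if (allowed === "all" || (allowed ?? []).includes(model)) {
    return { ok: true as const };
  }
  return {
    ok: false as const,
    status: 403,
    code: "MODEL_RESTRICTED",
    allowedModels: allowed ?? [],
  };
}
```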
Token quotas (planned)
Token quotas are not yet enforced. The backend does not currently track token usage or reject requests based on usage limits. The quota system described below is planned for a future release. When implemented, requests exceeding the plan limit will be rejected with a 429 status and a QUOTA_EXCEEDED error code.
| Plan | Monthly token limit (planned) |
|---|---|
| solo | 2,000,000 |
| collective | 6,000,000 |
| label | 20,000,000 |
| network | Unlimited |
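As an illustration only, since the quota system is not yet implemented, the planned rejection behavior might look like the following sketch. The function name and thresholds-as-code are assumptions drawn from the table above.

```typescript
// Illustration of PLANNED behavior only: quotas are NOT enforced today.
// A request over the monthly limit would get a 429 with QUOTA_EXCEEDED.
const PLANNED_LIMITS: Record<string, number> = {
  solo: 2_000_000,
  collective: 6_000_000,
  label: 20_000_000,
  network: Infinity, // unlimited
};

function checkQuota(plan: string, tokensUsedThisMonth: number) {
  const limit = PLANNED_LIMITS[plan] ?? 0;
  if (tokensUsedThisMonth >= limit) {
    return { status: 429, code: "QUOTA_EXCEEDED" };
  }
  return { status: 200 };
}
```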
Plan-based model access
Each subscription plan grants access to a specific set of AI models. The chat endpoint enforces these limits automatically via the plan middleware.
| Plan | Price | Models | Agent limit | Skill limit | A2A messages/day |
|---|---|---|---|---|---|
| solo | £29/mo | openai/gpt-4o-mini, google/gemini-2.0-flash, xiaomi/mimo-v2-pro | 1 | 3 | 100 |
| collective | £69/mo | openai/gpt-4o-mini, openai/gpt-4o, google/gemini-2.0-flash, anthropic/claude-3.5-sonnet, xiaomi/mimo-v2-pro | 10 | 10 | 500 |
| label | £149/mo | openai/gpt-4o-mini, openai/gpt-4o, openai/gpt-4-turbo, google/gemini-2.0-flash, anthropic/claude-3.5-sonnet, anthropic/claude-3-opus, xiaomi/mimo-v2-pro | 3 | 25 | 2,000 |
| network | £499/mo | All models | 100 | 100 | 10,000 |
Admin users are automatically granted network-level access regardless of their subscription plan.
The plan middleware (x-user-plan header) enforces model access, skill limits, and A2A message quotas. The provisioning endpoint enforces separate agent creation limits: solo 1, collective 3, label 10, network unlimited. The provisioning limits determine how many agents you can create, while the middleware limits in the table above apply to per-request AI model access and skill usage.
Model fallbacks
Each AI provider is configured with a primary model and a fallback model. When the primary model is unavailable or returns an error, the system automatically retries the request using the fallback model.
| Provider | Primary model | Fallback model |
|---|---|---|
| openrouter | moonshotai/kimi-k2.5 | openrouter/openai/gpt-4o-mini |
| gemini | google/gemini-2.0-flash | openrouter/anthropic/claude-sonnet-4-5 |
| groq | groq/gemma2-9b-it | openai/gpt-4o-mini |
| anthropic | anthropic/claude-sonnet-4-5 | openai/gpt-4o |
| openai | openai/gpt-4o | openai/gpt-4o-mini |
minimax (MiniMax/MiniMax-Text-01) is available in the provider configuration map but is not currently supported in the model fallback chain. It may be enabled in a future release.
Fallback routing is handled transparently. The response always indicates which model ultimately served the request via the model field.
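The retry behavior can be sketched as below. This is a simplified synchronous illustration (real provider calls would be asynchronous), and the function name is an assumption.

```typescript
// Sketch of transparent fallback routing: try the primary model, retry once
// with the fallback if it fails, and report which model actually served the
// request (matching the response's model field).
type CallModel = (model: string) => string; // throws when the model is unavailable

function completeWithFallback(
  primary: string,
  fallback: string,
  call: CallModel
): { model: string; content: string } {
  try {
    return { model: primary, content: call(primary) };
  } catch {
    // Primary unavailable or errored: retry with the configured fallback.
    return { model: fallback, content: call(fallback) };
  }
}
```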
Task-based model selection
In addition to provider-level fallbacks, the backend AI service uses tag-based model selection that picks the best available model based on the type of work being performed. When you specify a taskType, the system searches the available OpenRouter models for matching capability tags and selects the first match.
| Task type | Matching tags |
|---|---|
| coding | coding, logic |
| analysis | analysis |
| creative | creative |
| long | long-context |
| general | general, balanced |
When no model matches the requested task tags, the first available model from the OpenRouter catalog is used as a fallback. All task-based requests are routed through OpenRouter.
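The selection logic described above can be sketched as follows; the function name and model shape are illustrative assumptions.

```typescript
// Sketch of tag-based selection: map the taskType to its capability tags,
// return the first model carrying a matching tag, else fall back to the
// first available model in the catalog.
const TASK_TAGS: Record<string, string[]> = {
  coding: ["coding", "logic"],
  analysis: ["analysis"],
  creative: ["creative"],
  long: ["long-context"],
  general: ["general", "balanced"],
};

interface CatalogModel { id: string; tags: string[]; }

function selectModel(taskType: string, models: CatalogModel[]): CatalogModel | undefined {
  const wanted = TASK_TAGS[taskType] ?? TASK_TAGS.general;
  const match = models.find((m) => m.tags.some((t) => wanted.includes(t)));
  return match ?? models[0]; // no tag match: first available model
}
```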
Algorithm mode
The chat endpoint supports an optional structured problem-solving mode called PAI Algorithm mode. When enabled via the algorithmMode parameter, a system prompt is injected that instructs the model to process non-trivial tasks using a 7-phase format:
| Phase | Name | Purpose |
|---|---|---|
| 1 | Observe | Reverse-engineer the request: what was asked, what was implied, and what is not wanted. Produce 3–5 Ideal State Criteria (ISC). |
| 2 | Think | Select capabilities and a composition pattern (Pipeline, TDD Loop, Fan-out, or Gate). |
| 3 | Plan | Define concrete numbered steps with clear handoffs. |
| 4 | Build | Create artifacts such as files, configs, or code. |
| 5 | Execute | Run the work using the selected capabilities. |
| 6 | Verify | Test each ISC criterion with evidence, marking each as pass or fail. |
| 7 | Learn | Summarize what worked, what didn’t, and what to improve. |
The Algorithm system prompt is prepended to the messages array as a system message. If the messages already contain a system message with Algorithm phase markers, the prompt is not duplicated. For simple greetings or acknowledgments, the model skips the 7-phase format and responds naturally.
Algorithm mode is opt-in and does not affect billing or model selection. It only modifies the system prompt sent to the model. Based on Daniel Miessler’s TheAlgorithm v0.2.24.
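The injection rule (prepend the prompt, but never duplicate it) can be sketched as follows. The prompt text here is a short placeholder, not the real PAI Algorithm prompt, and the phase-marker detection is an assumption about how "already contains the Algorithm phases" is checked.

```typescript
// Sketch of algorithmMode handling: prepend a system prompt unless a system
// message with Algorithm phase markers is already present.
// ALGORITHM_PROMPT is a placeholder for the real PAI Algorithm prompt.
const ALGORITHM_PROMPT =
  "Process non-trivial tasks in 7 phases: Observe, Think, Plan, Build, Execute, Verify, Learn.";

interface ChatMessage { role: "system" | "user" | "assistant"; content: string; }

function applyAlgorithmMode(messages: ChatMessage[], algorithmMode: boolean): ChatMessage[] {
  if (!algorithmMode) return messages;
  const alreadyPresent = messages.some(
    (m) => m.role === "system" && m.content.includes("Observe") && m.content.includes("Verify")
  );
  if (alreadyPresent) return messages; // never duplicate the prompt
  return [{ role: "system", content: ALGORITHM_PROMPT }, ...messages];
}
```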
Estimate cost
POST /api/ai/estimate-cost
Requires bearer token authentication and a valid subscription plan.
Estimate the cost of a request based on token counts and model pricing.
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID |
| inputTokens | number | Yes | Number of input tokens |
| outputTokens | number | Yes | Number of output tokens |
Response
{
  "model": "anthropic/claude-sonnet-4-20250514",
  "inputTokens": 1000,
  "outputTokens": 500,
  "estimatedCost": 0.0045,
  "currency": "USD",
  "timestamp": "2026-03-19T00:00:00Z"
}
Errors
| Code | Description |
|---|---|
| 400 | Model, inputTokens, and outputTokens are all required |
| 401 | Unauthorized — missing or invalid bearer token |
| 402 | Valid subscription required |
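The arithmetic behind the estimate can be sketched as below. This document does not state the pricing unit for the inputCost and outputCost fields returned by the model listing; the sketch assumes USD per 1,000 tokens, and the function name is an assumption.

```typescript
// Sketch of a token-based cost estimate, assuming inputCost/outputCost are
// USD per 1,000 tokens (the pricing unit is not stated in this document).
function estimateCost(
  inputTokens: number,
  outputTokens: number,
  inputCostPer1K: number,
  outputCostPer1K: number
): number {
  return (inputTokens / 1000) * inputCostPer1K + (outputTokens / 1000) * outputCostPer1K;
}
// e.g. estimateCost(2000, 1000, 0.003, 0.015) → 2 * 0.003 + 1 * 0.015 = 0.021 USD
```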