API Reference

ModelHive exposes an OpenAI-compatible API. If your code already works with the OpenAI SDK, it works with ModelHive — just change the base URL and API key.

Base URL

https://api.modelhive.ai/v1

Authentication

All requests require an API key in the Authorization header:

Authorization: Bearer sk-your-modelhive-key

API keys are created from the ModelHive Dashboard. Each key has its own budget, optional model restrictions, and auto-recharge settings.

Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /v1/chat/completions | Generate a chat completion |
| GET | /v1/models | List available models |
| POST | /v1/embeddings | Generate text embeddings |
| POST | /v1/images/generations | Generate images from text prompts |
| POST | /v1/videos/generations | Create video generation jobs |
| POST | /v1/audio/transcriptions | Transcribe audio to text |
| POST | /v1/audio/speech | Convert text to speech |

Complete Endpoint Map

ModelHive exposes its API publicly on https://api.modelhive.ai and accepts standard Authorization: Bearer sk-... API keys.

Internal admin routes (/ui, /sso, /login) are blocked externally. Public API routes are available under the /v1 prefix.

Core LLM Endpoints

  • POST /v1/chat/completions
  • POST /v1/messages
  • POST /v1/responses
  • POST /v1/responses/compact
  • POST /v1/completions
  • GET /v1/models
  • POST /v1/embeddings
  • POST /v1/rerank
  • POST /v1/moderations
  • POST /v1/fine_tuning/jobs
  • GET|POST /v1/realtime

Media Endpoints

  • POST /v1/images/generations
  • POST /v1/images/edits
  • POST /v1/images/variations
  • POST /v1/audio/transcriptions
  • POST /v1/audio/speech
  • POST /v1/videos (or /v1/videos/generations compatibility route)
  • GET /v1/videos/{video_id}
  • GET /v1/videos/{video_id}/content
  • POST /v1/videos/{video_id}/remix

Data and Workflow Endpoints

  • POST /v1/batches
  • POST /v1/files
  • POST /v1/vector_stores
  • POST /v1/vector_stores/{id}/files
  • POST /v1/vector_stores/{id}/search

Agent and Utility Endpoints

  • POST /v1/assistants
  • POST /v1/a2a/{agent}/message/send
  • POST /v1/interactions
  • POST /v1/ocr
  • POST /v1/rag/ingest
  • POST /v1/rag/query
  • POST /v1/utils/token_counter
  • POST /v1/generateContent
  • POST /v1/containers
  • POST /v1/containers/{id}/files
Info: Availability depends on model/provider support and tenant-level model permissions. If an endpoint is enabled but your model does not support it, the API returns an error (typically 400/404).

Resource-style endpoint families (for example files, vector_stores, assistants, containers, videos) also expose related GET/POST/DELETE sub-routes under the same prefix, following OpenAI compatibility conventions.


Chat Completions

POST /v1/chat/completions

Generate a model response for the given conversation. This is the primary endpoint for all LLM interactions.

Request Body

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Model ID (e.g., gpt-4o, claude-sonnet-4-20250514, gemini/gemini-2.5-pro) |
| messages | array | Yes | Conversation messages. Each message has role and content. Content can be a string or an array of text/image_url parts for multimodal input |
| temperature | number | No | Sampling temperature (0–2). Default: 1 |
| max_tokens | integer | No | Maximum tokens to generate |
| top_p | number | No | Nucleus sampling (0–1) |
| stream | boolean | No | Stream response via SSE. Default: false |
| stop | string/array | No | Stop sequences |
| presence_penalty | number | No | Presence penalty (-2 to 2) |
| frequency_penalty | number | No | Frequency penalty (-2 to 2) |
| tools | array | No | Function/tool definitions |
| tool_choice | string/object | No | Tool selection strategy |
| response_format | object | No | Force structured output (e.g., {"type": "json_object"}) |

Message Format

{
  "role": "user",
  "content": "Hello, how are you?"
}

For multimodal requests (images, PDFs):

{
  "role": "user",
  "content": [
    {"type": "text", "text": "Describe this image."},
    {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
  ]
}

Example — Basic

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-modelhive-key",
    base_url="https://api.modelhive.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    temperature=0.7,
    max_tokens=500
)

print(response.choices[0].message.content)

Example — Streaming

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a poem about AI."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Example — With Image

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"}
                }
            ]
        }
    ],
    max_tokens=500
)

Example — With Base64 Image

import base64

with open("screenshot.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this screenshot."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{b64}"}
                }
            ]
        }
    ]
)

Example — With PDF

import base64

with open("report.pdf", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gemini/gemini-2.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize this report."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:application/pdf;base64,{b64}"}
                }
            ]
        }
    ]
)

Example — Function Calling

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Rome?"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "City name"}
                    },
                    "required": ["city"]
                }
            }
        }
    ],
    tool_choice="auto"
)

# The model may return a tool_call:
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(f"Function: {tool_calls[0].function.name}")
    print(f"Arguments: {tool_calls[0].function.arguments}")
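After the model returns a tool call, your code runs the named function locally and sends the result back as a role: "tool" message on the next request. A minimal sketch of that roundtrip, shown with plain dicts rather than SDK objects; the dispatch_tool_call helper and the get_weather registry entry are illustrative, not part of the API:

```python
import json

def dispatch_tool_call(tool_call, registry):
    """Run the local function named by a tool call and build the
    follow-up 'tool' message for the next /v1/chat/completions request."""
    name = tool_call["function"]["name"]
    # Arguments arrive as a JSON-encoded string, not a dict
    args = json.loads(tool_call["function"]["arguments"])
    result = registry[name](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }

# Hypothetical local implementation of the get_weather tool
registry = {"get_weather": lambda city: {"city": city, "temp_c": 21}}

tool_call = {
    "id": "call_1",
    "function": {"name": "get_weather", "arguments": '{"city": "Rome"}'},
}
message = dispatch_tool_call(tool_call, registry)
```

Append this message (together with the assistant message that contained the tool call) to messages and call the endpoint again to get the model's final answer.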

Example — JSON Mode

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Respond in JSON format."},
        {"role": "user", "content": "List 3 European capitals with population."}
    ],
    response_format={"type": "json_object"}
)

import json
data = json.loads(response.choices[0].message.content)
print(data)

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum mechanics..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}

Special Headers

| Header | Value | Description |
|--------|-------|-------------|
| x-hive-cache | false | Skip HiveCache lookup for this request (response is still cached for future hits) |
| x-hive-guard | none | Disable all security guardrails for this request |
| x-hive-guard | prompt-injection,toxicity | Run only the listed guardrails (comma-separated) for this request |
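With the OpenAI SDK, per-request headers can be passed via the extra_headers keyword. The hive_headers helper below is a hypothetical convenience for building these overrides, not part of the SDK or the ModelHive API:

```python
def hive_headers(use_cache=True, guardrails="default"):
    """Build per-request ModelHive header overrides.

    guardrails: "default" (no override), None (disable all guardrails),
    or a list of guardrail names to run exclusively.
    """
    headers = {}
    if not use_cache:
        headers["x-hive-cache"] = "false"
    if guardrails is None:
        headers["x-hive-guard"] = "none"
    elif guardrails != "default":
        headers["x-hive-guard"] = ",".join(guardrails)
    return headers

# e.g. client.chat.completions.create(..., extra_headers=hive_headers(use_cache=False))
headers = hive_headers(use_cache=False, guardrails=["prompt-injection", "toxicity"])
```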

List Models

GET /v1/models

List all models available to your API key.

Example

models = client.models.list()
for model in models.data:
    print(model.id)

Response

{
  "object": "list",
  "data": [
    {
      "id": "gpt-4o",
      "object": "model",
      "owned_by": "openai"
    },
    {
      "id": "claude-sonnet-4-20250514",
      "object": "model",
      "owned_by": "anthropic"
    },
    {
      "id": "gemini/gemini-2.5-pro",
      "object": "model",
      "owned_by": "google"
    }
  ]
}
Info: The list of models depends on which models your tenant administrator has enabled for your organization.


Embeddings

POST /v1/embeddings

Generate vector embeddings for text input.

Request Body

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Embedding model ID |
| input | string/array | Yes | Text to embed (string or array of strings) |
| dimensions | integer | No | Optional output vector size (supported by specific models) |
| encoding_format | string | No | float (default) or base64 |
| user | string | No | End-user identifier for tracing/abuse monitoring |

Example — Basic

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="ModelHive is an AI gateway platform."
)

print(f"Dimensions: {len(response.data[0].embedding)}")

Example — Batch Input

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=[
        "ModelHive routes requests across providers.",
        "Embeddings are useful for semantic search."
    ],
    encoding_format="float"
)

for item in response.data:
    print(item.index, len(item.embedding))

Response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023, -0.0091, 0.0152, ...]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
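A common use of embeddings is semantic similarity: embed two texts and compare their vectors with cosine similarity. A minimal stdlib-only sketch; the toy vectors stand in for real values from response.data[i].embedding:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings
v1 = [0.1, 0.2, 0.7]
v2 = [0.1, 0.25, 0.65]
score = cosine_similarity(v1, v2)  # close to 1.0 for semantically similar texts
```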

Image Generations

POST /v1/images/generations

Generate one or more images from a text prompt.

Request Body

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Image model ID (e.g., gpt-image-1, dall-e-3) |
| prompt | string | Yes | Natural language prompt describing the image |
| n | integer | No | Number of images to generate |
| size | string | No | Output size (e.g., 1024x1024, model-dependent) |
| quality | string | No | Quality preset (model-dependent) |
| response_format | string | No | url or b64_json |
| style | string | No | Style preset where supported (e.g., vivid, natural) |
| user | string | No | End-user identifier for tracing/abuse monitoring |

Example

image = client.images.generate(
    model="gpt-image-1",
    prompt="A futuristic city at sunrise, cinematic light",
    size="1024x1024",
    quality="high"
)

print(image.data[0].url)

Response

{
  "created": 1709000000,
  "data": [
    {
      "url": "https://cdn.example.com/generated/image-001.png"
    }
  ]
}
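When response_format is b64_json, the image arrives inline rather than as a URL, and clients decode it to disk. A sketch; save_b64_image is an illustrative helper, and the b64_json field name follows the OpenAI images response shape:

```python
import base64

def save_b64_image(b64_data, path):
    """Decode a b64_json image payload and write it to a file."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(b64_data))

# e.g. save_b64_image(image.data[0].b64_json, "sunrise.png")
```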

Video Generations

POST /v1/videos/generations

Create an asynchronous video generation job from a prompt.

Info: Many OpenAI-compatible deployments expose the same operation on POST /v1/videos. If /v1/videos/generations is not available in your runtime, use POST /v1/videos with the same payload.

Request Body

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Video model ID (e.g., sora-2, provider aliases) |
| prompt | string | Yes | Prompt describing the video to generate |
| seconds | string/integer | No | Video duration (provider/model dependent) |
| size | string | No | Resolution (e.g., 720x1280) |
| input_reference | string/file | No | Optional reference image for image-to-video/editing flows |
| user | string | No | End-user identifier for tracing/abuse monitoring |

Example — Create Job

import requests

response = requests.post(
    "https://api.modelhive.ai/v1/videos/generations",
    headers={
        "Authorization": "Bearer sk-your-modelhive-key",
        "Content-Type": "application/json"
    },
    json={
        "model": "sora-2",
        "prompt": "A cinematic drone shot over snowy mountains",
        "seconds": "8",
        "size": "720x1280"
    },
    timeout=120
)

video = response.json()
print(video["id"], video["status"])

Example — Check Status

curl https://api.modelhive.ai/v1/videos/video_abc123 \
  -H "Authorization: Bearer sk-your-modelhive-key"

Response

{
  "id": "video_abc123",
  "object": "video",
  "status": "queued",
  "created_at": 1709000000,
  "model": "sora-2",
  "seconds": "8",
  "size": "720x1280"
}
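Because video jobs are asynchronous, clients typically poll GET /v1/videos/{video_id} until the job reaches a terminal status, then download the result from GET /v1/videos/{video_id}/content. The helper below is a sketch: the terminal status names ("completed", "failed") are assumptions based on OpenAI-style video jobs, and fetch_status is any callable that returns the job JSON:

```python
import time

def wait_for_video(fetch_status, poll_interval=5.0, timeout=600.0):
    """Poll a video job until it reaches a terminal status.

    fetch_status: zero-argument callable returning the job dict, e.g. one that
    wraps requests.get(f"{base_url}/v1/videos/{video_id}", headers=auth).json()
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status()
        if job.get("status") in ("completed", "failed"):
            return job
        time.sleep(poll_interval)
    raise TimeoutError("video job did not reach a terminal status in time")
```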

Audio Transcriptions

POST /v1/audio/transcriptions

Convert an audio file into text.

Request Body (multipart/form-data)

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Speech-to-text model ID (e.g., whisper-1, provider aliases) |
| file | file | Yes | Audio file to transcribe |
| language | string | No | Language hint (ISO code, improves accuracy/latency) |
| prompt | string | No | Optional context prompt to bias transcription |
| response_format | string | No | json, text, srt, vtt, or verbose_json |
| temperature | number | No | Sampling temperature (usually keep low for STT) |

Example

with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )

print(transcript.text)

Response

{
  "text": "Welcome everyone. Today we will review the Q2 roadmap..."
}

Audio Speech

POST /v1/audio/speech

Convert text to spoken audio.

Request Body

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | TTS model ID (e.g., tts-1, provider aliases) |
| input | string | Yes | Text to synthesize |
| voice | string | Yes | Voice preset (e.g., alloy, nova) |
| response_format | string | No | mp3, wav, opus, aac, flac, pcm |
| speed | number | No | Speaking speed (model-dependent range) |

Example

speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="ModelHive routes AI traffic across multiple providers."
)

speech.stream_to_file("speech.mp3")

Response

Returns binary audio data in the selected response_format (default typically mp3).


Error Codes

| HTTP Code | Meaning |
|-----------|---------|
| 400 | Bad request — invalid parameters |
| 401 | Invalid or missing API key |
| 402 | Insufficient budget — recharge your key or wallet |
| 403 | Request blocked by security guardrails |
| 404 | Model not found or not enabled for your tenant |
| 429 | Rate limit exceeded — try again shortly |
| 500 | Internal server error |

Error Response Format

{
  "error": {
    "message": "Insufficient budget. Remaining: $0.12, estimated cost: $0.50",
    "type": "budget_exceeded",
    "code": 402
  }
}
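Client code can branch on the status code and error type to decide whether to retry, recharge, or fail fast. A minimal sketch; the status-code grouping mirrors the table above, and classify_error is an illustrative helper, not part of the API:

```python
def classify_error(status_code, body):
    """Map a ModelHive error response to a coarse handling decision."""
    err = body.get("error", {})
    if status_code == 429 or status_code >= 500:
        action = "retry"      # transient: back off and try again
    elif status_code == 402:
        action = "recharge"   # budget exhausted on this key or wallet
    else:
        action = "fail"       # caller error: fix the request instead of retrying
    return {"action": action, "type": err.get("type"), "message": err.get("message")}

decision = classify_error(402, {
    "error": {"message": "Insufficient budget", "type": "budget_exceeded", "code": 402}
})
```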

Rate Limits

Rate limits are applied per API key. If you hit rate limits, the response includes:

| Header | Description |
|--------|-------------|
| x-ratelimit-limit-requests | Max requests per minute |
| x-ratelimit-remaining-requests | Remaining requests |
| x-ratelimit-reset-requests | Time until reset |
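A common pattern for handling 429 responses is exponential backoff with jitter. The sketch below retries any callable that signals rate limiting; RateLimited is a hypothetical exception your HTTP layer would raise on a 429 status:

```python
import random
import time

class RateLimited(Exception):
    """Raised by the caller's HTTP layer on an HTTP 429 response."""

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry `call` on RateLimited, doubling the delay each attempt plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimited:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```

Honoring the x-ratelimit-reset-requests header, when present, gives a tighter wait than blind doubling.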