API Reference
ModelHive exposes an OpenAI-compatible API. If your code already works with the OpenAI SDK, it works with ModelHive — just change the base URL and API key.
Base URL
https://api.modelhive.ai/v1
Authentication
All requests require an API key in the Authorization header:
Authorization: Bearer sk-your-modelhive-key
API keys are created from the ModelHive Dashboard. Each key has its own budget, optional model restrictions, and auto-recharge settings.
Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/chat/completions | Generate a chat completion |
| GET | /v1/models | List available models |
| POST | /v1/embeddings | Generate text embeddings |
| POST | /v1/images/generations | Generate images from text prompts |
| POST | /v1/videos/generations | Create video generation jobs |
| POST | /v1/audio/transcriptions | Transcribe audio to text |
| POST | /v1/audio/speech | Convert text to speech |
Complete Endpoint Map
ModelHive exposes its API publicly at https://api.modelhive.ai and accepts standard Authorization: Bearer sk-... API keys. Internal admin routes (/ui, /sso, /login) are blocked externally; public API routes are available under the /v1 prefix.
Core LLM Endpoints
- POST /v1/chat/completions
- POST /v1/messages
- POST /v1/responses
- POST /v1/responses/compact
- POST /v1/completions
- GET /v1/models
- POST /v1/embeddings
- POST /v1/rerank
- POST /v1/moderations
- POST /v1/fine_tuning/jobs
- GET|POST /v1/realtime
Media Endpoints
- POST /v1/images/generations
- POST /v1/images/edits
- POST /v1/images/variations
- POST /v1/audio/transcriptions
- POST /v1/audio/speech
- POST /v1/videos (or /v1/videos/generations compatibility route)
- GET /v1/videos/{video_id}
- GET /v1/videos/{video_id}/content
- POST /v1/videos/{video_id}/remix
Data and Workflow Endpoints
- POST /v1/batches
- POST /v1/files
- POST /v1/vector_stores
- POST /v1/vector_stores/{id}/files
- POST /v1/vector_stores/{id}/search
Agent and Utility Endpoints
- POST /v1/assistants
- POST /v1/a2a/{agent}/message/send
- POST /v1/interactions
- POST /v1/ocr
- POST /v1/rag/ingest
- POST /v1/rag/query
- POST /v1/utils/token_counter
- POST /v1/generateContent
- POST /v1/containers
- POST /v1/containers/{id}/files
Availability depends on model/provider support and tenant-level model permissions. If an endpoint is enabled but your model does not support it, the API returns an error (typically 400/404).
Resource-style endpoint families (for example files, vector_stores, assistants, containers, videos) also expose related GET/POST/DELETE sub-routes under the same prefix according to OpenAI compatibility.
Chat Completions
POST /v1/chat/completions
Generate a model response for the given conversation. This is the primary endpoint for all LLM interactions.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID (e.g., gpt-4o, claude-sonnet-4-20250514, gemini/gemini-2.5-pro) |
| messages | array | Yes | Conversation messages. Each message has role and content; content can be a string or an array of text/image_url parts for multimodal input |
| temperature | number | No | Sampling temperature (0–2). Default: 1 |
| max_tokens | integer | No | Maximum tokens to generate |
| top_p | number | No | Nucleus sampling (0–1) |
| stream | boolean | No | Stream response via SSE. Default: false |
| stop | string/array | No | Stop sequences |
| presence_penalty | number | No | Presence penalty (-2 to 2) |
| frequency_penalty | number | No | Frequency penalty (-2 to 2) |
| tools | array | No | Function/tool definitions |
| tool_choice | string/object | No | Tool selection strategy |
| response_format | object | No | Force structured output (e.g., {"type": "json_object"}) |
Message Format
{
"role": "user",
"content": "Hello, how are you?"
}
For multimodal requests (images, PDFs):
{
"role": "user",
"content": [
{"type": "text", "text": "Describe this image."},
{"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
]
}
Example — Basic
- Python
- JavaScript
- cURL
from openai import OpenAI
client = OpenAI(
api_key="sk-your-modelhive-key",
base_url="https://api.modelhive.ai/v1"
)
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain quantum computing in simple terms."}
],
temperature=0.7,
max_tokens=500
)
print(response.choices[0].message.content)
import OpenAI from 'openai';
const client = new OpenAI({
apiKey: 'sk-your-modelhive-key',
baseURL: 'https://api.modelhive.ai/v1',
});
const response = await client.chat.completions.create({
model: 'gpt-4o',
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'Explain quantum computing in simple terms.' },
],
temperature: 0.7,
max_tokens: 500,
});
console.log(response.choices[0].message.content);
curl -X POST https://api.modelhive.ai/v1/chat/completions \
-H "Authorization: Bearer sk-your-modelhive-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain quantum computing in simple terms."}
],
"temperature": 0.7,
"max_tokens": 500
}'
Example — Streaming
- Python
- JavaScript
stream = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Write a poem about AI."}],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="", flush=True)
const stream = await client.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'user', content: 'Write a poem about AI.' }],
stream: true,
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
Example — With Image
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image?"},
{
"type": "image_url",
"image_url": {"url": "https://example.com/photo.jpg"}
}
]
}
],
max_tokens=500
)
Example — With Base64 Image
import base64
with open("screenshot.png", "rb") as f:
b64 = base64.b64encode(f.read()).decode("utf-8")
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "Describe this screenshot."},
{
"type": "image_url",
"image_url": {"url": f"data:image/png;base64,{b64}"}
}
]
}
]
)
Example — With PDF
import base64
with open("report.pdf", "rb") as f:
b64 = base64.b64encode(f.read()).decode("utf-8")
response = client.chat.completions.create(
model="gemini/gemini-2.5-pro",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "Summarize this report."},
{
"type": "image_url",
"image_url": {"url": f"data:application/pdf;base64,{b64}"}
}
]
}
]
)
Example — Function Calling
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "What's the weather in Rome?"}],
tools=[
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a city",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string", "description": "City name"}
},
"required": ["city"]
}
}
}
],
tool_choice="auto"
)
# The model may return a tool_call:
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
print(f"Function: {tool_calls[0].function.name}")
print(f"Arguments: {tool_calls[0].function.arguments}")
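To complete the tool-call loop, execute the requested function locally, send its result back as a `tool` message, and call the API again so the model can produce a final answer. The sketch below assumes a local `get_weather` implementation (hypothetical, matching the tool declared above) and the standard OpenAI tool-call shape (`tool_call.id`, `function.name`, `function.arguments`):

```python
import json

def get_weather(city):
    # Hypothetical local implementation of the declared tool.
    return {"city": city, "temp_c": 21, "condition": "sunny"}

TOOLS = {"get_weather": get_weather}

def tool_result_message(tool_call):
    """Execute the requested function and wrap its result as a 'tool' message."""
    args = json.loads(tool_call.function.arguments)
    result = TOOLS[tool_call.function.name](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result),
    }

# After the first response returns tool_calls, send the results back:
# messages.append(response.choices[0].message)
# messages.extend(tool_result_message(tc) for tc in tool_calls)
# final = client.chat.completions.create(model="gpt-4o", messages=messages)
# print(final.choices[0].message.content)
```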
Example — JSON Mode
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": "Respond in JSON format."},
{"role": "user", "content": "List 3 European capitals with population."}
],
response_format={"type": "json_object"}
)
import json
data = json.loads(response.choices[0].message.content)
print(data)
Response
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1709000000,
"model": "gpt-4o",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Quantum computing uses quantum mechanics..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 25,
"completion_tokens": 150,
"total_tokens": 175
}
}
Special Headers
| Header | Value | Description |
|---|---|---|
| x-hive-cache | false | Skip HiveCache lookup for this request (response is still cached for future hits) |
| x-hive-guard | none | Disable all security guardrails for this request |
| x-hive-guard | prompt-injection,toxicity | Run only the listed guardrails (comma-separated) for this request |
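These headers are set per request. With the OpenAI Python SDK you can pass them through the `extra_headers` argument; the `hive_headers` helper below is a hypothetical convenience, not part of any SDK:

```python
def hive_headers(cache=True, guards=None):
    """Build per-request ModelHive headers (hypothetical helper).

    guards: None keeps the default guardrails, [] disables all of them,
    and a list of names runs only those guardrails.
    """
    headers = {}
    if not cache:
        headers["x-hive-cache"] = "false"
    if guards is not None:
        headers["x-hive-guard"] = ",".join(guards) if guards else "none"
    return headers

# With the OpenAI SDK, pass the headers via extra_headers:
# client.chat.completions.create(
#     model="gpt-4o",
#     messages=[{"role": "user", "content": "Hi"}],
#     extra_headers=hive_headers(cache=False, guards=["prompt-injection", "toxicity"]),
# )
```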
List Models
GET /v1/models
List all models available to your API key.
Example
- Python
- cURL
models = client.models.list()
for model in models.data:
print(model.id)
curl https://api.modelhive.ai/v1/models \
-H "Authorization: Bearer sk-your-modelhive-key"
Response
{
"object": "list",
"data": [
{
"id": "gpt-4o",
"object": "model",
"owned_by": "openai"
},
{
"id": "claude-sonnet-4-20250514",
"object": "model",
"owned_by": "anthropic"
},
{
"id": "gemini/gemini-2.5-pro",
"object": "model",
"owned_by": "google"
}
]
}
The models returned depend on which models your tenant administrator has enabled for your organization.
Embeddings
POST /v1/embeddings
Generate vector embeddings for text input.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Embedding model ID |
| input | string/array | Yes | Text to embed (string or array of strings) |
| dimensions | integer | No | Output vector size (supported by specific models) |
| encoding_format | string | No | float (default) or base64 |
| user | string | No | End-user identifier for tracing/abuse monitoring |
Example — Basic
- Python
- JavaScript
- cURL
response = client.embeddings.create(
model="text-embedding-3-small",
input="ModelHive is an AI gateway platform."
)
print(f"Dimensions: {len(response.data[0].embedding)}")
const response = await client.embeddings.create({
model: 'text-embedding-3-small',
input: 'ModelHive is an AI gateway platform.',
});
console.log(response.data[0].embedding.length);
curl -X POST https://api.modelhive.ai/v1/embeddings \
-H "Authorization: Bearer sk-your-modelhive-key" \
-H "Content-Type: application/json" \
-d '{
"model": "text-embedding-3-small",
"input": "ModelHive is an AI gateway platform."
}'
Example — Batch Input
response = client.embeddings.create(
model="text-embedding-3-small",
input=[
"ModelHive routes requests across providers.",
"Embeddings are useful for semantic search."
],
encoding_format="float"
)
for item in response.data:
print(item.index, len(item.embedding))
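A common next step with batch embeddings is comparing vectors for semantic search. A minimal sketch using plain Python (the vectors come from `response.data` above; in production you would typically use NumPy):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# vectors = [item.embedding for item in response.data]
# print(cosine_similarity(vectors[0], vectors[1]))
```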
Response
{
"object": "list",
"data": [
{
"object": "embedding",
"index": 0,
"embedding": [0.0023, -0.0091, 0.0152, ...]
}
],
"model": "text-embedding-3-small",
"usage": {
"prompt_tokens": 8,
"total_tokens": 8
}
}
Image Generations
POST /v1/images/generations
Generate one or more images from a text prompt.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Image model ID (e.g., gpt-image-1, dall-e-3) |
| prompt | string | Yes | Natural language prompt describing the image |
| n | integer | No | Number of images to generate |
| size | string | No | Output size (e.g., 1024x1024, model-dependent) |
| quality | string | No | Quality preset (model-dependent) |
| response_format | string | No | url or b64_json |
| style | string | No | Style preset where supported (e.g., vivid, natural) |
| user | string | No | End-user identifier for tracing/abuse monitoring |
Example
- Python
- JavaScript
- cURL
image = client.images.generate(
model="gpt-image-1",
prompt="A futuristic city at sunrise, cinematic light",
size="1024x1024",
quality="high"
)
print(image.data[0].url)
const image = await client.images.generate({
model: 'gpt-image-1',
prompt: 'A futuristic city at sunrise, cinematic light',
size: '1024x1024',
quality: 'high',
});
console.log(image.data[0].url);
curl -X POST https://api.modelhive.ai/v1/images/generations \
-H "Authorization: Bearer sk-your-modelhive-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-image-1",
"prompt": "A futuristic city at sunrise, cinematic light",
"size": "1024x1024",
"quality": "high"
}'
Response
{
"created": 1709000000,
"data": [
{
"url": "https://cdn.example.com/generated/image-001.png"
}
]
}
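When the response carries a base64 payload instead of a URL (via the `response_format: b64_json` parameter above; whether a model accepts it, or returns base64 by default, is model-dependent), decode and write it locally. The `save_b64_image` helper is a hypothetical convenience:

```python
import base64

def save_b64_image(b64_data, path):
    """Decode a b64_json image payload and write it to disk."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(b64_data))

# image = client.images.generate(
#     model="dall-e-3",
#     prompt="A futuristic city at sunrise, cinematic light",
#     response_format="b64_json",
# )
# save_b64_image(image.data[0].b64_json, "city.png")
```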
Video Generations
POST /v1/videos/generations
Create an asynchronous video generation job from a prompt.
Many OpenAI-compatible deployments expose the same operation on POST /v1/videos. If /v1/videos/generations is not available in your runtime, use POST /v1/videos with the same payload.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Video model ID (e.g., sora-2, provider aliases) |
| prompt | string | Yes | Prompt describing the video to generate |
| seconds | string/integer | No | Video duration (provider/model dependent) |
| size | string | No | Resolution (e.g., 720x1280) |
| input_reference | string/file | No | Optional reference image for image-to-video/editing flows |
| user | string | No | End-user identifier for tracing/abuse monitoring |
Example — Create Job
- Python
- JavaScript
- cURL
import requests
response = requests.post(
"https://api.modelhive.ai/v1/videos/generations",
headers={
"Authorization": "Bearer sk-your-modelhive-key",
"Content-Type": "application/json"
},
json={
"model": "sora-2",
"prompt": "A cinematic drone shot over snowy mountains",
"seconds": "8",
"size": "720x1280"
},
timeout=120
)
video = response.json()
print(video["id"], video["status"])
const response = await fetch('https://api.modelhive.ai/v1/videos/generations', {
method: 'POST',
headers: {
Authorization: 'Bearer sk-your-modelhive-key',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'sora-2',
prompt: 'A cinematic drone shot over snowy mountains',
seconds: '8',
size: '720x1280',
}),
});
const video = await response.json();
console.log(video.id, video.status);
curl -X POST https://api.modelhive.ai/v1/videos/generations \
-H "Authorization: Bearer sk-your-modelhive-key" \
-H "Content-Type: application/json" \
-d '{
"model": "sora-2",
"prompt": "A cinematic drone shot over snowy mountains",
"seconds": "8",
"size": "720x1280"
}'
Example — Check Status
curl https://api.modelhive.ai/v1/videos/video_abc123 \
-H "Authorization: Bearer sk-your-modelhive-key"
Response
{
"id": "video_abc123",
"object": "video",
"status": "queued",
"created_at": 1709000000,
"model": "sora-2",
"seconds": "8",
"size": "720x1280"
}
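Because video jobs are asynchronous, a client typically polls GET /v1/videos/{video_id} until the job finishes, then downloads the result from the /content sub-route. A minimal polling sketch; the terminal status names ("completed", "failed") are assumptions about the provider's job lifecycle, so check the statuses your models actually report:

```python
import time

import requests

BASE = "https://api.modelhive.ai/v1"
HEADERS = {"Authorization": "Bearer sk-your-modelhive-key"}

def fetch_status(video_id):
    r = requests.get(f"{BASE}/videos/{video_id}", headers=HEADERS, timeout=30)
    r.raise_for_status()
    return r.json()

def wait_for_video(video_id, fetch=fetch_status, interval=5.0,
                   timeout=600.0, sleep=time.sleep):
    """Poll the job until it reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch(video_id)
        if job["status"] in ("completed", "failed"):
            return job
        sleep(interval)
    raise TimeoutError(f"video {video_id} not finished after {timeout}s")

# job = wait_for_video("video_abc123")
# if job["status"] == "completed":
#     content = requests.get(f"{BASE}/videos/video_abc123/content",
#                            headers=HEADERS, timeout=120)
#     with open("video.mp4", "wb") as f:
#         f.write(content.content)
```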
Audio Transcriptions
POST /v1/audio/transcriptions
Convert an audio file into text.
Request Body (multipart/form-data)
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Speech-to-text model ID (e.g., whisper-1, provider aliases) |
| file | file | Yes | Audio file to transcribe |
| language | string | No | Language hint (ISO 639-1 code; improves accuracy and latency) |
| prompt | string | No | Optional context prompt to bias transcription |
| response_format | string | No | json, text, srt, vtt, or verbose_json |
| temperature | number | No | Sampling temperature (usually keep low for STT) |
Example
- Python
- JavaScript
- cURL
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )
print(transcript.text)
import fs from 'fs';
const transcript = await client.audio.transcriptions.create({
model: 'whisper-1',
file: fs.createReadStream('meeting.mp3'),
});
console.log(transcript.text);
curl -X POST https://api.modelhive.ai/v1/audio/transcriptions \
-H "Authorization: Bearer sk-your-modelhive-key" \
-F file="@meeting.mp3" \
-F model="whisper-1"
Response
{
"text": "Welcome everyone. Today we will review the Q2 roadmap..."
}
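The API can return subtitles directly with `response_format="srt"` or `"vtt"`. If you need custom post-processing instead, request `verbose_json` and render the timestamped segments yourself. The sketch below assumes each segment is a dict with `start`/`end` in seconds and a `text` field, which is the typical Whisper verbose_json shape:

```python
def segments_to_srt(segments):
    """Render verbose_json segments (start/end in seconds, plus text) as SubRip."""
    def stamp(t):
        ms = int(round(t * 1000))
        h, ms = divmod(ms, 3_600_000)
        m, ms = divmod(ms, 60_000)
        s, ms = divmod(ms, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    blocks = [
        f"{i}\n{stamp(seg['start'])} --> {stamp(seg['end'])}\n{seg['text'].strip()}\n"
        for i, seg in enumerate(segments, 1)
    ]
    return "\n".join(blocks)

# verbose = transcription response with response_format="verbose_json"
# print(segments_to_srt(verbose["segments"]))
```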
Audio Speech
POST /v1/audio/speech
Convert text to spoken audio.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | TTS model ID (e.g., tts-1, provider aliases) |
| input | string | Yes | Text to synthesize |
| voice | string | Yes | Voice preset (e.g., alloy, nova) |
| response_format | string | No | mp3, wav, opus, aac, flac, pcm |
| speed | number | No | Speaking speed (model-dependent range) |
Example
- Python
- JavaScript
- cURL
speech = client.audio.speech.create(
model="tts-1",
voice="alloy",
input="ModelHive routes AI traffic across multiple providers."
)
speech.stream_to_file("speech.mp3")
import fs from 'fs/promises';
const speech = await client.audio.speech.create({
model: 'tts-1',
voice: 'alloy',
input: 'ModelHive routes AI traffic across multiple providers.',
});
const buffer = Buffer.from(await speech.arrayBuffer());
await fs.writeFile('speech.mp3', buffer);
curl -X POST https://api.modelhive.ai/v1/audio/speech \
-H "Authorization: Bearer sk-your-modelhive-key" \
-H "Content-Type: application/json" \
-d '{
"model": "tts-1",
"voice": "alloy",
"input": "ModelHive routes AI traffic across multiple providers."
}' \
--output speech.mp3
Response
Returns binary audio data in the selected response_format (default typically mp3).
Error Codes
| HTTP Code | Meaning |
|---|---|
| 400 | Bad request: invalid parameters |
| 401 | Invalid or missing API key |
| 402 | Insufficient budget: recharge your key or wallet |
| 403 | Request blocked by security guardrails |
| 404 | Model not found or not enabled for your tenant |
| 429 | Rate limit exceeded: try again shortly |
| 500 | Internal server error |
Error Response Format
{
"error": {
"message": "Insufficient budget. Remaining: $0.12, estimated cost: $0.50",
"type": "budget_exceeded",
"code": 402
}
}
Rate Limits
Rate limits are applied per API key. If you hit rate limits, the response includes:
| Header | Description |
|---|---|
| x-ratelimit-limit-requests | Max requests per minute |
| x-ratelimit-remaining-requests | Remaining requests |
| x-ratelimit-reset-requests | Time until reset |
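A standard way to handle 429s (and transient 5xx errors) is exponential backoff with jitter. A minimal, transport-agnostic sketch; `with_retries` is a hypothetical helper, and the retryable status set is a common convention rather than a ModelHive guarantee:

```python
import random
import time

RETRYABLE = {429, 500, 502, 503}

def with_retries(call, max_attempts=5, base_delay=1.0,
                 rand=random.random, sleep=time.sleep):
    """Retry `call` on retryable HTTP statuses with exponential backoff + jitter.

    `call` returns (status_code, result); the network transport is up to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        status, result = call()
        if status not in RETRYABLE or attempt == max_attempts:
            return status, result
        # Delays of 1s, 2s, 4s, ... plus up to 0.5s of jitter.
        sleep(base_delay * 2 ** (attempt - 1) + 0.5 * rand())

# def call():
#     r = requests.post(url, headers=headers, json=payload, timeout=120)
#     return r.status_code, r
# status, resp = with_retries(call)
```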