API Reference

ModelHive exposes an OpenAI-compatible API. If your code already works with the OpenAI SDK, it works with ModelHive — just change the base URL and API key.

Base URL

https://api.modelhive.ai/v1

Authentication

All requests require an API key in the Authorization header:

Authorization: Bearer sk-your-modelhive-key

API keys are created from the ModelHive Dashboard. Each key has its own budget, optional model restrictions, and auto-recharge settings.

Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /v1/chat/completions | Generate a chat completion |
| GET | /v1/models | List available models |
| POST | /v1/embeddings | Generate text embeddings |
| POST | /v1/images/generations | Generate images from text prompts |
| POST | /v1/videos/generations | Create video generation jobs |
| POST | /v1/audio/transcriptions | Transcribe audio to text |
| POST | /v1/audio/speech | Convert text to speech |

Complete Endpoint Map

ModelHive exposes its API publicly on https://api.modelhive.ai and accepts standard Authorization: Bearer sk-... API keys.

Internal admin routes (/ui, /sso, /login) are blocked externally. Public API routes are available under the /v1 prefix.

Core LLM Endpoints

  • POST /v1/chat/completions
  • POST /v1/messages
  • POST /v1/responses
  • POST /v1/responses/compact
  • POST /v1/completions
  • GET /v1/models
  • POST /v1/embeddings
  • POST /v1/rerank
  • POST /v1/moderations
  • POST /v1/fine_tuning/jobs
  • GET|POST /v1/realtime

Media Endpoints

  • POST /v1/images/generations
  • POST /v1/images/edits
  • POST /v1/images/variations
  • POST /v1/audio/transcriptions
  • POST /v1/audio/speech
  • POST /v1/videos (or /v1/videos/generations compatibility route)
  • GET /v1/videos/{video_id}
  • GET /v1/videos/{video_id}/content
  • POST /v1/videos/{video_id}/remix

Data and Workflow Endpoints

  • POST /v1/batches
  • POST /v1/files
  • POST /v1/vector_stores
  • POST /v1/vector_stores/{id}/files
  • POST /v1/vector_stores/{id}/search

Agent and Utility Endpoints

  • POST /v1/assistants
  • POST /v1/a2a/{agent}/message/send
  • POST /v1/interactions
  • POST /v1/ocr
  • POST /v1/rag/ingest
  • POST /v1/rag/query
  • POST /v1/utils/token_counter
  • POST /v1/generateContent
  • POST /v1/containers
  • POST /v1/containers/{id}/files
Info: Availability depends on model/provider support and tenant-level model permissions. If an endpoint is enabled but your model does not support it, the API returns an error (typically 400/404).

Resource-style endpoint families (for example files, vector_stores, assistants, containers, videos) also expose related GET/POST/DELETE sub-routes under the same prefix, following OpenAI compatibility conventions.


Chat Completions

POST /v1/chat/completions

Generate a model response for the given conversation. This is the primary endpoint for all LLM interactions.

Request Body

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Model ID (e.g., gpt-4o, claude-sonnet-4-20250514, gemini/gemini-2.5-pro) |
| messages | array | Yes | Conversation messages. Each message has role and content. Content can be a string or an array of text/image_url parts for multimodal input |
| temperature | number | No | Sampling temperature (0–2). Default: 1 |
| max_tokens | integer | No | Maximum tokens to generate |
| top_p | number | No | Nucleus sampling (0–1) |
| stream | boolean | No | Stream response via SSE. Default: false |
| stop | string/array | No | Stop sequences |
| presence_penalty | number | No | Presence penalty (-2 to 2) |
| frequency_penalty | number | No | Frequency penalty (-2 to 2) |
| tools | array | No | Function/tool definitions |
| tool_choice | string/object | No | Tool selection strategy |
| response_format | object | No | Force structured output (e.g., {"type": "json_object"}) |

Message Format

{
  "role": "user",
  "content": "Hello, how are you?"
}

For multimodal requests (images, PDFs):

{
  "role": "user",
  "content": [
    {"type": "text", "text": "Describe this image."},
    {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
  ]
}

Example — Basic

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-modelhive-key",
    base_url="https://api.modelhive.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    temperature=0.7,
    max_tokens=500
)

print(response.choices[0].message.content)

Example — Streaming

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a poem about AI."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Example — With Image

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"}
                }
            ]
        }
    ],
    max_tokens=500
)

Example — With Base64 Image

import base64

with open("screenshot.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this screenshot."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{b64}"}
                }
            ]
        }
    ]
)

Example — With PDF

import base64

with open("report.pdf", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gemini/gemini-2.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize this report."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:application/pdf;base64,{b64}"}
                }
            ]
        }
    ]
)

Example — Function Calling

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Rome?"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "City name"}
                    },
                    "required": ["city"]
                }
            }
        }
    ],
    tool_choice="auto"
)

# The model may return a tool_call:
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(f"Function: {tool_calls[0].function.name}")
    print(f"Arguments: {tool_calls[0].function.arguments}")
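After the model returns a tool call, your code runs the named function locally and sends the result back as a role: "tool" message on the next request. A minimal sketch of that roundtrip, shown with plain dicts rather than SDK objects; the dispatch_tool_call helper and the get_weather registry entry are illustrative, not part of the API:

```python
import json

def dispatch_tool_call(tool_call, registry):
    """Run the local function named by a tool call and build the
    follow-up 'tool' message for the next /v1/chat/completions request."""
    name = tool_call["function"]["name"]
    # Arguments arrive as a JSON-encoded string, not a dict
    args = json.loads(tool_call["function"]["arguments"])
    result = registry[name](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }

# Hypothetical local implementation of the get_weather tool
registry = {"get_weather": lambda city: {"city": city, "temp_c": 21}}

tool_call = {
    "id": "call_1",
    "function": {"name": "get_weather", "arguments": '{"city": "Rome"}'},
}
message = dispatch_tool_call(tool_call, registry)
```

Append this message (together with the assistant message that contained the tool call) to messages and call the endpoint again to get the model's final answer.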

Example — JSON Mode

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Respond in JSON format."},
        {"role": "user", "content": "List 3 European capitals with population."}
    ],
    response_format={"type": "json_object"}
)

import json
data = json.loads(response.choices[0].message.content)
print(data)

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum mechanics..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}

Special Headers

| Header | Value | Description |
|--------|-------|-------------|
| x-hive-cache | false | Skip HiveCache lookup for this request (response is still cached for future hits) |
| x-hive-guard | none | Disable all security guardrails for this request |
| x-hive-guard | prompt-injection,toxicity | Run only the listed guardrails (comma-separated) for this request |
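With the OpenAI SDK, per-request headers can be passed via the extra_headers keyword. The hive_headers helper below is a hypothetical convenience for building these overrides, not part of the SDK or the ModelHive API:

```python
def hive_headers(use_cache=True, guardrails="default"):
    """Build per-request ModelHive header overrides.

    guardrails: "default" (no override), None (disable all guardrails),
    or a list of guardrail names to run exclusively.
    """
    headers = {}
    if not use_cache:
        headers["x-hive-cache"] = "false"
    if guardrails is None:
        headers["x-hive-guard"] = "none"
    elif guardrails != "default":
        headers["x-hive-guard"] = ",".join(guardrails)
    return headers

# e.g. client.chat.completions.create(..., extra_headers=hive_headers(use_cache=False))
headers = hive_headers(use_cache=False, guardrails=["prompt-injection", "toxicity"])
```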

List Models

GET /v1/models

List all models available to your API key.

Example

models = client.models.list()
for model in models.data:
    print(model.id)

Response

{
  "object": "list",
  "data": [
    {
      "id": "gpt-4o",
      "object": "model",
      "owned_by": "openai"
    },
    {
      "id": "claude-sonnet-4-20250514",
      "object": "model",
      "owned_by": "anthropic"
    },
    {
      "id": "gemini/gemini-2.5-pro",
      "object": "model",
      "owned_by": "google"
    }
  ]
}
Info: The list of models depends on which models your tenant administrator has enabled for your organization.


Embeddings

POST /v1/embeddings

Generate vector embeddings for text input.

Request Body

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Embedding model ID |
| input | string/array | Yes | Text to embed (string or array of strings) |
| dimensions | integer | No | Optional output vector size (supported by specific models) |
| encoding_format | string | No | float (default) or base64 |
| user | string | No | End-user identifier for tracing/abuse monitoring |

Example — Basic

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="ModelHive is an AI gateway platform."
)

print(f"Dimensions: {len(response.data[0].embedding)}")

Example — Batch Input

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=[
        "ModelHive routes requests across providers.",
        "Embeddings are useful for semantic search."
    ],
    encoding_format="float"
)

for item in response.data:
    print(item.index, len(item.embedding))

Response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023, -0.0091, 0.0152, ...]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
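A common use of embeddings is semantic similarity: embed two texts and compare their vectors with cosine similarity. A minimal stdlib-only sketch; the toy vectors stand in for real values from response.data[i].embedding:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings
v1 = [0.1, 0.2, 0.7]
v2 = [0.1, 0.25, 0.65]
score = cosine_similarity(v1, v2)  # close to 1.0 for semantically similar texts
```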

Image Generations

POST /v1/images/generations

Generate one or more images from a text prompt.

Request Body

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Image model ID (e.g., gpt-image-1, dall-e-3) |
| prompt | string | Yes | Natural language prompt describing the image |
| n | integer | No | Number of images to generate |
| size | string | No | Output size (e.g., 1024x1024, model-dependent) |
| quality | string | No | Quality preset (model-dependent) |
| response_format | string | No | url or b64_json |
| style | string | No | Style preset where supported (e.g., vivid, natural) |
| user | string | No | End-user identifier for tracing/abuse monitoring |

Example

image = client.images.generate(
    model="gpt-image-1",
    prompt="A futuristic city at sunrise, cinematic light",
    size="1024x1024",
    quality="high"
)

print(image.data[0].url)

Response

{
  "created": 1709000000,
  "data": [
    {
      "url": "https://cdn.example.com/generated/image-001.png"
    }
  ]
}
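When response_format is b64_json, the image arrives inline rather than as a URL, and clients decode it to disk. A sketch; save_b64_image is an illustrative helper, and the b64_json field name follows the OpenAI images response shape:

```python
import base64

def save_b64_image(b64_data, path):
    """Decode a b64_json image payload and write it to a file."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(b64_data))

# e.g. save_b64_image(image.data[0].b64_json, "sunrise.png")
```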

Video Generations

POST /v1/videos/generations

Create an asynchronous video generation job from a prompt.

Info: Many OpenAI-compatible deployments expose the same operation on POST /v1/videos. If /v1/videos/generations is not available in your runtime, use POST /v1/videos with the same payload.

Request Body

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Video model ID (e.g., sora-2, provider aliases) |
| prompt | string | Yes | Prompt describing the video to generate |
| seconds | string/integer | No | Video duration (provider/model dependent) |
| size | string | No | Resolution (e.g., 720x1280) |
| input_reference | string/file | No | Optional reference image for image-to-video/editing flows |
| user | string | No | End-user identifier for tracing/abuse monitoring |

Example — Create Job

import requests

response = requests.post(
    "https://api.modelhive.ai/v1/videos/generations",
    headers={
        "Authorization": "Bearer sk-your-modelhive-key",
        "Content-Type": "application/json"
    },
    json={
        "model": "sora-2",
        "prompt": "A cinematic drone shot over snowy mountains",
        "seconds": "8",
        "size": "720x1280"
    },
    timeout=120
)

video = response.json()
print(video["id"], video["status"])

Example — Check Status

curl https://api.modelhive.ai/v1/videos/video_abc123 \
  -H "Authorization: Bearer sk-your-modelhive-key"

Response

{
  "id": "video_abc123",
  "object": "video",
  "status": "queued",
  "created_at": 1709000000,
  "model": "sora-2",
  "seconds": "8",
  "size": "720x1280"
}
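Because video jobs are asynchronous, clients typically poll GET /v1/videos/{video_id} until the job reaches a terminal status, then download the result from GET /v1/videos/{video_id}/content. The helper below is a sketch: the terminal status names ("completed", "failed") are assumptions based on OpenAI-style video jobs, and fetch_status is any callable that returns the job JSON:

```python
import time

def wait_for_video(fetch_status, poll_interval=5.0, timeout=600.0):
    """Poll a video job until it reaches a terminal status.

    fetch_status: zero-argument callable returning the job dict, e.g. one that
    wraps requests.get(f"{base_url}/v1/videos/{video_id}", headers=auth).json()
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status()
        if job.get("status") in ("completed", "failed"):
            return job
        time.sleep(poll_interval)
    raise TimeoutError("video job did not reach a terminal status in time")
```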

Audio Transcriptions

POST /v1/audio/transcriptions

Convert an audio file into text.

Request Body (multipart/form-data)

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | Speech-to-text model ID (e.g., whisper-1, provider aliases) |
| file | file | Yes | Audio file to transcribe |
| language | string | No | Language hint (ISO code, improves accuracy/latency) |
| prompt | string | No | Optional context prompt to bias transcription |
| response_format | string | No | json, text, srt, vtt, or verbose_json |
| temperature | number | No | Sampling temperature (usually keep low for STT) |

Example

with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )

print(transcript.text)

Response

{
  "text": "Welcome everyone. Today we will review the Q2 roadmap..."
}

Audio Speech

POST /v1/audio/speech

Convert text to spoken audio.

Request Body

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| model | string | Yes | TTS model ID (e.g., tts-1, provider aliases) |
| input | string | Yes | Text to synthesize |
| voice | string | Yes | Voice preset (e.g., alloy, nova) |
| response_format | string | No | mp3, wav, opus, aac, flac, pcm |
| speed | number | No | Speaking speed (model-dependent range) |

Example

speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="ModelHive routes AI traffic across multiple providers."
)

speech.stream_to_file("speech.mp3")

Response

Returns binary audio data in the selected response_format (default typically mp3).


Error Codes

| HTTP Code | Meaning |
|-----------|---------|
| 400 | Bad request — invalid parameters |
| 401 | Invalid or missing API key |
| 402 | Insufficient budget — recharge your key or wallet |
| 403 | Request blocked by security guardrails |
| 404 | Model not found or not enabled for your tenant |
| 429 | Rate limit exceeded — try again shortly |
| 500 | Internal server error |

Error Response Format

{
  "error": {
    "message": "Insufficient budget. Remaining: $0.12, estimated cost: $0.50",
    "type": "budget_exceeded",
    "code": 402
  }
}
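Client code can branch on the status code and error type to decide whether to retry, recharge, or fail fast. A minimal sketch; the status-code grouping mirrors the table above, and classify_error is an illustrative helper, not part of the API:

```python
def classify_error(status_code, body):
    """Map a ModelHive error response to a coarse handling decision."""
    err = body.get("error", {})
    if status_code == 429 or status_code >= 500:
        action = "retry"      # transient: back off and try again
    elif status_code == 402:
        action = "recharge"   # budget exhausted on this key or wallet
    else:
        action = "fail"       # caller error: fix the request instead of retrying
    return {"action": action, "type": err.get("type"), "message": err.get("message")}

decision = classify_error(402, {
    "error": {"message": "Insufficient budget", "type": "budget_exceeded", "code": 402}
})
```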

Rate Limits

Rate limits are applied per API key. If you hit rate limits, the response includes:

| Header | Description |
|--------|-------------|
| x-ratelimit-limit-requests | Max requests per minute |
| x-ratelimit-remaining-requests | Remaining requests |
| x-ratelimit-reset-requests | Time until reset |
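A common pattern for handling 429 responses is exponential backoff with jitter. The sketch below retries any callable that signals rate limiting; RateLimited is a hypothetical exception your HTTP layer would raise on a 429 status:

```python
import random
import time

class RateLimited(Exception):
    """Raised by the caller's HTTP layer on an HTTP 429 response."""

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry `call` on RateLimited, doubling the delay each attempt plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimited:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```

Honoring the x-ratelimit-reset-requests header, when present, gives a tighter wait than blind doubling.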