API Reference

Complete documentation for all Velesio AI Server endpoints.

Authentication

All API endpoints require authentication using a Bearer token in the Authorization header:

```
Authorization: Bearer your-api-token-here
```

Configure tokens in the .env file:

```bash
API_TOKENS=token1,token2,token3
```

Base URL

```
http://localhost:8000
```

For production deployments, replace with your actual domain.

Text Generation Endpoints

POST /completion

Generate a text completion using the loaded LLM.

Request Body:

```json
{
  "prompt": "string",
  "max_tokens": 150,
  "temperature": 0.7,
  "top_p": 0.9,
  "top_k": 40,
  "frequency_penalty": 0.0,
  "presence_penalty": 0.0,
  "stop": ["string"],
  "stream": false
}
```

Parameters:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| prompt | string | Required | Input text to complete |
| max_tokens | integer | 150 | Maximum tokens to generate |
| temperature | float | 0.7 | Sampling temperature (0.0-2.0) |
| top_p | float | 0.9 | Nucleus sampling threshold |
| top_k | integer | 40 | Top-k sampling limit |
| frequency_penalty | float | 0.0 | Frequency penalty (-2.0 to 2.0) |
| presence_penalty | float | 0.0 | Presence penalty (-2.0 to 2.0) |
| stop | array | null | Stop sequences |
| stream | boolean | false | Enable streaming response |

Response:

```json
{
  "id": "cmpl-abc123",
  "object": "text_completion",
  "created": 1677652288,
  "model": "llama-2-7b-chat",
  "choices": [
    {
      "text": "Generated text continues here...",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 150,
    "total_tokens": 155
  }
}
```

Example:

```bash
curl -X POST http://localhost:8000/completion \
  -H "Authorization: Bearer your-token" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Explain quantum computing:",
    "max_tokens": 100,
    "temperature": 0.7
  }'
```

POST /chat/completions

Generate chat completions with support for multi-turn conversation history.

Request Body:

```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ],
  "max_tokens": 150,
  "temperature": 0.7,
  "stream": false
}
```

Parameters:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| messages | array | Required | Conversation messages |
| max_tokens | integer | 150 | Maximum tokens to generate |
| temperature | float | 0.7 | Sampling temperature |
| stream | boolean | false | Enable streaming |

Message Format:

```json
{
  "role": "user|assistant|system",
  "content": "message content"
}
```

Response:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "llama-2-7b-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 11,
    "total_tokens": 23
  }
}
```
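
Example: a minimal Python sketch of a chat request, assuming the endpoint, token setup, and response shape documented above (the token and URL are placeholders):

```python
import requests

# Sketch only: assumes the /chat/completions endpoint and Bearer-token
# auth documented above; "your-token" is a placeholder.
response = requests.post(
    "http://localhost:8000/chat/completions",
    headers={"Authorization": "Bearer your-token"},
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello, how are you?"},
        ],
        "max_tokens": 150,
    },
)
# Pull the assistant's reply out of the documented response shape.
reply = response.json()["choices"][0]["message"]["content"]
print(reply)
```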

Image Generation Endpoints

POST /sdapi/v1/txt2img

Generate images from text prompts using Stable Diffusion.

Request Body:

```json
{
  "prompt": "a beautiful landscape with mountains",
  "negative_prompt": "blurry, low quality",
  "width": 512,
  "height": 512,
  "steps": 20,
  "cfg_scale": 7.5,
  "sampler_name": "Euler a",
  "seed": -1,
  "batch_size": 1,
  "n_iter": 1
}
```

Parameters:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| prompt | string | Required | Text description of desired image |
| negative_prompt | string | "" | What to avoid in the image |
| width | integer | 512 | Image width in pixels |
| height | integer | 512 | Image height in pixels |
| steps | integer | 20 | Number of sampling steps |
| cfg_scale | float | 7.5 | Classifier-free guidance scale |
| sampler_name | string | "Euler a" | Sampling method |
| seed | integer | -1 | Random seed (-1 for random) |
| batch_size | integer | 1 | Number of images per batch |
| n_iter | integer | 1 | Number of iterations |

Response:

```json
{
  "images": [
    "base64-encoded-image-data..."
  ],
  "parameters": {
    "prompt": "a beautiful landscape with mountains",
    "steps": 20,
    "seed": 1234567890,
    "width": 512,
    "height": 512
  },
  "info": "Additional generation info"
}
```

Example:

```bash
curl -X POST http://localhost:8000/sdapi/v1/txt2img \
  -H "Authorization: Bearer your-token" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a futuristic city at sunset",
    "width": 768,
    "height": 512,
    "steps": 25
  }'
```
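
The images field in the response holds base64-encoded data. A short Python sketch of decoding the first image to a PNG file, assuming the response shape shown above (token and file name are placeholders):

```python
import base64
import requests

response = requests.post(
    "http://localhost:8000/sdapi/v1/txt2img",
    headers={"Authorization": "Bearer your-token"},
    json={"prompt": "a futuristic city at sunset", "steps": 25},
)
# Decode the first base64-encoded image and write it to disk.
image_data = base64.b64decode(response.json()["images"][0])
with open("output.png", "wb") as f:
    f.write(image_data)
```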

POST /sdapi/v1/img2img

Transform existing images using Stable Diffusion.

Request Body:

```json
{
  "init_images": ["base64-encoded-image"],
  "prompt": "turn this into a painting",
  "denoising_strength": 0.75,
  "width": 512,
  "height": 512,
  "steps": 20,
  "cfg_scale": 7.5
}
```

Parameters:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| init_images | array | Required | Base64-encoded input images |
| prompt | string | Required | Transformation description |
| denoising_strength | float | 0.75 | How much to change (0.0-1.0) |
| width | integer | 512 | Output width |
| height | integer | 512 | Output height |
| steps | integer | 20 | Sampling steps |
| cfg_scale | float | 7.5 | Guidance scale |
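
Example: a hedged Python sketch that base64-encodes a local file for init_images and posts it, assuming the request shape in the table above (file name and token are placeholders):

```python
import base64
import requests

# Read a local image and base64-encode it for the init_images field.
with open("input.png", "rb") as f:
    init_image = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "http://localhost:8000/sdapi/v1/img2img",
    headers={"Authorization": "Bearer your-token"},
    json={
        "init_images": [init_image],
        "prompt": "turn this into a painting",
        "denoising_strength": 0.75,
    },
)
print(response.status_code)
```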

Utility Endpoints

GET /health

Check service health status.

Response:

```json
{
  "status": "healthy",
  "timestamp": "2024-01-15T10:30:00Z",
  "version": "1.0.0",
  "services": {
    "redis": "connected",
    "llm_worker": "ready",
    "sd_worker": "ready"
  }
}
```
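
Example: a minimal Python probe, assuming the same Bearer-token auth as the other endpoints; the same GET pattern applies to /models and /queue/status:

```python
import requests

# Fetch the health report and print the overall and per-service status.
health = requests.get(
    "http://localhost:8000/health",
    headers={"Authorization": "Bearer your-token"},
).json()
print(health["status"], health["services"])
```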

GET /models

List available models.

Response:

```json
{
  "text_models": [
    {
      "name": "llama-2-7b-chat",
      "size": "7B",
      "type": "chat",
      "loaded": true
    }
  ],
  "image_models": [
    {
      "name": "stable-diffusion-v1-5",
      "type": "checkpoint",
      "loaded": true
    }
  ]
}
```

GET /queue/status

Check queue status and worker information.

Response:

```json
{
  "queue_depth": 3,
  "active_workers": 2,
  "pending_jobs": 1,
  "completed_jobs_24h": 150,
  "workers": [
    {
      "id": "worker-1",
      "type": "llm",
      "status": "busy",
      "current_job": "job-abc123"
    },
    {
      "id": "worker-2",
      "type": "sd",
      "status": "idle"
    }
  ]
}
```
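
One common use is polling this endpoint before submitting a large batch. A sketch assuming the queue_depth field shown above:

```python
import time
import requests

HEADERS = {"Authorization": "Bearer your-token"}

# Wait until the queue drains before submitting more jobs.
while True:
    status = requests.get(
        "http://localhost:8000/queue/status", headers=HEADERS
    ).json()
    if status["queue_depth"] == 0:
        break
    time.sleep(2)
```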

Streaming Responses

For real-time text generation, set `"stream": true` in the request:

Example Streaming Request:

```bash
curl -X POST http://localhost:8000/completion \
  -H "Authorization: Bearer your-token" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write a story:", "stream": true}' \
  --no-buffer
```

Streaming Response Format:

```
data: {"choices":[{"text":"Once","index":0}]}

data: {"choices":[{"text":" upon","index":0}]}

data: {"choices":[{"text":" a","index":0}]}

data: [DONE]
```
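
Consuming the stream from Python: a sketch using line-by-line iteration, assuming the data: .../[DONE] framing shown above (removeprefix requires Python 3.9+):

```python
import json
import requests

response = requests.post(
    "http://localhost:8000/completion",
    headers={"Authorization": "Bearer your-token"},
    json={"prompt": "Write a story:", "stream": True},
    stream=True,  # keep the connection open and iterate over chunks
)
for line in response.iter_lines():
    if not line:
        continue  # skip the blank keep-alive lines between events
    payload = line.decode("utf-8").removeprefix("data: ")
    if payload == "[DONE]":
        break
    chunk = json.loads(payload)
    print(chunk["choices"][0]["text"], end="", flush=True)
```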

Error Responses

All endpoints return structured error responses:

```json
{
  "error": {
    "code": "invalid_request",
    "message": "The request was invalid",
    "details": "Additional error context"
  }
}
```

Common Error Codes:

| Code | Status | Description |
|------|--------|-------------|
| unauthorized | 401 | Invalid or missing API token |
| invalid_request | 400 | Malformed request body |
| model_not_found | 404 | Requested model not available |
| queue_full | 503 | Job queue at capacity |
| internal_error | 500 | Server error |
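
A hedged sketch of handling these codes client-side, retrying only when the queue is full (503) and surfacing the documented error envelope otherwise:

```python
import time
import requests

def post_with_retry(url, payload, headers, retries=3):
    # Retry with exponential backoff only on queue_full (503).
    for attempt in range(retries):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code != 503:
            break
        time.sleep(2 ** attempt)
    if response.status_code >= 400:
        # Unpack the structured error response documented above.
        error = response.json()["error"]
        raise RuntimeError(f"{error['code']}: {error['message']}")
    return response.json()
```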

Rate Limiting

API endpoints are subject to rate limiting:

  • Default: 60 requests per minute per token
  • Burst: Up to 10 concurrent requests
  • Headers: Rate limit info included in response headers

Rate Limit Headers:

```
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1640995200
```
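
Clients can read these headers to pace themselves. A sketch that sleeps until the reset timestamp once the remaining budget hits zero, assuming the header names shown above:

```python
import time
import requests

response = requests.post(
    "http://localhost:8000/completion",
    headers={"Authorization": "Bearer your-token"},
    json={"prompt": "Hello"},
)
# Back off until the window resets once the budget is exhausted.
if int(response.headers.get("X-RateLimit-Remaining", 1)) == 0:
    reset_at = int(response.headers["X-RateLimit-Reset"])  # Unix timestamp
    time.sleep(max(0, reset_at - time.time()))
```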

Unity Integration

For Unity developers using the “LLM for Unity” asset:

Configuration:

```csharp
// In Unity LLM settings
API_URL = "http://your-server:8000"
API_KEY = "your-bearer-token"
MODEL = "completion" // Use the /completion endpoint
```

Example Unity Code:

```csharp
using UnityEngine;
using LLMUnity;

public class AIChat : MonoBehaviour
{
    public LLMCharacter llmCharacter;

    async void Start()
    {
        string response = await llmCharacter.Chat("Hello, AI!");
        Debug.Log(response);
    }
}
```

SDKs and Libraries

Python SDK

```python
import requests

class VelesioClient:
    def __init__(self, base_url, api_token):
        self.base_url = base_url
        self.headers = {"Authorization": f"Bearer {api_token}"}

    def complete(self, prompt, **kwargs):
        """POST to /completion, passing extra sampling parameters through."""
        response = requests.post(
            f"{self.base_url}/completion",
            headers=self.headers,
            json={"prompt": prompt, **kwargs},
        )
        response.raise_for_status()  # surface HTTP errors instead of bad JSON
        return response.json()

# Usage
client = VelesioClient("http://localhost:8000", "your-token")
result = client.complete("Explain AI:", max_tokens=100)
```

JavaScript/Node.js

```javascript
class VelesioClient {
    constructor(baseUrl, apiToken) {
        this.baseUrl = baseUrl;
        this.headers = {
            'Authorization': `Bearer ${apiToken}`,
            'Content-Type': 'application/json'
        };
    }

    // POST to /completion, passing extra sampling parameters through.
    async complete(prompt, options = {}) {
        const response = await fetch(`${this.baseUrl}/completion`, {
            method: 'POST',
            headers: this.headers,
            body: JSON.stringify({ prompt, ...options })
        });
        if (!response.ok) {
            throw new Error(`Request failed with status ${response.status}`);
        }
        return response.json();
    }
}

// Usage (top-level await requires an ES module or an async wrapper)
const client = new VelesioClient('http://localhost:8000', 'your-token');
const result = await client.complete('Hello AI!');
```

Next Steps