Getting Started

Prerequisites

Before you begin, ensure you have:

  • Docker and Docker Compose installed
  • NVIDIA GPU with CUDA support (for GPU acceleration)
  • NVIDIA Docker runtime configured
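A quick sanity check for these prerequisites might look like the following sketch (the commented docker run line requires a working NVIDIA runtime, and the CUDA image tag is an example, not a requirement):

```shell
# Check that the required tools are on PATH (a sketch; adjust as needed).
for cmd in docker nvidia-smi; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: found"
  else
    echo "$cmd: MISSING"
  fi
done

# Verify the NVIDIA runtime is wired into Docker (image tag is an example):
# docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```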

Installation

1. Clone the Repository

git clone https://github.com/Velesio/Velesio-aiserver.git
cd Velesio-aiserver

2. Environment Configuration

Copy the example environment file and configure it:

cp .env.example .env

Edit the .env file with your settings:

# LLAMACPP Server Startup Command
STARTUP_COMMAND=./undreamai_server --model /app/data/models/text/model.gguf --host 0.0.0.0 --port 1337 --gpu-layers 37 --template chatml

# Connectivity
REMOTE=true # set to false to run the llama.cpp server without connecting it to the API
REDIS_HOST=redis
REDIS_PASS=secure_redis_pass
API_TOKENS=secure_token,secure_token2

# UndreamAI Server Settings
MODEL_URL=https://huggingface.co/Qwen/Qwen2.5-3B-Instruct-GGUF/resolve/main/qwen2.5-3b-instruct-q8_0.gguf
LLAMA_SERVER_URL=http://localhost:1337


# Stable Diffusion Settings
RUN_SD=true
SD_MODEL_URL=https://civitai.com/api/download/models/128713?type=Model&format=SafeTensor&size=pruned&fp=fp16
LORA_URL=https://civitai.com/api/download/models/110115?type=Model&format=SafeTensor
VAE_URL=https://huggingface.co/stabilityai/sd-vae-ft-mse-original/resolve/main/vae-ft-mse-840000-ema-pruned.safetensors
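Before starting the stack, it can help to confirm the keys you just edited are actually present. A minimal sketch (the key list here is an assumption based on the example file above):

```shell
# Check that the expected keys exist in .env (a sketch, not exhaustive).
for key in STARTUP_COMMAND REDIS_HOST REDIS_PASS API_TOKENS MODEL_URL; do
  if grep -q "^${key}=" .env; then
    echo "$key: set"
  else
    echo "$key: missing"
  fi
done
```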

3. (Optional) Self-Hosted Setup Configuration

The server binaries are included in the public images, but if you are rebuilding the image yourself you need to fetch them first:

# Set up the LLM server binary and models
cd gpu && ./data/llama/server_setup.sh

On first run, the system automatically downloads models from the configured *_URL variables in .env; you can also place models manually in:

  • llama.cpp models: gpu/data/models/text/model.gguf
  • SD models: gpu/data/models/image/models/
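Manual placement might look like the following sketch (the curl line reuses the MODEL_URL value from the example .env above and is left commented because the download is several gigabytes):

```shell
# Create the expected model directories (paths from the list above).
mkdir -p gpu/data/models/text gpu/data/models/image/models

# Example manual download of the llama.cpp model (uncomment to run):
# curl -L -o gpu/data/models/text/model.gguf \
#   "https://huggingface.co/Qwen/Qwen2.5-3B-Instruct-GGUF/resolve/main/qwen2.5-3b-instruct-q8_0.gguf"
```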

4. Run

# Start all services
docker-compose up -d

# (Optional) Rebuild images
docker-compose up --build -d 

# Check service status
docker-compose ps

# View logs
docker-compose logs -f

First API Call

Test your installation with a simple API call:

curl -X POST http://localhost:8000/completion \
  -H "Authorization: Bearer secure_token" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Explain quantum computing in simple terms:",
    "max_tokens": 100,
    "temperature": 0.7
  }'

Expected response:

{
  "choices": [
    {
      "text": "Quantum computing is a revolutionary approach to computation...",
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 100,
    "total_tokens": 108
  }
}
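To pull just the generated text out of a response with this shape, you can pipe it through python3 (a sketch; assumes the response JSON matches the structure shown above):

```shell
# Sample response matching the shape documented above.
response='{"choices":[{"text":"Quantum computing is a revolutionary approach to computation...","finish_reason":"length"}],"usage":{"prompt_tokens":8,"completion_tokens":100,"total_tokens":108}}'

# Extract the generated text; using python3 avoids a dependency on jq.
echo "$response" | python3 -c 'import json,sys; print(json.load(sys.stdin)["choices"][0]["text"])'
```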

Service Access

Once running, you can access:

Service                       URL                          Credentials
API Documentation             http://localhost:8000/docs   Bearer token required
LLamaCPP / UndreamAI Server   http://localhost:1337        None
Stable Diffusion WebUI        http://localhost:7860        None
Grafana Dashboard             http://localhost:3000        admin/admin
Prometheus Metrics            http://localhost:9090        None
Redis                         localhost:6379               None

Verification Checklist

Docker containers running

docker-compose ps
# Should show: api, redis, Velesio-gpu all running

API responds to health check

curl http://localhost:8000/health
# Should return: {"status": "healthy"}

Models loaded successfully

docker-compose logs velesio-gpu | grep -i "model"
# Should show model loading messages

Redis queue operational

docker-compose logs redis
# Should show Redis server ready messages

Troubleshooting

Common Issues:

  • GPU not detected: Ensure NVIDIA Docker runtime is installed
  • Model download fails: Check internet connection and disk space
  • API returns 401: Verify API_TOKENS environment variable
  • Out of memory: Reduce the --gpu-layers value in STARTUP_COMMAND or use a smaller model

See the Troubleshooting Guide for detailed solutions.