Getting Started

Prerequisites

Before you begin, ensure you have:

  • Docker and Docker Compose installed
  • NVIDIA GPU with CUDA support (for GPU acceleration)
  • NVIDIA Docker runtime configured
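A quick sanity check for these prerequisites might look like the following sketch (the commented docker run line requires a working NVIDIA runtime, and the CUDA image tag is an example, not a requirement):

```shell
# Check that the required tools are on PATH (a sketch; adjust as needed).
for cmd in docker nvidia-smi; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: found"
  else
    echo "$cmd: MISSING"
  fi
done

# Verify the NVIDIA runtime is wired into Docker (image tag is an example):
# docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```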

Installation

1. Clone the Repository

git clone https://github.com/Velesio/Velesio-aiserver.git
cd Velesio-aiserver

2. Environment Configuration

Copy the example environment file and configure it:

cp .env.example .env

Edit the .env file with your settings:

# LLAMACPP Server Startup Command
STARTUP_COMMAND=./undreamai_server --model /app/data/models/text/model.gguf --host 0.0.0.0 --port 1337 --gpu-layers 37 --template chatml

# Connectivity
REMOTE=true # set to false to run the llama.cpp server without connecting it to the API
REDIS_HOST=redis
REDIS_PASS=secure_redis_pass
API_TOKENS=secure_token,secure_token2

# UndreamAI Server Settings
MODEL_URL=https://huggingface.co/Qwen/Qwen2.5-3B-Instruct-GGUF/resolve/main/qwen2.5-3b-instruct-q8_0.gguf
LLAMA_SERVER_URL=http://localhost:1337


# Stable Diffusion Settings
RUN_SD=true
SD_MODEL_URL=https://civitai.com/api/download/models/128713?type=Model&format=SafeTensor&size=pruned&fp=fp16
LORA_URL=https://civitai.com/api/download/models/110115?type=Model&format=SafeTensor
VAE_URL=https://huggingface.co/stabilityai/sd-vae-ft-mse-original/resolve/main/vae-ft-mse-840000-ema-pruned.safetensors
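Before starting the stack, it can help to confirm the keys you just edited are actually present. A minimal sketch (the key list here is an assumption based on the example file above):

```shell
# Check that the expected keys exist in .env (a sketch, not exhaustive).
for key in STARTUP_COMMAND REDIS_HOST REDIS_PASS API_TOKENS MODEL_URL; do
  if grep -q "^${key}=" .env; then
    echo "$key: set"
  else
    echo "$key: missing"
  fi
done
```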

3. (Optional) Self-Hosted Setup Configuration

The server binaries are included in the public images, but if you are rebuilding the image yourself you need to fetch them first:

# Set up the LLM server binary and models
cd gpu && ./data/llama/server_setup.sh

On first run, the system automatically downloads models from the configured *_URL variables in .env; you can also place models manually in:

  • llama.cpp models: gpu/data/models/text/model.gguf
  • SD models: gpu/data/models/image/models/
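Manual placement might look like the following sketch (the curl line reuses the MODEL_URL value from the example .env above and is left commented because the download is several gigabytes):

```shell
# Create the expected model directories (paths from the list above).
mkdir -p gpu/data/models/text gpu/data/models/image/models

# Example manual download of the llama.cpp model (uncomment to run):
# curl -L -o gpu/data/models/text/model.gguf \
#   "https://huggingface.co/Qwen/Qwen2.5-3B-Instruct-GGUF/resolve/main/qwen2.5-3b-instruct-q8_0.gguf"
```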

4. Run

# Start all services
docker-compose up -d

# (Optional) Rebuild images
docker-compose up --build -d 

# Check service status
docker-compose ps

# View logs
docker-compose logs -f

First API Call

Test your installation with a simple API call:

curl -X POST http://localhost:8000/completion \
  -H "Authorization: Bearer secure_token" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Explain quantum computing in simple terms:",
    "max_tokens": 100,
    "temperature": 0.7
  }'

Expected response:

{
  "choices": [
    {
      "text": "Quantum computing is a revolutionary approach to computation...",
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 100,
    "total_tokens": 108
  }
}
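To pull just the generated text out of a response with this shape, you can pipe it through python3 (a sketch; assumes the response JSON matches the structure shown above):

```shell
# Sample response matching the shape documented above.
response='{"choices":[{"text":"Quantum computing is a revolutionary approach to computation...","finish_reason":"length"}],"usage":{"prompt_tokens":8,"completion_tokens":100,"total_tokens":108}}'

# Extract the generated text; using python3 avoids a dependency on jq.
echo "$response" | python3 -c 'import json,sys; print(json.load(sys.stdin)["choices"][0]["text"])'
```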

Service Access

Once running, you can access:

Service                       URL                          Credentials
API Documentation             http://localhost:8000/docs   Bearer token required
LLamaCPP / UndreamAI Server   http://localhost:1337        None
Stable Diffusion WebUI        http://localhost:7860        None
Grafana Dashboard             http://localhost:3000        admin/admin
Prometheus Metrics            http://localhost:9090        None
Redis                         localhost:6379               None

Verification Checklist

Docker containers running

docker-compose ps
# Should show: api, redis, Velesio-gpu all running

API responds to health check

curl http://localhost:8000/health
# Should return: {"status": "healthy"}

Models loaded successfully

docker-compose logs velesio-gpu | grep -i "model"
# Should show model loading messages

Redis queue operational

docker-compose logs redis
# Should show Redis server ready messages

Troubleshooting

Common Issues:

  • GPU not detected: Ensure NVIDIA Docker runtime is installed
  • Model download fails: Check internet connection and disk space
  • API returns 401: Verify API_TOKENS environment variable
  • Out of memory: Reduce the --gpu-layers value in STARTUP_COMMAND or use a smaller model

See the Troubleshooting Guide for detailed solutions.