# Getting Started

## Prerequisites
Before you begin, ensure you have:
- Docker and Docker Compose installed
- NVIDIA GPU with CUDA support (for GPU acceleration)
- NVIDIA Docker runtime configured (see the check below)
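To confirm that Docker can actually see your GPU before continuing, you can run `nvidia-smi` in a throwaway container (a quick sanity check; the CUDA image tag is just an example):

```bash
# Should print a table listing your GPU(s); an error means the runtime is not configured.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```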
## Installation

### 1. Clone the Repository

```bash
git clone https://github.com/Velesio/Velesio-aiserver.git
cd Velesio-aiserver
```
### 2. Environment Configuration

Copy the example environment file and configure it:

```bash
cp .env.example .env
```

Edit the `.env` file with your settings:
```bash
# LLAMACPP Server Startup Command
STARTUP_COMMAND=./undreamai_server --model /app/data/models/text/model.gguf --host 0.0.0.0 --port 1337 --gpu-layers 37 --template chatml

# Connectivity
REMOTE=true # set to false to run the llamacpp server without connecting it to the API
REDIS_HOST=redis
REDIS_PASS=secure_redis_pass
API_TOKENS=secure_token,secure_token2

# UndreamAI Server Settings
MODEL_URL=https://huggingface.co/Qwen/Qwen2.5-3B-Instruct-GGUF/resolve/main/qwen2.5-3b-instruct-q8_0.gguf
LLAMA_SERVER_URL=http://localhost:1337

# Stable Diffusion Settings
RUN_SD=true
SD_MODEL_URL=https://civitai.com/api/download/models/128713?type=Model&format=SafeTensor&size=pruned&fp=fp16
LORA_URL=https://civitai.com/api/download/models/110115?type=Model&format=SafeTensor
VAE_URL=https://huggingface.co/stabilityai/sd-vae-ft-mse-original/resolve/main/vae-ft-mse-840000-ema-pruned.safetensors
```
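The credential values above are placeholders. One way to generate strong replacements (a suggestion, not a project requirement) is with `openssl`:

```bash
# Generate a random API token (repeat for each entry in API_TOKENS)
openssl rand -hex 32
# Generate a Redis password for REDIS_PASS
openssl rand -hex 16
```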
### 3. (Optional) Self-Hosted Setup Configuration

The server binaries are included in the public images, but if you are rebuilding the image yourself you need to include them. You can do so with:
```bash
# Setup LLM binary and models
cd gpu && ./data/llama/server_setup.sh
```
The system automatically downloads models from the configured `MODEL_URL`s on first run. You can also place models manually in the following locations (see the example after this list):
- LLamacpp models: `gpu/data/models/text/model.gguf`
- SD models: `gpu/data/models/image/models/`
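For example, to pre-seed the text model from the URL used in the sample `.env` above (a sketch; swap in your own model URL and keep the `model.gguf` filename):

```bash
# Download the GGUF model directly into the path the server expects.
mkdir -p gpu/data/models/text
wget -O gpu/data/models/text/model.gguf \
  "https://huggingface.co/Qwen/Qwen2.5-3B-Instruct-GGUF/resolve/main/qwen2.5-3b-instruct-q8_0.gguf"
```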
### 4. Run

```bash
# Start all services
docker-compose up -d

# (Optional) Rebuild images
docker-compose up --build -d

# Check service status
docker-compose ps

# View logs
docker-compose logs -f
```
## First API Call

Test your installation with a simple API call:

```bash
curl -X POST http://localhost:8000/completion \
  -H "Authorization: Bearer secure_token" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Explain quantum computing in simple terms:",
    "max_tokens": 100,
    "temperature": 0.7
  }'
```
Expected response:
```json
{
  "choices": [
    {
      "text": "Quantum computing is a revolutionary approach to computation...",
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 100,
    "total_tokens": 108
  }
}
```
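If you have `jq` installed, you can extract just the generated text from the response shape shown above:

```bash
curl -s -X POST http://localhost:8000/completion \
  -H "Authorization: Bearer secure_token" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain quantum computing in simple terms:", "max_tokens": 100}' \
  | jq -r '.choices[0].text'
```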
## Service Access

Once running, you can access:

| Service | URL | Credentials |
|---|---|---|
| API Documentation | http://localhost:8000/docs | Bearer token required |
| LLamaCPP / UndreamAI Server | http://localhost:1337 | None |
| Stable Diffusion WebUI | http://localhost:7860 | None |
| Grafana Dashboard | http://localhost:3000 | admin/admin |
| Prometheus Metrics | http://localhost:9090 | None |
| Redis | localhost:6379 | None |
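To quickly confirm the monitoring stack is reachable, you can hit the standard Grafana and Prometheus health endpoints (these are upstream endpoints, not project-specific):

```bash
# Grafana health check: should return a small JSON status object
curl -s http://localhost:3000/api/health
# Prometheus readiness check: should report it is ready to serve traffic
curl -s http://localhost:9090/-/ready
```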
## Verification Checklist

✅ Docker containers running

```bash
docker-compose ps
# Should show: api, redis, Velesio-gpu all running
```
✅ API responds to health check
```bash
curl http://localhost:8000/health
# Should return: {"status": "healthy"}
```
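On a first start, the API may stay unhealthy until the models finish downloading. A small polling loop (a sketch using the health endpoint above) can wait for it:

```bash
# Retry the health check every 5 seconds for up to 5 minutes.
for i in $(seq 1 60); do
  curl -sf http://localhost:8000/health && break
  echo "Waiting for API... (attempt $i)"
  sleep 5
done
```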
✅ Models loaded successfully
```bash
docker-compose logs velesio-gpu | grep -i "model"
# Should show model loading messages
```
✅ Redis queue operational
```bash
docker-compose logs redis
# Should show Redis server ready messages
```
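You can also ping Redis directly through the container (assuming the service is named `redis` as in the compose output above; pass the `REDIS_PASS` value from your `.env` with `-a` if authentication is enabled):

```bash
# Should print PONG
docker-compose exec redis redis-cli -a secure_redis_pass ping
```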
## Next Steps
- Unity Integrations - Explore the available engine integrations
- Architecture Overview - Understand the system design
- API Reference - Explore all available endpoints
- Deployment Guide - Production deployment strategies
## Troubleshooting

Common Issues:
- GPU not detected: Ensure the NVIDIA Docker runtime is installed (see the check after this list)
- Model download fails: Check internet connection and disk space
- API returns 401: Verify the `API_TOKENS` environment variable
- Out of memory: Reduce `--gpu-layers` in `STARTUP_COMMAND` or use smaller models
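For the GPU issue, a quick way to narrow it down is to run `nvidia-smi` inside the GPU container (assuming the service name `velesio-gpu` used in the log commands above):

```bash
# Should list the GPU; an error here points at the Docker runtime configuration.
docker-compose exec velesio-gpu nvidia-smi
```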
See the Troubleshooting Guide for detailed solutions.