# Velesio AI Server

A high-performance, microservice-based AI inference server designed for scalable LLM and Stable Diffusion workloads.
## Overview
Velesio AI Server is a production-ready AI inference platform that provides:

- **LLM Text Generation** via custom llama.cpp integration
- **Stable Diffusion Image Generation** with WebUI support
- **Redis Queue Architecture** for scalable job processing
- **Docker-based Deployment** with GPU acceleration
- **Built-in Monitoring** with Grafana and Prometheus
- **Unity Integration** via ready-to-use endpoints
## Architecture

```
┌─────────────┐      ┌─────────┐      ┌───────────────┐
│     API     │─────▶│  Redis  │─────▶│  GPU Workers  │
│  (FastAPI)  │      │  Queue  │      │  (LLM + SD)   │
└─────────────┘      └─────────┘      └───────────────┘
       │                                       │
       │          ┌───────────────┐            │
       └─────────▶│  Monitoring   │◀───────────┘
                  │(Grafana+Prom) │
                  └───────────────┘
```
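The flow above (the API enqueues a job, a GPU worker dequeues and processes it, and the result is stored for retrieval) can be sketched in plain Python. The sketch below uses the standard library's `queue.Queue` as a stand-in for the Redis queue; all function and field names are illustrative, not the server's actual job schema:

```python
import json
import queue
import threading

# Stand-in for the Redis queue (production code would use redis-py LPUSH/BRPOP).
job_queue: "queue.Queue[str]" = queue.Queue()
results: dict = {}

def enqueue_job(job_id: str, prompt: str) -> None:
    """API side: serialize the job and push it onto the queue."""
    job_queue.put(json.dumps({"id": job_id, "prompt": prompt}))

def worker() -> None:
    """Worker side: pop jobs and run inference (mocked out here)."""
    while True:
        job = json.loads(job_queue.get())
        # A real worker would invoke the llama.cpp binary or the SD pipeline here.
        results[job["id"]] = f"completion for: {job['prompt']}"
        job_queue.task_done()

threading.Thread(target=worker, daemon=True).start()
enqueue_job("job-1", "Hello, world!")
job_queue.join()  # Block until the worker has drained the queue.
print(results["job-1"])  # → completion for: Hello, world!
```

Because the API and workers only share the queue, workers can be scaled out (or moved to remote GPU hosts) without changing the API layer.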
## Key Features
### High Performance

- Custom llama.cpp binary (`undreamai_server`) for optimized inference
- GPU acceleration with CUDA support
- Asynchronous job processing via Redis Queue
### Easy Setup
- Docker Compose deployment
- Automatic model downloading
- Pre-configured monitoring stack
### Unity Ready

- Compatible with the "LLM for Unity" asset
- Base64 image encoding for seamless integration
- Standardized API endpoints
### Production Monitoring
- Real-time metrics with Prometheus
- Visual dashboards in Grafana
- Redis queue monitoring
- GPU utilization tracking
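For reference, a Prometheus scrape configuration for this stack might look like the following. This is an illustrative sketch, not the shipped configuration: the job names and exporter targets (`redis-exporter`, `dcgm-exporter`) are assumptions about how Redis and GPU metrics could be exposed.

```yaml
# prometheus.yml (illustrative sketch)
scrape_configs:
  - job_name: "api"
    static_configs:
      - targets: ["api:8000"]            # FastAPI metrics
  - job_name: "redis"
    static_configs:
      - targets: ["redis-exporter:9121"] # assumes a redis_exporter sidecar
  - job_name: "gpu"
    static_configs:
      - targets: ["dcgm-exporter:9400"]  # assumes NVIDIA DCGM exporter
```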
## Quick Start
1. **Clone and Configure**

   ```bash
   git clone https://github.com/Velesio/Velesio-aiserver.git
   cd Velesio-aiserver
   cp .env.example .env
   ```

2. **Set API Tokens**

   ```bash
   # Edit .env file
   API_TOKENS=your-secret-token-here
   ```

3. **Deploy**

   ```bash
   docker-compose up -d --build
   ```

4. **Test API**

   ```bash
   curl -X POST http://localhost:8000/completion \
     -H "Authorization: Bearer your-secret-token-here" \
     -H "Content-Type: application/json" \
     -d '{"prompt": "Hello, world!", "max_tokens": 50}'
   ```
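The same request can be issued from Python using only the standard library. This is a minimal sketch mirroring the curl call above; the `build_request` helper is illustrative, and the response shape depends on the server:

```python
import json
import urllib.request

API_URL = "http://localhost:8000/completion"
API_TOKEN = "your-secret-token-here"  # must match a token in API_TOKENS

def build_request(prompt: str, max_tokens: int = 50) -> urllib.request.Request:
    """Build an authorized POST request for the /completion endpoint."""
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("Hello, world!")
# To actually send it (requires the stack to be running):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```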
## Services

| Service | Port | Description |
|---|---|---|
| API | 8000 | FastAPI web server |
| Redis | 6379 | Message queue |
| LLM Worker | 1337 | Direct LLM access (when `REMOTE=false`) |
| Stable Diffusion | 7860 | WebUI interface (when `RUN_SD=true`) |
| Grafana | 3000 | Monitoring dashboard |
| Prometheus | 9090 | Metrics collection |
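To show how these services and ports fit together, here is a hypothetical `docker-compose.yml` fragment. It is a sketch of the port mappings in the table above, not the repository's actual compose file; image names and build contexts are assumptions.

```yaml
# docker-compose.yml (illustrative fragment)
services:
  api:
    build: .
    ports:
      - "8000:8000"
  redis:
    image: redis:7
    ports:
      - "6379:6379"
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
```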
## Next Steps
- Getting Started Guide - Detailed setup instructions
- Architecture Overview - Deep dive into system design
- API Reference - Complete endpoint documentation
- Deployment Guide - Production deployment strategies
- Troubleshooting - Common issues and solutions
Need help? Check the troubleshooting guide or open an issue on GitHub.