Velesio AI Server

A high-performance, microservice-based AI inference server designed for scalable LLM and Stable Diffusion workloads.

Overview

Velesio AI Server is a production-ready AI inference platform that provides:

  • LLM Text Generation via custom llama.cpp integration
  • Stable Diffusion Image Generation with WebUI support
  • Redis Queue Architecture for scalable job processing
  • Docker-based Deployment with GPU acceleration
  • Built-in Monitoring with Grafana and Prometheus
  • Unity-ready integration endpoints

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚    API      │────│  Redis  │────│ GPU Workers β”‚
β”‚  (FastAPI)  β”‚    β”‚ Queue   β”‚    β”‚ (LLM + SD)  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚                                  β”‚
       β”‚           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”        β”‚
       └───────────│  Monitoring  β”‚β”€β”€β”€β”€β”€β”€β”€β”˜
                   β”‚(Grafana+Prom)β”‚
                   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
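
The queue pattern above can be summarized in a few lines of Python. This is a minimal sketch using redis-py, not the server's actual code: the queue name ("jobs"), the result-key layout, and the run_llm() stub are illustrative assumptions.

    import json
    import uuid

    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    def run_llm(prompt: str, max_tokens: int) -> str:
        """Placeholder for the actual inference call (e.g., undreamai_server)."""
        return f"(generated up to {max_tokens} tokens for: {prompt})"

    def enqueue_completion(prompt: str, max_tokens: int = 50) -> str:
        """API side: push a job onto the queue and return its id."""
        job_id = str(uuid.uuid4())
        job = {"id": job_id, "prompt": prompt, "max_tokens": max_tokens}
        r.rpush("jobs", json.dumps(job))  # FIFO enqueue
        return job_id

    def worker_loop() -> None:
        """GPU worker side: block until a job arrives, then store the result."""
        while True:
            _, raw = r.blpop("jobs")  # blocking pop
            job = json.loads(raw)
            text = run_llm(job["prompt"], job["max_tokens"])
            r.set(f"result:{job['id']}", text, ex=3600)  # result expires in 1h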

Key Features

πŸš€ High Performance

  • Custom llama.cpp binary (undreamai_server) for optimized inference
  • GPU acceleration with CUDA support
  • Asynchronous job processing via Redis Queue

πŸ”§ Easy Setup

  • Docker Compose deployment
  • Automatic model downloading
  • Pre-configured monitoring stack

🎯 Unity Ready

  • Compatible with β€œLLM for Unity” asset
  • Base64 image encoding for seamless integration
  • Standardized API endpoints
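
The base64 convention mentioned above keeps image payloads JSON-safe in transit. A minimal sketch of both directions follows; the image_b64 field name is an assumption for illustration, not the server's actual response schema.

    import base64

    def encode_image(png_bytes: bytes) -> dict:
        """Worker side: wrap raw PNG bytes in a JSON-safe payload."""
        return {"image_b64": base64.b64encode(png_bytes).decode("ascii")}

    def decode_image(payload: dict) -> bytes:
        """Client side: recover PNG bytes (e.g., to load into a Unity Texture2D)."""
        return base64.b64decode(payload["image_b64"])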

πŸ“Š Production Monitoring

  • Real-time metrics with Prometheus
  • Visual dashboards in Grafana
  • Redis queue monitoring
  • GPU utilization tracking
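
To illustrate how queue metrics can reach Prometheus, here is a small exporter sketch using prometheus_client. The "jobs" queue name and port 8001 are assumptions, and the bundled stack already ships pre-configured exporters; this only shows the mechanism.

    import time

    import redis
    from prometheus_client import Gauge, start_http_server

    QUEUE_DEPTH = Gauge("velesio_queue_depth", "Pending jobs in the Redis queue")

    def main() -> None:
        r = redis.Redis(host="localhost", port=6379)
        start_http_server(8001)  # Prometheus scrapes http://localhost:8001/metrics
        while True:
            QUEUE_DEPTH.set(r.llen("jobs"))
            time.sleep(5)

    if __name__ == "__main__":
        main()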

Quick Start

  1. Clone and Configure
    
    git clone https://github.com/Velesio/Velesio-aiserver.git
    cd Velesio-aiserver
    cp .env.example .env
    
  2. Set API Tokens
    
    # Edit .env file
    API_TOKENS=your-secret-token-here
    
  3. Deploy
    
    docker-compose up -d --build
    
  4. Test API
    
    curl -X POST http://localhost:8000/completion \
      -H "Authorization: Bearer your-secret-token-here" \
      -H "Content-Type: application/json" \
      -d '{"prompt": "Hello, world!", "max_tokens": 50}'
    

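The same test from Python, mirroring the curl call above:

    import requests

    resp = requests.post(
        "http://localhost:8000/completion",
        headers={"Authorization": "Bearer your-secret-token-here"},
        json={"prompt": "Hello, world!", "max_tokens": 50},
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json())
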
Services

Service            Port   Description
API                8000   FastAPI web server
Redis              6379   Message queue
LLM Worker         1337   Direct LLM access (when REMOTE=false)
Stable Diffusion   7860   WebUI interface (when RUN_SD=true)
Grafana            3000   Monitoring dashboard
Prometheus         9090   Metrics collection

Next Steps


Need help? Check our troubleshooting guide or open an issue on GitHub.