Components
Velesio AI Server is built with a modular, component-based architecture where each component can be deployed independently or together based on your infrastructure needs.
Available Components
🌐 API Service
api/: FastAPI-based web server that handles HTTP requests and authentication and queues jobs for the GPU workers.
🎮 GPU Workers
gpu/: Specialized containers that run AI inference on NVIDIA GPUs for both LLM and Stable Diffusion workloads.
📊 Monitoring Stack
monitoring/: Observability stack with Grafana dashboards, Prometheus metrics, and centralized logging.
Deployment Flexibility
The components can be combined in several deployment layouts:
🏢 All-in-One Deployment
Deploy all components on a single server for a complete, integrated solution:
docker-compose up -d # API + GPU Workers + Redis
cd monitoring && docker-compose up -d # Optional monitoring
🌐 Distributed Deployment
Deploy components across multiple servers for scalability:
# Server 1: API + Redis
docker-compose -f docker-compose.api.yml up -d
# Servers 2-N: GPU Workers
docker-compose -f docker-compose.gpu.yml up -d
# Monitoring Server: Centralized observability
cd monitoring && docker-compose up -d
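In the distributed layout, the GPU workers on servers 2-N pull jobs from the Redis instance that runs on Server 1, so that host has to be reachable over the network. The sketch below is a hypothetical pre-flight helper, not part of Velesio AI Server: `port_open` is an illustrative name, and the check relies on bash's `/dev/tcp` redirection (enabled in most distribution builds of bash) plus coreutils `timeout`.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for a GPU-worker host (not part of the
# project): confirm that a TCP port, e.g. Redis on Server 1, accepts
# connections before starting the workers.

# port_open HOST PORT [SECONDS]
# Returns 0 if a TCP connection succeeds within SECONDS (default 2),
# non-zero if it fails or times out, and 2 on a usage error.
port_open() {
  if [ $# -lt 2 ]; then
    echo "usage: port_open HOST PORT [SECONDS]" >&2
    return 2
  fi
  local host=$1 port=$2 secs=${3:-2}
  # Host and port are passed as positional args so they are not re-parsed
  # as shell code; the inner bash opens /dev/tcp/HOST/PORT on fd 3.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

For example, `port_open redis.internal 6379 && docker-compose -f docker-compose.gpu.yml up -d` would start the workers only if the connection succeeds. `redis.internal` is a placeholder hostname, and 6379 assumes Redis listens on its default port.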
🎯 Component-Specific Deployment
Deploy only the components you need:
- API-only: For serverless GPU workers or external inference services
- GPU-only: For dedicated inference workers connecting to remote APIs
- Monitoring-only: For centralized observability across multiple deployments
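The layouts above differ mainly in which compose command each server runs. As an illustration, a hypothetical wrapper (the function name and role names are assumptions, not part of the project) can map a server's role to the commands given on this page:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the compose command this page lists for a role.
# Roles: all | api | gpu | monitoring. Returns 1 for an unknown role.
compose_cmd_for_role() {
  case "${1:-}" in
    all)        echo "docker-compose up -d" ;;                            # API + GPU Workers + Redis on one host
    api)        echo "docker-compose -f docker-compose.api.yml up -d" ;;  # Server 1: API + Redis
    gpu)        echo "docker-compose -f docker-compose.gpu.yml up -d" ;;  # Servers 2-N: GPU Workers
    monitoring) echo "cd monitoring && docker-compose up -d" ;;           # Observability stack
    *)          echo "unknown role: ${1:-<none>}" >&2; return 1 ;;
  esac
}
```

On a GPU host, `bash -c "$(compose_cmd_for_role gpu)"` would then start the worker stack. Printing the command first keeps the choice reviewable before anything is executed.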
Component Communication
┌─────────────┐    ┌─────────┐    ┌─────────────┐
│ API Service │────│  Redis  │────│ GPU Workers │
│  (FastAPI)  │    │  Queue  │    │ (LLM + SD)  │
└─────────────┘    └─────────┘    └─────────────┘
       │                                 │
       │         ┌─────────────┐         │
       └─────────│ Monitoring  │─────────┘
                 │    Stack    │
                 └─────────────┘
- API Service receives HTTP requests and queues jobs
- Redis serves as the message broker between API and workers
- GPU Workers pull jobs from the queue and run the AI inference
- Monitoring Stack observes all components and provides metrics/alerts
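The pull model described above can be illustrated without Redis. The toy script below substitutes a spool directory for the Redis queue (an assumption made purely for illustration; it is not how Velesio AI Server implements queuing): `enqueue` plays the API Service, `work` plays a GPU worker, and an atomic `mv` rename lets several workers share one queue without processing any job twice.

```shell
#!/usr/bin/env bash
# Toy model of the API -> queue -> worker flow. A spool directory stands
# in for Redis; this only illustrates the pull model, not the real queue.
set -euo pipefail

QUEUE=$(mktemp -d)
mkdir -p "$QUEUE/pending" "$QUEUE/done"
trap 'rm -rf "$QUEUE"' EXIT   # clean up the spool directory on exit

# API side: publish a job atomically (write a hidden temp file, then rename
# it, so workers never see a half-written job)
enqueue() {
  local id=$1 prompt=$2
  printf '%s\n' "$prompt" > "$QUEUE/pending/.$id.tmp"
  mv "$QUEUE/pending/.$id.tmp" "$QUEUE/pending/$id.job"
}

# Worker side: claim each pending job with an atomic rename. If several
# workers race for the same job, only one mv succeeds; the others skip it.
work() {
  local name=$1 job claimed
  for job in "$QUEUE"/pending/*.job; do
    [ -e "$job" ] || continue                     # glob matched nothing
    claimed="$job.$name"
    mv "$job" "$claimed" 2>/dev/null || continue  # another worker won
    # Stand-in for inference: upper-case the prompt and record the worker
    { tr '[:lower:]' '[:upper:]' < "$claimed"; echo "by $name"; } \
      > "$QUEUE/done/$(basename "$job" .job).out"
    rm "$claimed"
  done
}
```

For example, `for i in 1 2 3 4 5 6; do enqueue "$i" "prompt $i"; done` followed by `work gpu-a & work gpu-b & wait` leaves six `.out` files in `$QUEUE/done`, each written by exactly one of the two workers. Starting more `work` processes is the toy equivalent of adding GPU worker servers.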
Each component is designed to be:
- Independently deployable
- Horizontally scalable
- Self-contained with minimal dependencies
- Observable through the monitoring stack