Components

Velesio AI Server uses a modular, component-based architecture in which each component can be deployed independently or together, depending on your infrastructure needs.

Available Components

🌐 API Service

api/

FastAPI-based web server that handles HTTP requests, authentication, and job queuing to GPU workers.

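As a quick illustration, a client request to the API might look like the sketch below. The port, endpoint path, and token variable are assumptions for illustration, not confirmed routes; check the API documentation for the actual interface.

# Hypothetical example: submit an inference job to the API
# (port 8000, the /v1/completions path, and API_TOKEN are assumed)
curl -X POST http://localhost:8000/v1/completions \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello"}'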

🎮 GPU Workers

gpu/

Specialized containers that handle AI inference tasks using NVIDIA GPUs for both LLM and Stable Diffusion workloads.

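Before starting the workers, it can help to confirm that the host's NVIDIA driver and container runtime are visible to Docker. The check below uses a stock CUDA base image; the exact image tag is an example and any CUDA base image will do:

# Verify that Docker can see the GPU (requires the NVIDIA Container Toolkit)
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi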

📊 Monitoring Stack

monitoring/

Comprehensive observability solution with Grafana dashboards, Prometheus metrics, and centralized logging.

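Once the stack is up, a quick way to confirm the services are running is to hit their standard health endpoints. The ports below are the Prometheus and Grafana defaults and may differ in your deployment:

# Prometheus built-in health endpoint (default port 9090)
curl -s http://localhost:9090/-/healthy

# Grafana health endpoint (default port 3000)
curl -s http://localhost:3000/api/health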

Deployment Flexibility

The components can be combined in several deployment configurations:

🏢 All-in-One Deployment

Deploy all components on a single server for a complete, integrated solution:

docker-compose up -d                    # API + GPU Workers + Redis
cd monitoring && docker-compose up -d   # Optional monitoring

🌐 Distributed Deployment

Deploy components across multiple servers for scalability:

# Server 1: API + Redis
docker-compose -f docker-compose.api.yml up -d

# Servers 2-N: GPU Workers
docker-compose -f docker-compose.gpu.yml up -d

# Monitoring Server: Centralized observability
cd monitoring && docker-compose up -d
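
In a distributed setup, the GPU workers need the address of the Redis instance running alongside the API. One way to pass this is via environment variables at startup; the variable names here (REDIS_HOST, REDIS_PORT) are illustrative assumptions and should be matched to what the actual compose files expect:

# Point a GPU worker at the remote Redis queue
# (REDIS_HOST/REDIS_PORT are assumed names; check docker-compose.gpu.yml)
REDIS_HOST=redis.internal.example REDIS_PORT=6379 \
  docker-compose -f docker-compose.gpu.yml up -d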

🎯 Component-Specific Deployment

Deploy only the components you need:

  • API-only: For serverless GPU workers or external inference services
  • GPU-only: For dedicated inference workers connecting to remote APIs
  • Monitoring-only: For centralized observability across multiple deployments

Component Communication

┌─────────────┐    ┌─────────┐    ┌─────────────┐
│ API Service │────│  Redis  │────│ GPU Workers │
│  (FastAPI)  │    │ Queue   │    │ (LLM + SD)  │
└─────────────┘    └─────────┘    └─────────────┘
       │                                  │
       │           ┌─────────────┐        │
       └───────────│ Monitoring  │────────┘
                   │   Stack     │
                   └─────────────┘
  • API Service receives HTTP requests and queues jobs
  • Redis serves as the message broker between API and workers
  • GPU Workers pull jobs from the queue and process AI inference
  • Monitoring Stack observes all components and provides metrics/alerts
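
The queue mechanics can be sketched with redis-cli. The queue name and job payload below are hypothetical; the actual keys and message format are defined by the API and worker code:

# API side: enqueue a job (queue name "job_queue" is assumed for illustration)
redis-cli LPUSH job_queue '{"id": "123", "type": "llm", "prompt": "Hello"}'

# Worker side: block until a job is available, then pop it
redis-cli BRPOP job_queue 0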

Each component is designed to be:

  • Independently deployable
  • Horizontally scalable (see the scaling example after this list)
  • Self-contained with minimal dependencies
  • Observable through the monitoring stack
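
For example, horizontal scaling can be as simple as running more worker replicas with Compose's --scale flag; the service name gpu-worker is a placeholder for whatever the compose file actually defines:

# Run three GPU worker replicas (service name is illustrative)
docker-compose -f docker-compose.gpu.yml up -d --scale gpu-worker=3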