Components

Velesio AI Server uses a modular, component-based architecture in which each component can be deployed independently or together, depending on your infrastructure needs.

Available Components

🌐 API Service

api/

FastAPI-based web server that handles HTTP requests, authentication, and job queuing to GPU workers.

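As a quick illustration, a client request to the API might look like the sketch below. The port, endpoint path, and token variable are assumptions for illustration, not confirmed routes; check the API documentation for the actual interface.

# Hypothetical example: submit an inference job to the API
# (port 8000, the /v1/completions path, and API_TOKEN are assumed)
curl -X POST http://localhost:8000/v1/completions \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello"}'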

🎮 GPU Workers

gpu/

Specialized containers that handle AI inference tasks using NVIDIA GPUs for both LLM and Stable Diffusion workloads.

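Before starting the workers, it can help to confirm that the host's NVIDIA driver and container runtime are visible to Docker. The check below uses a stock CUDA base image; the exact image tag is an example and any CUDA base image will do:

# Verify that Docker can see the GPU (requires the NVIDIA Container Toolkit)
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi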

📊 Monitoring Stack

monitoring/

Comprehensive observability solution with Grafana dashboards, Prometheus metrics, and centralized logging.

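Once the stack is up, a quick way to confirm the services are running is to hit their standard health endpoints. The ports below are the Prometheus and Grafana defaults and may differ in your deployment:

# Prometheus built-in health endpoint (default port 9090)
curl -s http://localhost:9090/-/healthy

# Grafana health endpoint (default port 3000)
curl -s http://localhost:3000/api/health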

Deployment Flexibility

The components can be combined in several deployment configurations:

🏢 All-in-One Deployment

Deploy all components on a single server for a complete, integrated solution:

docker-compose up -d                    # API + GPU Workers + Redis
cd monitoring && docker-compose up -d   # Optional monitoring

🌐 Distributed Deployment

Deploy components across multiple servers for scalability:

# Server 1: API + Redis
docker-compose -f docker-compose.api.yml up -d

# Servers 2-N: GPU Workers
docker-compose -f docker-compose.gpu.yml up -d

# Monitoring Server: Centralized observability
cd monitoring && docker-compose up -d
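
In a distributed setup, the GPU workers need the address of the Redis instance running alongside the API. One way to pass this is via environment variables at startup; the variable names here (REDIS_HOST, REDIS_PORT) are illustrative assumptions and should be matched to what the actual compose files expect:

# Point a GPU worker at the remote Redis queue
# (REDIS_HOST/REDIS_PORT are assumed names; check docker-compose.gpu.yml)
REDIS_HOST=redis.internal.example REDIS_PORT=6379 \
  docker-compose -f docker-compose.gpu.yml up -d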

🎯 Component-Specific Deployment

Deploy only the components you need:

  • API-only: For serverless GPU workers or external inference services
  • GPU-only: For dedicated inference workers connecting to remote APIs
  • Monitoring-only: For centralized observability across multiple deployments

Component Communication

┌─────────────┐    ┌─────────┐    ┌─────────────┐
│ API Service │────│  Redis  │────│ GPU Workers │
│  (FastAPI)  │    │ Queue   │    │ (LLM + SD)  │
└─────────────┘    └─────────┘    └─────────────┘
       │                                  │
       │           ┌─────────────┐        │
       └───────────│ Monitoring  │────────┘
                   │   Stack     │
                   └─────────────┘
  • API Service receives HTTP requests and queues jobs
  • Redis serves as the message broker between API and workers
  • GPU Workers pull jobs from the queue and process AI inference
  • Monitoring Stack observes all components and provides metrics/alerts
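
The queue mechanics can be sketched with redis-cli. The queue name and job payload below are hypothetical; the actual keys and message format are defined by the API and worker code:

# API side: enqueue a job (queue name "job_queue" is assumed for illustration)
redis-cli LPUSH job_queue '{"id": "123", "type": "llm", "prompt": "Hello"}'

# Worker side: block until a job is available, then pop it
redis-cli BRPOP job_queue 0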

Each component is designed to be:

  • Independently deployable
  • Horizontally scalable (see the scaling example after this list)
  • Self-contained with minimal dependencies
  • Observable through the monitoring stack
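
For example, horizontal scaling can be as simple as running more worker replicas with Compose's --scale flag; the service name gpu-worker is a placeholder for whatever the compose file actually defines:

# Run three GPU worker replicas (service name is illustrative)
docker-compose -f docker-compose.gpu.yml up -d --scale gpu-worker=3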