Basic cluster behavior
- Start a balancer
- Start agents connected to the balancer
- Allow to customize the number of slots in agents
- Ensure the balancer distributes the requests among agents correctly
- Ensure agents process the requests
- Allow for agent naming
- Allow to persist balancer configuration across restarts (state database as a file)
- Shut down the balancer and agents cleanly when stopped
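One plausible slot-based distribution strategy can be sketched as follows. This is only an illustration of the behavior described above; the actual balancer may select agents differently, and `Agent` / `pick_agent` are illustrative names, not Paddler's API:

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Illustrative view of an agent as the balancer might track it."""
    name: str
    slots_total: int
    slots_in_use: int

    @property
    def slots_free(self) -> int:
        return self.slots_total - self.slots_in_use


def pick_agent(agents):
    """Route a request to the agent with the most free slots.

    Returns None when every slot on every agent is busy, which is
    where request buffering (see below) would kick in.
    """
    candidates = [a for a in agents if a.slots_free > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda a: a.slots_free)
```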
Handle the load, scale a cluster
- Buffer the incoming requests when applicable (all slots in all agents busy)
- Allow to customize the request buffer (maximum number of requests)
- Allow to customize the request buffer (maximum time the requests can spend in the buffer)
- Be able to add / remove agents connected to a balancer
- Be able to scale from zero agent instances
- Allow to customize the inference timeout
- Return an error to the client when an agent disconnects mid-request
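The buffering rules above (a bounded number of requests, a bounded time spent waiting) can be sketched as a small in-memory structure. This illustrates the semantics only and is not Paddler's implementation:

```python
import time
from collections import deque


class RequestBuffer:
    """Bounded FIFO: rejects pushes when full, drops entries older than max_wait_s."""

    def __init__(self, max_size: int, max_wait_s: float):
        self.max_size = max_size
        self.max_wait_s = max_wait_s
        self._items = deque()  # (enqueued_at, request)

    def push(self, request, now=None) -> bool:
        now = time.monotonic() if now is None else now
        self.expire(now)
        if len(self._items) >= self.max_size:
            return False  # buffer full: caller returns an error to the client
        self._items.append((now, request))
        return True

    def expire(self, now=None):
        now = time.monotonic() if now is None else now
        while self._items and now - self._items[0][0] > self.max_wait_s:
            self._items.popleft()  # timed out waiting for a free slot

    def pop(self, now=None):
        now = time.monotonic() if now is None else now
        self.expire(now)
        return self._items.popleft()[1] if self._items else None
```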
Load and manage a model
- Load a model from Hugging Face
- Load a model from local file path
- Allow for specifying pooling type (for embeddings only)
- Allow for swapping models without restarting the balancer or agents
- Show model's metadata
Generate tokens
- Generate tokens from conversation history
- Generate tokens from raw prompt
- Stream generated tokens back to the client in real time
- Support the thinking mode
- Respect the `max_tokens` parameter
Generate embeddings
- Ensure embeddings are generated only when enabled
- Generate embeddings from batches of input documents
- Stream embedding results back to the client in real time
- Preserve document IDs in embedding results so the user can match them to the input
- Automatically split large batches into smaller chunks to fit agent capacity
- Allow for specifying normalization method
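Splitting a large batch while preserving document IDs (two of the items above) can be sketched as a pure function; chunk size and the `(doc_id, text)` pairing are illustrative assumptions, not Paddler's internal representation:

```python
def chunk_batch(documents, agent_capacity):
    """Split [(doc_id, text), ...] into chunks of at most agent_capacity items.

    Each document's ID stays attached to its text, so embedding results
    coming back from different agents can be matched to the input.
    """
    if agent_capacity < 1:
        raise ValueError("agent_capacity must be >= 1")
    return [documents[i:i + agent_capacity]
            for i in range(0, len(documents), agent_capacity)]
```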
Multimodal support
- Load a mmproj from Hugging Face
- Load a mmproj from local file path
- Generate tokens from conversation history that includes images
Use function calling
- Ensure the `tools` parameter is optional
- Allow for adding functions in the `tools` parameter
- Validate the tools schema
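A minimal sketch of what validating one `tools` entry could look like, assuming the widely used OpenAI-style function-tool shape; the exact checks Paddler performs are not specified here:

```python
def validate_tool(tool: dict) -> bool:
    """Check one entry of the optional `tools` parameter against the
    OpenAI-style function-tool shape (illustrative checks only)."""
    if tool.get("type") != "function":
        return False
    fn = tool.get("function")
    if not isinstance(fn, dict) or not isinstance(fn.get("name"), str):
        return False
    # "parameters" is a JSON Schema object describing the function arguments
    params = fn.get("parameters", {"type": "object"})
    return isinstance(params, dict) and params.get("type") == "object"


# Hypothetical example tool, for illustration only
get_weather = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}
```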
Control response quality
- Customize the inference parameters
- Apply chat template from the model if no override is provided
- Allow to override the model's chat template
Monitor cluster's metrics
- Expose metrics for slots in use, total slots, and buffered requests (Prometheus format)
- Allow for optionally enabling StatsD when starting the balancer
- Push the metrics (slots in use, total slots, and buffered requests) to a StatsD server
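The three metrics named above can be rendered as StatsD gauge lines (`<name>:<value>|g`, the standard StatsD datagram format). The `paddler.*` metric names are assumptions for illustration, not necessarily the names the balancer emits:

```python
def statsd_gauges(slots_in_use: int, slots_total: int, buffered: int) -> list[str]:
    """Render the cluster metrics as StatsD gauge lines, one datagram per line.

    Metric names are illustrative; the real balancer may use different ones.
    """
    return [
        f"paddler.slots_in_use:{slots_in_use}|g",
        f"paddler.slots_total:{slots_total}|g",
        f"paddler.buffered_requests:{buffered}|g",
    ]
```

Each line would typically be sent as a UDP datagram to the StatsD server's host and port.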
Web admin panel
- Allow for optionally enabling Web Admin Panel when starting the balancer
- Dashboard view presents the setup in real time (buffered requests, agents being added/removed, slots occupancy, model downloading)
- Model view allows for model and chat template management
- Model's metadata and chat template can be viewed
- Prompt section allows for inference testing (token generation only and the history of the conversation is not supported at the moment)
Use OpenAI-compatible API
- Allow for optionally enabling OpenAI compatibility endpoint when starting the balancer
- Ensure the `/v1/chat/completions` endpoint is supported (requests are translated to OpenAI's format and then responses are translated back to Paddler's format)
CORS configuration
- Allow to configure CORS allowed origins for inference service
- Allow to configure CORS allowed origins for management service
Monitor cluster's health
- Report agent issues to the user (e.g. model failed to load, file path does not exist, chat template missing)
- Expose a health check endpoint for inference service
- Expose a health check endpoint for management service
- Expose a health check endpoint for OpenAI compatibility service
Unhappy paths
Base model input
- invalid or non-existent model references (wrong repo, filename, revision, or local path)
- files that are not valid GGUF
Multimodal projection input
- invalid or non-existent multimodal projection references
- files that are not valid mmproj models
Image analysis
- images sent to a text-only model (no mmproj loaded)
- invalid or unsupported image data (malformed data URI, invalid base64, unconvertible format, remote URLs)
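The image-input failure cases above (remote URLs, malformed data URIs, invalid base64) can be sketched as a validation routine. The accepted media types and error messages are illustrative assumptions, not Paddler's actual checks:

```python
import base64
import binascii

# Illustrative set; the real service may accept a different list
SUPPORTED_MEDIA_TYPES = {"image/png", "image/jpeg", "image/webp"}


def decode_image_data_uri(uri: str) -> bytes:
    """Reject remote URLs and malformed data URIs; return the decoded bytes."""
    if uri.startswith(("http://", "https://")):
        raise ValueError("remote image URLs are not accepted")
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    header, _, payload = uri.partition(",")
    if not payload or not header.endswith(";base64"):
        raise ValueError("malformed data URI")
    media_type = header[len("data:"):-len(";base64")]
    if media_type not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"unsupported media type: {media_type!r}")
    try:
        return base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("invalid base64 payload") from exc
```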
Code coverage
- implement code coverage tool