Paddler feature list for automated testing #173

@malzag

Basic cluster behavior

  • Start a balancer
  • Start agents connected to the balancer
  • Allow customizing the number of slots in agents
  • Ensure the balancer distributes the requests among agents correctly
  • Ensure agents process the requests
  • Allow for agent naming
  • Allow persisting the balancer configuration across restarts (state database as a file)
  • Shut down the balancer and agents cleanly when stopped

Handle the load, scale a cluster

  • Buffer the incoming requests when applicable (all slots in all agents busy)
  • Allow customizing the request buffer (maximum number of requests)
  • Allow customizing the request buffer (maximum time a request can spend in the buffer)
  • Be able to add / remove agents connected to a balancer
  • Be able to scale from zero agent instances
  • Allow customizing the inference timeout
  • Return an error to the client when an agent disconnects mid-request
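
A test for the buffering items above needs a reference model of the expected behavior to compare against. The sketch below is one hypothetical model, not Paddler's implementation: `RequestBuffer`, `max_requests`, and `max_wait_seconds` are illustrative names, and the current time is passed in explicitly so a test can control the clock.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RequestBuffer:
    """Hypothetical reference model of a bounded, time-limited request buffer."""

    max_requests: int
    max_wait_seconds: float
    _queue: deque = field(default_factory=deque, init=False)

    def push(self, request_id: str, now: float) -> bool:
        """Buffer a request; return False when the buffer is already full."""
        if len(self._queue) >= self.max_requests:
            return False
        self._queue.append((request_id, now))
        return True

    def expire(self, now: float) -> List[str]:
        """Remove and return the IDs of requests that waited longer than the limit."""
        expired = []
        while self._queue and now - self._queue[0][1] > self.max_wait_seconds:
            expired.append(self._queue.popleft()[0])
        return expired

    def pop(self, now: float) -> Optional[str]:
        """Return the oldest request that has not timed out, or None if there is none."""
        self.expire(now)  # timed-out requests are dropped before a slot is assigned
        return self._queue.popleft()[0] if self._queue else None
```

With `max_requests=2`, a third `push` is rejected while two requests are waiting, which corresponds to the case where the client should receive an error instead of being queued.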

Load and manage a model

  • Load a model from Hugging Face
  • Load a model from local file path
  • Allow for specifying pooling type (for embeddings only)
  • Allow for swapping models without restarting the balancer or agents
  • Show model's metadata

Generate tokens

  • Generate tokens from conversation history
  • Generate tokens from raw prompt
  • Stream generated tokens back to the client in real time
  • Support the thinking mode
  • Respect the max_tokens number

Generate embeddings

  • Ensure embeddings are generated only when enabled
  • Generate embeddings from batches of input documents
  • Stream embedding results back to the client in real time
  • Preserve document IDs in embedding results so the user can match them to the input
  • Automatically split large batches into smaller chunks to fit agent capacity
  • Allow for specifying normalization method
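
The batch-splitting and ID-preservation items above can be checked with two small helpers. This is a minimal sketch under stated assumptions: the document shape (`{"id": ..., "content": ...}`), the function names, and the fixed `chunk_size` are all hypothetical and do not describe how Paddler sizes its chunks.

```python
from typing import Dict, Iterator, List

Document = Dict[str, str]  # assumed shape: {"id": "...", "content": "..."}


def split_into_chunks(documents: List[Document], chunk_size: int) -> Iterator[List[Document]]:
    """Yield consecutive chunks of at most chunk_size documents, keeping input order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(documents), chunk_size):
        yield documents[start:start + chunk_size]


def ids_match(documents: List[Document], results: List[Dict]) -> bool:
    """Check that every input ID comes back exactly once, regardless of arrival order."""
    return sorted(d["id"] for d in documents) == sorted(r["id"] for r in results)
```

Comparing sorted IDs rather than positions matters because streamed results from several agents may arrive in a different order than the input.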

Multimodal support

  • Load an mmproj from Hugging Face
  • Load an mmproj from a local file path
  • Generate tokens from conversation history that includes images

Use function calling

  • Ensure tools parameter is optional
  • Allow for adding functions in tools parameter
  • Validate the tools schema

Control response quality

  • Customize the inference parameters
  • Apply chat template from the model if no override is provided
  • Allow overriding the model's chat template

Monitor cluster's metrics

  • Expose metrics for slots in use, total slots, and buffered requests (Prometheus format)
  • Allow for optionally enabling StatsD when starting the balancer
  • Push the metrics (slots in use, total slots, and buffered requests) to a StatsD server
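
To assert on the Prometheus-format metrics listed above, a test can scrape the endpoint and parse the response body. The parser below is a minimal sketch for that purpose; the metric names in the usage note are placeholders, not Paddler's actual metric names, and it assumes samples do not carry label values that contain spaces.

```python
from typing import Dict


def parse_prometheus_text(body: str) -> Dict[str, float]:
    """Parse samples from a Prometheus text exposition body.

    Comment lines (# HELP, # TYPE) and blank lines are skipped. An optional
    trailing timestamp is ignored. Any label set stays part of the key.
    """
    samples = {}
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        samples[parts[0]] = float(parts[1])
    return samples
```

A test could then assert, for example, that a placeholder gauge such as `example_slots_in_use` never exceeds `example_slots_total` while requests are in flight.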

Web admin panel

  • Allow for optionally enabling Web Admin Panel when starting the balancer
  • Dashboard view presents the setup in real time (buffered requests, agents being added/removed, slots occupancy, model downloading)
  • Model view allows for model and chat template management
  • Model's metadata and chat template can be viewed
  • Prompt section allows for inference testing (token generation only; conversation history is not supported at the moment)

Use OpenAI-compatible API

  • Allow for optionally enabling OpenAI compatibility endpoint when starting the balancer
  • Ensure the /v1/chat/completions endpoint is supported (requests are translated from OpenAI's format to Paddler's format, and responses are translated back to OpenAI's format)

CORS configuration

  • Allow configuring CORS allowed origins for the inference service
  • Allow configuring CORS allowed origins for the management service

Monitor cluster's health

  • Report agent issues to the user (e.g. model failed to load, file path does not exist, chat template missing)
  • Expose a health check endpoint for inference service
  • Expose a health check endpoint for management service
  • Expose a health check endpoint for OpenAI compatibility service

Unhappy paths

Base model input

  • invalid or non-existent model references (wrong repo, filename, revision, or local path)
  • files that are not valid GGUF
  • invalid or non-existent multimodal projection references

Multimodal projection input

  • invalid or non-existent multimodal projection references
  • files that are not valid mmproj models

Image analysis

  • images sent to a text-only model (no mmproj loaded)
  • invalid or unsupported image data (malformed data URI, invalid base64, unconvertible format, remote URLs)
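
The invalid-image cases above map naturally onto a small classifier that a test suite can use to build and label its inputs. The sketch below is illustrative only: `classify_image_url` and its return labels are hypothetical names, and it does not describe how Paddler validates images. It also stops at base64 decoding, so the "unconvertible format" case (bytes that decode but are not a usable image) is not covered.

```python
import base64
import binascii
import re

# Accepts only base64-encoded data URIs with an image/* media type and a non-empty payload.
_DATA_URI = re.compile(r"^data:image/[\w.+-]+;base64,(?P<payload>.+)$", re.DOTALL)


def classify_image_url(url: str) -> str:
    """Return a label describing why an image reference would be rejected, or "ok"."""
    if url.startswith(("http://", "https://")):
        return "remote_url"
    match = _DATA_URI.match(url)
    if match is None:
        return "malformed_data_uri"
    try:
        # validate=True rejects characters outside the base64 alphabet
        base64.b64decode(match.group("payload"), validate=True)
    except binascii.Error:
        return "invalid_base64"
    return "ok"
```

Each label corresponds to one unhappy-path request a test would send and expect an error response for.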

Code coverage

  • implement a code coverage tool
