Skip to content

Latest commit

 

History

History
130 lines (101 loc) · 3.88 KB

File metadata and controls

130 lines (101 loc) · 3.88 KB

Project Functionality

Main Menu

After running python3 run.py, the user is presented with three options:

1) Offline AI Chat
2) System Analyzer
3) Exit

Offline AI Chat Scenario

Purpose

This scenario is intended for local execution of a GGUF model through llama.cpp followed by interactive communication with the model in the console.

Workflow Stages

1. Model Selection

The project reads _work-models/catalog.json and displays the list of models.

For each model, the following information is shown:

  • name;
  • short description;
  • max_tokens;
  • approximate host requirements;
  • source;
  • GGUF layout type: single-file or sharded.

2. Preparing Model Files

After a model is selected, the project checks the _work-models/models/<model_key>/ directory.

Workflow logic:

  • if the file already exists, its SHA256 is calculated and compared against the catalog;
  • if the file is missing, it is downloaded from the URL specified in the catalog;
  • after download, the file is verified again using SHA256;
  • if the checksum does not match, execution stops with an error.

As a result, the project does not use unverified local GGUF files.

3. Building the Docker Image

To build the image, a temporary build context is created inside .runtime/offline-ai-chat/.

Implementation details:

  • the Dockerfile is taken from apps/offline_llm_chat/docker/Dockerfile;
  • model files are added to the build context via hardlinks rather than by copying;
  • the image is built with the following tags:
    • offline-ai-chat-llm:latest;
    • offline-ai-chat-llm:<model_key>.

4. Starting the Container

After the image is built, the project starts the container with llama.cpp and waits for the Unix socket to appear.

The runtime uses parameters from .env:

  • CTX_SIZE;
  • MEM_LIMIT;
  • CPU_LIMIT;
  • PIDS_LIMIT;
  • DEBUG_LOGS.

5. Model Readiness Check

Once the socket appears, the chat is not opened immediately. Two checks are performed first:

  • polling /v1/models for API readiness;
  • a smoke test with a short test request.

Only after these checks complete successfully does the project start the user chat.

6. Interactive Mode

In interactive mode, the following commands are supported:

  • /exit — exit the chat;
  • /reset — reset the current message history.

In addition, the project outputs service statistics for each response:

  • token count;
  • processing duration;
  • prompt/generation speed;
  • cumulative token counter since the start of the session.

System Analyzer Scenario

Purpose

This scenario is intended for preliminary evaluation of Linux host resources and selection of container startup parameters.

What Is Analyzed

Based on system data, the project collects information about:

  • OS version;
  • CPU and the number of logical cores;
  • RAM and swap size;
  • root filesystem;
  • physical disks;
  • GPU, if supported detection tools are available.

Output

Based on the analysis results, the project generates startup profiles and suggests values for .env:

  • CTX_SIZE;
  • MEM_LIMIT;
  • CPU_LIMIT;
  • PIDS_LIMIT.

After user confirmation, the values are written to the root .env file.

Functional Limitations

The project performs a strictly defined task and does not include additional operating modes.

In the current implementation, the project:

  • does not mount a user repository into the runtime container;
  • does not modify external source code;
  • does not launch a web application;
  • does not expose a REST API externally;
  • does not maintain a separate file-based report storage;
  • does not manage multiple containers simultaneously.

Practical Operating Scenario

Baseline workflow:

Run run.py
  -> System Analyzer
  -> profile evaluation and .env update
  -> return to the main menu
  -> Offline AI Chat
  -> model selection
  -> GGUF verification or download
  -> Docker build
  -> Docker run
  -> smoke test
  -> interactive CLI chat