
Standalone gRPC inference service for CLIP and OCR #27

Open
KarunyaChavan wants to merge 1 commit into main from feature/grpc-inference-service


Description

Closes #18

ML inference (CLIP embedding and OCR) was previously tightly coupled to the Flask web server. As a result, future Go services (scanner, GraphQL gateway) could only access inference functionality through Python REST endpoints.

This PR extracts inference into a standalone gRPC service, allowing any client to consume ML capabilities directly through a shared protobuf contract.


What Changed

| File | Description |
| --- | --- |
| proto/semantixel_inference.proto | Defines the gRPC service contract with four RPCs: EmbedImage, EmbedText, ExtractOCR, and HealthCheck. |
| semantixel/grpc_server.py | Implements the gRPC server, servicer, lifecycle management, and CLI entry point. |
| main.py | Adds --grpc and --grpc-port flags for starting the gRPC server. |
| requirements.txt | Adds grpcio and grpcio-tools. |
| scripts/generate_proto.py | OS-agnostic utility for regenerating protobuf stubs. |

Implementation Details

1. Protobuf Contract

Defined a standalone protobuf contract in proto/semantixel_inference.proto containing:

  • EmbedImage
  • EmbedText
  • ExtractOCR
  • HealthCheck

The contract includes:

  • ServingStatus enum for readiness reporting
  • optional float threshold for OCR confidence filtering
  • OCRResult wrapper message for future response extensibility

2. gRPC Servicer

Implemented InferenceServicer, which delegates all inference requests to the existing ModelManager singleton.

This preserves:

  • Existing model loading behavior
  • CLIP embedding normalization
  • OCR output formatting
  • Inference semantics already used by the Flask API
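
The delegation pattern described above can be sketched roughly as follows. This is not the code from semantixel/grpc_server.py: `FakeModelManager`, its `embed_text` method, and the two message dataclasses are hypothetical stand-ins for the real ModelManager and the generated protobuf classes, and the use of `asyncio.to_thread` to keep blocking inference off the event loop is an assumption about how an async servicer might be written.

```python
import asyncio
from dataclasses import dataclass, field


class FakeModelManager:
    """Hypothetical stand-in for the project's ModelManager singleton."""

    def embed_text(self, texts):
        # Return one small dummy vector per input text.
        return [[float(len(t)), 0.0, 0.0] for t in texts]


@dataclass
class EmbedTextRequest:  # stand-in for the generated protobuf request
    texts: list


@dataclass
class EmbedTextResponse:  # stand-in for the generated protobuf response
    embeddings: list = field(default_factory=list)
    dimension: int = 0


class InferenceServicer:
    """Thin adapter: unpack the request, delegate, pack the response."""

    def __init__(self, manager):
        self._manager = manager

    async def EmbedText(self, request, context):
        # Run the blocking model call in a worker thread so other RPCs
        # can still be served while this one computes.
        vectors = await asyncio.to_thread(
            self._manager.embed_text, list(request.texts)
        )
        dim = len(vectors[0]) if vectors else 0
        return EmbedTextResponse(embeddings=vectors, dimension=dim)
```

Because the servicer holds no inference logic of its own, any behavior the Flask API already gets from the manager (normalization, output formatting) carries over unchanged.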

3. Server Lifecycle Management

Implemented GrpcInferenceServer using grpc.aio for asynchronous request handling.

Features include:

  • Async RPC execution
  • Graceful shutdown
  • SIGINT handling
  • SIGTERM handling
  • Clean resource teardown
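
A minimal sketch of the shutdown flow listed above, using only the standard library. `FakeServer` is a hypothetical stand-in that mirrors the awaitable `start()` and `stop(grace)` methods of a grpc.aio server; the `serve` helper, its `grace` default, and the `stop_event` parameter are illustrative names, not the PR's actual implementation.

```python
import asyncio
import signal


class FakeServer:
    """Hypothetical stand-in with the start/stop shape of a grpc.aio server."""

    def __init__(self):
        self.events = []

    async def start(self):
        self.events.append("started")

    async def stop(self, grace):
        # A real server would stop accepting new RPCs and wait up to
        # `grace` seconds for in-flight ones to finish.
        self.events.append(f"stopped(grace={grace})")


async def serve(server, grace=5.0, stop_event=None):
    stop_event = stop_event or asyncio.Event()
    loop = asyncio.get_running_loop()
    for sig in (signal.SIGINT, signal.SIGTERM):
        try:
            # Translate the OS signal into an asyncio-friendly event.
            loop.add_signal_handler(sig, stop_event.set)
        except (NotImplementedError, RuntimeError):
            # Not available on Windows event loops or outside the main thread.
            pass
    await server.start()
    await stop_event.wait()   # block until a signal or the caller asks to stop
    await server.stop(grace)  # drain in-flight work, then tear down
```

Routing both signals through a single event keeps the teardown path identical whether the process is interrupted from a terminal or stopped by an orchestrator.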

4. CLI Integration

Added multiple startup options:

python main.py --grpc
python main.py --grpc --grpc-port 50051
python -m semantixel.grpc_server
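
For illustration, the two flags could be declared with argparse along these lines. This is a sketch rather than the contents of main.py: the `build_parser` name, the help strings, and the default port of 50051 (taken from the example command above) are assumptions.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="main.py")
    # Boolean switch: present means "start the gRPC inference server".
    parser.add_argument("--grpc", action="store_true",
                        help="start the gRPC inference server")
    # Assumed default; the PR only shows 50051 as an example value.
    parser.add_argument("--grpc-port", type=int, default=50051,
                        help="TCP port for the gRPC server")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.grpc, args.grpc_port)
```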

A dedicated generation script was also added:

python scripts/generate_proto.py
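
A script like this typically wraps the `grpc_tools.protoc` module that ships with grpcio-tools. The sketch below only builds the command line; the `protoc_command` helper and the `semantixel` output directory are assumptions, and the real script may place the generated stubs elsewhere.

```python
import sys
from pathlib import Path


def protoc_command(repo_root, out_dir="semantixel"):
    """Build the protoc invocation as an argument list (no shell quoting needed)."""
    root = Path(repo_root)
    proto_dir = root / "proto"
    return [
        sys.executable, "-m", "grpc_tools.protoc",
        f"-I{proto_dir}",                       # where to resolve imports
        f"--python_out={root / out_dir}",       # message classes (*_pb2.py)
        f"--grpc_python_out={root / out_dir}",  # service stubs (*_pb2_grpc.py)
        str(proto_dir / "semantixel_inference.proto"),
    ]
```

Passing the list to `subprocess.run(protoc_command("."), check=True)` would run the generation. Using `sys.executable` and `pathlib` instead of a shell string is one way to keep such a script portable across operating systems.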

Rationale

Decoupling

Inference is no longer tied to Flask.

Model serving can evolve independently of the REST layer, and web server restarts no longer imply inference service restarts.

Language-Agnostic Integration

The shared protobuf contract lets clients be written in any language that gRPC supports.

Go services such as the scanner and GraphQL gateway can generate native stubs and communicate directly with the inference service.

Performance

gRPC uses Protocol Buffers for serialization, reducing payload size and serialization overhead compared to JSON.

This is particularly beneficial for high-dimensional embedding vectors.
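
A rough back-of-the-envelope comparison illustrates the size difference. The sketch assumes a 512-dimensional embedding carried as 32-bit floats (both are assumptions, not values taken from the PR) and uses `struct` to approximate protobuf's packed encoding of a repeated float field, which stores 4 bytes per element plus a few bytes of tag and length.

```python
import json
import random
import struct

DIM = 512  # assumed embedding width

rng = random.Random(0)
vector = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

# JSON carries each float as decimal text plus separators.
json_bytes = json.dumps(vector).encode("utf-8")

# Packed little-endian float32, 4 bytes per element.
packed_bytes = struct.pack(f"<{DIM}f", *vector)

print("packed float32 bytes:", len(packed_bytes))
print("JSON bytes:          ", len(json_bytes))
```

The comparison is not perfectly like for like, since the JSON text preserves full double precision while float32 does not, but embeddings are usually consumed at float32 precision anyway.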

Readiness and Observability

The HealthCheck RPC exposes:

  • Service status (ServingStatus)
  • Model load state
  • Active model information
  • Runtime device information

This allows orchestration and monitoring systems to verify service readiness.
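
As an illustration of how a client or startup script might use this, the sketch below polls a health check until the service reports that it is serving. The enum member names and values are assumptions (the PR names the ServingStatus enum but not its members), and `check` is a hypothetical callable standing in for a HealthCheck RPC call.

```python
import enum
import time


class ServingStatus(enum.IntEnum):
    # Assumed members; the actual enum in the .proto file may differ.
    UNKNOWN = 0
    SERVING = 1
    NOT_SERVING = 2


def wait_until_ready(check, timeout=30.0, interval=0.5, sleep=time.sleep):
    """Poll `check()` until it returns SERVING or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if check() == ServingStatus.SERVING:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Since model loading can take a while, a readiness gate like this lets dependent services wait until the models are loaded instead of failing their first requests.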

Future Extensibility

The API was designed with forward compatibility in mind:

  • optional threshold distinguishes omitted values from explicit 0.0
  • OCRResult allows additional per-image metadata without changing response structure
  • Embedding responses expose model metadata and dimensionality
  • Batch-oriented request/response structures support future scaling requirements
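
The first point above can be made concrete with a small sketch. In generated Python code a proto3 `optional` field exposes presence through `HasField("threshold")`; the dataclass below is a stand-in that models the same idea with `None`, and `DEFAULT_THRESHOLD` is a hypothetical server-side default.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_THRESHOLD = 0.5  # hypothetical default, not taken from the PR


@dataclass
class OCRRequest:  # stand-in for the generated protobuf message
    threshold: Optional[float] = None


def effective_threshold(request):
    # Test for absence explicitly: a truthiness check would wrongly
    # treat an explicit 0.0 ("keep everything") as "not provided".
    if request.threshold is None:
        return DEFAULT_THRESHOLD
    return request.threshold
```

Without `optional`, a proto3 scalar that was never set reads back as 0.0, so the server could not tell "use the default" apart from "apply no filtering".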

Result

Semantixel inference is now exposed as a standalone, language-agnostic gRPC service that can be consumed directly by Go services and other future clients while preserving existing model behavior and inference outputs.

- Extract ML inference (CLIP embeddings + OCR extraction) into an independent gRPC server, decoupling it from the Flask REST layer and enabling polyglot consumers (Go scanner, GraphQL gateway).
@KarunyaChavan added the enhancement and dependencies labels Jun 10, 2026
@KarunyaChavan marked this pull request as ready for review June 10, 2026 09:51
@KarunyaChavan self-assigned this Jun 10, 2026

Development

Successfully merging this pull request may close these issues.

Introduce Python gRPC Inference Service
