Standalone gRPC inference service for CLIP and OCR. by KarunyaChavan · Pull Request #27 · KarunyaChavan/Semantixel-Semantic_Image_Retrieval

KarunyaChavan · 2026-06-10T09:50:05Z

Description

Closes #18

ML inference (CLIP embedding and OCR) was previously tightly coupled to the Flask web server. As a result, future Go services (scanner, GraphQL gateway) could only access inference functionality through Python REST endpoints.

This PR extracts inference into a standalone gRPC service, allowing any client to consume ML capabilities directly through a shared protobuf contract.

What Changed

| File | Description |
| --- | --- |
| `proto/semantixel_inference.proto` | Defines the gRPC service contract with four RPCs: `EmbedImage`, `EmbedText`, `ExtractOCR`, and `HealthCheck`. |
| `semantixel/grpc_server.py` | Implements the gRPC server, servicer, lifecycle management, and CLI entry point. |
| `main.py` | Adds `--grpc` and `--grpc-port` flags for starting the gRPC server. |
| `requirements.txt` | Adds `grpcio` and `grpcio-tools`. |
| `scripts/generate_proto.py` | OS-agnostic utility for regenerating protobuf stubs. |

Implementation Details

1. Protobuf Contract

Defined a standalone protobuf contract in `proto/semantixel_inference.proto` containing four RPCs:

- `EmbedImage`
- `EmbedText`
- `ExtractOCR`
- `HealthCheck`

The contract includes:

- `ServingStatus` enum for readiness reporting
- `optional float threshold` for OCR confidence filtering
- `OCRResult` wrapper message for future response extensibility

2. gRPC Servicer

Implemented `InferenceServicer`, which delegates all inference requests to the existing `ModelManager` singleton.

This preserves:

- Existing model loading behavior
- CLIP embedding normalization
- OCR output formatting
- Inference semantics already used by the Flask API
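
The delegation pattern described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `FakeModelManager`, its `embed_text` method, the toy embedding, and the dict-based request and response are all hypothetical stand-ins. The real servicer would subclass the base class generated from the `.proto` file and forward to the project's actual `ModelManager`.

```python
import math
from typing import List


class FakeModelManager:
    """Hypothetical stand-in for the project's ModelManager singleton."""

    _instance = None

    @classmethod
    def instance(cls) -> "FakeModelManager":
        # Lazily create one shared instance so models would load only once.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def embed_text(self, text: str) -> List[float]:
        # Toy "embedding" built from character statistics, then L2-normalized.
        raw = [float(len(text)), float(sum(map(ord, text)) % 97), 1.0]
        norm = math.sqrt(sum(v * v for v in raw))
        return [v / norm for v in raw]


class InferenceServicerSketch:
    """Thin adapter: holds no model state and only forwards to the manager."""

    def __init__(self, manager: FakeModelManager) -> None:
        self._manager = manager

    def EmbedText(self, request: dict) -> dict:
        vector = self._manager.embed_text(request["text"])
        return {"embedding": vector, "dimension": len(vector)}


servicer = InferenceServicerSketch(FakeModelManager.instance())
reply = servicer.EmbedText({"text": "a red bicycle"})
```

Because the adapter owns no inference logic, normalization and output formatting stay in one place and behave the same for every caller that shares the manager.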

3. Server Lifecycle Management

Implemented `GrpcInferenceServer` using `grpc.aio` for asynchronous request handling.

Features include:

- Async RPC execution
- Graceful shutdown
- SIGINT handling
- SIGTERM handling
- Clean resource teardown
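
A minimal sketch of this shutdown flow, using only the standard library, might look like the following. `FakeServer` is a hypothetical stand-in that only mimics the `start()` / `stop(grace)` coroutine shape of an async server, and the helper names and the 5-second grace period are assumptions, not taken from `GrpcInferenceServer`.

```python
import asyncio
import signal


class FakeServer:
    """Hypothetical stand-in exposing an async start/stop interface."""

    def __init__(self) -> None:
        self.events = []

    async def start(self) -> None:
        self.events.append("started")

    async def stop(self, grace: float) -> None:
        # A real server would stop accepting new RPCs here and let
        # in-flight calls finish for up to `grace` seconds.
        self.events.append(f"stopped(grace={grace})")


def install_signal_handlers(loop, stop_event: asyncio.Event) -> None:
    # loop.add_signal_handler is only available on Unix event loops.
    for sig in (signal.SIGINT, signal.SIGTERM):
        loop.add_signal_handler(sig, stop_event.set)


async def serve_until_stopped(server, stop_event: asyncio.Event,
                              grace: float = 5.0) -> None:
    await server.start()
    try:
        # Block until a signal handler (or the caller) requests shutdown.
        await stop_event.wait()
    finally:
        # Always tear down, even if this task is cancelled.
        await server.stop(grace)


async def demo() -> list:
    server = FakeServer()
    stop_event = asyncio.Event()
    # Simulate a shutdown request instead of sending a real signal.
    asyncio.get_running_loop().call_later(0.01, stop_event.set)
    await serve_until_stopped(server, stop_event)
    return server.events


events = asyncio.run(demo())
```

In a real process, `install_signal_handlers(asyncio.get_running_loop(), stop_event)` would be called before `serve_until_stopped`, so that SIGINT and SIGTERM both funnel into the same teardown path.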

4. CLI Integration

Added multiple startup options:

python main.py --grpc
python main.py --grpc --grpc-port 50051
python -m semantixel.grpc_server
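
The two flags in the commands above could be declared with `argparse` roughly as shown here. This is a sketch under assumptions: the help strings are invented, and treating 50051 as the default port is a guess based on the example command rather than something the PR states.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--grpc", action="store_true",
                        help="start the gRPC inference server")
    # Assumed default; 50051 is simply the value used in the example command.
    parser.add_argument("--grpc-port", type=int, default=50051,
                        help="TCP port for the gRPC server")
    return parser


args = build_parser().parse_args(["--grpc", "--grpc-port", "50051"])
# argparse exposes --grpc-port as args.grpc_port
```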

A dedicated generation script was also added:

python scripts/generate_proto.py
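
One OS-agnostic way to write such a script is to build the `grpc_tools.protoc` command from `pathlib` paths and run it with the current interpreter. The sketch below shows only the command construction; `build_protoc_command` is a hypothetical helper, and `semantixel` as the output directory is an assumption.

```python
import sys
from pathlib import Path
from typing import List


def build_protoc_command(proto_file: Path, out_dir: Path) -> List[str]:
    """Return the argv for invoking grpc_tools.protoc via this interpreter."""
    return [
        sys.executable, "-m", "grpc_tools.protoc",
        f"-I{proto_file.parent}",          # where to resolve .proto imports
        f"--python_out={out_dir}",         # message classes (*_pb2.py)
        f"--grpc_python_out={out_dir}",    # service stubs (*_pb2_grpc.py)
        str(proto_file),
    ]


cmd = build_protoc_command(
    Path("proto") / "semantixel_inference.proto",
    Path("semantixel"),  # assumed output package
)
# To actually generate stubs: subprocess.run(cmd, check=True)
```

Using `sys.executable` rather than a bare `python` keeps the call inside the active virtual environment on any platform.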

Rationale

Decoupling

Inference is no longer tied to Flask.

Model serving can evolve independently of the REST layer, and web server restarts no longer imply inference service restarts.

Language-Agnostic Integration

The shared protobuf contract lets clients be written in any language with gRPC support.

Go services such as the scanner and GraphQL gateway can generate native stubs and communicate directly with the inference service.

Performance

gRPC uses Protocol Buffers for serialization, reducing payload size and serialization overhead compared to JSON.

This is particularly beneficial for high-dimensional embedding vectors.
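
A rough size comparison illustrates the point. The sketch assumes a 512-dimensional embedding carried in a `repeated float` field; neither detail is confirmed by the PR. Packed little-endian float32 approximates the protobuf payload for such a field, leaving out the few bytes of field tag and length prefix.

```python
import json
import random
import struct

random.seed(0)
dim = 512  # assumed embedding width, for illustration only
embedding = [random.uniform(-1.0, 1.0) for _ in range(dim)]

# 4 bytes per element, as in a packed `repeated float` payload.
packed = struct.pack(f"<{dim}f", *embedding)

# JSON spells each value out as decimal text.
as_json = json.dumps(embedding).encode("utf-8")

print(len(packed), len(as_json))
```

The binary form is always 4 bytes per element, while the JSON text is several times larger. Note that the JSON here carries float64 precision, so the comparison overstates the gap somewhat relative to a JSON API that rounds its values.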

Readiness and Observability

The `HealthCheck` RPC exposes:

- Service status (`ServingStatus`)
- Model load state
- Active model information
- Runtime device information

This allows orchestration and monitoring systems to verify service readiness.
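
A readiness probe built on these fields might poll until the service reports that it is serving with models loaded. In the sketch below the enum value names, the `HealthReply` field names, and `wait_until_ready` are all assumptions for illustration; the real definitions live in the `.proto` file, and a real probe would call the `HealthCheck` RPC instead of the simulated `check` callable.

```python
import enum
import time
from dataclasses import dataclass
from typing import Callable


class ServingStatus(enum.Enum):
    # Value names are assumptions, not copied from the contract.
    UNKNOWN = 0
    SERVING = 1
    NOT_SERVING = 2


@dataclass
class HealthReply:
    status: ServingStatus
    models_loaded: bool
    model_name: str
    device: str


def wait_until_ready(check: Callable[[], HealthReply],
                     attempts: int = 5, delay: float = 0.0) -> bool:
    """Poll `check` until the service is SERVING with models loaded."""
    for _ in range(attempts):
        reply = check()
        if reply.status is ServingStatus.SERVING and reply.models_loaded:
            return True
        time.sleep(delay)
    return False


# Simulated service that becomes ready on the third probe.
replies = iter([
    HealthReply(ServingStatus.NOT_SERVING, False, "", "cpu"),
    HealthReply(ServingStatus.NOT_SERVING, False, "", "cpu"),
    HealthReply(ServingStatus.SERVING, True, "example-clip", "cpu"),
])
ready = wait_until_ready(lambda: next(replies))
```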

Future Extensibility

The API was designed with forward compatibility in mind:

- `optional threshold` distinguishes omitted values from explicit `0.0`
- `OCRResult` allows additional per-image metadata without changing response structure
- Embedding responses expose model metadata and dimensionality
- Batch-oriented request/response structures support future scaling requirements
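
The first point matters because a plain proto3 scalar cannot tell "not set" apart from `0.0`. The sketch below models an unset `optional` field as `None`; `filter_ocr` and the default of `0.5` are hypothetical, chosen only to show the behavior.

```python
from typing import List, Optional, Tuple

DEFAULT_THRESHOLD = 0.5  # hypothetical server-side default


def filter_ocr(words: List[Tuple[str, float]],
               threshold: Optional[float] = None) -> List[str]:
    # None models an unset `optional` field, so the default applies.
    # An explicit 0.0 is a deliberate "keep everything".
    effective = DEFAULT_THRESHOLD if threshold is None else threshold
    return [text for text, confidence in words if confidence >= effective]


words = [("STOP", 0.93), ("5T0P", 0.31)]
filter_ocr(words)        # -> ['STOP']
filter_ocr(words, 0.0)   # -> ['STOP', '5T0P']
```

In generated Python protobuf code, presence of a proto3 `optional` scalar can be checked with `HasField`, which plays the role of the `is None` test here.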

Result

Semantixel inference is now exposed as a standalone, language-agnostic gRPC service that can be consumed directly by Go services and other future clients while preserving existing model behavior and inference outputs.

feat(service): add standalone gRPC inference service for CLIP and OCR. (aef9873)

- Extract ML inference (CLIP embeddings + OCR extraction) into an independent gRPC server, decoupling it from the Flask REST layer and enabling polyglot consumers (Go scanner, GraphQL gateway).

KarunyaChavan added enhancement New feature or request dependencies Pull requests that update a dependency file labels Jun 10, 2026

KarunyaChavan marked this pull request as ready for review June 10, 2026 09:51

KarunyaChavan requested review from rockers2004 and taslim121 June 10, 2026 09:52

KarunyaChavan added this to the v2.0 - Microservices Migration milestone Jun 10, 2026

KarunyaChavan self-assigned this Jun 10, 2026

KarunyaChavan wants to merge 1 commit into main from feature/grpc-inference-service
