Standalone gRPC inference service for CLIP and OCR #27
Open
KarunyaChavan wants to merge 1 commit into
- Extract ML inference (CLIP embeddings + OCR extraction) into an independent gRPC server, decoupling it from the Flask REST layer and enabling polyglot consumers (Go scanner, GraphQL gateway).
Description
closes #18
ML inference (CLIP embedding and OCR) was previously tightly coupled to the Flask web server. As a result, future Go services (scanner, GraphQL gateway) could only access inference functionality through Python REST endpoints.
This PR extracts inference into a standalone gRPC service, allowing any client to consume ML capabilities directly through a shared protobuf contract.
What Changed
- proto/semantixel_inference.proto: defines EmbedImage, EmbedText, ExtractOCR, and HealthCheck.
- semantixel/grpc_server.py: the gRPC server.
- main.py: adds --grpc and --grpc-port flags for starting the gRPC server.
- requirements.txt: adds grpcio and grpcio-tools.
- scripts/generate_proto.py: the proto generation script.
Implementation Details
1. Protobuf Contract
Defined a standalone protobuf contract in proto/semantixel_inference.proto containing:
- EmbedImage
- EmbedText
- ExtractOCR
- HealthCheck
The contract includes:
- ServingStatus enum for readiness reporting
- optional float threshold for OCR confidence filtering
- OCRResult wrapper message for future response extensibility
2. gRPC Servicer
Implemented InferenceServicer, which delegates all inference requests to the existing ModelManager singleton.
This preserves existing model behavior and inference outputs.
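The delegation pattern can be sketched without any gRPC dependency, so the example runs on its own. Everything below is illustrative: the ModelManager stand-in, its embed_text and extract_ocr methods, the OCRRequest shape, and the default threshold of 0.5 are assumptions, not the signatures used in this PR. The sketch also shows how an omitted OCR threshold can be handled differently from an explicit 0.0.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

DEFAULT_OCR_THRESHOLD = 0.5  # hypothetical server-side default


class ModelManager:
    """Hypothetical stand-in for the existing ModelManager singleton."""

    _instance: Optional["ModelManager"] = None

    def __new__(cls) -> "ModelManager":
        # Return the same instance on every call so models load only once.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def embed_text(self, text: str) -> List[float]:
        # Placeholder embedding; a real manager would run CLIP here.
        return [float(len(text)), 0.0]

    def extract_ocr(self, image_path: str) -> List[Tuple[str, float]]:
        # Placeholder (text, confidence) pairs; a real manager would run OCR.
        return [("hello", 0.9), ("wor1d", 0.3)]


@dataclass
class OCRRequest:
    """Hypothetical request shape mirroring `optional float threshold`."""
    image_path: str
    threshold: Optional[float] = None  # None means the client omitted it


class InferenceServicer:
    """Sketch of a servicer that only delegates; it holds no model logic."""

    def __init__(self) -> None:
        self._models = ModelManager()

    def EmbedText(self, text: str) -> List[float]:
        return self._models.embed_text(text)

    def ExtractOCR(self, request: OCRRequest) -> List[str]:
        results = self._models.extract_ocr(request.image_path)
        # Omitted threshold falls back to the default; an explicit 0.0 is honored.
        threshold = (
            DEFAULT_OCR_THRESHOLD if request.threshold is None else request.threshold
        )
        return [text for text, conf in results if conf >= threshold]
```

Because the servicer holds no model logic of its own, swapping the transport (REST or gRPC) leaves the inference path unchanged.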
3. Server Lifecycle Management
Implemented GrpcInferenceServer using grpc.aio for asynchronous request handling.
Features include:
- SIGINT handling
- SIGTERM handling
4. CLI Integration
Added startup options to main.py through the --grpc and --grpc-port flags.
A dedicated generation script, scripts/generate_proto.py, was also added.
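For the startup flags, a parser along the following lines would accept --grpc and --grpc-port. This is only a sketch: the default port 50051, the help strings, and the build_parser and parse_args helpers are assumptions and are not taken from main.py.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the two gRPC-related flags named in this PR."""
    parser = argparse.ArgumentParser(description="Semantixel entry point (sketch)")
    parser.add_argument(
        "--grpc",
        action="store_true",
        help="start the gRPC inference server",
    )
    parser.add_argument(
        "--grpc-port",
        type=int,
        default=50051,  # assumed default; the PR description does not state one
        help="port for the gRPC server to listen on",
    )
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv when argv is None)."""
    return build_parser().parse_args(argv)
```

argparse converts the dash in --grpc-port to an underscore, so the value is read as args.grpc_port.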
Rationale
Decoupling
Inference is no longer tied to Flask.
Model serving can evolve independently of the REST layer, and web server restarts no longer imply inference service restarts.
Language Agnostic Integration
The shared protobuf contract enables clients in any supported language.
Go services such as the scanner and GraphQL gateway can generate native stubs and communicate directly with the inference service.
Performance
gRPC uses Protocol Buffers for serialization, reducing payload size and serialization overhead compared to JSON.
This is particularly beneficial for high-dimensional embedding vectors.
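As a rough illustration of the size difference, the sketch below compares a JSON encoding of an embedding with a packed 32-bit float layout, which is how protobuf lays out the values of a repeated float field (4 bytes per value; the small tag and length prefix is ignored here). The 512-dimension vector and its random values are assumptions chosen for the example, and struct stands in for real protobuf serialization.

```python
import json
import random
import struct

DIM = 512  # assumed embedding size, for illustration only

rng = random.Random(0)
embedding = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

# JSON: each float becomes a decimal string plus separators.
json_bytes = json.dumps(embedding).encode("utf-8")

# Packed little-endian float32: exactly 4 bytes per value.
packed_bytes = struct.pack(f"<{DIM}f", *embedding)

print(f"JSON:   {len(json_bytes)} bytes")
print(f"Packed: {len(packed_bytes)} bytes")
```

The packed form is always 4 × DIM bytes, while the JSON size depends on how many digits each value prints with. Note that float32 packing also drops precision relative to the doubles that JSON carries.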
Readiness and Observability
The HealthCheck RPC exposes the service's readiness status (ServingStatus).
This allows orchestration and monitoring systems to verify service readiness.
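A readiness check of this kind might look like the sketch below. The enum member names follow those used by the standard gRPC health checking protocol and are an assumption here, as are the models_loaded flag and the health_check and is_ready helpers; the actual ServingStatus values are not listed in this description.

```python
from enum import Enum


class ServingStatus(Enum):
    # Member names are assumed, not taken from the PR's .proto file.
    UNKNOWN = 0
    SERVING = 1
    NOT_SERVING = 2


def health_check(models_loaded: bool) -> ServingStatus:
    """Report SERVING only once models are ready to handle requests."""
    return ServingStatus.SERVING if models_loaded else ServingStatus.NOT_SERVING


def is_ready(status: ServingStatus) -> bool:
    """The condition an orchestrator's readiness probe would check."""
    return status is ServingStatus.SERVING
```

An orchestrator would typically poll this check and route traffic to the service only while it reports SERVING.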
Future Extensibility
The API was designed with forward compatibility in mind:
- optional threshold distinguishes omitted values from explicit 0.0
- OCRResult allows additional per-image metadata without changing response structure
Result
Semantixel inference is now exposed as a standalone, language-agnostic gRPC service that can be consumed directly by Go services and other future clients while preserving existing model behavior and inference outputs.