- Making a directory:
  ```bash
  mkdir -p <dirname>
  mkdir -p resnet
  ```
- To download the model (it must be in SavedModel format, which is a directory):
  ```bash
  curl -L -o <path_to_save_the_model.tar.gz> \
    <link_to_the_SavedModel_on_kagglehub>
  # for example:
  curl -L -o resnet.tar.gz \
    https://www.kaggle.com/api/v1/models/tensorflow/resnet-50/tensorFlow2/classification/1/download
  ```
  - `-o`, `--output`: write to a file instead of stdout
- Extract the file:
  ```bash
  tar -xzvf resnet.tar.gz -C resnet/
  ```
  - `-x`: extract the archive.
  - `-z`: decompress the archive (since it's `.tar.gz`).
  - `-v`: verbose output (shows the files being extracted).
  - `-f`: specifies the archive file (`resnet.tar.gz`).
  - `-C resnet/`: specifies the directory (`resnet/`) where the files should be extracted.
- Check the extracted model content:
  ```bash
  ls resnet
  ```
  The output should look like this:
  ```
  saved_model.pb  variables
  ```
- Making a directory for the models that will be served, with subfolders for the different versions of the model:
  ```bash
  mkdir -p models
  # create 3 subfolders inside the parent folder models
  # (you can skip the previous line and run only this; it works on its own)
  mkdir -p models/{1..3}
  ```
- For the purpose of testing, let's copy the same model `resnet` to the three subfolders in `models/`:
  ```bash
  sudo cp -rf resnet/* models/1/
  sudo cp -rf resnet/* models/2/
  sudo cp -rf resnet/* models/3/
  ```
- We need to use a volume so that the models on our VM are mapped into the container and persist:
  ```bash
  # this pulls the image and opens the container's terminal
  docker run -it -v $(pwd)/models/:/models -p 8501:8501 --entrypoint /bin/bash tensorflow/serving
  ```
- To delete the containers:
  ```bash
  docker rm $(docker container ps -aq)
  ```
- Inside the container, start the server:
  ```bash
  tensorflow_model_server --rest_api_port=8501 --model_name=resnet --model_base_path=/models/
  ```
  You should see something like this:

- Download this cat image (or any other image from the 1000 ImageNet classes):
  ```bash
  wget https://raw.githubusercontent.com/hossamAhmedSalah/devops_depi/refs/heads/main/cat.jpeg
  ```
- Create the file `request_payload.json`; it will be used to send the data to the model for predictions. Use this command:
  ```bash
  touch request_payload.json
  ```
- You need to pass the image to the TensorFlow Serving server, and ResNet expects images with a certain shape and dimensions, so I made a utility script in Python: `image_preprocessing.py`.
- To use this script, run (make sure you have the required libraries installed):
  ```bash
  python3 image_preprocessing.py <image_path> <mode>
  ```
- `<mode>` can be `append` or `overwrite`, since this script can be used sequentially to preprocess images before passing them to the server; it saves each image into the `request_payload.json` file.
  - `append`: adds the new preprocessed image to the JSON file.
  - `overwrite`: deletes any previous content and writes only the new processed image.
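The internals of `image_preprocessing.py` are not shown here, so the following is only a hedged sketch of what such a script might do, assuming Pillow and NumPy are installed; the function name `preprocess` and the pixel scaling are illustrative assumptions, not the script's actual code:

```python
import json
import sys

import numpy as np
from PIL import Image


def preprocess(image_path, mode="overwrite", out_path="request_payload.json"):
    """Resize an image for ResNet-50 and store it in the request payload.

    mode="append" keeps earlier images in the payload; "overwrite" starts fresh.
    """
    # ResNet-50 expects 224x224 RGB inputs (an assumption about this export)
    img = Image.open(image_path).convert("RGB").resize((224, 224))
    instance = (np.asarray(img) / 255.0).tolist()  # scale pixels to [0, 1]

    if mode == "append":
        try:
            with open(out_path) as f:
                payload = json.load(f)
        except (FileNotFoundError, json.JSONDecodeError):
            payload = {"instances": []}
    else:  # overwrite
        payload = {"instances": []}

    # TF Serving's REST predict API reads inputs from the "instances" list
    payload["instances"].append(instance)
    with open(out_path, "w") as f:
        json.dump(payload, f)


if __name__ == "__main__" and len(sys.argv) > 1:
    preprocess(sys.argv[1], sys.argv[2] if len(sys.argv) > 2 else "overwrite")
```

Running it twice with `append` would leave two entries under `"instances"`, which is what lets the script be used sequentially.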
- Great, let's send the image to the server now that we have preprocessed it:
  ```bash
  curl -d @request_payload.json -H "Content-Type: application/json" \
    -X POST http://localhost:8501/v1/models/resnet:predict
  ```
- You will see a terrifying matrix of predictions:

- Let's postprocess the predictions to make them readable. I hope the model guesses the image correctly after all this... anyway.
- Here are the 1000 classes of ImageNet.
- I used another script, `predict_and_map.py`, that sends the request and maps the result to the classes.
- Let's run it:
  ```bash
  python3 predict_and_map.py
  ```
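The mapping step itself is simple: take the argmax of each prediction vector and look it up in the class list. A hedged sketch of that step (the real `predict_and_map.py` may differ; `imagenet_classes` is assumed to be a list of class names index-aligned with the model's outputs, and note that some ResNet exports add a background class at index 0, which would shift the mapping by one):

```python
def map_predictions(response_json, imagenet_classes):
    """Map each prediction vector in a TF Serving response to (label, score)."""
    results = []
    for scores in response_json["predictions"]:
        # argmax over the score vector for this image
        best = max(range(len(scores)), key=lambda i: scores[i])
        results.append((imagenet_classes[best], scores[best]))
    return results
```

For example, a response `{"predictions": [[0.1, 0.7, 0.2]]}` against a three-class list would map to the label at index 1.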
- Let's make it simpler and package everything in one script: `preprocess_predict_map.py`.

- The currently served version:
  ```bash
  curl http://localhost:8501/v1/models/resnet
  ```
- By default the server goes to the `models/` directory and selects the highest-numbered subfolder as the servable version. Let's change this and serve multiple versions at the same time.
- Pause the server inside the container by pressing `Ctrl+C`.
- The config file `model.config.a`:
  ```
  model_config_list: {
    config: {
      name: "resnet",
      base_path: "/models/",
      model_platform: "tensorflow",
      model_version_policy: {
        all: { }
      }
    }
  }
  ```
- It should be like this:
- This is on the host:

- This is in the container:

- Run this command in the container to serve the models following the configuration we made:
  ```bash
  tensorflow_model_server --rest_api_port=8501 --model_config_file=/models/model.config.a
  ```
- Now let's see the versions by running this command:
  ```bash
  curl http://localhost:8501/v1/models/resnet
  ```
  ```json
  {
    "model_version_status": [
      { "version": "3", "state": "AVAILABLE", "status": { "error_code": "OK", "error_message": "" } },
      { "version": "2", "state": "AVAILABLE", "status": { "error_code": "OK", "error_message": "" } },
      { "version": "1", "state": "AVAILABLE", "status": { "error_code": "OK", "error_message": "" } }
    ]
  }
  ```
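With all versions available, TensorFlow Serving's standard REST routes also let a client target one specific version instead of the default (latest) one. These are assumed to run against the local server started above:

```bash
# status of a single version
curl http://localhost:8501/v1/models/resnet/versions/2

# predict against a single version
curl -d @request_payload.json -H "Content-Type: application/json" \
  -X POST http://localhost:8501/v1/models/resnet/versions/2:predict
```

This is useful for A/B-testing versions side by side while they are all served.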
- We need to create a new file with the batching parameters, `config_batching`:
  ```
  max_batch_size { value: 128 }
  batch_timeout_micros { value: 0 }
  max_enqueued_batches { value: 1000000 }
  num_batch_threads { value: 8 }
  ```
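The parameters above configure server-side batching, where TensorFlow Serving groups concurrent requests before running the model. Independently of that, the REST API also accepts several inputs in one request; a minimal sketch with a hypothetical helper name:

```python
import json


def build_batch_payload(instances, out_path="request_payload.json"):
    """Write several preprocessed images into a single predict request.

    Every entry in the "instances" list is evaluated in one request,
    so the client can batch inputs even before server-side batching kicks in.
    """
    with open(out_path, "w") as f:
        json.dump({"instances": instances}, f)
```

Sending such a payload with the same `curl ... :predict` call as before returns one prediction vector per instance.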
- Run the server with batching enabled:
  ```bash
  tensorflow_model_server --rest_api_port=8501 --model_config_file=/models/model.config.a \
    --batching_parameters_file=/models/config_batching --enable_batching=true
  ```
- Building a Dockerfile for a custom tensorflow/serving image
- build & push
- Define Kubernetes Manifests
- deployment & service
- Monitoring and visualization
- Create `run_server.sh` and put this in it:
  ```bash
  #!/bin/bash
  # Run TensorFlow Serving with the specified model and batching configurations
  tensorflow_model_server --rest_api_port=8501 --model_config_file=/models/model.config.a --batching_parameters_file=/models/config_batching --enable_batching=true
  # Keep the container running
  tail -f /dev/null
  ```
- The `Dockerfile`:
  ```dockerfile
  # Use TensorFlow Serving as base image
  FROM tensorflow/serving:latest

  # Copy the model and configuration files into the container
  COPY models/ /models/

  # Copy the script that runs TensorFlow Serving and keeps the container running
  COPY run_server.sh /models/run_server.sh
  RUN chmod +x /models/run_server.sh

  # Set the entrypoint to run the script
  ENTRYPOINT ["/models/run_server.sh"]
  ```
- Building:
  ```bash
  sudo docker build -t hossamahmedsalah/tf-serving:resnet .
  ```
- The command `tail -f /dev/null` is a clever trick to keep a container running indefinitely.
- What is `tail`? `tail` is a Unix command that displays the last few lines of a file; by default, it shows the last 10. The `-f` option stands for "follow": `tail` keeps displaying new lines as they are added to the file.
- What is `/dev/null`? `/dev/null` is a special file in Unix-like systems that represents the null device. It's a "black hole": any data written to it is discarded, and it always returns end-of-file (EOF) when read. In other words, `/dev/null` never contains any data and always appears empty.
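Both behaviours are easy to verify from any shell:

```shell
# writes to /dev/null are discarded
echo "this text disappears" > /dev/null

# reads from /dev/null hit EOF immediately, so it counts as 0 bytes;
# that is why `tail -f /dev/null` simply waits forever without output
wc -c < /dev/null
```

Since `tail -f` never sees EOF as a reason to exit and `/dev/null` never produces data, the combination blocks indefinitely, which keeps the container's main process alive.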
- Push the image to Docker Hub (you need to log in first):
  ```bash
  docker push hossamahmedsalah/tf-serving:resnet
  ```
- To pull it, you would use this command:
  ```bash
  docker pull hossamahmedsalah/tf-serving:resnet
  ```
- `tf-serving-deployment.yaml`:
  ```yaml
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: tf-serving-deployment
    labels:
      app: tf-serving
  spec:
    replicas: 3 # Number of replicas
    selector:
      matchLabels:
        app: tf-serving
    template:
      metadata:
        labels:
          app: tf-serving
      spec:
        containers:
          - name: tf-serving
            image: hossamahmedsalah/tf-serving:resnet
            ports:
              - containerPort: 8501 # HTTP/REST
              - containerPort: 8500 # gRPC
  ```
- `tf-serving-service.yaml`:
  ```yaml
  apiVersion: v1
  kind: Service
  metadata:
    name: tf-serving-service
    labels:
      app: tf-serving
  spec:
    type: LoadBalancer # Exposes the service externally
    ports:
      - name: grpc
        port: 8500 # Port for gRPC
        targetPort: 8500 # Port on the container
      - name: restapi
        port: 8501 # Port for HTTP/REST
        targetPort: 8501 # Port on the container
    selector:
      app: tf-serving # Selects the pods with this label
  ```
- Let's check the nodes before applying:
  ```bash
  kubectl get nodes
  ```
- Let's apply `tf-serving-deployment.yaml`:
  ```bash
  kubectl apply -f tf-serving-deployment.yaml
  ```
- Let's check:
  ```bash
  kubectl get deployment
  ```
- Let's apply the service that will act as a load balancer:
  ```bash
  kubectl apply -f tf-serving-service.yaml
  ```
- Let's check for the external IP:
  ```bash
  kubectl get svc tf-serving-service
  ```
- Create `monitoring.config` inside `models/` to enable monitoring in TensorFlow Serving:
  ```
  prometheus_config {
    enable: true,
    path: "/monitoring/prometheus/metrics"
  }
  ```
- Modify `run_server.sh` by adding a new flag, `--monitoring_config_file=/models/monitoring.config`:
  ```bash
  #!/bin/bash
  # Run TensorFlow Serving with the specified model and batching configurations
  tensorflow_model_server --rest_api_port=8501 --model_config_file=/models/model.config.a --batching_parameters_file=/models/config_batching --enable_batching=true --monitoring_config_file=/models/monitoring.config
  # Keep the container running
  tail -f /dev/null
  ```
- Build a new version of the Docker image:
  ```bash
  sudo docker build -t hossamahmedsalah/tf-serving:resnet_monitoring .
  ```
- Pushing:
  ```bash
  docker push hossamahmedsalah/tf-serving:resnet_monitoring
  ```
- Modify the image in `tf-serving-deployment.yaml`:
  ```yaml
  image: hossamahmedsalah/tf-serving:resnet_monitoring
  ```
- Apply the changes:
  ```bash
  kubectl apply -f tf-serving-deployment.yaml
  ```
- Let's check it:
  ```
  http://{ip}:8501/monitoring/prometheus/metrics
  ```
- Create a Prometheus ServiceMonitor, `prom_service.yaml`:
  ```yaml
  apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
    name: prometheus-self
    labels:
      app: prometheus
  spec:
    endpoints:
      - interval: 30s
        port: web
    selector:
      matchLabels:
        app: prometheus
  ```
- Create the deployment, `prom_deployment.yaml`:
  ```yaml
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: prometheus
    labels:
      app: prometheus
  spec:
    replicas: 2
    selector:
      matchLabels:
        app: prometheus
    template:
      metadata:
        labels:
          app: prometheus
      spec:
        containers:
          - name: prometheus
            image: quay.io/prometheus/prometheus:v2.22.1
            ports:
              - containerPort: 9090
  ```
- Apply:
  ```bash
  kubectl apply -f prom_deployment.yaml
  ```
- Install the Prometheus Operator with Helm:
  ```bash
  helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
  helm repo update
  helm install prometheus prometheus-community/prometheus-operator
  ```
- The one that worked:
  ```bash
  kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/bundle.yaml
  kubectl apply -f prom_service.yaml
  kubectl logs deployment/prometheus
  ```
- Access the Prometheus UI: since Prometheus is set up as a ClusterIP service, we need to port-forward to reach the web interface:
  ```bash
  kubectl port-forward deployment/prometheus 9090
  ```
- Let's create a new service to load-balance and expose Prometheus, `prom_service_balance.yaml`:
  ```yaml
  apiVersion: v1
  kind: Service
  metadata:
    name: prometheus-service
    labels:
      app: prometheus
  spec:
    type: LoadBalancer
    ports:
      - name: web
        port: 9090 # Port exposed externally
        targetPort: 9090 # Port on the container (Prometheus listens on 9090)
        protocol: TCP
    selector:
      app: prometheus
  ```
  ```bash
  kubectl apply -f prom_service_balance.yaml
  ```
- Create a ServiceMonitor to watch tf-serving:
  `tf-serving-monitoring.yaml`:
  ```yaml
  apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
    name: tf-serving-monitor
    labels:
      app: tf-serving
  spec:
    selector:
      matchLabels:
        app: tf-serving
    endpoints:
      - port: "8501" # Port where TensorFlow Serving is exposed
        interval: 30s # Scrape interval
  ```
  ```bash
  kubectl apply -f tf-serving-monitoring.yaml
  ```
- A modification on `tf-serving-monitoring.yaml`:
  ```yaml
  apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
    name: tf-serving-monitor
    labels:
      app: tf-serving
  spec:
    selector:
      matchLabels:
        app: tf-serving
    endpoints:
      - port: "8501" # Port where TensorFlow Serving is exposed
        interval: 30s # Scrape interval
        path: http://35.189.228.88:8501/monitoring/prometheus/metrics
  ```
  It didn't work as planned, but... enough for now.
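One likely reason it didn't work: in a ServiceMonitor, `path` expects a URL path relative to the scraped target (Prometheus builds the full URL itself), and `port` must reference a named port on the Service rather than a quoted number. A hedged sketch of a corrected endpoint block, assuming the `restapi` port name defined in `tf-serving-service.yaml`:

```yaml
  endpoints:
    - port: restapi # name of the Service port, not a number
      interval: 30s
      path: /monitoring/prometheus/metrics
```

The ServiceMonitor's `selector` must also match the labels on the Service (not the pods), which is worth double-checking here.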
- **rollout**: the process of updating or deploying an application in Kubernetes, typically managed by a Deployment resource. It can involve changing the container image, updating environment variables, or modifying resource requests and limits.
- The `kubectl rollout pause` command temporarily halts the rollout of a deployment. This is particularly useful during updates, when you want to prevent new replicas from being created or old replicas from being terminated while you are making adjustments.
- To reverse it, use `kubectl rollout resume`:
  ```bash
  kubectl rollout resume deployment/prometheus
  kubectl rollout resume deployment/prometheus-operator
  kubectl rollout resume deployment/tf-serving-deployment
  ```
- Scale down:
  ```bash
  kubectl scale deployment/prometheus --replicas=0
  kubectl scale deployment/prometheus-operator --replicas=0
  kubectl scale deployment/tf-serving-deployment --replicas=0
  ```
  Now we don't have any pods running, so no services either.
- Scale back up:
  ```bash
  kubectl scale deployment/prometheus --replicas=2
  kubectl scale deployment/prometheus-operator --replicas=1
  kubectl scale deployment/tf-serving-deployment --replicas=3
  ```