Skip to content

Junjie-Chu/DataEngineering_2_Project

Repository files navigation

DE2_Project_Group11

Cluster setting and Debug: Ying Peng, Junjie Chu
Ray code and Debug: Ying Peng, Junjie Chu
ML code and Debug: Mandus, Ying Peng

1. Login in the Orchestration VM

The detailed settings could be read in the Steps.md(step1-step3 are for Orchestration VM).
git clone:

https://github.com/Junjie-Chu/DE2_Project.git

Note: Be careful with the position of your key and authority of the key.(i.e. chmod 600 DE2_group11.pem)

2. Create instances

image

Command:

python3 DE2_Project/openstack-client/single_node_with_docker_ansible_client/start_instances.py

3. Revise the hosts IP

According to the IP of our instances, edit the file.

sudo nano /etc/ansible/hosts

4. Start to deploy services with ansible.

Command:

export ANSIBLE_HOST_KEY_CHECKING=False
ansible-playbook configuration.yml --private-key=/home/ubuntu/cluster-keys/cluster-key

image

5. Start to train models in development node

Log in the development node:

ssh -i /home/ubuntu/cluster-keys/cluster-key appuser@192.168.2.89
cd /home/DE2_Project/ci_cd/development_server/
sudo -s
docker build -t trainmodel .

According to the way you like, you could edit the .py file via vim/nano, or via jupyternotebook(floatingip:5100 and do not forget to start jupyternotebook service).
Now we could run the scripts in the VM directly or run them in the containers.
If in VM:

python3 GDBT_train.py
python3 NN_train.py
python3 RF_train.py

If in containers:

  1. Build image
docker build -t trainmodel .
  1. Run models
docker run -it imagename filename.py
  1. After the .py is changed, copy the new modeltraining.py from the VM to the container
    Note: if you want to make the containers communicate with the outside VM more easily, you could use volumes. We would like to make each container more isolated. So we do not do that.
docker cp /home/DE2_Project/ci_cd/development_server/trainmodel.py 10704c9eb7bb:/app
  1. Log in the container and run new modeltraining.py
docker exec -it imagename bash
python3 new modeltraining.py
  1. Copy the result from containers to the VM
docker cp 10704c9eb7bb:/app/model.name /home/DE2_Project/ci_cd/development_server/

6. Compare the models

Select the best one!
Copy the best result model to git folder

cp gdbt_model.m /home/appuser/my_project/

7. Do parameter tuning

On the head node (just choose some node to be the head node, in our case, we use dev server as head node), run the following. If the --port argument is omitted, Ray will choose port 6379, falling back to a random port.

$ ray start --head
or
$ ray start --head --port=6379
...
Next steps
  To connect to this Ray runtime from another node, run
    ray start --address='<ip address>:6379' --redis-password='<password>'

If connection fails, check your firewall settings and network configuration.
The command will print out the address of the Redis server that was started (the local node IP address plus the port number you specified).

Then on each of the other nodes, run the following. Make sure to replace <address> <password> with the value printed by the command on the head node (it should look something like 192.168.2.89:6379).

$ ray start --address=<address> --redis-password='<password>'
--------------------
Ray runtime started.
--------------------

To terminate the Ray runtime, run

$  ray stop

In the .py file:

import ray 
ray.init(address='auto', _redis_password='5241590000000000')

Note: In our project, the ray cluster contains 2 servers: devserver(192.168.2.124) and paraserver(192.168.2.242).
You could easily use more nodes if you want.
NOTE: Ray needs you to be root
We have run Neural Network, the result as follows: image Random Forest Regressor image Gradient Boosting Regressor image Here we can see that best model is random forest regressor.

Sometimes, it reminds us to re-run the ray install command, please use:

pip install ray[default]
pip install ray[rllib]

Or something others it says.
Strong Scalability:
Using 1/2/4 nodes with the same total workload(8 sets of parameters of NN)
image
image
Weak Scalability:
Using 1/2/4 nodes with the same local workload(8/16/32 sets of parameters of NN in total, 8 set in each node)
image
image

8. Set up the production cluster(docker swarm)

1.Log in the Orchestration VM.
2.Via Orchestration VM, log in prod1(set it as master of swarm)

ssh -i /home/ubuntu/cluster-keys/cluster-key appuser@192.168.2.89

3.Change work dir

cd /home/DE2_Project/ci_cd/production_server

4.Set up
On prod1:

docker login
docker swarm init --advertise-addr 192.168.2.89:2377 --listen-addr 192.168.2.89:2377

On prod2:
To add a worker to this swarm, (on the worker node) run 'docker swarm join-token worker' and follow the instructions.

docker swarm join --token SWMTKN-1-2mp6cxamqvsuzve6nd8yjd5ldulnuzmgl6dxjy8dhkssqyg4nk-8bh0npbssxzoesthcuwwcszz2 192.168.2.89:2377

image

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
5. if not use docker login in step 4, sometimes the worker node will be rejected when try to pull image!

docker login

use docker stack deploy instead of docker compose!

docker stack deploy --with-registry-auth -c docker-compose.yml project

image

docker service scale project_worker_1=4

image

docker stack ps project

image

docker stack services project

image
If you want to see 1 container more detailed(if the container in prod2, please log in prod 2):

docker logs -f --tail=100 b3f4
docker exec -it b3f4 bash

9. Push the best model to production cluster!

The ci/cd is realized by githook. The setting of githook is realized via Ansible playbook.
The part for dev server in Ansible playbook:
image
The part for prod server in Ansible playbook:
image
Now push the first model:
Make sure your are appuser instead of root!
log in devserver:

ssh -i cluster-key appuser@<DEVELOPMENT-SERVER-IP>

Go to the /home/appuser/my_project directory
Add files for the commit and commit files

git add .
git commit -m "test 1"

Connect development server's git to production server's git

git remote add production appuser@<PRODUCTIONS-SERVER-IP>:/home/appuser/my_project
git remote add production1 appuser@<PRODUCTIONS-SERVER-IP>:/home/appuser/my_project

Push your commits to the production server

git push production master
git push production1 master

10. Now visit the page: floatip:5100/RMSE

image

11. Test scalability

Note: all the tests should be run in the same environment.
Run test.py and the start time(just example) will be recorded.
image
The end time(just example):
image

About

Part of setting clusters

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors