Commit 00777d5

author
Ralf Grubenmann
committed
Update Tutorial to new MLBench version
1 parent 246f658 commit 00777d5

1 file changed

Lines changed: 25 additions & 130 deletions

_posts/2018-09-10-tutorial.md

@@ -17,166 +17,55 @@ This tutorial guides you through setting up MLBench in a Google Cloud [Kubernete
1717
This tutorial assumes you have a Google Cloud account with permissions to create a new cluster.
1818
You also need to have [Python](https://www.python.org/), [Git](https://git-scm.com/), and [Docker](https://www.docker.com) installed locally and the Docker Daemon should be running.
1919

20-
Now you have to checkout the mlbench github repository and have a terminal open in the checked-out mlbench directory.
20+
Check out the [mlbench-helm](https://github.com/mlbench/mlbench-helm) GitHub repository and open a terminal in the checked-out `mlbench-helm` directory.
2121

2222
```shell
23-
$ git clone git@github.com:mlbench/mlbench.git
23+
$ git clone git@github.com:mlbench/mlbench-helm.git
2424
```
2525

2626
Enter the newly created directory
2727

2828
```shell
29-
$ cd mlbench
29+
$ cd mlbench-helm
3030
```
3131

32-
### Setting up gcloud client
33-
34-
Follow the steps detailed [here](https://cloud.google.com/sdk/docs/quickstarts) to install the Google Cloud SDK.
35-
36-
Now install the [kubectl](https://kubernetes.io/docs/reference/kubectl/overview/) tool with the Google configuration. kubectl is a command line interface for communicating with a Kubernetes API server.
37-
38-
```shell
39-
$ gcloud components install kubectl
40-
```
41-
42-
This will configure the kubernetes kubectl with the correct credentials for your account.
43-
44-
We can now create a Kubernetes cluster called ``mlbench`` by running
45-
46-
```shell
47-
$ gcloud container clusters create mlbench --machine-type='n1-standard-2'
48-
```
49-
By default, this will create a new cluster with 3 nodes, all of which are ``n1-standard-2`` instances.
50-
Once the cluster is created, we need to set the correct credentials for kubectl
51-
52-
```shell
53-
$ gcloud container clusters get-credentials mlbench
54-
```
55-
56-
This sets the default context of kubectl to our newly created cluster.
57-
58-
### Installing Helm
59-
60-
[Helm](https://github.com/helm/helm/) is a package manager for Kubernetes applications. It helps install pre-defined distributed applications to clusters.
61-
62-
To install helm, run
63-
64-
```shell
65-
$ curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get | bash
66-
```
67-
68-
For helm to work properly, it needs a service account in the cluster with ``cluster-admin`` rights. We can set up an account with the correct privileges by running
69-
70-
```shell
71-
$ kubectl --namespace kube-system create sa tiller
72-
$ kubectl create clusterrolebinding tiller --clusterrole cluster-admin --serviceaccount=kube-system:tiller
73-
```
74-
75-
This creates a new service account with the correct privileges for the helm server component ``tiller``, which takes care of managing the deployment of pods to our cluster.
76-
77-
We can now initialize helm with our newly created service account
78-
79-
```shell
80-
$ helm init --service-account tiller
81-
```
82-
83-
After this, helm is set up and ready to deploy applications to our newly created cluster.
84-
85-
### Building the Master and Worker images
86-
87-
*Note: You can skip this part if you want to use the precompiled docker images*
88-
89-
To use custom images, we will have to host them in a docker registry. The [Google Cloud Container Registry](https://cloud.google.com/container-registry/) is an obvious choice for Google Cloud.
90-
91-
First we need to enable access to it on our commandline
92-
93-
```shell
94-
$ gcloud auth configure-docker
95-
```
96-
97-
We provide easy to use commands to build and deploy the images using ```make```. Make sure to correctly set the name of your Google Cloud project (``<gcloud project name>``) in the following commands
98-
99-
```shell
100-
$ make publish-docker component=master docker_registry=gcr.io/<gcloud project name>
101-
$ make publish-docker component=worker docker_registry=gcr.io/<gcloud project name>
102-
```
103-
104-
This will build the ``master`` and ``worker`` docker images and push them to the Container Registry.
105-
10632
### Installing MLBench
10733

108-
Now copy the file `charts/mlbench/values.yaml` to the current directory, calling it `myvalues.yaml`.
34+
Make a copy of the file `values.yaml` in the current directory and name it `myvalues.yaml`.
10935

11036
```shell
111-
$ cp charts/mlbench/values.yaml myvalues.yaml
37+
$ cp values.yaml myvalues.yaml
11238
```
11339

11440
This file contains default values for most settings in mlbench. Some of them, however, you need to set yourself to values that are reasonable for your cluster, namely:
11541

11642
```yaml
11743
limits:
11844
  cpu: 1000m
119-
  gpu: 0
120-
  maximumWorkers: 3
45+
  workers: 3
12146
  bandwidth: 1000
122-
```
123-
124-
This limits the maximum usable resources (And the maximum you are able to chose in the UI) to 1 CPU core , 0 GPUs, 1000 mbit/s network speed per node and 3 nodes total.
125-
126-
*Note: Our ``n1-standard-2`` instances have 2 CPU cores. But due to Google Cloud Kubernetes running its own monitoring and management pods, which also use some CPU, it is advisable to set MLBench to use one core less than available*
127-
128-
If you followed the previous section and built the docker images yourself, your ``myvalues.yaml`` file should look as follows (again, replace ``<gcloud project name>`` with your Google Cloud project name)
129-
130-
```yaml
131-
master:
132-
  image:
133-
    repository: gcr.io/<gcloud project name>/mlbench_master
134-
    tag: latest
135-
    pullPolicy: Always
136-
137-
138-
worker:
139-
  image:
140-
    repository: gcr.io/<gcloud project name>/mlbench_worker
141-
    tag: latest
142-
    pullPolicy: Always
143-
144-
limits:
145-
  cpu: 1000m
14647
  gpu: 0
147-
  maximumWorkers: 3
148-
  bandwidth: 1000
14948
```
15049
151-
Now it is time to install MLBench
152-
153-
```shell
154-
$ helm upgrade --wait --recreate-pods -f myvalues.yaml --timeout 900 --install release1 charts/mlbench
155-
```
50+
This limits the maximum usable resources (and the maximum you are able to choose in the UI) to 1 CPU core (1000m = 1000 milli-CPUs), 0 GPUs, and 1000 Mbit/s of network bandwidth per node, with 3 nodes in total.
15651
157-
This creates Kubernetes templates based on the values set in ``myvalues.yaml`` and installs them to our Kubernetes cluster, calling the release ``release1``.
52+
*Note: ``n1-standard-2`` instances have 2 CPU cores. Because Google Cloud Kubernetes runs its own monitoring and management pods, which also use some CPU, it is advisable to set MLBench to use one core fewer than is available.*
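
The rule of thumb in the note above can be written as a small calculation. This is only a minimal sketch: the function name `mlbench_cpu_limit` is hypothetical and not part of MLBench, and it assumes you already know the number of CPU cores per node (2 for ``n1-standard-2``).

```shell
# Hypothetical helper (not part of MLBench): derive a value for `limits.cpu`
# in milli-CPUs that leaves one core free for the Kubernetes system pods.
mlbench_cpu_limit() {
  cores="$1"
  if [ "$cores" -lt 2 ]; then
    echo "need at least 2 cores per node" >&2
    return 1
  fi
  # 1 core = 1000 milli-CPUs
  echo "$(( (cores - 1) * 1000 ))m"
}

mlbench_cpu_limit 2   # prints 1000m
```

The printed value is what you would put under `cpu:` in `myvalues.yaml`.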
15853
159-
*Note: Release names allow you to install multiple instances of the same helm chart side by side, but are not relevant for this tutorial*
54+
With those values set, MLBench can be installed with the `google_cloud_setup.sh` script (run `google_cloud_setup.sh help` to see all available options).
16055

161-
Since the deployment is not open to the internet by default, the default instructions printed by the previous command **do not apply**.
162-
To gain access to MLBench, we need to add a firewall rule to Google Cloud
56+
First, create a GKE cluster:
16357

16458
```shell
165-
$ export NODE_PORT=$(kubectl get --namespace default -o jsonpath="{.spec.ports[0].nodePort}" services ${RELEASE_NAME}-mlbench-master)
166-
$ export NODE_IP=$(gcloud compute instances list|grep $(kubectl get nodes --namespace default -o jsonpath="{.items[0].status.addresses[0].address}") |awk '{print $5}')
167-
$ gcloud compute firewall-rules create --quiet mlbench --allow tcp:$NODE_PORT,tcp:$NODE_PORT
59+
$ ./google_cloud_setup.sh create-cluster NUM_NODES=4
16860
```
16961

170-
This gets the public ip of the node the ``master`` image is deployed on, plus the randomly selected port it is running on, and adds a firewall rule allowing access to that port.
171-
172-
To get the URL the dashboard is accessible under, we can now just run
62+
Then install the Helm chart:
17363

17464
```shell
175-
$ echo http://$NODE_IP:$NODE_PORT
176-
http://172.16.0.1:32145
65+
$ ./google_cloud_setup.sh install-chart
17766
```
17867

179-
and it should print the URL (In this example it printed ``http://172.16.0.1:32145``)
68+
That's it; this should set up MLBench in your Google Kubernetes cluster. The dashboard URL can be found at the end of the output of the last command (e.g. `http://172.16.0.1:32145`).
18069

18170
Simply open the URL in your browser and you should be ready to go.
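
If you save the output of the install command to a file, the URL can also be picked out mechanically. The following is a minimal sketch, assuming the URL is printed in the plain `http://IP:PORT` form shown in the example above; the helper name `last_dashboard_url` is hypothetical, and the exact output format of the script is not verified here.

```shell
# Hypothetical helper: print the last http://IP:PORT URL found on stdin.
# Assumes the dashboard URL appears in plain form, as in the example above.
last_dashboard_url() {
  grep -Eo 'http://[0-9]+(\.[0-9]+){3}:[0-9]+' | tail -n 1
}

# Sample text standing in for the real script output:
printf 'Installing chart...\nDashboard: http://172.16.0.1:32145\n' | last_dashboard_url
```

With real output, you could redirect the output of `./google_cloud_setup.sh install-chart` to a file such as `install.log` and then run `last_dashboard_url < install.log`.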
18271

@@ -222,15 +111,21 @@ You can then see the details of the experiment by clicking on its entry in the l
222111

223112
That's it! You successfully ran a distributed machine learning algorithm in the cloud. You can also easily develop custom worker images for your own models and compare them to existing benchmarking code without a lot of overhead.
224113

225-
### Appendix 1: Run mlbench on GKE
226114

227-
The previous commands can be summarized as follows
115+
### Cleanup
116+
To delete MLBench, run:
117+
118+
```shell
119+
$ ./google_cloud_setup.sh uninstall-chart
120+
```
228121

229-
{% gist 23361aea5fe252570496acc7da4fb599 %}
122+
To delete the whole cluster (and clean up firewall rules), run:
230123

231-
Customize environment variables like `NUM_NODES` and run the scripts with `create`, `install` and `dashboard` sequantially. The last command will give you the external ip of the dashboard. When the job is done, run this script with `cleanup` to delete everything.
124+
```shell
125+
$ ./google_cloud_setup.sh delete-cluster
126+
```
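
If you want to run both cleanup steps in one go, they can be wrapped in a small function. This is a sketch under assumptions: `mlbench_teardown` and the `DRY_RUN` variable are hypothetical names introduced here, and `google_cloud_setup.sh` is assumed to be in the current directory.

```shell
# Hypothetical wrapper: uninstall the chart, then delete the cluster.
# With DRY_RUN=1 the commands are only printed, not executed.
mlbench_teardown() {
  for step in uninstall-chart delete-cluster; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "./google_cloud_setup.sh $step"
    else
      # Stop at the first failing step
      ./google_cloud_setup.sh "$step" || return 1
    fi
  done
}

# Preview the commands without touching the cluster:
DRY_RUN=1 mlbench_teardown
```

Calling `mlbench_teardown` without `DRY_RUN=1` runs the two commands for real.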
232127

233-
### Appendix 2: Use NFS for Data storage
128+
### Appendix 1: Use NFS for Data storage
234129
To avoid downloading datasets every time we reinstall mlbench, we can use a persistent disk to save the data. To do so, create a GCE disk, for example:
235130
```bash
236131
gcloud compute disks create --size=10G --zone=europe-west1-b my-pd-name
