This tutorial guides you through setting up MLBench in a Google Cloud Kubernetes cluster.
This tutorial assumes you have a Google Cloud account with permissions to create a new cluster.
You also need [Python](https://www.python.org/), [Git](https://git-scm.com/), and [Docker](https://www.docker.com) installed locally, and the Docker daemon should be running.

Check out the [mlbench-helm](https://github.com/mlbench/mlbench-helm) GitHub repository and open a terminal in the checked-out directory.

Follow the steps detailed [here](https://cloud.google.com/sdk/docs/quickstarts) to install the Google Cloud SDK.
### Installing MLBench
Copy the file `values.yaml` to the current directory, calling it `myvalues.yaml`.
```shell
$ cp values.yaml myvalues.yaml
```
This file contains default values for most MLBench settings. There are, however, some you need to set yourself to reasonable values for your cluster, namely:
```yaml
limits:
  cpu: 1000m
  workers: 3
  bandwidth: 1000
```

This limits the maximum usable resources (and the maximum you are able to choose in the UI) to 1 CPU core (1000m = 1000 milli-CPUs), 0 GPUs, and 1000 Mbit/s of network bandwidth per node, with 3 worker nodes in total.

*Note: ``n1-standard-2`` instances have 2 CPU cores, but since Google Cloud Kubernetes runs its own monitoring and management pods, which also use some CPU, it is advisable to set MLBench to use one core less than is available.*
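For instance, on nodes with 4 cores you could follow the same rule and leave one core for the system pods; a sketch of such a ``limits`` section (these values are illustrative, not defaults):

```yaml
limits:
  cpu: 3000m      # 3 of the 4 cores; one core is left for Kubernetes system pods
  workers: 3      # total number of worker nodes
  bandwidth: 1000 # network limit in Mbit/s per node
```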

With those values set, MLBench can be installed with the `google_cloud_setup.sh` script (run `google_cloud_setup.sh help` to see all available options).

First, create a GKE cluster:
```shell
$ ./google_cloud_setup.sh create-cluster
```

and then install the helm chart:
```shell
$ ./google_cloud_setup.sh install-chart
```

This should set up MLBench in your Google Kubernetes cluster. The dashboard URL can be found at the end of the output of the last command (e.g. `http://172.16.0.1:32145`).
Simply open the URL in your browser and you should be ready to go.

You can then see the details of the experiment by clicking on its entry in the list.
That's it! You successfully ran a distributed machine learning algorithm in the cloud. You can also easily develop custom worker images for your own models and compare them to existing benchmarking code without a lot of overhead.
### Cleanup

To delete MLBench, run:
```shell
$ ./google_cloud_setup.sh uninstall-chart
```

To delete the whole cluster (and clean up firewall rules), run:
```shell
$ ./google_cloud_setup.sh delete-cluster
```
### Appendix 1: Use NFS for Data storage
To avoid downloading datasets every time we reinstall MLBench, we can use a persistent disk to save the data. To do so, one can create a GCE disk like this:
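A sketch of such a command using the standard ``gcloud`` CLI (the disk name, size, and zone below are example values, not ones prescribed by MLBench; the zone should match the one your cluster runs in):

```shell
# create a persistent disk to hold datasets across MLBench reinstalls
gcloud compute disks create mlbench-data \
    --size=50GB \
    --zone=us-central1-a
```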