Kubeflow が MicroK8s の Addon になってた

以前 Kubeflow を MiniKF (Minikube ベースの VirtualBox イメージ) で導入しました。

blog.kondoumh.com

その後 Ubuntu 19.10 のリリース時、Kubeflow が MicroK8s の Addon として導入できるようになるというアナウンスがありました。

ubuntu.com

MicroK8s は Snap で簡単にインストールできる Kubernetes 環境です。Addon として Dashboard、CoreDNS、Istio などを microk8s.enable コマンドで簡単に導入できます。

blog.kondoumh.com

そして、つい先日の MicroK8s 1.17 - 11 December 2019 で Addon として追加された模様です。

New addon: kubeflow. Give it a try with microk8s.enable kubeflow.

Release notes | MicroK8s

MicroK8s なので Ubuntu 19.10 が必須というわけでもなさそうですが、環境を用意して導入してみました。

まず Snap で MicroK8s をインストール

$ sudo snap install microk8s --classic
microk8s v1.17.0 from Canonical✓ installed

ユーザーを microk8s グループに追加してログインしなおし。

$ sudo usermod -a -G microk8s hoge

ステータスを確認。kubeflow は disable になっています。

$ microk8s.status --wait-ready
microk8s is running
addons:
cilium: disabled
dashboard: disabled
dns: disabled
fluentd: disabled
gpu: disabled
helm: disabled
ingress: disabled
istio: disabled
jaeger: disabled
juju: disabled
knative: disabled
kubeflow: disabled
linkerd: disabled
metallb: disabled
metrics-server: disabled
prometheus: disabled
rbac: disabled
registry: disabled
storage: disabled

microk8s.kubectl を kubectl として実行できるよう alias を設定

$ sudo snap alias microk8s.kubectl kubectl
Added:
  - microk8s.kubectl as kubectl

kubeflow を enable します。dns , storage など依存するアドオンも enable され最後に kubeflow のデプロイが開始されます。

Pod が大量に起動され、Service が Ready になるまでに 30分ほどかかりました。

$ microk8s.enable kubeflow
Enabling dns...
Enabling storage...
Enabling dashboard...
Enabling ingress...
Enabling rbac...
Enabling juju...
Deploying Kubeflow...
Kubeflow deployed.
Waiting for operator pods to become ready.
Waited 0s for operator pods to come up, 28 remaining.
Waited 15s for operator pods to come up, 27 remaining.
Waited 30s for operator pods to come up, 27 remaining.
Waited 45s for operator pods to come up, 27 remaining.
Waited 60s for operator pods to come up, 25 remaining.
Waited 75s for operator pods to come up, 21 remaining.
Waited 90s for operator pods to come up, 19 remaining.
Waited 105s for operator pods to come up, 18 remaining.
Waited 120s for operator pods to come up, 17 remaining.
Waited 135s for operator pods to come up, 15 remaining.
Waited 150s for operator pods to come up, 13 remaining.
Waited 165s for operator pods to come up, 13 remaining.
Waited 180s for operator pods to come up, 11 remaining.
Waited 195s for operator pods to come up, 10 remaining.
Waited 210s for operator pods to come up, 8 remaining.
Waited 225s for operator pods to come up, 4 remaining.
Waited 240s for operator pods to come up, 3 remaining.
Operator pods ready.
Waiting for service pods to become ready.

Congratulations, Kubeflow is now available.
The dashboard is available at https://localhost/

    Username: admin
    Password: XXXXXXXXXXXXXXXXXXXXXXX

To see these values again, run:
    microk8s.juju config kubeflow-gatekeeper username
    microk8s.juju config kubeflow-gatekeeper password


To tear down Kubeflow and associated infrastructure, run:
   microk8s.disable kubeflow

kubeflow というnamespace が作られ、多くの deployment や service が作られています。0.5 の時よりかなり増えてる感じです。

$ kubectl get deploy,svc -n kubeflow
NAME                                          READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/ambassador                    1/1     1            1           19m
deployment.apps/argo-controller               1/1     1            1           16m
deployment.apps/argo-ui                       1/1     1            1           19m
deployment.apps/jupyter-controller            1/1     1            1           19m
deployment.apps/jupyter-web                   1/1     1            1           18m
deployment.apps/katib-controller              1/1     1            1           18m
deployment.apps/katib-manager                 1/1     1            1           17m
deployment.apps/katib-ui                      1/1     1            1           17m
deployment.apps/kubeflow-dashboard            1/1     1            1           17m
deployment.apps/kubeflow-gatekeeper           1/1     1            1           17m
deployment.apps/kubeflow-login                1/1     1            1           17m
deployment.apps/kubeflow-profiles             1/1     1            1           17m
deployment.apps/metacontroller                1/1     1            1           17m
deployment.apps/metadata                      1/1     1            1           15m
deployment.apps/metadata-ui                   1/1     1            1           14m
deployment.apps/modeldb-backend               1/1     1            1           15m
deployment.apps/modeldb-store                 1/1     1            1           16m
deployment.apps/modeldb-ui                    1/1     1            1           15m
deployment.apps/pipelines-api                 1/1     1            1           15m
deployment.apps/pipelines-persistence         1/1     1            1           15m
deployment.apps/pipelines-scheduledworkflow   1/1     1            1           15m
deployment.apps/pipelines-ui                  1/1     1            1           14m
deployment.apps/pipelines-viewer              1/1     1            1           14m
deployment.apps/pytorch-operator              1/1     1            1           14m
deployment.apps/tf-job-dashboard              1/1     1            1           15m
deployment.apps/tf-job-operator               1/1     1            1           15m

NAME                                           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/ambassador                             ClusterIP   10.152.183.73    <none>        80/TCP              19m
service/ambassador-operator                    ClusterIP   10.152.183.100   <none>        30666/TCP           19m
service/argo-controller-operator               ClusterIP   10.152.183.215   <none>        30666/TCP           19m
service/argo-ui                                ClusterIP   10.152.183.55    <none>        8001/TCP            19m
service/argo-ui-operator                       ClusterIP   10.152.183.134   <none>        30666/TCP           19m
service/jupyter-controller-operator            ClusterIP   10.152.183.104   <none>        30666/TCP           19m
service/jupyter-web                            ClusterIP   10.152.183.191   <none>        5000/TCP            18m
service/jupyter-web-operator                   ClusterIP   10.152.183.205   <none>        30666/TCP           19m
service/katib-controller                       ClusterIP   10.152.183.200   <none>        443/TCP             18m
service/katib-controller-operator              ClusterIP   10.152.183.189   <none>        30666/TCP           18m
service/katib-db                               ClusterIP   10.152.183.45    <none>        3306/TCP            18m
service/katib-db-endpoints                     ClusterIP   None             <none>        <none>              18m
service/katib-db-operator                      ClusterIP   10.152.183.48    <none>        30666/TCP           18m
service/katib-manager                          ClusterIP   10.152.183.222   <none>        6789/TCP            17m
service/katib-manager-operator                 ClusterIP   10.152.183.167   <none>        30666/TCP           18m
service/katib-ui                               ClusterIP   10.152.183.22    <none>        80/TCP              17m
service/katib-ui-operator                      ClusterIP   10.152.183.199   <none>        30666/TCP           18m
service/kubeflow-dashboard                     ClusterIP   10.152.183.237   <none>        8082/TCP            17m
service/kubeflow-dashboard-operator            ClusterIP   10.152.183.66    <none>        30666/TCP           18m
service/kubeflow-gatekeeper                    ClusterIP   10.152.183.240   <none>        8085/TCP            17m
service/kubeflow-gatekeeper-operator           ClusterIP   10.152.183.2     <none>        30666/TCP           17m
service/kubeflow-login                         ClusterIP   10.152.183.31    <none>        5000/TCP            17m
service/kubeflow-login-operator                ClusterIP   10.152.183.252   <none>        30666/TCP           17m
service/kubeflow-profiles                      ClusterIP   10.152.183.239   <none>        8081/TCP            17m
service/kubeflow-profiles-operator             ClusterIP   10.152.183.23    <none>        30666/TCP           18m
service/metacontroller                         ClusterIP   10.152.183.112   <none>        9999/TCP            17m
service/metacontroller-operator                ClusterIP   10.152.183.13    <none>        30666/TCP           18m
service/metadata                               ClusterIP   10.152.183.164   <none>        8080/TCP            15m
service/metadata-db                            ClusterIP   10.152.183.209   <none>        3306/TCP            16m
service/metadata-db-endpoints                  ClusterIP   None             <none>        <none>              16m
service/metadata-db-operator                   ClusterIP   10.152.183.67    <none>        30666/TCP           16m
service/metadata-operator                      ClusterIP   10.152.183.121   <none>        30666/TCP           15m
service/metadata-ui                            ClusterIP   10.152.183.182   <none>        3000/TCP            14m
service/metadata-ui-operator                   ClusterIP   10.152.183.6     <none>        30666/TCP           16m
service/minio                                  ClusterIP   10.152.183.213   <none>        9000/TCP            16m
service/minio-endpoints                        ClusterIP   None             <none>        <none>              16m
service/minio-operator                         ClusterIP   10.152.183.136   <none>        30666/TCP           17m
service/modeldb-backend                        ClusterIP   10.152.183.69    <none>        8085/TCP,8080/TCP   15m
service/modeldb-backend-operator               ClusterIP   10.152.183.94    <none>        30666/TCP           16m
service/modeldb-db                             ClusterIP   10.152.183.155   <none>        3306/TCP            15m
service/modeldb-db-endpoints                   ClusterIP   None             <none>        <none>              15m
service/modeldb-db-operator                    ClusterIP   10.152.183.56    <none>        30666/TCP           16m
service/modeldb-store                          ClusterIP   10.152.183.20    <none>        8086/TCP            16m
service/modeldb-store-operator                 ClusterIP   10.152.183.224   <none>        30666/TCP           17m
service/modeldb-ui                             ClusterIP   10.152.183.7     <none>        3000/TCP            15m
service/modeldb-ui-operator                    ClusterIP   10.152.183.130   <none>        30666/TCP           16m
service/pipelines-api                          ClusterIP   10.152.183.43    <none>        8887/TCP,8888/TCP   15m
service/pipelines-api-operator                 ClusterIP   10.152.183.34    <none>        30666/TCP           16m
service/pipelines-db                           ClusterIP   10.152.183.53    <none>        3306/TCP            16m
service/pipelines-db-endpoints                 ClusterIP   None             <none>        <none>              16m
service/pipelines-db-operator                  ClusterIP   10.152.183.32    <none>        30666/TCP           16m
service/pipelines-persistence-operator         ClusterIP   10.152.183.161   <none>        30666/TCP           15m
service/pipelines-scheduledworkflow-operator   ClusterIP   10.152.183.74    <none>        30666/TCP           15m
service/pipelines-ui                           ClusterIP   10.152.183.68    <none>        3000/TCP,3001/TCP   14m
service/pipelines-ui-operator                  ClusterIP   10.152.183.135   <none>        30666/TCP           15m
service/pipelines-viewer                       ClusterIP   10.152.183.61    <none>        8001/TCP            14m
service/pipelines-viewer-operator              ClusterIP   10.152.183.25    <none>        30666/TCP           15m
service/pytorch-operator-operator              ClusterIP   10.152.183.208   <none>        30666/TCP           14m
service/tf-job-dashboard                       ClusterIP   10.152.183.179   <none>        8080/TCP            15m
service/tf-job-dashboard-operator              ClusterIP   10.152.183.49    <none>        30666/TCP           15m
service/tf-job-operator-operator               ClusterIP   10.152.183.241   <none>        30666/TCP           15m

ブラウザで localhost に接続するとログイン画面が出るのでインストール完了時に表示された admin ユーザとパスワードでログインします。パスワードは microk8s.juju コマンドで再表示可能です。

$ microk8s.juju config kubeflow-gatekeeper password

おなじみの kubeflow の dashboard が表示されました。

f:id:kondoumh:20191215111758p:plain

v0.6.0-rc.0 が入ったようです。現在最新版は 0.7 で次は 1.0 になるみたいです。

kubeflow/ROADMAP.md at master · kubeflow/kubeflow · GitHub

ということで、MicroK8s の近い将来のリリースで Kubeflow 1.0 が入るようになるでしょう。

Kubeflow は導入障壁がまだ高いのですが、MicroK8s の Addon になるとかなり手軽になりますね。Kubeflow を利用する開発環境の構築にも良さそうです。今回は、GCP に 8 vCPU 30GB メモリという強めな VM を用意し GNOME と xrdp でリモートデスクトップ接続しました。Core i7 で 32GB メモリ程度の LInux マシンが用意できればけっこう使える環境が作れそうです。