vAccel on k8s using Kata-containers & Firecracker
Prerequisites
In order to run vAccel on Kata Containers with Firecracker you need to meet the following prerequisites on each k8s node that will be used for acceleration (a quick verification sketch follows the list):
- containerd as the container manager
- devicemapper as the default snapshotter of the CRI plugin
- an NVIDIA GPU with CUDA support (for now, the only supported accelerator)
- the jetson-inference libraries (libjetson-inference.so must be installed and properly linked against the CUDA libraries)
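Before deploying, you can sanity-check these prerequisites on each node; a minimal sketch, assuming containerd reads its configuration from the default path:
$ containerd --version
$ grep snapshotter /etc/containerd/config.toml   # expect snapshotter = "devicemapper"
$ nvidia-smi                                     # the GPU and CUDA driver should be visible
$ ldconfig -p | grep libjetson-inference         # the library should be installed and linked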
Quick start
Deploy vAccel with Kata
We rely on kata-containers/kata-deploy to create the vaccel-kata-deploy daemon. Our fork can be found at cloudkernels/packaging. We are working on building a Kata Containers release with vAccel support.
Label each node where vAccel-kata should be deployed:
$ kubectl label nodes <your-node-name> vaccel=true
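You can confirm which nodes carry the label:
$ kubectl get nodes -l vaccel=true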
Create the service account and cluster role for the kata-deploy daemon:
$ kubectl apply -f https://raw.githubusercontent.com/cloudkernels/packaging/vaccel-dev/kata-deploy/kata-rbac/base/kata-rbac.yaml
Install vAccel-kata on each "vaccel=true" node:
$ kubectl apply -f https://raw.githubusercontent.com/cloudkernels/packaging/vaccel-dev/kata-deploy/kata-deploy/base/kata-deploy.yaml
# or for k3s
$ k3s kubectl apply -k github.com/cloudkernels/packaging/kata-deploy/kata-deploy/overlays/k3s?ref=vaccel-dev
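Once applied, you can wait for the DaemonSet rollout to finish (assuming the DaemonSet keeps the name kata-deploy, as the pod names below suggest):
$ kubectl -n kube-system rollout status ds/kata-deploy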
The kata-deploy daemon calls the vAccel download script; it may take a few minutes to download the ML inference models.
$ kubectl get pods --all-namespaces
NAMESPACE     NAME                READY   STATUS    RESTARTS   AGE
kube-system   kata-deploy-575tm   1/1     Running   0          101m
...
...
Check the pod logs to be sure that the installation is complete. You should see something like the following:
$ kubectl -n kube-system logs kata-deploy-575tm
...
...
Done! containerd-shim-kata-v2 is now configured to run Firecracker with vAccel
node/node3.nubificus.com labeled
That's it! You are now ready to accelerate your functions on Kubernetes with vAccel.
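As an extra check, kata-deploy labels each node it has configured; assuming our fork keeps the upstream katacontainers.io/kata-runtime=true label, you can list the nodes that are ready:
$ kubectl get nodes -l katacontainers.io/kata-runtime=true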
Alternatively, use the following daemon, whose container image already bundles all the vAccel artifacts and required components. The image is slightly bigger than the default one, as it also includes the jetson-inference models.
$ kubectl apply -k github.com/cloudkernels/packaging/kata-deploy/kata-deploy/overlays/full?ref=vaccel-dev
# or for k3s
$ k3s kubectl apply -k github.com/cloudkernels/packaging/kata-deploy/kata-deploy/overlays/full-k3s?ref=vaccel-dev
Don't forget to create a RuntimeClass in order to run your workloads with the vAccel-enabled Kata runtime:
$ kubectl apply -f https://raw.githubusercontent.com/cloudkernels/packaging/vaccel-dev/kata-deploy/k8s-1.14/kata-fc-runtimeClass.yaml
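Workloads opt in to the vAccel-enabled runtime by setting runtimeClassName: kata-fc in their pod spec. A minimal test pod, with hypothetical names, might look like this:
$ kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: kata-fc-test             # hypothetical pod name
spec:
  runtimeClassName: kata-fc      # run the pod in a Firecracker microVM via Kata
  nodeSelector:
    vaccel: "true"               # schedule onto a vAccel-enabled node
  containers:
  - name: sleeper
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
EOF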
Deploy an image classification function as a Service
The following will deploy a custom HTTP server that routes POST requests to a handler. The handler reads an image from the POST body and calls vAccel to perform an image classification operation on the GPU.
$ kubectl create -f https://raw.githubusercontent.com/cloudkernels/packaging/vaccel-dev/kata-deploy/examples/web-classify.yaml
$ kubectl get pods
NAME                                    READY   STATUS    RESTARTS   AGE
web-classify-kata-fc-5f44fd448f-mtvlv   1/1     Running   0          92m
web-classify-kata-fc-5f44fd448f-h7j84   1/1     Running   0          92m
$ kubectl get svc
NAME                   TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
web-classify-kata-fc   ClusterIP   10.43.214.52   <none>        80/TCP    91m
Run a curl command (from a cluster node) like the following to send your POST request to the web-classify-kata-fc Service (to access the Service from outside the cluster, use a NodePort or deploy an ingress route; see also the port-forward sketch after the output below):
$ wget https://pbs.twimg.com/profile_images/1186928115571941378/1B6zKjc3_400x400.jpg -O - | curl -L -X POST 10.43.214.52:80/classify --data-binary @-
And you should see the result of the image classification:
--2021-02-05 20:17:15-- https://pbs.twimg.com/profile_images/1186928115571941378/1B6zKjc3_400x400.jpg
Resolving pbs.twimg.com (pbs.twimg.com)... 2606:2800:134:fa2:1627:1fe:edb:1665, 192.229.233.50
Connecting to pbs.twimg.com (pbs.twimg.com)|2606:2800:134:fa2:1627:1fe:edb:1665|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12605 (12K) [image/jpeg]
Saving to: 'STDOUT'
- 100%[================================================>] 12.31K --.-KB/s in 0.04s
2021-02-05 20:17:15 (296 KB/s) - written to stdout [12605/12605]
["web-classify-kata-fc-567bddccc4-s79b5"]: "29.761% wall clock"
Cleanup everything
Delete the web-classify-kata-fc deployment and service:
$ kubectl delete -f https://raw.githubusercontent.com/cloudkernels/packaging/vaccel-dev/kata-deploy/examples/web-classify.yaml
Delete the deploy daemon (this removes the artifacts from the host paths /opt/vaccel and /opt/kata and restores the containerd configuration):
$ kubectl delete -f https://raw.githubusercontent.com/cloudkernels/packaging/vaccel-dev/kata-deploy/kata-deploy/base/kata-deploy.yaml
# or for k3s
$ k3s kubectl delete -k github.com/cloudkernels/packaging/kata-deploy/kata-deploy/overlays/k3s?ref=vaccel-dev
Or, in case you deployed the full vAccel overlay:
$ kubectl delete -k github.com/cloudkernels/packaging/kata-deploy/kata-deploy/overlays/full?ref=vaccel-dev
# or for k3s
$ k3s kubectl delete -k github.com/cloudkernels/packaging/kata-deploy/kata-deploy/overlays/full-k3s?ref=vaccel-dev
Reset the runtime and remove the Kata-related labels from the nodes:
$ kubectl apply -f https://raw.githubusercontent.com/cloudkernels/packaging/vaccel-dev/kata-deploy/kata-cleanup/base/kata-cleanup.yaml
# or for k3s
$ k3s kubectl apply -k github.com/cloudkernels/packaging/kata-deploy/kata-cleanup/overlays/k3s?ref=vaccel-dev
Delete the kata-fc RuntimeClass and the RBAC rules:
$ kubectl delete -f https://raw.githubusercontent.com/cloudkernels/packaging/vaccel-dev/kata-deploy/k8s-1.14/kata-fc-runtimeClass.yaml
$ kubectl delete -f https://raw.githubusercontent.com/cloudkernels/packaging/vaccel-dev/kata-deploy/kata-rbac/base/kata-rbac.yaml
Delete the cleanup daemon:
$ kubectl delete -f https://raw.githubusercontent.com/cloudkernels/packaging/vaccel-dev/kata-deploy/kata-cleanup/base/kata-cleanup.yaml
# or for k3s
$ k3s kubectl delete -k github.com/cloudkernels/packaging/kata-deploy/kata-cleanup/overlays/k3s?ref=vaccel-dev
Remove the vaccel=true label from each node:
$ kubectl label nodes <your-node-name> vaccel-
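Listing the nodes by label should now return nothing:
$ kubectl get nodes -l vaccel=true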