# Deploying vAccel applications using Kubernetes
This guide describes how to deploy a vAccel-enabled application, alongside the vAccel Agent with the Torch plugin, using Kubernetes. It includes:
- Full explanation of the deployment architecture
- Sidecar-based co-location of client and agent
- Split deployment with remote agent access
- Dockerfiles for both client and agent
- Simple block diagram for visual reference
## Architecture Overview
The deployment includes:
- The vAccel Agent (running `vaccel-rpc-agent`), exposing a Unix socket or TCP endpoint.
- The vAccel-enabled Application, packaged with the RPC plugin, which connects to the agent.
- A sidecar option to colocate both agent and client in one pod for latency-sensitive workloads.
- A split deployment option where the agent runs as a standalone DaemonSet.
## Deployment Graph
```mermaid
flowchart TD
    subgraph Node1["K8s Node"]
        subgraph PodA["Sidecar Pod"]
            A1[vAccel Client]
            A2[vAccel RPC Agent]
        end
    end
    subgraph Node2["K8s Node"]
        subgraph PodB["Client Only Pod"]
            B1[vAccel Client]
        end
        subgraph PodC["Agent Pod"]
            C1[vAccel RPC Agent]
        end
    end
    B1 -->|RPC over vsock/tcp| C1
    A1 -->|Unix socket| A2
```
We have gathered all the necessary files in a GitHub repository. Its file structure is as follows:
```
├── docker
│   ├── Dockerfile
│   └── Dockerfile.agent
├── manifests
│   ├── client.yaml
│   ├── daemonset-agent.yaml
│   └── sidecar.yaml
└── README.md
```
In the `docker` folder we include two Dockerfiles, one for the client application and one for the vAccel Agent. Feel free to build on the client `Dockerfile` to create your own. To showcase our example, we have already included our application binary in an example container image: `harbor.nbfc.io/nubificus/vaccel-torch-bert-example:x86_64`.
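As a minimal sketch, a derived client image could simply start from the example image above and add your own model and vocabulary; the file names and destination path below are purely illustrative:

```dockerfile
# Illustrative sketch: extend the example client image instead of
# building one from scratch.
FROM harbor.nbfc.io/nubificus/vaccel-torch-bert-example:x86_64

# Hypothetical artifact names and paths; replace them with your own files.
COPY my_model.pt /models/my_model.pt
COPY my_vocab.txt /models/my_vocab.txt
```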
The `manifests` folder contains YAML files to deploy this example in both modes of operation.
## vAccel Agent as a sidecar container
The agent runs alongside the client application in a single pod (`sidecar.yaml`). Since both containers in the pod can share the same volume, and therefore the socket path, we can use the Unix socket RPC transport. See the following snippet from the YAML file:
```yaml
containers:
- name: vaccel-agent
  volumeMounts:
  - name: vaccel-sock
    mountPath: /var/run/vaccel
  ...
- name: vaccel-client
  ...
  volumeMounts:
  - name: vaccel-sock
    mountPath: /var/run/vaccel
  ...
volumes:
- name: vaccel-sock
  emptyDir: {}
```
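Assuming the manifests are applied from the repository root, the sidecar pod can be deployed directly with:

```
$ kubectl apply -f manifests/sidecar.yaml
```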
Note: The `ENTRYPOINT` of the Agent's Dockerfile starts `vaccel-rpc-agent` listening on the shared Unix socket, and the Client's Dockerfile sets the environment variable that points the vAccel RPC plugin to that endpoint.
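Based on the agent logs and the `client.yaml` manifest, these likely take the following form; treat them as a sketch rather than the authoritative Dockerfile contents. For the agent image:

```dockerfile
# Reconstructed from the sidecar agent logs; the DaemonSet manifest
# later overrides this with a TCP address.
ENTRYPOINT ["vaccel-rpc-agent", "-a", "unix:///var/run/vaccel/vaccel.sock"]
```

and for the client image, the endpoint variable (overridden in `client.yaml` for the DaemonSet mode):

```dockerfile
# VACCEL_RPC_ADDRESS is the variable used in client.yaml; the socket
# path matches the shared /var/run/vaccel emptyDir mount.
ENV VACCEL_RPC_ADDRESS=unix:///var/run/vaccel/vaccel.sock
```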
Logs from an example deployment are shown below.
Agent logs, before the client starts:
```
$ kubectl logs vaccel-sidecar-pod -c vaccel-agent
[2025-06-22T22:30:03Z INFO ttrpc::sync::server] server listen started
[2025-06-22T22:30:03Z INFO ttrpc::sync::server] server started
[2025-06-22T22:30:03Z INFO vaccel_rpc_agent] vAccel RPC agent started
[2025-06-22T22:30:03Z INFO vaccel_rpc_agent] Listening on 'unix:///var/run/vaccel/vaccel.sock', press Ctrl+C to exit
```
Client logs:
```
$ kubectl logs vaccel-sidecar-pod -c vaccel-client
2025.06.22-22:30:03.73 - <info> vAccel 0.7.0-7-e67e52b6
2025.06.22-22:30:03.73 - <info> Registered plugin rpc 0.2.0-1-eca9e440
Processing 26954 lines from: /data/tweets.txt
== [Vocab Loaded] ==
2025.06.22-22:30:03.78 - <warn> Path does not seem to have a `<prefix>://`
2025.06.22-22:30:03.78 - <warn> Assuming cnn_trace.pt is a local path
vaccel_resource_new(): Time Taken: 68500 nanoseconds
Created new model resource 1
Initialized vAccel session 1
Line 1: Duration: 497.627 ms Prediction: neither
Line 2: Duration: 198.996 ms Prediction: offensive-language
Line 3: Duration: 359.865 ms Prediction: offensive-language
Line 4: Duration: 69.7109 ms Prediction: offensive-language
...
```
Agent logs, after the client's initial connection:
```
$ kubectl logs vaccel-sidecar-pod -c vaccel-agent
[2025-06-22T22:30:03Z INFO ttrpc::sync::server] server listen started
[2025-06-22T22:30:03Z INFO ttrpc::sync::server] server started
[2025-06-22T22:30:03Z INFO vaccel_rpc_agent] vAccel RPC agent started
[2025-06-22T22:30:03Z INFO vaccel_rpc_agent] Listening on 'unix:///var/run/vaccel/vaccel.sock', press Ctrl+C to exit
[2025-06-22T22:30:03Z INFO vaccel_rpc_agent::session] Created session 1
[2025-06-22T22:30:05Z INFO vaccel_rpc_agent::resource] Creating new resource
[2025-06-22T22:30:05Z INFO vaccel_rpc_agent::resource] Registering resource 1 with session 1
[2025-06-22T22:30:06Z INFO vaccel_rpc_agent::ops::torch] session:1 PyTorch jitload forward
[2025-06-22T22:30:06Z INFO vaccel_rpc_agent::ops::torch] session:1 PyTorch jitload forward
[2025-06-22T22:30:06Z INFO vaccel_rpc_agent::ops::torch] session:1 PyTorch jitload forward
...
```
## vAccel Agent as a DaemonSet
The Agent runs as a DaemonSet, providing a service to the Kubernetes cluster that is reachable via a hostname and port. This mode offers much more flexibility, as it allows multiple agent deployments on heterogeneous nodes with diverse hardware characteristics. In our example we use a single `x86_64` node, but more complex scenarios can be supported, with multiple GPUs/FPGAs or custom acceleration nodes. In this mode of operation we have two YAML files:

(a) one for the agent DaemonSet (`daemonset-agent.yaml`) that includes the service:
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vaccel-agent
spec:
  selector:
    matchLabels:
      app: vaccel-agent
  template:
    metadata:
      labels:
        app: vaccel-agent
    spec:
      containers:
      - name: vaccel-agent
        image: harbor.nbfc.io/nubificus/vaccel-torch-bert-example-agent:x86_64
        command: ["vaccel-rpc-agent"]
        args: ["-a", "tcp://0.0.0.0:8888"]
        ports:
        - containerPort: 8888
          name: rpc
```
and the service:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: vaccel-agent
  labels:
    app: vaccel-agent
spec:
  ports:
  - name: vaccel-agent
    port: 8888
    protocol: TCP
    targetPort: 8888
  selector:
    app: vaccel-agent
  sessionAffinity: None
```
(b) one for the client application (`client.yaml`) that points to the respective vAccel Agent service:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: vaccel-client
spec:
  containers:
  - name: vaccel-client
    image: harbor.nbfc.io/nubificus/vaccel-torch-bert-example:x86_64
    command: ["./build/classifier"]
    args: ["-m", "cnn_trace.pt", "-v", "bert_cased_vocab.txt", "-f", "/data/tweets.txt"]
    env:
    - name: VACCEL_RPC_ADDRESS
      value: "tcp://vaccel-agent:8888"
    volumeMounts:
    - name: tweets
      mountPath: /data
  volumes:
  - name: tweets
    hostPath:
      path: /tmp/data
```
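Note that the `hostPath` volume assumes the input file is already present on the node where the client pod gets scheduled. A minimal way to stage it, using the paths from `client.yaml` (the source location of `tweets.txt` is just an example), would be:

```
# On the node that will run the client pod
mkdir -p /tmp/data
cp /path/to/tweets.txt /tmp/data/tweets.txt
```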
Deployment logs are shown below.
Agent & service instantiation:
```
$ kubectl apply -f daemonset-agent.yaml
daemonset.apps/vaccel-agent created
service/vaccel-agent created
$ kubectl get pods -o wide
NAME                 READY   STATUS    RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
vaccel-agent-p4zxg   1/1     Running   0          15s   10.244.43.21   node1   <none>           <none>
$ kubectl logs vaccel-agent-p4zxg
[2025-06-22T22:34:34Z INFO ttrpc::sync::server] server listen started
[2025-06-22T22:34:34Z INFO ttrpc::sync::server] server started
[2025-06-22T22:34:34Z INFO vaccel_rpc_agent] vAccel RPC agent started
[2025-06-22T22:34:34Z INFO vaccel_rpc_agent] Listening on 'tcp://0.0.0.0:8888', press Ctrl+C to exit
$ kubectl get svc -o wide
NAME           TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE   SELECTOR
vaccel-agent   ClusterIP   10.244.21.70   <none>        8888/TCP   64s   app=vaccel-agent
```
Client spawn:
```
$ kubectl apply -f client.yaml
pod/vaccel-client created
$ kubectl get pods -o wide
NAME                 READY   STATUS    RESTARTS   AGE    IP             NODE    NOMINATED NODE   READINESS GATES
vaccel-agent-p4zxg   1/1     Running   0          2m9s   10.244.43.21   node1   <none>           <none>
vaccel-client        1/1     Running   0          11s    10.244.43.1    node1   <none>           <none>
$ kubectl logs vaccel-client
2025.06.22-22:36:32.01 - <info> vAccel 0.7.0-7-e67e52b6
2025.06.22-22:36:32.02 - <info> Registered plugin rpc 0.2.0-1-eca9e440
Processing 26954 lines from: /data/tweets.txt
== [Vocab Loaded] ==
2025.06.22-22:36:32.07 - <warn> Path does not seem to have a `<prefix>://`
2025.06.22-22:36:32.07 - <warn> Assuming cnn_trace.pt is a local path
vaccel_resource_new(): Time Taken: 67057 nanoseconds
Created new model resource 1
Initialized vAccel session 1
Line 1: Duration: 551.823 ms Prediction: neither
Line 2: Duration: 279.833 ms Prediction: offensive-language
Line 3: Duration: 413.417 ms Prediction: offensive-language
Line 4: Duration: 120.675 ms Prediction: offensive-language
Line 5: Duration: 185.506 ms Prediction: neither
...
```
## Next Steps
- Customize your Torch model or plugin inside the client image
- Test workloads in the sidecar mode first for simplified debugging
- Scale via a Deployment/Job or integrate with Knative (see the sketch below)
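As a starting point for the last item, a minimal sketch of running the same client as a one-shot Kubernetes Job against the agent Service from the DaemonSet example could look as follows; only the `Job` wrapper and its name are new, the container spec is copied from `client.yaml`:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: vaccel-client-job   # illustrative name
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: vaccel-client
        image: harbor.nbfc.io/nubificus/vaccel-torch-bert-example:x86_64
        command: ["./build/classifier"]
        args: ["-m", "cnn_trace.pt", "-v", "bert_cased_vocab.txt", "-f", "/data/tweets.txt"]
        env:
        - name: VACCEL_RPC_ADDRESS
          value: "tcp://vaccel-agent:8888"
        volumeMounts:
        - name: tweets
          mountPath: /data
      volumes:
      - name: tweets
        hostPath:
          path: /tmp/data
```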