Mobilint Device Plugin for Kubernetes
The Mobilint Device Plugin integrates ARIES NPUs with Kubernetes so that you can request an NPU the same way you request a CPU or GPU.
Overview
The Mobilint Device Plugin implements the Kubernetes Device Plugin API to register ARIES NPUs as Kubernetes resources.
After installation, the kubelet publishes the node’s ARIES devices as the mobilint.com/npu resource, and you request that resource from a Pod.
When a Pod is allocated an NPU, the device plugin uses the Container Device Interface (CDI) to inject the selected device into the container.
Prerequisites
The ARIES driver must be installed on every NPU node (see Driver Installation). Verify this on each node with:
lsmod | grep aries
ls /dev/aries*
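The checks above can be wrapped in a small function for node bootstrap or CI scripts. This is a minimal sketch: `check_aries_prereqs` is a hypothetical helper, it assumes the loaded module's name contains `aries` (mirroring the `lsmod | grep aries` check), and it reads `/proc/modules` (the file `lsmod` itself reads) with overridable paths so it can be exercised without real hardware.

```bash
#!/usr/bin/env bash
# check_aries_prereqs [MODULES_FILE] [DEV_DIR]
# Hypothetical helper: succeeds if a kernel module whose name contains "aries"
# is loaded and at least one aries* device node exists in DEV_DIR.
check_aries_prereqs() {
  local modules_file="${1:-/proc/modules}"  # the file lsmod reads
  local dev_dir="${2:-/dev}"
  local dev

  # The module name is the first field of each /proc/modules line.
  if ! awk '$1 ~ /aries/ { found = 1 } END { exit !found }' "$modules_file"; then
    echo "ARIES kernel module is not loaded" >&2
    return 1
  fi

  # An unmatched glob stays literal, so -e is false when no device exists.
  for dev in "$dev_dir"/aries*; do
    if [ -e "$dev" ]; then
      echo "ARIES driver loaded; device nodes present in $dev_dir"
      return 0
    fi
  done
  echo "no $dev_dir/aries* device nodes found" >&2
  return 2
}

# Example (on an NPU node):
#   check_aries_prereqs && echo "node is ready for the device plugin"
```

Distinct return codes (1 for a missing module, 2 for missing device nodes) let a calling script report which prerequisite failed.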
The container runtime must support CDI (Container Device Interface):
| Runtime | Minimum version |
|---|---|
| containerd | 1.7 |
| CRI-O | 1.23 |
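Kubernetes reports each node's runtime as `<runtime>://<version>` in `.status.nodeInfo.containerRuntimeVersion`, so the minimum versions above can be checked from the cluster. The sketch below is hypothetical (`version_ge` and `cdi_runtime_ok` are illustrative names) and relies on `sort -V`. A version check alone does not show that CDI is actually enabled in the runtime's configuration.

```bash
#!/usr/bin/env bash
# version_ge A B: succeeds if version A >= version B (relies on sort -V).
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# cdi_runtime_ok RUNTIME_VERSION
# Hypothetical helper. Takes a value such as "containerd://1.7.13" or
# "cri-o://1.28.1" and checks it against the CDI minimum versions in this guide.
cdi_runtime_ok() {
  local runtime="${1%%://*}"
  local version="${1#*://}"
  local min
  version="${version#v}"  # tolerate a leading "v"

  case "$runtime" in
    containerd) min=1.7 ;;
    cri-o) min=1.23 ;;
    *) echo "unsupported runtime: $runtime" >&2; return 2 ;;
  esac
  version_ge "$version" "$min"
}

# Example:
#   cdi_runtime_ok "$(kubectl get node <NODE_NAME> \
#     -o jsonpath='{.status.nodeInfo.containerRuntimeVersion}')"
```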
Installation
1. Label the NPU nodes
The device plugin is deployed only to nodes that carry the mobilint.com/npu.present=true label. Label each NPU node:
kubectl label node <NODE_NAME> mobilint.com/npu.present=true --overwrite
List node names with kubectl get nodes. You can automate this step with Node Feature Discovery (see NFD Integration below).
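When several nodes need the label, a loop over node names does the job. This is a minimal sketch with a hypothetical helper name; the `KUBECTL` override is an illustrative dry-run hook (set `KUBECTL=echo` to print the commands instead of running them), not something the device plugin provides.

```bash
#!/usr/bin/env bash
# label_npu_nodes NODE...
# Hypothetical helper: applies the label the device plugin DaemonSet selects
# on to each named node. KUBECTL=echo turns it into a dry run.
label_npu_nodes() {
  local kubectl="${KUBECTL:-kubectl}"
  local node
  if [ "$#" -eq 0 ]; then
    echo "usage: label_npu_nodes NODE..." >&2
    return 2
  fi
  for node in "$@"; do
    "$kubectl" label node "$node" mobilint.com/npu.present=true --overwrite || return 1
  done
}

# Example:
#   label_npu_nodes worker-1 worker-2
#   kubectl get nodes -l mobilint.com/npu.present=true
```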
2. Install the device plugin
Install with Helm:
helm install mobilint-device-plugin \
oci://ghcr.io/mobilint/charts/mobilint-device-plugin \
-n kube-system
If you are not using Helm, apply the DaemonSet manifest directly:
kubectl apply -f https://raw.githubusercontent.com/mobilint/mobilint-device-plugin/main/deploy/daemonset.yaml
Verifying the Installation
Check that the device plugin Pod is running:
kubectl -n kube-system get pods \
-l app.kubernetes.io/name=mobilint-device-plugin
The device plugin Pod should be READY 1/1 on every NPU node.
Check that the node advertises the NPU resource:
kubectl get node <NODE_NAME> \
-o jsonpath='{.status.allocatable.mobilint\.com/npu}'
It prints the number of NPUs on the node (for example, 4).
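To check every labeled node at once, kubectl's JSONPath output can emit one `NODE<TAB>COUNT` line per node and a small filter can flag nodes that advertise nothing. This is a sketch under the assumption that every labeled node should expose at least one NPU; `report_npu_allocatable` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# report_npu_allocatable
# Hypothetical helper: reads "NODE<TAB>COUNT" lines on stdin, warns about
# nodes that advertise no NPUs, prints the total, and exits 1 if any node
# is missing the resource.
report_npu_allocatable() {
  awk -F '\t' '
    NF == 0 { next }                       # skip blank lines
    $2 == "" || $2 + 0 == 0 {
      print "WARN: " $1 " advertises no mobilint.com/npu" > "/dev/stderr"
      missing = 1
      next
    }
    { total += $2 }
    END {
      print "total allocatable NPUs: " total + 0
      exit missing
    }'
}

# Example:
#   kubectl get nodes -l mobilint.com/npu.present=true \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.mobilint\.com/npu}{"\n"}{end}' \
#     | report_npu_allocatable
```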
Using It in a Workload
Request an NPU by setting mobilint.com/npu under the Pod’s resources.limits:
apiVersion: v1
kind: Pod
metadata:
  name: npu-example
spec:
  containers:
  - name: app
    image: ubuntu:latest
    command: ["sleep", "infinity"]
    resources:
      limits:
        mobilint.com/npu: 1
Save the manifest as npu-example.yaml, apply it, and confirm the NPU device is visible inside the container:
kubectl apply -f npu-example.yaml
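kubectl exec npu-example -- sh -c 'ls -l /dev/aries*'
The command runs through sh -c so that the /dev/aries* glob is expanded inside the container. kubectl exec does not start a shell by itself, so an unquoted glob would otherwise be expanded (or passed through literally) by your local shell.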
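To request more than one NPU, change the integer under `resources.limits`. Kubernetes extended resources are requested as whole numbers, and when only a limit is set the request defaults to it. The hypothetical generator below prints a one-shot Pod that lists its injected devices and exits, so the result can be read with `kubectl logs`. The Pod shape is an illustrative variant of the manifest above, not a manifest shipped by Mobilint.

```bash
#!/usr/bin/env bash
# npu_pod_manifest NAME [COUNT]
# Hypothetical helper: prints a Pod manifest that requests COUNT ARIES NPUs,
# lists the injected /dev/aries* device nodes, and exits.
npu_pod_manifest() {
  local name="$1" count="${2:-1}"
  if [ -z "$name" ]; then
    echo "usage: npu_pod_manifest NAME [COUNT]" >&2
    return 2
  fi
  case "$count" in
    ''|*[!0-9]*|0) echo "COUNT must be a positive integer" >&2; return 2 ;;
  esac
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  restartPolicy: Never
  containers:
  - name: app
    image: ubuntu:latest
    command: ["sh", "-c", "ls -l /dev/aries*"]
    resources:
      limits:
        mobilint.com/npu: ${count}
EOF
}

# Example:
#   npu_pod_manifest npu-two 2 | kubectl apply -f -
#   kubectl logs npu-two    # after the Pod has completed
```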
Monitoring
The device plugin exposes metrics and status endpoints on port 9400 on each node.
| Endpoint | Description |
|---|---|
|  | Per-device NPU telemetry in Prometheus text format |
| /process | Details of the processes currently using the NPU (JSON) |
|  | Readiness probe (returns 200 once kubelet registration completes) |
These endpoints are always served by the device plugin Pod, whether or not Prometheus is configured. For Prometheus to scrape them, enable the scrape configuration below.
If you use Prometheus Operator, enable the Service and ServiceMonitor when installing with Helm:
helm install mobilint-device-plugin \
oci://ghcr.io/mobilint/charts/mobilint-device-plugin \
-n kube-system \
--set metrics.service.enabled=true \
--set metrics.serviceMonitor.enabled=true
Metrics
| Metric | Type | Unit | Description |
|---|---|---|---|
|  | gauge | 0/1 | Whether the monitor sample was read successfully |
|  | gauge | — | Static info (model, driver/firmware version, PCIe, etc.) exposed as labels; value is always 1 |
|  | gauge | °C | Die temperature |
|  | gauge | Hz | NPU core clock |
|  | gauge | Hz | NoC (interconnect) clock |
|  | gauge | W | Total power |
|  | gauge | A | Total current |
|  | gauge | V | Total voltage |
|  | gauge | % | Cooling fan duty |
|  | gauge | — | Number of open file descriptors on the device |
|  | gauge | bytes | Total NPU memory |
|  | gauge | bytes | Used NPU memory |
|  | gauge | 0–1 | Overall NPU utilization |
| mobilint_npu_process_count | gauge | — | Number of processes using the NPU |
|  | gauge | 0–1 | Per-core utilization (…) |
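For quick checks without a Prometheus server, the telemetry endpoint's text output can be filtered with awk. This is a minimal sketch of a Prometheus text-format filter: `prom_samples` is a hypothetical helper, the `device` label in the example is an assumption (it mirrors the `device` field of the /process output), and `<telemetry-path>` is a placeholder for the telemetry endpoint listed in the endpoint table.

```bash
#!/usr/bin/env bash
# prom_samples METRIC
# Minimal sketch: reads Prometheus text exposition format on stdin and prints
# "LABELS VALUE" for each sample of METRIC ("{}" when it has no labels).
# HELP/TYPE comment lines and blank lines are skipped.
prom_samples() {
  awk -v name="$1" '
    /^#/ || NF == 0 { next }
    {
      line = $0
      i = index(line, "{")
      if (i > 0) {
        metric = substr(line, 1, i - 1)
        match(line, /[}][^}]*$/)           # position of the last "}"
        labels = substr(line, i, RSTART - i + 1)
        rest = substr(line, RSTART + 1)
      } else {
        split(line, parts, " ")
        metric = parts[1]
        labels = "{}"
        rest = substr(line, length(metric) + 1)
      }
      if (metric != name) next
      split(rest, parts, " ")            # value, then optional timestamp
      print labels, parts[1]
    }'
}

# Example (with a port-forward to 9400 running):
#   curl -s "localhost:9400/<telemetry-path>" | prom_samples mobilint_npu_process_count
```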
Per-process Details (/process)
mobilint_npu_process_count reports only the number of processes. Per-process details, such as memory usage and utilization, are available from the /process JSON endpoint. Forward port 9400 from a device plugin Pod, then query the endpoint from a second terminal:
kubectl -n kube-system port-forward <device-plugin-pod> 9400:9400
curl localhost:9400/process
Example response:
[
{
"device": "aries0",
"processes": [
{ "pid": 420300, "memory_used_bytes": 3890802880, "utilization": 0.712 }
]
}
]
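The /process response can be condensed into one line per device with jq. This sketch assumes jq is installed and relies only on the `device`, `processes`, and `memory_used_bytes` fields shown in the example response; `npu_process_summary` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# npu_process_summary
# Hypothetical helper (requires jq): reads the /process JSON array on stdin
# and prints one tab-separated line per device: device name, number of
# processes, and total memory used by those processes in MiB (rounded down).
npu_process_summary() {
  jq -r '.[]
    | [ .device,
        (.processes | length),
        ((.processes | map(.memory_used_bytes) | add // 0) / 1048576 | floor) ]
    | @tsv'
}

# Example (while the port-forward is running):
#   curl -s localhost:9400/process | npu_process_summary
```

The `add // 0` fallback keeps idle devices (an empty `processes` list) in the output with a total of 0 instead of null.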
NFD Integration
Node Feature Discovery (NFD) can apply the mobilint.com/npu.present=true label automatically through the NodeFeatureRule that Mobilint provides.
First, install NFD:
helm repo add nfd https://kubernetes-sigs.github.io/node-feature-discovery/charts
helm install nfd nfd/node-feature-discovery \
-n node-feature-discovery --create-namespace \
--set master.extraLabelNs={mobilint.com}
Then enable NFD integration when installing the device plugin:
helm install mobilint-device-plugin \
oci://ghcr.io/mobilint/charts/mobilint-device-plugin \
-n kube-system \
--set nodeFeatureDiscovery.enabled=true
Uninstalling
If you installed with Helm:
helm uninstall mobilint-device-plugin -n kube-system
If you installed from the manifest:
kubectl delete -f https://raw.githubusercontent.com/mobilint/mobilint-device-plugin/main/deploy/daemonset.yaml
Remove the node label as well:
kubectl label node <NODE_NAME> mobilint.com/npu.present-
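The uninstall steps can be combined into one function that picks the right path for either install method and removes the label from every node that still carries it, using a label selector. This is a sketch: `uninstall_mobilint_plugin` is hypothetical, it treats "`helm status` succeeds" as "installed with Helm", and the `HELM`/`KUBECTL` overrides are illustrative hooks for a dry run.

```bash
#!/usr/bin/env bash
# uninstall_mobilint_plugin
# Hypothetical helper: uninstalls the Helm release if it exists, otherwise
# deletes the manifest-based DaemonSet, then removes the node label from all
# nodes that carry it. Set KUBECTL=echo (and HELM=true/false) for a dry run.
uninstall_mobilint_plugin() {
  local helm="${HELM:-helm}" kubectl="${KUBECTL:-kubectl}"
  local manifest=https://raw.githubusercontent.com/mobilint/mobilint-device-plugin/main/deploy/daemonset.yaml

  if "$helm" status mobilint-device-plugin -n kube-system >/dev/null 2>&1; then
    "$helm" uninstall mobilint-device-plugin -n kube-system || return 1
  else
    "$kubectl" delete -f "$manifest" --ignore-not-found || return 1
  fi

  # "key-" removes the label; -l selects every node that currently has it.
  "$kubectl" label nodes -l mobilint.com/npu.present=true mobilint.com/npu.present-
}
```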