SDK qb v1.0.0 Update Note

This update changes little about how you use SDK qb, but it is a major release with significant improvements to the internal architecture and to the SDK as a whole. Its focus is scalability, consistency, and a structural refactor that prepares for future expansion.

1. Mobilint SDK qb Naming Unification

Previously, SDK qb components went by unrelated names, which could be confusing for new users. To address this, we unified the names of the key components as follows:

Runtime library maccel → qb Runtime
Compiler qubee → qb Compiler

The unified names make the roles of the SDK qb components, and how they relate to one another, more intuitive. They also keep the user experience consistent across the documentation and future feature additions.

Installation

Due to a package naming policy change, legacy packages are no longer maintained. Please use the updated packages listed below.

I. Update APT Package Index

Before installing any packages, update the APT package index:

sudo apt update

II. Install Runtime Library

The runtime library package has been renamed from mobilint-npu-runtime to mobilint-qb-runtime.

# C++ library
sudo apt install mobilint-qb-runtime

# Python package
pip install mobilint-qb-runtime
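To confirm which Python runtime package pip has installed, a short standard-library check like the one below can help. It is a minimal sketch. The legacy pip distribution name mobilint-npu-runtime is an assumption based on the package rename above, and installed_version and check_runtime_install are hypothetical helpers, not part of SDK qb.

```python
# Minimal sketch: report which qb Runtime Python distribution (if any) is installed.
# Assumption: the legacy pip distribution was named "mobilint-npu-runtime",
# mirroring the package rename described above. Adjust if your setup differs.
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

CURRENT_DIST = "mobilint-qb-runtime"
LEGACY_DIST = "mobilint-npu-runtime"  # assumed legacy name


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def check_runtime_install() -> str:
    """Describe the install state and suggest the fix if the new package is missing."""
    current = installed_version(CURRENT_DIST)
    legacy = installed_version(LEGACY_DIST)
    if current is not None:
        msg = f"{CURRENT_DIST} {current} is installed"
        if legacy is not None:
            msg += f"; legacy {LEGACY_DIST} {legacy} is also present and is no longer maintained"
        return msg
    if legacy is not None:
        return f"only legacy {LEGACY_DIST} {legacy} found; run: pip install {CURRENT_DIST}"
    return f"{CURRENT_DIST} not found; run: pip install {CURRENT_DIST}"


if __name__ == "__main__":
    print(check_runtime_install())
```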

III. Install Driver

Under the same naming policy, the driver package has been renamed from aries-driver to mobilint-aries-driver.

sudo apt install mobilint-aries-driver

C++ Library Changes

Linker flag updated

# Previous build
g++ -o example example.cpp -lmaccel

# Updated build
g++ -o example example.cpp -lqbruntime

Header path updated

// Previous header
#include "maccel/maccel.h"

// Updated header
#include "qbruntime/qbruntime.h"
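If you have many C++ sources or build scripts to migrate, the header-path and linker-flag changes above can be applied mechanically. The sketch below uses a hypothetical helper, migrate_cpp_text, which is not part of SDK qb. It rewrites only the two patterns shown in this section, so review the result before committing.

```python
# Hypothetical migration helper (not part of SDK qb): applies the two C++ changes
# listed above to source or build-script text.
import re

# Matches #include "maccel/maccel.h" (or <...>), tolerating spaces after '#'.
_INCLUDE_RE = re.compile(r'^(\s*#\s*include\s*)(["<])maccel/maccel\.h([">])', re.MULTILINE)
# Matches the linker flag only as a whole whitespace-delimited token.
_LINK_RE = re.compile(r"(?<!\S)-lmaccel(?!\S)")


def migrate_cpp_text(text: str) -> str:
    """Rewrite the legacy header path and -lmaccel flag to their qb Runtime names."""
    text = _INCLUDE_RE.sub(r"\1\2qbruntime/qbruntime.h\3", text)
    return _LINK_RE.sub("-lqbruntime", text)


# Example use on a file (path is illustrative):
# from pathlib import Path
# p = Path("example.cpp"); p.write_text(migrate_cpp_text(p.read_text()))
```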

Python Package Changes

Module name updated

# Previous module
import maccel

# Updated module
import qbruntime
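A similar hypothetical helper can update Python imports. It assumes that only the module name changed and that the names inside the module are the same. This section implies that but does not state it outright, so check the qb Runtime API reference before relying on it. A plain import maccel becomes import qbruntime as maccel, so existing maccel.X references keep working. This keeps the first migration diff small, and you can rename the references later.

```python
# Hypothetical migration helper (not part of SDK qb).
# Assumption: apart from the module name, the public names used by existing code
# are unchanged; verify against the qb Runtime API reference.
import re

_RULES = [
    # "import maccel as m"   -> "import qbruntime as m"
    (re.compile(r"^([ \t]*)import maccel as (\w+)[ \t]*$", re.MULTILINE),
     r"\1import qbruntime as \2"),
    # "import maccel"        -> "import qbruntime as maccel" (keeps maccel.X references working)
    (re.compile(r"^([ \t]*)import maccel[ \t]*$", re.MULTILINE),
     r"\1import qbruntime as maccel"),
    # "from maccel import X" -> "from qbruntime import X"
    (re.compile(r"^([ \t]*)from maccel import ", re.MULTILINE),
     r"\1from qbruntime import "),
]


def migrate_python_imports(source: str) -> str:
    """Rewrite simple maccel import statements; other forms are left untouched."""
    for pattern, replacement in _RULES:
        source = pattern.sub(replacement, source)
    return source
```

Compound statements such as import maccel, numpy are not handled and still need a manual edit.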

2. Model Count Limit Removed

Previously, the number of models that could run concurrently was limited by the number of NPU cores. This update removes that restriction through changes to the underlying runtime design.

Models compiled with the latest qb Compiler can be loaded and executed concurrently within available DRAM, regardless of the core mode specified at compile time.

Benefits include:

More flexibility in services that run multiple models simultaneously
Ability to run models built for different core modes at the same time
Removal of core constraints that affected large models such as LLMs
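The change in the loading rule can be illustrated with a toy model. The sketch below is conceptual. The model names, footprints, core count and DRAM capacity are all hypothetical, and the real runtime's memory accounting, including any per-model overhead, is not described in this note. Under the old rule, the model count is compared against the core count. Under the new rule, only the total footprint compared against available DRAM matters.

```python
# Conceptual sketch of the old vs. new loading rule (hypothetical names and numbers).
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ModelSpec:
    name: str
    dram_mib: int  # hypothetical DRAM footprint of the loaded model


def fits_legacy(models: List[ModelSpec], num_cores: int) -> bool:
    # Old rule, modeled as: no more concurrent models than NPU cores.
    return len(models) <= num_cores


def fits_v1(models: List[ModelSpec], dram_capacity_mib: int) -> bool:
    # New rule: the only constraint modeled here is total DRAM footprint.
    return sum(m.dram_mib for m in models) <= dram_capacity_mib


def plan_loading(models: List[ModelSpec], dram_capacity_mib: int) -> Tuple[List[str], List[str]]:
    """Greedily admit models in the given order while they still fit in DRAM."""
    loaded: List[str] = []
    rejected: List[str] = []
    used = 0
    for m in models:
        if used + m.dram_mib <= dram_capacity_mib:
            loaded.append(m.name)
            used += m.dram_mib
        else:
            rejected.append(m.name)
    return loaded, rejected


models = [
    ModelSpec("llm", 9000),
    ModelSpec("detector", 1200),
    ModelSpec("classifier", 300),
    ModelSpec("segmenter", 800),
    ModelSpec("pose", 600),
]

print(fits_legacy(models, num_cores=4))          # False: 5 models, 4 cores
print(fits_v1(models, dram_capacity_mib=16384))  # True: 11900 MiB <= 16384 MiB
print(plan_loading(models, dram_capacity_mib=10000))
# (['llm', 'classifier', 'pose'], ['detector', 'segmenter'])
```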

This change comes from internal runtime optimizations, so any model compiled in the MXQv7 format benefits from it without code changes.

3. Multithreading Performance Improvements

With this update, the C++ library provides .setActivationSlots(int num) and the Python API provides .set_activation_slots(num). These give you finer control over how NPU inference is pipelined with data transfer.

These functions set the number of input slots for a model. More slots consume more NPU memory but allow deeper pipelining, which improves performance in multithreaded workloads.

NOTE: For models that use a cache (e.g., LLMs), the activation slot count is currently limited to 1.
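A simple two-stage pipeline model shows why extra slots help. This is a conceptual simulation, not the qb Runtime scheduler, and it rests on three assumptions:

* Each request first transfers its input (transfer_ms) and then runs inference (infer_ms).
* Each stage handles one request at a time, in order.
* A request holds one activation slot from the start of its transfer until its inference finishes.

The makespan function and its parameters are hypothetical. In real code, the slot count is set with .set_activation_slots(num), as described above.

```python
# Conceptual simulation (not the qb Runtime implementation) of how the number of
# activation slots bounds overlap between input transfer and NPU inference.
from typing import List


def makespan(num_requests: int, transfer_ms: float, infer_ms: float, slots: int) -> float:
    """Time until the last request finishes inference, under the stated assumptions."""
    if slots < 1:
        raise ValueError("slots must be >= 1")
    transfer_end: List[float] = []
    infer_end: List[float] = []
    for i in range(num_requests):
        prev_transfer = transfer_end[i - 1] if i >= 1 else 0.0
        # The slot this request needs is released when request i - slots finishes inference.
        slot_free = infer_end[i - slots] if i >= slots else 0.0
        t_end = max(prev_transfer, slot_free) + transfer_ms
        # Inference starts once the input is on the NPU and the previous inference is done.
        prev_infer = infer_end[i - 1] if i >= 1 else 0.0
        i_end = max(t_end, prev_infer) + infer_ms
        transfer_end.append(t_end)
        infer_end.append(i_end)
    return infer_end[-1] if infer_end else 0.0


for k in (1, 2, 3):
    print(k, makespan(num_requests=4, transfer_ms=2.0, infer_ms=3.0, slots=k))
# 1 20.0
# 2 14.0
# 3 14.0
```

With one slot, transfer and inference never overlap (4 x (2 + 3) = 20 ms). With two slots, the next transfer overlaps the current inference, and the run becomes bound by the slower stage. A third slot adds nothing in this example because inference is already the bottleneck. In this model, raising the slot count pays off only up to a point, while memory use keeps growing.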

4. uint8 Inference Support

This update officially supports uint8 integer inference:

uint8-quantized models can be compiled with qb Compiler
qb Runtime can run inference on these models

For models that take uint8 inputs, this reduces CPU overhead during preprocessing.
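One plausible source of the saving is that a float model needs every input element converted and normalized on the host, while a uint8 model can take the raw bytes. This is our assumption, not something this note states. The sketch below compares the two paths using only the standard library. The normalization constants and helper names are hypothetical, and the exact input format of a uint8-compiled model is determined at compile time.

```python
# Conceptual comparison of host-side preprocessing (hypothetical constants and helpers).
from array import array

MEAN, STD = 127.5, 127.5  # hypothetical normalization constants


def preprocess_float32(pixels: bytes) -> array:
    # Float path: per-element conversion and arithmetic, 4 bytes per element.
    return array("f", ((p - MEAN) / STD for p in pixels))


def preprocess_uint8(pixels: bytes) -> array:
    # uint8 path: raw bytes passed through unchanged, 1 byte per element.
    return array("B", pixels)


image = bytes(range(256)) * 12  # stand-in for a tiny 3072-byte (e.g. 32x32x3) image
f32 = preprocess_float32(image)
u8 = preprocess_uint8(image)
print(len(f32) * f32.itemsize, len(u8) * u8.itemsize)  # 12288 3072
```

The uint8 path skips the per-element arithmetic, and its buffer is a quarter of the size of the float32 buffer.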

Summary

This major update focuses on internal scalability, consistency, and performance optimizations rather than on user-facing interface or usage changes.

SDK qb will continue to evolve with ongoing performance and feature improvements.