Release Notes

v1.1.0

Release date: March 23, 2026

qb Runtime v1.1.0 brings automatic core mode selection, data type query APIs, and performance optimizations.

Highlights

CoreMode::Auto

The runtime can now select the core mode for your model automatically. Setting CoreMode::Auto in your ModelConfig lets the runtime detect the core mode available in the MXQ and apply it. Previously, non-default core modes such as Multi, Global4, and Global8 required constructing a ModelConfig manually. Because the default constructor also uses Auto mode, most cases need no additional configuration.

Note

If the MXQ was compiled with a flag like scheme="all" that produces multiple core modes, you must still select the core mode manually as before.

See also

For more details, see setAutoCoreMode().
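The selection rule described in this section can be pictured with a small stand-alone Python sketch. It does not call qb Runtime: `resolve_core_mode` and its arguments are hypothetical names used only for illustration, and raising an error when several modes are present is an assumption standing in for "select the core mode manually".

```python
from typing import Optional, Sequence


def resolve_core_mode(compiled_modes: Sequence[str],
                      requested: Optional[str] = None) -> str:
    """Illustrative model of Auto selection; not part of qb Runtime."""
    if not compiled_modes:
        raise ValueError("the MXQ lists no core modes")
    if requested is not None:
        # Manual selection: the requested mode must exist in the MXQ.
        if requested not in compiled_modes:
            raise ValueError(f"{requested!r} is not available in this MXQ")
        return requested
    if len(compiled_modes) == 1:
        # Auto: exactly one compiled mode, so there is nothing to choose.
        return compiled_modes[0]
    # Assumption: with several compiled modes (e.g. scheme="all"),
    # the caller has to pick one explicitly.
    raise ValueError("multiple core modes available; select one manually")
```

For example, `resolve_core_mode(["Global4"])` returns `"Global4"`, while an MXQ listing both `"Multi"` and `"Global8"` needs an explicit `requested` value.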

New APIs

REGULUS Dynamic Allocation

The dynamic allocation approach introduced in v1.0.0 has been applied to REGULUS as well, ensuring a consistent usage pattern.

Performance Improvements

  • Improved data transfer performance to NPU devices on Windows.

  • Optimized internal type conversion.

Bug Fixes

  • Resolved a compile error caused by std::filesystem on GCC versions below 9.

  • Fixed an intermittent deadlock in certain models.

Breaking Changes

  • The supported REGULUS driver revision number changes from REV0 to REV1.

For the complete changelog, see the Changelog page.


v1.0.0 - Major Release

Release date: January 31, 2026

This update includes significant improvements across the internal architecture and the SDK qb as a whole. We focused on scalability, consistency, and a structural refactor for future expansion.

Highlights

SDK qb Naming Unification

Previously, different components used different names, which could be confusing for users new to the SDK qb. To address this, we unified the names of key SDK qb components as follows:

  • Runtime library maccel → qb Runtime

  • Compiler qubee → qb Compiler

This naming unification makes the roles and relationships between SDK qb components more intuitive and enables a more consistent user experience in documentation and future feature expansions.

Model Count Limit Removed

Previously, the number of models that could run concurrently was limited by the number of NPU cores. This update removes that restriction by improving the underlying design.

  • Models compiled with the latest qb Compiler can be loaded and executed concurrently within available DRAM, regardless of the core mode specified at compile time.

Benefits include:

  • More flexibility in services that run multiple models simultaneously

  • Ability to run models built for different core modes at the same time

  • Removal of core constraints that affected large models such as LLMs

This change is based on internal runtime optimizations. For users, any model compiled as MXQv7 can take advantage of it without code changes.

Multithreading Performance Improvements

With this update, the C++ library provides .setActivationSlots(int num) and the Python API provides .set_activation_slots(num), which give you finer control over pipelining between NPU inference and data transfer.

These functions allow you to control the number of input slots for a model. Using more slots increases NPU memory usage, but enables more effective pipelining and improves performance in multithreaded workloads.

NOTE: For models that use cache (e.g., LLMs), the activation slot count is currently limited to 1.
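To see why extra slots help, consider a simplified timing model of a two-stage pipeline (data transfer, then inference) in which each in-flight input occupies one slot until its inference finishes. The sketch below is a stand-alone illustration, not qb Runtime code: `pipeline_makespan` and the timing values are hypothetical, and the actual scheduler may behave differently.

```python
def pipeline_makespan(num_items: int, transfer: float, compute: float,
                      slots: int) -> float:
    """Total time to process num_items through transfer -> compute.

    Assumes one transfer engine, one compute engine, and that an item
    holds its slot from the start of transfer to the end of compute.
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    compute_done = []      # finish time of each item's compute stage
    transfer_free = 0.0    # time at which the transfer engine is idle
    compute_free = 0.0     # time at which the compute engine is idle
    for i in range(num_items):
        # A slot frees up when the item `slots` positions earlier finishes.
        slot_ready = compute_done[i - slots] if i >= slots else 0.0
        transfer_end = max(transfer_free, slot_ready) + transfer
        transfer_free = transfer_end
        compute_end = max(compute_free, transfer_end) + compute
        compute_free = compute_end
        compute_done.append(compute_end)
    return compute_done[-1] if compute_done else 0.0


one_slot = pipeline_makespan(4, transfer=1, compute=2, slots=1)
two_slots = pipeline_makespan(4, transfer=1, compute=2, slots=2)
print(one_slot, two_slots)
```

With these made-up timings, one slot serializes transfer and inference (12 time units for four inputs), while two slots let the next transfer overlap the current inference (9 time units).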

uint8 Inference Support

This update officially supports uint8 integer inference.

  • uint8 quantized models can be compiled with qb Compiler

  • qb Runtime supports inference execution for these models

This reduces CPU overhead during preprocessing for models that take uint8 inputs.
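As a rough illustration of where the saving can come from: a float-input model typically needs each pixel converted (and often scaled) on the CPU, while a uint8-input model can take the raw bytes as they are. The stand-alone sketch below uses only the Python standard library and does not call qb Runtime; the pixel values and the 1/255 scaling are example assumptions, and the preprocessing a given model needs depends on how it was trained and compiled.

```python
from array import array

# Hypothetical raw image data: 8-bit pixel values as produced by a decoder.
raw_pixels = bytes([0, 64, 128, 255])

# Float path: each pixel is converted and scaled on the CPU,
# and the buffer grows to 4 bytes per value.
float_input = array("f", (p / 255.0 for p in raw_pixels))

# uint8 path: the raw bytes are used as-is, with no per-pixel arithmetic.
uint8_input = array("B", raw_pixels)

float_bytes = len(float_input) * float_input.itemsize
uint8_bytes = len(uint8_input) * uint8_input.itemsize
print(f"float32 buffer: {float_bytes} bytes, uint8 buffer: {uint8_bytes} bytes")
```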

Migration Guide

Due to the naming unification, packages, headers, and module names have changed. Legacy packages are no longer maintained.

Installation

I. Update APT Package Index

Before installing any packages, update the APT package index:

sudo apt update

II. Install Runtime Library

The runtime library package name has changed from mobilint-npu-runtime to mobilint-qb-runtime.

# C++ library
sudo apt install mobilint-qb-runtime

# Python package
pip install mobilint-qb-runtime

III. Install Driver

The driver package name has also changed under the new naming policy, from aries-driver to mobilint-aries-driver.

sudo apt install mobilint-aries-driver

C++ Library Changes

  • Compilation/linking flags updated

    # Previous build
    g++ -o example example.cpp -lmaccel
    
    # Updated build
    g++ -o example example.cpp -lqbruntime
    
  • Header path updated

    // Previous header
    #include "maccel/maccel.h"

    // Updated header
    #include "qbruntime/qbruntime.h"
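For larger codebases, the two renames listed in this section can be applied mechanically. The following is a minimal Python sketch that rewrites only the header path and linker flag named in this guide; `migrate_text` is a hypothetical helper, and any other identifiers that may have changed are not covered here and would still need manual review.

```python
import re

# Renames taken from this guide: header path and linker flag.
_RULES = [
    (re.compile(r'(#\s*include\s*[<"])maccel/maccel\.h([>"])'),
     r"\1qbruntime/qbruntime.h\2"),
    (re.compile(r"(?<![\w-])-lmaccel\b"), "-lqbruntime"),
]


def migrate_text(text: str) -> str:
    """Return text with the legacy header path and linker flag replaced."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text


print(migrate_text('#include "maccel/maccel.h"'))
print(migrate_text("g++ -o example example.cpp -lmaccel"))
```

Applying `migrate_text` to the contents of each source file and build script, then reviewing the resulting diff, keeps the change easy to audit.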
    

Python Package Changes

  • Module name updated

    # Previous module
    import maccel
    
    # Updated module
    import qbruntime
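During a staged migration, it can help to check which of the two module names is importable on a given machine. The following is a minimal sketch using only the standard library; `first_available_module` is a hypothetical helper. Whether the two modules expose identical APIs is not stated here, so the legacy name is used purely for detection.

```python
import importlib.util
from typing import Iterable, Optional


def first_available_module(names: Iterable[str]) -> Optional[str]:
    """Return the first importable module name from names, or None."""
    for name in names:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


# Prefer the new module name and fall back to the legacy one.
runtime_name = first_available_module(("qbruntime", "maccel"))
if runtime_name is None:
    print("No runtime module found; install mobilint-qb-runtime.")
else:
    print(f"Runtime module available: {runtime_name}")
```

Since legacy packages are no longer maintained, finding only `maccel` is a signal to upgrade rather than a configuration to keep.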