C++ API Reference

SDK qb Runtime Library: C++ API Reference
SDK qb Runtime Library v1.0
MCS001-

The C++ API provides core functionality for the NPU.

Classes

class  mobilint::Accelerator
 Represents an accelerator, i.e., an NPU, used for executing models.
class  mobilint::FutureImpl< T >
class  mobilint::Future< T >
 Represents a future for retrieving the result of asynchronous inference.
class  mobilint::Model
 Represents an AI model loaded from an MXQ file.
class  mobilint::ModelVariantHandle
 Handle to a specific variant of a loaded model.
class  mobilint::NDArray< T >
 A class representing an N-dimensional array (NDArray).
struct  mobilint::Scale
 Struct for scale values.
struct  mobilint::CoreId
 Represents a unique identifier for an NPU core.
struct  mobilint::Buffer
 A simple byte-sized buffer.
struct  mobilint::BufferInfo
 Struct representing input/output buffer information.
class  mobilint::ModelConfig
 Configures a core mode and core allocation of a model for NPU inference.
struct  mobilint::CacheInfo
 Struct representing KV-cache information.

Enumerations

enum class  mobilint::StatusCode {
  StatusCode::OK = 0 ,
  StatusCode::InternalError = 23 ,
  StatusCode::NotImplemented = 18 ,
  StatusCode::BadAlloc = 39 ,
  StatusCode::Acc_CoreAlreadyInUse = 11 ,
  StatusCode::Acc_NPUTimeout = 42 ,
  StatusCode::Acc_NoIMemInitFound = 43 ,
  StatusCode::Acc_NoSuchModel = 12 ,
  StatusCode::Acc_TaskQueueNotFound = 28 ,
  StatusCode::Driver_FailedToAllocateHostMemory = 13 ,
  StatusCode::Driver_FailedToAllocateModelMemory = 44 ,
  StatusCode::Driver_FailedToBuildCmaIoReq = 45 ,
  StatusCode::Driver_FailedToClaimCores = 24 ,
  StatusCode::Driver_FailedToFreeModelMemory = 46 ,
  StatusCode::Driver_FailedToLockCore = 49 ,
  StatusCode::Driver_FailedToPostInfer = 33 ,
  StatusCode::Driver_FailedToReadMemoryBuffer = 15 ,
  StatusCode::Driver_FailedToUnclaimCores = 25 ,
  StatusCode::Driver_FailedToUnlockCore = 50 ,
  StatusCode::Driver_FailedToWaitDone = 34 ,
  StatusCode::Driver_FailedToWriteMemoryBuffer = 14 ,
  StatusCode::Driver_NotInitialized = 1 ,
  StatusCode::Driver_WaitDoneTimeout = 35 ,
  StatusCode::Driver_WrongBaseAddress = 47 ,
  StatusCode::MemoryPool_AllocatorNotSet = 30 ,
  StatusCode::MemoryPool_BufNotFound = 29 ,
  StatusCode::Model_AlreadyLaunched = 65 ,
  StatusCode::Model_AsyncPipelineCheckFailed = 55 ,
  StatusCode::Model_AsyncPipelineNotAlive = 56 ,
  StatusCode::Model_AsyncPipelineTimeout = 57 ,
  StatusCode::Model_BrokenMXQ = 20 ,
  StatusCode::Model_BufferSizeMismatched = 63 ,
  StatusCode::Model_CacheOverflow = 53 ,
  StatusCode::Model_DtypeMismatched = 22 ,
  StatusCode::Model_FailedToAllocMemory = 32 ,
  StatusCode::Model_FailedToFindDirectory = 61 ,
  StatusCode::Model_FailedToLoadMXQ = 3 ,
  StatusCode::Model_FailedToOpenCacheFile = 62 ,
  StatusCode::Model_FailedToOpenModelDescFile = 19 ,
  StatusCode::Model_FailedToOpenOutputScale = 6 ,
  StatusCode::Model_FailedToOpenScaleFile = 4 ,
  StatusCode::Model_FailedToOpenSectionBinary = 8 ,
  StatusCode::Model_FailedToSaveTensor = 37 ,
  StatusCode::Model_NumCacheMismatched = 60 ,
  StatusCode::Model_InvalidNPUDtype = 51 ,
  StatusCode::Model_InvalidOutputScaleValue = 7 ,
  StatusCode::Model_InvalidRmemType = 52 ,
  StatusCode::Model_InvalidScaleValue = 5 ,
  StatusCode::Model_InvalidSupplementary = 59 ,
  StatusCode::Model_InvalidVariantIdx = 64 ,
  StatusCode::Model_IsNotPacked = 27 ,
  StatusCode::Model_IsNotSupportedHardware = 48 ,
  StatusCode::Model_IsPacked = 26 ,
  StatusCode::Model_MXQAndModelConfigNotMatch = 16 ,
  StatusCode::Model_NoCache = 54 ,
  StatusCode::Model_NoGlobalCoreWithGlobalMultiMode = 17 ,
  StatusCode::Model_NoTargetCores = 2 ,
  StatusCode::Model_NotAlive = 10 ,
  StatusCode::Model_NotLaunched = 9 ,
  StatusCode::Model_PredictError = 36 ,
  StatusCode::Model_ShapeMismatched = 21 ,
  StatusCode::Model_TaskQueueClosed = 40 ,
  StatusCode::Model_TaskQueueTimeout = 41 ,
  StatusCode::Model_UnexpectedMemoryFormat = 38 ,
  StatusCode::Future_NotValid = 58
}
 Enumerates status codes for the qbruntime.
enum class  mobilint::Cluster : int32_t {
  Cluster::Cluster0 = 1 << 16 ,
  Cluster::Cluster1 = 2 << 16 ,
  Cluster::Error = 0x7FFF'0000
}
 Enumerates clusters in the ARIES NPU.
enum class  mobilint::Core : int32_t {
  Core::Core0 = 1 ,
  Core::Core1 = 2 ,
  Core::Core2 = 3 ,
  Core::Core3 = 4 ,
  Core::All = 0x0000'FFFC ,
  Core::GlobalCore = 0x0000'FFFE ,
  Core::Error = 0x0000'FFFF
}
 Enumerates cores within a cluster in the ARIES NPU.
enum class  mobilint::CoreAllocationPolicy {
  CoreAllocationPolicy::Auto ,
  CoreAllocationPolicy::Manual
}
 Core allocation policy.
enum class  mobilint::CoreMode : uint8_t {
  CoreMode::Single = 0 ,
  CoreMode::Multi = 1 ,
  CoreMode::Global = 2 ,
  CoreMode::Global4 = 3 ,
  CoreMode::Global8 = 4 ,
  CoreMode::Error = 0xF
}
 Defines the core mode for NPU execution.
enum class  mobilint::LogLevel : char {
  DEBUG = 1 ,
  INFO = 2 ,
  WARN = 3 ,
  ERR = 4 ,
  FATAL = 5 ,
  OFF = 6
}
 LogLevel.
enum class  mobilint::CacheType : uint8_t {
  Default = 0 ,
  Batch ,
  Error = 0x0F
}
 CacheType.

Functions

 mobilint::ModelVariantHandle::ModelVariantHandle (const ModelVariantHandle &other)=delete
 mobilint::ModelVariantHandle::ModelVariantHandle (ModelVariantHandle &&other)=delete
ModelVariantHandle & mobilint::ModelVariantHandle::operator= (const ModelVariantHandle &rhs)=delete
ModelVariantHandle & mobilint::ModelVariantHandle::operator= (ModelVariantHandle &&rhs) noexcept=delete
int mobilint::ModelVariantHandle::getVariantIdx () const
 Returns the index of this model variant.
const std::vector< std::vector< int64_t > > & mobilint::ModelVariantHandle::getModelInputShape () const
 Returns the input shape for this model variant.
const std::vector< std::vector< int64_t > > & mobilint::ModelVariantHandle::getModelOutputShape () const
 Returns the output shape for this model variant.
const std::vector< BufferInfo > & mobilint::ModelVariantHandle::getInputBufferInfo () const
 Returns the input buffer information for this variant.
const std::vector< BufferInfo > & mobilint::ModelVariantHandle::getOutputBufferInfo () const
 Returns the output buffer information for this variant.
std::vector< Scale > mobilint::ModelVariantHandle::getInputScale () const
 Returns the input quantization scale(s) for this variant.
std::vector< Scale > mobilint::ModelVariantHandle::getOutputScale () const
 Returns the output quantization scale(s) for this variant.
const std::string mobilint::statusCodeToString (const StatusCode status_code)
 Convert StatusCode into string.
bool mobilint::operator! (StatusCode sc)
 Checks whether the given StatusCode represents an error.
QBRUNTIME_EXPORT std::string mobilint::getQbRuntimeVersion ()
 Retrieves the version of the qbruntime.
QBRUNTIME_EXPORT std::string mobilint::getQbRuntimeGitVersion ()
 Retrieves the Git commit hash of the qbruntime.
QBRUNTIME_EXPORT std::string mobilint::getQbRuntimeVendor ()
 Retrieves the vendor name of the qbruntime.
QBRUNTIME_EXPORT std::string mobilint::getQbRuntimeProduct ()
 Retrieves product information of the qbruntime.
QBRUNTIME_EXPORT void mobilint::setLogLevel (LogLevel level)
QBRUNTIME_EXPORT bool mobilint::startTracingEvents (const char *path)
 Starts event tracing and prepares to save the trace log to a specified file.
QBRUNTIME_EXPORT void mobilint::stopTracingEvents ()
 Stops event tracing and writes the recorded trace log.
QBRUNTIME_EXPORT std::string mobilint::getModelSummary (const std::string &mxq_path)
 Generates a structured summary of the specified MXQ model.

Friends

class mobilint::ModelVariantHandle::ModelImpl

Buffer Management APIs

These APIs are used when performing inference with Model::inferBuffer or Model::inferBufferToFloat, using this variant’s input and output shapes.

Buffers are acquired using:

  • acquireInputBuffer
  • acquireOutputBuffer

Any acquired buffer must be released using:

  • releaseBuffer
  • releaseBuffers

Repositioning is handled by:

  • repositionInputs
  • repositionOutputs
Note
These APIs are intended for advanced use and follow the same buffer management interface as the Model class.
std::vector< Buffer > mobilint::ModelVariantHandle::acquireInputBuffer (const std::vector< std::vector< int > > &seqlens={}) const
std::vector< Buffer > mobilint::ModelVariantHandle::acquireOutputBuffer (const std::vector< std::vector< int > > &seqlens={}) const
std::vector< std::vector< Buffer > > mobilint::ModelVariantHandle::acquireInputBuffers (int batch_size, const std::vector< std::vector< int > > &seqlens={}) const
std::vector< std::vector< Buffer > > mobilint::ModelVariantHandle::acquireOutputBuffers (int batch_size, const std::vector< std::vector< int > > &seqlens={}) const
StatusCode mobilint::ModelVariantHandle::releaseBuffer (std::vector< Buffer > &buffer) const
StatusCode mobilint::ModelVariantHandle::releaseBuffers (std::vector< std::vector< Buffer > > &buffers) const
StatusCode mobilint::ModelVariantHandle::repositionInputs (const std::vector< float * > &input, std::vector< Buffer > &input_buf, const std::vector< std::vector< int > > &seqlens={}) const
StatusCode mobilint::ModelVariantHandle::repositionOutputs (const std::vector< Buffer > &output_buf, std::vector< float * > &output, const std::vector< std::vector< int > > &seqlens={}) const
StatusCode mobilint::ModelVariantHandle::repositionOutputs (const std::vector< Buffer > &output_buf, std::vector< std::vector< float > > &output, const std::vector< std::vector< int > > &seqlens={}) const
StatusCode mobilint::ModelVariantHandle::repositionInputs (const std::vector< uint8_t * > &input, std::vector< Buffer > &input_buf, const std::vector< std::vector< int > > &seqlens={}) const
StatusCode mobilint::ModelVariantHandle::repositionInputs (const std::vector< float * > &input, std::vector< std::vector< Buffer > > &input_buf, const std::vector< std::vector< int > > &seqlens={}) const
StatusCode mobilint::ModelVariantHandle::repositionOutputs (const std::vector< std::vector< Buffer > > &output_buf, std::vector< float * > &output, const std::vector< std::vector< int > > &seqlens={}) const
StatusCode mobilint::ModelVariantHandle::repositionOutputs (const std::vector< std::vector< Buffer > > &output_buf, std::vector< std::vector< float > > &output, const std::vector< std::vector< int > > &seqlens={}) const
StatusCode mobilint::ModelVariantHandle::repositionInputs (const std::vector< uint8_t * > &input, std::vector< std::vector< Buffer > > &input_buf, const std::vector< std::vector< int > > &seqlens={}) const
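
The acquire/release pairing described above lends itself to a scope guard, so that an acquired buffer is released even on an early return. The following is a minimal sketch, not SDK code: `demo::FakeVariant`, `demo::Buffer`, and `demo::InputBufferGuard` are hypothetical stand-ins that only imitate the shape of `acquireInputBuffer` and `releaseBuffer` listed above, so the example compiles without the runtime. With the real library, the same guard could presumably be instantiated with `mobilint::ModelVariantHandle`, since `acquireInputBuffer` has a defaulted `seqlens` argument, but that has not been verified here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

namespace demo {

// Stand-in status type; only the success value is needed here.
enum class StatusCode { OK = 0 };

// Stand-in buffer; the real mobilint::Buffer layout is not shown in this section.
struct Buffer {
    std::vector<std::uint8_t> bytes;
};

// Stand-in for a variant handle. It counts outstanding acquisitions
// so the guard's behavior can be checked.
class FakeVariant {
public:
    std::vector<Buffer> acquireInputBuffer() const {
        ++live_;
        return std::vector<Buffer>(1);
    }
    StatusCode releaseBuffer(std::vector<Buffer>& buffer) const {
        buffer.clear();
        --live_;
        return StatusCode::OK;
    }
    int live() const { return live_; }

private:
    mutable int live_ = 0;
};

// Hypothetical guard: acquires in the constructor, releases in the destructor.
// The status returned by releaseBuffer is ignored because a destructor
// has no way to report it.
template <typename Handle>
class InputBufferGuard {
public:
    explicit InputBufferGuard(const Handle& handle)
        : handle_(handle), buffers_(handle.acquireInputBuffer()) {}
    ~InputBufferGuard() { handle_.releaseBuffer(buffers_); }

    InputBufferGuard(const InputBufferGuard&) = delete;
    InputBufferGuard& operator=(const InputBufferGuard&) = delete;

    std::vector<Buffer>& get() { return buffers_; }

private:
    const Handle& handle_;
    std::vector<Buffer> buffers_;
};

}  // namespace demo
```

A guard of this kind is declared in the scope that performs inference; when the scope ends, the buffers go back to the handle that issued them.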

Detailed Description

The C++ API provides core functionality for the NPU.

Enumeration Type Documentation

◆ StatusCode

enum class mobilint::StatusCode

Enumerates status codes for the qbruntime.

This enumeration defines return codes used in the qbruntime to indicate success or specific error conditions. A value of StatusCode::OK (0) represents success, while any other value indicates a particular error type.

Enumerator
OK 0 

OK (success)

InternalError 23 

Internal error: a code path that should be unreachable was reached

NotImplemented 18 

Not implemented

BadAlloc 39 

Bad allocation

Acc_CoreAlreadyInUse 11 

Core already in use.

Acc_NPUTimeout 42 

NPU timeout

Acc_NoIMemInitFound 43 

No imem initialization found

Acc_NoSuchModel 12 

No such model

Acc_TaskQueueNotFound 28 

Task queue not found

Driver_FailedToAllocateHostMemory 13 

Failed to allocate host memory

Driver_FailedToAllocateModelMemory 44 

Failed to allocate model memory

Driver_FailedToBuildCmaIoReq 45 

Failed to build CMA IO request

Driver_FailedToClaimCores 24 

Failed to claim cores

Driver_FailedToFreeModelMemory 46 

Failed to free model memory

Driver_FailedToLockCore 49 

Failed to lock core

Driver_FailedToPostInfer 33 

Failed to post infer

Driver_FailedToReadMemoryBuffer 15 

Failed to read memory buffer

Driver_FailedToUnclaimCores 25 

Failed to unclaim cores

Driver_FailedToUnlockCore 50 

Failed to unlock core

Driver_FailedToWaitDone 34 

Failed to wait done

Driver_FailedToWriteMemoryBuffer 14 

Failed to write memory buffer

Driver_NotInitialized 1 

Driver not initialized

Driver_WaitDoneTimeout 35 

Wait done timeout

Driver_WrongBaseAddress 47 

Wrong base address

MemoryPool_AllocatorNotSet 30 

Allocator not set

MemoryPool_BufNotFound 29 

Buffer not found

Model_AlreadyLaunched 65 

Model has been launched already

Model_AsyncPipelineCheckFailed 55 

Asynchronous pipeline check failed

Model_AsyncPipelineNotAlive 56 

Asynchronous pipeline not alive

Model_AsyncPipelineTimeout 57 

Asynchronous pipeline timeout

Model_BrokenMXQ 20 

Broken MXQ

Model_BufferSizeMismatched 63 

Size of buffer is mismatched

Model_CacheOverflow 53 

KV-cache overflow

Model_DtypeMismatched 22 

dtype mismatched

Model_FailedToAllocMemory 32 

Failed to allocate memory

Model_FailedToFindDirectory 61 

Failed to find specified directory

Model_FailedToLoadMXQ 3 

Failed to load MXQ

Model_FailedToOpenCacheFile 62 

Failed to open cache file

Model_FailedToOpenModelDescFile 19 

Failed to open model description file

Model_FailedToOpenOutputScale 6 

Deprecated. The scale file no longer exists.

Model_FailedToOpenScaleFile 4 

Deprecated. The scale file no longer exists.

Model_FailedToOpenSectionBinary 8 

Failed to open section binary

Model_FailedToSaveTensor 37 

Failed to save tensor

Model_NumCacheMismatched 60 

Size of KV-cache mismatched

Model_InvalidNPUDtype 51 

Invalid NPU dtype

Model_InvalidOutputScaleValue 7 

Deprecated. The scale file no longer exists.

Model_InvalidRmemType 52 

Invalid RMEM type

Model_InvalidScaleValue 5 

Invalid scale value

Model_InvalidSupplementary 59 

Invalid supplementary inference

Model_InvalidVariantIdx 64 

Invalid model variant idx

Model_IsNotPacked 27 

Deprecated. The packed logic no longer exists.

Model_IsNotSupportedHardware 48 

Hardware is not supported

Model_IsPacked 26 

Deprecated. The packed logic no longer exists.

Model_MXQAndModelConfigNotMatch 16 

MXQ and model config do not match

Model_NoCache 54 

Model does not support KV-cache

Model_NoGlobalCoreWithGlobalMultiMode 17 

No global core with global multi mode

Model_NoTargetCores 2 

No target cores

Model_NotAlive 10 

Not alive

Model_NotLaunched 9 

Not launched

Model_PredictError 36 

Predict error

Model_ShapeMismatched 21 

Shape mismatched

Model_TaskQueueClosed 40 

Task queue closed

Model_TaskQueueTimeout 41 

Task queue timeout

Model_UnexpectedMemoryFormat 38 

Deprecated

Future_NotValid 58 

Future is not valid

Definition at line 26 of file status_code.h.

◆ Cluster

enum class mobilint::Cluster : int32_t

Enumerates clusters in the ARIES NPU.

Note
The ARIES NPU consists of two clusters, each containing one global core and four local cores, totaling eight local cores. REGULUS has only a single cluster (Cluster0) with one local core (Core0).
Enumerator
Cluster0 1 << 16 

Cluster 0

Cluster1 2 << 16 

Cluster 1

Error 0x7FFF'0000 

Represents an invalid or uninitialized state.

Definition at line 64 of file type.h.

◆ Core

enum class mobilint::Core : int32_t

Enumerates cores within a cluster in the ARIES NPU.

Note
The ARIES NPU consists of two clusters, each containing one global core and four local cores, totaling eight local cores. REGULUS has only a single cluster (Cluster0) with one local core (Core0).
Enumerator
Core0 1 

Local core 0

Core1 2 

Local core 1

Core2 3 

Local core 2

Core3 4 

Local core 3

All 0x0000'FFFC 

Deprecated

GlobalCore 0x0000'FFFE 

Global core

Error 0x0000'FFFF 

Represents an invalid or uninitialized state.

Definition at line 77 of file type.h.
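
The listed values suggest that `Cluster` values occupy the upper 16 bits of an `int32_t` and `Core` values the lower 16 bits. The sketch below checks that observation with stand-in enums that mirror the values from this page. It is an illustration only: the masks and the `pack` helper are hypothetical, and this section does not state whether the runtime ever combines the two values into one word.

```cpp
#include <cassert>
#include <cstdint>

namespace demo {

// Stand-ins mirroring the values listed for mobilint::Cluster and mobilint::Core.
enum class Cluster : std::int32_t {
    Cluster0 = 1 << 16,
    Cluster1 = 2 << 16,
    Error = 0x7FFF0000
};

enum class Core : std::int32_t {
    Core0 = 1,
    Core1 = 2,
    Core2 = 3,
    Core3 = 4,
    All = 0x0000FFFC,
    GlobalCore = 0x0000FFFE,
    Error = 0x0000FFFF
};

// Hypothetical masks for the two bit fields.
constexpr std::int32_t kCoreMask = 0x0000FFFF;     // bits 0..15
constexpr std::int32_t kClusterMask = 0x7FFF0000;  // bits 16..30

// Hypothetical helper: combine a cluster and a core into one 32-bit word.
constexpr std::int32_t pack(Cluster cluster, Core core) {
    return static_cast<std::int32_t>(cluster) | static_cast<std::int32_t>(core);
}

// Cluster values never touch the core field, and vice versa.
static_assert((static_cast<std::int32_t>(Cluster::Cluster1) & kCoreMask) == 0,
              "cluster bits stay out of the core field");
static_assert((static_cast<std::int32_t>(Core::Error) & kClusterMask) == 0,
              "core bits stay out of the cluster field");
static_assert(pack(Cluster::Cluster0, Core::Core0) == 0x00010001,
              "Cluster0 with Core0");

}  // namespace demo
```

Because the two fields do not overlap, either half of such a word can be recovered with the corresponding mask.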

◆ CoreAllocationPolicy

enum class mobilint::CoreAllocationPolicy

Core allocation policy.

Enumerator
Auto 

Auto

Manual 

Manual

Definition at line 90 of file type.h.

◆ CoreMode

enum class mobilint::CoreMode : uint8_t

Defines the core mode for NPU execution.

Supported core modes include single-core, multi-core, global4-core, and global8-core. For detailed explanations of each mode, refer to the ModelConfig class, which configures the core mode and core allocation of a model.

Enumerator
Single 0 

Single-core mode

Multi 1 

Multi-core mode

Global 2 

Deprecated

Global4 3 

Global4-core mode

Global8 4 

Global8-core mode

Error 0xF 

Represents an invalid or uninitialized state.

Definition at line 169 of file type.h.

◆ LogLevel

enum class mobilint::LogLevel : char

LogLevel.

Definition at line 464 of file type.h.

◆ CacheType

enum class mobilint::CacheType : uint8_t

CacheType.

Definition at line 476 of file type.h.

Function Documentation

◆ getVariantIdx()

int mobilint::ModelVariantHandle::getVariantIdx ( ) const

Returns the index of this model variant.

Returns
Index of the model variant.

◆ getModelInputShape()

const std::vector< std::vector< int64_t > > & mobilint::ModelVariantHandle::getModelInputShape ( ) const

Returns the input shape for this model variant.

Returns
Reference to the input shape.

◆ getModelOutputShape()

const std::vector< std::vector< int64_t > > & mobilint::ModelVariantHandle::getModelOutputShape ( ) const

Returns the output shape for this model variant.

Returns
Reference to the output shape.

◆ getInputBufferInfo()

const std::vector< BufferInfo > & mobilint::ModelVariantHandle::getInputBufferInfo ( ) const

Returns the input buffer information for this variant.

Returns
Reference to a vector of input buffer information.

◆ getOutputBufferInfo()

const std::vector< BufferInfo > & mobilint::ModelVariantHandle::getOutputBufferInfo ( ) const

Returns the output buffer information for this variant.

Returns
Reference to a vector of output buffer information.

◆ getInputScale()

std::vector< Scale > mobilint::ModelVariantHandle::getInputScale ( ) const

Returns the input quantization scale(s) for this variant.

Returns
Vector of input scales.

◆ getOutputScale()

std::vector< Scale > mobilint::ModelVariantHandle::getOutputScale ( ) const

Returns the output quantization scale(s) for this variant.

Returns
Vector of output scales.

◆ statusCodeToString()

const std::string mobilint::statusCodeToString ( const StatusCode status_code)

Convert StatusCode into string.

Parameters
[in] status_code  The StatusCode to convert.
Returns
String of each StatusCode if valid, else nullptr.

◆ operator!()

bool mobilint::operator! ( StatusCode sc)
inline

Checks whether the given StatusCode represents an error.

This operator enables StatusCode to be used in conditional statements for error checking. It returns false if sc is StatusCode::OK (0) and true otherwise.

StatusCode sc = someFunction();
if (!sc) {  // `!sc` is true when `sc` is not `StatusCode::OK`.
    // Handle error
}
Parameters
[in] sc  The StatusCode to evaluate.
Returns
True if sc is an error (nonzero), false if it is StatusCode::OK.

Definition at line 125 of file status_code.h.
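
Because `!sc` is true on error, which is the opposite of what `!` usually suggests, a small self-contained sketch may help. It uses a stand-in `demo::StatusCode` with three values copied from the enumeration and a stand-in `operator!` written to match the semantics documented here; `runSteps` is a hypothetical helper that stops at the first failing step. Real code would include the SDK header and use `mobilint::StatusCode` instead.

```cpp
#include <cassert>
#include <functional>
#include <vector>

namespace demo {

// Stand-in for mobilint::StatusCode (values copied from the enumeration).
enum class StatusCode { OK = 0, Model_NotLaunched = 9, Model_PredictError = 36 };

// Stand-in with the documented semantics: true for any status other than OK.
inline bool operator!(StatusCode sc) { return sc != StatusCode::OK; }

// Hypothetical helper: run the steps in order and return the first error,
// or StatusCode::OK if every step succeeds.
inline StatusCode runSteps(const std::vector<std::function<StatusCode()>>& steps) {
    for (const auto& step : steps) {
        StatusCode sc = step();
        if (!sc) {  // true when `sc` is an error
            return sc;
        }
    }
    return StatusCode::OK;
}

}  // namespace demo
```

Returning the failing code unchanged keeps the original error available to the caller, which can then pass it to `statusCodeToString` for logging.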

◆ getQbRuntimeVersion()

QBRUNTIME_EXPORT std::string mobilint::getQbRuntimeVersion ( )

Retrieves the version of the qbruntime.

Returns
A string representing the runtime version.

◆ getQbRuntimeGitVersion()

QBRUNTIME_EXPORT std::string mobilint::getQbRuntimeGitVersion ( )

Retrieves the Git commit hash of the qbruntime.

Returns
A string containing the Git hash.

◆ getQbRuntimeVendor()

QBRUNTIME_EXPORT std::string mobilint::getQbRuntimeVendor ( )

Retrieves the vendor name of the qbruntime.

Typically, this function returns "mobilint".

Returns
A string containing the vendor name.

◆ getQbRuntimeProduct()

QBRUNTIME_EXPORT std::string mobilint::getQbRuntimeProduct ( )

Retrieves product information of the qbruntime.

This function indicates the product for which the qbruntime is built. For example, it may return values such as "aries2-v4" or "regulus-v4".

Returns
A string containing product details.

◆ startTracingEvents()

QBRUNTIME_EXPORT bool mobilint::startTracingEvents ( const char * path)

Starts event tracing and prepares to save the trace log to a specified file.

The trace log is recorded in Chrome Tracing JSON format, which can be viewed at https://ui.perfetto.dev/.

The trace log is not written immediately; it is saved only when stopTracingEvents() is called.

Parameters
[in] path  The file path where the trace log should be stored.
Returns
True if tracing starts successfully, false otherwise.

◆ stopTracingEvents()

QBRUNTIME_EXPORT void mobilint::stopTracingEvents ( )

Stops event tracing and writes the recorded trace log.

This function finalizes tracing and saves the collected trace data to the file specified when startTracingEvents() was called.
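
Since the trace log is written only when `stopTracingEvents()` runs, it can be convenient to tie the start/stop pair to a scope. The following is a minimal sketch under stated assumptions: `demo::startTracingEvents` and `demo::stopTracingEvents` are stand-ins that share the documented signatures but only record their calls, and `demo::TraceScope` is a hypothetical guard, not part of the SDK.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace demo {

// Call log so the sketch can be checked without the SDK.
inline std::vector<std::string>& callLog() {
    static std::vector<std::string> log;
    return log;
}

// Stand-ins with the same signatures as the documented functions.
inline bool startTracingEvents(const char* path) {
    callLog().push_back(std::string("start:") + path);
    return true;
}
inline void stopTracingEvents() { callLog().push_back("stop"); }

// Hypothetical RAII guard: starts tracing on construction and stops it
// on destruction, but only if tracing actually started.
class TraceScope {
public:
    explicit TraceScope(const char* path) : started_(startTracingEvents(path)) {}
    ~TraceScope() {
        if (started_) {
            stopTracingEvents();
        }
    }

    TraceScope(const TraceScope&) = delete;
    TraceScope& operator=(const TraceScope&) = delete;

    bool started() const { return started_; }

private:
    bool started_;
};

}  // namespace demo
```

Declaring the guard at the top of the block to be profiled means the stop call also runs when the block exits early.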

◆ getModelSummary()

QBRUNTIME_EXPORT std::string mobilint::getModelSummary ( const std::string & mxq_path)

Generates a structured summary of the specified MXQ model.

Returns an overview of the model contained in the MXQ file, including:

  • Target NPU hardware
  • Supported core modes and their associated cores
  • The total number of model variants
  • For each variant:
    • Input and output tensor shapes
    • A list of layers with their types, output shapes, and input layer indices

The summary is returned as a human-readable string in a table and is useful for inspecting model compatibility, structure, and input/output shapes.

Parameters
[in] mxq_path  Path to the MXQ model file.
Returns
A formatted string containing the model summary.

Friends

◆ ModelImpl

friend class ModelImpl
friend

Definition at line 163 of file model_variant_handle.h.