C++ API Reference
SDK qb Runtime Library v1.0
MCS001-
The C++ API provides core functionality for the NPU.
Classes | |
| class | mobilint::Accelerator |
| Represents an accelerator, i.e., an NPU, used for executing models. | |
| class | mobilint::FutureImpl< T > |
| class | mobilint::Future< T > |
| Represents a future for retrieving the result of asynchronous inference. | |
| class | mobilint::Model |
| Represents an AI model loaded from an MXQ file. | |
| class | mobilint::ModelVariantHandle |
| Handle to a specific variant of a loaded model. | |
| class | mobilint::NDArray< T > |
| A class representing an N-dimensional array (NDArray). | |
| struct | mobilint::Scale |
| Struct for scale values. | |
| struct | mobilint::CoreId |
| Represents a unique identifier for an NPU core. | |
| struct | mobilint::Buffer |
| A simple byte-sized buffer. | |
| struct | mobilint::BufferInfo |
| Struct representing input/output buffer information. | |
| class | mobilint::ModelConfig |
| Configures a core mode and core allocation of a model for NPU inference. | |
| struct | mobilint::CacheInfo |
| Struct representing KV-cache information. | |
Functions | |
| mobilint::ModelVariantHandle::ModelVariantHandle (const ModelVariantHandle &other)=delete | |
| mobilint::ModelVariantHandle::ModelVariantHandle (ModelVariantHandle &&other)=delete | |
| ModelVariantHandle & | mobilint::ModelVariantHandle::operator= (const ModelVariantHandle &rhs)=delete |
| ModelVariantHandle & | mobilint::ModelVariantHandle::operator= (ModelVariantHandle &&rhs) noexcept=delete |
| int | mobilint::ModelVariantHandle::getVariantIdx () const |
| Returns the index of this model variant. | |
| const std::vector< std::vector< int64_t > > & | mobilint::ModelVariantHandle::getModelInputShape () const |
| Returns the input shape for this model variant. | |
| const std::vector< std::vector< int64_t > > & | mobilint::ModelVariantHandle::getModelOutputShape () const |
| Returns the output shape for this model variant. | |
| const std::vector< BufferInfo > & | mobilint::ModelVariantHandle::getInputBufferInfo () const |
| Returns the input buffer information for this variant. | |
| const std::vector< BufferInfo > & | mobilint::ModelVariantHandle::getOutputBufferInfo () const |
| Returns the output buffer information for this variant. | |
| std::vector< Scale > | mobilint::ModelVariantHandle::getInputScale () const |
| Returns the input quantization scale(s) for this variant. | |
| std::vector< Scale > | mobilint::ModelVariantHandle::getOutputScale () const |
| Returns the output quantization scale(s) for this variant. | |
| const std::string | mobilint::statusCodeToString (const StatusCode status_code) |
| Converts a StatusCode into a string. | |
| bool | mobilint::operator! (StatusCode sc) |
| Checks whether the given StatusCode represents an error. | |
| QBRUNTIME_EXPORT std::string | mobilint::getQbRuntimeVersion () |
| Retrieves the version of the qbruntime. | |
| QBRUNTIME_EXPORT std::string | mobilint::getQbRuntimeGitVersion () |
| Retrieves the Git commit hash of the qbruntime. | |
| QBRUNTIME_EXPORT std::string | mobilint::getQbRuntimeVendor () |
| Retrieves the vendor name of the qbruntime. | |
| QBRUNTIME_EXPORT std::string | mobilint::getQbRuntimeProduct () |
| Retrieves product information of the qbruntime. | |
| QBRUNTIME_EXPORT void | mobilint::setLogLevel (LogLevel level) |
| QBRUNTIME_EXPORT bool | mobilint::startTracingEvents (const char *path) |
| Starts event tracing and prepares to save the trace log to a specified file. | |
| QBRUNTIME_EXPORT void | mobilint::stopTracingEvents () |
| Stops event tracing and writes the recorded trace log. | |
| QBRUNTIME_EXPORT std::string | mobilint::getModelSummary (const std::string &mxq_path) |
| Generates a structured summary of the specified MXQ model. | |
Friends | |
| class | mobilint::ModelVariantHandle::ModelImpl |
Buffer Management APIs | |
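These APIs are used when performing inference with Model::inferBuffer or Model::inferBufferToFloat, using this variant’s input and output shapes. Buffers are acquired using acquireInputBuffer, acquireOutputBuffer, acquireInputBuffers, and acquireOutputBuffers.
Any acquired buffer must be released using releaseBuffer or releaseBuffers.
Repositioning is handled by repositionInputs and repositionOutputs.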
| |
| std::vector< Buffer > | mobilint::ModelVariantHandle::acquireInputBuffer (const std::vector< std::vector< int > > &seqlens={}) const |
| std::vector< Buffer > | mobilint::ModelVariantHandle::acquireOutputBuffer (const std::vector< std::vector< int > > &seqlens={}) const |
| std::vector< std::vector< Buffer > > | mobilint::ModelVariantHandle::acquireInputBuffers (int batch_size, const std::vector< std::vector< int > > &seqlens={}) const |
| std::vector< std::vector< Buffer > > | mobilint::ModelVariantHandle::acquireOutputBuffers (int batch_size, const std::vector< std::vector< int > > &seqlens={}) const |
| StatusCode | mobilint::ModelVariantHandle::releaseBuffer (std::vector< Buffer > &buffer) const |
| StatusCode | mobilint::ModelVariantHandle::releaseBuffers (std::vector< std::vector< Buffer > > &buffers) const |
| StatusCode | mobilint::ModelVariantHandle::repositionInputs (const std::vector< float * > &input, std::vector< Buffer > &input_buf, const std::vector< std::vector< int > > &seqlens={}) const |
| StatusCode | mobilint::ModelVariantHandle::repositionOutputs (const std::vector< Buffer > &output_buf, std::vector< float * > &output, const std::vector< std::vector< int > > &seqlens={}) const |
| StatusCode | mobilint::ModelVariantHandle::repositionOutputs (const std::vector< Buffer > &output_buf, std::vector< std::vector< float > > &output, const std::vector< std::vector< int > > &seqlens={}) const |
| StatusCode | mobilint::ModelVariantHandle::repositionInputs (const std::vector< uint8_t * > &input, std::vector< Buffer > &input_buf, const std::vector< std::vector< int > > &seqlens={}) const |
| StatusCode | mobilint::ModelVariantHandle::repositionInputs (const std::vector< float * > &input, std::vector< std::vector< Buffer > > &input_buf, const std::vector< std::vector< int > > &seqlens={}) const |
| StatusCode | mobilint::ModelVariantHandle::repositionOutputs (const std::vector< std::vector< Buffer > > &output_buf, std::vector< float * > &output, const std::vector< std::vector< int > > &seqlens={}) const |
| StatusCode | mobilint::ModelVariantHandle::repositionOutputs (const std::vector< std::vector< Buffer > > &output_buf, std::vector< std::vector< float > > &output, const std::vector< std::vector< int > > &seqlens={}) const |
| StatusCode | mobilint::ModelVariantHandle::repositionInputs (const std::vector< uint8_t * > &input, std::vector< std::vector< Buffer > > &input_buf, const std::vector< std::vector< int > > &seqlens={}) const |
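The acquire, reposition, and release calls listed above are meant to be used together. The following is a minimal sketch of that order, not the SDK's own code. `StatusCode`, `Buffer`, and `FakeVariant` are local stand-ins so the block compiles without qbruntime, `runOnce` is a hypothetical helper, and the inference step is passed in as a callable because the signature of Model::inferBuffer is not shown on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-ins so the sketch builds without the SDK. With qbruntime these would
// be mobilint::StatusCode and mobilint::Buffer (whose layout is assumed here).
enum class StatusCode : int { OK = 0, InternalError = 23 };
struct Buffer { std::vector<uint8_t> bytes; };

// Hypothetical helper. Works with any handle type that offers the member
// functions documented for ModelVariantHandle. `infer` stands in for the
// buffer-based inference call.
template <typename Handle, typename InferFn>
StatusCode runOnce(const Handle& variant, const std::vector<float*>& inputs,
                   std::vector<std::vector<float>>& outputs, InferFn infer) {
    std::vector<Buffer> in = variant.acquireInputBuffer();
    std::vector<Buffer> out = variant.acquireOutputBuffer();

    StatusCode sc = variant.repositionInputs(inputs, in);
    if (sc == StatusCode::OK) sc = infer(in, out);
    if (sc == StatusCode::OK) sc = variant.repositionOutputs(out, outputs);

    // Release on every path, then report the first failure seen.
    StatusCode rel_in = variant.releaseBuffer(in);
    StatusCode rel_out = variant.releaseBuffer(out);
    if (sc != StatusCode::OK) return sc;
    if (rel_in != StatusCode::OK) return rel_in;
    return rel_out;
}

// Fake handle used only to exercise the helper. It tracks how many buffer
// sets are currently acquired and copies a single value through.
struct FakeVariant {
    mutable int live = 0;
    std::vector<Buffer> acquireInputBuffer() const { ++live; return {Buffer{}}; }
    std::vector<Buffer> acquireOutputBuffer() const { ++live; return {Buffer{}}; }
    StatusCode releaseBuffer(std::vector<Buffer>& b) const {
        --live;
        b.clear();
        return StatusCode::OK;
    }
    StatusCode repositionInputs(const std::vector<float*>& in,
                                std::vector<Buffer>& buf) const {
        buf[0].bytes.assign(1, static_cast<uint8_t>(*in[0]));
        return StatusCode::OK;
    }
    StatusCode repositionOutputs(const std::vector<Buffer>& buf,
                                 std::vector<std::vector<float>>& out) const {
        out.assign(1, std::vector<float>{static_cast<float>(buf[0].bytes[0])});
        return StatusCode::OK;
    }
};
```

The helper releases both buffer sets even when an earlier step fails, which matches the rule that any acquired buffer must be released.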
Detailed Description
The C++ API provides core functionality for the NPU.
Enumeration Type Documentation
◆ StatusCode
|
strong |
Enumerates status codes for the qbruntime.
This enumeration defines return codes used in the qbruntime to indicate success or specific error conditions. A value of StatusCode::OK (0) represents success, while any other value indicates a particular error type.
| Enumerator | ||
|---|---|---|
| OK | 0 | OK |
| InternalError | 23 | Should never be reached, but reached anyway |
| NotImplemented | 18 | Not implemented |
| BadAlloc | 39 | Bad allocation |
| Acc_CoreAlreadyInUse | 11 | Core already in use |
| Acc_NPUTimeout | 42 | NPU timeout |
| Acc_NoIMemInitFound | 43 | No imem initialization found |
| Acc_NoSuchModel | 12 | No such model |
| Acc_TaskQueueNotFound | 28 | Task queue not found |
| Driver_FailedToAllocateHostMemory | 13 | Failed to allocate host memory |
| Driver_FailedToAllocateModelMemory | 44 | Failed to allocate model memory |
| Driver_FailedToBuildCmaIoReq | 45 | Failed to build CMA IO request |
| Driver_FailedToClaimCores | 24 | Failed to claim cores |
| Driver_FailedToFreeModelMemory | 46 | Failed to free model memory |
| Driver_FailedToLockCore | 49 | Failed to lock core |
| Driver_FailedToPostInfer | 33 | Failed to post infer |
| Driver_FailedToReadMemoryBuffer | 15 | Failed to read memory buffer |
| Driver_FailedToUnclaimCores | 25 | Failed to unclaim cores |
| Driver_FailedToUnlockCore | 50 | Failed to unlock core |
| Driver_FailedToWaitDone | 34 | Failed to wait done |
| Driver_FailedToWriteMemoryBuffer | 14 | Failed to write memory buffer |
| Driver_NotInitialized | 1 | Not initialized |
| Driver_WaitDoneTimeout | 35 | Wait done timeout |
| Driver_WrongBaseAddress | 47 | Wrong base address |
| MemoryPool_AllocatorNotSet | 30 | Allocator not set |
| MemoryPool_BufNotFound | 29 | Buffer not found |
| Model_AlreadyLaunched | 65 | Model has been launched already |
| Model_AsyncPipelineCheckFailed | 55 | Asynchronous pipeline check failed |
| Model_AsyncPipelineNotAlive | 56 | Asynchronous pipeline not alive |
| Model_AsyncPipelineTimeout | 57 | Asynchronous pipeline timeout |
| Model_BrokenMXQ | 20 | Broken MXQ |
| Model_BufferSizeMismatched | 63 | Size of buffer is mismatched |
| Model_CacheOverflow | 53 | KV-cache overflow |
| Model_DtypeMismatched | 22 | dtype mismatched |
| Model_FailedToAllocMemory | 32 | Failed to allocate memory |
| Model_FailedToFindDirectory | 61 | Failed to find specified directory |
| Model_FailedToLoadMXQ | 3 | Failed to load MXQ |
| Model_FailedToOpenCacheFile | 62 | Failed to open cache file |
| Model_FailedToOpenModelDescFile | 19 | Failed to open model description file |
| Model_FailedToOpenOutputScale | 6 | |
| Model_FailedToOpenScaleFile | 4 | |
| Model_FailedToOpenSectionBinary | 8 | Failed to open section binary |
| Model_FailedToSaveTensor | 37 | Failed to save tensor |
| Model_NumCacheMismatched | 60 | Size of KV-cache mismatched |
| Model_InvalidNPUDtype | 51 | Invalid NPU dtype |
| Model_InvalidOutputScaleValue | 7 | |
| Model_InvalidRmemType | 52 | Invalid RMEM type |
| Model_InvalidScaleValue | 5 | Invalid scale value |
| Model_InvalidSupplementary | 59 | Invalid supplementary inference |
| Model_InvalidVariantIdx | 64 | Invalid model variant idx |
| Model_IsNotPacked | 27 | |
| Model_IsNotSupportedHardware | 48 | Is not supported hardware |
| Model_IsPacked | 26 | |
| Model_MXQAndModelConfigNotMatch | 16 | MXQ and model config do not match |
| Model_NoCache | 54 | Model does not support KV-cache |
| Model_NoGlobalCoreWithGlobalMultiMode | 17 | No global core with global multi mode |
| Model_NoTargetCores | 2 | No target cores |
| Model_NotAlive | 10 | Not alive |
| Model_NotLaunched | 9 | Not launched |
| Model_PredictError | 36 | Predict error |
| Model_ShapeMismatched | 21 | Shape mismatched |
| Model_TaskQueueClosed | 40 | Task queue closed |
| Model_TaskQueueTimeout | 41 | Task queue timeout |
| Model_UnexpectedMemoryFormat | 38 | |
| Future_NotValid | 58 | Future is not valid |
Definition at line 26 of file status_code.h.
◆ Cluster
|
strong |
Enumerates clusters in the ARIES NPU.
- Note
- The ARIES NPU consists of two clusters, each containing one global core and four local cores, totaling eight local cores. REGULUS has only a single cluster (Cluster0) with one local core (Core0).
| Enumerator | ||
|---|---|---|
| Cluster0 | 1 << 16 | Cluster 0 |
| Cluster1 | 2 << 16 | Cluster 1 |
| Error | 0x7FFF'0000 | Represents an invalid or uninitialized state. |
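The shifted values in the table can be worked out by hand. The sketch below mirrors the documented enumerators in a local enum (the underlying type `int32_t` is an assumption, and `rawValue` and `upperHalf` are hypothetical helpers) and shows that each enumerator keeps its distinguishing bits in the upper 16 bits.

```cpp
#include <cassert>
#include <cstdint>

// Local mirror of the documented enumerators. With the SDK this would be
// mobilint::Cluster; the underlying type is assumed.
enum class Cluster : int32_t {
    Cluster0 = 1 << 16,   // 65536
    Cluster1 = 2 << 16,   // 131072
    Error = 0x7FFF0000    // invalid or uninitialized
};

// Hypothetical helpers, not part of qbruntime.
constexpr int32_t rawValue(Cluster c) { return static_cast<int32_t>(c); }
constexpr int32_t upperHalf(Cluster c) { return rawValue(c) >> 16; }

// The lower 16 bits are zero in every documented enumerator.
static_assert((rawValue(Cluster::Cluster0) & 0xFFFF) == 0, "low bits clear");
static_assert((rawValue(Cluster::Cluster1) & 0xFFFF) == 0, "low bits clear");
static_assert((rawValue(Cluster::Error) & 0xFFFF) == 0, "low bits clear");
```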
◆ Core
|
strong |
Enumerates cores within a cluster in the ARIES NPU.
- Note
- The ARIES NPU consists of two clusters, each containing one global core and four local cores, totaling eight local cores. REGULUS has only a single cluster (Cluster0) with one local core (Core0).
◆ CoreAllocationPolicy
|
strong |
◆ CoreMode
|
strong |
Defines the core mode for NPU execution.
Supported core modes include single-core, multi-core, global4-core, and global8-core. For detailed explanations of each mode, refer to the following functions:
- ModelConfig::setSingleCoreMode
- ModelConfig::setMultiCoreMode
- ModelConfig::setGlobal4CoreMode
- ModelConfig::setGlobal8CoreMode
| Enumerator | ||
|---|---|---|
| Single | 0 | Single-core mode |
| Multi | 1 | Multi-core mode |
| Global | 2 | Deprecated |
| Global4 | 3 | Global4-core mode |
| Global8 | 4 | Global8-core mode |
| Error | 0xF | Represents an invalid or uninitialized state. |
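When logging or validating a configuration, it can help to turn a CoreMode value into text and to reject the deprecated and error values early. The sketch below does this against a local mirror of the table above. `coreModeName` and `isSelectable` are hypothetical helpers, and the underlying type is an assumption.

```cpp
#include <cassert>
#include <string>

// Local mirror of the documented enumerators. With the SDK this would be
// mobilint::CoreMode; the underlying type is assumed.
enum class CoreMode : int {
    Single = 0, Multi = 1, Global = 2, Global4 = 3, Global8 = 4, Error = 0xF
};

// Hypothetical helper: printable name for a core mode.
inline std::string coreModeName(CoreMode mode) {
    switch (mode) {
        case CoreMode::Single:  return "single-core";
        case CoreMode::Multi:   return "multi-core";
        case CoreMode::Global:  return "global (deprecated)";
        case CoreMode::Global4: return "global4-core";
        case CoreMode::Global8: return "global8-core";
        case CoreMode::Error:   return "error";
    }
    return "unknown";  // value outside the documented set
}

// Hypothetical helper: true only for the four modes described as supported.
inline bool isSelectable(CoreMode mode) {
    return mode == CoreMode::Single || mode == CoreMode::Multi ||
           mode == CoreMode::Global4 || mode == CoreMode::Global8;
}
```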
◆ LogLevel
|
strong |
◆ CacheType
|
strong |
Function Documentation
◆ getVariantIdx()
| int mobilint::ModelVariantHandle::getVariantIdx | ( | ) | const |
Returns the index of this model variant.
- Returns
- Index of the model variant.
◆ getModelInputShape()
| const std::vector< std::vector< int64_t > > & mobilint::ModelVariantHandle::getModelInputShape | ( | ) | const |
Returns the input shape for this model variant.
- Returns
- Reference to the input shape.
◆ getModelOutputShape()
| const std::vector< std::vector< int64_t > > & mobilint::ModelVariantHandle::getModelOutputShape | ( | ) | const |
Returns the output shape for this model variant.
- Returns
- Reference to the output shape.
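Both shape getters return a vector of vectors of `int64_t`. Assuming each inner vector lists the dimensions of one tensor (the page does not spell this out), the element count per tensor is the product of its dimensions. `elementCounts` below is a hypothetical helper that works on that return type directly.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: number of elements in each tensor, assuming every
// inner vector holds one tensor's dimensions. An empty inner vector yields 1.
inline std::vector<int64_t> elementCounts(
        const std::vector<std::vector<int64_t>>& shapes) {
    std::vector<int64_t> counts;
    counts.reserve(shapes.size());
    for (const auto& dims : shapes) {
        counts.push_back(std::accumulate(dims.begin(), dims.end(), int64_t{1},
                                         std::multiplies<int64_t>()));
    }
    return counts;
}
```

With the SDK, the argument would be the reference returned by getModelInputShape() or getModelOutputShape(); the result can be used to size host-side float arrays.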
◆ getInputBufferInfo()
| const std::vector< BufferInfo > & mobilint::ModelVariantHandle::getInputBufferInfo | ( | ) | const |
Returns the input buffer information for this variant.
- Returns
- Reference to a vector of input buffer information.
◆ getOutputBufferInfo()
| const std::vector< BufferInfo > & mobilint::ModelVariantHandle::getOutputBufferInfo | ( | ) | const |
Returns the output buffer information for this variant.
- Returns
- Reference to a vector of output buffer information.
◆ getInputScale()
| std::vector< Scale > mobilint::ModelVariantHandle::getInputScale | ( | ) | const |
Returns the input quantization scale(s) for this variant.
- Returns
- Vector of input scales.
◆ getOutputScale()
| std::vector< Scale > mobilint::ModelVariantHandle::getOutputScale | ( | ) | const |
Returns the output quantization scale(s) for this variant.
- Returns
- Vector of output scales.
◆ statusCodeToString()
| const std::string mobilint::statusCodeToString | ( | const StatusCode | status_code | ) |
Converts a StatusCode into a string.
- Parameters
-
[in] status_code The StatusCode to convert.
- Returns
- String of each StatusCode if valid, else nullptr.
◆ operator!()
| bool mobilint::operator! | ( | StatusCode | sc | ) | inline |
Checks whether the given StatusCode represents an error.
This operator enables StatusCode to be used in conditional statements for error checking. It returns false if sc is StatusCode::OK (0) and true otherwise.
- Parameters
-
[in] sc The StatusCode to evaluate.
- Returns
- True if sc is an error (nonzero), false if it is StatusCode::OK.
Definition at line 125 of file status_code.h.
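Because the operator returns true for errors, `if (!sc)` reads as "if sc is an error", which is the opposite of how `!` behaves on a boolean success flag. The sketch below reproduces the documented semantics on a local stand-in enum with a few values from the StatusCode table; `describe` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

// Stand-in with a few documented values. With the SDK this would be
// mobilint::StatusCode and its operator! from status_code.h.
enum class StatusCode : int {
    OK = 0, Driver_NotInitialized = 1, Model_NoTargetCores = 2
};

// Same semantics as documented: true for any error, false for OK.
inline bool operator!(StatusCode sc) { return sc != StatusCode::OK; }

// Hypothetical helper showing the operator inside a condition.
inline std::string describe(StatusCode sc) {
    if (!sc) {  // taken when sc is an error
        return "error " + std::to_string(static_cast<int>(sc));
    }
    return "ok";
}
```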
◆ getQbRuntimeVersion()
| QBRUNTIME_EXPORT std::string mobilint::getQbRuntimeVersion | ( | ) |
Retrieves the version of the qbruntime.
- Returns
- A string representing the runtime version.
◆ getQbRuntimeGitVersion()
| QBRUNTIME_EXPORT std::string mobilint::getQbRuntimeGitVersion | ( | ) |
Retrieves the Git commit hash of the qbruntime.
- Returns
- A string containing the Git hash.
◆ getQbRuntimeVendor()
| QBRUNTIME_EXPORT std::string mobilint::getQbRuntimeVendor | ( | ) |
Retrieves the vendor name of the qbruntime.
Typically, this function returns "mobilint".
- Returns
- A string containing the vendor name.
◆ getQbRuntimeProduct()
| QBRUNTIME_EXPORT std::string mobilint::getQbRuntimeProduct | ( | ) |
Retrieves product information of the qbruntime.
This function indicates the product for which the qbruntime is built. For example, it may return values such as "aries2-v4" or "regulus-v4".
- Returns
- A string containing product details.
◆ startTracingEvents()
| QBRUNTIME_EXPORT bool mobilint::startTracingEvents | ( | const char * | path | ) |
Starts event tracing and prepares to save the trace log to a specified file.
The trace log is recorded in the Chrome Tracing JSON format, which can be viewed at https://ui.perfetto.dev/.
The trace log is not written immediately; it is saved only when stopTracingEvents() is called.
- Parameters
-
[in] path The file path where the trace log should be stored.
- Returns
- True if tracing starts successfully, false otherwise.
◆ stopTracingEvents()
| QBRUNTIME_EXPORT void mobilint::stopTracingEvents | ( | ) |
Stops event tracing and writes the recorded trace log.
This function finalizes tracing and saves the collected trace data to the file specified when startTracingEvents() was called.
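Since the log is written only when stopTracingEvents() runs, pairing the two calls in a scope guard is one way to make sure the file is produced on every exit path. This is a sketch under assumptions: `TraceSession` is a hypothetical wrapper, and the `fake_runtime` functions are stand-ins with the documented signatures so the block compiles without the SDK.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-ins that record calls. With qbruntime, the guard would call
// mobilint::startTracingEvents and mobilint::stopTracingEvents instead.
namespace fake_runtime {
inline std::vector<std::string>& calls() {
    static std::vector<std::string> log;
    return log;
}
inline bool startTracingEvents(const char* path) {
    calls().push_back(std::string("start:") + path);
    return true;
}
inline void stopTracingEvents() { calls().push_back("stop"); }
}  // namespace fake_runtime

// Hypothetical scope guard: starts tracing on construction and stops it on
// destruction, but only if the start call reported success.
class TraceSession {
public:
    explicit TraceSession(const char* path)
        : active_(fake_runtime::startTracingEvents(path)) {}
    ~TraceSession() {
        if (active_) fake_runtime::stopTracingEvents();
    }
    TraceSession(const TraceSession&) = delete;
    TraceSession& operator=(const TraceSession&) = delete;
    bool active() const { return active_; }

private:
    bool active_;
};
```

A caller would create a `TraceSession` at the top of the region to profile and let it go out of scope when done.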
◆ getModelSummary()
| QBRUNTIME_EXPORT std::string mobilint::getModelSummary | ( | const std::string & | mxq_path | ) |
Generates a structured summary of the specified MXQ model.
Returns an overview of the model contained in the MXQ file, including:
- Target NPU hardware
- Supported core modes and their associated cores
- The total number of model variants
- For each variant:
- Input and output tensor shapes
- A list of layers with their types, output shapes, and input layer indices
The summary is returned as a human-readable, table-formatted string and is useful for inspecting model compatibility, structure, and input/output shapes.
- Parameters
-
[in] mxq_path Path to the MXQ model file.
- Returns
- A formatted string containing the model summary.
Friends
◆ ModelImpl
| friend class ModelImpl | friend |
Definition at line 163 of file model_variant_handle.h.