ModelConfig Class Reference

SDK qb Runtime Library: mobilint::ModelConfig Class Reference
SDK qb Runtime Library v1.0
MCS001-

Configures a core mode and core allocation of a model for NPU inference.

#include <type.h>

Public Member Functions

 ModelConfig ()
 Default constructor. This default-constructed object is initially set to single-core mode with all NPU local cores included.
bool setSingleCoreMode (int num_cores)
 Sets the model to use single-core mode for inference with a specified number of local cores.
bool setSingleCoreMode (std::vector< CoreId > core_ids)
 Sets the model to use single-core mode for inference with a specific set of NPU local cores.
bool setMultiCoreMode (std::vector< Cluster > clusters={Cluster::Cluster0, Cluster::Cluster1})
 Sets the model to use multi-core mode for batch inference.
bool setGlobal4CoreMode (std::vector< Cluster > clusters={Cluster::Cluster0, Cluster::Cluster1})
 Sets the model to use global4-core mode for inference with a specified set of NPU clusters.
bool setGlobal8CoreMode ()
 Sets the model to use global8-core mode for inference.
CoreMode getCoreMode () const
 Gets the core mode to be applied to the model.
CoreAllocationPolicy getCoreAllocationPolicy () const
 Gets the core allocation policy to be applied to the model.
int getNumCores () const
 Gets the number of cores to be allocated for the model.
bool forceSingleNPUBundle (int npu_bundle_index)
 Forces the use of a specific NPU bundle.
int getForcedNPUBundleIndex () const
 Retrieves the index of the forced NPU bundle.
const std::vector< CoreId > & getCoreIds () const
 Returns the list of NPU CoreIds to be used for model inference.
const std::vector< Cluster > & getClusters () const
void setAsyncPipelineEnabled (bool enable)
 Enables or disables the asynchronous pipeline required for asynchronous inference.
bool getAsyncPipelineEnabled () const
 Returns whether the asynchronous pipeline is enabled in this configuration.
void setActivationSlots (int count)
 Sets the number of activation buffer slots for models that support multi-activation.
int getActivationSlots () const
 Returns the activation buffer slot count.
 ModelConfig (int num_cores)
bool setGlobalCoreMode (std::vector< Cluster > clusters)

Public Attributes

std::vector< uint64_t > early_latencies
std::vector< uint64_t > finish_latencies

Detailed Description

Configures a core mode and core allocation of a model for NPU inference.

The ModelConfig class provides methods for setting a core mode and allocating cores for NPU inference. Supported core modes are single-core, multi-core, global4-core, and global8-core. Users can also specify which cores to allocate for the model. Additionally, the configuration offers an option to enforce the use of a specific NPU bundle.

Note
Deprecated functions are included for backward compatibility, but it is recommended to use the newer core mode configuration methods.

Definition at line 233 of file type.h.

Constructor & Destructor Documentation

◆ ModelConfig()

mobilint::ModelConfig::ModelConfig ( int num_cores)
explicit

deprecated

Member Function Documentation

◆ setSingleCoreMode() [1/2]

bool mobilint::ModelConfig::setSingleCoreMode ( int num_cores)

Sets the model to use single-core mode for inference with a specified number of local cores.

In single-core mode, each local core executes model inference independently. The num_cores parameter specifies how many cores are used, and the core allocation policy is set to CoreAllocationPolicy::Auto. With this policy, the model is automatically allocated to available local cores when it is launched to the NPU, that is, when Model::launch is called.

Parameters
[in]  num_cores  The number of local cores to use for inference.
Returns
true if the mode was successfully set, false otherwise.
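
The setter reports failure only through its return value, which is easy to drop by accident. The following is a minimal sketch of one way to surface a false result; requireOk is a hypothetical helper name and is not part of the SDK.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper, not an SDK function: converts a `false` result from a
// bool-returning setter into an exception so a failure cannot pass unnoticed.
inline void requireOk(bool ok, const std::string& what) {
    if (!ok) {
        throw std::runtime_error(what + " failed");
    }
}
```

With the SDK header included, a call might look like requireOk(config.setSingleCoreMode(2), "setSingleCoreMode"), where config is a mobilint::ModelConfig and 2 is an arbitrary example value.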

◆ setSingleCoreMode() [2/2]

bool mobilint::ModelConfig::setSingleCoreMode ( std::vector< CoreId > core_ids)

Sets the model to use single-core mode for inference with a specific set of NPU local cores.

In single-core mode, each local core executes model inference independently. The user can specify a vector of CoreIds to determine which cores to use for inference.

Parameters
[in]  core_ids  A vector of CoreIds to be used for model inference.
Returns
true if the mode was successfully set, false otherwise.

◆ setMultiCoreMode()

bool mobilint::ModelConfig::setMultiCoreMode ( std::vector< Cluster > clusters = {Cluster::Cluster0, Cluster::Cluster1})

Sets the model to use multi-core mode for batch inference.

On the Aries NPU, multi-core mode has the four local cores within a cluster work together on batch inference tasks. This mode is optimized for batch processing.

Note
By default, the configuration is set to use all clusters.
Parameters
[in]  clusters  A vector of clusters to be used for multi-core batch inference.
Returns
true if the mode was successfully set, false otherwise.

◆ setGlobal4CoreMode()

bool mobilint::ModelConfig::setGlobal4CoreMode ( std::vector< Cluster > clusters = {Cluster::Cluster0, Cluster::Cluster1})

Sets the model to use global4-core mode for inference with a specified set of NPU clusters.

For Aries NPU, there are two clusters, each consisting of four local cores. In global4-core mode, four local cores within the same cluster work together to execute the model inference.

Note
By default, the configuration is set to use all clusters.
Parameters
[in]  clusters  A vector of clusters to be used for model inference.
Returns
true if the mode was successfully set, false otherwise.

◆ setGlobal8CoreMode()

bool mobilint::ModelConfig::setGlobal8CoreMode ( )

Sets the model to use global8-core mode for inference.

For Aries NPU, there are two clusters, each consisting of four local cores. In global8-core mode, all eight local cores across the two clusters work together to execute the model inference.

Returns
true if the mode was successfully set, false otherwise.
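
The core counts behind the two global modes can be written out as plain arithmetic. This sketch uses only the Aries figures stated above (two clusters of four local cores); the constant and function names are illustrative and do not exist in the SDK.

```cpp
// Figures taken from the description above: the Aries NPU has two clusters,
// each consisting of four local cores.
constexpr int kAriesClusters = 2;
constexpr int kLocalCoresPerCluster = 4;

// Local cores that cooperate on one inference in each global mode.
constexpr int kGlobal4CoresPerInference = kLocalCoresPerCluster;
constexpr int kGlobal8CoresPerInference = kAriesClusters * kLocalCoresPerCluster;

// Illustrative helper: total local cores covered when `num_clusters`
// clusters are passed to a per-cluster mode such as global4-core.
constexpr int localCoresCovered(int num_clusters) {
    return num_clusters * kLocalCoresPerCluster;
}

static_assert(kGlobal4CoresPerInference == 4, "global4 uses one full cluster");
static_assert(kGlobal8CoresPerInference == 8, "global8 spans both clusters");
```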

◆ getCoreMode()

CoreMode mobilint::ModelConfig::getCoreMode ( ) const
inline

Gets the core mode to be applied to the model.

This reflects the core mode that will be used when the model is created.

Returns
The CoreMode to be applied to the model.

Definition at line 318 of file type.h.

◆ getCoreAllocationPolicy()

CoreAllocationPolicy mobilint::ModelConfig::getCoreAllocationPolicy ( ) const
inline

Gets the core allocation policy to be applied to the model.

This reflects the core allocation policy that will be used when the model is created.

Returns
The CoreAllocationPolicy to be applied to the model.

Definition at line 328 of file type.h.

◆ getNumCores()

int mobilint::ModelConfig::getNumCores ( ) const
inline

Gets the number of cores to be allocated for the model.

This represents the number of cores that will be allocated for inference when the model is launched to the NPU.

Returns
The number of cores to be allocated for the model.

Definition at line 338 of file type.h.

◆ forceSingleNPUBundle()

bool mobilint::ModelConfig::forceSingleNPUBundle ( int npu_bundle_index)

Forces the use of a specific NPU bundle.

This function forces the selection of a specific NPU bundle. If a non-negative index is provided, the corresponding NPU bundle is selected and runs without CPU offloading. If -1 is provided, all NPU bundles are used with CPU offloading enabled.

Parameters
[in]  npu_bundle_index  The index of the NPU bundle to force. A non-negative integer selects a specific NPU bundle (runs without CPU offloading); -1 enables all NPU bundles with CPU offloading.
Returns
true if the index is valid and the NPU bundle is successfully set, false if the index is invalid (less than -1).
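
The index rule can be sketched as a small caller-side pre-check. Both helpers below are hypothetical and not part of the SDK. They encode only the documented rule that values below -1 are invalid; they do not check whether a non-negative index actually exists for a given model, which the SDK may validate separately.

```cpp
#include <string>

// Hypothetical helper, not an SDK function: mirrors the documented rule that
// an index less than -1 is invalid.
inline bool isAcceptableBundleIndex(int npu_bundle_index) {
    return npu_bundle_index >= -1;
}

// Hypothetical helper: spells out what each kind of argument means according
// to the description of forceSingleNPUBundle().
inline std::string describeBundleIndex(int npu_bundle_index) {
    if (npu_bundle_index < -1) {
        return "invalid";
    }
    if (npu_bundle_index == -1) {
        return "all NPU bundles, CPU offloading enabled";
    }
    return "NPU bundle " + std::to_string(npu_bundle_index) +
           ", no CPU offloading";
}
```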

◆ getForcedNPUBundleIndex()

int mobilint::ModelConfig::getForcedNPUBundleIndex ( ) const
inline

Retrieves the index of the forced NPU bundle.

This function returns the index of the NPU bundle that has been forced using the forceSingleNPUBundle function. If no NPU bundle is forced, the returned value will be -1.

Returns
The index of the forced NPU bundle, or -1 if no bundle is forced.

Definition at line 366 of file type.h.

◆ getCoreIds()

const std::vector< CoreId > & mobilint::ModelConfig::getCoreIds ( ) const
inline

Returns the list of NPU CoreIds to be used for model inference.

This function returns a reference to the vector of NPU CoreIds that the model will use for inference. If setSingleCoreMode(int num_cores) was called, so that the core allocation policy is CoreAllocationPolicy::Auto, this function returns an empty vector.

Returns
A constant reference to the vector of NPU CoreIds.

Definition at line 378 of file type.h.

◆ getClusters()

const std::vector< Cluster > & mobilint::ModelConfig::getClusters ( ) const
inline

Definition at line 380 of file type.h.

◆ setAsyncPipelineEnabled()

void mobilint::ModelConfig::setAsyncPipelineEnabled ( bool enable)

Enables or disables the asynchronous pipeline required for asynchronous inference.

Call this function with enable set to true if you intend to use Model::inferAsync or Model::inferAsyncCHW, as the asynchronous pipeline is necessary for their operation.

If you are only using synchronous inference, such as Model::infer or Model::inferCHW, it is recommended to keep the asynchronous pipeline disabled to avoid unnecessary overhead.

Parameters
[in]  enable  Set to true to enable the asynchronous pipeline; set to false to disable it.
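
The guidance above reduces to a single decision: enable the pipeline only when asynchronous calls are planned. The sketch below assumes a hypothetical InferenceStyle enum that the application defines for itself. applyAsyncSetting is written as a template so that it compiles without the SDK header; with the SDK, Config would be mobilint::ModelConfig.

```cpp
// Hypothetical application-side enum, not an SDK type: records which family
// of inference calls the application plans to make.
enum class InferenceStyle {
    SyncOnly,   // only Model::infer / Model::inferCHW
    UsesAsync   // Model::inferAsync or Model::inferAsyncCHW
};

// Value to pass to setAsyncPipelineEnabled() according to the guidance above.
constexpr bool asyncPipelineNeeded(InferenceStyle style) {
    return style == InferenceStyle::UsesAsync;
}

// Generic over Config so the sketch compiles without the SDK header.
template <typename Config>
void applyAsyncSetting(Config& config, InferenceStyle style) {
    config.setAsyncPipelineEnabled(asyncPipelineNeeded(style));
}
```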

◆ getAsyncPipelineEnabled()

bool mobilint::ModelConfig::getAsyncPipelineEnabled ( ) const
inline

Returns whether the asynchronous pipeline is enabled in this configuration.

Returns
true if the asynchronous pipeline is enabled; false otherwise.

Definition at line 404 of file type.h.

◆ setActivationSlots()

void mobilint::ModelConfig::setActivationSlots ( int count)

Sets the number of activation buffer slots for models that support multi-activation.

Call this function if you want to set the number of activation buffer slots manually.

If you do not call this function, the default number of activation buffer slots depends on the CoreMode.

Note
This function has no effect on MXQ files of versions earlier than MXQv7.
Currently, the activation slot count of LLM models is fixed to 1 and count is ignored.
Parameters
[in]  count  The number of activation buffer slots (multi-activation count). Must be >= 1.
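
The reference states that count must be at least 1 but does not say what happens for smaller values, so a caller may want to guard the argument before passing it on. A minimal sketch, assuming a hypothetical helper named sanitizeActivationSlots (not part of the SDK) that clamps to the documented minimum:

```cpp
#include <algorithm>

// Documented lower bound for the `count` argument of setActivationSlots().
constexpr int kMinActivationSlots = 1;

// Hypothetical helper, not an SDK function: raises any requested value below
// the documented minimum to that minimum and leaves valid values unchanged.
inline int sanitizeActivationSlots(int requested) {
    return std::max(requested, kMinActivationSlots);
}
```

Clamping is one possible policy; rejecting an out-of-range request with an error is equally reasonable when a silent adjustment would hide a configuration mistake.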

◆ getActivationSlots()

int mobilint::ModelConfig::getActivationSlots ( ) const
inline

Returns the activation buffer slot count.

Note
The returned value is not meaningful for MXQ files of versions earlier than MXQv7.
Returns
Activation buffer slot count.

Definition at line 435 of file type.h.

◆ setGlobalCoreMode()

bool mobilint::ModelConfig::setGlobalCoreMode ( std::vector< Cluster > clusters)

deprecated

Member Data Documentation

◆ early_latencies

std::vector<uint64_t> mobilint::ModelConfig::early_latencies
Deprecated
This setting has no effect.

Definition at line 444 of file type.h.

◆ finish_latencies

std::vector<uint64_t> mobilint::ModelConfig::finish_latencies
Deprecated
This setting has no effect.

Definition at line 448 of file type.h.


The documentation for this class was generated from the following file: