Configures a core mode and core allocation of a model for NPU inference.

#include <type.h>

Public Member Functions
	ModelConfig ()
	Default constructor. A default-constructed object is initially set to single-core mode with all NPU local cores included.
bool	setSingleCoreMode (int num_cores)
	Sets the model to use single-core mode for inference with a specified number of local cores.
bool	setSingleCoreMode (std::vector< CoreId > core_ids)
	Sets the model to use single-core mode for inference with a specific set of NPU local cores.
bool	setMultiCoreMode (std::vector< Cluster > clusters={Cluster::Cluster0, Cluster::Cluster1})
	Sets the model to use multi-core mode for batch inference.
bool	setGlobal4CoreMode (std::vector< Cluster > clusters={Cluster::Cluster0, Cluster::Cluster1})
	Sets the model to use global4-core mode for inference with a specified set of NPU clusters.
bool	setGlobal8CoreMode ()
	Sets the model to use global8-core mode for inference.
CoreMode	getCoreMode () const
	Gets the core mode to be applied to the model.
CoreAllocationPolicy	getCoreAllocationPolicy () const
	Gets the core allocation policy to be applied to the model.
int	getNumCores () const
	Gets the number of cores to be allocated for the model.
bool	forceSingleNPUBundle (int npu_bundle_index)
	Forces the use of a specific NPU bundle.
int	getForcedNPUBundleIndex () const
	Retrieves the index of the forced NPU bundle.
const std::vector< CoreId > &	getCoreIds () const
	Returns the list of NPU CoreIds to be used for model inference.
const std::vector< Cluster > &	getClusters () const
void	setAsyncPipelineEnabled (bool enable)
	Enables or disables the asynchronous pipeline required for asynchronous inference.
bool	getAsyncPipelineEnabled () const
	Returns whether the asynchronous pipeline is enabled in this configuration.
void	setActivationSlots (int count)
	Sets the number of activation buffer slots for a model that supports multi-activation.
int	getActivationSlots () const
	Returns the activation buffer slot count.
	ModelConfig (int num_cores)
bool	setGlobalCoreMode (std::vector< Cluster > clusters)

Public Attributes
std::vector< uint64_t >	early_latencies
std::vector< uint64_t >	finish_latencies

Detailed Description

Configures a core mode and core allocation of a model for NPU inference.

The ModelConfig class provides methods for setting a core mode and allocating cores for NPU inference. Supported core modes are single-core, multi-core, global4-core, and global8-core. Users can also specify which cores to allocate for the model. Additionally, the configuration offers an option to enforce the use of a specific NPU bundle.

Note: Deprecated functions are included for backward compatibility, but it is recommended to use the newer core mode configuration methods.

Definition at line 233 of file type.h.

Constructor & Destructor Documentation

◆ ModelConfig()

mobilint::ModelConfig::ModelConfig ( int num_cores )

explicit

deprecated

Member Function Documentation

◆ setSingleCoreMode() [1/2]

bool mobilint::ModelConfig::setSingleCoreMode ( int num_cores )

Sets the model to use single-core mode for inference with a specified number of local cores.

In single-core mode, each local core executes model inference independently. The num_cores parameter specifies how many cores are used, and the core allocation policy is set to CoreAllocationPolicy::Auto. With this policy, the model is automatically allocated to available local cores when it is launched to the NPU, that is, when Model::launch is called.

Parameters

[in] num_cores The number of local cores to use for inference.

Returns: true if the mode was successfully set, false otherwise.
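The effect described above can be illustrated with a minimal sketch. It does not use the real header: ModelConfigSketch, the Manual enumerator, the CoreId layout, the initial value of 8 cores, and the 1 to 8 validity range are all assumptions made for illustration. Only the documented behaviour (single-core mode, CoreAllocationPolicy::Auto, no explicit core list) is modelled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins: the real definitions live in type.h and may differ.
enum class CoreMode { Single, Multi, Global4, Global8 };
enum class CoreAllocationPolicy { Auto, Manual };  // "Manual" is an assumed name
struct CoreId { int cluster; int core; };          // assumed layout

// Models only the documented effects of setSingleCoreMode(int num_cores).
class ModelConfigSketch {
public:
    bool setSingleCoreMode(int num_cores) {
        // Assumed validity range: Aries has 2 clusters x 4 local cores.
        if (num_cores < 1 || num_cores > 8) return false;
        mode_ = CoreMode::Single;
        policy_ = CoreAllocationPolicy::Auto;  // cores are chosen at launch time
        num_cores_ = num_cores;
        core_ids_.clear();                     // no explicit cores are recorded
        return true;
    }
    CoreMode getCoreMode() const { return mode_; }
    CoreAllocationPolicy getCoreAllocationPolicy() const { return policy_; }
    int getNumCores() const { return num_cores_; }
    const std::vector<CoreId>& getCoreIds() const { return core_ids_; }

private:
    CoreMode mode_ = CoreMode::Single;
    CoreAllocationPolicy policy_ = CoreAllocationPolicy::Auto;
    int num_cores_ = 8;  // assumed initial value: all local cores
    std::vector<CoreId> core_ids_;
};
```

In this sketch, calling setSingleCoreMode(2) leaves getCoreIds() empty, because under the Auto policy the concrete cores are only known once the model is launched.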

◆ setSingleCoreMode() [2/2]

bool mobilint::ModelConfig::setSingleCoreMode ( std::vector< CoreId > core_ids )

Sets the model to use single-core mode for inference with a specific set of NPU local cores.

In single-core mode, each local core executes model inference independently. The user can specify a vector of CoreIds to determine which cores to use for inference.

Parameters

[in] core_ids A vector of CoreIds to be used for model inference.

Returns: true if the mode was successfully set, false otherwise.
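As a rough illustration of passing an explicit core list, the sketch below builds the four CoreIds of one cluster and hands them to a stand-in class. The Core enum, the CoreId layout, the coresOfCluster helper, the empty-list failure condition, and the assumption that getNumCores() equals the list size are hypothetical; only Cluster::Cluster0 and Cluster::Cluster1 are named in this reference.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-ins; Core and the CoreId layout are assumptions.
enum class Cluster { Cluster0, Cluster1 };
enum class Core { Core0, Core1, Core2, Core3 };
struct CoreId { Cluster cluster; Core core; };

// Hypothetical helper: the four local cores of one cluster (Aries layout).
std::vector<CoreId> coresOfCluster(Cluster cluster) {
    return {{cluster, Core::Core0}, {cluster, Core::Core1},
            {cluster, Core::Core2}, {cluster, Core::Core3}};
}

// Models only the explicit-core overload of setSingleCoreMode.
class ModelConfigSketch {
public:
    bool setSingleCoreMode(std::vector<CoreId> core_ids) {
        if (core_ids.empty()) return false;  // assumed failure condition
        core_ids_ = std::move(core_ids);
        return true;
    }
    // Assumption: the core count equals the number of listed CoreIds.
    int getNumCores() const { return static_cast<int>(core_ids_.size()); }
    const std::vector<CoreId>& getCoreIds() const { return core_ids_; }

private:
    std::vector<CoreId> core_ids_;
};
```

Unlike the num_cores overload, this variant records the chosen cores, so getCoreIds() returns the list that was passed in.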

◆ setMultiCoreMode()

bool mobilint::ModelConfig::setMultiCoreMode ( std::vector< Cluster > clusters = {Cluster::Cluster0, Cluster::Cluster1} )

Sets the model to use multi-core mode for batch inference.

In multi-core mode, on Aries NPU, the four local cores within a cluster work together to process batch inference tasks efficiently. This mode is optimized for batch processing.

Note: By default, the configuration is set to use all clusters.

Parameters

[in] clusters A vector of clusters to be used for multi-core batch inference.

Returns: true if the mode was successfully set, false otherwise.

◆ setGlobal4CoreMode()

bool mobilint::ModelConfig::setGlobal4CoreMode ( std::vector< Cluster > clusters = {Cluster::Cluster0, Cluster::Cluster1} )

Sets the model to use global4-core mode for inference with a specified set of NPU clusters.

The Aries NPU has two clusters, each consisting of four local cores. In global4-core mode, the four local cores within the same cluster work together to execute model inference.

Note: By default, the configuration is set to use all clusters.

Parameters

[in] clusters A vector of clusters to be used for model inference.

Returns: true if the mode was successfully set, false otherwise.
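The default argument in the signature means the call can be made with or without a cluster list. The sketch below mimics that shape with a stand-in class; the failure conditions and the numCoreGroups helper are assumptions for illustration, not part of the documented API.

```cpp
#include <cassert>
#include <utility>
#include <vector>

enum class Cluster { Cluster0, Cluster1 };  // stand-in for the real enum

// Models only the cluster selection of setGlobal4CoreMode.
class ModelConfigSketch {
public:
    // Same default argument as the documented signature.
    bool setGlobal4CoreMode(
        std::vector<Cluster> clusters = {Cluster::Cluster0, Cluster::Cluster1}) {
        // Assumed failure conditions: no cluster, or more clusters than exist.
        if (clusters.empty() || clusters.size() > 2) return false;
        clusters_ = std::move(clusters);
        return true;
    }
    const std::vector<Cluster>& getClusters() const { return clusters_; }
    // Hypothetical helper: one four-core group per selected cluster.
    int numCoreGroups() const { return static_cast<int>(clusters_.size()); }

private:
    std::vector<Cluster> clusters_;
};
```

Calling setGlobal4CoreMode() with no argument selects both clusters, while setGlobal4CoreMode({Cluster::Cluster1}) restricts the model to a single four-core group.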

◆ setGlobal8CoreMode()

bool mobilint::ModelConfig::setGlobal8CoreMode ( )

Sets the model to use global8-core mode for inference.

The Aries NPU has two clusters, each consisting of four local cores. In global8-core mode, all eight local cores across the two clusters work together to execute model inference.

Returns: true if the mode was successfully set, false otherwise.

◆ getCoreMode()

CoreMode mobilint::ModelConfig::getCoreMode ( ) const

inline

Gets the core mode to be applied to the model.

This reflects the core mode that will be used when the model is created.

Returns: The CoreMode to be applied to the model.

Definition at line 318 of file type.h.

◆ getCoreAllocationPolicy()

CoreAllocationPolicy mobilint::ModelConfig::getCoreAllocationPolicy ( ) const

inline

Gets the core allocation policy to be applied to the model.

This reflects the core allocation policy that will be used when the model is created.

Returns: The CoreAllocationPolicy to be applied to the model.

Definition at line 328 of file type.h.

◆ getNumCores()

int mobilint::ModelConfig::getNumCores ( ) const

inline

Gets the number of cores to be allocated for the model.

This represents the number of cores that will be allocated for inference when the model is launched to the NPU.

Returns: The number of cores to be allocated for the model.

Definition at line 338 of file type.h.

◆ forceSingleNPUBundle()

bool mobilint::ModelConfig::forceSingleNPUBundle ( int npu_bundle_index )

Forces the use of a specific NPU bundle.

This function forces the selection of a specific NPU bundle. If a non-negative index is provided, the corresponding NPU bundle is selected and runs without CPU offloading. If -1 is provided, all NPU bundles are used with CPU offloading enabled.

Parameters

[in] npu_bundle_index The index of the NPU bundle to force. A non-negative integer selects a specific NPU bundle (runs without CPU offloading), or -1 to enable all NPU bundles with CPU offloading.

Returns: true if the index is valid and the NPU bundle is successfully set, false if the index is invalid (less than -1).
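The index rules above can be summarised in a small stand-in. ModelConfigSketch and cpuOffloadingEnabled are hypothetical names used only for illustration. Whether the real function also rejects indices beyond the number of bundles in the model is not stated here, so the sketch checks only the documented lower bound.

```cpp
#include <cassert>

// Models only the documented index handling of forceSingleNPUBundle.
class ModelConfigSketch {
public:
    bool forceSingleNPUBundle(int npu_bundle_index) {
        if (npu_bundle_index < -1) return false;  // documented invalid range
        forced_bundle_ = npu_bundle_index;
        return true;
    }
    int getForcedNPUBundleIndex() const { return forced_bundle_; }
    // Hypothetical helper: CPU offloading is on only when no bundle is forced.
    bool cpuOffloadingEnabled() const { return forced_bundle_ == -1; }

private:
    int forced_bundle_ = -1;  // -1 means "no bundle forced"
};
```

Passing -1 after forcing a bundle returns the sketch to its initial state, with all NPU bundles in use and CPU offloading enabled.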

◆ getForcedNPUBundleIndex()

int mobilint::ModelConfig::getForcedNPUBundleIndex ( ) const

inline

Retrieves the index of the forced NPU bundle.

This function returns the index of the NPU bundle that has been forced using the forceSingleNPUBundle function. If no NPU bundle is forced, the returned value will be -1.

Returns: The index of the forced NPU bundle, or -1 if no bundle is forced.

Definition at line 366 of file type.h.

◆ getCoreIds()

const std::vector< CoreId > & mobilint::ModelConfig::getCoreIds ( ) const

inline

Returns the list of NPU CoreIds to be used for model inference.

This function returns a reference to the vector of NPU CoreIds that the model will use for inference. If the configuration was set with setSingleCoreMode(int num_cores), the core allocation policy is CoreAllocationPolicy::Auto and the returned vector is empty.

Returns: A constant reference to the vector of NPU CoreIds.

Definition at line 378 of file type.h.

◆ getClusters()

const std::vector< Cluster > & mobilint::ModelConfig::getClusters ( ) const

inline

Definition at line 380 of file type.h.

◆ setAsyncPipelineEnabled()

void mobilint::ModelConfig::setAsyncPipelineEnabled ( bool enable )

Enables or disables the asynchronous pipeline required for asynchronous inference.

Call this function with enable set to true if you intend to use Model::inferAsync or Model::inferAsyncCHW, as the asynchronous pipeline is necessary for their operation.

If you are only using synchronous inference, such as Model::infer or Model::inferCHW, it is recommended to keep the asynchronous pipeline disabled to avoid unnecessary overhead.

Parameters

[in] enable Set to true to enable the asynchronous pipeline; set to false to disable it.

◆ getAsyncPipelineEnabled()

bool mobilint::ModelConfig::getAsyncPipelineEnabled ( ) const

inline

Returns whether the asynchronous pipeline is enabled in this configuration.

Returns: true if the asynchronous pipeline is enabled; false otherwise.

Definition at line 404 of file type.h.

◆ setActivationSlots()

void mobilint::ModelConfig::setActivationSlots ( int count )

Sets the number of activation buffer slots for a model that supports multi-activation.

Call this function if you want to set the number of activation buffer slots manually.

If you do not call this function, the default number of activation buffer slots depends on the CoreMode:

CoreMode::Single : 2 * (the number of target core ids)
CoreMode::Multi : 2 * (the number of target clusters)
CoreMode::Global4 : 2 * (the number of target clusters)
CoreMode::Global8 : 2

Note: This function has no effect on MXQ files earlier than MXQv7. Currently, the activation slot count of LLM models is fixed to 1 and count is ignored.

Parameters

[in] count The number of activation buffer slots. Must be >= 1.
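The default slot counts listed for each CoreMode can be written out as a small function. defaultActivationSlots is a hypothetical helper, not part of the library, and the CoreMode enum here is a stand-in for the real one in type.h.

```cpp
#include <cassert>

enum class CoreMode { Single, Multi, Global4, Global8 };  // stand-in

// Hypothetical helper reproducing the documented default slot counts.
// num_core_ids: number of target core ids (used in single-core mode).
// num_clusters: number of target clusters (used in multi and global4 modes).
int defaultActivationSlots(CoreMode mode, int num_core_ids, int num_clusters) {
    switch (mode) {
        case CoreMode::Single:  return 2 * num_core_ids;
        case CoreMode::Multi:   return 2 * num_clusters;
        case CoreMode::Global4: return 2 * num_clusters;
        case CoreMode::Global8: return 2;
    }
    return 2;  // not reached for valid enumerators; avoids a compiler warning
}
```

For example, single-core mode on all eight local cores gives 16 slots by default, while global4-core mode on one cluster gives 2.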

◆ getActivationSlots()

int mobilint::ModelConfig::getActivationSlots ( ) const

inline

Returns the activation buffer slot count.

Note: The returned value is not meaningful for MXQ files earlier than MXQv7.

Returns: The activation buffer slot count.

Definition at line 435 of file type.h.

◆ setGlobalCoreMode()

bool mobilint::ModelConfig::setGlobalCoreMode ( std::vector< Cluster > clusters )

deprecated

Member Data Documentation

◆ early_latencies

std::vector<uint64_t> mobilint::ModelConfig::early_latencies

Deprecated: This setting has no effect.

Definition at line 444 of file type.h.

◆ finish_latencies

std::vector<uint64_t> mobilint::ModelConfig::finish_latencies

Deprecated: This setting has no effect.

Definition at line 448 of file type.h.

The documentation for this class was generated from the following file:

type.h

ModelConfig Class Reference
