ModelConfig Class Reference
Runtime Library v0.30
Mobilint SDK qb
Configures the core mode and core allocation of a model for NPU inference.
#include <type.h>
Public Member Functions

 ModelConfig ()
  Default constructor. This default-constructed object is initially set to single-core mode with all NPU local cores included.
bool setSingleCoreMode (int num_cores)
  Sets the model to use single-core mode for inference with a specified number of local cores.
bool setSingleCoreMode (std::vector< CoreId > core_ids)
  Sets the model to use single-core mode for inference with a specific set of NPU local cores.
bool setMultiCoreMode (std::vector< Cluster > clusters)
  Sets the model to use multi-core mode for batch inference.
bool setGlobal4CoreMode (std::vector< Cluster > clusters)
  Sets the model to use global4-core mode for inference with a specified set of NPU clusters.
bool setGlobal8CoreMode ()
  Sets the model to use global8-core mode for inference.
CoreMode getCoreMode () const
  Gets the core mode to be applied to the model.
CoreAllocationPolicy getCoreAllocationPolicy () const
  Gets the core allocation policy to be applied to the model.
int getNumCores () const
  Gets the number of cores to be allocated for the model.
bool forceSingleNPUBundle (int npu_bundle_index)
  Forces the use of a specific NPU bundle.
int getForcedNPUBundleIndex () const
  Retrieves the index of the forced NPU bundle.
const std::vector< CoreId > & getCoreIds () const
  Returns the list of NPU CoreIds to be used for model inference.
void setAsyncPipelineEnabled (bool enable)
  Enables or disables the asynchronous pipeline required for asynchronous inference.
bool getAsyncPipelineEnabled () const
  Returns whether the asynchronous pipeline is enabled in this configuration.
 ModelConfig (int num_cores)  (deprecated)
bool includeAllCores ()  (deprecated)
bool excludeAllCores ()  (deprecated)
bool include (Cluster cluster, Core core)  (deprecated)
bool include (Cluster cluster)  (deprecated)
bool include (Core core)  (deprecated)
bool exclude (Cluster cluster, Core core)  (deprecated)
bool exclude (Cluster cluster)  (deprecated)
bool exclude (Core core)  (deprecated)
bool setGlobalCoreMode (std::vector< Cluster > clusters)  (deprecated)
bool setAutoMode (int num_cores=1)  (deprecated)
bool setManualMode ()  (deprecated)

Public Attributes

SchedulePolicy schedule_policy = SchedulePolicy::FIFO
LatencySetPolicy latency_set_policy = LatencySetPolicy::Auto
MaintenancePolicy maintenance_policy = MaintenancePolicy::Maintain
std::vector< uint64_t > early_latencies
std::vector< uint64_t > finish_latencies
Detailed Description
Configures the core mode and core allocation of a model for NPU inference.
The ModelConfig class provides methods for setting the core mode and allocating cores for NPU inference. Supported core modes are single-core, multi-core, global4-core, and global8-core. Users can also specify which cores to allocate for the model. Additionally, the configuration offers an option to enforce the use of a specific NPU bundle.
- Note
- Deprecated functions are included for backward compatibility, but it is recommended to use the newer core mode configuration methods.
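The typical flow is to default-construct a ModelConfig, select a core mode, and then hand the object to the runtime when the model is created and launched. A minimal sketch, assuming the <type.h> header and mobilint namespace shown in this reference (the umbrella header of a given SDK installation may differ):

    // Minimal sketch of configuring a model for NPU inference.
    #include <type.h>   // assumed include path, as listed at the top of this page
    #include <iostream>

    int main() {
        mobilint::ModelConfig config;   // default: single-core mode, all local cores included

        // Request single-core mode with two local cores; the runtime allocates
        // the concrete cores automatically when the model is launched.
        if (!config.setSingleCoreMode(2)) {
            std::cerr << "setSingleCoreMode failed\n";
            return 1;
        }

        std::cout << "cores to allocate: " << config.getNumCores() << "\n";

        // The configured object is then supplied to the model creation/launch
        // step of the runtime (not part of this class reference).
        return 0;
    }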
Constructor & Destructor Documentation
◆ ModelConfig()
explicit mobilint::ModelConfig::ModelConfig ( int num_cores )
deprecated
Member Function Documentation
◆ setSingleCoreMode() [1/2]
bool mobilint::ModelConfig::setSingleCoreMode ( int num_cores )
Sets the model to use single-core mode for inference with a specified number of local cores.
In single-core mode, each local core executes model inference independently. The number of cores used is specified by the num_cores parameter, and the core allocation policy is set to CoreAllocationPolicy::Auto, meaning the model will be automatically allocated to available local cores when the model is launched to the NPU, specifically when the Model::launch function is called.
- Parameters
- [in] num_cores The number of local cores to use for inference.
- Returns
- true if the mode was successfully set, false otherwise.
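For illustration, a sketch of this overload under automatic allocation (the helper name is not part of the API):

    #include <type.h>   // assumed include path for ModelConfig

    // Request four local cores; the concrete cores are chosen by the runtime at
    // Model::launch time (CoreAllocationPolicy::Auto), so getCoreIds() stays
    // empty until then.
    bool configureAutoSingleCore(mobilint::ModelConfig& config) {
        return config.setSingleCoreMode(4);
    }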
◆ setSingleCoreMode() [2/2]
bool mobilint::ModelConfig::setSingleCoreMode ( std::vector< CoreId > core_ids )
Sets the model to use single-core mode for inference with a specific set of NPU local cores.
In single-core mode, each local core executes model inference independently. The user can specify a vector of CoreIds to determine which cores to use for inference.
- Parameters
- [in] core_ids A vector of CoreIds to be used for model inference.
- Returns
- true if the mode was successfully set, false otherwise.
◆ setMultiCoreMode()
bool mobilint::ModelConfig::setMultiCoreMode ( std::vector< Cluster > clusters )
Sets the model to use multi-core mode for batch inference.
In multi-core mode on the Aries NPU, the four local cores within a cluster work together to process batch inference tasks. This mode is optimized for batch processing.
- Parameters
- [in] clusters A vector of clusters to be used for multi-core batch inference.
- Returns
- true if the mode was successfully set, false otherwise.
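A sketch of selecting one cluster for batch inference; the Cluster enumerator name used below is illustrative only and should be checked against the actual Cluster definition in type.h:

    #include <type.h>   // assumed include path for ModelConfig and Cluster
    #include <vector>

    // Configure multi-core (batch) mode on a single cluster.
    // Cluster::Cluster0 is a hypothetical enumerator name used for illustration.
    bool configureBatchMode(mobilint::ModelConfig& config) {
        std::vector<mobilint::Cluster> clusters = {mobilint::Cluster::Cluster0};
        return config.setMultiCoreMode(clusters);
    }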
◆ setGlobal4CoreMode()
bool mobilint::ModelConfig::setGlobal4CoreMode ( std::vector< Cluster > clusters )
Sets the model to use global4-core mode for inference with a specified set of NPU clusters.
The Aries NPU has two clusters, each consisting of four local cores. In global4-core mode, the four local cores within the same cluster work together to execute model inference.
- Parameters
- [in] clusters A vector of clusters to be used for model inference.
- Returns
- true if the mode was successfully set, false otherwise.
◆ setGlobal8CoreMode()
bool mobilint::ModelConfig::setGlobal8CoreMode ( )
Sets the model to use global8-core mode for inference.
The Aries NPU has two clusters, each consisting of four local cores. In global8-core mode, all eight local cores across the two clusters work together to execute model inference.
- Returns
- true if the mode was successfully set, false otherwise.
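A sketch of enabling this mode and inspecting the result through the getters documented below (the expectation that getNumCores reports 8 in this mode is an assumption, not stated in this reference):

    #include <type.h>   // assumed include path for ModelConfig
    #include <iostream>

    // Use all eight local cores across both Aries clusters for one model.
    void configureGlobal8(mobilint::ModelConfig& config) {
        if (config.setGlobal8CoreMode()) {
            // getNumCores() is expected to report 8 in this mode (assumption).
            std::cout << "cores to allocate: " << config.getNumCores() << "\n";
        }
    }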
◆ getCoreMode()
CoreMode mobilint::ModelConfig::getCoreMode ( ) const [inline]
Gets the core mode to be applied to the model.
◆ getCoreAllocationPolicy()
CoreAllocationPolicy mobilint::ModelConfig::getCoreAllocationPolicy ( ) const [inline]
Gets the core allocation policy to be applied to the model.
This reflects the core allocation policy that will be used when the model is created.
- Returns
- The CoreAllocationPolicy to be applied to the model.
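For example, a small helper (name illustrative) can distinguish automatic from explicit allocation, assuming CoreAllocationPolicy lives in the mobilint namespace alongside ModelConfig:

    #include <type.h>   // assumed include path for ModelConfig and CoreAllocationPolicy

    // True when the runtime will pick the cores itself at Model::launch time,
    // as is the case after setSingleCoreMode(int num_cores).
    bool usesAutoAllocation(const mobilint::ModelConfig& config) {
        return config.getCoreAllocationPolicy() == mobilint::CoreAllocationPolicy::Auto;
    }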
◆ getNumCores()
int mobilint::ModelConfig::getNumCores ( ) const [inline]
Gets the number of cores to be allocated for the model.
◆ forceSingleNPUBundle()
bool mobilint::ModelConfig::forceSingleNPUBundle ( int npu_bundle_index )
Forces the use of a specific NPU bundle.
This function forces the selection of a specific NPU bundle. If a non-negative index is provided, the corresponding NPU bundle is selected and runs without CPU offloading. If -1 is provided, all NPU bundles are used with CPU offloading enabled.
- Parameters
- [in] npu_bundle_index The index of the NPU bundle to force. A non-negative integer selects a specific NPU bundle (runs without CPU offloading), or -1 to enable all NPU bundles with CPU offloading.
- Returns
- true if the index is valid and the NPU bundle is successfully set, false if the index is invalid (less than -1).
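A sketch of pinning execution to one bundle and reading the index back via getForcedNPUBundleIndex (documented below); the helper name is illustrative:

    #include <type.h>   // assumed include path for ModelConfig
    #include <iostream>

    // Force NPU bundle 0: the model runs on that bundle without CPU offloading.
    // Passing -1 instead would use all bundles with CPU offloading enabled.
    void pinToBundleZero(mobilint::ModelConfig& config) {
        if (config.forceSingleNPUBundle(0)) {
            std::cout << "forced bundle index: " << config.getForcedNPUBundleIndex() << "\n";
        }
    }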
◆ getForcedNPUBundleIndex()
int mobilint::ModelConfig::getForcedNPUBundleIndex ( ) const [inline]
Retrieves the index of the forced NPU bundle.
This function returns the index of the NPU bundle that has been forced using the forceSingleNPUBundle function. If no NPU bundle is forced, the returned value will be -1.
- Returns
- The index of the forced NPU bundle, or -1 if no bundle is forced.
◆ getCoreIds()
const std::vector< CoreId > & mobilint::ModelConfig::getCoreIds ( ) const [inline]
Returns the list of NPU CoreIds to be used for model inference.
This function returns a reference to the vector of NPU CoreIds that the model will use for inference. When setSingleCoreMode(int num_cores) is called and the core allocation policy is set to CoreAllocationPolicy::Auto, it will return an empty vector.
- Returns
- A constant reference to the vector of NPU CoreIds.
◆ setAsyncPipelineEnabled()
void mobilint::ModelConfig::setAsyncPipelineEnabled ( bool enable )
Enables or disables the asynchronous pipeline required for asynchronous inference.
Call this function with enable set to true if you intend to use Model::inferAsync or Model::inferAsyncCHW, as the asynchronous pipeline is necessary for their operation.
If you are only using synchronous inference, such as Model::infer or Model::inferCHW, it is recommended to keep the asynchronous pipeline disabled to avoid unnecessary overhead.
- Parameters
- [in] enable Set to true to enable the asynchronous pipeline; set to false to disable it.
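A sketch of the intended usage (helper name illustrative):

    #include <type.h>   // assumed include path for ModelConfig

    // Enable the asynchronous pipeline only when asynchronous inference
    // (Model::inferAsync / Model::inferAsyncCHW) will be used; otherwise keep
    // it disabled to avoid unnecessary overhead.
    void prepareConfig(mobilint::ModelConfig& config, bool will_use_async) {
        config.setAsyncPipelineEnabled(will_use_async);
    }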
◆ getAsyncPipelineEnabled()
bool mobilint::ModelConfig::getAsyncPipelineEnabled ( ) const [inline]
Returns whether the asynchronous pipeline is enabled in this configuration.
◆ includeAllCores()
bool mobilint::ModelConfig::includeAllCores ( )
deprecated
◆ excludeAllCores()
bool mobilint::ModelConfig::excludeAllCores ( )
deprecated
◆ include() [1/3]
bool mobilint::ModelConfig::include ( Cluster cluster, Core core )
deprecated
◆ include() [2/3]
bool mobilint::ModelConfig::include ( Cluster cluster )
deprecated
◆ include() [3/3]
bool mobilint::ModelConfig::include ( Core core )
deprecated
◆ exclude() [1/3]
bool mobilint::ModelConfig::exclude ( Cluster cluster, Core core )
deprecated
◆ exclude() [2/3]
bool mobilint::ModelConfig::exclude ( Cluster cluster )
deprecated
◆ exclude() [3/3]
bool mobilint::ModelConfig::exclude ( Core core )
deprecated
◆ setGlobalCoreMode()
bool mobilint::ModelConfig::setGlobalCoreMode ( std::vector< Cluster > clusters )
deprecated
◆ setAutoMode()
bool mobilint::ModelConfig::setAutoMode ( int num_cores = 1 )
deprecated
◆ setManualMode()
bool mobilint::ModelConfig::setManualMode ( )
deprecated
Member Data Documentation
◆ schedule_policy
SchedulePolicy mobilint::ModelConfig::schedule_policy = SchedulePolicy::FIFO
- Deprecated
- This setting has no effect.
◆ latency_set_policy
LatencySetPolicy mobilint::ModelConfig::latency_set_policy = LatencySetPolicy::Auto
- Deprecated
- This setting has no effect.
◆ maintenance_policy
MaintenancePolicy mobilint::ModelConfig::maintenance_policy = MaintenancePolicy::Maintain
- Deprecated
- This setting has no effect.
◆ early_latencies
std::vector< uint64_t > mobilint::ModelConfig::early_latencies
- Deprecated
- This setting has no effect.
◆ finish_latencies
std::vector< uint64_t > mobilint::ModelConfig::finish_latencies
- Deprecated
- This setting has no effect.
The documentation for this class was generated from the following file:
type.h