mobilint::Model Class Reference

Runtime Library v0.30

Represents an AI model loaded from an MXQ file.

#include <model.h>

Public Member Functions

 Model (const Model &other)=delete
 Model (Model &&other) noexcept
Model & operator= (const Model &rhs)=delete
Model & operator= (Model &&rhs) noexcept
StatusCode launch (Accelerator &acc)
 Launches the model on the specified Accelerator, which represents the actual NPU.
StatusCode dispose ()
 Disposes of the model loaded onto the NPU.
CoreMode getCoreMode () const
 Retrieves the core mode of the model.
bool isTarget (CoreId core_id) const
 Checks if the NPU core specified by CoreId is the target of the model. In other words, whether the model is configured to use the given NPU core.
std::vector< CoreId > getTargetCores () const
 Returns the NPU cores the model is configured to use.
StatusCode inferSpeedrun (int variant_idx=0)
 Development-only API for measuring pure NPU inference speed.
int getNumModelVariants () const
 Returns the total number of model variants available in this model.
std::unique_ptr< ModelVariantHandle > getModelVariantHandle (int variant_idx, StatusCode &sc) const
 Retrieves a handle to the specified model variant.
const std::vector< std::vector< int64_t > > & getModelInputShape () const
 Returns the input shape of the model.
const std::vector< std::vector< int64_t > > & getModelOutputShape () const
 Returns the output shape of the model.
const std::vector< BufferInfo > & getInputBufferInfo () const
 Returns the input buffer information for the model.
const std::vector< BufferInfo > & getOutputBufferInfo () const
 Returns the output buffer information of the model.
std::vector< Scale > getInputScale () const
 Returns the input quantization scale(s) of the model.
std::vector< Scale > getOutputScale () const
 Returns the output quantization scale(s) of the model.
uint32_t getIdentifier () const
 Returns the model's unique identifier.
std::string getModelPath () const
 Returns the path to the MXQ model file associated with the Model.
std::vector< CacheInfo > getCacheInfos () const
 Returns information about the model's KV cache.
NHWC float-to-float inference

Performs inference with input and output elements of type float in NHWC (batch N, height H, width W, channels C) or HWC format.

Two input-output type pairs are supported:

  1. std::vector<NDArray<float>> for both input and output
    • Recommended approach, as NDArray allows the maccel runtime to avoid unnecessary data copies internally.
  2. std::vector<float*> for input and std::vector<std::vector<float>> for output
    • Provided for user convenience, but results in unavoidable extra copies within the maccel runtime.
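
For example, a minimal sketch using the convenience overload. It assumes `model` has already been created with Model::create and launched with Model::launch, and the buffer size below is illustrative; use Model::getModelInputShape to determine the real input size.

std::vector<float> image(224 * 224 * 3);   // HWC-ordered input data (illustrative size)
std::vector<float*> input{image.data()};
std::vector<std::vector<float>> output;    // one vector per model output, filled by the runtime

mobilint::StatusCode sc = model->infer(input, output);
if (!sc) {
    // Handle error appropriately
}
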
StatusCode infer (const std::vector< NDArray< float > > &input, std::vector< NDArray< float > > &output)
 Performs inference.
std::vector< NDArray< float > > infer (const std::vector< NDArray< float > > &input, StatusCode &sc)
 This overload differs from the above function in that it directly returns the inference results instead of modifying an output parameter.
StatusCode infer (const std::vector< float * > &input, std::vector< std::vector< float > > &output)
 This overload is provided for convenience but may result in additional data copies within the maccel runtime.
std::vector< std::vector< float > > infer (const std::vector< float * > &input, StatusCode &sc)
 This overload is provided for convenience but may result in additional data copies within the maccel runtime.
StatusCode infer (const std::vector< float * > &input, std::vector< std::vector< float > > &output, const std::vector< std::vector< int64_t > > &shape)
 This overload is provided for convenience but may result in additional data copies within the maccel runtime.
std::vector< std::vector< float > > infer (const std::vector< float * > &input, const std::vector< std::vector< int64_t > > &shape, StatusCode &sc)
 This overload is provided for convenience but may result in additional data copies within the maccel runtime.
StatusCode infer (const std::vector< NDArray< float > > &input, std::vector< NDArray< float > > &output, uint32_t cache_size)
 This overload supports inference with KV cache.
std::vector< NDArray< float > > infer (const std::vector< NDArray< float > > &input, uint32_t cache_size, StatusCode &sc)
 This overload supports inference with KV cache.
StatusCode infer (const std::vector< float * > &input, std::vector< std::vector< float > > &output, const std::vector< std::vector< int64_t > > &shape, uint32_t cache_size)
 This overload supports inference with KV cache.
std::vector< std::vector< float > > infer (const std::vector< float * > &input, const std::vector< std::vector< int64_t > > &shape, uint32_t cache_size, StatusCode &sc)
 This overload supports inference with KV cache.
NCHW float-to-float inference

Performs inference with input and output elements of type float in NCHW (batch N, channels C, height H, width W) or CHW format.

Two input-output type pairs are supported:

  1. std::vector<NDArray<float>> for both input and output
    • Recommended approach, as NDArray allows the maccel runtime to avoid unnecessary data copies internally.
  2. std::vector<float*> for input and std::vector<std::vector<float>> for output
    • Provided for user convenience, but results in unavoidable extra copies within the maccel runtime.
Note
CHW is not the recommended format, as the NPU natively operates on HWC-ordered data. When input is provided in CHW format, it will be transposed internally, introducing additional overhead.
If your data is in HWC format, use Model::infer instead of Model::inferCHW, as it avoids unnecessary format conversion.
StatusCode inferCHW (const std::vector< NDArray< float > > &input, std::vector< NDArray< float > > &output)
 Performs inference.
std::vector< NDArray< float > > inferCHW (const std::vector< NDArray< float > > &input, StatusCode &sc)
 This overload differs from the above function in that it directly returns the inference results instead of modifying an output parameter.
StatusCode inferCHW (const std::vector< float * > &input, std::vector< std::vector< float > > &output)
 This overload is provided for convenience but may result in additional data copies within the maccel runtime.
std::vector< std::vector< float > > inferCHW (const std::vector< float * > &input, StatusCode &sc)
 This overload is provided for convenience but may result in additional data copies within the maccel runtime.
StatusCode inferCHW (const std::vector< float * > &input, std::vector< std::vector< float > > &output, const std::vector< std::vector< int64_t > > &shape)
 This overload is provided for convenience but may result in additional data copies within the maccel runtime.
std::vector< std::vector< float > > inferCHW (const std::vector< float * > &input, const std::vector< std::vector< int64_t > > &shape, StatusCode &sc)
 This overload is provided for convenience but may result in additional data copies within the maccel runtime.
StatusCode inferCHW (const std::vector< NDArray< float > > &input, std::vector< NDArray< float > > &output, uint32_t cache_size)
 This overload supports inference with KV cache.
std::vector< NDArray< float > > inferCHW (const std::vector< NDArray< float > > &input, uint32_t cache_size, StatusCode &sc)
 This overload supports inference with KV cache.
StatusCode inferCHW (const std::vector< float * > &input, std::vector< std::vector< float > > &output, const std::vector< std::vector< int64_t > > &shape, uint32_t cache_size)
 This overload supports inference with KV cache.
std::vector< std::vector< float > > inferCHW (const std::vector< float * > &input, const std::vector< std::vector< int64_t > > &shape, uint32_t cache_size, StatusCode &sc)
 This overload supports inference with KV cache.
NHWC int8_t-to-int8_t inference

Performs inference with input and output elements of type int8_t in NHWC (batch N, height H, width W, channels C) or HWC format.

Using these inference APIs requires manually quantizing float input values to int8_t and dequantizing the int8_t output values back to float.

Note
These APIs are intended for advanced use rather than typical usage.
StatusCode infer (const std::vector< NDArray< int8_t > > &input, std::vector< NDArray< int8_t > > &output)
std::vector< NDArray< int8_t > > infer (const std::vector< NDArray< int8_t > > &input, StatusCode &sc)
StatusCode infer (const std::vector< int8_t * > &input, std::vector< std::vector< int8_t > > &output)
std::vector< std::vector< int8_t > > infer (const std::vector< int8_t * > &input, StatusCode &sc)
StatusCode infer (const std::vector< int8_t * > &input, std::vector< std::vector< int8_t > > &output, const std::vector< std::vector< int64_t > > &shape)
std::vector< std::vector< int8_t > > infer (const std::vector< int8_t * > &input, const std::vector< std::vector< int64_t > > &shape, StatusCode &sc)
StatusCode infer (const std::vector< NDArray< int8_t > > &input, std::vector< NDArray< int8_t > > &output, uint32_t cache_size)
std::vector< NDArray< int8_t > > infer (const std::vector< NDArray< int8_t > > &input, uint32_t cache_size, StatusCode &sc)
StatusCode infer (const std::vector< int8_t * > &input, std::vector< std::vector< int8_t > > &output, const std::vector< std::vector< int64_t > > &shape, uint32_t cache_size)
std::vector< std::vector< int8_t > > infer (const std::vector< int8_t * > &input, const std::vector< std::vector< int64_t > > &shape, uint32_t cache_size, StatusCode &sc)
NCHW int8_t-to-int8_t inference

Performs inference with input and output elements of type int8_t in NCHW (batch N, channels C, height H, width W) or CHW format.

Using these inference APIs requires manually quantizing float input values to int8_t and dequantizing the int8_t output values back to float.

Note
These APIs are intended for advanced use rather than typical usage.
StatusCode inferCHW (const std::vector< NDArray< int8_t > > &input, std::vector< NDArray< int8_t > > &output)
std::vector< NDArray< int8_t > > inferCHW (const std::vector< NDArray< int8_t > > &input, StatusCode &sc)
StatusCode inferCHW (const std::vector< int8_t * > &input, std::vector< std::vector< int8_t > > &output)
std::vector< std::vector< int8_t > > inferCHW (const std::vector< int8_t * > &input, StatusCode &sc)
StatusCode inferCHW (const std::vector< int8_t * > &input, std::vector< std::vector< int8_t > > &output, const std::vector< std::vector< int64_t > > &shape)
std::vector< std::vector< int8_t > > inferCHW (const std::vector< int8_t * > &input, const std::vector< std::vector< int64_t > > &shape, StatusCode &sc)
StatusCode inferCHW (const std::vector< NDArray< int8_t > > &input, std::vector< NDArray< int8_t > > &output, uint32_t cache_size)
std::vector< NDArray< int8_t > > inferCHW (const std::vector< NDArray< int8_t > > &input, uint32_t cache_size, StatusCode &sc)
StatusCode inferCHW (const std::vector< int8_t * > &input, std::vector< std::vector< int8_t > > &output, const std::vector< std::vector< int64_t > > &shape, uint32_t cache_size)
std::vector< std::vector< int8_t > > inferCHW (const std::vector< int8_t * > &input, const std::vector< std::vector< int64_t > > &shape, uint32_t cache_size, StatusCode &sc)
NHWC int8_t-to-float inference

Performs inference with input and output elements of type int8_t in NHWC (batch N, height H, width W, channels C) or HWC format.

Using these inference APIs requires manually quantizing float input values to int8_t.

Note
These APIs are intended for advanced use rather than typical usage.
std::vector< NDArray< float > > inferToFloat (const std::vector< NDArray< int8_t > > &input, StatusCode &sc)
std::vector< std::vector< float > > inferToFloat (const std::vector< int8_t * > &input, StatusCode &sc)
std::vector< std::vector< float > > inferToFloat (const std::vector< int8_t * > &input, const std::vector< std::vector< int64_t > > &shape, StatusCode &sc)
std::vector< NDArray< float > > inferToFloat (const std::vector< NDArray< int8_t > > &input, uint32_t cache_size, StatusCode &sc)
std::vector< std::vector< float > > inferToFloat (const std::vector< int8_t * > &input, const std::vector< std::vector< int64_t > > &shape, uint32_t cache_size, StatusCode &sc)
NCHW int8_t-to-float inference

Performs inference with input and output elements of type int8_t in NCHW (batch N, channels C, height H, width W) or CHW format.

Using these inference APIs requires manually quantizing float input values to int8_t.

Note
These APIs are intended for advanced use rather than typical usage.
std::vector< NDArray< float > > inferCHWToFloat (const std::vector< NDArray< int8_t > > &input, StatusCode &sc)
std::vector< std::vector< float > > inferCHWToFloat (const std::vector< int8_t * > &input, StatusCode &sc)
std::vector< std::vector< float > > inferCHWToFloat (const std::vector< int8_t * > &input, const std::vector< std::vector< int64_t > > &shape, StatusCode &sc)
std::vector< NDArray< float > > inferCHWToFloat (const std::vector< NDArray< int8_t > > &input, uint32_t cache_size, StatusCode &sc)
std::vector< std::vector< float > > inferCHWToFloat (const std::vector< int8_t * > &input, const std::vector< std::vector< int64_t > > &shape, uint32_t cache_size, StatusCode &sc)
NHWC Buffer-to-Buffer inference

Performs inference using input and output elements in the NPU’s internal data type. The inference operates on buffers allocated via the following APIs:

  • Model::acquireInputBuffer
  • Model::acquireOutputBuffer
  • Model::acquireInputBuffers
  • Model::acquireOutputBuffers
  • ModelVariantHandle::acquireInputBuffer
  • ModelVariantHandle::acquireOutputBuffer
  • ModelVariantHandle::acquireInputBuffers
  • ModelVariantHandle::acquireOutputBuffers

Additionally, Model::repositionInputs, Model::repositionOutputs, ModelVariantHandle::repositionInputs, and ModelVariantHandle::repositionOutputs must be used to place input data into, and retrieve output data from, the acquired buffers.

Note
These APIs are intended for advanced use rather than typical usage.
StatusCode inferBuffer (const std::vector< Buffer > &input, std::vector< Buffer > &output, const std::vector< std::vector< int64_t > > &shape={}, uint32_t cache_size=0)
StatusCode inferBuffer (const std::vector< std::vector< Buffer > > &input, std::vector< std::vector< Buffer > > &output, const std::vector< std::vector< int64_t > > &shape={}, uint32_t cache_size=0)
NHWC Buffer-to-float inference

Performs inference using input and output elements in the NPU’s internal data type. The inference operates on buffers allocated via the following APIs:

  • Model::acquireInputBuffer
  • Model::acquireInputBuffers
  • ModelVariantHandle::acquireInputBuffer
  • ModelVariantHandle::acquireInputBuffers

Additionally, Model::repositionInputs and ModelVariantHandle::repositionInputs must be used to place input data into the acquired buffers.

Note
These APIs are intended for advanced use rather than typical usage.
StatusCode inferBufferToFloat (const std::vector< Buffer > &input, std::vector< NDArray< float > > &output, const std::vector< std::vector< int64_t > > &shape={}, uint32_t cache_size=0)
StatusCode inferBufferToFloat (const std::vector< std::vector< Buffer > > &input, std::vector< NDArray< float > > &output, const std::vector< std::vector< int64_t > > &shape={}, uint32_t cache_size=0)
StatusCode inferBufferToFloat (const std::vector< Buffer > &input, std::vector< std::vector< float > > &output, const std::vector< std::vector< int64_t > > &shape={}, uint32_t cache_size=0)
StatusCode inferBufferToFloat (const std::vector< std::vector< Buffer > > &input, std::vector< std::vector< float > > &output, const std::vector< std::vector< int64_t > > &shape={}, uint32_t cache_size=0)
Asynchronous Inference

Performs inference asynchronously.

To use asynchronous inference, the model must be created with a ModelConfig in which the asynchronous pipeline is enabled. This is done by calling ModelConfig::setAsyncPipelineEnabled(true) before passing the configuration to Model::create.

Example:

using namespace mobilint;

StatusCode sc;
ModelConfig mc;

// Enables support for `inferAsync` and `inferAsyncCHW`
mc.setAsyncPipelineEnabled(true);

std::unique_ptr<Model> model = Model::create("resnet50.mxq", mc, sc);
if (!sc) {
    // Handle error appropriately
}

// Now `inferAsync` can be called (`input` is a std::vector<NDArray<float>> prepared beforehand).
Future<float> future = model->inferAsync(input, sc);

Note
Functions in the inferAsync family (inferAsync, inferAsyncCHW, inferAsyncToFloat, inferAsyncCHWToFloat) typically return immediately. However, they may block if the input queue in the maccel runtime is full.
For all functions in the inferAsync family (inferAsync, inferAsyncCHW, inferAsyncToFloat, inferAsyncCHWToFloat), the data provided through the input parameter must remain unmodified until the asynchronous inference has completed. Modifying this data during execution may result in invalid results.
Currently, only CNN-based models are supported, as asynchronous execution is particularly effective for this type of workload.
Limitations:
  • RNN/LSTM and LLM models are not supported yet.
  • Models requiring CPU offloading are not supported yet.
  • Currently, only single-batch inference is supported (i.e., N = 1).
  • Currently, Buffer inference is not supported: the Buffer-based input and output types accepted by the synchronous API for advanced use cases are not yet available for asynchronous inference.
Future< float > inferAsync (const std::vector< NDArray< float > > &input, StatusCode &sc)
 Initiates asynchronous inference with input in NHWC (batch N, height H, width W, channels C) or HWC format.
Future< float > inferAsyncCHW (const std::vector< NDArray< float > > &input, StatusCode &sc)
 Initiates asynchronous inference with input in NCHW (batch N, channels C, height H, width W) or CHW format.
Future< int8_t > inferAsync (const std::vector< NDArray< int8_t > > &input, StatusCode &sc)
 This overload supports int8_t-to-int8_t asynchronous inference.
Future< int8_t > inferAsyncCHW (const std::vector< NDArray< int8_t > > &input, StatusCode &sc)
 This overload supports int8_t-to-int8_t asynchronous inference.
Future< float > inferAsyncToFloat (const std::vector< NDArray< int8_t > > &input, StatusCode &sc)
 This overload supports int8_t-to-float asynchronous inference.
Future< float > inferAsyncCHWToFloat (const std::vector< NDArray< int8_t > > &input, StatusCode &sc)
 This overload supports int8_t-to-float asynchronous inference.
Buffer Management APIs

These APIs are required when calling Model::inferBuffer or Model::inferBufferToFloat.

Buffers are acquired using:

  • acquireInputBuffer
  • acquireInputBuffers
  • acquireOutputBuffer
  • acquireOutputBuffers

Any acquired buffer must be released using:

  • releaseBuffer
  • releaseBuffers

Repositioning is handled by:

  • repositionInputs
  • repositionOutputs
Note
These APIs are intended for advanced use rather than typical usage.
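
For example, a minimal sketch of the acquire/reposition/infer/release flow for a model with float HWC input. It assumes `model` has already been created and launched; per-call status checks are omitted for brevity.

using namespace mobilint;

// Size the host-side input buffer from the model's first input shape.
const auto& in_shape = model->getModelInputShape()[0];
size_t in_count = 1;
for (int64_t d : in_shape) in_count *= d;
std::vector<float> in_data(in_count);          // fill with real input values

// Acquire NPU-side input/output buffers and place the input data into them.
std::vector<Buffer> in_buf = model->acquireInputBuffer();
std::vector<Buffer> out_buf = model->acquireOutputBuffer();
std::vector<float*> input{in_data.data()};
StatusCode sc = model->repositionInputs(input, in_buf);

// Run inference directly on the NPU-internal buffers.
sc = model->inferBuffer(in_buf, out_buf);

// Retrieve float results from the output buffers, then release the buffers.
std::vector<std::vector<float>> out_data;
sc = model->repositionOutputs(out_buf, out_data);
model->releaseBuffer(in_buf);
model->releaseBuffer(out_buf);
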
std::vector< Buffer > acquireInputBuffer (const std::vector< std::vector< int > > &seqlens={}) const
std::vector< Buffer > acquireOutputBuffer (const std::vector< std::vector< int > > &seqlens={}) const
std::vector< std::vector< Buffer > > acquireInputBuffers (const int batch_size, const std::vector< std::vector< int > > &seqlens={}) const
std::vector< std::vector< Buffer > > acquireOutputBuffers (const int batch_size, const std::vector< std::vector< int > > &seqlens={}) const
StatusCode releaseBuffer (std::vector< Buffer > &buffer) const
StatusCode releaseBuffers (std::vector< std::vector< Buffer > > &buffers) const
StatusCode repositionInputs (const std::vector< float * > &input, std::vector< Buffer > &input_buf, const std::vector< std::vector< int > > &seqlens={}) const
StatusCode repositionOutputs (const std::vector< Buffer > &output_buf, std::vector< float * > &output, const std::vector< std::vector< int > > &seqlens={}) const
StatusCode repositionOutputs (const std::vector< Buffer > &output_buf, std::vector< std::vector< float > > &output, const std::vector< std::vector< int > > &seqlens={}) const
StatusCode repositionInputs (const std::vector< float * > &input, std::vector< std::vector< Buffer > > &input_buf, const std::vector< std::vector< int > > &seqlens={}) const
StatusCode repositionOutputs (const std::vector< std::vector< Buffer > > &output_buf, std::vector< float * > &output, const std::vector< std::vector< int > > &seqlens={}) const
StatusCode repositionOutputs (const std::vector< std::vector< Buffer > > &output_buf, std::vector< std::vector< float > > &output, const std::vector< std::vector< int > > &seqlens={}) const
KV Cache Management
Note
These APIs are used for LLM models that utilize KV cache.
void resetCacheMemory ()
 Resets the KV cache memory.
StatusCode dumpCacheMemory (std::vector< std::vector< int8_t > > &bufs)
 Dumps the KV cache memory into buffers.
std::vector< std::vector< int8_t > > dumpCacheMemory (StatusCode &sc)
 Dumps the KV cache memory into buffers.
StatusCode dumpCacheMemory (const std::string &cache_dir)
 Dumps KV cache memory to files in the specified directory.
StatusCode loadCacheMemory (const std::vector< std::vector< int8_t > > &bufs)
 Loads the KV cache memory from buffers.
StatusCode loadCacheMemory (const std::string &cache_dir)
 Loads the KV cache memory from files in the specified directory.
int filterCacheTail (int cache_size, int tail_size, const std::vector< bool > &mask, StatusCode &sc)
 Filters the tail of the KV cache memory.
int moveCacheTail (int num_head, int num_tail, int cache_size, StatusCode &sc)
 Moves the tail of the KV cache memory to the end of the head.
Deprecated APIs
Note
These APIs are deprecated and should not be used.
StatusCode infer (const std::vector< float * > &input, std::vector< std::vector< float > > &output, int batch_size)
std::vector< std::vector< float > > infer (const std::vector< float * > &input, int batch_size, StatusCode &sc)
StatusCode inferHeightBatch (const std::vector< float * > &input, std::vector< std::vector< float > > &output, int height_batch_size)
SchedulePolicy getSchedulePolicy () const
LatencySetPolicy getLatencySetPolicy () const
MaintenancePolicy getMaintenancePolicy () const
uint64_t getLatencyConsumed (const int npu_op_idx) const
uint64_t getLatencyFinished (const int npu_op_idx) const
std::shared_ptr< Statistics > getStatistics () const

Static Public Member Functions

static std::unique_ptr< Model > create (const std::string &mxq_path, StatusCode &sc)
 Creates a Model object from the specified MXQ model file.
static std::unique_ptr< Model > create (const std::string &mxq_path, const ModelConfig &config, StatusCode &sc)
 Creates a Model object from the specified MXQ model file and configuration.

Friends

class Accelerator

Detailed Description

Represents an AI model loaded from an MXQ file.

This class loads an AI model from an MXQ file and provides functions to launch it on the NPU and perform inference.

Definition at line 40 of file model.h.
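
A minimal end-to-end sketch of the typical lifecycle follows. The model file name is illustrative; obtaining the Accelerator object `acc` is covered by the Accelerator class documentation and omitted here, as is preparation of the input data.

#include <model.h>

using namespace mobilint;

StatusCode sc;

// 1. Create the model from an MXQ file.
std::unique_ptr<Model> model = Model::create("resnet50.mxq", sc);
if (!sc) { /* handle error */ }

// 2. Launch the model on an Accelerator (the actual NPU).
sc = model->launch(acc);
if (!sc) { /* handle error */ }

// 3. Run inference with NHWC/HWC float data matching getModelInputShape().
std::vector<NDArray<float>> input;    // prepare input data here (omitted)
std::vector<NDArray<float>> output;
sc = model->infer(input, output);
if (!sc) { /* handle error */ }

// 4. Release the model's NPU resources when finished.
model->dispose();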

Member Function Documentation

◆ create() [1/2]

std::unique_ptr< Model > mobilint::Model::create ( const std::string & mxq_path,
StatusCode & sc )
static

Creates a Model object from the specified MXQ model file.

Parses the MXQ file and constructs a Model object. The model is initialized in single-core mode with all NPU local cores included.

Note
The created Model object must be launched before performing inference. See Model::launch for more details.
Parameters
[in] mxq_path: The path to the MXQ model file.
[out] sc: A reference to a status code that will be updated to indicate whether the model was successfully created or if an error occurred.
Returns
A unique pointer to the created Model object.

◆ create() [2/2]

std::unique_ptr< Model > mobilint::Model::create ( const std::string & mxq_path,
const ModelConfig & config,
StatusCode & sc )
static

Creates a Model object from the specified MXQ model file and configuration.

Parses the MXQ file and constructs a Model object using the provided configuration, initializing the model with the given settings.

Note
The created Model object must be launched before performing inference. See Model::launch for more details.
Parameters
[in] mxq_path: The path to the MXQ model file.
[in] config: The configuration settings to initialize the Model.
[out] sc: A reference to a status code that will be updated to indicate whether the model was successfully created or if an error occurred.
Returns
A unique pointer to the created Model object.

◆ launch()

StatusCode mobilint::Model::launch ( Accelerator & acc)

Launches the model on the specified Accelerator, which represents the actual NPU.

Parameters
[in] acc: The accelerator on which to launch the model.
Returns
A status code indicating whether the model was successfully launched or if an error occurred.

◆ dispose()

StatusCode mobilint::Model::dispose ( )

Disposes of the model loaded onto the NPU.

Releases any resources associated with the model on the NPU.

Returns
A status code indicating whether the disposal was successful or if an error occurred.

◆ getCoreMode()

CoreMode mobilint::Model::getCoreMode ( ) const

Retrieves the core mode of the model.

Returns
The CoreMode of the model.

◆ isTarget()

bool mobilint::Model::isTarget ( CoreId core_id) const

Checks if the NPU core specified by CoreId is the target of the model. In other words, whether the model is configured to use the given NPU core.

Parameters
[in] core_id: The CoreId to check.
Returns
True if the model is configured to use the specified CoreId, false otherwise.

◆ getTargetCores()

std::vector< CoreId > mobilint::Model::getTargetCores ( ) const

Returns the NPU cores the model is configured to use.

Returns
A vector of CoreIds representing the target NPU cores.

◆ infer() [1/12]

StatusCode mobilint::Model::infer ( const std::vector< NDArray< float > > & input,
std::vector< NDArray< float > > & output )

Performs inference.

Parameters
[in] input: A vector of NDArray<float>. Each NDArray must be in NHWC or HWC format.
[out] output: A reference to a vector of NDArray<float> that will store the inference results.
Returns
A status code indicating whether the inference operation completed successfully or encountered an error.

◆ infer() [2/12]

std::vector< NDArray< float > > mobilint::Model::infer ( const std::vector< NDArray< float > > & input,
StatusCode & sc )

This overload differs from the above function in that it directly returns the inference results instead of modifying an output parameter.

Parameters
[in] input: A vector of NDArray<float>. Each NDArray must be in NHWC or HWC format.
[out] sc: A reference to a status code that will be updated to indicate whether the inference operation was successful or encountered an error.
Returns
A vector of NDArray<float> containing the inference results.

◆ infer() [3/12]

StatusCode mobilint::Model::infer ( const std::vector< float * > & input,
std::vector< std::vector< float > > & output )

This overload is provided for convenience but may result in additional data copies within the maccel runtime.

Parameters
[in] input: A vector of float pointers, where each pointer represents input data in HWC format.
[out] output: A reference to a vector of float vectors that will store the inference results.
Returns
A status code indicating whether the inference operation completed successfully or encountered an error.

◆ infer() [4/12]

std::vector< std::vector< float > > mobilint::Model::infer ( const std::vector< float * > & input,
StatusCode & sc )

This overload is provided for convenience but may result in additional data copies within the maccel runtime.

Unlike the above overload, this function returns the inference results directly instead of modifying an output parameter.

Parameters
[in] input: A vector of float pointers, where each pointer represents input data in HWC format.
[out] sc: A reference to a status code that will be updated to indicate whether the inference operation was successful or encountered an error.
Returns
A vector of float vectors containing the inference results.

◆ infer() [5/12]

StatusCode mobilint::Model::infer ( const std::vector< float * > & input,
std::vector< std::vector< float > > & output,
const std::vector< std::vector< int64_t > > & shape )

This overload is provided for convenience but may result in additional data copies within the maccel runtime.

Unlike other overloads, this version allows explicitly specifying the shape of each input data, which can be in NHWC or HWC format.

Parameters
[in] input: A vector of float pointers, where each pointer represents input data in NHWC or HWC format.
[out] output: A reference to a vector of float vectors that will store the inference results.
[in] shape: A vector of vectors, where each inner vector specifies the shape of the corresponding input data.
Returns
A status code indicating whether the inference operation completed successfully or encountered an error.
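
For example, a minimal sketch passing an explicit NHWC shape. The 1x224x224x3 shape is illustrative, and `model` is assumed to have been created and launched already.

std::vector<float> image(1 * 224 * 224 * 3);
std::vector<float*> input{image.data()};
std::vector<std::vector<float>> output;
std::vector<std::vector<int64_t>> shape{{1, 224, 224, 3}};

mobilint::StatusCode sc = model->infer(input, output, shape);
if (!sc) {
    // Handle error appropriately
}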

◆ infer() [6/12]

std::vector< std::vector< float > > mobilint::Model::infer ( const std::vector< float * > & input,
const std::vector< std::vector< int64_t > > & shape,
StatusCode & sc )

This overload is provided for convenience but may result in additional data copies within the maccel runtime.

Unlike the above overload, this function returns the inference results directly instead of modifying an output parameter.

Parameters
[in] input: A vector of float pointers, where each pointer represents input data in NHWC or HWC format.
[in] shape: A vector of vectors, where each inner vector specifies the shape of the corresponding input data.
[out] sc: A reference to a status code that will be updated to indicate whether the inference operation was successful or encountered an error.
Returns
A vector of float vectors containing the inference results.

◆ infer() [7/12]

StatusCode mobilint::Model::infer ( const std::vector< NDArray< float > > & input,
std::vector< NDArray< float > > & output,
uint32_t cache_size )

This overload supports inference with KV cache.

Note
This function is relevant for LLM models that use KV cache.
Parameters
[in] input: A vector of NDArrays, where each NDArray represents input data in NHWC or HWC format.
[out] output: A reference to a vector of NDArrays that will store the inference results.
[in] cache_size: The number of tokens accumulated in the KV cache so far.
Returns
A status code indicating whether the inference operation completed successfully or encountered an error.

◆ infer() [8/12]

std::vector< NDArray< float > > mobilint::Model::infer ( const std::vector< NDArray< float > > & input,
uint32_t cache_size,
StatusCode & sc )

This overload supports inference with KV cache.

Unlike the above overload, this function returns the inference results directly instead of modifying an output parameter.

Note
This function is relevant for LLM models that use KV cache.
Parameters
[in] input: A vector of NDArrays, where each NDArray represents input data in NHWC or HWC format.
[in] cache_size: The number of tokens accumulated in the KV cache so far.
[out] sc: A reference to a status code that will be updated to indicate whether the inference operation was successful or encountered an error.
Returns
A vector of NDArrays containing the inference results.

◆ infer() [9/12]

StatusCode mobilint::Model::infer ( const std::vector< float * > & input,
std::vector< std::vector< float > > & output,
const std::vector< std::vector< int64_t > > & shape,
uint32_t cache_size )

This overload supports inference with KV cache.

Note
This function is relevant for LLM models that use KV cache.
Parameters
[in] input: A vector of float pointers, where each pointer represents input data in NHWC or HWC format.
[out] output: A reference to a vector of float vectors that will store the inference results.
[in] shape: A vector of vectors, where each inner vector specifies the shape of the corresponding input data.
[in] cache_size: The number of tokens accumulated in the KV cache so far.
Returns
A status code indicating whether the inference operation completed successfully or encountered an error.

◆ infer() [10/12]

std::vector< std::vector< float > > mobilint::Model::infer ( const std::vector< float * > & input,
const std::vector< std::vector< int64_t > > & shape,
uint32_t cache_size,
StatusCode & sc )

This overload supports inference with KV cache.

Unlike the above overload, this function returns the inference results directly instead of modifying an output parameter.

Note
This function is relevant for LLM models that use KV cache.
Parameters
[in] input: A vector of float pointers, where each pointer represents input data in NHWC or HWC format.
[in] shape: A vector of vectors, where each inner vector specifies the shape of the corresponding input data.
[in] cache_size: The number of tokens accumulated in the KV cache so far.
[out] sc: A reference to a status code that will be updated to indicate whether the inference operation was successful or encountered an error.
Returns
A vector of float vectors containing the inference results.

◆ inferCHW() [1/10]

StatusCode mobilint::Model::inferCHW ( const std::vector< NDArray< float > > & input,
std::vector< NDArray< float > > & output )

Performs inference.

Parameters
[in] input: A vector of NDArray<float>. Each NDArray must be in NCHW or CHW format.
[out] output: A reference to a vector of NDArray<float> that will store the inference results.
Returns
A status code indicating whether the inference operation completed successfully or encountered an error.

◆ inferCHW() [2/10]

std::vector< NDArray< float > > mobilint::Model::inferCHW ( const std::vector< NDArray< float > > & input,
StatusCode & sc )

This overload differs from the above function in that it directly returns the inference results instead of modifying an output parameter.

Parameters
[in] input: A vector of NDArray<float>. Each NDArray must be in NCHW or CHW format.
[out] sc: A reference to a status code that will be updated to indicate whether the inference operation was successful or encountered an error.
Returns
A vector of NDArray<float> containing the inference results.

◆ inferCHW() [3/10]

StatusCode mobilint::Model::inferCHW ( const std::vector< float * > & input,
std::vector< std::vector< float > > & output )

This overload is provided for convenience but may result in additional data copies within the maccel runtime.

Parameters
[in] input: A vector of float pointers, where each pointer represents input data in CHW format.
[out] output: A reference to a vector of float vectors that will store the inference results.
Returns
A status code indicating whether the inference operation completed successfully or encountered an error.

◆ inferCHW() [4/10]

std::vector< std::vector< float > > mobilint::Model::inferCHW ( const std::vector< float * > & input,
StatusCode & sc )

This overload is provided for convenience but may result in additional data copies within the maccel runtime.

Unlike the above overload, this function returns the inference results directly instead of modifying an output parameter.

Parameters
[in] input: A vector of float pointers, where each pointer represents input data in CHW format.
[out] sc: A reference to a status code that will be updated to indicate whether the inference operation was successful or encountered an error.
Returns
A vector of float vectors containing the inference results.

◆ inferCHW() [5/10]

StatusCode mobilint::Model::inferCHW ( const std::vector< float * > & input,
std::vector< std::vector< float > > & output,
const std::vector< std::vector< int64_t > > & shape )

This overload is provided for convenience but may result in additional data copies within the maccel runtime.

Unlike other overloads, this version allows explicitly specifying the shape of each input data, which can be in NCHW or CHW format.

Parameters
[in] input: A vector of float pointers, where each pointer represents input data in NCHW or CHW format.
[out] output: A reference to a vector of float vectors that will store the inference results.
[in] shape: A vector of vectors, where each inner vector specifies the shape of the corresponding input data.
Returns
A status code indicating whether the inference operation completed successfully or encountered an error.

◆ inferCHW() [6/10]

std::vector< std::vector< float > > mobilint::Model::inferCHW ( const std::vector< float * > & input,
const std::vector< std::vector< int64_t > > & shape,
StatusCode & sc )

This overload is provided for convenience but may result in additional data copies within the maccel runtime.

Unlike the above overload, this function returns the inference results directly instead of modifying an output parameter.

Parameters
[in] input: A vector of float pointers, where each pointer represents input data in NCHW or CHW format.
[in] shape: A vector of vectors, where each inner vector specifies the shape of the corresponding input data.
[out] sc: A reference to a status code that will be updated to indicate whether the inference operation was successful or encountered an error.
Returns
A vector of float vectors containing the inference results.

◆ inferCHW() [7/10]

StatusCode mobilint::Model::inferCHW ( const std::vector< NDArray< float > > & input,
std::vector< NDArray< float > > & output,
uint32_t cache_size )

This overload supports inference with KV cache.

Note
This function is relevant for LLM models that use KV cache.
Parameters
[in] input: A vector of NDArrays, where each NDArray represents input data in NCHW or CHW format.
[out] output: A reference to a vector of NDArrays that will store the inference results.
[in] cache_size: The number of tokens accumulated in the KV cache so far.
Returns
A status code indicating whether the inference operation completed successfully or encountered an error.

◆ inferCHW() [8/10]

std::vector< NDArray< float > > mobilint::Model::inferCHW ( const std::vector< NDArray< float > > & input,
uint32_t cache_size,
StatusCode & sc )

This overload supports inference with KV cache.

Unlike the above overload, this function returns the inference results directly instead of modifying an output parameter.

Note
This function is relevant for LLM models that use KV cache.
Parameters
[in] input: A vector of NDArrays, where each NDArray represents input data in NCHW or CHW format.
[in] cache_size: The number of tokens accumulated in the KV cache so far.
[out] sc: A reference to a status code that will be updated to indicate whether the inference operation was successful or encountered an error.
Returns
A vector of NDArrays containing the inference results.

◆ inferCHW() [9/10]

StatusCode mobilint::Model::inferCHW ( const std::vector< float * > & input,
std::vector< std::vector< float > > & output,
const std::vector< std::vector< int64_t > > & shape,
uint32_t cache_size )

This overload supports inference with KV cache.

Note
This function is relevant for LLM models that use KV cache.
Parameters
[in] input: A vector of float pointers, where each pointer represents input data in NCHW or CHW format.
[out] output: A reference to a vector of float vectors that will store the inference results.
[in] shape: A vector of vectors, where each inner vector specifies the shape of the corresponding input data.
[in] cache_size: The number of tokens accumulated in the KV cache so far.
Returns
A status code indicating whether the inference operation completed successfully or encountered an error.

◆ inferCHW() [10/10]

std::vector< std::vector< float > > mobilint::Model::inferCHW ( const std::vector< float * > & input,
const std::vector< std::vector< int64_t > > & shape,
uint32_t cache_size,
StatusCode & sc )

This overload supports inference with KV cache.

Unlike the above overload, this function returns the inference results directly instead of modifying an output parameter.

Note
This function is relevant for LLM models that use KV cache.
Parameters
[in] input: A vector of float pointers, where each pointer represents input data in NCHW or CHW format.
[in] shape: A vector of vectors, where each inner vector specifies the shape of the corresponding input data.
[in] cache_size: The number of tokens accumulated in the KV cache so far.
[out] sc: A reference to a status code that will be updated to indicate whether the inference operation was successful or encountered an error.
Returns
A vector of float vectors containing the inference results.

◆ inferSpeedrun()

StatusCode mobilint::Model::inferSpeedrun ( int variant_idx = 0)

Development-only API for measuring pure NPU inference speed.

Runs NPU inference without uploading inputs and without retrieving outputs.

Parameters
[in] variant_idx: Index of the model variant to run.
Returns
A status code indicating the result.
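
For example, a minimal sketch that times inferSpeedrun with std::chrono, assuming `model` has already been created and launched:

#include <chrono>

auto t0 = std::chrono::steady_clock::now();
mobilint::StatusCode sc = model->inferSpeedrun();   // run model variant 0
auto t1 = std::chrono::steady_clock::now();
if (!sc) { /* handle error */ }

// Approximates pure NPU execution time: no input upload or output retrieval
// is included in the measurement.
auto us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();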

◆ inferAsync() [1/2]

Future< float > mobilint::Model::inferAsync ( const std::vector< NDArray< float > > & input,
StatusCode & sc )

Initiates asynchronous inference with input in NHWC (batch N, height H, width W, channels C) or HWC format.

Parameters
[in] input: A vector of NDArray<float>. Each NDArray must be in NHWC or HWC format.
[out] sc: A reference to a status code that will be updated to indicate whether the asynchronous inference request was successfully initiated or encountered an error.
Returns
A future that can be used to retrieve the inference result.

◆ inferAsyncCHW() [1/2]

Future< float > mobilint::Model::inferAsyncCHW ( const std::vector< NDArray< float > > & input,
StatusCode & sc )

Initiates asynchronous inference with input in NCHW (batch N, channels C, height H, width W) or CHW format.

Parameters
[in] input: A vector of NDArray<float>. Each NDArray must be in NCHW or CHW format.
[out] sc: A reference to a status code that will be updated to indicate whether the asynchronous inference request was successfully initiated or encountered an error.
Returns
A future that can be used to retrieve the inference result.

◆ inferAsync() [2/2]

Future< int8_t > mobilint::Model::inferAsync ( const std::vector< NDArray< int8_t > > & input,
StatusCode & sc )

This overload supports int8_t-to-int8_t asynchronous inference.

Parameters
[in] input: A vector of NDArray<int8_t>. Each NDArray must be in NHWC or HWC format.
[out] sc: A reference to a status code that will be updated to indicate whether the asynchronous inference request was successfully initiated or encountered an error.
Returns
A future that can be used to retrieve the inference result.

◆ inferAsyncCHW() [2/2]

Future< int8_t > mobilint::Model::inferAsyncCHW ( const std::vector< NDArray< int8_t > > & input,
StatusCode & sc )

This overload supports int8_t-to-int8_t asynchronous inference.

Parameters
[in] input: A vector of NDArray<int8_t>. Each NDArray must be in NCHW or CHW format.
[out] sc: A reference to a status code that will be updated to indicate whether the asynchronous inference request was successfully initiated or encountered an error.
Returns
A future that can be used to retrieve the inference result.

◆ inferAsyncToFloat()

Future< float > mobilint::Model::inferAsyncToFloat ( const std::vector< NDArray< int8_t > > & input,
StatusCode & sc )

This overload supports int8_t-to-float asynchronous inference.

Parameters
[in] input: A vector of NDArray<int8_t>. Each NDArray must be in NHWC or HWC format.
[out] sc: A reference to a status code that will be updated to indicate whether the asynchronous inference request was successfully initiated or encountered an error.
Returns
A future that can be used to retrieve the inference result.

◆ inferAsyncCHWToFloat()

Future< float > mobilint::Model::inferAsyncCHWToFloat ( const std::vector< NDArray< int8_t > > & input,
StatusCode & sc )

This overload supports int8_t-to-float asynchronous inference.

Parameters
[in] input: A vector of NDArray<int8_t>. Each NDArray must be in NCHW or CHW format.
[out] sc: A reference to a status code that will be updated to indicate whether the asynchronous inference request was successfully initiated or encountered an error.
Returns
A future that can be used to retrieve the inference result.

◆ getNumModelVariants()

int mobilint::Model::getNumModelVariants ( ) const

Returns the total number of model variants available in this model.

The variant_idx parameter passed to Model::getModelVariantHandle must be in the range [0, return value of this function).

Returns
The total number of model variants.

◆ getModelVariantHandle()

std::unique_ptr< ModelVariantHandle > mobilint::Model::getModelVariantHandle ( int variant_idx,
StatusCode & sc ) const

Retrieves a handle to the specified model variant.

Use the returned ModelVariantHandle to query details such as input and output shapes for the selected variant.

Parameters
[in] variant_idx: Index of the model variant to retrieve. Must be in the range [0, getNumModelVariants()).
[out] sc: A reference to a StatusCode variable that will be updated to indicate success or failure.
Returns
A unique pointer to the corresponding ModelVariantHandle if successful; otherwise, nullptr.
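
For example, a minimal sketch enumerating all variants of an already created `model`:

mobilint::StatusCode sc;
int num_variants = model->getNumModelVariants();

for (int i = 0; i < num_variants; ++i) {
    std::unique_ptr<mobilint::ModelVariantHandle> variant = model->getModelVariantHandle(i, sc);
    if (!sc || !variant) {
        // Handle error appropriately
        continue;
    }
    // Query per-variant details (e.g., input/output shapes) through `variant`.
}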

◆ getModelInputShape()

const std::vector< std::vector< int64_t > > & mobilint::Model::getModelInputShape ( ) const

Returns the input shape of the model.

Returns
A reference to the input shape of the model.

◆ getModelOutputShape()

const std::vector< std::vector< int64_t > > & mobilint::Model::getModelOutputShape ( ) const

Returns the output shape of the model.

Returns
A reference to the output shape of the model.
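
For example, a short sketch printing every input and output shape of an already created `model`:

#include <iostream>

const auto& in_shapes  = model->getModelInputShape();
const auto& out_shapes = model->getModelOutputShape();

for (size_t i = 0; i < in_shapes.size(); ++i) {
    std::cout << "input " << i << ":";
    for (int64_t d : in_shapes[i]) std::cout << ' ' << d;
    std::cout << '\n';
}
for (size_t i = 0; i < out_shapes.size(); ++i) {
    std::cout << "output " << i << ":";
    for (int64_t d : out_shapes[i]) std::cout << ' ' << d;
    std::cout << '\n';
}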

◆ getInputBufferInfo()

const std::vector< BufferInfo > & mobilint::Model::getInputBufferInfo ( ) const

Returns the input buffer information for the model.

Returns
A reference to a vector of input buffer information.

◆ getOutputBufferInfo()

const std::vector< BufferInfo > & mobilint::Model::getOutputBufferInfo ( ) const

Returns the output buffer information of the model.

Returns
A reference to a vector of output buffer information.

◆ getInputScale()

std::vector< Scale > mobilint::Model::getInputScale ( ) const

Returns the input quantization scale(s) of the model.

Returns
A vector of input scales.

◆ getOutputScale()

std::vector< Scale > mobilint::Model::getOutputScale ( ) const

Returns the output quantization scale(s) of the model.

Returns
A vector of output scales.

◆ getIdentifier()

uint32_t mobilint::Model::getIdentifier ( ) const

Returns the model's unique identifier.

This identifier distinguishes multiple models within a single user program. It is assigned incrementally, starting from 0 (e.g., 0, 1, 2, 3, ...).

Returns
The model identifier.

◆ getModelPath()

std::string mobilint::Model::getModelPath ( ) const

Returns the path to the MXQ model file associated with the Model.

Returns
The MXQ file path.

◆ getCacheInfos()

std::vector< CacheInfo > mobilint::Model::getCacheInfos ( ) const

Returns information about the model's KV cache.

Returns
A vector of CacheInfo objects.

◆ resetCacheMemory()

void mobilint::Model::resetCacheMemory ( )

Resets the KV cache memory.

Clears the stored KV cache, restoring it to its initial state.

◆ dumpCacheMemory() [1/3]

StatusCode mobilint::Model::dumpCacheMemory ( std::vector< std::vector< int8_t > > & bufs)

Dumps the KV cache memory into buffers.

Writes the current KV cache data into provided buffers.

Parameters
[out] bufs: A reference to a vector of byte vectors that will store the KV cache data.
Returns
A status code indicating whether the dump operation was successful or if an error occurred.

◆ dumpCacheMemory() [2/3]

std::vector< std::vector< int8_t > > mobilint::Model::dumpCacheMemory ( StatusCode & sc)

Dumps the KV cache memory into buffers.

Writes the KV cache data into buffers and returns them.

Parameters
[out] sc: A reference to a status code that will be updated to indicate whether the dump operation was successful or if an error occurred.
Returns
A vector of byte vectors containing the KV cache data.

◆ dumpCacheMemory() [3/3]

StatusCode mobilint::Model::dumpCacheMemory ( const std::string & cache_dir)

Dumps KV cache memory to files in the specified directory.

Writes the KV cache data to binary files within the given directory. Each file is named using the format: cache_<layer_hash>.bin.

Parameters
[in] cache_dir: Path to the directory where KV cache files will be saved.
Returns
A status code indicating whether the dump operation was successful or if an error occurred.

◆ loadCacheMemory() [1/2]

StatusCode mobilint::Model::loadCacheMemory ( const std::vector< std::vector< int8_t > > & bufs)

Loads the KV cache memory from buffers.

Restores the KV cache from the provided buffers.

Parameters
[in] bufs: A reference to a vector of byte vectors containing the KV cache data.
Returns
A status code indicating whether the load operation was successful or if an error occurred.

◆ loadCacheMemory() [2/2]

StatusCode mobilint::Model::loadCacheMemory ( const std::string & cache_dir)

Loads the KV cache memory from files in the specified directory.

Reads KV cache data from files within the given directory and restores them. Each file is named using the format: cache_<layer_hash>.bin.

Parameters
[in] cache_dir: Path to the directory where KV cache files are saved.
Returns
A status code indicating whether the load operation was successful or if an error occurred.
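
For example, a minimal sketch that persists the KV cache to disk and later restores it. The directory path is illustrative, and `model` is assumed to have been created and launched already.

// Persist the current KV cache, e.g. before shutting down.
mobilint::StatusCode sc = model->dumpCacheMemory("/tmp/kv_cache");
if (!sc) { /* handle error */ }

// Later (or in another run): clear the cache and restore it from disk.
model->resetCacheMemory();
sc = model->loadCacheMemory("/tmp/kv_cache");
if (!sc) { /* handle error */ }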

◆ filterCacheTail()

int mobilint::Model::filterCacheTail ( int cache_size,
int tail_size,
const std::vector< bool > & mask,
StatusCode & sc )

Filters the tail of the KV cache memory.

Retains the desired caches in the tail of the KV cache memory, excludes the others, and shifts the remaining caches forward.

Parameters
[in] cache_size: The number of tokens accumulated in the KV cache so far.
[in] tail_size: The tail size of the KV cache to filter (<=32).
[in] mask: A mask indicating tokens to retain or exclude at the tail of the KV cache.
[out] sc: A status code indicating the outcome of the tail filtering.
Returns
New cache size after tail filtering.
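
For example, a minimal sketch dropping some of the most recent tokens. It assumes `model` has been created and launched, `cache_size` holds the current number of cached tokens, and (as an assumption) that a `true` entry in the mask marks a tail token to retain.

// Filter the last 4 cached tokens: keep the 1st and 3rd, drop the 2nd and 4th
// (assumption: `true` marks a token to retain).
std::vector<bool> mask{true, false, true, false};

mobilint::StatusCode sc;
int new_cache_size = model->filterCacheTail(cache_size, /*tail_size=*/4, mask, sc);
if (!sc) { /* handle error */ }

// Pass `new_cache_size` as cache_size in subsequent KV-cache inference calls.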

◆ moveCacheTail()

int mobilint::Model::moveCacheTail ( int num_head,
int num_tail,
int cache_size,
StatusCode & sc )

Moves the tail of the KV cache memory to the end of the head.

Slices the tail of the KV cache memory up to the specified size and moves it to the designated cache position.

Parameters
[in] num_head: The size of the KV cache head where the tail is appended.
[in] num_tail: The size of the KV cache tail to be moved.
[in] cache_size: The total number of tokens accumulated in the KV cache so far.
[out] sc: A status code indicating the result of the tail move.
Returns
The updated cache size after moving the tail.

◆ infer() [11/12]

StatusCode mobilint::Model::infer ( const std::vector< float * > & input,
std::vector< std::vector< float > > & output,
int batch_size )
Deprecated
Use infer(input, output, shape) instead.

◆ infer() [12/12]

std::vector< std::vector< float > > mobilint::Model::infer ( const std::vector< float * > & input,
int batch_size,
StatusCode & sc )
Deprecated
Use infer(input, shape, sc) instead.

◆ inferHeightBatch()

StatusCode mobilint::Model::inferHeightBatch ( const std::vector< float * > & input,
std::vector< std::vector< float > > & output,
int height_batch_size )
Deprecated

◆ getSchedulePolicy()

SchedulePolicy mobilint::Model::getSchedulePolicy ( ) const

◆ getLatencySetPolicy()

LatencySetPolicy mobilint::Model::getLatencySetPolicy ( ) const

◆ getMaintenancePolicy()

MaintenancePolicy mobilint::Model::getMaintenancePolicy ( ) const

◆ getLatencyConsumed()

uint64_t mobilint::Model::getLatencyConsumed ( const int npu_op_idx) const

◆ getLatencyFinished()

uint64_t mobilint::Model::getLatencyFinished ( const int npu_op_idx) const

◆ getStatistics()

std::shared_ptr< Statistics > mobilint::Model::getStatistics ( ) const

◆ Accelerator

friend class Accelerator
friend

Definition at line 1191 of file model.h.


The documentation for this class was generated from the following file:

  • model.h