maccel.model.Model Class Reference

Runtime Library v0.30
Mobilint SDK qb

Represents an AI model loaded from an MXQ file.

Public Member Functions

 __init__ (self, str path, Optional[ModelConfig] model_config=None)
 Creates a Model object from the specified MXQ model file and configuration.
None launch (self, Accelerator acc)
 Launches the model on the specified Accelerator, which represents the actual NPU.
None dispose (self)
 Disposes of the model loaded onto the NPU.
bool is_target (self, CoreId core_id)
 Checks if the NPU core specified by CoreId is the target of the model.
CoreMode get_core_mode (self)
 Retrieves the core mode of the model.
List[CoreId] get_target_cores (self)
 Returns the NPU cores the model is configured to use.
List[CoreId] target_cores (self)
 Deprecated.
Optional[List[np.ndarray]] infer (self, Union[np.ndarray, List[np.ndarray]] inputs, Optional[List[np.ndarray]] outputs=None, int cache_size=0)
 Performs inference.
List[np.ndarray] infer_to_float (self, Union[np.ndarray, List[np.ndarray]] inputs, int cache_size=0)
 Performs int8_t-to-float inference: input elements are of type int8_t, and outputs are converted to float.
None infer_buffer (self, List[Buffer] inputs, List[Buffer] outputs, List[List[int]] shape=[], int cache_size=0)
 Buffer-to-Buffer inference.
None infer_speedrun (self)
 Development-only API for measuring pure NPU inference speed.
Future infer_async (self, Union[np.ndarray, List[np.ndarray]] inputs)
 Asynchronous Inference.
Future infer_async_to_float (self, Union[np.ndarray, List[np.ndarray]] inputs)
 This method supports int8_t-to-float asynchronous inference.
None reposition_inputs (self, List[np.ndarray] inputs, List[Buffer] input_bufs, List[List[int]] seqlens=[])
 Reposition input.
None reposition_outputs (self, List[Buffer] output_bufs, List[np.ndarray] outputs, List[List[int]] seqlens=[])
 Reposition output.
int get_num_model_variants (self)
 Returns the total number of model variants available in this model.
ModelVariantHandle get_model_variant_handle (self, variant_idx)
 Retrieves a handle to the specified model variant.
List[_Shape] get_model_input_shape (self)
 Returns the input shape of the model.
List[_Shape] get_model_output_shape (self)
 Returns the output shape of the model.
List[Scale] get_input_scale (self)
 Returns the input quantization scale(s) of the model.
List[Scale] get_output_scale (self)
 Returns the output quantization scale(s) of the model.
List[BufferInfo] get_input_buffer_info (self)
 Returns the input buffer information for the model.
List[BufferInfo] get_output_buffer_info (self)
 Returns the output buffer information of the model.
List[Buffer] acquire_input_buffer (self, List[List[int]] seqlens=[])
 Buffer Management API.
List[Buffer] acquire_output_buffer (self, List[List[int]] seqlens=[])
 Buffer Management API.
None release_buffer (self, List[Buffer] buffer)
 Buffer Management API.
int get_identifier (self)
 Returns the model's unique identifier.
str get_model_path (self)
 Returns the path to the MXQ model file associated with the Model.
List[CacheInfo] get_cache_infos (self)
 Returns KV-cache information for the model.
SchedulePolicy get_schedule_policy (self)
 Deprecated.
LatencySetPolicy get_latency_set_policy (self)
 Deprecated.
MaintenancePolicy get_maintenance_policy (self)
 Deprecated.
int get_latency_consumed (self)
 Deprecated.
int get_latency_finished (self)
 Deprecated.
None reset_cache_memory (self)
 Resets the KV cache memory.
List[bytes] dump_cache_memory (self)
 Dumps the KV cache memory into buffers.
None load_cache_memory (self, List[bytes] bufs)
 Loads the KV cache memory from buffers.
None dump_cache_memory_to (self, str cache_dir)
 Dumps KV cache memory to files in the specified directory.
None load_cache_memory_from (self, str cache_dir)
 Loads the KV cache memory from files in the specified directory.
int filter_cache_tail (self, int cache_size, int tail_size, List[bool] mask)
 Filter the tail of the KV cache memory.
int move_cache_tail (self, int num_head, int num_tail, int cache_size)
 Moves the tail of the KV cache memory to the end of the head.

Static Public Attributes

Optional[List[np.ndarray]] infer_chw = infer
List[np.ndarray] infer_chw_to_float = infer_to_float

Protected Attributes

 _model = _cMaccel.Model(path)
List[_Shape] _input_shape = self.get_model_input_shape()
List[_Shape] _output_shape = self.get_model_output_shape()
 _acc = acc

Detailed Description

Represents an AI model loaded from an MXQ file.

This class loads an AI model from an MXQ file and provides functions to launch it on the NPU and perform inference.

Definition at line 110 of file model.py.

Constructor & Destructor Documentation

◆ __init__()

maccel.model.Model.__init__ ( self,
str path,
Optional[ModelConfig] model_config = None )

Creates a Model object from the specified MXQ model file and configuration.

Parses the MXQ file and constructs a Model object using the provided configuration, initializing the model with the given settings.

Note
The created Model object must be launched before performing inference. See Model.launch for more details.
Parameters
[in] path The path to the MXQ model file.
[in] model_config The configuration settings to initialize the Model.

Definition at line 118 of file model.py.
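The full lifecycle the note above implies (create, launch, infer, dispose) can be sketched as a small helper. This is a hypothetical sketch, not part of the SDK: the no-argument `Accelerator()` constructor and the `run_once` helper name are assumptions, and the `maccel` import is deferred so the sketch can be read without the SDK installed.

```python
def run_once(mxq_path, inputs):
    """Hypothetical helper: full Model lifecycle in one call.

    Assumes a Mobilint NPU is available and the maccel package is
    installed; Accelerator() with no arguments is an assumption.
    """
    import maccel  # deferred import: only needed when actually run

    acc = maccel.Accelerator()        # assumption: default NPU handle
    model = maccel.Model(mxq_path)    # parse the MXQ file
    model.launch(acc)                 # a Model must be launched before inference
    try:
        return model.infer(inputs)    # list of numpy.ndarray results
    finally:
        model.dispose()               # release NPU resources
```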

Member Function Documentation

◆ launch()

None maccel.model.Model.launch ( self,
Accelerator acc )

Launches the model on the specified Accelerator, which represents the actual NPU.

Parameters
[in] acc The accelerator on which to launch the model.

Definition at line 141 of file model.py.

◆ dispose()

None maccel.model.Model.dispose ( self)

Disposes of the model loaded onto the NPU.

Releases any resources associated with the model on the NPU.

Definition at line 151 of file model.py.

◆ is_target()

bool maccel.model.Model.is_target ( self,
CoreId core_id )

Checks if the NPU core specified by CoreId is the target of the model.

In other words, whether the model is configured to use the given NPU core.

Parameters
[in] core_id The CoreId to check.
Returns
True if the model is configured to use the specified CoreId; False otherwise.

Definition at line 160 of file model.py.

◆ get_core_mode()

CoreMode maccel.model.Model.get_core_mode ( self)

Retrieves the core mode of the model.

Returns
The CoreMode of the model.

Definition at line 171 of file model.py.

◆ get_target_cores()

List[CoreId] maccel.model.Model.get_target_cores ( self)

Returns the NPU cores the model is configured to use.

Returns
A list of CoreIds representing the target NPU cores.

Definition at line 179 of file model.py.

◆ target_cores()

List[CoreId] maccel.model.Model.target_cores ( self)
Deprecated

Definition at line 188 of file model.py.

◆ infer()

Optional[List[np.ndarray]] maccel.model.Model.infer ( self,
Union[np.ndarray, List[np.ndarray]] inputs,
Optional[List[np.ndarray]] outputs = None,
int cache_size = 0 )

Performs inference.

The following forms of inference are supported.

  1. infer(in:List[numpy]) -> List[numpy] (float / int)
  2. infer(in:numpy) -> List[numpy] (float / int)
  3. infer(in:List[numpy], out:List[numpy]) (float / int)
  4. infer(in:List[numpy], out:List[]) (float / int)
  5. infer(in:numpy, out:List[numpy]) (float / int)
  6. infer(in:numpy, out:List[]) (float / int)
Parameters
[in] inputs Input data as a single numpy.ndarray or a list of numpy.ndarray's.
[out] outputs Optional pre-allocated list of numpy.ndarray's to store inference results.
Returns
Inference results as a list of numpy.ndarray, or None if pre-allocated outputs are provided.

Definition at line 192 of file model.py.

◆ infer_to_float()

List[np.ndarray] maccel.model.Model.infer_to_float ( self,
Union[np.ndarray, List[np.ndarray]] inputs,
int cache_size = 0 )

Performs int8_t-to-float inference: input elements are of type int8_t, and outputs are converted to float.

Using this API requires manually scaling (quantizing) float input values to int8_t.

Note
These APIs are intended for advanced use rather than typical usage.

Definition at line 248 of file model.py.
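The manual scaling mentioned above can be illustrated with plain NumPy. This is a sketch only: the symmetric scale formula (divide, round, clip, no zero-point) is an assumption, and the actual mapping is defined by the Scale objects returned by Model.get_input_scale() and Model.get_output_scale().

```python
import numpy as np

def quantize_to_int8(x, scale):
    """Quantize float values to int8 by dividing by `scale` and rounding.

    Symmetric quantization (no zero-point) is an assumption here; the
    model's real Scale objects define the actual mapping.
    """
    q = np.round(x / scale)
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize_to_float(q, scale):
    """Map int8 values back to float by multiplying by `scale`."""
    return q.astype(np.float32) * scale

x = np.array([0.5, -1.0, 2.0], dtype=np.float32)
q = quantize_to_int8(x, scale=0.05)        # int8 values ready for infer_to_float
x_hat = dequantize_to_float(q, scale=0.05)  # round-trips within one step
```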

◆ infer_buffer()

None maccel.model.Model.infer_buffer ( self,
List[Buffer] inputs,
List[Buffer] outputs,
List[List[int]] shape = [],
int cache_size = 0 )

Buffer-to-Buffer inference.

Performs inference using input and output elements in the NPU’s internal data type. The inference operates on buffers acquired via Model.acquire_input_buffer() and Model.acquire_output_buffer().

Additionally, Model.reposition_inputs(), Model.reposition_outputs(), ModelVariantHandle.reposition_inputs(), and ModelVariantHandle.reposition_outputs() must be used to move data into and out of these buffers.

Note
These APIs are intended for advanced use rather than typical usage.

Definition at line 282 of file model.py.
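The acquire/reposition/infer/release flow described above can be sketched as follows. This is a hypothetical helper, not SDK code: allocating host output arrays as float32 from get_model_output_shape() is an assumption, and error handling beyond buffer release is omitted.

```python
import numpy as np

def infer_via_buffers(model, inputs):
    """Hypothetical Buffer-to-Buffer inference flow for a launched Model.

    Follows the documented sequence: acquire buffers, reposition inputs,
    run infer_buffer, reposition outputs, release buffers.
    """
    in_bufs = model.acquire_input_buffer()
    out_bufs = model.acquire_output_buffer()
    try:
        model.reposition_inputs(inputs, in_bufs)      # host arrays -> NPU layout
        model.infer_buffer(in_bufs, out_bufs)         # run on the NPU
        outputs = [np.empty(shape, dtype=np.float32)  # assumption: float32 outputs
                   for shape in model.get_model_output_shape()]
        model.reposition_outputs(out_bufs, outputs)   # NPU layout -> host arrays
        return outputs
    finally:
        model.release_buffer(in_bufs)                 # always give buffers back
        model.release_buffer(out_bufs)
```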

◆ infer_speedrun()

None maccel.model.Model.infer_speedrun ( self)

Development-only API for measuring pure NPU inference speed.

Runs NPU inference without uploading inputs and without retrieving outputs.

Definition at line 310 of file model.py.

◆ infer_async()

Future maccel.model.Model.infer_async ( self,
Union[np.ndarray, List[np.ndarray]] inputs )

Asynchronous Inference.

Performs inference asynchronously.

To use asynchronous inference, the model must be created using a ModelConfig object with the async pipeline configured to be enabled. This is done by calling ModelConfig.set_async_pipeline_enabled(True) before passing the configuration to Model().

Example:

import maccel

mc = maccel.ModelConfig()
mc.set_async_pipeline_enabled(True)
model = maccel.Model(MXQ_PATH, mc)
model.launch(acc)
future = model.infer_async(inputs)
ret = future.get()
Note
Currently, only CNN-based models are supported, as asynchronous execution is particularly effective for this type of workload.
Limitations:
  • RNN/LSTM and LLM models are not supported yet.
  • Models requiring CPU offloading are not supported yet.
  • Currently, only single-batch inference is supported (i.e., N = 1).
  • Currently, Buffer inference is not supported: the buffer-based calls available in the synchronous API for advanced use cases are not yet available for asynchronous inference.

Definition at line 318 of file model.py.

◆ infer_async_to_float()

Future maccel.model.Model.infer_async_to_float ( self,
Union[np.ndarray, List[np.ndarray]] inputs )

This method supports int8_t-to-float asynchronous inference.

Parameters
[in] inputs Input data as a single numpy.ndarray or a list of numpy.ndarray's.
Returns
A future that can be used to retrieve the inference result.

Definition at line 372 of file model.py.

◆ reposition_inputs()

None maccel.model.Model.reposition_inputs ( self,
List[np.ndarray] inputs,
List[Buffer] input_bufs,
List[List[int]] seqlens = [] )

Reposition input.

Definition at line 395 of file model.py.

◆ reposition_outputs()

None maccel.model.Model.reposition_outputs ( self,
List[Buffer] output_bufs,
List[np.ndarray] outputs,
List[List[int]] seqlens = [] )

Reposition output.

Definition at line 407 of file model.py.

◆ get_num_model_variants()

int maccel.model.Model.get_num_model_variants ( self)

Returns the total number of model variants available in this model.

The variant_idx parameter passed to Model.get_model_variant_handle() must be in the range [0, return value of this function).

Returns
The total number of model variants.

Definition at line 425 of file model.py.

◆ get_model_variant_handle()

ModelVariantHandle maccel.model.Model.get_model_variant_handle ( self,
variant_idx )

Retrieves a handle to the specified model variant.

Use the returned ModelVariantHandle to query details such as input and output shapes for the selected variant.

Parameters
[in] variant_idx Index of the model variant to retrieve. Must be in the range [0, get_num_model_variants()).
Returns
A ModelVariantHandle object if successful; otherwise, raises maccel.MAccelError "Model_InvalidVariantIdx".

Definition at line 436 of file model.py.
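Enumerating variants with get_num_model_variants() and this method can be sketched as below. This is a hypothetical helper: the documentation says a ModelVariantHandle can be queried for input and output shapes, but the exact accessor names (get_model_input_shape/get_model_output_shape, mirroring Model) are assumptions.

```python
def list_variant_shapes(model):
    """Hypothetical helper: collect I/O shapes for every model variant.

    Assumes ModelVariantHandle exposes shape accessors mirroring Model's;
    adjust to the real handle API.
    """
    shapes = []
    # valid indices are [0, get_num_model_variants())
    for idx in range(model.get_num_model_variants()):
        handle = model.get_model_variant_handle(idx)
        shapes.append((handle.get_model_input_shape(),    # assumption: name
                       handle.get_model_output_shape()))  # assumption: name
    return shapes
```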

◆ get_model_input_shape()

List[_Shape] maccel.model.Model.get_model_input_shape ( self)

Returns the input shape of the model.

Returns
A list of the model's input shapes.

Definition at line 453 of file model.py.

◆ get_model_output_shape()

List[_Shape] maccel.model.Model.get_model_output_shape ( self)

Returns the output shape of the model.

Returns
A list of the model's output shapes.

Definition at line 461 of file model.py.

◆ get_input_scale()

List[Scale] maccel.model.Model.get_input_scale ( self)

Returns the input quantization scale(s) of the model.

Returns
A list of input scales.

Definition at line 469 of file model.py.

◆ get_output_scale()

List[Scale] maccel.model.Model.get_output_scale ( self)

Returns the output quantization scale(s) of the model.

Returns
A list of output scales.

Definition at line 477 of file model.py.

◆ get_input_buffer_info()

List[BufferInfo] maccel.model.Model.get_input_buffer_info ( self)

Returns the input buffer information for the model.

Returns
A list of input buffer information.

Definition at line 485 of file model.py.

◆ get_output_buffer_info()

List[BufferInfo] maccel.model.Model.get_output_buffer_info ( self)

Returns the output buffer information of the model.

Returns
A list of output buffer information.

Definition at line 493 of file model.py.

◆ acquire_input_buffer()

List[Buffer] maccel.model.Model.acquire_input_buffer ( self,
List[List[int]] seqlens = [] )

Buffer Management API.

Acquires a list of Buffers for input. This API is required when calling Model.infer_buffer().

Note
These APIs are intended for advanced use rather than typical usage.

Definition at line 501 of file model.py.

◆ acquire_output_buffer()

List[Buffer] maccel.model.Model.acquire_output_buffer ( self,
List[List[int]] seqlens = [] )

Buffer Management API.

Acquires a list of Buffers for output. This API is required when calling Model.infer_buffer().

Note
These APIs are intended for advanced use rather than typical usage.

Definition at line 512 of file model.py.

◆ release_buffer()

None maccel.model.Model.release_buffer ( self,
List[Buffer] buffer )

Buffer Management API.

Deallocates acquired input/output buffers.

Note
These APIs are intended for advanced use rather than typical usage.

Definition at line 523 of file model.py.

◆ get_identifier()

int maccel.model.Model.get_identifier ( self)

Returns the model's unique identifier.

This identifier distinguishes multiple models within a single user program. It is assigned incrementally, starting from 0 (e.g., 0, 1, 2, 3, ...).

Returns
The model identifier.

Definition at line 533 of file model.py.

◆ get_model_path()

str maccel.model.Model.get_model_path ( self)

Returns the path to the MXQ model file associated with the Model.

Returns
The MXQ file path.

Definition at line 544 of file model.py.

◆ get_cache_infos()

List[CacheInfo] maccel.model.Model.get_cache_infos ( self)

Returns KV-cache information for the model.

Returns
A list of CacheInfo objects.

Definition at line 552 of file model.py.

◆ get_schedule_policy()

SchedulePolicy maccel.model.Model.get_schedule_policy ( self)
Deprecated

Definition at line 560 of file model.py.

◆ get_latency_set_policy()

LatencySetPolicy maccel.model.Model.get_latency_set_policy ( self)
Deprecated

Definition at line 564 of file model.py.

◆ get_maintenance_policy()

MaintenancePolicy maccel.model.Model.get_maintenance_policy ( self)
Deprecated

Definition at line 568 of file model.py.

◆ get_latency_consumed()

int maccel.model.Model.get_latency_consumed ( self)
Deprecated

Definition at line 572 of file model.py.

◆ get_latency_finished()

int maccel.model.Model.get_latency_finished ( self)
Deprecated

Definition at line 576 of file model.py.

◆ reset_cache_memory()

None maccel.model.Model.reset_cache_memory ( self)

Resets the KV cache memory.

Clears the stored KV cache, restoring it to its initial state.

Definition at line 580 of file model.py.

◆ dump_cache_memory()

List[bytes] maccel.model.Model.dump_cache_memory ( self)

Dumps the KV cache memory into buffers.

Copies the current KV cache data into byte buffers and returns them.

Returns
A list of bytes containing the KV cache data.

Definition at line 588 of file model.py.

◆ load_cache_memory()

None maccel.model.Model.load_cache_memory ( self,
List[bytes] bufs )

Loads the KV cache memory from buffers.

Restores the KV cache from the provided buffers.

Parameters
[in] bufs A list of bytes containing the KV cache.

Definition at line 599 of file model.py.

◆ dump_cache_memory_to()

None maccel.model.Model.dump_cache_memory_to ( self,
str cache_dir )

Dumps KV cache memory to files in the specified directory.

Writes the KV cache data to binary files within the given directory. Each file is named using the format: cache_<layer_hash>.bin.

Parameters
[in] cache_dir Path to the directory where KV cache files will be saved.

Definition at line 611 of file model.py.

◆ load_cache_memory_from()

None maccel.model.Model.load_cache_memory_from ( self,
str cache_dir )

Loads the KV cache memory from files in the specified directory.

Reads KV cache data from files within the given directory and restores them. Each file is named using the format: cache_<layer_hash>.bin.

Parameters
[in] cache_dir Path to the directory where KV cache files are saved.

Definition at line 622 of file model.py.
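The dump/load pair above supports checkpointing a conversation's KV cache between sessions. A minimal sketch, assuming the helper names are hypothetical and that resetting the cache before restoring is safe (the documentation does not state whether load_cache_memory_from requires a prior reset):

```python
def checkpoint_kv_cache(model, cache_dir):
    """Persist the current KV cache to cache_<layer_hash>.bin files."""
    model.dump_cache_memory_to(cache_dir)

def restore_kv_cache(model, cache_dir):
    """Restore a previously saved KV cache from cache_dir."""
    model.reset_cache_memory()            # assumption: reset before restore
    model.load_cache_memory_from(cache_dir)
```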

◆ filter_cache_tail()

int maccel.model.Model.filter_cache_tail ( self,
int cache_size,
int tail_size,
List[bool] mask )

Filter the tail of the KV cache memory.

Retains the desired entries in the tail of the KV cache memory, excludes the others, and shifts the retained entries forward.

Parameters
[in] cache_size The number of tokens accumulated in the KV cache so far.
[in] tail_size The tail size of the KV cache to filter (<= 32).
[in] mask A mask indicating which tokens to retain or exclude at the tail of the KV cache.
Returns
New cache size after tail filtering.

Definition at line 633 of file model.py.
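The tail-filtering semantics can be illustrated with a pure-Python model of a token sequence. This only simulates the documented behavior (it does not call the NPU API), and the convention that True in the mask means "retain" is an assumption.

```python
def simulate_filter_cache_tail(tokens, cache_size, tail_size, mask):
    """Pure-Python illustration of filter_cache_tail semantics.

    Keeps the head (the first cache_size - tail_size tokens), retains only
    the tail tokens whose mask entry is True (assumption: True = retain),
    and shifts them forward. Returns (new_tokens, new_cache_size).
    """
    head = tokens[:cache_size - tail_size]
    tail = tokens[cache_size - tail_size:cache_size]
    kept = [tok for tok, keep in zip(tail, mask) if keep]
    new_tokens = head + kept
    return new_tokens, len(new_tokens)

tokens = [10, 11, 12, 13, 14, 15]
new_tokens, new_size = simulate_filter_cache_tail(
    tokens, cache_size=6, tail_size=3, mask=[True, False, True])
# head [10, 11, 12] is kept; of the tail [13, 14, 15] only 13 and 15 survive
```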

◆ move_cache_tail()

int maccel.model.Model.move_cache_tail ( self,
int num_head,
int num_tail,
int cache_size )

Moves the tail of the KV cache memory to the end of the head.

Slices the tail of the KV cache memory up to the specified size and moves it to the designated cache position.

Parameters
[in] num_head The size of the KV cache head where the tail is appended.
[in] num_tail The size of the KV cache tail to be moved.
[in] cache_size The total number of tokens accumulated in the KV cache so far.
Returns
The updated cache size after moving the tail.

Definition at line 651 of file model.py.

Member Data Documentation

◆ infer_chw

Optional[List[np.ndarray]] maccel.model.Model.infer_chw = infer
static

Definition at line 279 of file model.py.

◆ infer_chw_to_float

List[np.ndarray] maccel.model.Model.infer_chw_to_float = infer_to_float
static

Definition at line 280 of file model.py.

◆ _model

maccel.model.Model._model = _cMaccel.Model(path)
protected

Definition at line 132 of file model.py.

◆ _input_shape

List[_Shape] maccel.model.Model._input_shape = self.get_model_input_shape()
protected

Definition at line 138 of file model.py.

◆ _output_shape

List[_Shape] maccel.model.Model._output_shape = self.get_model_output_shape()
protected

Definition at line 139 of file model.py.

◆ _acc

maccel.model.Model._acc = acc
protected

Definition at line 149 of file model.py.


The documentation for this class was generated from the following file: model.py