Changelog

[v0.30.1] - 2025-10-20#

Added#

CI: build & ctest check added (!131)
Check MXQ core consistency (!150)
Add @note comments for inferAsync family (!154)
Show supported cores in error message when ModelConfig mismatches MXQ file (!153)

Fixed#

Fix issue reading files over 2GB on Windows (!148)
npu_watchdog: Fix inference error in activation buffer model with Global-Core (!152)

Changed#

Support Windows driver rev#1: S/G DMA, core claim/unclaim (!166)

Removed#

Remove the Trace API from acc (!164)

[v0.30.0] - 2025-08-29#

Added#

Support int8 asynchronous inference (!120)
Add getModelSummary function (!125)
Send model’s memory usage to aries2-driver at Model::launch (!145)
Add Python comments for doxygen generation (!144)
Support BiRNN, BiLSTM (!143)

Fixed#

Fix compile error caused by std::packaged_task on Windows (!129)
Fix out of index error of async API unittests (!129)
Windows: Restrict each NPU core to load only one model at a time (!130)
Fix Already-launched Model error handling (!140)
Fix a bug where the variable-length model only accepted a fixed shape (!143)

Changed#

Update README.md (!127)
Change Python API error message (!147)
Add numpy dependency to install maccel whl file (!138)

[v0.29.0] - 2025-07-25#

Added#

npu_task_scheduler: Add NPUTaskScheduler (!101)
type: Add CacheInfo (!119)
Support multi input shape model (!118)

Fixed#

Fix testinfer.cc help message (!111)
model_impl: Fix the logic calculating 2nd activation space when a model has multi-bundle (!114)
acc_impl: Block uploading IMEM_INIT/IMEM_INITSTART at Model::launch for Regulus (!117)
Fix compiler error caused by designated initializer on Ubuntu 18.04 (!121)

Changed#

Enable additional compiler warnings and fix triggered warnings (!108)
model: arguments of dumpCacheMemory and loadCacheMemory change from file path to directory path (!119)
model: arguments of dumpCacheMemory and loadCacheMemory change from a buffer to buffer vector (!119)

[v0.28.0] - 2025-06-25#

Added#

Support model using framebuffer logic (!109)
Support model using command queue inference (!109)

Fixed#

Fix bugs where models with global-core failed to run when EXPERIMENTAL_READ_ALL_OUTPUTS_AT_ONCE=ON (!107)

Changed#

model_impl: Use CacheRearrange IMEM in Model::resetCacheMemory (!100)

[v0.27.0] - 2025-06-17#

Added#

Support Cache Rearrange (Tail-Move, Tail-Filter) by Supplementary Infer (!84)
Add async API python wrapper (!87)
Support cmake --install (!93)
Add header installation for cmake --install (!94)
Add INSTALL_MACCEL cmake option (!99)
Add a post-build event to resnet50.sln to copy a DLL file (!98)

Fixed#

Fix bugs related to SequenceLength (!95)

Changed#

Use prettified argv[0] in testinfer help message (!91)

Removed#

Remove exception handling for dev_no in sleep device (!90)

[v0.26.0] - 2025-05-20#

Added#

Windows: Support building with custom OpenBLAS path (!80)
Implement Async API for C++ (!42)
Support multi-batch inference of CPU-Offload models (!78)
Support MXQv5 and models with multiple sequence lengths (!86)

Fixed#

Resolve build error on Ubuntu 18.04 caused by _mm256_set_m128i (!73)
Fix markdown syntax errors and content errors for doxygen documentation (!77)
Fix CMake error of add_dependencies during cross-compilation (!83)
Fix an error with reversed input shapes in infer<float*> and infer<int8_t*> (!85)

Changed#

Improve the Python wheel build process - remove unnecessary logs and ensure rebuilding whl every time (!71)
Refactor unittests for the Python API (!72)
Update CMAKE_CXX_STANDARD from 14 to 17 (!75)
reposition_test: Refactor all HWC<->CHW transpose code into unified functions (!79)
CMakeLists.txt: Change the default value of MACCEL_CPU_OFFLOAD to bonfire (!80)
Improve the execution time of Model::create (!81)
Rename Ticket to Future for consistency with common async terminology (!82)

Removed#

Remove lib_info.py and resnet50_accuracy.py (!72)

[v0.25.0] - 2025-04-03#

Added#

Support FP16 & FP32 NPU Data Type (!38)
testinfer: Support a help message to describe arguments and how to use (!41)
Support Global4 and Global8 core modes (!40)
pymaccel: Add set_global4_core_mode, set_global8_core_mode, and set_multi_core_mode to python API (!51)
Support LLM Model (!48)
pymaccel: Add max_height, max_width, max_channel, and max_cache_size in BufferInfo (!58)
Support Multi core mode (!56)
Add Python unittests to verify the Python API (!65)
Apply doxygen to maccel (!66)

Fixed#

Windows: Fix updating temperature to new union structure (!43)
Windows: Release pMem after updating memory consumption (!43)
tensor_utils: Fix moveToNDArray to execute copy if libtorch is used for CPU offload (!45)
Windows: Distinguish the target architecture in .whl file between MinGW Python and native Windows Python (!44)
Windows: Fix build errors in the English version of Windows by removing Korean comments from build_wheel.bat and build_release.bat (!49)
interleaving: Fix a bug where the return value of interleave was calculated larger than intended (!46)
acc_impl: Fix a bug where IMEM is overwritten by IMEM_CACHE_REARRANGE in Model::launch (!50)
Fix NDArrayData to have thread-safe refcount by using std::shared_ptr (!52)
simd_x86_64_transpose: Add missing defined(__GNUC__) && to prevent compilation in MSVC (!55)
npu_op_desc_test: Fix the failure of NPUOpDescTest.getRmemAddresses in Aries1 (!57)
pymaccel: Fix get_latency_set_policy being overlapped by get_latency_consumed (!58)
testinfer: Fix set_global8_core_mode being overlapped by set_global4_core_mode in Python (!58)
Fix the logic for calculating DDR memory usage to avoid referencing hwdep and to support multi-core mode (!62)
Windows: Fix to pass the correct Aries1 DDR usage to Windows monitoring tools (!63)
Fix Python APIs - add __repr__, is None, reposition_outputs, np.ascontiguousarray, and more (!67)
test_model: Fix test_checkInferConsistency to correct comparison (!67)

Changed#

Windows: Change power monitor to power/voltage/current (!47)
driver: Remove the 4KB margin in allocHostMemory due to an interleaving bug (!46)
pymaccel: Change the notation of some methods in ModelConfig from camelCase to snake_case (!51)
Windows: Update Windows codes to match Windows driver v1.6 (!61)
ndarray: Attach noexcept to move constructor/assignment of NDArrayData (!64)
Make ModelConfig API more user-friendly - Add setSingleCoreMode (!59)
Revise the Python API wrapper of maccel (!65)
Improve docs - Markdown formatting, Python ModelConfig example in advanced_usage, and more (!68)

Removed#

model_impl: Remove redundant trace events in inferBufferOutput and inferSpeedrun (!37)
Remove packed logic (!43)

[v0.24.0] - 2025-02-12#

Added#

Windows: Track memory usage of Aries and send it to the Windows monitoring tool (!39)

Fixed#

Modify the naming of .whl files to properly reflect the target platform, such as Windows or Linux (!36)

[v0.23.0] - 2025-01-24#

Added#

type.h: Add option in ModelConfig to force a single NPU bundle execution (!29)
Support FP16, FP32 Output NPU Buffer (!34)

Fixed#

type.h: Support Global core mode (!28)

[v0.22.0] - 2025-01-23#

Added#

Support Aries2 on maccel (!26)
Windows: Add aries performance monitoring (!30)

Fixed#

Fix build error on Windows by using findPythonInterp in MSVC (!31)

Changed#

Windows: Send 8->1->0 in postInfer on Windows (!32)

[v0.21.0] - 2025-01-17#

Added#

init.cc: Add initializer to set default LogLevel according to MACCEL_LOG_LEVEL (!9)
model: Add shape parameter to std::vector<T*> infer api (!17)
Support RNN/LSTM models with fixed sequence length inputs (!17)
testinfer: Add seq-sizes argument for variable length inputs (!17)
CMakeLists.txt: Automatically generate libmaccel.so* symbolic links based on the SOVERSION and VERSION extracted from the Git tag (!19)
Windows: Support Aries1/2 on Windows (!24)

Fixed#

reposition: fix logic of need_repos when shape is kept but reposition occurs (!11)
Modify the naming of .so and .whl files to appropriately reflect the target architecture (!22)
Fix whl file name error by adding linux_ prefix (!25)

Changed#

Support multi-model on single NPU core for Regulus (!6)
acc: Revise Accelerator::getCoreList to retrieve all available cores from driver (!10)
regulus: Fix regulus timeout unit from nsec to msec (!15)
CMakeLists.txt: Add a cmake flag MACCEL_GLIBCXX_DEBUG to apply compile option D_GLIBCXX_DEBUG (!8)
npu_watchdog: dumpDDRBin to dump DDR for each bundle & sequence_index (!17)
CMakeLists.txt: Remove Warning from FindPython (!23)

[v0.20.0] - 2024-11-15#

Added#

op_desc.cc: Add support for additional CPU offload operations (!3)
type: Add CoreAllocationPolicy to support automatic NPU core allocation (!10)

Fixed#

CMakeLists.txt: Consider CMAKE_BUILD_TYPE as Release, when it is not specified (!7)

Changed#

op_desc.cc: Update implementations for some of CPU offload operations (!3)
CMakeLists.txt: CMAKE_BUILD_TYPE=Release limits LogLevel to INFO (!7)

[v0.19.0] - 2024-09-12#

Added#

simd_x86_64_scale: Implement SIMD(AVX2, SSE2)-based scale functions in x86-64 (#432, #434)
simd_x86_64_transpose: Implement SIMD(AVX2, SSE2)-based transpose functions in x86-64 (#434)
transpose: Implement transpose functions for inferCHW (#434)
simd_aarch64_scale: Implement SIMD(NEON)-based scale functions in ARM64 (#435)
simd_aarch64_transpose: Implement SIMD(NEON)-based transpose functions in ARM64 (#435)
Add definitions of platform which represents host’s architecture & SIMD (#441)
aries_win: Implement postInfer for Windows (#446)
Support multi-card in Windows (#448, #450)
Support Regulus NPU (#449)
model_impl: Implement runNPUModelTest for use in inferSpeedrun and inferOutputDiff of Thread-Benchmark (#462)
Support MXQv3 (!2)

Fixed#

reposition: Fix calcReposIndicesDefaultBase when original_size != reshaped_size (#437)
reposition_test: Fix vector out of bound in scale_list (#438)
Update SIMD Option for MSVC (#441)
Fix some unittests to support MSVC (#446)
win_ddk: Fix pure function for MSVC (#453)

Changed#

reposition: Use SIMD for float repositions (#432, #434, #435)
PCIeDriver: Exclude RiscV area in Windows heap allocator (#443)
reposition: Use SIMD-based transpose for CHW int8_t data (#442)
npu_watchdog: Unify postInfer in both Windows and Linux (#446)
reposition: MXQ compiled by default use efficient reposition (#457)
reposition: repositionFloat & repositionInt are integrated (#460)

Removed#

reposition: Remove OpenMP (#447)
reposition: Remove default reposition (#457)

[v0.18.0] - 2024-04-30#

Changed#

reposition: Revise need_repos condition to compare size of original and buffer (#431)

[v0.17.0] - 2024-04-12#

Added#

testinfer: Add batch-size, num-cores options (#414)
pymaccel: Add set_log_level, start_tracing_events, stop_tracing_events (#418)
pymaccel: Implement maccel.load() API in Python (#422)
model: Implement acquireInputBuffers, acquireOutputBuffers, releaseBuffers (#428)
model: Implement new infer API that returns multi-batch output by reference (#428)
model: Implement new repositionOutputs that uses vector<vector<float>>& output (#428)
model: Implement inferBufferToFloat (#428)

Fixed#

npu_watchdog: Fix default time duration overflow (#425, #426)
resnet50_test_cc: Fix wrong comparison & scale and add diff test (#428)
reposition: Fix fillgap logic (#429)

Changed#

aries: Determine to use interrupt mode by ARIES_IOC_GET_SIGNAL_TYPE ioctl rather than driver version (#417)
model_impl: Skip benchmark for re-launched model (#421)
npu_watchdog: Change Default timeout from 1s to 10s (#424)
reposition: Use scale.scale when scale is uniform (#420)

[v0.16.0] - 2024-02-20#

Added#

build_wheel: MSVC provides python wheel (#409)

Fixed#

Fix pymaccel build error from latest setuptools version (#412)
reposition: Fix ch-wise scale bug in inferCHW (#415)
Fix input shape check for batch in Python (#416)

Changed#

aries: Enhance robustness of pread/pwrite usage by applying a loop (#413)

Removed#

resnet50_msvc: Remove .TestDrive, PCIeDriverSystem.dll (#409)
VERSION: Remove VERSION (#412)

[v0.15.0] - 2024-01-03#

Added#

testinfer: Implement inferBuffer infer-api
acc_impl: Add lock_guard for AcceleratorImpl
pymaccel: Implement constructor that creates and launches at the same time in Python (#391)
model_impl, npu_watchdog: Implement IMEM_INIT, IMEM_INITSTART

Fixed#

memory_pool: Fix waiting forever when model.dispose() is called without releaseBuffer
reposition: Apply __pragma for MSVC (#393)
aries_win: Allocate an additional 4KB of host memory to fix interleaving bug (#396)
Fix default log level bug in release build
model_impl: Fix segfault in user_outputs by rollback of resize removal (#401)

Changed#

testinfer: Modularize doMain by implementing processInputs, processOutputs, printSummary
model_desc: Revise model shape to exclude batch-size (#392)
model_desc: checkIfInputShapeMatchAndEnsureBatchDim -> doesInputShapeMatch (#392)
op_desc: Update CPU offload code (#386)
tensor_utils: Remove release when move tensor to NDArray
Change PACKAGE_FILENAME to maccel_${VENDOR}_${PRODUCT}_${VERSION}

[v0.14.0] - 2023-11-03#

Added#

pypymaccel (#369)
acc: Introduce startTracingEvents(), while deprecating Acc::startTracingEvents()
model: Implement infer APIs using NDArray

Fixed#

reposition: Calculate CHW indices for efficient type
acc_impl: Respect OMP_NUM_THREADS env var (#378)
driver: Check {read,write}MemoryBuffer’s return value (#370)

Changed#

omp: Limit number of threads for OMP parallel block when other OMP blocks are running (#383)
Windows: Remove additional dependencies on the following files: .TestDrive, SystemDDK.dll, SystemHAL.dll, PCIeDriverSystem.dll
Convert maccel.h to all-in-one-style header (#366)
type: Include all cluster/cores by default

[v0.13.0] - 2023-09-13#

Added#

Add copyright notice (#351)
npu_watchdog: Dump DDR.bin for debugging (#348)

Fixed#

reposition: Fill 0 when reposition is ill-fitted
reposition: Fix filling gap bug for channel 1

Changed#

cmake: Add ‘d’ postfix for windows dll debug build
log: Change default log level
model: Implement inferHeightBatch using inferBatch (#349)

[v0.12.0] - 2023-09-01#

Added#

aries: Introduce new env var, which prevents {un,}claiming cores (#338)
Support interrupts (Linux) (#324)
windows_interrupt (#315)
INFO, WARNING, and ERROR logs are now printed. The log level can be adjusted with the newly added setLogLevel() API (OFF is also available).
Implement int8->int8 infer API in output-as-parameter fashion

Fixed#

reposition: Fill gap when input channel is 1
reposition: Don’t use efficient reposition if it’s CHW format
Fix Visual Studio build error (#322)
reposition: Make output correct for AVX2
memory_pool: When reset, free all pre-allocated memory

Changed#

reposition: Implement efficient logic for int input/output (#329)
Limit OMP num threads at runtime
Implement RepositionType::EfficientRuntimeInterleave

Removed#

model: Remove some infer APIs

[v0.11.1] - 2023-08-25#

aries: Introduce new env var, which prevents {un,}claiming cores (#339)

[v0.11.0] - 2023-06-21#

Added#

Implement experimental inferHeightBatch() API
model: Implement new infer API which takes output as parameter

Fixed#

Fix inferSpeedrun segfault
reposition: When CHW, use default method

Changed#

reposition: Implement scale using NEON
Refactor reposition functions

[v0.10.0] - 2023-06-14#

Added#

Implement batch input inference API (#290)

Fixed#

Fix EXPERIMENTAL_READ_OUTPUTS_AT_ONCE compile error
Fix inferCHW bug
Mitigate sleep() problem on windows (#283)

Changed#

Do not repos when original channel is the same as pe num (#294)
Allocate input/output buffers at once (#293)
Implement RepositionType::Efficient (#291)
pymaccel: Check input shape (#292)
pybind11 infer no convert (#289)
Implement MemoryPool and apply to ModelImpl (#288)
Move NPUTaskQueue-related members to AcceleratorImpl (#286)
reposition: Clean up omp directives
Remove model runner

[v0.9.1] - 2023-05-23#

Changed#

Disable interleaving

[v0.9.0] - 2023-05-19#

Added#

Implement manual packed, multi input inference
Claim cores when model launches (#277)

Changed#

List up more detailed items for profiler (#278)
kibum/factor-out-aries-reset (#276)
Try to apply bonfire for CPU offload (#273)
Build whl only when running as root (#274)

[v0.8.0] - 2023-04-18#

Fixed#

Use input’s memory format for output’s memory format (#270)

Changed#

Support width-wise reshape with 1 or 2 channels, just like 3 channels (#272)
Build a wheel file while running make
Allow lowercase for DRIVER_TYPE

[v0.7.0] - 2023-03-23#

Added#

Initial work on CPU-offloading.

Removed#

Temporarily removed non-float infer APIs

[v0.6.0] - 2023-03-23#

Added#

Add Windows driver interface (tested) (#263)
Implement GlobalMode/MultiMode (#262)
Add core-related functions (#251)
Add ModelConfig function (#230)
Implement Reset (#229)
Implement inferSpeedrun (#232)
model: Add infer functions using int8_t (#182)
Add version feature (#173)
model_impl: Implement ChannelFirst (#166)

Changed#

Add 0.5 then floor when converting float -> int
npu_watchdog: Set reset timeout ratio to 10 (#241)
Enable log for default cmake build
pymaccel: Add GIL release to infer (#214)
profile: Reimplement profiler (#210)
Patch to support hardware after Aries (#185)
model_manager: Always wait until NPU_FINISH == 1

Fixed#

Fix the export header generation process
Call model dispose when acc is destroyed first (#252)
Prevent launch on a running core (#211)
Add logic to initialize the DRAM region before model upload (#206)
