Changelog#
[v0.30.1] - 2025-10-20#
Added#
CI: build & ctest check added (!131)
Check MXQ core consistency (!150)
Add @note comments for inferAsync family (!154)
Show supported cores in error message when ModelConfig mismatches MXQ file (!153)
Fixed#
Fix issue reading files over 2GB on Windows (!148)
npu_watchdog: Fix inference error in activation buffer model with Global-Core (!152)
Changed#
Supports Windows driver rev#1 : S/G DMA, core claim/unclaim (!166)
Removed#
Remove the Trace API from acc (!164)
[v0.30.0] - 2025-08-29#
Added#
Supports int8 asynchronous inference (!120)
Add getModelSummary function (!125)
Send model’s memory usage to aries2-driver at Model::launch (!145)
Add Python comments for doxygen generation (!144)
Support BiRNN, BiLSTM (!143)
Fixed#
Fix compile error caused by std::packaged_task on Windows (!129)
Fix out of index error of async API unittests (!129)
Windows: Restrict each NPU core to load only one model at a time (!130)
Fix Already-launched Model error handling (!140)
Fix a bug where the variable-length model only accepted a fixed shape (!143)
Changed#
Update README.md (!127)
Change Python API error message (!147)
Add numpy dependency to install maccel whl file (!138)
[v0.29.0] - 2025-07-25#
Added#
npu_task_scheduler: Add NPUTaskScheduler (!101)
type: Add CacheInfo (!119)
Support multi input shape model (!118)
Fixed#
Fix testinfer.cc help message (!111)
model_impl: Fix the logic calculating 2nd activation space when a model has multi-bundle (!114)
acc_impl: Block upload of IMEM_INIT/IMEM_INITSTART at Model::launch for Regulus (!117)
Fix compiler error caused by designated initializer in Ubuntu18.04 (!121)
Changed#
Enable additional compiler warnings and fix triggered warnings (!108)
model: arguments of dumpCacheMemory and loadCacheMemory change from file path to directory path (!119)
model: arguments of dumpCacheMemory and loadCacheMemory change from a buffer to buffer vector (!119)
[v0.28.0] - 2025-06-25#
Added#
Support model using framebuffer logic (!109)
Support model using command queue inference (!109)
Fixed#
Fix bugs where models with global-core failed to run when EXPERIMENTAL_READ_ALL_OUTPUTS_AT_ONCE=ON (!107)
Changed#
model_impl: Use CacheRearrange IMEM in Model::resetCacheMemory (!100)
[v0.27.0] - 2025-06-17#
Added#
Support Cache Rearrange (Tail-Move, Tail-Filter) by Supplementary Infer (!84)
Add async API python wrapper (!87)
Support cmake --install (!93)
Add header installation for cmake --install (!94)
Add INSTALL_MACCEL cmake option (!99)
Add a post-build event to resnet50.sln to copy a DLL file (!98)
Fixed#
Fix bugs related to SequenceLength (!95)
Changed#
Use prettified argv[0] in testinfer help message (!91)
Removed#
Remove exception handling for dev_no in sleep device (!90)
[v0.26.0] - 2025-05-20#
Added#
Windows: Support building with custom OpenBLAS path (!80)
Implement Async API for C++ (!42)
Support multi-batch inference of CPU-Offload models (!78)
Support MXQv5 and models with multiple sequence lengths (!86)
Fixed#
Resolve build error on Ubuntu 18.04 caused by _mm256_set_m128i (!73)
Fix markdown syntax errors and content errors for doxygen documentation (!77)
Fix CMake error of add_dependencies during cross-compilation (!83)
Fix error in infer<float*> and infer<int8_t*> about reversed input shapes (!85)
Changed#
Improve the Python wheel build process - remove unnecessary logs and ensure rebuilding whl every time (!71)
Refactor unittests for the Python API (!72)
Update CMAKE_CXX_STANDARD from 14 to 17 (!75)
reposition_test: Refactor all HWC<->CHW transpose code into unified functions (!79)
CMakeLists.txt: Change the default value of MACCEL_CPU_OFFLOAD to bonfire (!80)
Improve the execution time of Model::create (!81)
Rename Ticket to Future for consistency with common async terminology (!82)
Removed#
Remove lib_info.py and resnet50_accuracy.py (!72)
[v0.25.0] - 2025-04-03#
Added#
Support FP16 & FP32 NPU Data Type (!38)
testinfer: Support a help message to describe arguments and how to use (!41)
Support Global4 and Global8 core modes (!40)
pymaccel: Add set_global4_core_mode, set_global8_core_mode, and set_multi_core_mode to python API (!51)
Support LLM Model (!48)
pymaccel: Add max_height, max_width, max_channel, and max_cache_size in BufferInfo (!58)
Support Multi core mode (!56)
Add Python unittests to verify the Python API (!65)
Apply doxygen to maccel (!66)
Fixed#
Windows: Fix updating temperature to new union structure (!43)
Windows: Release pMem after updating memory consumption (!43)
tensor_utils: Fix moveToNDArray to execute copy if libtorch is used for CPU offload (!45)
Windows: Distinguish the target architecture in .whl file between MinGW Python and native Windows Python (!44)
Windows: Fix build errors in the English version of Windows by removing Korean comments from build_wheel.bat and build_release.bat (!49)
interleaving: Fix a bug where the return value of interleave was calculated larger than intended (!46)
acc_impl: Fix a bug where IMEM is overwritten by IMEM_CACHE_REARRANGE in Model::launch (!50)
Fix NDArrayData to have thread-safe refcount by using std::shared_ptr (!52)
simd_x86_64_transpose: Add missing defined(__GNUC__) && to prevent compilation in MSVC (!55)
npu_op_desc_test: Fix the failure of NPUOpDescTest.getRmemAddresses in Aries1 (!57)
pymaccel: Fix get_latency_set_policy being overlapped by get_latency_consumed (!58)
testinfer: Fix set_global8_core_mode being overlapped by set_global4_core_mode in Python (!58)
Fix the logic for calculating DDR memory usage to avoid referencing hwdep and to support multi-core mode (!62)
Windows: Fix to pass the correct Aries1 DDR usage to Windows monitoring tools (!63)
Fix Python APIs - add __repr__, is None, reposition_outputs, np.ascontiguousarray, and more (!67)
test_model: Fix test_checkInferConsistency to correct comparison (!67)
Changed#
Windows: power monitor change to power/voltage/current (!47)
driver: Remove the 4KB margin in allocHostMemory due to an interleaving bug (!46)
pymaccel: Change the notation of some methods in ModelConfig from camelCase to snake_case (!51)
Windows: Update Windows codes to match Windows driver v1.6 (!61)
ndarray: Attach noexcept to move constructor/assignment of NDArrayData (!64)
Make ModelConfig API more user-friendly - Add setSingleCoreMode (!59)
Revise the Python API wrapper of maccel (!65)
Improve docs - Markdown formatting, Python ModelConfig example in advanced_usage, and more (!68)
Removed#
model_impl: Remove redundant trace events in inferBufferOutput and inferSpeedrun (!37)
Remove packed logic (!43)
[v0.24.0] - 2025-02-12#
Added#
Windows: Tracks memory usage of Aries and sends it to the Windows monitoring tool (!39)
Fixed#
Modify the naming of .whl files to properly reflect the target platform, such as Windows or Linux (!36)
[v0.23.0] - 2025-01-24#
Added#
type.h: Add option in ModelConfig to force a single NPU bundle execution (!29)
Support FP16, FP32 Output NPU Buffer (!34)
Fixed#
type.h: Support Global core mode (!28)
[v0.22.0] - 2025-01-23#
Added#
Support Aries2 on maccel (!26)
Windows: Add aries performance monitoring (!30)
Fixed#
Fix build error on Windows by using findPythonInterp in MSVC (!31)
Changed#
Windows: Send 8->1->0 in postInfer on Windows (!32)
[v0.21.0] - 2025-01-17#
Added#
init.cc: Add initializer to set default LogLevel according to MACCEL_LOG_LEVEL (!9)
model: Add shape parameter to std::vector<T*> infer api (!17)
Support RNN/LSTM models with fixed sequence length inputs (!17)
testinfer: Add seq-sizes argument for variable length inputs (!17)
CMakeLists.txt: Automatically generate libmaccel.so* symbolic links based on the SOVERSION and VERSION extracted from the Git tag (!19)
Windows: Support Aries1/2 on Windows (!24)
Fixed#
reposition: fix logic of need_repos when shape is kept but reposition occurs (!11)
Modify the naming of .so and .whl files to appropriately reflect the target architecture (!22)
Fix whl file name error by adding linux_ prefix (!25)
Changed#
Support multi-model on single NPU core for Regulus (!6)
acc: Revise Accelerator::getCoreList to retrieve all available cores from driver (!10)
regulus: Fix regulus timeout unit from nsec to msec (!15)
CMakeLists.txt: Add a cmake flag MACCEL_GLIBCXX_DEBUG to apply compile option D_GLIBCXX_DEBUG (!8)
npu_watchdog: dumpDDRBin to dump DDR for each bundle & sequence_index (!17)
CMakeLists.txt: Remove Warning from FindPython (!23)
[v0.20.0] - 2024-11-15#
Added#
op_desc.cc: Add support for additional CPU offload operations (!3)
type: Add
CoreAllocationPolicyto support automatic NPU core allocation (!10)
Fixed#
CMakeLists.txt: Consider CMAKE_BUILD_TYPE as Release, when it is not specified (!7)
Changed#
op_desc.cc: Update implementations for some of CPU offload operations (!3)
CMakeLists.txt: CMAKE_BUILD_TYPE=Release limits LogLevel to INFO (!7)
[v0.19.0] - 2024-09-12#
Added#
simd_x86_64_scale: Implement SIMD(AVX2, SSE2)-based scale functions in x86-64 (#432, #434)
simd_x86_64_transpose: Implement SIMD(AVX2, SSE2)-based transpose functions in x86-64 (#434)
transpose: Implement transpose functions for inferCHW (#434)
simd_aarch64_scale: Implement SIMD(NEON)-based scale functions in ARM64 (#435)
simd_aarch64_transpose: Implement SIMD(NEON)-based transpose functions in ARM64 (#435)
Add definitions of platform which represents host’s architecture & SIMD (#441)
aries_win: Implement postInfer for Windows (#446)
Support multi-card in Windows (#448, #450)
Support Regulus NPU (#449)
model_impl: Implement runNPUModelTest for use in inferSpeedrun and inferOutputDiff of Thread-Benchmark (#462)
Support MXQv3 (!2)
Fixed#
reposition: Fix calcReposIndicesDefaultBase when original_size != reshaped_size (#437)
reposition_test: Fix vector out of bound in scale_list (#438)
Update SIMD Option for MSVC (#441)
Fix some unittests to support MSVC (#446)
win_ddk: Fix pure function for MSVC (#453)
Changed#
reposition: Use SIMD for float repositions (#432, #434, #435)
PCIeDriver: Exclude RiscV area in Windows heap allocator (#443)
reposition: Use SIMD-based transpose for CHW int8_t data (#442)
npu_watchdog: Unify postInfer in both Windows and Linux (#446)
reposition: MXQ compiled by default uses efficient reposition (#457)
reposition: repositionFloat & repositionInt are integrated (#460)
Removed#
reposition: Remove OpenMP (#447)
reposition: Remove default reposition (#457)
[v0.18.0] - 2024-04-30#
Changed#
reposition: Revise need_repos condition to compare size of original and buffer (#431)
[v0.17.0] - 2024-04-12#
Added#
testinfer: Add batch-size, num-cores options (#414)
pymaccel: Add set_log_level, start_tracing_events, stop_tracing_events (#418)
pymaccel: Implement maccel.load() API in Python (#422)
model: Implement acquireInputBuffers, acquireOutputBuffers, releaseBuffers (#428)
model: Implement new infer API that returns multi-batch output by reference (#428)
model: Implement new repositionOutputs which uses vector<vector<float>>& output (#428)
model: Implement inferBufferToFloat (#428)
Fixed#
npu_watchdog: Fix default time duration overflow (#425, #426)
resnet50_test_cc: Fix wrong comparison & scale and add diff test (#428)
reposition: Fix fillgap logic (#429)
Changed#
aries: Determine to use interrupt mode by ARIES_IOC_GET_SIGNAL_TYPE ioctl rather than driver version (#417)
model_impl: Skip benchmark for re-launched model (#421)
npu_watchdog: Change Default timeout from 1s to 10s (#424)
reposition: Use scale.scale when scale is uniform (#420)
[v0.16.0] - 2024-02-20#
Added#
build_wheel: MSVC provides python wheel (#409)
Fixed#
Fix pymaccel build error from latest setuptools version (#412)
reposition: Fix ch-wise scale bug in inferCHW (#415)
Fix input shape check for batch in Python (#416)
Changed#
aries: Enhance robustness of pread/pwrite usage by apply loop (#413)
Removed#
resnet50_msvc: Remove .TestDrive, PCIeDriverSystem.dll (#409)
VERSION: Remove VERSION (#412)
[v0.15.0] - 2024-01-03#
Added#
testinfer: Implement inferBuffer infer-api
acc_impl: Add lock_guard for AcceleratorImpl
pymaccel: Implement constructor that create and launch at the same time in Python (#391)
model_impl, npu_watchdog: Implement IMEM_INIT, IMEM_INITSTART
Fixed#
memory_pool: Fix not to wait forever when model.dispose() called without releaseBuffer
reposition: Apply __pragma for MSVC (#393)
aries_win: Allocate an additional 4KB of host memory to fix interleaving bug (#396)
Fix default log level bug in release build
model_impl: Fix segfault in user_outputs by rollback of resize removal (#401)
Changed#
testinfer: Modularize doMain by implementing processInputs, processOutputs, printSummary
model_desc: Revise model shape to exclude batch-size (#392)
model_desc: checkIfInputShapeMatchAndEnsureBatchDim -> doesInputShapeMatch (#392)
op_desc: Update CPU offload code (#386)
tensor_utils: Remove release when move tensor to NDArray
Change PACKAGE_FILENAME to maccel_${VENDOR}_${PRODUCT}_${VERSION}
[v0.14.0] - 2023-11-03#
Added#
pypymaccel (#369)
acc: Introduce startTracingEvents(), while deprecating Acc::startTracingEvents()
model: Implement infer APIs using NDArray
Fixed#
reposition: Calculate CHW indices for efficient type
acc_impl: Respect OMP_NUM_THREADS env var (#378)
driver: Check {read,write}MemoryBuffer’s return value (#370)
Changed#
omp: Limit number of threads for OMP parallel block when other OMP blocks are running (#383)
Remove additional dependencies on the following files on Windows: .TestDrive, SystemDDK.dll, SystemHAL.dll, PCIeDriverSystem.dll
Convert maccel.h to all-in-one-style header (#366)
type: Include all cluster/cores by default
[v0.13.0] - 2023-09-13#
Added#
Add copyright notice (#351)
npu_watchdog: Dump DDR.bin for debugging (#348)
Fixed#
reposition: Fill 0 when reposition is ill-fitted
reposition: Fix filling gap bug for channel 1
Changed#
cmake: Add ‘d’ postfix for windows dll debug build
log: Change default log level
model: Implement inferHeightBatch using inferBatch (#349)
[v0.12.0] - 2023-09-01#
Added#
aries: Introduce new env var, which prevents {un,}claiming cores (#338)
Support interrupts (Linux) (#324)
windows_interrupt (#315)
INFO, WARNING, and ERROR logs are now printed. The log level can be adjusted with the newly added setLogLevel() API (OFF is also possible).
Implement int8->int8 infer API in output-as-parameter fashion
Fixed#
reposition: Fill gap when input channel is 1
reposition: Don’t use efficient reposition if it’s CHW format
Fix Visual Studio build error (#322)
reposition: Make output correct for AVX2
memory_pool: When reset, free all pre-allocated memory
Changed#
reposition: Implement efficient logic for int input/output (#329)
Limit OMP num threads at runtime
Implement RepositionType::EfficientRuntimeInterleave
Removed#
model: Remove some infer APIs
[v0.11.1] - 2023-08-25#
aries: Introduce new env var, which prevents {un,}claiming cores (#339)
[v0.11.0] - 2023-06-21#
Added#
Implement experimental inferHeightBatch() API
model: Implement new infer API which takes output as parameter
Fixed#
Fix inferSpeedrun segfault
reposition: When CHW, use default method
Changed#
reposition: Implement scale using NEON
Refactor reposition functions
[v0.10.0] - 2023-06-14#
Added#
Implement batch input inference API (#290)
Fixed#
Fix EXPERIMENTAL_READ_OUTPUTS_AT_ONCE compile error
Fix inferCHW bug
Mitigate sleep() problem on windows (#283)
Changed#
Do not repos when original channel is the same as pe num (#294)
Allocate input/output buffers at once (#293)
Implement RepositionType::Efficient (#291)
pymaccel check input shape (#292)
pybind11 infer no convert (#289)
Implement MemoryPool and apply to ModelImpl (#288)
Move NPUTaskQueue-related members to AcceleratorImpl (#286)
reposition: Clean up omp directives
Remove model runner
[v0.9.1] - 2023-05-23#
Changed#
Disable interleaving
[v0.9.0] - 2023-05-19#
Added#
Implement manual packed, multi input inference
Claim cores when model launches (#277)
Changed#
List up more detailed items for profiler (#278)
kibum/factor-out-aries-reset (#276)
Try to apply bonfire for CPU offload (#273)
Build whl only when running as root (#274)
[v0.8.0] - 2023-04-18#
Fixed#
Use input’s memory format for output’s memory format (#270)
Changed#
Support width-wise reshape with 1 or 2 channels, just like 3 channels (#272)
Build a wheel file while running make
Allow lowercase for DRIVER_TYPE
[0.7.0] - 2023-03-23#
Added#
Initial work on CPU-offloading.
Removed#
Temporarily removed non-float infer APIs
[0.6.0] - 2023-03-23#
Added#
add windows driver interface. (tested.) (#263)
Implement GlobalMode/MultiMode (#262)
Add core-related functions (#251)
add modelconfig func (#230)
Implement Reset (#229)
Implement inferSpeedrun (#232)
model: Add infer functions using int8_t (#182)
Add version feature (#173)
model_impl: Implement ChannelFirst (#166)
Changed#
Add 0.5 then floor to convert float -> int
npu_watchdog: Set reset timeout ratio to 10 (#241)
Enable log for default cmake build
pymaccel: Add GIL release in infer (#214)
profile: Reimplement profiler (#210)
Patch to support hardware after Aries (#185)
model_manager: Always wait until NPU_FINISH == 1
Fixed#
Fix generate export header process
Call model dispose when acc is destroyed first (#252)
Prevent launch on a running core (#211)
Add logic to initialize the DRAM region before model upload (#206)