Releases: pulp-platform/Deeploy
v0.2.0
Release v0.2.0 (2025-07-08) #103
This release contains major architectural changes, new platform support, enhanced simulation workflows, floating-point kernel support, training infrastructure for CCT models, memory allocation strategies, and documentation improvements.
List of Pull Requests
- Prepare v0.2.0 release #102
- Add Luka as Code Owner #101
- Fix CI, Docker Files, and Documentation Workflow #100
- Chimera Platform Integration #96
- Add Tutorial and Refactor README #97
- Reduce Mean Float Template #92
- Reshape Memory Freeing and Generic Float GEMM Fixes #91
- Prepare for Release and Separate Dependencies #90
- Fix input offsets calculation #89
- Move PULP SDK to main branch/fork #88
- Finite Lifetime for IO Tensors #51
- Improved Memory Visualization and Multi-Layer Tiling Profiling #56
- Fix Linting in CI and Reformat C Files #86
- Fix Broken CMake Flow For pulp-sdk #87
- Refactor Changelog For Release #85
- ARM Docker Container and Minor Bug Fix #84
- Added Kernel for Generic Float DW Conv2D #63
- Autoselect Self-Hosted Runners if the Action is on Upstream #81
- TEST_RECENT linking on MacOS #78
- Add RV32IMF Picolibc support for Siracusa platform #66
- Improve Documentation and VSCode Support #76
- Debug Print Topology Pass and Code Transformation #75
- Find all subdirectories of Deeploy when installing with pip install #70
- Add milestone issue template #71
- Bunch of fixes and changes #58
- Add SoftHier platform #65
- rv32imf_xpulpv2 ISA support for Siracusa platform #64
- One LLVM To Compile Them All #60
- One GVSoC to Simulate Them All #59
- Add Support for CCT Last Layer Training with Embedding Dim 8-128 #55
- Add CCT Classifier Training Support #53
- L3 Bugs: DMA Struct Datatype and Maxpool Margin Error #45
- DeepQuant Quantized Linear Support #54
- Implemented Dequant Layer for Generic and Siracusa #52
- Infinite Lifetime Buffers Considered in Tiling & Memory Allocation (+ Visualization) #44
- Implemented Quant Layer for Generic and Siracusa #49
- Increase maximal Mchan DMA transfer sizes from 64KiB to 128KiB #47
- Add MiniMalloc and Decouple Memory Allocation and Tiling #40
- Float CCT Bugs on L3 #37
- Memory Allocation Strategies and Visualization #36
- Add CODEOWNERS #42
- Add Tiling Support to All CCT Kernels and Fix CCT Operators on Siracusa Platform for L2 #35
- Add Fp gemm and Softmax for Snitch platform #31
- Add Float Kernels for CCT #29
- documentation deployment #34
- main.c Float Cast Bugs #28
- Add Float GEMM on PULP with Tiling #26
- Add Float Support & Float GEMM for Generic #25
- GVSOC support for the Snitch Cluster platform #23
- Snitch Cluster Tiling Support #22
- Snitch support integration #14
- Update bibtex citation #20
- the PR template location, bump min python to 3.10, change install command #17
- Add pre-commit for python formatting #15
- FP integration (v2) #12
- shell for sequential tests of Generic, Cortex, and Mempool platforms #11
- Add issue templates #10
- Minor CI and Readme Improvements #8
- Fix GHCR Link for Docker Build #7
- neureka's ccache id #6
- GitHub-based CI/CD Flow #4
- Generic Softmax Kernel #2
- Port GitLab CI #1
Added
- ChimeraDeployer, currently mainly a placeholder
- Allocate templates for Chimera
- ChimeraPlatform, using appropriate allocation templates and using the generic Parser + Binding for the Add node
- Adder CI test for Chimera
- Install flow for chimera-sdk via Makefile
- DeeployChimeraMath library
- Generic FP32 reduce mean bindings, parser, and template
- New alias list parameter for buffer objects
- New test, also included in the CI pipeline, for the reshape and skip connection situation
- 'shape' parameter handling similar to the 'indices' parameter in the generic reshape template
- Test the correctness of the memory map generated by the tiler
- Add attribute to VariableBuffer to distinguish I/Os
- Add proper static memory allocation with finite lifetime for I/Os
- The memory allocation visualization now displays the allocation for each level used
- Tutorial section in the documentation
- Guide on using the debug print topology pass and code transformation
- VSCode configuration files for improved IDE support
- Multi-branch GitHub Pages deployment support
- Test for the DebugPrintTopologyPass.
- Test for PrintInputGeneration, PrintOutputGeneration, MemoryAwarePrintInputGeneration, MemoryAwarePrintOutputGeneration
- check for CMAKE variable and fallback to searching for cmake
- tensor name mangling
- identity operation removal
- _unpack_const helper function to NodeParser to allow for node attributes that are direct Constant tensors or direct numpy values
- load_file_to_local in dory_mem as a way to load values directly into a local memory (not RAM); needed for copying values from flash to wmem for Neureka v2
- Add the documentation.yml workflow to deploy doc pages.
- Improved README with more detailed Getting Started section, a section listing related publications, and a list of supported platforms.
- Schedule a CI run every 6 days at 2AM CET to refresh the cache (it expires after 7 days if unused).
- Add the FloatImmediate AbstractType
- Define fp64, fp32, fp16, and bf16
- Add float binding for the Adder in the Generic platform
- Add a FloatAdder test to the CI for Siracusa and Generic platforms
- Extend testType.py with float tests
- LIMITATION: Current LLVM compiler does not support bf16 and fp16; these types are commented out in the library header
- CMake flow for the Snitch Cluster
- Added snitch_cluster to Makefile
- New Snitch platform with testing application
- Testrunner for tiled and untiled execution (testRunner_snitch.py, testRunner_tiled_snitch.py)
- Minimal library with CycleCounter and utility function
- Support for single-buffered tiling from L2.
- Parsers, Templates, TypeCheckers, Layers, and TCF for the newly supported operators.
- A code transformation pass to filter DMA cores or compute cores for an ExecutionBlock.
- A code transformation pass to profile an ExecutionBlock.
- Test for single kernels, both with and without tiling.
- Adds the --debug flag to cargo install when installing Banshee to get the possibility of enabling the debug prints.
- New tests for the snitch_cluster platform.
- Add macros to main.c to disable printing and testing (convenient when running RTL simulations).
- gvsoc in the Makefile and dockerfile
- cmake flow for gvsoc
- CI tests regarding Snitch run on GVSOC as well
- Float Support for Constbuffer
- Simple Float GEMM on Generic and PULP
- FP...