Releases: pulp-platform/Deeploy

v0.2.0

08 Jul 13:58
c7fd2a1

Release v0.2.0 (2025-07-08) #103

This release contains major architectural changes, new platform support, enhanced simulation workflows, floating-point kernel support, training infrastructure for CCT models, memory allocation strategies, and documentation improvements.

List of Pull Requests

  • Prepare v0.2.0 release #102
  • Add Luka as Code Owner #101
  • Fix CI, Docker Files, and Documentation Workflow #100
  • Chimera Platform Integration #96
  • Add Tutorial and Refactor README #97
  • Reduce Mean Float Template #92
  • Reshape Memory Freeing and Generic Float GEMM Fixes #91
  • Prepare for Release and Separate Dependencies #90
  • Fix input offsets calculation #89
  • Move PULP SDK to main branch/fork #88
  • Finite Lifetime for IO Tensors #51
  • Improved Memory Visualization and Multi-Layer Tiling Profiling #56
  • Fix Linting in CI and Reformat C Files #86
  • Fix Broken CMake Flow For pulp-sdk #87
  • Refactor Changelog For Release #85
  • ARM Docker Container and Minor Bug Fix #84
  • Added Kernel for Generic Float DW Conv2D #63
  • Autoselect Self-Hosted Runners if the Action is on Upstream #81
  • TEST_RECENT linking on MacOS #78
  • Add RV32IMF Picolibc support for Siracusa platform #66
  • Improve Documentation and VSCode Support #76
  • Debug Print Topology Pass and Code Transformation #75
  • Find all subdirectories of Deeploy when installing with pip install #70
  • Add milestone issue template #71
  • Bunch of fixes and changes #58
  • Add SoftHier platform #65
  • rv32imf_xpulpv2 ISA support for Siracusa platform #64
  • One LLVM To Compile Them All #60
  • One GVSoC to Simulate Them All #59
  • Add Support for CCT Last Layer Training with Embedding Dim 8-128 #55
  • Add CCT Classifier Training Support #53
  • L3 Bugs: DMA Struct Datatype and Maxpool Margin Error #45
  • DeepQuant Quantized Linear Support #54
  • Implemented Dequant Layer for Generic and Siracusa #52
  • Infinite Lifetime Buffers Considered in Tiling & Memory Allocation (+ Visualization) #44
  • Implemented Quant Layer for Generic and Siracusa #49
  • Increase maximal Mchan DMA transfer sizes from 64KiB to 128KiB #47
  • Add MiniMalloc and Decouple Memory Allocation and Tiling #40
  • Float CCT Bugs on L3 #37
  • Memory Allocation Strategies and Visualization #36
  • Add CODEOWNERS #42
  • Add Tiling Support to All CCT Kernels and Fix CCT Operators on Siracusa Platform for L2 #35
  • Add Fp gemm and Softmax for Snitch platform #31
  • Add Float Kernels for CCT #29
  • documentation deployment #34
  • main.c Float Cast Bugs #28
  • Add Float GEMM on PULP with Tiling #26
  • Add Float Support & Float GEMM for Generic #25
  • GVSOC support for the Snitch Cluster platform #23
  • Snitch Cluster Tiling Support #22
  • Snitch support integration #14
  • Update bibtex citation #20
  • the PR template location, bump min python to 3.10, change install command #17
  • Add pre-commit for python formatting #15
  • FP integration (v2) #12
  • shell for sequential tests of Generic, Cortex, and Mempool platforms #11
  • Add issue templates #10
  • Minor CI and Readme Improvements #8
  • Fix GHCR Link for Docker Build #7
  • neureka's ccache id #6
  • GitHub-based CI/CD Flow #4
  • Generic Softmax Kernel #2
  • Port GitLab CI #1

Added

  • ChimeraDeployer, currently mainly a placeholder
  • Allocate templates for Chimera
  • ChimeraPlatform, using appropriate allocation templates and using the generic Parser + Binding for the Add node
  • Adder CI test for Chimera
  • Install flow for chimera-sdk via Makefile
  • DeeployChimeraMath library
  • Generic FP32 reduce mean bindings, parser, and template
  • New alias list parameter for buffer objects
  • New test, also included in the CI pipeline, for the reshape and skip connection situation
  • 'shape' parameter handling similar to the 'indices' parameter in the generic reshape template
  • Test the correctness of the memory map generated by the tiler
  • Add attribute to VariableBuffer to distinguish I/Os
  • Add proper static memory allocation with finite lifetime for I/Os
  • The memory allocation visualization now displays the allocation for each level used
  • Tutorial section in the documentation
  • Guide on using the debug print topology pass and code transformation
  • VSCode configuration files for improved IDE support
  • Multi-branch GitHub Pages deployment support
  • Test for the DebugPrintTopologyPass.
  • Test for PrintInputGeneration, PrintOutputGeneration, MemoryAwarePrintInputGeneration, MemoryAwarePrintOutputGeneration
  • Check for the CMAKE variable and fall back to searching for cmake
  • Tensor name mangling
  • Identity operation removal
  • _unpack_const helper function to NodeParser to allow for node attributes that are direct Constant tensors or direct numpy values
  • load_file_to_local in dory_mem to load values directly into a local memory (not RAM); needed for copying values from flash to wmem for Neureka v2
  • Add the documentation.yml workflow to deploy doc pages.
  • Improved README with more detailed Getting Started section, a section listing related publications, and a list of supported platforms.
  • Schedule a CI run every 6 days at 2AM CET to refresh the cache (it expires after 7 days if unused).
  • Add the FloatImmediate AbstractType
  • Define fp64, fp32, fp16, and bf16
  • Add float binding for the Adder in the Generic platform
  • Add a FloatAdder test to the CI for Siracusa and Generic platforms
  • Extend testType.py with float tests
  • LIMITATION: The current LLVM compiler does not support bf16 and fp16; these types are commented out in the library header
  • CMake flow for the Snitch Cluster
  • Added snitch_cluster to Makefile
  • New Snitch platform with testing application
  • Testrunner for tiled and untiled execution (testRunner_snitch.py, testRunner_tiled_snitch.py)
  • Minimal library with CycleCounter and utility function
  • Support for single-buffered tiling from L2.
  • Parsers, Templates, TypeCheckers, Layers, and TCF for the newly supported operators.
  • A code transformation pass to filter DMA cores or compute cores for an ExecutionBlock.
  • A code transformation pass to profile an ExecutionBlock.
  • Test for single kernels, both with and without tiling.
  • Add the --debug flag to cargo install when installing Banshee so that debug prints can be enabled.
  • New tests for the snitch_cluster platform.
  • Add macros to main.c to disable printing and testing (convenient when running RTL simulations).
  • GVSoC in the Makefile and Dockerfile
  • CMake flow for GVSoC
  • CI tests for Snitch also run on GVSoC
  • Float Support for Constbuffer
  • Simple Float GEMM on Generic and PULP
  • FP...
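
The "Added" entry on static memory allocation with finite lifetime for I/Os can be pictured with a small sketch. This is not Deeploy's allocator: the Buffer class, the allocate and peak functions, and the greedy first-fit strategy are all assumptions made for illustration. It only shows why giving input and output tensors a finite lifetime can lower peak memory, since two buffers that are never alive at the same step may share an address range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    """Hypothetical buffer description; not a Deeploy class."""
    name: str
    size: int       # bytes
    first_use: int  # index of the first step that touches the buffer
    last_use: int   # index of the last step that touches it (inclusive)


def lifetimes_overlap(a: Buffer, b: Buffer) -> bool:
    # Two buffers conflict only if they are alive during a common step.
    return a.first_use <= b.last_use and b.first_use <= a.last_use


def allocate(buffers):
    """Greedy first-fit: place larger buffers first, each at the lowest
    offset that does not collide with an already placed, co-alive buffer."""
    by_name = {b.name: b for b in buffers}
    placed = {}  # name -> byte offset
    for buf in sorted(buffers, key=lambda b: -b.size):
        # Address ranges taken by placed buffers that are alive with buf.
        busy = sorted(
            (placed[n], placed[n] + by_name[n].size)
            for n in placed
            if lifetimes_overlap(buf, by_name[n])
        )
        offset = 0
        for start, end in busy:
            if offset + buf.size <= start:
                break  # buf fits in the gap before this range
            offset = max(offset, end)
        placed[buf.name] = offset
    return placed


def peak(buffers, placed):
    # Highest address used by any buffer.
    return max(placed[b.name] + b.size for b in buffers)


# Input is only needed in step 0, output only in step 1.
finite = [
    Buffer("input", 64, 0, 0),
    Buffer("hidden", 128, 0, 1),
    Buffer("output", 64, 1, 1),
]
# Same buffers, but I/Os are kept alive for the whole run.
infinite = [
    Buffer("input", 64, 0, 1),
    Buffer("hidden", 128, 0, 1),
    Buffer("output", 64, 0, 1),
]

print(peak(finite, allocate(finite)))
print(peak(infinite, allocate(infinite)))
```

With these made-up sizes the script prints 192 and then 256: in the finite case the input and output buffers land at the same offset, while in the infinite case every buffer needs its own range.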