As discussed on Slack already: on a single socket of a regular compute node of [Noctua 2](https://pc2.uni-paderborn.de/hpc-services/available-systems/noctua2) (AMD EPYC Milan 7763 64-core CPUs, i.e. Zen 3), Octavian apparently chooses a bad strategy for large matrix sizes.