The new version of NSIMD brings many bug fixes, better support for the SIMD extensions that were already supported, better testing, and better documentation. It also introduces modules. A module is a set of features offered to users, usually in C++ only. This version comes with five modules. Three of them add GPU support and provide an abstraction over CUDA and ROCm/HIP. Another module provides the Random123 family of pseudorandom number generators, and the last one gives access to a vectorized implementation of fixed-point numbers. We have also extended our float16 support to GPUs.
Some numbers for this new version:
- ~21,000 lines of Python code which generate
- ~570,000 lines of C and C++ code including more than
- 100,000 unit tests for the
- 4 hardware vendors (Intel, Arm, AMD and Nvidia) covering
- 17 SIMD/GPU architectures.
You only need a C++98-compliant compiler to build NSIMD. Users who have access to a C++20-compliant compiler can use the corresponding NSIMD API, which uses concepts to constrain template arguments.
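To illustrate the idea of one code base that builds with a C++98 compiler and gains concept-constrained template arguments under C++20, here is a minimal sketch. It is not NSIMD code: `lane_type_c`, `LANE_TYPE` and `sum_lanes` are hypothetical names chosen for this example, and the actual NSIMD concepts and functions may look quite different.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__cpp_concepts) && __cpp_concepts >= 201907L
#include <type_traits>
// C++20 path: constrain the template argument with a concept.
// lane_type_c is a hypothetical name, not an NSIMD concept.
template <typename T>
concept lane_type_c = std::is_arithmetic<T>::value;
#define LANE_TYPE lane_type_c
#else
// Pre-C++20 path: fall back to an unconstrained template parameter.
#define LANE_TYPE typename
#endif

// Hypothetical helper (not part of NSIMD): sums n elements of an array.
// Under C++20 the element type must satisfy lane_type_c; with older
// standards any type supporting += and value-initialization is accepted.
template <LANE_TYPE T>
T sum_lanes(const T *data, std::size_t n) {
  assert(data != NULL || n == 0); // a null pointer is only valid when n == 0
  T acc = T();
  for (std::size_t i = 0; i < n; ++i) {
    acc += data[i];
  }
  return acc;
}
```

A call such as `sum_lanes(values, 4)` on a `float` array compiles in both modes. The difference shows up with a non-arithmetic element type: a C++20 compiler rejects the call at the call site with a constraint failure, while an older compiler either accepts it or reports an error from inside the function body.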