Windows-sum-float-vs-bfloat16

Comparison of vector element sum using various data types.

For all experiments, each approach is run on a number of vector sizes, 5
times per size, to obtain a reliable time measurement. The experiments are
done with guidance from Prof. Dip Sankar Banerjee and Prof. Kishore
Kothapalli.
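A minimal C++ sketch of the timing procedure described above. It assumes the reported figure is the mean over the 5 runs (the text does not say whether the mean, minimum, or median is used) and that the input vector is filled with ones. The helper names `measureMeanMs` and `timeAcrossSizes` are hypothetical and not taken from the experiment's source.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs fn() `repeat` times (5 by default, as in the text)
// and returns the mean wall-clock time per run in milliseconds.
template <class F>
double measureMeanMs(F fn, int repeat = 5) {
  assert(repeat > 0);
  using Clock = std::chrono::steady_clock;
  double totalMs = 0;
  for (int i = 0; i < repeat; ++i) {
    auto start = Clock::now();
    fn();
    auto stop = Clock::now();
    totalMs += std::chrono::duration<double, std::milli>(stop - start).count();
  }
  return totalMs / repeat;
}

// Hypothetical driver: times one summing approach on each vector size and
// returns one mean time per size. The all-ones input is an assumption.
template <class Sum>
std::vector<double> timeAcrossSizes(const std::vector<std::size_t>& sizes,
                                    Sum sum, int repeat = 5) {
  std::vector<double> meanMs;
  for (std::size_t n : sizes) {
    std::vector<float> x(n, 1.0f);
    volatile float sink = 0;  // store the result so the sum is not optimized away
    meanMs.push_back(measureMeanMs([&] { sink = sum(x); }, repeat));
  }
  return meanMs;
}
```

Any callable that takes a `const std::vector<float>&` and returns its sum can be passed as `sum`; building the input outside the timed lambda keeps allocation out of the measurement.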

Comparison of float and bfloat16 storage types

In this experiment (float-vs-bfloat16, main), we compare the performance of
summing numbers stored either as float or as bfloat16. I expected the bfloat16
approach to be a clear winner, since it halves the memory bandwidth required
(2 bytes per element instead of 4), but it is only slightly faster. This is
possibly because memory loads are always 32-bit anyway. The small speedup that
bfloat16 does show may come from its smaller footprint, which lets the data
stay in cache for longer. Note that neither approach makes use of the SIMD
instructions available on virtually all modern CPUs.
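A minimal C++ sketch of the two storage types being compared. It assumes bfloat16 is held as the upper 16 bits of an IEEE-754 float (its standard layout), that float values are converted to bfloat16 by simple truncation, and that each bfloat16 element is widened back to float before being added, so accumulation stays in 32-bit float. The names `bfloat16`, `toBfloat16`, `toFloat`, `sumFloat` and `sumBfloat16` are hypothetical. Both loops are plain scalar code, matching the note that neither approach uses SIMD.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical bfloat16 type: raw bits equal to the upper half of a float32.
using bfloat16 = std::uint16_t;

// float -> bfloat16 by truncation: keep sign, 8-bit exponent and the top
// 7 mantissa bits; the low 16 mantissa bits are dropped.
inline bfloat16 toBfloat16(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return static_cast<bfloat16>(bits >> 16);
}

// bfloat16 -> float: move the bits back to the high half, zero the low half.
inline float toFloat(bfloat16 b) {
  std::uint32_t bits = static_cast<std::uint32_t>(b) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Scalar sum over float storage (4 bytes read per element).
float sumFloat(const std::vector<float>& x) {
  float a = 0;
  for (float v : x) a += v;
  return a;
}

// Scalar sum over bfloat16 storage (2 bytes read per element); each element
// is widened to float before the addition.
float sumBfloat16(const std::vector<bfloat16>& x) {
  float a = 0;
  for (bfloat16 v : x) a += toFloat(v);
  return a;
}
```

Per element, the bfloat16 loop reads half as many bytes, and widening is only a shift plus a bit copy. The storage is lossy, though: bfloat16 keeps 8 significant bits, so with truncation a value like 1 + 2^-10 comes back as exactly 1. Round-to-nearest-even is a common alternative to truncation when converting.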
