Abstract: Contemporary GPU architectures integrate specialized computing units for matrix multiplication, named matrix multiplication units (MXUs), to effectively process neural network applications.