Why does the order of loops in a matrix multiply algorithm affect performance? [duplicate]