As can be seen here, when switching to MSVC, the funcA<3> definition (which should be optimised) is instead identical to the funcA<-1> definition (which cannot be optimised).
In contrast, a GNU compiler replaces funcA<3> entirely with:
"int funcA<3>(int)":
mov eax, 3
ret
This suggests MSVC is not unrolling the various bitwise.hpp loops even when the loop length is known at compile time, which would explain why Windows execution (with native MSVC) is noticeably slower in my ad hoc tests. Oh Microsoft! If no solution can be found (one which doesn't involve enshittifying the entire codebase with manual unrolls), add doc to encourage accursed Windows users away from MSVC.
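For anyone wanting to reproduce the comparison, here is a minimal sketch of the kind of function being discussed. The name funcA and the -1 "unknown at compile time" sentinel come from the text above, but the body is an assumption made for illustration and is not the actual bitwise.hpp code.

```cpp
// Hypothetical reconstruction: only the name funcA and the -1 sentinel are
// taken from the issue; the loop body is an illustrative assumption.
template <int NumIters>
int funcA(int runtimeIters) {
    // A compile-time count of -1 means "not known until run time".
    const int iters = (NumIters == -1) ? runtimeIters : NumIters;

    // With NumIters == 3 the trip count is a constant, so an optimiser is
    // free to unroll the loop and fold the result down to a constant.
    int total = 0;
    for (int i = 0; i < iters; ++i) {
        total += 1;
    }
    return total;
}

// Explicit instantiations so both versions appear in the generated assembly.
template int funcA<3>(int);
template int funcA<-1>(int);
```

Compiling this with optimisation enabled under each compiler and diffing the two instantiations should show whether the constant-length version gets folded, as in the GNU output above, or left as a loop.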