Raymond Toy pushed to branch issue-284-optimize-signed-byte-32-int-len-vop at cmucl / cmucl
Commits:
4ffad32e by Raymond Toy at 2024-03-22T13:08:11-07:00
Optimize unsigned-byte-32-int-len some more
Take advantage of the fact that BSR actually moves the src to the dst register when the src is 0. The Intel docs don't say this, but the gcc, LLVM, and MSVC compilers basically assume this when generating code that uses BSR.
- - - - -
1 changed file:
- src/compiler/x86/arith.lisp
Changes:
=====================================
src/compiler/x86/arith.lisp
=====================================
@@ -760,12 +760,11 @@
   (:results (res :scs (any-reg)))
   (:result-types positive-fixnum)
   (:generator 30
-    (move res arg)
     ;; The Intel docs say that BSR leaves the destination register
-    ;; undefined if the source is 0. But AMD64 says the destination
-    ;; register is unchanged. This also appears to be the case for
-    ;; GCC and LLVM.
-    (inst bsr res res)
+    ;; undefined if the source is 0. However, gcc, LLVM, and MSVC
+    ;; generate code that pretty much says BSR basically moves the
+    ;; source to the destination if the source is 0.
+    (inst bsr res arg)
     (inst jmp :z DONE)
     ;; The result of BSR is one too small for what we want, so
     ;; increment the result.
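
The control flow of the new generator can be modelled in portable C to make the assumption explicit. This is only a sketch: `model_bsr` and `model_int_len` are hypothetical names, not part of cmucl, and the zero-source branch encodes the behaviour the commit message relies on (destination receives the source), which the Intel documentation leaves undefined. The model ignores fixnum tagging and anything after the lines shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of BSR. Returns the zero flag (true when src is 0).
 * ASSUMPTION from the commit message, not from the Intel manual:
 * when src is 0, the destination ends up holding src (that is, 0). */
bool model_bsr(uint32_t *dst, uint32_t src)
{
    assert(dst);
    if (src == 0) {
        *dst = src;      /* assumed behaviour; Intel documents dst as undefined */
        return true;     /* ZF set */
    }
    uint32_t index = 31;
    while ((src & (UINT32_C(1) << index)) == 0)
        index--;         /* scan down to the highest set bit */
    *dst = index;
    return false;        /* ZF clear */
}

/* Mirrors the generator: BSR res, arg; skip the increment when ZF is set. */
uint32_t model_int_len(uint32_t arg)
{
    uint32_t res = UINT32_C(0xDEADBEEF);  /* stale register contents */
    if (!model_bsr(&res, arg))
        res += 1;        /* BSR yields a bit index, one less than the length */
    return res;
}
```

Under this model a zero argument yields a length of 0 without the initial `(move res arg)`, because the stale value in `res` is overwritten by the assumed BSR behaviour. If the destination were instead left unchanged on a zero source, the stale value would survive, which is why the removed `move` mattered under that reading.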
View it on GitLab: https://gitlab.common-lisp.net/cmucl/cmucl/-/commit/4ffad32e09195013f27e6e3d...