Looking at the generated code, you will find some subtle differences that lead to different loop sizes.

1st)

000000013F5A1090  vmovaps     ymm0,ymmword ptr [rbx+rax]
                        vecAp++;
                        vecBp++;
000000013F5A1095  add         rax,20h

versus 2nd)

000000013FA51070  vmovaps     ymm0,ymmword ptr [rbp+rbx]
000000013FA51076  add         rbx,20h

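Ignore the interleaved source lines for the ++ of the two pointers; instead, look at the byte address of the add instruction. In the first case it is +5 bytes from the start of the loop (...90), in the second case +6 bytes from the start of the loop (...70). Using rbp as the base register costs an extra byte: with a SIB byte and mod=00, a base field of rbp means "no base, disp32", so [rbp+rbx] has to be encoded with an explicit zero 8-bit displacement.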
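A small model of the x86-64 ModRM/SIB addressing rules reproduces the 5-byte and 6-byte lengths of the two vmovaps loads. This is a simplified sketch. `mem_operand_len` and `disp_bytes` are hypothetical helpers, not from any library. It only handles `[base + index + disp]` operands, which always carry a SIB byte. It also assumes the 2-byte VEX prefix, which applies here because every register involved is one of rax..rdi.

```python
# Simplified model of x86-64 memory-operand encoding for [base + index*1 + disp].
# Hypothetical helpers for illustration; ignores REX/VEX register-extension
# details and RIP-relative forms.

# With mod=00, a SIB base field of 101 (rbp or r13) means "no base, disp32",
# so these registers can only serve as a base with an explicit displacement.
NEEDS_DISP_AS_BASE = {"rbp", "r13"}


def disp_bytes(base: str, disp: int) -> int:
    """Number of displacement bytes the encoder must emit."""
    if disp == 0 and base not in NEEDS_DISP_AS_BASE:
        return 0          # mod=00: no displacement
    if -128 <= disp <= 127:
        return 1          # mod=01: disp8 (also covers rbp+0)
    return 4              # mod=10: disp32


def mem_operand_len(base: str, disp: int = 0) -> int:
    """ModRM + SIB + displacement bytes for [base + index + disp]."""
    return 1 + 1 + disp_bytes(base, disp)


VEX2_PLUS_OPCODE = 3  # C5 xx prefix + opcode byte (28h for the vmovaps load)

print(VEX2_PLUS_OPCODE + mem_operand_len("rbx"))  # vmovaps ymm0,[rbx+rax] -> 5
print(VEX2_PLUS_OPCODE + mem_operand_len("rbp"))  # vmovaps ymm0,[rbp+rbx] -> 6
```

Because rbp is the base in `[rbp+rbx]`, the model charges one disp8 byte (00h). That accounts for the one-byte difference in where the add instruction starts.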

Next, look at the vaddps instructions:

000000013F5A109C  vaddps      ymm1,ymm0,ymmword ptr [rdi+rax-20h]
000000013F5A10A2  vmovaps     ymmword ptr [rax-20h],ymm1
versus
000000013FA5107D  vaddps      ymm0,ymm0,ymmword ptr [rbp+rbx+3FE0h]
000000013FA51086  vmovaps     ymmword ptr [rbx+rax-20h],ymm0

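Note that the displacement in the first case is -20h, which fits in a signed 8-bit displacement (disp8, one byte, stored as E0h), making the vaddps 6 bytes.
The displacement in the second case is 3FE0h, which is outside the disp8 range of -128..127 and therefore requires a 32-bit displacement (disp32, 4 bytes), making the vaddps 9 bytes.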
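Both vaddps operands use the `[base + index + disp]` form, so the only size difference between them is the displacement field. The minimal sketch below assumes a fixed 5 bytes before the displacement: the 2-byte VEX prefix, the 58h opcode, ModRM and SIB. The `disp_size` helper is hypothetical and exists only for illustration.

```python
# Hypothetical helper: bytes needed for the displacement of [base + index + disp].
def disp_size(disp: int, base: str = "rdi") -> int:
    if disp == 0 and base not in ("rbp", "r13"):
        return 0              # mod=00, no displacement
    if -128 <= disp <= 127:
        return 1              # disp8, sign-extended by the CPU
    return 4                  # disp32


# 2-byte VEX prefix + opcode (58h for vaddps) + ModRM + SIB, before the displacement
FIXED = 5

print(hex(-0x20 & 0xFF))                       # 0xe0: -20h stored as one signed byte
print(FIXED + disp_size(-0x20))                # vaddps ...,[rdi+rax-20h]   -> 6
print(FIXED + disp_size(0x3FE0, base="rbp"))   # vaddps ...,[rbp+rbx+3FE0h] -> 9
```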

Using the (registerized) pointers allowed the compiler to emit instructions with shorter encodings.

Jim Dempsey

