The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: MazeGen on May 26, 2005, 11:08:37 AM

Title: How do you save the Flags in your code?
Post by: MazeGen on May 26, 2005, 11:08:37 AM
Hello,
I was thinking about the quickest way - I use an assembler, so why not use the best way, right?  :wink

I got used to PUSHF/POPF under DOS, but according to Agner Fog's optimization manual, PUSHFD needs 4 uops plus another 4 uops from microcode (P4). POPFD needs 4 uops plus another 8 uops from microcode. So PUSHFD/POPFD does not seem to be the best way.

By contrast, LAHF and SAHF each need only one uop and no microcode (I know they don't save OF and DF, but I rarely need them). But I often need the EAX value, so it has to be PUSHed/POPed additionally.

Finally, we have two constructs:


; 1.
PUSHFD
...
POPFD



; 2.
PUSH EAX
LAHF
...
SAHF
POP EAX ; avoid partial register stall by writing to the whole register


What do you think about them?
The latter one seems quicker to me...
Title: Re: How do you save the Flags in your code?
Post by: Phil on May 26, 2005, 11:42:27 PM
Wow ... The things we learn by reading, thinking, and exploring! I'm fairly new to the world of 32-bit 80x86 programming but I think you have a fine idea! If we're going to do assembly then, why *not* do it right!

I've attached a zip file that contains MichaelW's timers.asm and pushtime.asm that contains the instructions you'd asked about inside timer loops and here are the results. I'm not sure why it says zero for the PUSH/LAHF cycles. Clearly it takes some time to do the operations. You might try adding one or two more instructions from your actual code until you see at least some cycles for the second case. Anyway, I thought I'd pass this along so you can play with it. These results came from a 996MHz P3:

C:\ASM\EXAMPLES>PUSHTIME
21 PUSHFD cycles
0 PUSH/LAHF cycles

You'll need to build the example code as a console application in order to use it.


[attachment deleted by admin]
Title: Re: How do you save the Flags in your code?
Post by: QvasiModo on May 27, 2005, 12:37:35 AM
Note that the two pieces of code do not perform the same task. The second only uses the lowest byte of the flags register.
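To make the difference concrete, here is a small sketch of what ends up in AH after LAHF. The EQU names and the label are made up for this illustration; the bit positions are those of the low byte of EFLAGS. Anything above bit 7 (TF, IF, DF, OF and the rest) is not captured.

```asm
; Bit masks for AH after LAHF (hypothetical names, not from any include file)
AH_CF EQU 01h          ; bit 0: carry
AH_PF EQU 04h          ; bit 2: parity
AH_AF EQU 10h          ; bit 4: adjust (auxiliary carry)
AH_ZF EQU 40h          ; bit 6: zero
AH_SF EQU 80h          ; bit 7: sign
                       ; bits 1, 3, 5 are reserved (they read as 1, 0, 0)

    cmp   ecx, edx         ; sets the flags
    lahf                   ; AH = SF:ZF:0:AF:0:PF:1:CF
    test  ah, AH_ZF        ; was ZF set by the CMP? (TEST overwrites the live flags)
    jnz   operands_equal   ; taken when the saved ZF bit is 1
    ; ... not-equal path ...
operands_equal:
```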
Title: Re: How do you save the Flags in your code?
Post by: MichaelW on May 27, 2005, 03:09:24 AM
Hi Phil,

I modified your code so it would return a more realistic cycle count. The macros return the count for a single pass through the block of code, so for code that takes only a few cycles to execute the count is smaller than the accumulated timing inaccuracies.

; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .586                       ; create 32 bit code
    .model flat, stdcall       ; 32 bit memory model
    option casemap :none       ; case sensitive

    include \masm32\include\windows.inc
    include \masm32\include\masm32.inc
    include \masm32\include\kernel32.inc

    includelib \masm32\lib\masm32.lib
    includelib \masm32\lib\kernel32.lib

    include \masm32\macros\macros.asm

    include timers.asm
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
    .code
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    LOOP_COUNT EQU 10000000
    REPEAT_COUNT EQU 10

    ; Reality check
    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        add   eax,1
      ENDM
    counter_end
    print ustr$(eax)
    print chr$(" add reg,imm cycles * 10",13,10)
   
    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        ; 1.
        PUSHFD
        ; ...
        POPFD
      ENDM
    counter_end
    print ustr$(eax)
    print chr$(" PUSHFD cycles * 10",13,10)

    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        ; 2.
        PUSH EAX
        LAHF
        ; ...
        SAHF
        POP EAX ; avoid partial register stall by writing to the whole register
      ENDM       
    counter_end
    print ustr$(eax)
    print chr$(" PUSH/LAHF cycles * 10",13,10)

    mov   eax, input(13,10,"Press enter to exit...")
    exit
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start


The results on my P3:

5 add reg,imm cycles * 10
222 PUSHFD cycles * 10
28 PUSH/LAHF cycles * 10

Title: Re: How do you save the Flags in your code?
Post by: Phil on May 27, 2005, 04:36:55 AM
MichaelW: Great! I hadn't figured out how to do that yet. It's also nice to have your code displayed so the person who'd originally asked about it can see how your timers work before downloading the zip. Great code there. Thanks again for taking the time to put it all together and keeping it tuned up!

MazeGen: Thanks for asking this question about saving flags. I've also got an application that I need to tweak quite a bit and I think your suggestion might help me considerably!
Title: Re: How do you save the Flags in your code?
Post by: MazeGen on May 27, 2005, 11:55:35 AM
Phil: Thanks for your effort with measuring the timings!

QvasiModo: You're right - as I said, I never change DF, and rarely need OF, since I work most of the time with unsigned arithmetic.

Michael: Thanks again for your ready-to-assemble sample. I never looked at your "code timing macros" thread, but now I see it is veeery useful  :thumbu

I didn't expect that LAHF/SAHF could be several times faster than PUSHFD/POPFD  :eek
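For the rare case where OF is needed as well, a commonly used idiom (not something posted in this thread, and not timed here) keeps OF in AL next to the LAHF byte in AH. A minimal sketch, assuming the code in between leaves AX alone:

```asm
    push  eax          ; keep the caller's EAX
    lahf               ; AH = SF, ZF, AF, PF, CF
    seto  al           ; AL = 1 if OF was set, else 0 (SETcc does not change flags)
    ; ... code that clobbers the flags but must not change AX ...
    add   al, 7Fh      ; 1 + 7Fh = 80h overflows -> OF=1; 0 + 7Fh -> OF=0
    sahf               ; reload SF, ZF, AF, PF, CF; SAHF leaves OF untouched
    pop   eax          ; full-register write, restores the original EAX
```

DF is still not covered. Whether this beats PUSHFD/POPFD on a given processor would have to be measured with the same timing macros.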
Title: Re: How do you save the Flags in your code?
Post by: MazeGen on May 29, 2005, 05:36:31 PM
Eh, when I tried the measurement again and again on an AMD Athlon XP 1800, I always got very different results. I really have no idea why...

Quote
6 add reg,imm cycles * 10
142 PUSHFD cycles * 10
274 PUSH/LAHF cycles * 10

6 add reg,imm cycles * 10
144 PUSHFD cycles * 10
115 PUSH/LAHF cycles * 10

6 add reg,imm cycles * 10
145 PUSHFD cycles * 10
488 PUSH/LAHF cycles * 10

6 add reg,imm cycles * 10
144 PUSHFD cycles * 10
185 PUSH/LAHF cycles * 10

6 add reg,imm cycles * 10
145 PUSHFD cycles * 10
336 PUSH/LAHF cycles * 10

In general, the PUSH EAX/LAHF version seems to be rather slower than PUSHFD/POPFD.

Here is also the result from a Pentium M 1400. I ran more measurements, but all of them were very similar:

Quote
5 add reg,imm cycles * 10
219 PUSHFD cycles * 10
46 PUSH/LAHF cycles * 10
Title: Re: How do you save the Flags in your code?
Post by: MichaelW on May 30, 2005, 12:30:04 AM
Hi MazeGen,

For the Athlon, try increasing REPEAT_COUNT. If that does not stabilize the returned counts you could try REALTIME_PRIORITY_CLASS. But note that I now avoid posting code that uses REALTIME_PRIORITY_CLASS because I have had Windows (2000 SP4 running on a very reliable system) hang when the test loop took too long to run. And when I do use REALTIME_PRIORITY_CLASS, for any test, I take care to save everything to disk first.
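A sketch of that change against the test program posted earlier, assuming the counter_begin/counter_end macros from timers.asm are used exactly as before. Only the constant and the hard-coded label text change; the value 100 is just an example.

```asm
    LOOP_COUNT   EQU 10000000
    REPEAT_COUNT EQU 100            ; was 10; a longer block per pass

    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        PUSH EAX
        LAHF
        ; ...
        SAHF
        POP EAX
      ENDM
    counter_end
    print ustr$(eax)
    print chr$(" PUSH/LAHF cycles * 100",13,10)   ; label updated to match REPEAT_COUNT
```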
Title: Re: How do you save the Flags in your code?
Post by: MazeGen on May 30, 2005, 05:52:50 AM
Hi Michael,
you're right :U Thanks!

Check the results now (no need for REALTIME_PRIORITY_CLASS):

Quote
100 add reg,imm cycles * 100
1530 PUSHFD cycles * 100
652 PUSH/LAHF cycles * 100

103 add reg,imm cycles * 100
1534 PUSHFD cycles * 100
655 PUSH/LAHF cycles * 100

104 add reg,imm cycles * 100
1534 PUSHFD cycles * 100
652 PUSH/LAHF cycles * 100

104 add reg,imm cycles * 100
1534 PUSHFD cycles * 100
647 PUSH/LAHF cycles * 100

102 add reg,imm cycles * 100
1519 PUSHFD cycles * 100
645 PUSH/LAHF cycles * 100
Title: Re: How do you save the Flags in your code?
Post by: MazeGen on May 30, 2005, 07:59:21 AM
I was thinking about a sort of PUSHFD macro to avoid trashing EAX and here is the result:


pushfdszapc MACRO
push eax ;; will be rewritten by *
push eax
lahf
mov [esp+4],eax ;; *
pop eax
ENDM

popfdszapc MACRO
push eax
mov eax,[esp+4]
sahf
pop eax
lea esp,[esp+4] ;; use LEA to leave Flags unchanged
ENDM


I use postfix "szapc" since it saves only sign, zero, adjust, parity, and carry flag.
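A short usage sketch, assuming the two macros above are already defined in the source file. The label name is made up for illustration. The stack must be balanced between the two macros, because popfdszapc expects the saved dword on top of the stack.

```asm
    cmp   ecx, edx        ; produce the flags we care about
    pushfdszapc           ; one dword pushed; the flags copy sits in bits 8..15 of it
    add   esi, 4          ; any flag-clobbering work
    xor   eax, eax        ; EAX is free to use here, the macros do not trash it
    popfdszapc            ; SF, ZF, AF, PF, CF restored; EAX still 0; dword popped
    jb    was_below       ; branches on the carry produced by the CMP
    ; ... not-below path ...
was_below:
```

Note that OF is not restored: after popfdszapc it holds whatever the code in between left there (cleared by the XOR in this sketch).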

BTW, I don't mind the code size - the macros are intended for small, optimized pieces of code, which always fit into the trace cache.

It is still faster than PUSHFD/POPFD on my processors.

But - I don't know how to avoid the partial register stall on MOV [ESP+4],EAX in the PUSHFDSZAPC macro when used on PPro, P2, or P3. I can't use AND EAX,0FF00h or similar because I can't touch the Flags. MOVZX EAX,AX will probably not work because it reads AX, not AH. MOVZX EAX,AH makes the macros too complex and slow.

Michael, could you please test it on your P3 as-it-is? The code is below.

Any ideas how to improve the macros? :8)

My timings:

Quote from: AMD Athlon XP 1800
98 add reg,imm cycles * 100
1469 PUSHFD cycles * 100
954 PUSHFDSZAPC cycles * 100

Quote from: Pentium M 1400
4 add reg,imm cycles * 10
219 PUSHFD cycles * 10
127 PUSHFDSZAPC cycles * 10

; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .586                       ; create 32 bit code
    .model flat, stdcall       ; 32 bit memory model
    option casemap :none       ; case sensitive

    include \masm32\include\windows.inc
    include \masm32\include\masm32.inc
    include \masm32\include\kernel32.inc

    includelib \masm32\lib\masm32.lib
    includelib \masm32\lib\kernel32.lib

    include \masm32\macros\macros.asm

    include timers.asm

pushfdszapc MACRO
push eax ;; will be rewritten by *
push eax
lahf
mov [esp+4],eax ;; *
pop eax
ENDM

popfdszapc MACRO
push eax
mov eax,[esp+4]
sahf
pop eax
lea esp,[esp+4] ;; use LEA to leave Flags unchanged
ENDM

; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
    .code
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    LOOP_COUNT EQU 10000000
    REPEAT_COUNT EQU 10

    ; Reality check
    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        add   eax,1
      ENDM
    counter_end
    print ustr$(eax)
    print chr$(" add reg,imm cycles * 10",13,10)
   
    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        ; 1.
        PUSHFD
        ; ...
        POPFD
      ENDM
    counter_end
    print ustr$(eax)
    print chr$(" PUSHFD cycles * 10",13,10)

    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        pushfdszapc
        popfdszapc
      ENDM       
    counter_end
    print ustr$(eax)
    print chr$(" PUSHFDSZAPC cycles * 10",13,10)

    mov   eax, input(13,10,"Press enter to exit...")
    exit
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start
Title: Re: How do you save the Flags in your code?
Post by: MichaelW on May 30, 2005, 09:40:02 AM
The results repeated exactly for 5 runs on my P3:

5 add reg,imm cycles * 10
221 PUSHFD cycles * 10
154 PUSHFDSZAPC cycles * 10


And after I tried adding a xor eax,eax before the lahf:

5 add reg,imm cycles * 10
221 PUSHFD cycles * 10
164 PUSHFDSZAPC cycles * 10

Title: Re: How do you save the Flags in your code?
Post by: MazeGen on May 30, 2005, 10:03:40 AM
Thanks again, Michael :8) Nice to see it is faster also on P3 without any modifications.

To avoid the partial register access, we need to write to the whole EAX just after LAHF. That's because LAHF writes only to AH, but  MOV [ESP+4],EAX reads from whole EAX.

If it is not annoying for you yet, try the following modification, it should be faster:


lahf
and eax,0FF00h ; avoid partial register stall
mov [esp+4],eax


In fact, we can't use AND or similar in the real macro, because it changes the Flags, including OF, which is not saved by LAHF.
Title: Re: How do you save the Flags in your code?
Post by: MichaelW on May 30, 2005, 10:15:03 AM

pushfdszapc MACRO
push  eax         ;; will be rewritten by *
push  eax
lahf
and   eax,0FF00h  ; avoid partial register stall
mov   [esp+4],eax
pop   eax
ENDM


5 add reg,imm cycles * 10
222 PUSHFD cycles * 10
160 PUSHFDSZAPC cycles * 10

Title: Re: How do you save the Flags in your code?
Post by: MichaelW on May 30, 2005, 11:40:33 AM
I think there is no partial-register stall problem with LAHF, or at least on a P3.

    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT * 10
        xor   ecx,ecx
        lahf
      ENDM
    counter_end
    print ustr$(eax)
    print chr$(" xor ecx,ecx lahf cycles * 100",13,10)

    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT * 10
        xor   eax,eax
        lahf
      ENDM
    counter_end
    print ustr$(eax)
    print chr$(" xor eax,eax lahf cycles * 100",13,10)


117 xor ecx,ecx lahf cycles * 100
117 xor eax,eax lahf cycles * 100


This coding is faster on a P3:

pushfdszapc MACRO
sub   esp,4
;push  eax         ;; will be rewritten by *
push  eax
lahf
mov   [esp+4],eax
pop   eax
ENDM


5 add reg,imm cycles * 10
222 PUSHFD cycles * 10
144 PUSHFDSZAPC cycles * 10

Title: Re: How do you save the Flags in your code?
Post by: MazeGen on May 30, 2005, 12:16:00 PM
Hehe, we can't SUB before we save the Flags :wink the macro would lose its meaning. That's why I use PUSH EAX.

According to your previous post, it really seems that ANDing EAX didn't accelerate the macro.

But now it seems that one of us doesn't understand what a partial register stall means (no offense meant, Michael, I'm a newbie in optimizations, I'm just trying to be humorous in English too :green)

According to Fog's manual, on P3, when you write to part of a 32-bit register (LAHF -> AH) and later read from the whole register (MOV [ESP+4],EAX), you get partial register stall, like in given example:


MOV AL, BYTE PTR [M8]
MOV EBX, EAX ; partial register stall


I would expect the stall here:


lahf ; write to a part of 32-bit EAX
mov [esp+4],eax ; read from the whole EAX -> the stall


That's why I tried to put AND EAX in between LAHF and MOV in order to write whole EAX.

In your test:


        xor   eax,eax ; write to the whole 32-bit register
        lahf ; write to a part of 32-bit register


You can't get a partial register stall here, because you only ever write to the register; there is no partial write followed by a whole-register read.
Title: Re: How do you save the Flags in your code?
Post by: MichaelW on May 30, 2005, 01:25:00 PM
Oops, I was not considering the effect on the flags, just looking for a faster way to adjust the stack pointer, because that is the essential function of the first push instruction.

By my reasoning, EAX is being read by the following XOR EAX,EAX for all but the last repeated pair. But to make the test valid for all of the repeated instructions I added a MOV EBX,EAX.

    ; Test for partial-register stall with LAHF
    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        xor   ecx,ecx
        lahf
        mov   ebx,eax
      ENDM
    counter_end
    print ustr$(eax)
    print chr$(" xor ecx,ecx lahf mov ebx,eax cycles * 10",13,10)

    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        xor   eax,eax
        lahf
        mov   ebx,eax
      ENDM
    counter_end
    print ustr$(eax)
    print chr$(" xor eax,eax lahf mov ebx,eax cycles * 10",13,10)


71 xor ecx,ecx lahf mov ebx,eax cycles * 10
71 xor eax,eax lahf mov ebx,eax cycles * 10

Title: Re: How do you save the Flags in your code?
Post by: MazeGen on May 30, 2005, 04:59:18 PM
:green

It seems that we are still missing the point. XOR EAX, EAX in the following code has no influence on the later instructions:


        xor   eax,eax
        lahf
        mov   ebx,eax


The important part should be what you put in between LAHF and MOV, I think.

Please try the following code - examples from Agner's manual:


    ; Test for the simplest partial-register stall on PPro, P2, P3
    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        mov al,byte ptr [esp]
        mov ebx,eax        ; partial register stall
      ENDM
    counter_end
    print ustr$(eax)
    print chr$(" partial register stall expected * 10",13,10)

    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        movzx ebx,byte ptr [esp]
        and eax,0FFFFFF00h
        or ebx,eax
      ENDM
    counter_end
    print ustr$(eax)
    print chr$(" no partial register stall * 10",13,10)


Quote from: Pentium M 1400
5 add reg,imm cycles * 10
87 partial register stall expected * 10
11 no partial register stall * 10

Unfortunately, not even ANDing EAX before MOV [ESP+4],EAX in the pushfdszapc macro seems to improve the performance...
Title: Re: How do you save the Flags in your code?
Post by: MichaelW on May 30, 2005, 09:55:26 PM
You're right. I remembered reading this:
Quote
The PPro, P2 and P3 processors make a special case out of this combination to avoid a partial register stall when later reading from EAX. The trick is that a register is tagged as empty when it is XOR'ed with itself. The processor remembers that the upper 24 bits of EAX are zero, so that a partial stall can be avoided. This mechanism works only on certain combinations:
But I remembered only the base concept, and not the exceptions.

    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        xor   eax,eax
        mov   al,3
        mov   ebx,eax
      ENDM
    counter_end
    print ustr$(eax)
    print chr$(" Agner Fog's no partial register stall example cycles * 10",13,10)

    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        xor   eax,eax
        mov   ah,3
        mov   ebx,eax
      ENDM
    counter_end
    print ustr$(eax)
    print chr$(" Agner Fog's partial register stall example cycles * 10",13,10)

    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        xor   eax,eax
        lahf
        mov   ebx,eax
      ENDM
    counter_end
    print ustr$(eax)
    print chr$(" substituting lahf for mov ah,3 in above cycles * 10",13,10)

    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        mov al,byte ptr [esp]
        mov ebx,eax        ; partial register stall
      ENDM
    counter_end
    print ustr$(eax)
    print chr$(" partial register stall expected * 10",13,10)

    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        movzx ebx,byte ptr [esp]
        and eax,0FFFFFF00h
        or ebx,eax
      ENDM
    counter_end
    print ustr$(eax)
    print chr$(" no partial register stall * 10",13,10)


10 Agner Fog's no partial register stall example cycles * 10
69 Agner Fog's partial register stall example cycles * 10
71 substituting lahf for mov ah,3 in above cycles * 10
81 partial register stall expected * 10
10 no partial register stall * 10

Title: Re: How do you save the Flags in your code?
Post by: MichaelW on May 30, 2005, 10:39:54 PM
Experimenting with the code some more I could not find any way around the stall. But I did determine that a stall (apparently) occurs not only when the full register is read, after a partial write, but also when the full register is the destination in a logical instruction, after a partial write.

    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        xor   eax,eax
        mov   al,3
        mov   ebx,eax
      ENDM
    counter_end
    print ustr$(eax)
    print chr$(" Agner Fog's no partial register stall example cycles * 10",13,10)

    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        xor   eax,eax
        mov   ah,3
        mov   ebx,eax
      ENDM
    counter_end
    print ustr$(eax)
    print chr$(" Agner Fog's partial register stall example cycles * 10",13,10)

    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        xor   eax,eax
        mov   ah,3       
        or    eax,ebx
      ENDM
    counter_end
    print ustr$(eax)
    print chr$(" Above with or eax,ebx substituted for mov ebx,eax cycles * 10",13,10)

    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        xor   eax,eax
        mov   ah,3       
        mov   eax,ebx
      ENDM
    counter_end
    print ustr$(eax)
    print chr$(" Above with mov eax,ebx substituted for or eax,ebx cycles * 10",13,10)


10 Agner Fog's no partial register stall example cycles * 10
69 Agner Fog's partial register stall example cycles * 10
71 Above with or eax,ebx substituted for mov ebx,eax cycles * 10
10 Above with mov eax,ebx substituted for or eax,ebx cycles * 10

Title: Re: How do you save the Flags in your code?
Post by: MazeGen on May 31, 2005, 09:09:22 AM
Quote from: MichaelW on May 30, 2005, 10:39:54 PM
Experimenting with the code some more I could not find any way around the stall.

Yeah, it really seems we can't avoid the stall. Nevertheless, the macros are still faster than PUSHFD/POPFD.
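One variant that was not tried in this thread: store and reload only the AH byte, so that no instruction reads the whole EAX after the partial write by LAHF. This is an untested sketch; the macro names are made up, and whether it is actually faster on PPro, P2, or P3 would have to be measured with the same timing macros.

```asm
pushfdszapc_b MACRO        ;; hypothetical byte-store variant of pushfdszapc
push eax                   ;; reserve the slot that will hold the flags byte
push eax                   ;; save EAX
lahf                       ;; AH = SF, ZF, AF, PF, CF
mov byte ptr [esp+5],ah    ;; store AH only (byte 1 of the reserved slot)
pop eax                    ;; restore EAX with a full-register write
ENDM

popfdszapc_b MACRO         ;; hypothetical counterpart
push eax                   ;; save EAX; the reserved slot is now at [esp+4]
mov ah,byte ptr [esp+5]    ;; reload the flags byte into AH
sahf                       ;; restore SF, ZF, AF, PF, CF
pop eax                    ;; restore EAX with a full-register write
lea esp,[esp+4]            ;; drop the slot; LEA leaves the Flags unchanged
ENDM
```

As with the original macros, none of these instructions modifies the Flags on the save side, and OF and DF are not preserved across the pair.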

Quote from: MichaelW on May 30, 2005, 10:39:54 PM
But I did determine that a stall (apparently) occurs not only when the full register is read, after a partial write, but also when the full register is the destination in a logical instruction, after a partial write.

The only idea which comes to my mind:
I think you haven't used correct tests. In the last example:


        xor   eax,eax
        mov   ah,3       
        mov   eax,ebx


you've overwritten the whole EAX, and the processor probably knows that there's no need to wait until MOV AH has finished. (But I've never read about such behaviour.)

By contrast, you compare it with the following code:


        xor   eax,eax
        mov   ah,3       
        or    eax,ebx


here, the processor has to wait until MOV AH is finished, because OR EAX depends on MOV AH.

Try to modify the last example to


        xor   eax,eax
        mov   ah,3       
        mov   ebx,eax


and the timings will be much more similar:

Quote
5 add reg,imm cycles * 10
79 Above with or eax,ebx substituted for mov ebx,eax cycles * 10
74 Above with mov ebx,eax substituted for or eax,ebx cycles * 10
Title: Re: How do you save the Flags in your code?
Post by: MichaelW on May 31, 2005, 09:52:34 AM
Quote

xor eax,eax
mov ah,3
or eax,ebx

here, the processor has to wait until MOV AH is finished, because OR EAX depends on MOV AH.

Yes, just as a following MOV EBX,EAX would depend on the MOV AH,3.

From Agner Fog's Pentium Optimization manual:
Quote
Partial register stall is a problem that occurs in PPro, P2 and P3 when you write to part of a 32-bit register and later read from the whole register or a bigger part of it. Example:

MOV AL, BYTE PTR [M8]
MOV EBX, EAX ; partial register stall

This gives a delay of 5 - 6 clocks. The reason is that a temporary register has been assigned to AL (to make it independent of AH). The execution unit has to wait until the write to AL has retired before it is possible to combine the value from AL with the value of the rest of EAX.

I can't help but wonder why Intel chose to make the special case handling (that allows a preceding XOR EAX,EAX to prevent the stall) work only for AL, and not for AH.
Title: Re: How do you save the Flags in your code?
Post by: MazeGen on May 31, 2005, 12:53:10 PM
Quote from: MichaelW on May 31, 2005, 09:52:34 AM
...but wonder why Intel chose to make the special case handling (that allows a preceding XOR EAX,EAX to prevent the stall) work only for AL, and not for AH.

As I see it, it is very simple for the processor to remember that a specific range of bits starting from the msb (in the case of EAX, according to Agner's manual, bits 31..8) is zero. Now, when only the range outside the zero area is written, it is sufficient to deal with only that part. Imagine some variable which holds the number of the lowest bit of the zero range (here 8).

If it should work also for AH, the processor would additionally have to know whether the AL range (7..0) is zero or not. That would probably be too complicated and slow, because the processor would have to contain one more such variable for the additional range, and therefore it works only for AL.

I hope you can understand it in my funny English ;)
Title: Re: How do you save the Flags in your code?
Post by: MichaelW on May 31, 2005, 01:07:34 PM
Quote
I hope you can understand it in my funny English

No problem at all, your English is better than many of the native speakers I know :U