The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: Posit on May 14, 2005, 11:01:10 AM

Title: Alignment
Post by: Posit on May 14, 2005, 11:01:10 AM
Alright, after seeing a 42% speed increase in one procedure simply by aligning a single label on a 4-byte boundary, I'm sold on alignment. In Assembly Optimization Tips, Mark Larson says he aligns 2 byte data on 2 byte boundaries, 4 on 4, 8 on 8, etc., and he mentions aligning code on 4 byte boundaries. I'm hoping a couple of the gurus here can discuss their general approach to alignment, how they align what, and when it is overkill, before I go wild and start aligning everything in sight on 32-byte boundaries.
Title: Re: Alignment
Post by: James Ladd on May 15, 2005, 12:15:53 AM
When I make a proc I always precede it with "align 4".
When I make a structure I try to make its size in bytes divisible by 4.
Title: Re: Alignment
Post by: hutch-- on May 15, 2005, 01:58:13 AM
Posit,

It's pretty simple stuff once you get the swing of it. Modern hardware reads 32-bit blocks no matter what the data size is, and it reads them along 4-byte boundaries. Byte data is 1-byte aligned, word data is 2-byte aligned and dword data is 4-byte aligned.

This is basically the notion of natural alignment based on the size of the data, and it matters just as much for the larger data sizes like QWORD and OWORD.

For the simple data types, this example may help to make sense of it.


0123-0123-0123-0123-0123-0123-0123-0123


Assuming that the beginning of this memory block is at least 4-byte aligned, you read a DWORD as "0123". If you don't align it and end up crossing a 4-byte boundary with something like "3-012", the processor takes 2 reads to get the single 4 bytes, so it is slower.

BYTE data can be any single byte.

WORD data should be "01" or "23"

DWORD data should only be "0123".

Title: Re: Alignment
Post by: Posit on May 15, 2005, 07:08:59 AM
Thanks striker, I'll do that for procedures from now on.

The guidelines for aligning data seem simple enough. How about code though? Is there any reason to align the target of jumps on more than a 4 byte boundary, for instance?
Title: Re: Alignment
Post by: James Ladd on May 15, 2005, 07:56:30 AM
I go with the suggestions so far, like "align 4" for procs and making structures pad out to a 4 byte boundary.
Trying to work out whether jumps should be aligned right now is probably overkill, unless of course the only thing left to do for your application is optimisation at this level.
Title: Re: Alignment
Post by: AeroASM on May 15, 2005, 02:28:00 PM
Quote from: hutch-- on May 15, 2005, 01:58:13 AM
WORD data should be "01" or "23"

Why not "23"?
Title: Re: Alignment
Post by: hutch-- on May 15, 2005, 03:07:12 PM
Try it.
Title: Re: Alignment
Post by: AeroASM on May 15, 2005, 03:12:53 PM
Whoops, I meant, why not "12"?
Title: Re: Alignment
Post by: Mark Jones on May 15, 2005, 04:20:28 PM
So does "align x" apply only to the single item that follows it, or to the entire scope? That is, is this second align redundant?


align 4
    lea eax, myvar
align 4
@@:
    mov dword ptr [foo],eax
    inc al
    jz @B


Same with data: do you need to align each of the different-sized elements, like this?


.data
align 4
    myvar  DB  0
align 8
    math   DQ  0
align 2
    count  DW  0


If that's the case, is there any way to make MASM do this automatically? :)
Title: Re: Alignment
Post by: Posit on May 15, 2005, 05:11:11 PM
Not specific to MASM, but the Intel documentation recommends arranging data from largest to smallest to help keep track of alignment: e.g. if a QWORD aligned on 8 bytes is followed by a DWORD, the DWORD will automatically be aligned on 4 bytes.
Title: Re: Alignment
Post by: AeroASM on May 15, 2005, 05:12:08 PM
I don't understand how you can align the "entire scope".

Align x just means: place the next byte on a boundary of x. If you understand org, then align x means org (the next memory location divisible by x). Thus anything you want on an x-byte boundary must be preceded by an align.
Title: Re: Alignment
Post by: hutch-- on May 16, 2005, 12:19:59 AM
Posit is right here: if you have to stack variables without wasting space, start with the biggest ones first, and the ones that follow in descending order of size will also be aligned.
Title: Re: Alignment
Post by: MichaelW on May 16, 2005, 02:10:41 AM
I'm not sure that this is all valid and correct.

; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .586                       ; create 32 bit code
    .model flat, stdcall       ; 32 bit memory model
    option casemap :none       ; case sensitive
    .MMX

    include \masm32\include\windows.inc
    include \masm32\include\masm32.inc
    include \masm32\include\user32.inc
    include \masm32\include\kernel32.inc
    includelib \masm32\lib\masm32.lib
    includelib \masm32\lib\kernel32.lib
    includelib \masm32\lib\user32.lib
    include \masm32\macros\macros.asm
    include timers.asm

    _EMMS equ 1
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
        aligned_mmx64     dd 0,0
        aligned_real8     REAL8 0.0
        aligned_mmx32     dd 0
        aligned_real4     REAL4 0.0
        aligned_dword     dd 0
        aligned_word      dw 0
        db 0              ; misalign by 1 byte
        misaligned_mmx64  dd 0,0
        misaligned_real8  REAL8 0.0
        misaligned_mmx32  dd 0
        misaligned_real4  REAL4 0.0
        misaligned_dword  dd 0
        dw 0              ; misalign to an odd address (stays within one dword)
        misaligned_word   dw 0
    .code
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    LOOP_COUNT    EQU 10000000
    REPEAT_COUNT  EQU 100

    counter_begin LOOP_COUNT,HIGH_PRIORITY_CLASS
      mov   eax,OFFSET aligned_mmx64
      REPEAT REPEAT_COUNT
        movq  mm0,[eax]
        movq  [eax],mm0
      ENDM
    counter_end
    mov   ebx,eax
    print chr$("aligned_mmx64    : ")
    print ustr$(ebx)
    print chr$(" cycles",13,10)

    counter_begin LOOP_COUNT,HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        fld   aligned_real8
        fstp  aligned_real8
      ENDM
    counter_end
    mov   ebx,eax
    print chr$("aligned_real8    : ")
    print ustr$(ebx)
    print chr$(" cycles",13,10)

    counter_begin LOOP_COUNT,HIGH_PRIORITY_CLASS
      mov   eax,OFFSET aligned_mmx32
      REPEAT REPEAT_COUNT
        movd  mm0,[eax]
        movd  [eax],mm0
      ENDM
    counter_end
    mov   ebx,eax
    print chr$("aligned_mmx32    : ")
    print ustr$(ebx)
    print chr$(" cycles",13,10)

    counter_begin LOOP_COUNT,HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        fld   aligned_real4
        fstp  aligned_real4
      ENDM
    counter_end
    mov   ebx,eax
    print chr$("aligned_real4    : ")
    print ustr$(ebx)
    print chr$(" cycles",13,10)

    counter_begin LOOP_COUNT,HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        mov   eax,aligned_dword
        mov   aligned_dword,eax
      ENDM
    counter_end
    mov   ebx,eax
    print chr$("aligned_dword    : ")
    print ustr$(ebx)
    print chr$(" cycles",13,10)

    counter_begin LOOP_COUNT,HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        mov   ax,aligned_word
        mov   aligned_word,ax
      ENDM
    counter_end
    mov   ebx,eax
    print chr$("aligned_word     : ")
    print ustr$(ebx)
    print chr$(" cycles",13,10)

    counter_begin LOOP_COUNT,HIGH_PRIORITY_CLASS
      mov   eax,OFFSET misaligned_mmx64
      REPEAT REPEAT_COUNT
        movq  mm0,[eax]
        movq  [eax],mm0
      ENDM
    counter_end
    mov   ebx,eax
    print chr$("misaligned_mmx64 : ")
    print ustr$(ebx)
    print chr$(" cycles",13,10)

    counter_begin LOOP_COUNT,HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        fld   misaligned_real8
        fstp  misaligned_real8
      ENDM
    counter_end
    mov   ebx,eax
    print chr$("misaligned_real8 : ")
    print ustr$(ebx)
    print chr$(" cycles",13,10)

    counter_begin LOOP_COUNT,HIGH_PRIORITY_CLASS
      mov   eax,OFFSET misaligned_mmx32
      REPEAT REPEAT_COUNT
        movd  mm0,[eax]
        movd  [eax],mm0
      ENDM
    counter_end
    mov   ebx,eax
    print chr$("misaligned_mmx32 : ")
    print ustr$(ebx)
    print chr$(" cycles",13,10)

    counter_begin LOOP_COUNT,HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        fld   misaligned_real4
        fstp  misaligned_real4
      ENDM
    counter_end
    mov   ebx,eax
    print chr$("misaligned_real4 : ")
    print ustr$(ebx)
    print chr$(" cycles",13,10)

    counter_begin LOOP_COUNT,HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        mov   eax,misaligned_dword
        mov   misaligned_dword,eax
      ENDM
    counter_end
    mov   ebx,eax
    print chr$("misaligned_dword : ")
    print ustr$(ebx)
    print chr$(" cycles",13,10)

    counter_begin LOOP_COUNT,HIGH_PRIORITY_CLASS
      REPEAT REPEAT_COUNT
        mov   ax,misaligned_word
        mov   misaligned_word,ax
      ENDM
    counter_end
    mov   ebx,eax
    print chr$("misaligned_word  : ")
    print ustr$(ebx)
    print chr$(" cycles",13,10)

    mov   eax, input(13,10,"Press enter to exit...")
    exit
; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start

Results on my P3:

aligned_mmx64    : 498 cycles
aligned_real8    : 499 cycles
aligned_mmx32    : 497 cycles
aligned_real4    : 365 cycles
aligned_dword    : 498 cycles
aligned_word     : 358 cycles
misaligned_mmx64 : 1709 cycles
misaligned_real8 : 1001 cycles
misaligned_mmx32 : 1001 cycles
misaligned_real4 : 315 cycles
misaligned_dword : 1001 cycles
misaligned_word  : 312 cycles



Title: Re: Alignment
Post by: Mark Jones on May 16, 2005, 01:37:48 PM
Another interesting test! Here are the results on an AMD XP 1800+:


aligned_mmx64    : 719 cycles
aligned_real8    : 763 cycles
aligned_mmx32    : 720 cycles
aligned_real4    : 754 cycles
aligned_dword    : 447 cycles
aligned_word     : 586 cycles
misaligned_mmx64 : 2191 cycles
misaligned_real8 : 2210 cycles
misaligned_mmx32 : 2174 cycles
misaligned_real4 : 753 cycles
misaligned_dword : 1545 cycles
misaligned_word  : 585 cycles
Title: Re: Alignment
Post by: thomasantony on May 16, 2005, 03:20:13 PM
Hi,
   It seems misalignment is better for WORD and REAL4 data :bdg :bdg :bdg

Thomas :bdg :bdg
Title: Re: Alignment
Post by: Mark_Larson on May 16, 2005, 09:43:15 PM

  The bigger the data size, the worse the penalty for misaligned data. I wrote some code to show the alignment problems with an MMX version of a string copy routine.

http://www.masmforum.com/simple/index.php?topic=1589.45 - search for "alignment". The original code ran in 87 cycles accessing 8 bytes at a time (MMX registers). The misaligned version ran in about 250-280 cycles, depending on how unaligned it was. Other than the misaligned data, there were no other changes to the code.
Title: Re: Alignment
Post by: Randall Hyde on May 17, 2005, 11:52:41 PM
Quote from: hutch-- on May 15, 2005, 01:58:13 AM

BYTE data can be any single byte.

WORD data should be "01" or "23"

DWORD data should only be "0123".



Intel's documentation (for the PIV) claims that word accesses of the form "12" are also fine. Also, most of the time you only take a big hit if the data object crosses a cache line.
Cheers,
Randy Hyde
Title: Re: Alignment
Post by: James Ladd on May 20, 2005, 01:03:05 AM
OK, so now I know how to align my data and procedures.
But am I right in thinking I can align the code within a procedure using an align statement as well?
If I use this align keyword, will MASM put NOPs in the code to pad it out?
Title: Re: Alignment
Post by: MichaelW on May 20, 2005, 01:12:33 AM
Quote
If I use this align keyword, with masm put NOPs in the code to make it pad out ?

We beat that subject to death here:

http://www.masmforum.com/simple/index.php?topic=1622.0
Title: Re: Alignment
Post by: James Ladd on May 20, 2005, 07:22:33 AM
Michael, Thanks for beating it one more time :)