The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: WYVERN666 on June 17, 2011, 04:27:25 PM

Title: Memory Alignment
Post by: WYVERN666 on June 17, 2011, 04:27:25 PM
Hi everyone.

I have some questions about memory alignment.

What are the advantages of using "ALIGN 16"? Which power of 2 is best to use? (I have often seen 16 but don't know exactly why; I guess it depends on the architecture.) Also, what is the default alignment?

Thanks
Title: Re: Memory Alignment
Post by: hutch-- on June 17, 2011, 04:53:03 PM
It depends on where you want to use the ALIGN directive. 16-byte alignment for procedures is common, and it is often worth trying a label alignment of at least 4, but DATA is still the most critical in terms of alignment. As a very rough rule, data of any given size needs to be aligned to that size: 4-byte data needs to be 4-byte aligned, and so on.

Be careful, though: some aligned labels in the middle of a proc actually slow the algorithm down, so be ready to time speed-critical algorithms with different alignments.
Title: Re: Memory Alignment
Post by: WYVERN666 on June 17, 2011, 05:00:39 PM
I don't know how to benchmark in asm. Can you explain how?
Title: Re: Memory Alignment
Post by: brethren on June 17, 2011, 06:50:07 PM
Like most people here, I use the code timing macros that are stickied in the Laboratory:
http://www.masm32.com/board/index.php?topic=770.0
Title: Re: Memory Alignment
Post by: dedndave on June 17, 2011, 07:17:10 PM
place Michael's timers.asm in the masm32\macros folder
then, here's an example showing how to use Michael's timer macros and Jochen's CPU ID routine...
Title: Re: Memory Alignment
Post by: WYVERN666 on June 17, 2011, 08:58:38 PM
Thank you for the answer. Must the loop count parameter be 10000000? What happens if you use just 1? I am getting similar results with both.
Title: Re: Memory Alignment
Post by: dedndave on June 17, 2011, 10:21:27 PM
well - we usually try to adjust the loop count so that each pass of the test takes about 500 ms (or a little more)
that normally yields pretty repeatable results
if you run fewer loops, the numbers tend to jump around a little and make it hard to "read"
play with it a little bit - you'll see what i mean

note...
it depends on the processor you have
if you have one of the newer cores, you may not see the numbers jump so much
but, then, if you want other forum members to run the test, it will jump around
for example, i run a pentium IV prescott under XP media center edition
it is sometimes difficult to stabilize the readings
500 ms usually does quite well

here are my results for the test program attached above...
Pentium 4 Prescott (2005+), MMX, SSE3
6146 6157 6168 6157 6160
7141 7177 7399 7163 7178
Title: Re: Memory Alignment
Post by: MichaelW on June 18, 2011, 05:24:34 AM
The cycle counts on my P3 normally repeat very well.

P3 (2000+), MMX, SSE1
6038 6040 6036 6037 6036
4027 4027 4027 4027 4027

Title: Re: Memory Alignment
Post by: FORTRANS on June 18, 2011, 12:06:42 PM
Hi,

My P-III is also fairly steady, but the counts are a little different from yours.

Steve N.

P3 (2000+), MMX, SSE1
6067 6072 6068 6069 6067
4050 4048 4050 4052 4051
Title: Re: Memory Alignment
Post by: mineiro on June 18, 2011, 01:30:25 PM
Core Duo (2006+), MMX, SSE3
6033 6032 6032 6036 6031
6018 6019 6019 6018 6019


Core Duo (2006+), MMX, SSE3
Zero Register Test
76 ms to 1000 mov eax,0
76 ms to 1000 xor eax,eax
76 ms to 1000 sub eax,eax
30829 ms to 10000 mov eax,0
7718 ms to 10000 xor eax,eax
7706 ms to 10000 sub eax,eax
Title: Re: Memory Alignment
Post by: MichaelW on June 18, 2011, 09:16:31 PM
This shows the large penalty for accessing a DWORD at an alignment that is not a multiple of 4 bytes.

;==============================================================================
    include \masm32\include\masm32rt.inc
    .686
    include \masm32\macros\timers.asm
;==============================================================================

printf MACRO format:REQ, args:VARARG
    IFNB <args>
        invoke crt_printf, cfm$(format), args
    ELSE
        invoke crt_printf, cfm$(format)
    ENDIF
    EXITM <>
ENDM

;==============================================================================

;----------------------------------------
; Returns the maximum alignment of _ptr.
;----------------------------------------

alignment MACRO _ptr
    push ecx
    xor eax, eax
    mov ecx, _ptr
    bsf ecx, ecx
    jz @F
    mov eax, 1
    shl eax, cl
  @@:
    pop ecx
    EXITM <eax>
ENDM

;==============================================================================

    .data
    .code
;==============================================================================
start:
;==============================================================================

    mov esi, alloc(500)
    printf( "buffer alignment = %d\n\n", alignment(esi) )

    invoke Sleep, 3000

    REPEAT 3

        ; baseline: the bare loop with no memory access
        counter_begin 1000, HIGH_PRIORITY_CLASS
            mov ecx, 50
            xor ebx, ebx
          @@:
            inc ebx
            dec ecx
            jnz @B
        counter_end
        printf( "%d cycles, loop only\n", eax )

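        ; same loop plus four DWORD loads per iteration; every address is
        ; a multiple of 4 as long as esi itself is at least 4-byte aligned
        counter_begin 1000, HIGH_PRIORITY_CLASS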
            mov ecx, 50
            xor ebx, ebx
          @@:
            mov eax, [esi+ebx*4]
            mov edx, [esi+ebx*4+4]
            mov eax, [esi+ebx*4+8]
            mov edx, [esi+ebx*4+12]
            inc ebx
            dec ecx
            jnz @B
        counter_end
        printf( "%d cycles, aligned access\n", eax )

        ; same loads shifted by +1 byte, so no address is a multiple of 4
        counter_begin 1000, HIGH_PRIORITY_CLASS
            mov ecx, 50
            xor ebx, ebx
          @@:
            mov eax, [esi+ebx*4+1]
            mov edx, [esi+ebx*4+4+1]
            mov eax, [esi+ebx*4+8+1]
            mov edx, [esi+ebx*4+12+1]
            inc ebx
            dec ecx
            jnz @B
        counter_end
        printf( "%d cycles, misaligned access\n\n", eax )

    ENDM

    free esi

    inkey "Press any key to exit..."
    exit
;==============================================================================
end start


Running on a P3:

buffer alignment = 16

106 cycles, loop only
208 cycles, aligned access
397 cycles, misaligned access

106 cycles, loop only
207 cycles, aligned access
397 cycles, misaligned access

106 cycles, loop only
207 cycles, aligned access
397 cycles, misaligned access

Title: Re: Memory Alignment
Post by: hutch-- on June 19, 2011, 01:58:56 AM
Here is the same test on my Core2 Quad.


buffer alignment = 16

94 cycles, loop only
196 cycles, aligned access
286 cycles, misaligned access

95 cycles, loop only
195 cycles, aligned access
286 cycles, misaligned access

94 cycles, loop only
195 cycles, aligned access
286 cycles, misaligned access

Press any key to exit...