
Memory Alignment

Started by WYVERN666, June 17, 2011, 04:27:25 PM


WYVERN666

Hi everyone.

I have some questions about memory alignment.

What are the advantages of using "ALIGN 16"? Which power of 2 is best to use? (I have seen "16" many times but don't know exactly why; I guess it depends on the architecture.) Also, what is the default alignment?

thanks

hutch--

Depends where you want to use an ALIGN directive. 16 byte alignment for procedures is common, and it is often worth trying to align a label by at least 4, but DATA is still the most critical in terms of alignment. As a very rough rule, any given data size needs to be aligned to that data size: 4 byte data needs to be 4 byte aligned, etc.

Just be careful: some aligned labels in the middle of a proc actually slow the algo down, so be ready to time speed-critical algorithms with different alignments.
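
As a rough illustration of the rule above (N byte data on an N byte boundary), here is a small sketch in C rather than MASM, so it can be compiled anywhere. The helper names is_aligned and align_up are made up for this example and are not part of the MASM32 SDK; both assume the alignment is a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; `alignment` must be a power of two. */

/* True when addr is a multiple of alignment (e.g. 4 for a DWORD). */
bool is_aligned(uintptr_t addr, uintptr_t alignment)
{
    return (addr & (alignment - 1)) == 0;
}

/* Rounds addr up to the next multiple of alignment, which is roughly what
   an ALIGN directive does to the assembler's location counter. */
uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
{
    return (addr + alignment - 1) & ~(alignment - 1);
}
```

For example, is_aligned(0x1004, 4) is true and is_aligned(0x1002, 4) is false, while align_up(0x1002, 16) gives 0x1010.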
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

WYVERN666

I don't know how to benchmark in asm, can you explain it to me?

brethren

i, like most people here, use the code timing macros that are stickied in the laboratory
http://www.masm32.com/board/index.php?topic=770.0

dedndave

place Michael's timers.asm in the masm32\macros folder
then, here's an example showing how to use Michael's timer macros and Jochen's CPU ID routine...

WYVERN666

Thank you for the answer. Must the loop count parameter be 10000000? What happens if you use just 1? I am getting similar results with both.

dedndave

well - we usually try to adjust the loop count so that each pass of the test takes about 500 ms (or a little more)
that normally yields pretty repeatable results
if you run fewer loops, the numbers tend to jump around a little and make it hard to "read"
play with it a little bit - you'll see what i mean

note...
it depends on the processor you have
if you have one of the newer cores, you may not see the numbers jump so much
but, then, if you want other forum members to run the test, it will jump around
for example, i run a pentium IV prescott under XP media center edition
it is sometimes difficult to stabilize the readings
500 ms usually does quite well
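
One possible way to pick a loop count that lands near 500 ms is to time a short trial run and scale it linearly. The sketch below is in C and uses the standard clock() function; it is not Michael's timer macros, and the names time_loops_ms and estimate_loop_count are hypothetical. clock() is fairly coarse, so treat the result as a starting point to adjust by hand.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: runs work() `loops` times and returns the elapsed
   processor time in milliseconds, measured with the standard clock(). */
double time_loops_ms(void (*work)(void), uint64_t loops)
{
    clock_t start = clock();
    for (uint64_t i = 0; i < loops; i++)
        work();
    return (double)(clock() - start) * 1000.0 / CLOCKS_PER_SEC;
}

/* Scales a trial run linearly to estimate the loop count that should take
   about target_ms. If the trial was too short to measure, it suggests a
   10x longer trial instead of dividing by zero. */
uint64_t estimate_loop_count(uint64_t trial_loops, double trial_ms,
                             double target_ms)
{
    if (trial_ms <= 0.0)
        return trial_loops * 10;
    return (uint64_t)((double)trial_loops * target_ms / trial_ms + 0.5);
}
```

If a trial of 100000 loops took 4 ms, estimate_loop_count(100000, 4.0, 500.0) suggests 12500000 loops for a pass of about 500 ms.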

here are my results for the test program attached above...
Pentium 4 Prescott (2005+), MMX, SSE3
6146 6157 6168 6157 6160
7141 7177 7399 7163 7178

MichaelW

The cycle counts on my P3 normally repeat very well.

P3 (2000+), MMX, SSE1
6038 6040 6036 6037 6036
4027 4027 4027 4027 4027

eschew obfuscation

FORTRANS

Hi,

   My P-III also is fairly steady.  But the counts are a little different from yours.

Steve N.

P3 (2000+), MMX, SSE1
6067 6072 6068 6069 6067
4050 4048 4050 4052 4051

mineiro

Core Duo (2006+), MMX, SSE3
6033 6032 6032 6036 6031
6018 6019 6019 6018 6019


Core Duo (2006+), MMX, SSE3
Zero Register Test
76 ms to 1000 mov eax,0
76 ms to 1000 xor eax,eax
76 ms to 1000 sub eax,eax
30829 ms to 10000 mov eax,0
7718 ms to 10000 xor eax,eax
7706 ms to 10000 sub eax,eax

MichaelW

This shows the large penalty for accessing a DWORD at an alignment that is not a multiple of 4 bytes.

;==============================================================================
    include \masm32\include\masm32rt.inc
    .686
    include \masm32\macros\timers.asm
;==============================================================================

printf MACRO format:REQ, args:VARARG
    IFNB <args>
        invoke crt_printf, cfm$(format), args
    ELSE
        invoke crt_printf, cfm$(format)
    ENDIF
    EXITM <>
ENDM

;==============================================================================

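;---------------------------------------------------------------
; Returns in EAX the largest power of 2 that divides _ptr
; (the maximum alignment of _ptr), or 0 if _ptr is 0.
;---------------------------------------------------------------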

alignment MACRO _ptr
    push ecx
    xor eax, eax
    mov ecx, _ptr
    bsf ecx, ecx
    jz @F
    mov eax, 1
    shl eax, cl
  @@:
    pop ecx
    EXITM <eax>
ENDM

;==============================================================================

    .data
    .code
;==============================================================================
start:
;==============================================================================

    mov esi, alloc(500)
    printf( "buffer alignment = %d\n\n", alignment(esi) )

    invoke Sleep, 3000

    REPEAT 3

        counter_begin 1000, HIGH_PRIORITY_CLASS
            mov ecx, 50
            xor ebx, ebx
          @@:
            inc ebx
            dec ecx
            jnz @B
        counter_end
        printf( "%d cycles, loop only\n", eax )

        counter_begin 1000, HIGH_PRIORITY_CLASS
            mov ecx, 50
            xor ebx, ebx
          @@:
            mov eax, [esi+ebx*4]
            mov edx, [esi+ebx*4+4]
            mov eax, [esi+ebx*4+8]
            mov edx, [esi+ebx*4+12]
            inc ebx
            dec ecx
            jnz @B
        counter_end
        printf( "%d cycles, aligned access\n", eax )

        counter_begin 1000, HIGH_PRIORITY_CLASS
            mov ecx, 50
            xor ebx, ebx
          @@:
            mov eax, [esi+ebx*4+1]
            mov edx, [esi+ebx*4+4+1]
            mov eax, [esi+ebx*4+8+1]
            mov edx, [esi+ebx*4+12+1]
            inc ebx
            dec ecx
            jnz @B
        counter_end
        printf( "%d cycles, misaligned access\n\n", eax )

    ENDM

    free esi

    inkey "Press any key to exit..."
    exit
;==============================================================================
end start


Running on a P3:

buffer alignment = 16

106 cycles, loop only
208 cycles, aligned access
397 cycles, misaligned access

106 cycles, loop only
207 cycles, aligned access
397 cycles, misaligned access

106 cycles, loop only
207 cycles, aligned access
397 cycles, misaligned access
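
The "buffer alignment = 16" line comes from the alignment macro, which uses BSF and SHL to isolate the lowest set bit of the address. A rough C equivalent is sketched below; the function name max_alignment is made up for this example.

```c
#include <stdint.h>

/* Largest power of two that divides addr, or 0 when addr is 0.
   Mirrors what the alignment macro computes with BSF and SHL. */
uintptr_t max_alignment(uintptr_t addr)
{
    return addr & (~addr + 1);   /* isolate the lowest set bit */
}
```

A result of 16 means the address is a multiple of 16 but not of 32; for example max_alignment(0x12340) is 0x40, and an odd address gives 1.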


hutch--

Here is the same test on my Core2 Quad.


buffer alignment = 16

94 cycles, loop only
196 cycles, aligned access
286 cycles, misaligned access

95 cycles, loop only
195 cycles, aligned access
286 cycles, misaligned access

94 cycles, loop only
195 cycles, aligned access
286 cycles, misaligned access

Press any key to exit...
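
To compare the two machines, one rough approach is to subtract the "loop only" figure and divide by the number of loads per pass (50 iterations x 4 loads = 200). This is only a sketch in C: the function name cycles_per_access is hypothetical, it assumes each reported figure is the cycle count for one pass, and it assumes loop overhead and load cost simply add up, which is not strictly true on out-of-order processors.

```c
/* Average cycles per memory access after subtracting the empty-loop cost.
   total_cycles and loop_only_cycles are readings from the test output;
   accesses is the number of loads per timed pass (50 x 4 = 200 here). */
double cycles_per_access(int total_cycles, int loop_only_cycles, int accesses)
{
    return (double)(total_cycles - loop_only_cycles) / accesses;
}
```

With the numbers posted in this thread, the P3 comes out at about 0.505 cycles per aligned load (207 - 106 over 200) and 1.455 per misaligned load (397 - 106 over 200). The Core2 Quad comes out at about 0.505 aligned (195 - 94 over 200) and 0.96 misaligned (286 - 94 over 200).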