
Title: m2m vs mrm
Post by: n00b! on June 27, 2008, 04:08:02 PM

Hello,
I want to know which macro is quicker: mrm, which uses EAX as a buffer, or m2m, which pushes and pops to/from the stack.

Thanks in advance.
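
For readers without the MASM32 sources to hand, here is a sketch of the two macros being compared. The definitions are written from memory of the MASM32 macros.asm file and are an assumption, not a verbatim copy; check your own installation for the exact text.

```asm
; Sketch only: believed to match the MASM32 macros, not copied verbatim.

; memory-to-memory copy through the stack, no register is modified
m2m MACRO M1, M2
    push M2             ; source value goes onto the stack
    pop  M1             ; and comes straight off into the destination
ENDM

; memory-to-memory copy through EAX, which is overwritten
mrm MACRO m1, m2
    mov eax, m2         ; load the source into EAX
    mov m1, eax         ; store EAX to the destination
ENDM
```

Both copy one dword from the second operand to the first; the difference is whether the stack or EAX carries the value.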

Title: Re: m2m vs mrm
Post by: bozo on June 27, 2008, 04:13:46 PM

i believe a mov is always faster, but some will dispute that.
the only way to know for sure is to time your code.

Title: Re: m2m vs mrm
Post by: zooba on June 27, 2008, 11:01:47 PM

Quote from: Kernel_Gaddafi on June 27, 2008, 04:13:46 PM
i believe a mov is always faster

That seems intuitive, I'll agree. However, I seem to recall some testing that happened a while ago (here somewhere, try searching) that found m2m was actually faster.

Caught quite a few people by surprise :bg

Cheers,

Zooba :U

Title: Re: m2m vs mrm
Post by: hutch-- on June 27, 2008, 11:33:23 PM

Noob,

The usage depends on what you are doing. In the middle of a pile of messy API code, "m2m" is easily fast enough, but the other macro that uses a register is usually faster, so if the code you are writing is closer to the bare mnemonic end, it's probably a better choice.

Title: Re: m2m vs mrm
Post by: MichaelW on June 28, 2008, 08:13:06 AM

Running on my P3, I cannot find any circumstances where m2m is faster.

Code:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
    .686
    include \masm32\macros\timers.asm
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

    ; -------------------------------------------------------
    ; This is an assembly-time random number generator based
    ; on code by George Marsaglia:
    ;   #define znew  ((z=36969*(z&65535)+(z>>16))<<16)
    ;   #define wnew  ((w=18000*(w&65535)+(w>>16))&65535)
    ;   #define MWC   (znew+wnew)
    ; -------------------------------------------------------

    @znew_seed@ = 362436069
    @wnew_seed@ = 521288629

    @rnd MACRO base:REQ
      LOCAL znew, wnew

      @znew_seed@ = 36969 * (@znew_seed@ AND 65535) + (@znew_seed@ SHR 16)
      znew = @znew_seed@ SHL 16

      @wnew_seed@ = 18000 * (@wnew_seed@ AND 65535) + (@wnew_seed@ SHR 16)
      wnew = @wnew_seed@ AND 65535

      EXITM <(znew + wnew) MOD base>
    ENDM

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
      FOR mem,<m0,m1,m2,m3,m4,m5,m6,m7,m8,m9,ma,mb,mc,md,me,mf>
        mem dd 0
      ENDM
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    mov esi, alloc(1000000*4)

    invoke Sleep, 3000

    counter_begin 1000, HIGH_PRIORITY_CLASS
      m2m m0, m1
      m2m m2, m3
      m2m m4, m5
      m2m m6, m7
      m2m m8, m9
      m2m ma, mb
      m2m mc, md
      m2m me, mf
    counter_end
    print ustr$(eax)," cycles, m2m sequential direct",13,10

    counter_begin 1000, HIGH_PRIORITY_CLASS
      mov eax, m1
      mov m0, eax
      mov eax, m3
      mov m2, eax
      mov eax, m5
      mov m4, eax
      mov eax, m7
      mov m6, eax
      mov eax, m9
      mov m8, eax
      mov eax, mb
      mov ma, eax
      mov eax, md
      mov mc, eax
      mov eax, mf
      mov me, eax
    counter_end
    print ustr$(eax)," cycles, mrm sequential direct",13,10,13,10

    counter_begin 1000, HIGH_PRIORITY_CLASS
      m2m [esi+0], [esi+4]
      m2m [esi+8], [esi+12]
      m2m [esi+16], [esi+20]
      m2m [esi+24], [esi+28]
      m2m [esi+32], [esi+36]
      m2m [esi+40], [esi+44]
      m2m [esi+48], [esi+52]
      m2m [esi+56], [esi+60]
    counter_end
    print ustr$(eax)," cycles, m2m sequential indirect",13,10

    counter_begin 1000, HIGH_PRIORITY_CLASS
      mov eax, [esi+4]
      mov [esi+0], eax
      mov eax, [esi+12]
      mov [esi+8], eax
      mov eax, [esi+20]
      mov [esi+16], eax
      mov eax, [esi+28]
      mov [esi+24], eax
      mov eax, [esi+36]
      mov [esi+32], eax
      mov eax, [esi+44]
      mov [esi+40], eax
      mov eax, [esi+52]
      mov [esi+48], eax
      mov eax, [esi+60]
      mov [esi+56], eax
    counter_end
    print ustr$(eax)," cycles, mrm sequential indirect",13,10,13,10

    counter_begin 1000, HIGH_PRIORITY_CLASS
      REPEAT 8
        m2m [esi+@rnd(250000)*4],[esi+@rnd(250000)*4]
      ENDM
    counter_end
    print ustr$(eax)," cycles, m2m random indirect",13,10

    counter_begin 1000, HIGH_PRIORITY_CLASS
      REPEAT 8
        mov eax, [esi+@rnd(250000)*4]
        mov [esi+@rnd(250000)*4], eax
      ENDM
    counter_end
    print ustr$(eax)," cycles, mrm random indirect",13,10,13,10

    inkey "Press any key to exit..."
    exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start

Code:
28 cycles, m2m sequential direct
6 cycles, mrm sequential direct

28 cycles, m2m sequential indirect
10 cycles, mrm sequential indirect

55 cycles, m2m random indirect
36 cycles, mrm random indirect

Title: Re: m2m vs mrm
Post by: zooba on June 28, 2008, 08:40:39 AM

Quote from: MichaelW on June 28, 2008, 08:13:06 AM
Running on my P3, I cannot find any circumstances where m2m is faster.

Guess I was imagining things then :bg . Oh well. :U

Title: Re: m2m vs mrm
Post by: jj2007 on June 28, 2008, 09:25:21 AM

On a Celeron. Looks strange...

94 cycles, m2m sequential direct
1 cycles, mrm sequential direct

84 cycles, m2m sequential indirect
2 cycles, mrm sequential indirect

260 cycles, m2m random indirect
172 cycles, mrm random indirect

Title: Re: m2m vs mrm
Post by: n00b! on June 28, 2008, 09:37:55 PM

What are the cycles and esi for?
If the cycle count is lower, is the method quicker?

PS: Thanks for your help.
PS2: I have no timers.asm :-(
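
As far as the test code in this thread shows, esi just holds the address of the buffer returned by alloc, and the cycle count is whatever counter_end leaves in eax; a lower count means the timed block ran faster. Below is a minimal sketch, assuming timers.asm (linked later in the thread from topic 770) has been saved as \masm32\macros\timers.asm, that its macros behave the way they are used in the program above, and that the first argument of counter_begin is the number of loop repetitions:

```asm
    include \masm32\include\masm32rt.inc
    .686
    include \masm32\macros\timers.asm   ; assumed path, as used in the thread

    .code
start:
    mov esi, alloc(64)          ; esi = address of a small scratch buffer

    ; time the block between the two macros (1000 is assumed to be the loop count)
    counter_begin 1000, HIGH_PRIORITY_CLASS
      mov eax, [esi+4]          ; read one dword from the buffer
      mov [esi+0], eax          ; write it back 4 bytes lower
    counter_end                 ; cycle count comes back in eax
    print ustr$(eax)," cycles, one register-based copy",13,10

    inkey "Press any key to exit..."
    exit
end start
```

Swap the two mov lines for an m2m line to time the push/pop variant on the same addresses.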

Title: Re: m2m vs mrm
Post by: daydreamer on June 29, 2008, 07:15:20 AM

Quote from: hutch-- on June 27, 2008, 11:33:23 PM
Noob,

The usage depends on what you are doing. In the middle of a pile of messy API code, "m2m" is easily fast enough, but the other macro that uses a register is usually faster, so if the code you are writing is closer to the bare mnemonic end, it's probably a better choice.

In the case where your code needs as many general-purpose registers as possible, shouldn't it be time to make an mxm macro that can be used instead of mrm, where mxm makes use of xmm0?
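
A rough sketch of what such a macro could look like. The name mxm comes from the post; the body, the variable names and the surrounding skeleton are assumptions for illustration. movd with an xmm register is an SSE2 instruction, so this needs an assembler that accepts SSE2 and would not run on a Pentium III.

```asm
    .686
    .xmm
    .model flat, stdcall
    option casemap:none

    ; hypothetical macro: copy one dword through xmm0, no general register used
    mxm MACRO DstMem:REQ, SrcMem:REQ
        movd xmm0, SrcMem       ; load 32 bits, the rest of xmm0 is zeroed
        movd DstMem, xmm0       ; store the low 32 bits
    ENDM

    .data
      src dd 1234
      dst dd 0

    .code
start:
    mxm dst, src                ; copies src to dst, general registers untouched
    ret                         ; return to the caller of the entry point
end start
```

The trade-off is that xmm0 is overwritten instead of eax, so the macro only helps when no SSE code depends on that register.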

Title: Re: m2m vs mrm
Post by: MichaelW on June 29, 2008, 09:11:52 AM

Quote from: jj2007 on June 28, 2008, 09:25:21 AM
On a Celeron. Looks strange...

If you are running on a P4 Celeron, or any other P4, then the instruction sequences are too short to get meaningful cycle counts. I considered this, but I had already spent more time on it than I had. For a quick, crude fix you could modify each test to something like this:

Code:
counter_begin 1000, HIGH_PRIORITY_CLASS
  REPEAT 100
    m2m m0, m1
    m2m m2, m3
    m2m m4, m5
    m2m m6, m7
    m2m m8, m9
    m2m ma, mb
    m2m mc, md
    m2m me, mf
  ENDM  
counter_end
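
The register-based test can be stretched in the same way. This is a sketch, assuming the same m0..mf variables and timer macros as the original program; the reported figure then covers 100 repetitions of the eight copies, so it should only be compared with an m2m figure measured with the same REPEAT count.

```asm
counter_begin 1000, HIGH_PRIORITY_CLASS
  REPEAT 100                    ; assemble the 16 instructions below 100 times
    mov eax, m1
    mov m0, eax
    mov eax, m3
    mov m2, eax
    mov eax, m5
    mov m4, eax
    mov eax, m7
    mov m6, eax
    mov eax, m9
    mov m8, eax
    mov eax, mb
    mov ma, eax
    mov eax, md
    mov mc, eax
    mov eax, mf
    mov me, eax
  ENDM
counter_end
print ustr$(eax)," cycles, mrm sequential direct x100",13,10
```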

Title: Re: m2m vs mrm
Post by: NightWare on June 30, 2008, 01:48:39 AM

Quote from: zooba on June 28, 2008, 08:40:39 AM
Guess I was imagining things then :bg . Oh well. :U

Not exactly; we discussed that here: http://www.masm32.com/board/index.php?topic=9110.0

Title: Re: m2m vs mrm
Post by: Mark Jones on July 07, 2008, 06:31:42 PM

For the bigger machines, here's Michael's code modified to expand each block 1000x. The last two blocks were changed to perform 8 tests like the others and the range of the random values was greatly increased (to around 0-3MB or so.) Included is an executable, a RadASM project file, and timers.asm from http://www.masm32.com/board/index.php?topic=770.0

Quote from: AMD X64 4000+
33371 cycles, m2m sequential direct
18964 cycles, mrm sequential direct

34587 cycles, m2m sequential indirect
8116 cycles, mrm sequential indirect

624616 cycles, m2m random indirect
392243 cycles, mrm random indirect

[attachment deleted by admin]

Title: Re: m2m vs mrm
Post by: jj2007 on July 07, 2008, 07:07:44 PM

Quote from: Mark Jones on July 07, 2008, 06:31:42 PM
For the bigger machines, here's Michael's code modified to expand each block 1000x.

Celeron:

99826 cycles, m2m sequential direct
19544 cycles, mrm sequential direct

98680 cycles, m2m sequential indirect
18225 cycles, mrm sequential indirect

2620671 cycles, m2m random indirect
2331846 cycles, mrm random indirect

Title: Re: m2m vs mrm
Post by: NightWare on July 08, 2008, 12:03:56 AM

On a Core 2 Duo at 2 GHz:

Code:
36581 cycles, m2m sequential direct
26111 cycles, mrm sequential direct

37038 cycles, m2m sequential indirect
15349 cycles, mrm sequential indirect

149714 cycles, m2m random indirect
93651 cycles, mrm random indirect

It's clear there is a speed-up for push/pop on Core 2 (compared with P3/P4), but the macros compared here don't do exactly the same thing: m2m preserves the registers, while mrm does not (it overwrites eax, the register most often used to return values...). That's why m2m is generally used more often... Once mrm has cost you some time, you remember later that you must use m2m (unless you are sure you will never touch or add things to your algo later... hmm... is that even possible?) :wink
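
A small sketch of the pitfall described here, assuming the MASM32 m2m and mrm macros expand to push/pop and to a mov pair through eax respectively; the variable names are made up for the example.

```asm
    include \masm32\include\masm32rt.inc

    .data
      src  dd 1234
      dst  dd 0
      kept dd 0
      lost dd 0

    .code
start:
    invoke GetTickCount         ; return value arrives in eax
    m2m dst, src                ; push/pop: eax still holds the tick count
    mov kept, eax               ; stores the tick count as intended

    invoke GetTickCount         ; return value arrives in eax again
    mrm dst, src                ; copy goes through eax, which now holds 1234
    mov lost, eax               ; stores 1234, the return value is gone

    print ustr$(kept)," <- tick count, survived m2m",13,10
    print ustr$(lost)," <- value of src, tick count lost to mrm",13,10
    exit
end start
```

The fix when using mrm is to save the return value before the copy, which costs the instruction the macro was meant to save.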

Title: Re: m2m vs mrm
Post by: Biterider on July 09, 2008, 05:21:57 AM

Hi
This is my implementation of m2m.
I use it to freely play with the register that transfers the value or, if there is no reg available, to fall back to the push/pop version.

Code:
m2m macro DstMem:req, SrcMem:req, AuxReg
    ifb <AuxReg>
      push SrcMem
      pop DstMem
    else
      mov AuxReg, SrcMem
      mov DstMem, AuxReg
    endif
endm

The advantage is that if you don't provide the 3rd parameter you are compatible with existing code using push/pop.

Regards,

Biterider
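
A short usage sketch for the macro above, showing both forms. The macro is repeated so that the fragment assembles on its own; the variable names and the choice of edx are assumptions for the example.

```asm
    .686
    .model flat, stdcall
    option casemap:none

    ; the macro from the post, repeated unchanged
    m2m macro DstMem:req, SrcMem:req, AuxReg
        ifb <AuxReg>
          push SrcMem
          pop DstMem
        else
          mov AuxReg, SrcMem
          mov DstMem, AuxReg
        endif
    endm

    .data
      src dd 1234
      dst dd 0

    .code
start:
    m2m dst, src                ; no third argument: push src / pop dst
    m2m dst, src, edx           ; with a register: mov edx, src / mov dst, edx
    ret                         ; return to the caller of the entry point
end start
```

The second form overwrites edx, so the caller picks whichever register is free at that point in the code.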

Title: Re: m2m vs mrm
Post by: dsouza123 on July 09, 2008, 11:49:33 PM

If your program doesn't use any x87 floating point instructions, you can use the MMX registers.

With a slight modification to MichaelW's code (replacing mov with movd and eax with mm0):

Code:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
    .686
    .mmx
    include \masm32\macros\timers.asm
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

Code:
    counter_begin 1000, HIGH_PRIORITY_CLASS
      movd mm0, m1
      movd m0, mm0
      movd mm0, m3
      movd m2, mm0
      movd mm0, m5
      movd m4, mm0
      movd mm0, m7
      movd m6, mm0
      movd mm0, m9
      movd m8, mm0
      movd mm0, mb
      movd ma, mm0
      movd mm0, md
      movd mc, mm0
      movd mm0, mf
      movd me, mm0
    counter_end

Code:
    counter_begin 1000, HIGH_PRIORITY_CLASS
      movd mm0, [esi+4]
      movd [esi+0], mm0
      movd mm0, [esi+12]
      movd [esi+8], mm0
      movd mm0, [esi+20]
      movd [esi+16], mm0
      movd mm0, [esi+28]
      movd [esi+24], mm0
      movd mm0, [esi+36]
      movd [esi+32], mm0
      movd mm0, [esi+44]
      movd [esi+40], mm0
      movd mm0, [esi+52]
      movd [esi+48], mm0
      movd mm0, [esi+60]
      movd [esi+56], mm0
    counter_end

Code:
      REPEAT 8
        movd mm0, [esi+@rnd(250000)*4]
        movd [esi+@rnd(250000)*4], mm0
      ENDM

If you do have x87 floating-point code, use the emms instruction
to transition from MMX to x87 FP.
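
A minimal sketch of that transition. The macro name mmxcopy and the variables are hypothetical; the point is only where emms goes relative to the MMX copy and the x87 code that follows it.

```asm
    .686
    .mmx
    .model flat, stdcall
    option casemap:none

    ; hypothetical macro: copy one dword through mm0
    mmxcopy MACRO DstMem:REQ, SrcMem:REQ
        movd mm0, SrcMem        ; load 32 bits into the MMX register
        movd DstMem, mm0        ; store the low 32 bits
    ENDM

    .data
      src  dd 1234
      dst  dd 0
      fval real8 1.5
      fsum real8 0.0

    .code
start:
    mmxcopy dst, src            ; MMX use marks the x87 registers as occupied
    emms                        ; empty the MMX state before any x87 instruction
    fld  fval                   ; st(0) = 1.5
    fadd fval                   ; st(0) = 3.0
    fstp fsum                   ; fsum = 3.0, x87 stack empty again
    ret                         ; return to the caller of the entry point
end start
```

Without the emms, the fld would find the register stack full and produce an invalid result instead of loading fval.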

The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: n00b! on June 27, 2008, 04:08:02 PM