What is the fastest way to fill a chunk of memory - exactly 1.6 GByte - completely with FF in hexadecimal?
My best idea was moving it all into the mem via mov, but knowing little about RAM architecture I thought there must be a faster way.
fill it with 0's, then invert :lol
or, you could try this code
mov edi,offset MemBuff        ; destination pointer
mov ecx,(sizeof MemBuff)/4    ; number of DWORDs to store
mov eax,0FFFFFFFFh            ; fill pattern, all bytes FF
rep stosd                     ; store EAX at [EDI], ECX times
it will be reasonably fast, as long as the base address of MemBuff is 4-aligned
notice that the buffer size should be divisible by 4, as well
if it isn't, add a few pad bytes to the end so that it is - or handle the leftover bytes with REP STOSB, as sketched below
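If padding isn't an option, here is a minimal sketch (my addition, untested) that fills the odd tail bytes with REP STOSB after the DWORD fill:

mov edi,offset MemBuff
mov ecx,sizeof MemBuff
mov eax,0FFFFFFFFh
push ecx            ; keep the full byte count
shr ecx,2           ; count of whole DWORDs
rep stosd           ; fill the 4-byte part
pop ecx
and ecx,3           ; 0..3 leftover bytes
rep stosb           ; AL is already 0FFh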
On memory of that size if it has to be done regularly you would be better to use SSE 128 bit fills. Think of instructions like MOVNTDQA if the memory is aligned correctly.
.686
.MODEL FLAT,C
.MMX
.XMM
.CODE
; assumes DataSize is a multiple of 64 (the loop stores four 16-byte blocks per pass)
FastFill PROC DataSize:DWORD, Buffer:PTR BYTE
push esi
mov esi, Buffer
mov ecx, DataSize
shr ecx, 6                        ; number of 64-byte blocks
movups xmm0,xmmword ptr AllFF     ; load the all-FF pattern (AllFF is declared as DWORDs, so the explicit xmmword ptr keeps MASM's operand-size check happy)
@@:
movups [esi + 0], xmm0
movups [esi + 16], xmm0
movups [esi + 32], xmm0
movups [esi + 48], xmm0
add esi, 64
add ecx, -1
jnz @B
pop esi
ret
FastFill ENDP
.DATA
AllFF dd -1,-1,-1,-1
END
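A hypothetical call, with MemBuff as my placeholder buffer (its size kept a multiple of 64):

.DATA?
MemBuff db 4096 dup(?)
.CODE
invoke FastFill, SIZEOF MemBuff, ADDR MemBuff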
the AllFF define should be 16 aligned ?
Quote from: dedndave, November 21, 2010, at 03:20:51 AM
the AllFF define should be 16 aligned ?
No, not in that case, because Clive is using MOVUPS (move unaligned packed single).
Gunther
Quote from: hutch-- on November 21, 2010, 02:49:12 AM
On memory of that size if it has to be done regularly you would be better to use SSE 128 bit fills. Think of instructions like MOVNTDQA if the memory is aligned correctly.
Hutch has the fastest solution. Align the memory first (but most probably it is already aligned), then use MOVNTDQA. You can unroll it a little bit to save some cycles.
The point about MOVNTDQA is that it does not write to the data cache.
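For the fill itself you would use the store form; a minimal sketch (my own, not Gunther's code), assuming EDI is 16-byte aligned and ECX holds the count of 16-byte blocks:

pcmpeqd xmm0, xmm0      ; sets all 128 bits - sixteen 0FFh bytes, no memory load needed
@@:
movntdq [edi], xmm0     ; non-temporal store, bypasses the data cache
add edi, 16
dec ecx
jnz @B
sfence                  ; drain the write-combining buffers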
Isn't MOVNTDQA sse4?
I thought for more than 256 meg 'rep stosd' was pretty speedy.
Quote from: sinsi on November 21, 2010, 09:31:37 AM
Isn't MOVNTDQA sse4?
I thought for more than 256 meg 'rep stosd' was pretty speedy.
Yes, correct - it's SSE4. But there is an 'ordinary' variant, movntdq. Note that in standard timing benchmarks it looks pretty bad because it writes without caching; you would have to change the testbed to gigabyte-sized buffers to see the difference:
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
1405 cycles for 100*movdqa
???? cycles for 100*movntdq
EDIT: There is something weird here. See attachment, third loop.
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
1554 cycles for 100*movdqa
1924 cycles for 100*movntdq
1571 cycles for 100*MOVNTPD
1549 cycles for 100*movdqa
24888 cycles for 100*movntdq ; without 'speedup'
More detail on performance here (http://coding.derkeiler.com/Archive/Assembler/comp.lang.asm.x86/2006-12/msg00071.html).
Quote from: jj2007 on November 21, 2010, 10:13:56 AM
Yes, correct - it's SSE4. But there is an 'ordinary' variant, movntdq. Note that in standard timing benchmarks it looks pretty bad because it writes without caching; you would have to change the testbed for Gigabyte size to see the difference:
Go to "http://www.masm32.com/board/index.php?topic=14685.msg119904#msg119904" and follow the whole thread.
For a buffer bigger than the L2 cache, MOVNTDQ would often be the best choice.
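To make that concrete, a rough dispatch sketch (my addition, not Alex's code; L2_BYTES, cached_fill and streaming_fill are assumed names):

mov ecx, DataSize
cmp ecx, L2_BYTES       ; assumed constant, e.g. taken from CPUID cache info
jbe cached_fill         ; fits in L2: movdqa or rep stosd is fine
jmp streaming_fill      ; bigger than L2: movntdq avoids cache pollution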
Alex
Thanks for all the replies, great forum.
I'm going to figure out which solution is fastest in my case while keeping enough compatibility (not every PC has SSE4).
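A minimal sketch of such a feature check (my addition; MOVNTDQ only needs SSE2, reported in bit 26 of EDX by CPUID leaf 1, and no_sse2 is a hypothetical fallback label):

mov eax, 1              ; CPUID leaf 1: feature flags
cpuid
test edx, 1 SHL 26      ; EDX bit 26 = SSE2 (enough for movntdq)
jz no_sse2              ; no SSE2: fall back to rep stosd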
@hutch-- :
I was not quite sure where to post my question, so thanks for moving it to the right subforum.
i still say my original idea sounds best :P
Quote
fill it with 0's, then invert
Quote from: dedndave on November 21, 2010, 09:06:13 PM
i still say my original idea sounds best :P
Quote
fill it with 0's, then invert...
... and all with non-temporal writes for big buffers :P