The fastest way to clear a buffer

Started by frktons, August 24, 2010, 08:47:34 PM

frktons

Quote from: hutch-- on August 25, 2010, 08:52:18 AM
Frank,

have a play with REP STOSD, apart from SSE you will struggle to do much better.

Certainly I'll play with it a little, and with some SSE as well after a while. My machine is able to
do so many things I don't even suspect  :P
Mind is like a parachute. You know what to do in order to use it :-)

Rockoon

AMD Phenom(tm) II X6 1055T Processor (SSE3)
557     cycles for RtlZeroMemory
2012    cycles for FrkTons
549     cycles for rep stosd
1509    cycles for movdqa
1509    cycles for movaps

556     cycles for RtlZeroMemory
3014    cycles for FrkTons
549     cycles for rep stosd
1016    cycles for movdqa
1015    cycles for movaps
When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.

jj2007

Really surprising, Rockoon. There seem to be huge differences in the way rep stosd is implemented.

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
2515    cycles for RtlZeroMemory
4300    cycles for FrkTons
2486    cycles for rep stosd
2491    cycles for movdqa
2387    cycles for movaps

hutch--


Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)
1055    cycles for RtlZeroMemory
2018    cycles for FrkTons
1047    cycles for rep stosd
531     cycles for movdqa
531     cycles for movaps

1055    cycles for RtlZeroMemory
2026    cycles for FrkTons
1048    cycles for rep stosd
521     cycles for movdqa
519     cycles for movaps


--- ok ---
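
For reference, a minimal sketch of the kind of movdqa store loop the timings above compare against rep stosd. The thread does not show the code that was actually timed, so the buffer name charbuf, the size BUFSIZE and the simple non-unrolled loop are assumptions made for illustration. It needs an assembler that knows SSE2 and a 16-byte aligned buffer.

```asm
include \masm32\include\masm32rt.inc
.686
.xmm

BUFSIZE equ 4096                  ; hypothetical size, must be a multiple of 16

.data?
align 16
    charbuf db BUFSIZE dup(?)     ; hypothetical buffer, 16-byte aligned for movdqa

.code
start:
    pxor    xmm0, xmm0            ; xmm0 = 16 zero bytes
    mov     eax, offset charbuf   ; destination pointer
    mov     ecx, BUFSIZE/16       ; number of 16-byte stores
@@:
    movdqa  [eax], xmm0           ; aligned 16-byte store
    add     eax, 16
    dec     ecx
    jnz     @B
    exit
end start
```

Whether this beats rep stosd depends on the CPU: in the numbers posted here it is about twice as fast on the Core 2 Quad, on par on the Pentium 4, and slower on the Phenom II.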

dedndave

REP STOSD is simple enough
i have to ask, though
why do you want to clear the char buffer ? - lol
won't it be filled in by the next read/fill operation ?
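
For readers who have not used it, a minimal 32-bit sketch of the REP STOSD fill being discussed. The buffer name charbuf and the size BUFSIZE are assumptions for illustration, not taken from the original poster's program.

```asm
include \masm32\include\masm32rt.inc

BUFSIZE equ 4000                  ; hypothetical size, a multiple of 4

.data?
    charbuf db BUFSIZE dup(?)     ; hypothetical buffer name

.code
start:
    push    edi                   ; edi must be preserved by convention
    cld                           ; make sure stores run forward
    mov     edi, offset charbuf   ; destination
    mov     eax, 20202020h        ; four ASCII spaces (use xor eax, eax for zeros)
    mov     ecx, BUFSIZE/4        ; count in dwords
    rep     stosd                 ; store eax at [edi], ecx times
    pop     edi
    exit
end start
```

If the size is not a multiple of 4, the remaining 1 to 3 bytes can be handled with a trailing rep stosb.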

frktons

Quote from: dedndave on August 25, 2010, 03:36:27 PM
REP STOSD is simple enough
i have to ask, though
why do you want to clear the char buffer ? - lol
won't it be filled in by the next read/fill operation ?

Yes sir, it'll be filled with the next operation, but not my curiosity  :P

And while we are here, I tried to use some SSE mnemonics to do something
different, because the ability to use 16-byte registers appeals to me a lot, but those
nasty little endians drive me crazy:


;----------------------------------------------------------------------
; Fast way for reversing a 16 bytes string with SSE instructions.
;----------------------------------------------------------------------
; Author: frktons @ MASM32 forum
; Date: 25/aug/2010.
;----------------------------------------------------------------------


include \masm32\include\masm32rt.inc
.686
.xmm

;----------------------------------------------------------------------

.data
align 16

    str1       db   "0123456789ABCDEF",0  ; original string
    ptr_str1   dd   str1                  ; pointer to the string

align 16

    str2       db   "                ",0  ; reversed string
    ptr_str2   dd   str2                  ; pointer to reversed string

    imm8       db   27 ; bit pattern 00011011b that pshufd uses to reverse the order
                       ; of the 4 dwords of an xmm register (kept for reference only;
                       ; the code passes 27 as an immediate)


;----------------------------------------------------------------------


.data?

    rHnd      HANDLE ?

    howmany   dd ?
    buffer    INPUT_RECORD <>   
   

.code

start:

Main PROC

    INVOKE GetStdHandle, STD_INPUT_HANDLE
    mov rHnd,eax

    print "original string: "
    print ptr_str1,13,10,13,10

    CALL  rev_sse2             
   
    print "reversed string: "
    print ptr_str2,13,10,13,10
   
    CALL AnyKey

finish: INVOKE ExitProcess,0

    ret

Main ENDP

; -------------------------------------------------------------------------   

rev_sse2 PROC

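    mov eax, ptr_str1          ; source, 16-byte aligned
    mov edx, ptr_str2          ; destination (edx instead of ebx, which a callee must preserve)
   
    movdqa   xmm0, [eax]
    pshufd   xmm1, xmm0, 27    ; 00011011b: reverses the 4 dwords, not the bytes inside them
    movdqa   [edx], xmm1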

    ret

rev_sse2 ENDP

; -------------------------------------------------------------------------
;Returns: key code in buffer.KeyEvent.wVirtualKeyCode WORD size
; -------------------------------------------------------------------------

AnyKey PROC

again:

    INVOKE ReadConsoleInput,rHnd,offset buffer,1,offset howmany
    cmp buffer.EventType,KEY_EVENT
    jnz again

    cmp buffer.KeyEvent.bKeyDown,0
    jz again

    ret

AnyKey ENDP

; -------------------------------------------------------------------------

end start


This gives me not the reversed string I wanted, but something a bit
different:


original string: 0123456789ABCDEF

reversed string: CDEF89AB45670123



aren't those little endians nasty enough?
Or is it my n00b-iness that is big [endian] enough?  :lol

dedndave

i don't think SSE give you a way to reverse bytes in a dword
the BSWAP instruction does that, though
;EAX = 12345678h
bswap eax
;EAX = 78563412h

if you want to swap nybbles, that's another story
a while back, we were playing with reversing all the bits in a dword register
there was a rather interesting algo for that
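
A rough sketch of how BSWAP could be applied to the 16-byte string from the earlier post: byte-swap each dword and store it at the mirrored position. The labels src and dst are assumptions for illustration.

```asm
include \masm32\include\masm32rt.inc

.data
    src db "0123456789ABCDEF",0   ; hypothetical source label
    dst db 16 dup(" "),0          ; hypothetical destination label

.code
start:
    mov     eax, dword ptr [src]      ; bytes 0..3
    mov     edx, dword ptr [src+12]   ; bytes 12..15
    bswap   eax                       ; reverse the 4 bytes inside each dword
    bswap   edx
    mov     dword ptr [dst+12], eax   ; first dword becomes the last one
    mov     dword ptr [dst], edx      ; last dword becomes the first one

    mov     eax, dword ptr [src+4]    ; bytes 4..7
    mov     edx, dword ptr [src+8]    ; bytes 8..11
    bswap   eax
    bswap   edx
    mov     dword ptr [dst+8], eax
    mov     dword ptr [dst+4], edx

    print   offset dst,13,10          ; prints FEDCBA9876543210
    exit
end start
```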

frktons

Quote from: dedndave on August 25, 2010, 03:46:31 PM
i don't think SSE give you a way to reverse bytes in a dword
the BSWAP instruction does that, though

Yes Master, I remember the old lesson about bswap that you and
Jochen gave me some time ago. I was just experimenting with what
SSE mnemonics make possible. Maybe there is even a way to reverse the whole thing with SSE,
but I don't actually know  ::)
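
One possible SSE2-only answer, given here as a sketch rather than a tuned routine: reverse the dwords with pshufd, swap the two words inside each dword with pshuflw/pshufhw, then swap the two bytes inside each word with a pair of shifts and a por. The labels src and dst are assumptions for illustration.

```asm
include \masm32\include\masm32rt.inc
.686
.xmm

.data
align 16
    src db "0123456789ABCDEF",0   ; hypothetical source label
align 16
    dst db 16 dup(" "),0          ; hypothetical destination label

.code
start:
    mov     eax, offset src
    mov     edx, offset dst

    movdqa  xmm0, [eax]
    pshufd  xmm0, xmm0, 1Bh       ; reverse the order of the 4 dwords
    pshuflw xmm0, xmm0, 0B1h      ; swap the words inside the 2 low dwords
    pshufhw xmm0, xmm0, 0B1h      ; swap the words inside the 2 high dwords
    movdqa  xmm1, xmm0
    psllw   xmm0, 8               ; low byte of each word moves to the high byte
    psrlw   xmm1, 8               ; high byte of each word moves to the low byte
    por     xmm0, xmm1            ; bytes inside each word are now swapped
    movdqa  [edx], xmm0

    print   offset dst,13,10      ; prints FEDCBA9876543210
    exit
end start
```

That is seven SSE2 instructions between the load and the store, so it is not obviously faster than four bswaps for a single 16-byte block.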

jj2007

You can reverse 16 bytes with a single instruction called pshufb, but it's SSSE3.

frktons

Quote from: jj2007 on August 25, 2010, 05:28:18 PM
You can reverse 16 bytes with a single instruction called pshufb, but it's SSSE3.

Thanks Jochen. I'll wait until the next CPU then.  :P
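
For anyone whose CPU does have it, a sketch of the pshufb version. It assumes an assembler that knows the instruction (JWasm or a recent ml) and a CPU that reports SSSE3 (CPUID function 1, ECX bit 9). The labels src, dst and rmask are assumptions for illustration.

```asm
include \masm32\include\masm32rt.inc
.686
.xmm

.data
align 16
    src   db "0123456789ABCDEF",0                    ; hypothetical source label
align 16
    dst   db 16 dup(" "),0                           ; hypothetical destination label
align 16
    rmask db 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0   ; result byte i = source byte rmask[i]

.code
start:
    mov     eax, offset src
    mov     edx, offset rmask
    movdqa  xmm0, [eax]
    pshufb  xmm0, [edx]           ; byte shuffle; the memory operand must be 16-byte aligned
    mov     eax, offset dst
    movdqa  [eax], xmm0

    print   offset dst,13,10      ; prints FEDCBA9876543210
    exit
end start
```

The program does not check CPUID itself, so on a CPU without SSSE3 the pshufb raises an invalid opcode exception.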

frktons

I started a new thread in the 64 bit section because
rep stosd was considered the fastest way to initialize a block of
memory in 32 bit assembly.
Now SSE instructions beat it, at least on the Intel Core 2 Quad timed above.
In my opinion, on 64 bit machines working with native 64 bit operations,
we could get better results than with SSE mnemonics by just using the 64 bit general-purpose registers.

To prove it I need the rep stosd version translated into 64 bit assembly
and tested.

Anyone wants to engage?

jj2007

Quote from: frktons on August 25, 2010, 08:15:47 PM
Anyone wants to engage?

What about you? I'll give you a starting point:
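    mov rax, 2020202020202020h    ; eight spaces; a 64-bit register holds 8 bytes, not 16
    mov rdi, offset buffer
    mov rcx, 1000                 ; count in qwords (8000 bytes)
    rep stosq                     ; stosq stores all of rax; stosd would store only eax


I can't test it because my OS and CPU are 32 bit. Now don't be shy, just go ahead!

dedndave


frktons

Quote from: jj2007 on August 25, 2010, 08:30:53 PM
Quote from: frktons on August 25, 2010, 08:15:47 PM
Anyone wants to engage?

What about you? I'll give you a starting point:
    mov rax, 2020202020202020h    ; eight spaces; a 64-bit register holds 8 bytes, not 16
    mov rdi, offset buffer
    mov rcx, 1000                 ; count in qwords (8000 bytes)
    rep stosq                     ; stosq stores all of rax; stosd would store only eax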


I can't test it because my OS and CPU are 32 bit. Now don't be shy, just go ahead!

Thanks Jochen. I'll gladly try it if you tell me how to compile it.
Is MASM32 enough or have I to use any other tool?
And a last question:

.686
.xmm

are they enough, or do I have to specify something else?

jj2007

Quote from: frktons on August 25, 2010, 08:58:46 PM
Is MASM32 enough or have I to use any other tool?

JWasm is the best option, but I can't tell you more since my OS is 32 bit.