The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: ragdog on February 19, 2007, 08:45:07 PM

Title: zero fill mem
Post by: ragdog on February 19, 2007, 08:45:07 PM
hi

Which function is better for zeroing memory,

RtlZeroMemory or ZeroMem,

when filling a buffer with zeros?

Thanks in advance
Title: Re: zero fill mem
Post by: jag on February 19, 2007, 10:21:46 PM
There is a thread here, http://www.masm32.com/board/index.php?topic=6576.0
The thread discusses which one is the fastest.

I believe you would want to use RtlZeroMemory.
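
For reference, a minimal sketch of calling RtlZeroMemory from a MASM32 program. It assumes the MASM32 SDK is installed under \masm32, so that masm32rt.inc pulls in the kernel32 prototypes and import library. The buffer name is made up for illustration.

```asm
include \masm32\include\masm32rt.inc

.data?
buffer db 256 dup(?)            ; hypothetical buffer to be cleared

.code
start:
    ; RtlZeroMemory(Destination, Length): zeroes Length bytes at Destination
    invoke RtlZeroMemory, addr buffer, sizeof buffer
    invoke ExitProcess, 0
end start
```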
Title: Re: zero fill mem
Post by: ic2 on February 19, 2007, 11:26:55 PM
8 out of 10 jag is right...

I'm no expert, but where I can replace an API, I will. I once found RtlZeroMemory hard to replace for whatever reason, and only bitRAKE's code did the job for me at the time, when all the others I tried failed. I could have been doing something wrong, but whatever the case, the code below solved that problem. If this is what you mean by zeroing out the buffer, both will wipe *the entire buffer space* whether there are chars in it or not.

I changed Qages code a long time ago because i wanted eax to always be free. . I use it 95% of the time and never had a problem.

  Qages once said "nothing is faster than a JUMP"


; ############################

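Qages_Clear_Buff PROC len:DWORD,scr:DWORD

    mov edx, scr                ; edx -> buffer (eax stays free)
    xor ecx, ecx                ; ecx = byte index (ebx would have to be preserved)
    cmp ecx, len
    je done                     ; zero length: nothing to clear
  @@:
    mov BYTE PTR [edx+ecx], 0
    inc ecx
    cmp ecx, len                ; stop after exactly len bytes
    jne @B
  done:
    ret

Qages_Clear_Buff ENDP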

; ############################


I once stumbled on something weird, coding something like WriteProcessMemory without even calling the API, and it worked. I was playing with RtlZeroMemory and these other routines, and ONLY bitRAKE's code did the job at the time. I forget how I did it and have lost what I was doing and why.

Anyway, I use this when I mean business, meaning RIGHT NOW with no tricks allowed, and it NEVER FAILS...

; .......................   bitRAKE clean buffer
xor eax,eax
mov edi, offset hFile
mov ecx, SIZEOF hFile
rep stosb




Title: Re: zero fill mem
Post by: hutch-- on February 19, 2007, 11:34:01 PM
The recent thread on zeroing memory made most of these questions clear. If it's a large block to zero, use REP STOSD; if it's under about 700 bytes, use something like memfill in the masm32 library. An MMX version will be slightly faster on Intel hardware but slightly slower on AMD.
Title: Re: zero fill mem
Post by: evlncrn8 on February 20, 2007, 12:04:46 AM
any memory filler program is probably faster using stos* like...



zeromemory proc uses eax ecx edi, memoryarea:dword, memorysize:dword

local bytesremaining:dword

xor eax,eax
mov edi, memoryarea
mov ecx, memorysize
mov bytesremaining, ecx
cld                      ; make stos* count upward

shr ecx, 2               ; divide by 4: number of whole dwords
and bytesremaining, 3    ; 0 to 3 bytes left over
rep stosd                ; stores nothing when ecx is 0

mov ecx, bytesremaining
shr ecx, 1               ; divide by 2: number of whole words
and bytesremaining, 1    ; 0 or 1 byte left over
rep stosw

mov ecx, bytesremaining
rep stosb

ret

zeromemory endp


think microsoft do something similar, haven't tested that code either, should work though
Title: Re: zero fill mem
Post by: hutch-- on February 20, 2007, 08:28:49 AM
evlncrn8,

You should benchmark it against the collection of algos in the thread mentioned above. Over about 700 bytes, REP STOSD leaves the rest behind.
Title: Re: zero fill mem
Post by: evlncrn8 on February 20, 2007, 10:22:05 AM
Haven't really got time atm :(. Does 'leaves the rest behind' mean it's slower?
Title: Re: zero fill mem
Post by: hutch-- on February 20, 2007, 12:38:14 PM
Perhaps if you read the thread you would understand what "leaves the rest behind" meant.
Title: Re: zero fill mem
Post by: evlncrn8 on February 20, 2007, 01:00:25 PM
And you could have explained instead of going 'look at the thread'; it's a simple question.

And as for the thread: different results, different PCs, and most likely different types of
memory tested (stack, fixed, aligned, not aligned). It's all just results which mean nothing
without a relative base to work from, i.e. not conclusive.
Title: Re: zero fill mem
Post by: hutch-- on February 20, 2007, 01:10:54 PM
Everybody has a theory, feel free to put yours to the test. That is what objective testing is about. If you think you have a more accurate benchmarking method, feel free to demonstrate it.
Title: Re: zero fill mem
Post by: ragdog on February 20, 2007, 03:26:57 PM
 :U thanks to all for the information

greets
ragdog
Title: Re: zero fill mem
Post by: Qages on March 03, 2007, 01:45:39 AM
Quote
I changed Qages code a long time ago because i wanted eax to always be free. . I use it 95% of the time and never had a problem.

  Qages once said "nothing is faster than a JUMP"

I didn't know i had a fan.
Title: Re: zero fill mem
Post by: ic2 on June 01, 2007, 03:52:50 AM
Quote
I didn't know i had a fan.

I did not notice this until now. I went on to doing something else, I guess.

Yes you did... and still do, because I still read old threads where you once posted. I haven't seen anything new from Qages lately until now...

I never forget code I've used that was presented by other coders, especially when that coder makes strong comments on why it works so well... Also, it was people like you who kept me interested in ASM in the first place, even though I was slow about it.

Anyway, this question relates to all functions like yours, but I'll use this example:

If I have only 40 bytes in a 256-byte buffer and I use sizeof, am I assured that it will clear only up to the first zero it encounters, or does it clean to the very end of the buffer?
That is, 256 repeats instead of 41 repeats using this code. I think it would return after hitting the 41st byte, but I need to be 100% sure. This is why I'm asking.


TEMP_256 db 256 dup(?)
Only 40 bytes are used:

xor eax,eax
mov edi, offset TEMP_256
mov ecx, sizeof TEMP_256
rep stosb


And if I want it to clean the entire buffer no matter what, this would be one way it can be done... With only 40 bytes inside the buffer, am I assured that it will clean to the bitter end of the buffer, stepping through all 256 bytes? Or will it REALLY still return after hitting the 41st byte of the buffer, well before the expected 256 stores?


TEMP_256 db 256 dup(?)
Only 40 bytes are used:

xor eax,eax
mov edi, offset TEMP_256
mov ecx, 256
rep stosb


Thanks in advance

Just need to know for sure... and understand what SIZEOF actually does: size of the BYTES used or size of the BUFFER.
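
On the question above: REP STOSB never reads the destination. It stores AL and decrements ECX until ECX reaches zero, so it cannot stop early at a zero byte that is already in the buffer. SIZEOF is evaluated at assembly time from the data declaration, so for `db 256 dup(?)` it is always 256, however many bytes the program has written. A small sketch using the same TEMP_256 buffer, meant to sit inside an ordinary 32-bit MASM program; the 40 in the second fill is just the figure from the post, not something the assembler can work out:

```asm
.data
TEMP_256 db 256 dup(?)

.code
    ; Both listings in the post assemble to the same thing,
    ; because SIZEOF TEMP_256 is the constant 256.
    xor eax, eax                ; al = 0, the fill byte
    mov edi, offset TEMP_256
    mov ecx, sizeof TEMP_256    ; same as: mov ecx, 256
    cld                         ; store towards higher addresses
    rep stosb                   ; writes all 256 bytes unconditionally
    ; here ecx = 0 and edi = offset TEMP_256 + 256

    ; To clear only the part in use, the length must be supplied by hand.
    mov edi, offset TEMP_256
    mov ecx, 40                 ; assumed number of bytes in use
    rep stosb                   ; writes exactly 40 bytes
```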
Title: Re: zero fill mem
Post by: asmfan on June 01, 2007, 12:50:40 PM
Really, there are lots of ways to do the job: using the cache (mov, movs*, movq, movdqa) or not (movnti, movntps[d], movntdq), depending on the task. Non-cached access on large amounts of memory is much faster (approx. 3x in my tests). One piece of advice: ascending access (from lower addresses to higher ones) is faster.
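
A rough sketch of the non-cached variant mentioned above, using movnti in ascending address order. It is only an illustration: it assumes an SSE2-capable CPU, an assembler version that knows movnti, and that edi already points at a dword-aligned buffer with a non-zero dword count in ecx. Whether it beats REP STOSD on a given machine would have to be measured.

```asm
.686
.xmm

    ; assumed on entry: edi -> dword-aligned buffer, ecx = dword count (not 0)
    xor eax, eax
  @@:
    movnti DWORD PTR [edi], eax ; non-temporal store, bypasses the cache
    add edi, 4                  ; ascending addresses
    dec ecx
    jnz @B
    sfence                      ; make the non-temporal stores globally visible
```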
Title: Re: zero fill mem
Post by: thomas_remkus on June 01, 2007, 12:52:53 PM
If you start your loop with the counter at the highest value and "dec" until you get to 0 you do not need to use the "cmp" because "dec" will fill in the 0 flag and you can "jnz" back to your label without the additional instruction.

If this is wrong, please let me know. I have a current project with a loop that some experts are helping me through right now. This project does not use this technique because I can't seem to get so much else working I don't want to mess with this too. But I think for performance this is correct.
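
A minimal sketch of the count-down loop described above, applied to a byte clear. The register choice is an assumption: edx points at the buffer and ecx holds a non-zero byte count on entry. Note that it walks from the highest address down to the lowest.

```asm
    ; assumed on entry: edx -> buffer, ecx = byte count (not 0)
  @@:
    mov BYTE PTR [edx+ecx-1], 0 ; clear the last byte not yet cleared
    dec ecx                     ; sets ZF when the count reaches 0
    jnz @B                      ; no separate cmp needed
```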
Title: Re: zero fill mem
Post by: asmfan on June 01, 2007, 01:13:41 PM
Quote from: thomas_remkus on June 01, 2007, 12:52:53 PM
"dec" until you get to 0 you do not need to use the "cmp" because "dec" will fill in the 0 flag and you can "jnz" back to your label without the additional instruction.
Of course you're right, but I'm talking about the best memory access, not the best loop organization ;) which is what you are talking about.
Say, it's better to do cld before rep stos* than std, for performance. It is similar when organizing your own memory access routine.

Below is my app for testing RAM fill speed, written in fasm. It requires fasm to recompile, and Windows and SSE2 to run (it can be done with plain SSE using a different packed type).

[attachment deleted by admin]
Title: Re: zero fill mem
Post by: Subhadeep.Ghosh on June 03, 2007, 06:14:18 AM
Hello,

I found this to be an interesting discussion, so I decided to contribute to it as well with whatever little I've got  :bg.

In C/C++ I've been using my own versions of memcpy, memset and the like. I think I use the same optimized algorithm as the one the C/C++ standard library uses, but I still get a kick out of using my own libraries whenever I can.

The functions memcpy, memset and ZeroMemory can be grouped under the same category, in that they all modify a block of memory. Let's consider a computer which can handle 32 bits (4 bytes) of data in a single clock cycle. There are two possibilities: either the size of the block of memory (in bytes) is divisible by 4 or it is not.

If the size of the block is divisible by 4, we can set ecx to (size of the memory block) >> 2 and do a REP STOSD. If it is not divisible by 4, we first set ecx to (size of the memory block) >> 2 and do a REP STOSD, then set ecx to (size of the memory block) % 4 and do a REP STOSB.

In my opinion, this is the best 32-bit memory manipulation. The same algorithm can be extrapolated to 64 bits and 128 bits as well.

Regards,
Subhadeep Ghosh
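
The two-step scheme described in the post above can be sketched roughly as follows. The procedure and parameter names (ZeroBlock, pMem, nBytes) are made up for illustration, and the sketch assumes a flat 32-bit stdcall MASM program. Because REP with ECX = 0 stores nothing, the divisible and non-divisible cases need no separate branches.

```asm
ZeroBlock proc uses edi, pMem:DWORD, nBytes:DWORD   ; hypothetical name

    xor eax, eax            ; fill value 0
    mov edi, pMem
    cld                     ; store towards higher addresses

    mov ecx, nBytes
    shr ecx, 2              ; size >> 2 = number of whole dwords
    rep stosd

    mov ecx, nBytes
    and ecx, 3              ; size % 4 = 0 to 3 leftover bytes
    rep stosb

    ret

ZeroBlock endp
```

For example, a 10-byte block gives 10 >> 2 = 2 dwords (8 bytes) from REP STOSD, and 10 % 4 = 2 bytes from REP STOSB.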