Started by ragdog, February 19, 2007, 08:45:07 PM

Which zero-memory function is better for filling a buffer with zeros:

RtlZeroMemory or ZeroMem?

Thanks in advance.


There is a thread here that discusses which one is the fastest.

I believe you would want to use RtlZeroMemory.
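For comparison, in C with the Windows SDK headers, RtlZeroMemory is, as far as I know, just a macro that expands to memset(Destination, 0, Length), and ZeroMemory is defined as RtlZeroMemory. A portable sketch of the same operation follows; zero_buffer is a hypothetical name of mine, not a Windows API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: zero `length` bytes starting at `destination`,
   which is what RtlZeroMemory(destination, length) does. */
static void zero_buffer(void *destination, size_t length)
{
    if (length == 0)
        return;                      /* nothing to do; avoids memset on NULL */
    assert(destination != NULL);
    memset(destination, 0, length);  /* fill every byte with 0 */
}
```

Calling zero_buffer(buf, sizeof buf) clears the whole array, just like passing the buffer address and its size to RtlZeroMemory.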


8 out of 10 jag is right...

I'm no expert, but where I can replace an API, I will. I once found RtlZeroMemory hard to replace for whatever reason, and at the time only bitRAKE's code did the job when all the others I tried failed. I could have been doing something wrong, but whatever the case, the code below solved that problem. If this is what you mean by zeroing out the buffer, both will wipe *the entire buffer space*, whether or not it contains characters.

I changed Qages' code a long time ago because I wanted EAX to always be free. I use it 95% of the time and have never had a problem.

  Qages once said "nothing is faster than a JUMP"

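; ############################

Qages_Clear_Buff PROC USES ebx, len:DWORD, scr:DWORD

    mov edx, scr               ; edx -> buffer   (Qages cleanbuff)
    xor ebx, ebx               ; ebx = byte index; eax is left untouched
    cmp ebx, len               ; zero-length buffer: nothing to clear
    je  qcb_done

  @@:
    mov BYTE PTR [edx+ebx], 0  ; clear one byte
    inc ebx                    ; next index
    cmp ebx, len               ; stop after len bytes (indexes 0..len-1)
    jne @B

  qcb_done:
    ret

Qages_Clear_Buff ENDP

; ############################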
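The loop above indexes the buffer with a counter register and compares it against len after each store. A C model of the intended indexed loop makes the off-by-one question easy to check: a correct version writes indexes 0 through len-1 and never touches buf[len]. The name clear_buff_indexed is mine, not from the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C model of the indexed byte loop:
   buf plays the role of edx, i the role of ebx. */
static void clear_buff_indexed(unsigned char *buf, size_t len)
{
    size_t i = 0;               /* index starts at 0 */
    if (i == len)               /* zero length: nothing to clear */
        return;
    assert(buf != NULL);
    do {
        buf[i] = 0;             /* store a zero byte at buf + i */
        ++i;                    /* next index */
    } while (i != len);         /* stop once len bytes are cleared */
}
```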

I once stumbled on something weird: I coded something like WriteProcessMemory without even calling the API, and it worked. I was playing with RtlZeroMemory and these other routines, and ONLY bitRAKE's code did the job at the time. I have since forgotten how I did it, what I was doing, and why.

Anyway, I use this when I mean business, meaning RIGHT NOW with no tricks allowed, and it NEVER FAILS...

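; .......................   bitRAKE clean buffer
xor eax,eax                 ; AL = 0, the byte to store
mov edi, offset hFile       ; EDI -> start of hFile
mov ecx, SIZEOF hFile       ; ECX = declared size of hFile, in bytes
rep stosb                   ; store AL ECX times, moving EDI up each time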
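What REP STOSB does is fixed by the CPU: with the direction flag clear (the normal state under the Win32 calling convention), it stores AL at [EDI], increments EDI and decrements ECX, repeating until ECX is zero. It never looks at what is already in memory. A small C model of that behaviour, where the struct and function names are my own and purely illustrative:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the registers REP STOSB uses. */
struct stos_regs {
    uint8_t  al;    /* value to store */
    uint8_t *edi;   /* destination pointer */
    uint32_t ecx;   /* repeat count */
};

/* Model of REP STOSB with DF = 0 (ascending addresses). */
static void rep_stosb(struct stos_regs *r)
{
    assert(r != NULL);
    while (r->ecx != 0) {      /* REP checks ECX before each iteration */
        *r->edi = r->al;       /* STOSB: store AL at [EDI] */
        r->edi++;              /* DF = 0: EDI moves up by one */
        r->ecx--;              /* one iteration done */
    }
}
```

Existing zero bytes in the buffer do not stop the loop; only ECX reaching zero does.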


The recent thread on zeroing memory made most of these questions clear. If it's a large block to zero, use STOSD; if it's under about 700 bytes, use something like memfill from the masm32 library. An MMX version will be slightly faster on Intel hardware but slightly slower on AMD.
Any memory filler is probably faster using STOS*, like...

zeromemory proc uses eax ecx edi, memoryarea:dword, memorysize:dword

    ; assumes the direction flag is clear (the Win32 default), so stos* moves up

    xor eax, eax            ; fill value = 0
    mov edi, memoryarea     ; edi -> start of the block

    mov ecx, memorysize
    shr ecx, 2              ; divide by 4: number of whole DWORDs
    rep stosd               ; zero 4 bytes at a time (no-op when ecx = 0)

    mov ecx, memorysize
    and ecx, 3              ; 0..3 bytes left over
    shr ecx, 1              ; divide by 2: one WORD if 2 or 3 bytes remain
    rep stosw

    mov ecx, memorysize
    and ecx, 1              ; one last byte if the size is odd
    rep stosb

    ret

zeromemory endp
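The dword/word/byte split the routine above is aiming for can be checked in C. Dividing by 4 is a right shift (shr ecx, 2), the word count is the size masked with 3 and shifted right once, and the final byte is the lowest bit. The function below is a sketch of that same strategy; its name is hypothetical, and memcpy is used for the 4- and 2-byte stores so the model does not depend on alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C sketch of the dword/word/byte zero fill. */
static void zero_dword_word_byte(void *area, size_t size)
{
    unsigned char *p = area;
    size_t dwords = size >> 2;        /* like shr ecx, 2: whole DWORDs */
    size_t words  = (size & 3) >> 1;  /* 0 or 1 WORD left over */
    size_t bytes  = size & 1;         /* 0 or 1 final byte */
    const uint32_t zero32 = 0;
    const uint16_t zero16 = 0;

    /* The three pieces always add up to the full size. */
    assert(dwords * 4 + words * 2 + bytes == size);

    for (size_t i = 0; i < dwords; i++, p += 4)
        memcpy(p, &zero32, 4);        /* like stosd */
    if (words) {
        memcpy(p, &zero16, 2);        /* like stosw */
        p += 2;
    }
    if (bytes)
        *p = 0;                       /* like stosb */
}
```

For a 7-byte block this gives one DWORD, one WORD and one byte; a left shift by 2 would instead multiply the count by 4 and overrun the buffer.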

I think Microsoft does something similar. I haven't tested that code either, but it should work.



You should benchmark it against the collection of algos in the thread mentioned above; over about 700 bytes, REP STOSD leaves the rest behind.
I haven't really got time at the moment :( Does "leaves the rest behind" mean it's slower?


Perhaps if you read the thread you would understand what "leaves the rest behind" meant.
And you could have explained instead of saying "look at the thread"; it's a simple question.

As for the thread: different results, different PCs, and most likely different types of
memory tested (stack, fixed, aligned, not aligned). They are just results that mean nothing
without a common baseline to compare against, so they are not conclusive.


Everybody has a theory, feel free to put yours to the test. That is what objective testing is about. If you think you have a more accurate benchmarking method, feel free to demonstrate it.
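One way to address the "relative base" objection is to time every candidate against the same baseline, in the same run, on the same buffer. The harness below is a minimal sketch of that idea, assuming standard C clock() timing is good enough; time_fill, fill_fn and fill_memset are my own illustrative names, and the numbers it produces will still vary between machines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Any zero-fill routine under test has this shape. */
typedef void (*fill_fn)(void *buf, size_t len);

/* Hypothetical harness: run `fill` on the same buffer `reps` times and
   return the processor time used, in seconds. */
static double time_fill(fill_fn fill, void *buf, size_t len, int reps)
{
    assert(fill != NULL && reps > 0);
    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        fill(buf, len);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Baseline candidate: the C library's memset. */
static void fill_memset(void *buf, size_t len)
{
    memset(buf, 0, len);
}
```

Time the baseline and each candidate with the same buffer, length and repetition count, then compare ratios (candidate time divided by baseline time) rather than raw seconds.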
 :U Thanks to all for the information.



I didn't know I had a fan.


I did not notice this until now; I guess I went off doing something else.

Yes, you did... and you still do, because I still read old threads where you once posted. I haven't seen anything new from Qages lately, until now...

I never forget code from other coders that I have used, especially when the coder makes strong comments on why it works so well... Also, it was people like you who kept me interested in ASM in the first place, even though I was slow about it.

Anyway, this question applies to all functions like yours, but I'll use this example:

If I had only 40 bytes in a 256-byte buffer and I use SIZEOF, am I assured that it will clear only up to the first zero it encounters, or does it clean to the very end of the buffer?
That is 256 repeats instead of 41 repeats using this code. I think it would return after hitting the 41st byte, but I need to be 100% sure. This is why I'm asking.

TEMP_256 db 256 dup(?)
Only 40 bytes are used:

xor eax,eax
mov edi, offset TEMP_256
mov ecx, sizeof TEMP_256
rep stosb

And if I want it to clean the entire buffer no matter what, this would be one way to do it. With only 40 bytes inside the buffer, am I assured that it will clean to the bitter end of the buffer, stepping through all 256 bytes? Or will it REALLY still return after hitting the 41st byte of the buffer's contents, well before the expected 256?

TEMP_256 db 256 dup(?)
Only 40 bytes are used:

xor eax,eax
mov edi, offset TEMP_256
mov ecx, 256
rep stosb

Thanks in advance

I just need to know for sure, and to understand what SIZEOF actually does: is it the size of the bytes in use, or the size of the buffer?
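On the SIZEOF question: MASM's SIZEOF is evaluated at assembly time from the declaration (LENGTHOF times TYPE), so for TEMP_256 db 256 dup(?) it is 256 no matter how many bytes hold data at run time. C's sizeof on an array behaves the same way, and contrasting it with strlen, which does stop at the first zero, shows the difference. The buffer and helper below are only an illustrative example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static char temp_256[256];   /* like TEMP_256 db 256 dup(?) */

/* Hypothetical demo: put 40 bytes of data in the buffer, as in the
   question, and return how many bytes a zero-terminated scan sees. */
static size_t used_bytes_demo(void)
{
    memset(temp_256, 'x', 40);   /* 40 bytes of data */
    temp_256[40] = '\0';         /* terminator right after them */
    assert(sizeof temp_256 == 256);  /* declared size, fixed at compile time */
    return strlen(temp_256);     /* stops at the first zero byte */
}
```

So `mov ecx, sizeof TEMP_256` loads 256, and REP STOSB then stores all 256 bytes; it has no notion of "used" bytes.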


There are really lots of ways to do the job: with the cache (mov, movs*, movq, movdqa) or without it (movnti, movntps/movntpd, movntdq), depending on the task. Non-cached access to large amounts of memory is much faster (approx. 3x in my tests). One more piece of advice: ascending access (from lower addresses to higher ones) is faster.
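A sketch of the non-cached approach using SSE2 intrinsics: movntdq is exposed in C as _mm_stream_si128, which needs a 16-byte-aligned address, so the unaligned head and the tail are cleared with memset and the aligned middle with streaming stores, walking upwards through memory. _mm_sfence is issued afterwards so the weakly ordered streaming stores are ordered before any later stores. The function name is hypothetical, the code falls back to plain memset when SSE2 is not available, and I have not measured whether it reproduces the 3x figure above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2_STREAM 1
#endif

/* Hypothetical sketch: zero a block with non-temporal (cache-bypassing)
   stores, in ascending address order. */
static void zero_nontemporal(void *area, size_t size)
{
    if (size == 0)
        return;
    assert(area != NULL);
    unsigned char *p = area;
#ifdef HAVE_SSE2_STREAM
    /* Bytes needed to reach the next 16-byte boundary. */
    size_t head = (16 - ((uintptr_t)p & 15)) & 15;
    if (head > size)
        head = size;
    memset(p, 0, head);                        /* unaligned head, cached */
    p += head;
    size -= head;

    __m128i zero = _mm_setzero_si128();
    while (size >= 16) {
        _mm_stream_si128((__m128i *)p, zero);  /* movntdq */
        p += 16;
        size -= 16;
    }
    _mm_sfence();                              /* order the streaming stores */
#endif
    memset(p, 0, size);        /* tail (or the whole block without SSE2) */
}
```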


If you start your loop with the counter at its highest value and "dec" until you get to 0, you do not need the "cmp", because "dec" sets the zero flag and you can "jnz" back to your label without the additional instruction.
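The countdown idea translates directly. In this hypothetical C model, the loop condition tests the counter right after decrementing it, which is the job the zero flag set by dec does for jnz. The zero-length guard matters, because dec on ECX = 0 would wrap around to 0FFFFFFFFh instead of stopping. The asm in the comment is a sketch of an equivalent loop, not code from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C model of a dec/jnz countdown loop:
       mov  ecx, len
   @@: mov  BYTE PTR [edx+ecx-1], 0
       dec  ecx          ; sets ZF when ecx becomes 0
       jnz  @B           ; no cmp needed                      */
static void clear_countdown(unsigned char *buf, size_t len)
{
    size_t count = len;
    if (count == 0)
        return;              /* guard: otherwise the count would wrap */
    assert(buf != NULL);
    do {
        buf[count - 1] = 0;  /* mov BYTE PTR [edx+ecx-1], 0 */
        --count;             /* dec ecx */
    } while (count != 0);    /* jnz @B */
}
```

Note that this version clears from the top of the buffer downwards.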

If this is wrong, please let me know. I have a current project with a loop that some experts are helping me through right now. That project does not use this technique, because I can't seem to get much else working and I don't want to mess with this too. But I think that, for performance, this is correct.