zero fill mem

ragdog · February 19, 2007, 08:45:07 PM

hi

I have a question about zero-memory functions: which is better,

RtlZeroMemory or ZeroMem,

for filling a buffer with zeros?

Thanks in advance.

jag · February 19, 2007, 10:21:46 PM

There is a thread here, http://www.masm32.com/board/index.php?topic=6576.0
The thread discusses which one is the fastest.

I believe you would want to use RtlZeroMemory.

ic2 · February 19, 2007, 11:26:55 PM

8 times out of 10, jag is right...

I'm no expert, but where I can replace an API, I will. I once found RtlZeroMemory hard to replace, for whatever reason, and at the time only bitRAKE's code did the job when all the others I tried failed. I could have been doing something wrong, but whatever the case, the code below solved that problem. If this is what you mean by zeroing out the buffer, both will wipe *the entire buffer space*, whether or not there are chars in it.

I changed Qages' code a long time ago because I wanted EAX to always be free. I use it 95% of the time and have never had a problem.

Qages once said "nothing is faster than a JUMP"

Code:

; ############################

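Qages_Clear_Buff PROC uses ebx len:DWORD, scr:DWORD

    mov edx, scr               ; Qages cleanbuff: edx = buffer address (eax stays free)
    xor ebx, ebx               ; ebx = byte index, preserved by "uses ebx"
    cmp ebx, len               ; len = 0: nothing to clear
    je done
  @@:
    mov BYTE PTR [edx+ebx], 0
    inc ebx
    cmp ebx, len               ; stop after exactly len bytes (offsets 0..len-1)
    jne @B
  done:
    ret

Qages_Clear_Buff ENDP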

; ############################
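A byte-clearing loop of this shape must check the count before each store. If it stores first and compares afterwards, it clears offsets 0 through len, which is one byte too many. With len = 0 it keeps going until the counter wraps around. Below is a minimal C model of the bounded loop, assuming the intent is to clear exactly len bytes. The name `clear_buff` is hypothetical and simply mirrors the PROC's parameter order.

```c
#include <stddef.h>

/* Hypothetical C model of the byte-clearing loop: clears exactly len bytes
   starting at scr, and does nothing at all when len == 0. */
void clear_buff(size_t len, unsigned char *scr)
{
    size_t i = 0;          /* plays the role of EBX */
    while (i != len) {     /* compare before storing, so len == 0 is safe */
        scr[i] = 0;
        ++i;
    }
}
```

For example, `clear_buff(5, buf)` on an 8-byte buffer clears `buf[0]` through `buf[4]` and leaves the last three bytes alone.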

I once stumbled on something weird: code that acted like WriteProcessMemory without even calling the API, and it worked. I was playing with RtlZeroMemory and these other routines, and at the time ONLY bitRAKE's code did the job. I forget how I did it, and I lost track of what I was doing and why.

Anyway, I use this when I mean business, meaning RIGHT NOW with no tricks allowed, and it NEVER FAILS...

Code:

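; .......................   bitRAKE clean buffer
 xor eax,eax                ; al = 0, the byte value to store
 mov edi, offset hFile      ; edi = start of the buffer
 mov ecx, SIZEOF hFile      ; ecx = declared size of hFile in bytes
 rep stosb                  ; store al ecx times, edi moving upward (direction flag clear)
                            ; edi and ecx are changed: save edi first if the caller needs it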

hutch-- · February 19, 2007, 11:34:01 PM

The recent thread on zeroing memory made most of these questions clear. If it's a large block to zero, use STOSD; if it's under about 700 bytes, use something like memfill in the masm32 library. An MMX version will be slightly faster on Intel hardware but slightly slower on AMD.
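hutch--'s rule of thumb can be sketched as a size-based dispatch. This is a minimal C sketch that assumes the 700-byte threshold from the post. The names `zero_mem`, `zero_small` and `zero_large` are hypothetical. The byte loop is only a placeholder for a small-block routine such as memfill, whose implementation is not shown in this thread. An optimizing C compiler may turn either loop into a `memset` call, so the sketch shows the structure, not a benchmark.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Threshold taken from hutch--'s post; re-measure it on your own hardware. */
#define LARGE_BLOCK_BYTES 700u

/* Small blocks: plain byte loop (placeholder for a routine like memfill). */
static void zero_small(unsigned char *p, size_t n)
{
    while (n--)
        *p++ = 0;
}

/* Large blocks: 4-byte stores for the bulk, then the 0..3 leftover bytes,
   the same split that REP STOSD followed by REP STOSB performs. */
static void zero_large(unsigned char *p, size_t n)
{
    const uint32_t zero = 0;
    size_t dwords = n / 4;
    while (dwords--) {
        memcpy(p, &zero, 4);   /* memcpy avoids unaligned-access problems */
        p += 4;
    }
    zero_small(p, n % 4);
}

/* Hypothetical dispatcher: choose the strategy by block size. */
void zero_mem(void *dst, size_t n)
{
    if (n >= LARGE_BLOCK_BYTES)
        zero_large(dst, n);
    else
        zero_small(dst, n);
}
```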

evlncrn8 · February 20, 2007, 12:04:46 AM

Any memory filler is probably faster using stos*, like...

Code:

zeromemory proc uses eax ecx edi, memoryarea:dword, memorysize:dword

local bytesremaining:dword

xor eax,eax              ; fill value 0
mov edi, memoryarea
mov ecx, memorysize
mov bytesremaining, ecx

shr ecx, 2               ; divide by 4: number of whole DWORDs (shr, not shl)
jz stoswmode             ; fewer than 4 bytes in total

and bytesremaining, 3    ; 0..3 bytes left after the DWORD pass
rep stosd                ; rep, not repnz: stos does not test ZF

stoswmode:

mov ecx, bytesremaining
shr ecx, 1               ; divide by 2: 0 or 1 WORD
jz stosbmode

and bytesremaining, 1    ; 0 or 1 byte left after the WORD pass
rep stosw

stosbmode:

mov ecx, bytesremaining  ; 0 or 1
rep stosb                ; stores nothing when ecx = 0

ret

zeromemory endp

I think Microsoft does something similar. I haven't tested that code either, but it should work.
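The pass sizes come from shifting the byte count right. Shifting left (`shl ecx, 2`) multiplies by 4, so `rep stosd` would write sixteen times the requested size. Each pass also consumes its store count times the element size, not just the store count. Below is a small C check of that arithmetic. It uses hypothetical names (`store_plan`, `plan_zero_fill`, `plan_total`) and confirms that the DWORD, WORD and BYTE counts always add back up to the original size.

```c
#include <stddef.h>

/* Hypothetical model of the three passes: the ECX value for each rep stos*. */
typedef struct {
    size_t dwords;   /* rep stosd: n >> 2 */
    size_t words;    /* rep stosw: (n & 3) >> 1, i.e. 0 or 1 */
    size_t bytes;    /* rep stosb: n & 1, i.e. 0 or 1 */
} store_plan;

store_plan plan_zero_fill(size_t n)
{
    store_plan p;
    p.dwords = n >> 2;
    p.words  = (n & 3) >> 1;
    p.bytes  = n & 1;
    return p;
}

/* Bytes written by the plan; this should always equal n. */
size_t plan_total(store_plan p)
{
    return p.dwords * 4 + p.words * 2 + p.bytes;
}
```

For n = 7 the plan is one DWORD, one WORD and one BYTE: 4 + 2 + 1 = 7.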

hutch-- · February 20, 2007, 08:28:49 AM

evlncrn8,

You should benchmark it against the collection of algos in the thread mentioned above; over about 700 bytes, REP STOSD leaves the rest behind.

evlncrn8 · February 20, 2007, 10:22:05 AM

I haven't really got time at the moment :( Does 'leaves the rest behind' mean it's slower?

hutch-- · February 20, 2007, 12:38:14 PM

Perhaps if you read the thread you would understand what "leaves the rest behind" meant.

evlncrn8 · February 20, 2007, 01:00:25 PM

And you could have explained instead of saying 'look at the thread'; it's a simple question.

And as for the thread: different results, different PCs, and most likely different types of memory tested (stack, fixed, aligned, not aligned). It's all just results that mean nothing without a relative base to work from, i.e. not conclusive.

hutch-- · February 20, 2007, 01:10:54 PM

Everybody has a theory, feel free to put yours to the test. That is what objective testing is about. If you think you have a more accurate benchmarking method, feel free to demonstrate it.

ragdog · February 20, 2007, 03:26:57 PM

Thanks to all for the information.

greets
ragdog

Qages · March 03, 2007, 01:45:39 AM

Quote
I changed Qages' code a long time ago because I wanted EAX to always be free. I use it 95% of the time and have never had a problem.

Qages once said "nothing is faster than a JUMP"

I didn't know I had a fan.

ic2 · June 01, 2007, 03:52:50 AM

Quote
I didn't know I had a fan.

I did not notice this until now. I went on to doing something else, I guess.

Yes, you did... and still do, because I still read old threads where you once posted. I haven't seen anything new from Qages lately, until now...

I never forget code by other coders that I have used, especially when the coder makes strong comments on why it works so well... Also, it was people like you who kept me interested in ASM in the first place, even though I was slow about it.

Anyway, this question applies to all functions like yours, but I'll use this example:

If I have only 40 bytes in a 256-byte buffer and I use SIZEOF, am I assured that it will clear only up to the first zero it encounters, or will it clean to the very end of the buffer, making 256 repeats instead of 41 with this code? I think it would return after hitting the 41st byte, but I need to be 100% sure. This is why I'm asking.

Code:

TEMP_256 db 256 dup(?)
; only 40 bytes are used

xor eax,eax
mov edi, offset TEMP_256
mov ecx, sizeof TEMP_256
rep stosb

And if I want it to clean the entire buffer no matter what, this would be one way to do it... With only 40 bytes in the buffer, am I assured that it will clean to the bitter end of the buffer, stepping through all 256 bytes? Or will it REALLY still return after hitting the 41st byte of the buffer contents, well before the expected 256?

Code:

TEMP_256 db 256 dup(?)
; only 40 bytes are used

xor eax,eax
mov edi, offset TEMP_256
mov ecx, 256
rep stosb

Thanks in advance

I just need to know for sure... and to understand what SIZEOF actually does: sizeof BYTES or sizeof BUFFER?
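REP STOSB never looks at the bytes it overwrites. It stores AL to [EDI] exactly ECX times, moving upward when the direction flag is clear, so a zero already in the buffer does not stop it. SIZEOF is resolved at assembly time to the number of bytes the declaration reserves. For `TEMP_256 db 256 dup(?)` that is 256, not the number of bytes currently "in use", so both fragments above clear all 256 bytes. The C analogue below shows the same thing: `sizeof` gives the declared size, and `memset` stands in for REP STOSB. The buffer and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Model of TEMP_256 db 256 dup(?): 256 bytes reserved, whatever is stored in it. */
unsigned char temp_256[256];

/* 40 bytes "in use", then a zero terminator, then old junk further on. */
void fill_example(void)
{
    memset(temp_256, 'x', sizeof temp_256);  /* old junk everywhere */
    memset(temp_256, 'A', 40);               /* the 40 bytes in use */
    temp_256[40] = 0;                        /* the first zero byte */
}

/* Like xor eax,eax / mov ecx, sizeof TEMP_256 / rep stosb:
   the count is the declared size and the contents are never examined. */
void clear_temp_256(void)
{
    memset(temp_256, 0, sizeof temp_256);
}
```

After `fill_example()` the bytes past offset 40 are still `'x'`. After `clear_temp_256()` every one of the 256 bytes is zero.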

asmfan · June 01, 2007, 12:50:40 PM

There are really lots of ways to do the job: using the cache (mov, movs*, movq, movdqa) or not (movnti, movntps[d], movntdq), depending on the task at hand. Non-cached access on large amounts of memory is much faster (approx. 3x in my tests). One piece of advice: ascending access (from lower addresses to higher ones) is faster.
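Below is a sketch of the non-cached variant asmfan describes, written in C with SSE2 intrinsics (`_mm_stream_si128` compiles to MOVNTDQ). The function name `zero_nontemporal` and the head/body/tail split are assumptions, not asmfan's code. The ~3x figure is asmfan's own measurement and is not something this code guarantees. Memory is written in ascending order throughout. Without SSE2, the sketch falls back to an ordinary `memset`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_setzero_si128, _mm_stream_si128, _mm_sfence */
#endif

/* Zero n bytes at dst: cache-bypassing 16-byte stores for the aligned middle,
   plain stores for the unaligned edges. */
void zero_nontemporal(void *dst, size_t n)
{
#if defined(__SSE2__)
    unsigned char *p = dst;

    /* Head: byte stores until p is 16-byte aligned (MOVNTDQ requires it). */
    size_t head = (16 - ((uintptr_t)p & 15)) & 15;
    if (head > n)
        head = n;
    memset(p, 0, head);
    p += head;
    n -= head;

    /* Body: ascending 16-byte streaming stores. */
    __m128i zero = _mm_setzero_si128();
    while (n >= 16) {
        _mm_stream_si128((__m128i *)p, zero);
        p += 16;
        n -= 16;
    }
    _mm_sfence();  /* order the weakly-ordered streaming stores */

    /* Tail: the remaining 0..15 bytes. */
    memset(p, 0, n);
#else
    memset(dst, 0, n);  /* no SSE2: ordinary cached fill */
#endif
}
```

Streaming stores bypass the cache, so they suit buffers too large to stay cached that will not be read again soon. For small buffers the ordinary cached fill is likely the better choice.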

thomas_remkus · June 01, 2007, 12:52:53 PM

If you start your loop with the counter at the highest value and "dec" until you get to 0, you do not need the "cmp", because "dec" sets the zero flag and you can "jnz" back to your label without the additional instruction.

If this is wrong, please let me know. I have a current project with a loop that some experts are helping me through right now. That project does not use this technique, because I can't seem to get so much else working that I don't want to mess with this too. But I think that for performance this is correct.
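DEC does set ZF, so counting a register down to zero removes the CMP. The catch is that indexing memory with that same counter walks downward, which runs against asmfan's advice about ascending access. A common variant counts a negative index up to zero instead, since INC sets ZF the same way. The C sketch below shows that loop shape. The function name is hypothetical, and whether a compiler actually emits INC/JNZ for it is up to the compiler.

```c
#include <stddef.h>

/* Clears n bytes at p (n >= 0). The index runs from -n up to 0, so memory is
   written in ascending order, and the loop ends when the incremented index
   reaches zero. In assembly: neg ecx / @@: mov byte ptr [edi+ecx],0 / inc ecx / jnz @B,
   with edi pointing one past the end of the buffer. */
void zero_count_to_zero(unsigned char *p, ptrdiff_t n)
{
    unsigned char *end = p + n;          /* one past the last byte */
    for (ptrdiff_t i = -n; i != 0; ++i)
        end[i] = 0;                      /* end[-n] is p[0], end[-1] is p[n-1] */
}
```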
