Hello
The masm32 reference says that the count value for rep is stored in cx. Since cx is only 16 bits wide, I tried it with ecx, and it works fine beyond the 16-bit range (Win Vista).
push ds
pop es
mov esi, MemCopie_
mov edi, edx
mov ecx, filesize
rep movsb
My test was to read a file into memory, copy that memory to a second buffer, and then save the copy to another file. File size: > 500 MB.
Does anyone have other experience with this?
in 32-bit world, it is ecx - must be an old masm manual
also, you don't have to mess with ds and es - they are all the same in a flat model program
however, "mov esi, MemCopie_" is that a string ? or a pointer
it wants to be the offset of a string
but !!!
all that isn't necessary if you are just trying to copy a file
no need to rep movsb anything
read it into a buffer
write it out from the same buffer
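A minimal sketch of the "read it into a buffer, write it out from the same buffer" idea, using plain Win32 calls through masm32rt.inc. The file names and variable names (srcName, dstName, pBuf and so on) are made up for illustration, error handling is kept to the bare minimum, and GetFileSize with a NULL second argument only covers files below 4 GB.

```asm
include \masm32\include\masm32rt.inc

.data
    srcName db "input.bin", 0       ; hypothetical file names
    dstName db "output.bin", 0

.data?
    hFile     dd ?
    pBuf      dd ?                  ; starts as 0, stays 0 if alloc fails
    cbFile    dd ?
    cbRead    dd ?
    cbWritten dd ?

.code
start:
    ; open the source file and read all of it into one buffer
    invoke CreateFile, ADDR srcName, GENERIC_READ, FILE_SHARE_READ, NULL, \
           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL
    cmp eax, INVALID_HANDLE_VALUE
    je  done
    mov hFile, eax

    invoke GetFileSize, hFile, NULL
    mov cbFile, eax
    mov pBuf, alloc(cbFile)         ; alloc leaves the pointer in EAX
    .if eax != 0
        invoke ReadFile, hFile, pBuf, cbFile, ADDR cbRead, NULL
    .endif
    invoke CloseHandle, hFile
    cmp pBuf, 0
    je  done

    ; write the same buffer out again, no second copy in memory
    invoke CreateFile, ADDR dstName, GENERIC_WRITE, 0, NULL, \
           CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL
    .if eax != INVALID_HANDLE_VALUE
        mov hFile, eax
        invoke WriteFile, hFile, pBuf, cbRead, ADDR cbWritten, NULL
        invoke CloseHandle, hFile
    .endif
    free pBuf
done:
    exit
end start
```

Build it as a console program. The point is only that the buffer ReadFile fills can be handed straight to WriteFile, so no rep movsb is involved at all.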
Dave is right here: in the FLAT memory model you don't touch the segment registers at all. Have a look at the memcopy procedure in the MASM32 library: it uses REP MOVSD for most of the file and uses REP MOVSB for the balance, it's a lot faster than byte copy. You can get faster versions again with MMX and, with a late enough processor, XMM instructions; it depends on what you need to do.
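To illustrate the two points so far (no segment register setup in a flat model program, and a full 32-bit count in ecx), here is a small sketch that copies 1 MB with a single rep movsb and checks that the last byte arrived. The names cbBuf, pSrc and pDst are invented for the example, and the alloc results are not checked, to keep it short.

```asm
include \masm32\include\masm32rt.inc

cbBuf equ 100000h                   ; 1 MB, far beyond the 16-bit limit of 65535

.data?
    pSrc dd ?                       ; hypothetical buffer pointers
    pDst dd ?

.code
start:
    mov pSrc, alloc(cbBuf)
    mov pDst, alloc(cbBuf)

    mov eax, pSrc
    mov byte ptr [eax+cbBuf-1], 'Z' ; marker in the last source byte
    mov eax, pDst
    mov byte ptr [eax+cbBuf-1], 0   ; make sure the destination differs

    push esi
    push edi
    cld                             ; copy upwards
    mov esi, pSrc                   ; no push ds / pop es needed here
    mov edi, pDst
    mov ecx, cbBuf                  ; 32-bit byte count
    rep movsb
    pop edi
    pop esi

    mov eax, pDst
    .if byte ptr [eax+cbBuf-1] == 'Z'
        print "last byte copied",13,10
    .else
        print "copy stopped early",13,10
    .endif

    free pDst
    free pSrc
    exit
end start
```

If the count were truncated to 16 bits, the copy would stop long before the marker byte, so the second message would appear.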
Quote: in 32-bit world, it is ecx - must be an old masm manual
Yes, I have two books, one from 2001 and the other from 2003.
Quote: all that isn't necessary if you are just trying to copy a file
I know, I can do it more easily with read_disk_file and write_disk_file.
It was just a test. I began masm32 programming only two weeks ago, and I have so much to learn.
Thanks for the answers.
Quote from: hutch-- on August 26, 2009, 04:23:10 AM
... it uses REP MOVSD for most of the file and uses REP MOVSB for the balance, it's a lot faster than byte copy.
IIRC the P4 and later cpus have a "string byte move optimization" feature implemented, which eliminates the speed difference between MOVSD and MOVSB. The feature can be enabled / disabled by writing a certain MSR register, usually it's enabled.
Quote from: japheth on August 26, 2009, 05:31:56 AM
IIRC the P4 and later cpus have a "string byte move optimization" feature implemented, which eliminates the speed difference between MOVSD and MOVSB. The feature can be enabled / disabled by writing a certain MSR register, usually it's enabled.
Good to know, although it seems to kick in only at higher byte counts - results for a Prescott P4:
176240 cycles for rep movsd, ct=400000
167652 cycles for rep movsb, ct=400000
14311 cycles for rep movsd, ct=40000
14673 cycles for rep movsb, ct=40000
1267 cycles for rep movsd, ct=4000
1481 cycles for rep movsb, ct=4000
307 cycles for rep movsd, ct=400
487 cycles for rep movsb, ct=400
59 cycles for rep movsd, ct=40
275 cycles for rep movsb, ct=40
hiyas Jochen
for rep movsb, the count should be 4x that used for rep movsd
we want to compare moving the same amount of data
EDIT - i am trying to make sense of the numbers - lol
nothing works in my head - more coffee
Here is a simple test piece that verifies Japheth's information.
I get these timings on my PIV.
2859 ms REP MOVSD
2797 ms REP MOVSB
2797 ms REP MOVSD
2812 ms REP MOVSB
2797 ms REP MOVSD
2797 ms REP MOVSB
2797 ms REP MOVSD
2781 ms REP MOVSB
Press any key to continue ...
Running this test piece.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
comment * -----------------------------------------------------
Build this template with
"CONSOLE ASSEMBLE AND LINK"
----------------------------------------------------- *
MemCopyD PROTO :DWORD,:DWORD,:DWORD
MemCopyB PROTO :DWORD,:DWORD,:DWORD
.code
start:
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
call main
inkey
exit
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
main proc

    LOCAL hMem1 :DWORD
    LOCAL hMem2 :DWORD

    meg64 equ <1024*1024*64>            ; 64 MB per buffer

    mov hMem1, alloc(meg64)
    mov hMem2, alloc(meg64)

    push ebx                            ; EBX is the loop counter, preserve it

    REPEAT 4                            ; emit the timing pair four times

      invoke GetTickCount
      push eax                          ; save the start time
      mov ebx, 50                       ; 50 copies per timing
    @@:
      invoke MemCopyD,hMem1,hMem2,meg64
      sub ebx, 1
      jnz @B
      invoke GetTickCount
      pop ecx
      sub eax, ecx                      ; elapsed milliseconds
      print str$(eax)," ms REP MOVSD",13,10

      invoke GetTickCount
      push eax                          ; save the start time
      mov ebx, 50
    @@:
      invoke MemCopyB,hMem1,hMem2,meg64
      sub ebx, 1
      jnz @B
      invoke GetTickCount
      pop ecx
      sub eax, ecx                      ; elapsed milliseconds
      print str$(eax)," ms REP MOVSB",13,10

    ENDM

    pop ebx

    free hMem2
    free hMem1

    ret

main endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
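align 4

MemCopyD proc public uses esi edi Source:DWORD,Dest:DWORD,ln:DWORD

    cld                     ; copy upwards
    mov esi, [Source]
    mov edi, [Dest]
    mov ecx, [ln]

    shr ecx, 2              ; byte count / 4 = DWORD count
    rep movsd               ; copy the bulk as DWORDs

    mov ecx, [ln]
    and ecx, 3              ; 0 to 3 bytes left over
    rep movsb               ; copy the remainder as bytes

    ret

MemCopyD endp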
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
align 4

MemCopyB proc public uses esi edi Source:DWORD,Dest:DWORD,ln:DWORD

    cld                     ; copy upwards
    mov esi, Source
    mov edi, Dest
    mov ecx, ln             ; full byte count
    rep movsb               ; copy everything as bytes

    ret

MemCopyB endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end start
I would still exercise caution in using the BYTE copy, as the PIV's special-case circuitry is not universal across available hardware.
Quote from: dedndave on August 26, 2009, 12:49:50 PM
for rep movsb, the count should be 4x that used for rep movsd
we want to compare moving the same amount of data
That's what I did, otherwise timings would not be so close for the large counts.
ct means bytes copied.