Title: rep & ecx ?
Post by: Thomas_1110 on August 26, 2009, 04:12:43 AM

Hello
The MASM32 reference says that the count value for rep is stored in cx. Since cx is only 16 bits, I tried it with ecx, and it works fine beyond the 16-bit range (Win Vista).

	push ds
	pop es
	mov esi, MemCopie_			
	mov edi, edx						
	mov ecx, filesize 					
	rep movsb

My test was to read a file into memory, copy that memory to a second buffer, and then save the copy to another file. File size > 500 MB.
Has anyone else had experience with this?

Title: Re: rep & ecx ?
Post by: dedndave on August 26, 2009, 04:18:13 AM

in 32-bit world, it is ecx - must be an old masm manual
also, you don't have to mess with ds and es - they are all the same in a flat model program
however, in "mov esi, MemCopie_", is MemCopie_ a string or a pointer ?
esi wants to be the offset of the string

but !!!
all that isn't necessary if you are just trying to copy a file
no need to rep movsb anything
read it into a buffer
write it out from the same buffer
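
To put numbers on the cx versus ecx point above: a 16-bit counter can hold at most 65,535 repetitions, so a byte count for a file of roughly 500 MB only fits in the 32-bit ecx. The following is only an arithmetic sketch in Python, not MASM. `max_count` and `fits_in_counter` are hypothetical helper names, and reading "500 Mb" as 500 * 1024 * 1024 bytes is an assumption.

```python
# Arithmetic sketch only: how large a repeat count a 16-bit (cx) versus a
# 32-bit (ecx) counter can hold. The helper names are hypothetical.

def max_count(counter_bits):
    """Largest unsigned value a counter of the given width can hold."""
    return (1 << counter_bits) - 1

def fits_in_counter(byte_count, counter_bits):
    """True if byte_count can be stored in the counter without truncation."""
    return 0 <= byte_count <= max_count(counter_bits)

# Assumption: "500 Mb" in the original post means 500 * 1024 * 1024 bytes.
filesize = 500 * 1024 * 1024

print("cx  max:", max_count(16))
print("ecx max:", max_count(32))
print("fits in cx: ", fits_in_counter(filesize, 16))
print("fits in ecx:", fits_in_counter(filesize, 32))
```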

Title: Re: rep & ecx ?
Post by: hutch-- on August 26, 2009, 04:23:10 AM

Dave is right here: in the FLAT memory model you don't touch the segment registers at all. Have a look at the memcopy procedure in the MASM32 library. It uses REP MOVSD for most of the file and REP MOVSB for the balance, which is a lot faster than a byte copy. You can get faster versions again with MMX and, with a late enough processor, XMM instructions; it depends on what you need to do.
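
As a rough model of the "dwords first, bytes for the balance" idea, the Python sketch below splits a length into a dword count and a leftover byte count, mirroring the `shr ecx, 2` and `and ecx, 3` steps in the MemCopyD test procedure posted later in this thread. It is not the MASM32 library source. `split_count` and `copy_dwords_then_bytes` are hypothetical names used only for illustration.

```python
# Model of a copy that moves 4 bytes at a time and finishes with single bytes.
# Illustration only; the function names are hypothetical.

def split_count(length):
    """Return (dword_count, tail_bytes), like shr ecx, 2 and and ecx, 3."""
    return length >> 2, length & 3

def copy_dwords_then_bytes(src, dst, length):
    """Copy length bytes from src to dst: whole dwords first, then the tail."""
    dwords, tail = split_count(length)
    pos = 0
    for _ in range(dwords):          # stands in for rep movsd
        dst[pos:pos + 4] = src[pos:pos + 4]
        pos += 4
    for _ in range(tail):            # stands in for rep movsb on the balance
        dst[pos] = src[pos]
        pos += 1
    return dwords, tail

src = bytearray(range(11))           # 11 bytes: 2 dwords plus 3 tail bytes
dst = bytearray(len(src))
print(copy_dwords_then_bytes(src, dst, len(src)))
print(dst == src)
```

The tail loop never runs more than three times, so nearly all of the work is done in 4-byte steps.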

Title: Re: rep & ecx ?
Post by: Thomas_1110 on August 26, 2009, 04:32:44 AM

Quote: in 32-bit world, it is ecx - must be an old masm manual

Yes, I have 2 books, one from 2001 and the other from 2003.

Quote: all that isn't necessary if you are just trying to copy a file

I know, I can do it more easily with read_disk_file and write_disk_file.
It was just a test. I began MASM32 programming only 2 weeks ago, and I have so much to learn.
Thanks for the answer.

Title: Re: rep & ecx ?
Post by: japheth on August 26, 2009, 05:31:56 AM

Quote from: hutch-- on August 26, 2009, 04:23:10 AM
... it uses REP MOVSD for most of the file and uses REP MOVSB for the balance, it's a lot faster than byte copy.

IIRC the P4 and later CPUs have a "string byte move optimization" feature implemented, which eliminates the speed difference between MOVSD and MOVSB. The feature can be enabled or disabled by writing to a certain MSR; usually it's enabled.

Title: Re: rep & ecx ?
Post by: jj2007 on August 26, 2009, 12:37:04 PM

Quote from: japheth on August 26, 2009, 05:31:56 AM
IIRC the P4 and later CPUs have a "string byte move optimization" feature implemented, which eliminates the speed difference between MOVSD and MOVSB. The feature can be enabled or disabled by writing to a certain MSR; usually it's enabled.

Good to know, although it seems to kick in only at higher byte counts - results for a Prescott P4:

176240 cycles for rep movsd, ct=400000
167652 cycles for rep movsb, ct=400000

14311 cycles for rep movsd, ct=40000
14673 cycles for rep movsb, ct=40000

1267 cycles for rep movsd, ct=4000
1481 cycles for rep movsb, ct=4000

307 cycles for rep movsd, ct=400
487 cycles for rep movsb, ct=400

59 cycles for rep movsd, ct=40
275 cycles for rep movsb, ct=40
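
To make the trend in these numbers easier to see, the Python sketch below converts the posted cycle counts into cycles per byte and into a movsb/movsd ratio. It only does arithmetic on the figures above and takes ct as the number of bytes copied, as clarified later in the thread. `movsb_to_movsd_ratio` is a hypothetical helper name.

```python
# Arithmetic on the Prescott P4 timings posted above.
# Each tuple is (ct in bytes, cycles for rep movsd, cycles for rep movsb).
timings = [
    (400000, 176240, 167652),
    (40000, 14311, 14673),
    (4000, 1267, 1481),
    (400, 307, 487),
    (40, 59, 275),
]

def movsb_to_movsd_ratio(movsd_cycles, movsb_cycles):
    """Above 1.0 means rep movsb was slower than rep movsd for that count."""
    return movsb_cycles / movsd_cycles

ratios = []
for ct, movsd_cycles, movsb_cycles in timings:
    ratio = movsb_to_movsd_ratio(movsd_cycles, movsb_cycles)
    ratios.append(ratio)
    print(f"ct={ct:>6}: movsd {movsd_cycles / ct:.3f} cycles/byte, "
          f"movsb {movsb_cycles / ct:.3f} cycles/byte, ratio {ratio:.2f}")
```

The ratio is slightly below 1 at the largest count and grows steadily as the count shrinks, which matches the remark that the optimization seems to kick in only at higher byte counts.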

Title: Re: rep & ecx ?
Post by: dedndave on August 26, 2009, 12:49:50 PM

hiyas Jochen
for rep movsb, the count should be 4x that used for rep movsd
we want to compare moving the same amount of data

EDIT - i am trying to make sense of the numbers - lol
nothing works in my head - more coffee

Title: Re: rep & ecx ?
Post by: hutch-- on August 26, 2009, 02:44:28 PM

Here is a simple test piece that verifies Japheth's information.

I get these timings on my PIV.

2859 ms REP MOVSD
2797 ms REP MOVSB
2797 ms REP MOVSD
2812 ms REP MOVSB
2797 ms REP MOVSD
2797 ms REP MOVSB
2797 ms REP MOVSD
2781 ms REP MOVSB
Press any key to continue ...

Running this test piece.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
    include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

comment * -----------------------------------------------------
                        Build this  template with
                       "CONSOLE ASSEMBLE AND LINK"
        ----------------------------------------------------- *

    MemCopyD PROTO :DWORD,:DWORD,:DWORD
    MemCopyB PROTO :DWORD,:DWORD,:DWORD

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    LOCAL hMem1 :DWORD
    LOCAL hMem2 :DWORD

    meg64 equ <1024*1024*64>

    mov hMem1, alloc(meg64)
    mov hMem2, alloc(meg64)

    push ebx

  REPEAT 4

    invoke GetTickCount
    push eax

    mov ebx, 50

  @@:
    invoke MemCopyD,hMem1,hMem2,meg64
    sub ebx, 1
    jnz @B


    invoke GetTickCount
    pop ecx
    sub eax, ecx
    print str$(eax)," ms REP MOVSD",13,10


    invoke GetTickCount
    push eax

    mov ebx, 50

  @@:
    invoke MemCopyB,hMem1,hMem2,meg64
    sub ebx, 1
    jnz @B


    invoke GetTickCount
    pop ecx
    sub eax, ecx
    print str$(eax)," ms REP MOVSB",13,10

  ENDM

    pop ebx

    free hMem2
    free hMem1

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

align 4

MemCopyD proc public uses esi edi Source:DWORD,Dest:DWORD,ln:DWORD

    cld
    mov esi, [Source]
    mov edi, [Dest]
    mov ecx, [ln]

    shr ecx, 2
    rep movsd

    mov ecx, [ln]
    and ecx, 3
    rep movsb

    ret

MemCopyD endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

align 4

MemCopyB proc public uses esi edi Source:DWORD,Dest:DWORD,ln:DWORD

    cld
    mov esi, Source
    mov edi, Dest
    mov ecx, ln

    rep movsb

    ret

MemCopyB endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start

I would still exercise caution in using the BYTE copy, as the PIV's special-case circuitry is not universal across available hardware.
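
For a sense of scale, the timings above can be turned into an approximate copy throughput. Each figure covers 50 copies of the 64 MB buffer (meg64 = 1024*1024*64 in the test piece). The Python sketch below is arithmetic on the posted numbers only, and `throughput_mib_per_s` is a hypothetical helper name. GetTickCount typically has a resolution in the range of 10 to 16 ms, so differences of that size between runs are probably not meaningful.

```python
# Arithmetic on the posted GetTickCount timings: 50 copies of 64 MiB per run.
MEG64 = 1024 * 1024 * 64   # buffer size used by the test piece
COPIES = 50                # loop count (mov ebx, 50)

def throughput_mib_per_s(elapsed_ms, bytes_per_copy=MEG64, copies=COPIES):
    """Approximate copy throughput in MiB per second for one timed run."""
    total_mib = bytes_per_copy * copies / (1024 * 1024)
    return total_mib / (elapsed_ms / 1000.0)

# (label, elapsed milliseconds), in the order they were printed.
results = [
    ("REP MOVSD", 2859), ("REP MOVSB", 2797),
    ("REP MOVSD", 2797), ("REP MOVSB", 2812),
    ("REP MOVSD", 2797), ("REP MOVSB", 2797),
    ("REP MOVSD", 2797), ("REP MOVSB", 2781),
]

for label, ms in results:
    print(f"{label}: {throughput_mib_per_s(ms):.0f} MiB/s")
```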

Title: Re: rep & ecx ?
Post by: jj2007 on August 26, 2009, 04:00:27 PM

Quote from: dedndave on August 26, 2009, 12:49:50 PM
for rep movsb, the count should be 4x that used for rep movsd
we want to compare moving the same amount of data

That's what I did, otherwise timings would not be so close for the large counts. ct means bytes copied.

The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: Thomas_1110 on August 26, 2009, 04:12:43 AM