CopyMem

Started by MikeT, March 21, 2008, 04:37:59 AM


MikeT

Assuming this function is still the preferred choice for copying memory:


' ###########################################################################
SUB MemCopy( BYVAL pDst AS DWORD, BYVAL pSrc AS DWORD, BYVAL CopyLen AS DWORD ) ' Argument order Matches MoveMemory()
             
'  For all the variations I have tried to get a faster
'  algo, this one still beats the rest. I have tried 6 register versions,
'  8 mmx register versions and this one still outclocks them. the speed
'  limit on memory copy is apparently imposed by the actual speed of
'  memory but the rep movsd pair is very well optimised and it has
'  slightly less overhead. - hutch@pbq.com.au

    #REGISTER NONE

      ! cld

      ! mov esi, pSrc
      ! mov edi, pDst
      ! mov ecx, CopyLen

      ! shr ecx, 2
      ! rep movsd

      ! mov ecx, CopyLen
      ! AND ecx, 3
      ! rep movsb

END SUB


I would like to incorporate it directly into a function that converts a LONG to a STRING, avoiding the call. With the call:


'¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤'
FUNCTION LONG2STR( BYVAL LongVar AS LONG ) AS STRING ' 2200 Clks
     
  #REGISTER NONE
   
  LOCAL i, zPos, Mult, Digit, NegFlag AS LONG
  LOCAL zLen AS DWORD
  LOCAL sLongNum AS STRING
  LOCAL a, d AS BYTE PTR       
  LOCAL zDigits AS ASCIIZ * 10 ' LONG values -2147483648 10 Digits
   
               
    IF LongVar < 0 THEN
      LongVar = -LongVar
      NegFlag = 1   

    ELSEIF LongVar = 0 THEN
      FUNCTION = "0" : EXIT FUNCTION

    END IF
             

    d = VARPTR(zDigits) 
    Mult = 1
    zPos = 9 ' zero based         
    DO
      Digit = LongVar\Mult
      IF Digit = 0 THEN EXIT LOOP   
      Digit = Digit MOD 10
      Mult = Mult*10
      @d[zPos] = Digit + 48   
      DECR zPos
    LOOP
                 
    zLen = 9-zPos ' variable to point to the start of the string
    sLongNum = SPACE$(zLen+NegFlag)

    a = STRPTR(sLongNum) ' Create the output string                             

    IF NegFlag THEN
      @a[0] = 45 ' Minus sign 
      MemCopy a+1, d+zPos+1, zLen+1 ' MoveMemory @a[1], @d[zPos+1], zLen+1
    ELSE
      MemCopy a,   d+zPos+1, zLen   ' MoveMemory @a[0], @d[zPos+1], zLen
    END IF ' PRINT #hDbg, "sLongNum >>>" + sLongNum + "<<<"
           

  FUNCTION = sLongNum


END FUNCTION 
               


and just having it inline:


'¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤'
FUNCTION LONG2STR( BYVAL LongVar AS LONG ) AS STRING ' 2200 Clks
     
  #REGISTER NONE
   
  LOCAL i, zPos, Mult, Digit, NegFlag AS LONG
  LOCAL zLen AS DWORD
  LOCAL sLongNum AS STRING
  LOCAL a, d AS BYTE PTR       
  LOCAL zDigits AS ASCIIZ * 10 ' LONG values -2147483648 10 Digits
   
               
    IF LongVar < 0 THEN
      LongVar = -LongVar
      NegFlag = 1   

    ELSEIF LongVar = 0 THEN
      FUNCTION = "0" : EXIT FUNCTION

    END IF
             

    d = VARPTR(zDigits) 
    Mult = 1
    zPos = 9 ' zero based         
    DO
      Digit = LongVar\Mult
      IF Digit = 0 THEN EXIT LOOP   
      Digit    = Digit MOD 10
      Mult     = Mult*10
      @d[zPos] = Digit + 48   
      DECR zPos
    LOOP
                 
    zLen = 9-zPos ' variable to point to the start of the string
    sLongNum = SPACE$(zLen+NegFlag)

    a = STRPTR(sLongNum) ' Create the output string                             

    IF NegFlag THEN
      @a[0] = 45 ' Minus sign 
      ' MemCopy a+1, d+zPos+1, zLen+1 ' MoveMemory @a[1], @d[zPos+1], zLen+1
      INCR a
      d = d+zPos+1
      INCR zLen
    ELSE
      ' MemCopy a,   d+zPos+1, zLen   ' MoveMemory @a[0], @d[zPos+1], zLen
      d = d+zPos+1
    END IF ' PRINT #hDbg, "sLongNum >>>" + sLongNum + "<<<"
       
     
    ! cld

    ! mov esi, a    ' Byte pointer
    ! mov edi, d    ' Byte pointer
    ! mov ecx, zLen ' DWORD

    ! shr ecx, 2
    ! rep movsd

    ! mov ecx, zLen ' DWORD
    ! AND ecx, 3
    ! rep movsb


  FUNCTION = sLongNum


END FUNCTION 

                 
The problem is that I am substituting a BYTE PTR for a DWORD function argument, and I don't know how to handle this in ASM.



MichaelW

I'm not sure I understand the problem. In a 32-bit Windows app a pointer is a 32-bit value, regardless of what it points to.

hutch--

Mike,

Are you trying to improve on the VAL / STR$ functions in PB? Looking at the type of code you have posted, the intrinsic functions would be far faster.

MikeT

>the intrinsic functions would be far faster.

Um, not sure about that. My experience with PB showed that its built-in functions were quite slow in specific cases:

PB's FORMAT() takes about 3000 clks
ASM code does the job on a full-sized LONG in about 40 clks
http://www.powerbasic.com/support/pbforums/showthread.php?t=14379&highlight=clks

PB's VAL()...
... it's now running in under 370, down from 5200 clks. That's 14 times faster.
This gets it down to 90 clks...
http://www.powerbasic.com/support/pbforums/showthread.php?t=14373&highlight=clks

The side benefit of a roll-your-own approach is that I can combine many operations in one pass through the bytes.
I can remove characters and double spaces, trim the string, look for a sequence of characters, etc.
If I used built-in functions, the time would be a couple of orders of magnitude slower.
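
As an illustration of the "many operations in one pass" idea, here is a minimal sketch in C (not PowerBASIC, so that it can be checked with any C compiler). The function name squeeze_spaces and its interface are hypothetical, not taken from the thread. It trims leading and trailing spaces and collapses double spaces while walking the bytes once:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the thread): copies src to dst in one pass,
   dropping leading and trailing spaces and collapsing each run of spaces
   to a single space. dst must have room for the length of src plus 1.
   Returns the length of the result. */
size_t squeeze_spaces(const char *src, char *dst)
{
    size_t n = 0;
    int pending_space = 0;      /* a space was seen since the last kept char */

    assert(src != NULL && dst != NULL);

    for (; *src != '\0'; ++src) {
        if (*src == ' ') {
            pending_space = 1;  /* defer: emit only if more text follows */
        } else {
            if (pending_space && n > 0)
                dst[n++] = ' '; /* one space between words, none at the start */
            pending_space = 0;
            dst[n++] = *src;
        }
    }
    dst[n] = '\0';              /* trailing spaces were never emitted */
    return n;
}
```

Deferring the space until the next non-space byte arrives is what lets the trim and the collapse share a single loop.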

These days, as I convert to FreeBASIC, I am writing code that uses pointers wherever possible, as I cannot yet write ASM.
The code is slightly slower, but I don't hit a brick wall when a problem arises.


In this case I do not know how to convert the ASM CopyMem routine so that it uses a pointer derived from a local variable.
I am not clear if the ASM line:

      ! mov esi, pSrc

requires pSrc to be declared as a DWORD or if BYTE PTR is OK. (I assumed BYTE PTR was OK as it is a 32-bit unsigned integer, yet the example function fails.)

I am also not clear whether this ASM instruction expects the value to come from the FUNCTION's argument list as opposed to a LOCAL variable. I expected there to be no difference, yet the FUNCTION fails.

Basically, my question is: how do I make this ASM code, which was designed to work inside a FUNCTION (or in this case a SUB), work inline?

PS
I'd be happy to discuss the example more if you want.

MichaelW

I don't know if this is a problem, but zDigits is too short to hold a maximum of 10 digits, plus a leading sign, plus a null terminator.
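
The byte count can be checked mechanically. The following is a small C sketch (C rather than PowerBASIC, so it can be compiled anywhere; the name long_text_bytes is hypothetical) that measures the text of the most negative 32-bit value and adds the terminator:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: bytes needed to hold the longest 32-bit signed value
   as decimal text, including the sign and the terminating null. */
int long_text_bytes(void)
{
    /* snprintf with a NULL buffer and size 0 only measures the output. */
    int chars = snprintf(NULL, 0, "%" PRId32, INT32_MIN);
    assert(chars > 0);
    return chars + 1;   /* +1 for the terminating null */
}
```

For -2147483648 that is 10 digits plus the sign plus the null, so a buffer holding the complete signed text needs 12 bytes.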

I think what the pointer is declared as does not matter. As long as the pointer contains the correct address and PB does not complain, it should work in the inline asm.

To replace the procedure call with inline asm you would need to place the inline asm at the location of the procedure call, and you would need to effectively use the procedure parameters as instruction operands. I have doubts that PB will accept expressions like a+1, d+zPos+1, or zLen+1 as instruction operands, so I think you will need to place the result of these expressions in a variable and use the variable as an operand.
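
To make the roles of the operands explicit, here is a minimal C sketch of the same copy scheme (C rather than PowerBASIC, so it can be compiled and checked; the name copy_dwords_then_bytes is hypothetical). With rep movsd and rep movsb, ESI holds the source address and EDI the destination. The inline attempt earlier in the thread loads a into ESI and d into EDI, which appears to be the reverse of the MemCopy a, d+zPos+1, ... call it replaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C rendering of the MemCopy idea: copy len bytes from src to
   dst as (len >> 2) 4-byte chunks followed by (len & 3) single bytes.
   The two regions must not overlap. */
void copy_dwords_then_bytes(void *dst, const void *src, size_t len)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    size_t dwords = len >> 2;   /* what "shr ecx, 2 / rep movsd" moves */
    size_t tail   = len & 3;    /* what "and ecx, 3 / rep movsb" moves */

    assert(dst != NULL && src != NULL);

    while (dwords--) {
        uint32_t chunk;
        memcpy(&chunk, s, sizeof chunk);  /* alignment-safe 4-byte load */
        memcpy(d, &chunk, sizeof chunk);  /* alignment-safe 4-byte store */
        s += 4;
        d += 4;
    }
    while (tail--)
        *d++ = *s++;            /* remaining 0 to 3 bytes */
}
```

As in the advice above, the addresses and the length arrive as plain variables, already computed, before the copy starts.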

MikeT

Michael,

Thank you again. Yes, I need to check that algo for large negative numbers.


#COMPILE EXE
#DIM ALL



'###########################################################################
SUB MemCopy( BYVAL pDst AS DWORD, BYVAL pSrc AS DWORD, BYVAL CopyLen AS DWORD )
             
'  For all the variations I have tried to get a faster
'  algo, this one still beats the rest. I have tried 6 register versions,
'  8 mmx register versions and this one still outclocks them. the speed
'  limit on memory copy is apparently imposed by the actual speed of
'  memory but the rep movsd pair is very well optimised and it has
'  slightly less overhead. - hutch@pbq.com.au

    #REGISTER NONE

      ! cld

      ! mov esi, pSrc
      ! mov edi, pDst
      ! mov ecx, CopyLen

      ! shr ecx, 2
      ! rep movsd

      ! mov ecx, CopyLen
      ! AND ecx, 3
      ! rep movsb

END SUB       

       
'¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
FUNCTION PBMAIN () AS LONG

  LOCAL pSrc, pDst, Ln AS DWORD
  LOCAL sSrc, sDst AS STRING   

   
#REGISTER NONE
         

    sSrc = "1234567890"
    sDst = NUL$(LEN(sSrc))

    CALL MemCopy( STRPTR(sDst), STRPTR(sSrc), LEN(sSrc) ) 


    MSGBOX sDst,64,"SUB"     
    '==================

     
    sSrc = "1234567890"
    Ln   = LEN(sSrc)
    sDst = NUL$(Ln)
    pSrc = STRPTR(sSrc)
    pDst = STRPTR(sDst)

    ! cld

    ! mov esi, pSrc
    ! mov edi, pDst
    ! mov ecx, Ln

    ! shr ecx, 2
    ! rep movsd

    ! mov ecx, Ln
    ! AND ecx, 3
    ! rep movsb
         

    MSGBOX sDst,64,"ASM"       
    '==================

END FUNCTION
'¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤


This works, so I think I just need to make the variables a and d DWORDs and I will be fine.
(No, PB does not accept an expression as an ASM operand.)


MikeT

This works for all LONG values.

'¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤'
FUNCTION LONG2STR( BYVAL LongVar AS LONG ) AS STRING ' 2467 Clks  -2147483648 to 2147483647
'                                            PB VAL()  3106 Clks
  #REGISTER NONE
   
  LOCAL i, Digit, zLen, Mult AS LONG 
  LOCAL a, d AS BYTE PTR 
  STATIC sLongNum AS STRING     
  STATIC zDigits AS ASCIIZ * 10 '
   


    IF LongVar = 0 THEN
      FUNCTION = "0"
      EXIT FUNCTION

    ELSEIF LongVar < 0 THEN ' -ve number 
      Digit  = LongVar MOD 10 '
      d      = VARPTR(zDigits)       
      @d[10] = 48 - Digit '
      Mult   = 10       
      FOR i = 9 TO 1 STEP -1
        Digit    = LongVar\Mult '
        IF Digit = 0 THEN EXIT FOR 
        Digit    = Digit MOD 10
        @d[i]    = 48 - Digit '
        Mult     = Mult*10   
      NEXT
      zLen     = 10-i
      sLongNum = NUL$(zLen+1)
      a        = STRPTR(sLongNum) ' Create the output string 
      @a       = 45 ' Minus sign 
      INCR a
      d        = d+i+1
      INCR zLen
       

    ELSE ' +ve number   
      Digit    = LongVar MOD 10 '
      d        = VARPTR(zDigits)       
      @d[10]   = 48 + Digit ' 
      Mult     = 10       
      FOR i = 9 TO 1 STEP -1
        Digit    = LongVar\Mult ' PRINT #hDbg, "i="+FORMAT$(i, "00")+",zDigits="+zDigits+", Digit="+STR$(Digit)
        IF Digit = 0 THEN EXIT FOR 
        Digit    = Digit MOD 10
        Mult     = Mult*10   
        @d[i]    = 48 + Digit '
      NEXT
      zLen     = 10-i
      sLongNum = NUL$(zLen)
      a        = STRPTR(sLongNum) ' Create the output string 
      d        = d+i+1 

    END IF

    ! cld         ' CopyMem()

    ! mov esi, d
    ! mov edi, a
    ! mov ecx, zLen

    ! shr ecx, 2
    ! rep movsd

    ! mov ecx, zLen
    ! AND ecx, 3
    ! rep movsb


  FUNCTION = sLongNum


END FUNCTION 
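
The negative branch above takes digits from the negative value directly (48 - Digit) instead of negating LongVar first, which matters because -2147483648 has no positive counterpart in a LONG. A minimal C sketch of that idea follows (C rather than PowerBASIC, so it can be compiled and checked; the name int32_to_text and the caller-supplied buffer are assumptions, not the thread's code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes the decimal text of value into out (at least
   12 bytes: 10 digits, sign, null) and returns its length. Digits are taken
   from a non-positive working value, so INT32_MIN never has to be negated. */
size_t int32_to_text(int32_t value, char *out)
{
    char tmp[10];               /* up to 10 digits, filled from the right */
    size_t pos = sizeof tmp;
    size_t len = 0;
    size_t digits;
    int32_t v = (value > 0) ? -value : value;   /* v <= 0; negating a positive is safe */

    assert(out != NULL);

    do {
        tmp[--pos] = (char)('0' - (v % 10));    /* v % 10 is in -9..0 in C99 and later */
        v /= 10;
    } while (v != 0);

    if (value < 0)
        out[len++] = '-';

    digits = sizeof tmp - pos;
    memcpy(out + len, tmp + pos, digits);       /* the final copy step */
    len += digits;
    out[len] = '\0';
    return len;
}
```

Folding positive values onto the negative side lets one loop serve both signs, at the cost of one extra negation for positive inputs.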
               


If there were a way to create a STRING AT a given memory location for the function return, then I could avoid the MemCopy() ASM at the end, but I don't know if that's possible. Ideally I would return a byte array as a dynamic string...

Paul's ASM code in the PB thread runs in 964 Clks.