
Unicode string length

Started by jdoe, April 10, 2006, 01:32:54 AM


EduardoS

Very fast...
I tried to reduce the number of loads, but it didn't help...

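OPTION PROLOGUE:NONE          ; no stack frame: the argument is read via [esp+4]
OPTION EPILOGUE:NONE

Edu32W proc src:DWORD
    mov eax, [esp+4]          ; eax = pointer to the string
    @@:
    mov edx, [eax]            ; code units 0 and 1
    mov ecx, [eax+4]          ; code units 2 and 3 (may read past the terminator)
    test dx, dx
    lea eax, [eax+2]          ; LEA advances eax without touching the flags
    jz @F
    shr edx, 16               ; SHR sets ZF when the high unit is zero
    lea eax, [eax+2]
    jz @F
    test cx, cx
    lea eax, [eax+2]
    jz @F
    shr ecx, 16
    lea eax, [eax+2]
    jnz @B
    @@:
    sub eax, [esp+4]          ; eax = bytes up to and including the terminator
    shr eax, 1                ; bytes -> code units
    dec eax                   ; exclude the terminator
    ret 4
Edu32W endp

OPTION PROLOGUE:PROLOGUEDEF
OPTION EPILOGUE:EPILOGUEDEF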
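For readers more at home in C, here is a rough model of the Edu32W loop: two 32-bit loads per iteration, with each 16-bit half tested for zero. It is a sketch under stated assumptions, not EduardoS's code: UTF-16 code units are stored as uint16_t, the host is little-endian like x86, and the caller pads the buffer, because like the assembly it may read up to 3 code units past the terminator. The name utf16_len_2dword is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the Edu32W idea: four code units per iteration,
   loaded as two 32-bit values (stand-ins for EDX and ECX).
   Assumptions: little-endian host, and up to 3 code units past the
   terminator are readable (the buffer is padded). */
size_t utf16_len_2dword(const uint16_t *s)
{
    const unsigned char *p = (const unsigned char *)s;
    assert(s != NULL);
    for (size_t i = 0;; i += 4) {
        uint32_t lo, hi;
        memcpy(&lo, p + 2 * i, sizeof lo);      /* code units i, i+1   */
        memcpy(&hi, p + 2 * i + 4, sizeof hi);  /* code units i+2, i+3 */
        /* Little-endian: the low half holds the earlier code unit. */
        if ((lo & 0xFFFFu) == 0) return i;
        if ((lo >> 16) == 0)     return i + 1;
        if ((hi & 0xFFFFu) == 0) return i + 2;
        if ((hi >> 16) == 0)     return i + 3;
    }
}
```

With the thread's test string stored in a padded array (for example `uint16_t buf[32]`), the function returns 23, matching the other routines.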


On an Athlon 64

my other brother darryl
LENGTHOF : 24
SIZEOF : 48
ucLen return value : 23
StrLenW return value : 23
Lingo32W return value : 23
Edu32W return value : 23
crt_wcslen return value : 23
ucLen : 87 cycles
StrLenW : 29 cycles
Lingo32W : 68 cycles
Edu32W : 35 cycles
crt_wcslen : 92 cycles
Press any key to exit...

[attachment deleted by admin]

jdoe

Thanks for your comments, Michael, they help a lot. I'm going to read as much as I can on code alignment so I know exactly what I'm doing, because for now it is a bit of "unsure stuff" for me.


I'm at the office right now, working on a P4:

my other brother darryl
LENGTHOF : 24
SIZEOF : 48
ucLen return value : 23
StrLenW return value : 23
Lingo32W return value : 23
ucLen2 return value : 23
crt_wcslen return value : 23
Edu32W return value : 23
ucLen : 74 cycles
StrLenW : 75 cycles
Lingo32W : 124 cycles
ucLen2 : 69 cycles
crt_wcslen : 110 cycles
Edu32W : 61 cycles

Press any key to exit...


StrLenW seems to give better results on AMD processors (I will post the results when I'm at home).


Everything is together in the attachment, including EduardoS's new one  :U



[attachment deleted by admin]

EduardoS

The last version:
my other brother darryl
LENGTHOF : 24
SIZEOF : 48
ucLen return value : 23
StrLenW return value : 23
Lingo32W return value : 23
ucLen2 return value : 23
crt_wcslen return value : 23
Edu32W return value : 23
ucLen : 87 cycles
StrLenW : 26 cycles
Lingo32W : 63 cycles
ucLen2 : 58 cycles
crt_wcslen : 92 cycles
Edu32W : 35 cycles

Press any key to exit...

EduardoS

Hi all,
I tried two other algos:
One is my last one "joined" with jdoe's one.
The other uses SSE (3DNow+, so it runs on the first Athlon).

Can someone with a P4 test them?


my other brother darryl
LENGTHOF : 24
SIZEOF : 48
StrLenW return value : 23
Lingo32W return value : 23
ucLen2 return value : 23
Edu32W return value : 23
Edu32W2 return value : 23
EduSSE return value : 23

StrLenW : 26 cycles
Lingo32W : 63 cycles
ucLen2 : 58 cycles
Edu32W : 35 cycles
Edu32W2 : 24 cycles
EduSSE : 27 cycles

Press any key to exit...
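The SSE idea can be illustrated with a sketch that compares eight code units against zero at once and turns the result into a bit mask. This uses SSE2 intrinsics (`_mm_cmpeq_epi16`, `_mm_movemask_epi8`), a swapped-in technique rather than EduSSE's actual 3DNow+-era code, which is not shown in the thread. It also relies on the GCC/Clang builtin `__builtin_ctz` and falls back to a scalar loop when SSE2 is unavailable. Assumptions: uint16_t code units and a buffer padded so that up to 7 units past the terminator are readable. The name utf16_len_sse2 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical SIMD sketch: test 8 UTF-16 code units (16 bytes) per step.
   Assumes the buffer is padded so a 16-byte load never leaves it. */
size_t utf16_len_sse2(const uint16_t *s)
{
    assert(s != NULL);
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    for (size_t i = 0;; i += 8) {
        __m128i v = _mm_loadu_si128((const __m128i *)(s + i));
        /* Each zero code unit becomes 0xFFFF; movemask gives 2 bits per unit. */
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi16(v, zero));
        if (mask != 0)
            return i + (size_t)__builtin_ctz((unsigned)mask) / 2;
    }
#else
    /* Portable fallback for targets without SSE2: plain scalar scan. */
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
#endif
}
```

The lowest set bit of the mask identifies the first zero code unit, so no per-unit branch is needed inside the loop.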

[attachment deleted by admin]

hutch--

This is on a 2.8 gig Prescott PIV.


my other brother darryl
LENGTHOF : 24
SIZEOF : 48
StrLenW return value : 23
Lingo32W return value : 23
ucLen2 return value : 23
Edu32W return value : 23
Edu32W2 return value : 23
EduSSE return value : 23

StrLenW : 82 cycles
Lingo32W : 122 cycles
ucLen2 : 76 cycles
Edu32W : 72 cycles
Edu32W2 : 59 cycles
EduSSE : 18 cycles

Press any key to exit...


I do have a comment on the test sample, though. While it makes sense to test a short string, as it tells you the takeoff speed of each algo, it does not address the algo speed on much longer strings, where the stack frame size does not matter and you are more interested in its linear forward speed. A string over 64k would probably be a good idea as well, as it avoids the considerations that best suit a short string.
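A minimal C harness along these lines might look like the following. It builds a 100,000-unit string (about 200 KB, well over 64k), times a length function over several repetitions with `clock()`, and checks the result. This is not hutch's benchmark; the names naive_len, make_long_string and time_len are hypothetical, and `clock()` measures processor time, which is much coarser than the cycle counts used earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

typedef size_t (*utf16_len_fn)(const uint16_t *);

/* Reference implementation: one code unit per iteration. */
static size_t naive_len(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Allocates `units` non-zero code units followed by 8 zero units, so variants
   that over-read by up to 7 units stay inside the allocation. Caller frees. */
static uint16_t *make_long_string(size_t units)
{
    uint16_t *buf = calloc(units + 8, sizeof *buf);
    if (buf == NULL)
        return NULL;
    for (size_t i = 0; i < units; i++)
        buf[i] = (uint16_t)('a' + i % 26);
    return buf;
}

/* Calls fn `reps` times on s, stores the last result in *len and returns the
   elapsed processor time in seconds. */
static double time_len(utf16_len_fn fn, const uint16_t *s, int reps, size_t *len)
{
    volatile size_t result = 0;  /* volatile: keep the calls from being optimized away */
    assert(fn != NULL && reps > 0);
    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        result = fn(s);
    clock_t stop = clock();
    *len = result;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Because the string is long, per-call costs (stack entry and exit) become negligible and the measurement reflects the loop's forward speed.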
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

jdoe

Quote from: EduardoS on April 12, 2006, 12:58:56 AM
I tried two other algos,
One is my last one "joined" with jdoe's one,

In fact, StrLenW is just StrLen from the masm32 library, which I played with to fit Unicode.   :wink


On my AMD Athlon 1800+

my other brother darryl
LENGTHOF : 24
SIZEOF : 48
StrLenW return value : 23
Lingo32W return value : 23
ucLen2 return value : 23
Edu32W return value : 23
Edu32W2 return value : 23
EduSSE return value : 23

StrLenW : 35 cycles
Lingo32W : 77 cycles
ucLen2 : 72 cycles
Edu32W : 60 cycles
Edu32W2 : 29 cycles
EduSSE : 33 cycles

Press any key to exit...


:clap:


hutch--

Here is a benchmark using the windows.inc file.

These are the times I get on the PIV.

1047 ucLen
968 ucLen2
797 EduSSE
906 Edu32W2
1032 StrLenW
1453 Lingo32W
1032 ucLen
968 ucLen2
797 EduSSE
907 Edu32W2
1031 StrLenW
1453 Lingo32W
1047 ucLen
969 ucLen2
796 EduSSE
891 Edu32W2
1047 StrLenW
1453 Lingo32W
1047 ucLen
969 ucLen2
797 EduSSE
906 Edu32W2
1031 StrLenW
1453 Lingo32W
Press any key to continue ...

[attachment deleted by admin]

EduardoS

Thank you for testing  :U

Quote from: hutch-- on April 12, 2006, 01:30:46 AM
I do have a comment on the test sample, though. While it makes sense to test a short string, as it tells you the takeoff speed of each algo, it does not address the algo speed on much longer strings, where the stack frame size does not matter and you are more interested in its linear forward speed. A string over 64k would probably be a good idea as well, as it avoids the considerations that best suit a short string.

hutch, when I tried to test a strlen with big strings, a guy told me "no one wants the strlen of big strings"...
I'm happy to see someone who thinks differently...

hutch--

Yeah,

Me. It's not used all that often, but being able to test on big stuff has its place from time to time. It tells you whether an algo has more than startup speed: it shows the core speed without stuff like stack entry and exit.

Mark Jones

And what if you want to know the length of text in an Edit Control? (Assuming Unicode versions exist.) That could easily be >64kB.

Algo idea: perhaps read a dword with LODSD, then mask the two 16-bit "null" positions (AND with 0FFFFh and 0FFFF0000h) and break on either being zero?
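One way to test both halves of the dword with a single check is the well-known "has a zero lane" bit trick, applied here to 16-bit lanes. This is a swapped-in technique, not something proposed in the thread. The C sketch assumes a little-endian host (as on x86), uint16_t code units, and a buffer that may be read one code unit past the terminator; has_zero_u16 and utf16_len_swar are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Non-zero exactly when at least one 16-bit lane of x is zero. A lane above
   a genuinely zero lane may be flagged spuriously (borrow propagation), so
   the caller still checks the low half first. */
static int has_zero_u16(uint32_t x)
{
    return ((x - 0x00010001u) & ~x & 0x80008000u) != 0;
}

/* Reads two code units per iteration (like one LODSD) and uses one test to
   decide whether either is the terminator. Assumes a little-endian host and
   that one code unit past the terminator is readable. */
size_t utf16_len_swar(const uint16_t *s)
{
    const unsigned char *p = (const unsigned char *)s;
    assert(s != NULL);
    for (size_t i = 0;; i += 2) {
        uint32_t x;
        memcpy(&x, p + 2 * i, sizeof x);
        if (has_zero_u16(x))
            return (x & 0xFFFFu) == 0 ? i : i + 1;  /* low half = earlier unit */
    }
}
```

If the low half is non-zero, the subtraction cannot borrow into the high half, so a flagged high lane is genuinely zero.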
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

Tedd

Quote from: jdoe on April 10, 2006, 09:47:23 PM
That's why it is

cmp word ptr [eax], 0


Could you explain what's wrong, because I don't get it.

Sorry, my bad - I obviously can't read ::)
No snowflake in an avalanche feels responsible.

jdoe

Hi,

One question came to my mind about reading past the end of a buffer. Agner Fog's StrLen algo reads up to 3 characters past the end. Is there any danger if I read, for example, 7 characters past the end (for ASCII)? Doing so gives a good speed improvement.

This question came up while I was playing with StrLenW, which reads up to 3 (Unicode) characters past the end.

This is what I have for StrLenW...


.586

.MODEL FLAT, STDCALL

OPTION CASEMAP:NONE

.CODE

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

; 12 bytes of INT3 (0CCh) filler in total, placed before StrLenW to adjust code alignment
ALGN BYTE 0CCh
     BYTE 0CCh
     BYTE 0CCh
     BYTE 0CCh
     BYTE 0CCh
     BYTE 0CCh
     BYTE 0CCh
     BYTE 0CCh
     BYTE 0CCh
     BYTE 0CCh
     BYTE 0CCh
     BYTE 0CCh

StrLenW PROC p_lpszStr:DWORD

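   mov eax, dword ptr [esp+4]       ; eax = pointer to the string

@@:
   mov ecx, dword ptr [eax]         ; code units 0 and 1
   mov edx, dword ptr [eax+4]       ; code units 2 and 3 (may read up to 3 units past the end)
   add eax, 8                       ; eax now points past this 8-byte block
   test ecx, 0FFFFh                 ; unit 0 zero?
   jz @0
   test ecx, 0FFFF0000h             ; unit 1 zero?
   jz @2
   test edx, 0FFFFh                 ; unit 2 zero?
   jz @4
   test edx, 0FFFF0000h             ; unit 3 zero? if not, scan the next block
   jnz @B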

@6:
   sub eax, 2
   sub eax, dword ptr [esp+4]
   shr eax, 1
   ret 4

@4:
   sub eax, 4
   sub eax, dword ptr [esp+4]
   shr eax, 1
   ret 4

@2:
   sub eax, 6
   sub eax, dword ptr [esp+4]
   shr eax, 1
   ret 4

@0:
   sub eax, 8
   sub eax, dword ptr [esp+4]
   shr eax, 1
   ret 4

StrLenW ENDP

OPTION PROLOGUE:PROLOGUEDEF
OPTION EPILOGUE:EPILOGUEDEF

END



1) Do you see any problems with this StrLenW algo above (two dword reads, into ecx and edx)?

2) What is the danger of doing the same (two dword reads into ecx and edx) in StrLenA?
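One common way to make such over-reads safe is to advance one code unit at a time until the pointer is 8-byte aligned, and only then read in 8-byte blocks. Since a 4 KB page is a multiple of 8 bytes, an aligned 8-byte block never straddles a page boundary, so it cannot touch a page that the string does not already occupy. The C sketch below models that access pattern; it is not jdoe's code, it assumes the string is at least 2-byte aligned, and in C itself the test buffer must still be padded, because reading past an array is undefined behaviour at the language level. The name utf16_len_aligned is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of "align first, then read in blocks". In assembly the
   aligned 8-byte reads keep the over-read inside a page that already holds
   part of the string; in C the array must still contain the whole aligned
   block that holds the terminator. */
size_t utf16_len_aligned(const uint16_t *s)
{
    const uint16_t *p = s;
    assert(s != NULL);

    /* Head: one code unit at a time until p sits on an 8-byte boundary
       (at most 3 steps for a 2-byte-aligned string). */
    while (((uintptr_t)p & 7u) != 0) {
        if (*p == 0)
            return (size_t)(p - s);
        p++;
    }

    /* Body: aligned 8-byte blocks of four code units. A 4096-byte page is a
       multiple of 8, so such a block never straddles a page boundary. */
    for (;; p += 4) {
        uint16_t blk[4];
        memcpy(blk, p, sizeof blk);
        for (int k = 0; k < 4; k++)
            if (blk[k] == 0)
                return (size_t)(p - s) + (size_t)k;
    }
}
```

The same reasoning applies to an ASCII version with 8-byte reads: align the pointer to 8 first, and the 7-byte over-read stays on the terminator's page.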


Thanks

PBrennick

JDoe,
I do not think it will pose a problem as long as you know you are doing it and you ignore the extra bytes when processing the information. The thing you should not do is write past the end of the buffer. No problem there, though, right?

Paul
The GeneSys Project is available from:
The Repository or My crappy website

Relvinian

The only time reading past the end of a buffer (whether it holds a string or not) causes trouble is when you cross a page boundary and the next page isn't marked as accessible to your application. Then you will generate an exception, even if it is only one byte past. So as long as you are aware of this and keep the extra reads within an accessible page, you can read any length past a buffer. :P

Relvnian
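The page-boundary condition can be written as a small check. This sketch assumes 4 KB pages (the x86 default; on Windows the actual value is reported in `SYSTEM_INFO.dwPageSize` by `GetSystemInfo`). The names read_stays_in_page and PAGE_SIZE_ASSUMED are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size: 4 KB, the x86 default. */
#define PAGE_SIZE_ASSUMED 4096u

/* Returns 1 if the n-byte read starting at addr ends on the same page it
   starts on, so it can only fault if addr's own page is inaccessible. */
static int read_stays_in_page(uintptr_t addr, size_t n)
{
    assert(n >= 1 && n <= PAGE_SIZE_ASSUMED);
    return (addr & (PAGE_SIZE_ASSUMED - 1u)) <= PAGE_SIZE_ASSUMED - n;
}
```

An aligned 8-byte read always passes this check, because 4096 is a multiple of 8; an unaligned one starting in the last 7 bytes of a page does not.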

Seb

I added my two newbie attempts at calculating the length of a Unicode string. I didn't test them very much, but they seem to be working correctly. I also integrated my functions into hutch's windows.inc test. Results are shown below:


my other brother darryl
LENGTHOF : 24
SIZEOF : 48
StrLenW return value : 23
Lingo32W return value : 23
ucLen2 return value : 23
Edu32W return value : 23
Edu32W2 return value : 23
EduSSE return value : 23
SebW return value : 23
SebW2 return value : 23

StrLenW : 25 cycles
Lingo32W : 64 cycles
ucLen2 : 57 cycles
Edu32W : 36 cycles
Edu32W2 : 25 cycles
EduSSE : 28 cycles
SebW : 37 cycles
SebW2 : 29 cycles

Press any key to exit...



1672 ucLen
1453 ucLen2
985 EduSSE
1125 Edu32W2
1125 StrLenW
1656 Lingo32W
1078 SebW
1109 SebW2
1641 ucLen
1484 ucLen2
969 EduSSE
1125 Edu32W2
1110 StrLenW
1671 Lingo32W
1094 SebW
1110 SebW2
1656 ucLen
1469 ucLen2
968 EduSSE
1125 Edu32W2
1110 StrLenW
1672 Lingo32W
1093 SebW
1110 SebW2
1656 ucLen
1469 ucLen2
984 EduSSE
1125 Edu32W2
1109 StrLenW
1657 Lingo32W
1093 SebW
1110 SebW2
Press any key to continue ...



OPTION PROLOGUE:NONE          ; turn it off
OPTION EPILOGUE:NONE

SebW proc src:DWORD
mov eax,[esp+4]     ; eax = pointer to the string
xor ecx,ecx         ; ecx = character count

align 16
@@:
cmp word ptr [eax],0
jz @F
add ecx,1
cmp word ptr [eax+2],0
jz @F
add ecx,1
cmp word ptr [eax+4],0
jz @F
add ecx,1
cmp word ptr [eax+6],0
jz @F
add ecx,1
add eax,8
jmp @B
@@:
mov eax,ecx
ret 4
SebW endp

OPTION PROLOGUE:PROLOGUEDEF   ; turn back on the defaults
OPTION EPILOGUE:EPILOGUEDEF

OPTION PROLOGUE:NONE          ; turn it off
OPTION EPILOGUE:NONE

SebW2 proc src:DWORD
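mov eax,[esp+4]
mov ecx,eax         ; ecx = start of the string
sub eax,2           ; pre-bias so the first "add eax,2" lands on the first character
                    ; (otherwise the first character is never tested and an empty string is missed)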

align 16
@@:
add eax,2
cmp word ptr [eax],0
jz @F
add eax,2
cmp word ptr [eax],0
jz @F
add eax,2
cmp word ptr [eax],0
jz @F
add eax,2
cmp word ptr [eax],0
jnz @B
@@:
sub eax,ecx
shr eax,1
ret 4
SebW2 endp

OPTION PROLOGUE:PROLOGUEDEF   ; turn back on the defaults
OPTION EPILOGUE:EPILOGUEDEF
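SebW2's pointer-difference idea can be modelled in C as follows. This is a sketch, not Seb's code: it keeps the 4x unrolling but tests the first code unit before the pointer advances, so an empty string returns 0, and it never reads past the terminator. The name utf16_len_ptrdiff is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length = (pointer to terminator) - (start), unrolled four code units per
   iteration. The first unit is tested before the pointer moves, so an empty
   string yields 0, and no unit past the terminator is ever read. */
size_t utf16_len_ptrdiff(const uint16_t *s)
{
    const uint16_t *p = s;
    assert(s != NULL);
    for (;;) {
        if (p[0] == 0) break;
        if (p[1] == 0) { p += 1; break; }
        if (p[2] == 0) { p += 2; break; }
        if (p[3] == 0) { p += 3; break; }
        p += 4;
    }
    return (size_t)(p - s);  /* element count, i.e. the byte difference / 2 in SebW2 */
}
```

Compared with SebW's separate counter, the pointer difference saves one register update per character at the cost of a subtraction and shift at the end.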


Oh, by the way, I'm on an Athlon 64 X2 Dual.

[attachment deleted by admin]