Title: lstrcpy vs szCopy
Post by: jj2007 on February 07, 2009, 11:02:33 PM

Well, if the subject sounds familiar: I am reviving an old thread (http://www.masm32.com/board/index.php?topic=1589.msg12747#msg12747).

The algos there are fast, but they use MMX and therefore trash the FPU (I like the FPU). So I thought of adapting one of them, an algo by Lingo, to produce an XMM version. And to make it more realistic, I introduced spoilers:

Code Select

.data
align 16

spoil1	db 1, 2, 3		; badly aligned source

String1     DB  "Sample String 01234 56789 ABCDEF AaBbCcDdEeFfGgHhIiJjKkLlMMNnOoP",\

On my Core 2 Celeron M, differences in timings are there but not dramatic. What was more dramatic was the silent bye-bye when I removed the spoilers...

With the spoilers, the XMM version works just fine and leaves the FPU in peace. When I remove them, then both Lingo's and my adapted algo crash miserably with exception #5 at movq xmm0, qword ptr [ecx+eax]

Anybody interested to have a look into this? I also suspect that my version could be a lot improved...

[attachment deleted by admin]

Title: Re: lstrcpy vs szCopy
Post by: sinsi on February 07, 2009, 11:54:20 PM

It seems to overwrite the counter used by counter_end - this is always 39383736h, so the counter never gets to 0. I don't get any sort of exception.
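
The constant itself is a hint. Read as little-endian bytes, 39383736h is the ASCII text "6789", which appears in String1, so it looks like string data landed on top of the counter. A minimal sketch in C (the helper name dword_to_text is made up for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write the four bytes of a dword, lowest byte
   first (the order x86 stores them in memory), into out[] as a C string. */
static void dword_to_text(uint32_t value, char out[5])
{
    assert(out != NULL);
    for (int i = 0; i < 4; ++i)
        out[i] = (char)((value >> (8 * i)) & 0xFFu);
    out[4] = '\0';
}
```

Feeding it 0x39383736 gives the four characters 6, 7, 8, 9 in that order.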

Title: Re: lstrcpy vs szCopy
Post by: jj2007 on February 08, 2009, 12:02:56 AM

Thanks Sinsi - that makes sense. In the meantime, I made up another xmm version:

Code Select

comment * based on MMX  Fast by Mark Larson *
align 16	; seems to have little influence, the nop makes it two cycles faster ;-)
nop
szCopyXMM proc dest:DWORD, src:DWORD
   mov eax,[esp+8]
   mov esi,[esp+4]
align 16
qword_copy1b:
   pxor xmm1, xmm1
   movups xmm0, oword ptr [eax]
   pcmpeqb xmm1, xmm0
   add eax, 8+8
   pmovmskb ecx, xmm1
   or ecx,ecx 
   jnz finish_rest1
   movups oword ptr [esi], xmm0
   add esi, 8+8
   jmp qword_copy1b
finish_rest1:
ret 8
szCopyXMM endp

512-byte string copy timing results:

Code Select

 
szCopyXMM -> jj   ->               xmm: 278 clocks
szCopyMMX   -> Mark Larson   ->    MMX: 312 clocks
SzCpy11  -- > Lingo ->  MMX   ->  Fast: 284 clocks
szCopyMMX1-> Mark Larson -> MMX-> Fast: 309 clocks
                                szCopy  1076 clocks
                                lstrcpy 1202 clocks
SzCpy10        - > Lingo ->        MMX: 283 clocks
MbCopy     -> jj   ->              xmm: 384 clocks

Now one problem is that, aligned or not, these algos work in chunks of 128 bits (16 bytes). So there are problems with small strings...

Title: Re: lstrcpy vs szCopy
Post by: qWord on February 08, 2009, 12:24:38 AM

Hi,

After a quick look, I think the problem is caused by "test bl,bl" (in your routine and in Lingo's). There are 4 packed bytes after "packsswb", so you have to test all 4 bytes with "test ebx,ebx".

regards, qWord

Title: Re: lstrcpy vs szCopy
Post by: jj2007 on February 08, 2009, 12:44:06 AM

Thanks, qword - I am afraid it keeps choking. But the other one seems to work just fine, also for small strings and bad alignment. However, it needs a zero delimiter at the end - see mov byte ptr [esi], 0 below

Code Select

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
comment * based on MMX  Fast by Mark Larson *
align 16	; seems to have little influence, the nop makes it two cycles faster ;-)
nop
szCopyXMM proc dest:DWORD, src:DWORD
   mov eax,[esp+8]
   mov esi,[esp+4]
align 16
qword_copy1b:
   pxor xmm1, xmm1
   movups xmm0, oword ptr [eax] 
   pcmpeqb xmm1, xmm0
   add eax, 8+8
   pmovmskb ecx, xmm1
   or ecx,ecx 
   jnz finish_rest1
   movups oword ptr [esi], xmm0
   add esi, 8+8
   jmp qword_copy1b
finish_rest1:
  mov byte ptr [esi], 0
ret 8
szCopyXMM endp

Title: Re: lstrcpy vs szCopy
Post by: jj2007 on February 08, 2009, 01:55:29 AM

OK, I made a bit of cleanup and am satisfied with this version:

Code Select

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
comment * inspired by "MMX  Fast" by Mark Larson *
; align 16	; seems to have NO influence
szCopyXMM proc dest:DWORD, src:DWORD
  push esi
  push edi
  mov edi, [esp+4+8]
  mov esi, [esp+8+8]
  push ecx	; preserve another valuable register
@@:
   pxor xmm1, xmm1
   movups xmm0, oword ptr [esi] 
   pcmpeqb xmm1, xmm0
   pmovmskb ecx, xmm1	; Move Byte Mask To Integer - a fantastic instruction!
   test ecx,ecx 
   jnz @F
   movups oword ptr [edi], xmm0
   add esi, 16
   add edi, 16
   jmp @B
@@:
  .Repeat
	lodsb	; relatively slow
	stosb	; tail cleanup
  .Until al==0
  mov eax, edi		; a stringcat routine might need this one
  pop ecx		; restore ecx
  pop edi
  pop esi
ret 8		; cleanup
szCopyXMM endp

Testing the 16-byte boundary looks fine:

Code Select

Source=B23456789012345
  Dest=B23456789012345
Source=C234567890123456
  Dest=C234567890123456
Source=D2345678901234567
  Dest=D2345678901234567
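
For readers who do not speak SSE2, the logic of the routine above can be modelled in plain C. This is only a sketch under assumptions: zero_byte_mask16 and sz_copy_model are made-up names, the mask helper stands in for pcmpeqb plus pmovmskb, and memcpy stands in for the movups store. Like the assembly, it reads whole 16-byte blocks, so the source buffer must be readable up to the end of the block that holds the terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model of pcmpeqb against zero followed by pmovmskb:
   bit i of the result is set when block[i] == 0. */
static unsigned zero_byte_mask16(const unsigned char *block)
{
    unsigned mask = 0;
    for (int i = 0; i < 16; ++i)
        if (block[i] == 0)
            mask |= 1u << i;
    return mask;
}

/* Sketch of the copy: full 16-byte blocks while no zero byte is seen,
   then a byte-by-byte tail. Returns a pointer just past the copied
   terminator, which is what the assembly leaves in eax. */
static char *sz_copy_model(char *dest, const char *src)
{
    assert(dest != NULL && src != NULL);
    while (zero_byte_mask16((const unsigned char *)src) == 0) {
        memcpy(dest, src, 16);      /* stands in for the movups store */
        src += 16;
        dest += 16;
    }
    char c;
    do {                            /* lodsb/stosb tail, zero included */
        c = *src++;
        *dest++ = c;
    } while (c != '\0');
    return dest;
}
```

With the 15, 16 and 17 character test strings shown above, the block loop runs zero, one and one times respectively, and the tail handles the rest.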

512-byte string copy timing results (aligned):

Code Select

 len of source string = 512
 len of szCopyXMM: 55
szCopyXMM -> jj   ->               xmm: 298 clocks
szCopyMMX   -> Mark Larson   ->    MMX: 312 clocks
SzCpy11  -- > Lingo ->  MMX   ->  Fast: 285 clocks
szCopyMMX1-> Mark Larson -> MMX-> Fast: 308 clocks
                                szCopy  1053 clocks
                                lstrcpy 1184 clocks
SzCpy10        - > Lingo ->        MMX: 283 clocks
MbCopy     -> jj   ->              xmm: 380 clocks

Three times as fast as szCopy, 55 bytes short, and does not trash the FPU. The only caveat is that your puter should be less than seven years old :green

[attachment deleted by admin]

Title: Re: lstrcpy vs szCopy
Post by: NightWare on February 08, 2009, 03:29:17 AM

jj, glad to see you play with simd stuff :bg

but,
1. can you explain to me why pxor xmm1,xmm1 is IN the loop ?
2. for unaligned data, look at lddqu instruction

Title: Re: lstrcpy vs szCopy
Post by: askm on February 08, 2009, 05:09:49 AM

Who can explain these results ?

512-byte string copy timing results:

len of source string = 512
len of szCopyXMM: 52
szCopyXMM -> jj ->         xmm: 2085 clocks
szCopyMMX -> Mark Larson -> MMX: 323 clocks
SzCpy11 -- > Lingo -> MMX -> Fast: 285 clocks
szCopyMMX1-> Mark Larson -> MMX-> Fast: 324 clocks
            szCopy   1556 clocks
            lstrcpy   1573 clocks
SzCpy10 - > Lingo -> MMX: 285 clocks
MbCopy -> jj ->         xmm: 284 clocks

Title: Re: lstrcpy vs szCopy
Post by: MichaelW on February 08, 2009, 06:52:19 AM

I think the problem might be the processor it's running on. This is what I get on my P3:

Code Select


 len of source string = 512
 len of szCopyXMM: 52
szCopyXMM -> jj   ->               xmm: 2090 clocks
szCopyMMX   -> Mark Larson   ->    MMX: 319 clocks
SzCpy11  -- > Lingo ->  MMX   ->  Fast: 281 clocks
szCopyMMX1-> Mark Larson -> MMX-> Fast: 308 clocks
                                szCopy  2078 clocks
                                lstrcpy 2384 clocks
SzCpy10        - > Lingo ->        MMX: 285 clocks
MbCopy     -> jj   ->              xmm: 282 clocks

Or not. If I comment out the tail cleanup code then szCopyXMM runs in 11 cycles and the procedure fails the function tests, implying that most or all of the work is being done by the tail cleanup code.

After more tests I think the problem is my processor. On a P3 I think pmovmskb and pcmpeqb are limited to the MMX registers. I don't see any errors when I assemble, but on the first iteration of the loop ECX is always 0FFh, when it should be 0 up to the last loop.

Or not exactly. Assembling the code with ML 6.14, 6.15, and 7.00 I get:

Code Select


004019AF 660FEFC9               pxor    mm1,mm1
004019B3 0F1006                 movups  xmm0,[esi]
004019B6 660F74C8               pcmpeqb mm1,mm0
004019BA 660FD7C9               pmovmskb cx,mm1

And with 6.15 and 7.00 the code generates an illegal instruction exception somewhere further down (in MbCopy). So there is a problem with the version of ML, but if that were fixed then there would be a problem with the processor not supporting some of the instructions.

Title: Re: lstrcpy vs szCopy
Post by: jj2007 on February 08, 2009, 09:02:40 AM

Quote from: NightWare on February 08, 2009, 03:29:17 AM
jj, glad to see you play with simd stuff :bg

but,
1. can you explain to me why pxor xmm1,xmm1 is IN the loop ?

Because I shamelessly copied that from Mark's code :bg

Quote
2. for unaligned data, look at lddqu instruction

Yields the same timings, is 2 bytes longer (55->57 bytes), and decreases the maximum age of your puter.
movdqu and movups produce exactly the same timings. I chose movups below (2 bytes shorter than movdqu), but maybe there are differences by processor type. Anyway, thanks a lot for the hint to lddqu, it made me find movups/movdqu, which both improve drastically the timings for the non-aligned strings:

512-byte string copy timing results:

Code Select

 len of source string = 512
 alignment: offset src=4202611, dest=4203173
 len of szCopyXMM: 55

szCopyXMM -> jj   ->               xmm: 484 clocks
szCopyMMX   -> Mark Larson   ->    MMX: 474 clocks
SzCpy11  -- > Lingo ->  MMX   ->  Fast: 474 clocks
szCopyMMX1-> Mark Larson -> MMX-> Fast: 439 clocks
                                szCopy  1053 clocks
                                lstrcpy 1214 clocks
SzCpy10        - > Lingo ->        MMX: 476 clocks
MbCopy     -> jj   ->              xmm: 560 clocks

There are some that are a few clocks faster, but remember they trash the FPU.

Code Select

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
comment * inspired by "MMX  Fast" by Mark Larson *
; align 16	; seems to have NO influence
szCopyXMM proc dest:DWORD, src:DWORD
  push esi
  push edi
  mov edi, [esp+4+8]
  mov esi, [esp+8+8]
  push ecx	; preserve another valuable register
  pxor xmm1, xmm1
@@:
  movups xmm0, [esi] 
  pcmpeqb xmm1, xmm0
  pmovmskb ecx, xmm1	; Move Byte Mask To Integer - a fantastic instruction!
  test ecx,ecx 
  jnz @F
  movups [edi], xmm0
  add esi, 16
  add edi, 16
  jmp @B
@@:
  .Repeat
	lodsb	; relatively slow
	stosb	; tail cleanup
  .Until al==0
  mov eax, edi		; a stringcat routine might need this one
  pop ecx		; restore ecx
  pop edi
  pop esi
ret 8		; cleanup
szCopyXMM endp

Finally, as to the "strange" timings: Try to assemble the code with ML 9.0 or with JWasm.

EDIT: Here are the tiny differences between the codes generated by masm 6.14 and the others.
You might google for "size override" optimization 66h (http://www.google.it/search?num=50&hl=en&newwindow=1&safe=off&q=%22size+override%22+optimization+66h&btnG=Search)

ml v614
004019C0  0FEFC9     pxor mm1, mm1
004019C3  0F1006     movups xmm0, dqword ptr [esi]
004019C6  0F74C8     pcmpeqb mm1, mm0
004019C9  0FD7C9     pmovmskb ecx, mm1
004019CC  85C9       test ecx, ecx
004019CE  75 0B      jne short SzCpy.004019DB
004019D0  0F1107     movups dqword ptr [edi], xmm0

ml v9
004019C8  660FEFC9   pxor xmm1, xmm1
004019CC  0F1006     movups xmm0, dqword ptr [esi]
004019CF  660F74C8   pcmpeqb xmm1, xmm0
004019D3  660FD7C9   pmovmskb ecx, xmm1
004019D7  85C9       test ecx, ecx
004019D9  75 0B      jne short SzCpy.004019E6
004019DB  0F1107     movups dqword ptr [edi], xmm0

JWasm
004019C8  660FEFC9   pxor xmm1, xmm1
004019CC  0F1006     movups xmm0, dqword ptr [esi]
004019CF  660F74C8   pcmpeqb xmm1, xmm0
004019D3  660FD7C9   pmovmskb ecx, xmm1
004019D7  85C9       test ecx, ecx
004019D9  75 0B      jne short SzCpy.004019E6
004019DB  0F1107     movups dqword ptr [edi], xmm0

[attachment deleted by admin]

Title: Re: lstrcpy vs szCopy
Post by: askm on February 08, 2009, 10:39:21 AM

Must be...
I used the new 'ml' and then the old 'link' and the differences are...

512-byte string copy timing results:

len of source string = 512
len of szCopyXMM: 55
szCopyXMM -> jj -> xmm: 358 clocks
szCopyMMX -> Mark Larson -> MMX: 323 clocks
SzCpy11 -- > Lingo -> MMX -> Fast: 285 clocks
szCopyMMX1-> Mark Larson -> MMX-> Fast: 325 clocks
szCopy 1564 clocks
lstrcpy 1571 clocks
SzCpy10 - > Lingo -> MMX: 284 clocks
MbCopy -> jj -> xmm: 476 clocks

Title: Re: lstrcpy vs szCopy
Post by: jj2007 on February 08, 2009, 11:29:24 AM

Quote from: askm on February 08, 2009, 10:39:21 AM
Must be...
I used the new 'ml' and then the old 'link' and the differences are...

Yes, it's the three missing size override 66h bytes. Jwasm works fine, too.

Title: Re: lstrcpy vs szCopy
Post by: MichaelW on February 08, 2009, 12:21:48 PM

Manually encoding the three operand size prefixes does not improve the cycle count on my P3, it stays at about 2090.

Title: Re: lstrcpy vs szCopy
Post by: donkey on February 08, 2009, 12:34:56 PM

I was looking through the original thread and noticed a few references to my strings functions, but for some reason I never bothered to post any code, probably because I assumed I had already posted them in other threads. Here are the functions from strings.lib that Mark Larson referred to. They have mostly been dissected and rewritten over the years by people like Mark, who took them and vastly improved them, but for what it's worth...

Code Select

lszLenMMX/lszLenMMXW
	NOTE: These functions require a Pentium 3 or better with SSE instructions
	Calculates the length of a string, the string should be aligned.
	lszLenMMXW is a Unicode variant.
	Parameters:
		pString = Pointer to a null terminated string
	Returns the length of the supplied string not including the NULL terminator

lszCopyMMX
	NOTE: This function requires a Pentium 3 or better with SSE instructions
	Copies a zero terminated string using the MMX registers (not preserved)
	Parameters:
		Dest = Pointer to destination buffer
		Source = Pointer to source string
	Returns the address of the destination buffer

Code Select

lszCopyMMX FRAME lpDest,lpSource
	uses esi,edi

	mov esi,[lpSource]
	mov edi,[lpDest]

	mov ecx,esi
	and ecx,15
	rep movsb

	nop
	pxor mm0,mm0
	nop
	pxor mm1,mm1
	nop

	:
		movq mm0,[esi]
		movq mm2,[esi]
		pcmpeqb mm2,mm1
		pmovmskb ecx,mm2
		or ecx,ecx
		jnz >
		movq [edi],mm0
		add edi, 8
		add esi, 8
	jmp <
	:

	emms
	; Do the remainder
	bsf ecx,ecx
	rep movsb
	mov [edi],cl
	
	mov eax,edi
	sub eax,[lpDest]
   ret
ENDF

Code Select

lszLenMMX FRAME pString

	mov eax,[pString]
	nop
	nop ; fill in stack frame+mov to 8 bytes

	pxor mm0,mm0
	nop ; fill pxor to 4 bytes
	pxor mm1,mm1
	nop ; fill pxor to 4 bytes

	: ; this is aligned to 16 bytes
	movq mm0,[eax]
	pcmpeqb mm0,mm1
	add eax,8
	pmovmskb ecx,mm0
	or ecx,ecx
	jz <

	sub eax,[pString]

	bsf ecx,ecx
	sub eax,8
	add eax,ecx

	emms

   RET

ENDF

Code Select

lszLenMMXW FRAME pString

	mov eax,[pString]
	nop
	nop ; fill in stack frame+mov to 8 bytes

	pxor mm0,mm0
	nop ; fill pxor to 4 bytes
	pxor mm1,mm1
	nop ; fill pxor to 4 bytes

	: ; this is aligned to 16 bytes
	movq mm0,[eax]
	pcmpeqw mm0,mm1
	add eax,8
	pmovmskb ecx,mm0
	or ecx,ecx
	jz <

	sub eax,[pString]

	bsf ecx,ecx
	sub eax,8
	add eax,ecx
	shr eax,1
	emms

   RET

ENDF

Title: Re: lstrcpy vs szCopy
Post by: jj2007 on February 08, 2009, 01:42:11 PM

Quote from: MichaelW on February 08, 2009, 12:21:48 PM
Manually encoding the three operand size prefixes does not improve the cycle count on my P3, it stays at about 2090.

That is what I get with the ml614 version, see below. With JWasm and ML 9.0, this drops to 471 cycles.

I attach the latest version with the two executables.

Code Select

szCopyXMM -> jj   ->               xmm: 2084 clocks
szCopyMMX   -> Mark Larson   ->    MMX: 474 clocks
SzCpy11  -- > Lingo ->  MMX   ->  Fast: 477 clocks
szCopyMMX1-> Mark Larson -> MMX-> Fast: 438 clocks
                                szCopy  1053 clocks
                                lstrcpy 1216 clocks
SzCpy10        - > Lingo ->        MMX: 480 clocks
MbCopy     -> jj   ->              xmm: 478 clocks

[attachment deleted by admin]

Title: Re: lstrcpy vs szCopy
Post by: MichaelW on February 08, 2009, 02:34:51 PM

For szCpyV614:

Code Select


szCopyXMM -> jj   ->               xmm: 2082 clocks
szCopyMMX   -> Mark Larson   ->    MMX: 632 clocks
SzCpy11  -- > Lingo ->  MMX   ->  Fast: 697 clocks
szCopyMMX1-> Mark Larson -> MMX-> Fast: 624 clocks
                                szCopy  2076 clocks
                                lstrcpy 2704 clocks
SzCpy10        - > Lingo ->        MMX: 701 clocks
MbCopy     -> jj   ->              xmm: 697 clocks

These are basically the results I got for the version where I manually added the prefixes and assembled with 6.14.

For szCpyV9:

Code Select


szCopyXMM -> jj   ->               xmm: 2082 clocks
szCopyMMX   -> Mark Larson   ->    MMX: 633 clocks
SzCpy11  -- > Lingo ->  MMX   ->  Fast: 697 clocks
szCopyMMX1-> Mark Larson -> MMX-> Fast: 624 clocks
                                szCopy  2078 clocks
                                lstrcpy 2703 clocks
SzCpy10        - > Lingo ->        MMX: 700 clocks
MbCopy     -> jj   ->              xmm:

There is no count for the last procedure because it generates:

Exception number: c000001d (illegal instruction)

And these are basically the results I got for the versions that I assembled with 6.15 and 7.00, which BTW added the prefixes.

I would expect Intel to ensure that any instruction that runs on the processor produces the same result as on later processors. I would be interested to see if other P3 processors have this problem.

Title: Re: lstrcpy vs szCopy
Post by: jj2007 on February 08, 2009, 03:13:28 PM

Interesting. Exactly the same clocks as szCopy, but a lot slower than the MMX versions. Can anybody post results for an AMD, or other Intel processors?

Michael, the 66h seems to have a function similar to nop. Could you try replacing the 66h with a number of nops?

Title: Re: lstrcpy vs szCopy
Post by: jj2007 on February 08, 2009, 03:47:05 PM

Just found this, indicating it's a known problem of early Pentiums:

Many compilers for IA-32 generate "repne scasb" in order
to find the length of a given C string. However, it is possible
to implement strlen (and many other string functions) using
the SSE2 instruction set: pcmpeqb + pmovmskb until there
is a set bit, then bsf to find its index. On Core2 it is roughly
9.3 times faster and about 6.5 times faster on Pentium 4.

http://www.mydatabasesupport.com/forums/arch/252748-fast-string-functions.html
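
The recipe in that quote (compare a block against zero, turn the result into a bit mask, then find the lowest set bit) can be modelled in portable C. This is a sketch of the idea only, not the SSE2 code: all three function names are invented, the mask helper plays the role of pcmpeqb plus pmovmskb, and lowest_set_bit plays the role of bsf. The string must sit in a buffer that is readable up to the end of the 16-byte block holding the terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for pcmpeqb + pmovmskb: bit i is set when block[i] == 0. */
static unsigned zero_byte_mask16(const unsigned char *block)
{
    unsigned mask = 0;
    for (int i = 0; i < 16; ++i)
        if (block[i] == 0)
            mask |= 1u << i;
    return mask;
}

/* Stand-in for bsf: index of the lowest set bit; mask must be nonzero. */
static unsigned lowest_set_bit(unsigned mask)
{
    assert(mask != 0);
    unsigned index = 0;
    while ((mask & 1u) == 0) {
        mask >>= 1;
        ++index;
    }
    return index;
}

/* Length of s, scanned 16 bytes at a time. */
static size_t strlen_model(const char *s)
{
    assert(s != NULL);
    const unsigned char *p = (const unsigned char *)s;
    size_t offset = 0;
    unsigned mask;
    while ((mask = zero_byte_mask16(p + offset)) == 0)
        offset += 16;               /* no terminator in this block */
    return offset + lowest_set_bit(mask);
}
```

The speed of the real thing comes from doing the 16 comparisons in one instruction; the model only shows why the mask and the bit scan give the right length.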

Title: Re: lstrcpy vs szCopy
Post by: MichaelW on February 08, 2009, 05:33:38 PM

66h is the operand-size prefix, or per Intel the operand-size override prefix:

Quote
The operand-size override prefix allows a program to switch between 16- and 32-bit operand sizes. Either size can be the default; use of the prefix selects the non-default size. Use of 66H followed by 0FH is treated as a mandatory prefix by some SSE/SSE2/SSE3 instructions. Other use of the 66H prefix with MMX/SSE/SSE2/SSE3 instructions is reserved; such use may cause unpredictable behavior.

I knew it was used for the integer instructions, but I have never before noticed it on an MMX or SSE instruction, although it does make some sense that they would use it to specify the register size.
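
To make the encoding difference concrete, here is a small C sketch using byte sequences copied from the disassembly listings earlier in this thread. The function selects_xmm_form is a made-up illustration, and it only makes sense for opcodes that exist in both an MMX and an SSE2 form (pxor, pcmpeqb, pmovmskb); movups, for example, is an XMM instruction without any 66h prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodings as they appear in the listings in this thread. */
static const uint8_t pxor_mmx[]     = { 0x0F, 0xEF, 0xC9 };        /* pxor mm1, mm1      */
static const uint8_t pxor_xmm[]     = { 0x66, 0x0F, 0xEF, 0xC9 };  /* pxor xmm1, xmm1    */
static const uint8_t pcmpeqb_mmx[]  = { 0x0F, 0x74, 0xC8 };        /* pcmpeqb mm1, mm0   */
static const uint8_t pcmpeqb_xmm[]  = { 0x66, 0x0F, 0x74, 0xC8 };  /* pcmpeqb xmm1, xmm0 */
static const uint8_t pmovmskb_mmx[] = { 0x0F, 0xD7, 0xC9 };        /* pmovmskb ecx, mm1  */
static const uint8_t pmovmskb_xmm[] = { 0x66, 0x0F, 0xD7, 0xC9 };  /* pmovmskb ecx, xmm1 */

/* Hypothetical check: for these dual-form opcodes, 66h followed by 0Fh
   selects the XMM form; without the prefix the MMX form is executed. */
static int selects_xmm_form(const uint8_t *enc, size_t len)
{
    assert(enc != NULL);
    return len >= 2 && enc[0] == 0x66 && enc[1] == 0x0F;
}
```

So the one-byte difference per instruction is exactly what separates the ml 6.14 output from the ml 9 and JWasm output.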

I have verified that with the prefixes in place, and the encoding exactly as MLv9 produced, on the first execution:

pmovmskb ecx, xmm1

Sets ecx to 0FFh, so the following conditional jump is always taken, and the tail cleanup code performs the copy operation. The code looks correct according to the Intel references, but on my P3 it does not work as it is documented to work.

Am I the only cheapskate here that is still running a P3?

Title: Re: lstrcpy vs szCopy
Post by: jj2007 on February 09, 2009, 11:38:01 AM

Quote from: MichaelW on February 08, 2009, 05:33:38 PM
pmovmskb ecx, xmm1
Sets ecx to 0FFh, so the following conditional jump is always taken, and the tail cleanup code performs the copy operation.

Which is kind of a convenient bug, allowing a soft fall through... :-)

Quote
Am I the only cheapskate here that is still running a P3?

Don't be desperate, the attached version should be fine for you. The code should look familiar to you, just search for 80808080h ...

Timings for a P4:

Code Select

 alignment: offset src=4210803, dest=4211365

Source len=512
1109     clocks for szCopyXMM
1113     clocks for szCopyMMX
1158     clocks for SzCpy11
1087     clocks for szCopyMMX1
1180     clocks for SzCpy10
1711     clocks for szCopy
2164     clocks for lstrcpy

Source len=511
1096     clocks for szCopyXMM
1098     clocks for szCopyMMX
1160     clocks for SzCpy11, result NOT CORRECT
1082     clocks for szCopyMMX1, result NOT CORRECT
1165     clocks for SzCpy10
1699     clocks for szCopy
2166     clocks for lstrcpy

Source len=15
75       clocks for szCopyXMM
58       clocks for szCopyMMX
47       clocks for SzCpy11, result NOT CORRECT
31       clocks for szCopyMMX1, result NOT CORRECT
49       clocks for SzCpy10
70       clocks for szCopy
147      clocks for lstrcpy

[attachment deleted by admin]

Title: Re: lstrcpy vs szCopy
Post by: FORTRANS on February 09, 2009, 03:03:45 PM

Quote from: MichaelW on February 08, 2009, 05:33:38 PM
Am I the only cheapskate here that is still running a P3?

Hi,

Hardly. While I have newer machines, _at home_ I mostly use
my PIII or Pentium systems. And I have an HP 200LX that uses
an 80186 in my pocket. I did once throw away an IBM PC... I
really ought to get rid of some of the older ones.

Regards,

Steve

Title: Re: lstrcpy vs szCopy
Post by: jj2007 on February 10, 2009, 12:01:49 AM

Updated with some refinements - see Core Duo (Celeron M) timings below.
I included crt_strcpy into the testbed - it's remarkably fast for short strings.

An additional test for a global "CanSSE2" variable would cost ca. 3 cycles extra. My goal was a fast general purpose library routine that does not trash the FPU. The current szCopyXMM is pretty good on the Core2, but I would be grateful for more timings, especially on the old Pentiums and AMDs. The algo is a bit weak on very short strings (0-15 bytes), due to the overhead. A rough estimate says a well commented MASM source has an average line length of 30 or so...

Note: All algos (except the Masm32 szCopy) have been harmonised to copy dest, source, consistent with mov dest, src.
The attached executable was assembled with JWasm.

Code Select

 alignment: offset src=4214899, dest=4215461
 len of szCopyXMM: 118

Source len=512
465      clocks for szCopyXMM
456      clocks for szCopyMMX   mmx, trashes FPU
481      clocks for SzCpy10     mmx, trashes FPU
1054     clocks for szCopy
1098     clocks for lstrcpy
668      clocks for crt_strcpy

Source len=511
454      clocks for szCopyXMM
434      clocks for szCopyMMX   mmx, trashes FPU
480      clocks for SzCpy10     mmx, trashes FPU
1055     clocks for szCopy
1086     clocks for lstrcpy
684      clocks for crt_strcpy

Source len=128
139      clocks for szCopyXMM
119      clocks for szCopyMMX   mmx, trashes FPU
126      clocks for SzCpy10     mmx, trashes FPU
285      clocks for szCopy
304      clocks for lstrcpy
171      clocks for crt_strcpy

Source len=127
129      clocks for szCopyXMM
114      clocks for szCopyMMX   mmx, trashes FPU
126      clocks for SzCpy10     mmx, trashes FPU
286      clocks for szCopy
304      clocks for lstrcpy
171      clocks for crt_strcpy

Source len=31
56       clocks for szCopyXMM
46       clocks for szCopyMMX   mmx, trashes FPU
52       clocks for SzCpy10     mmx, trashes FPU
96       clocks for szCopy
112      clocks for lstrcpy
51       clocks for crt_strcpy

Source len=17
33       clocks for szCopyXMM
38       clocks for szCopyMMX   mmx, trashes FPU
42       clocks for SzCpy10     mmx, trashes FPU
54       clocks for szCopy
94       clocks for lstrcpy
32       clocks for crt_strcpy

Source len=15
79       clocks for szCopyXMM
46       clocks for szCopyMMX   mmx, trashes FPU
39       clocks for SzCpy10     mmx, trashes FPU
48       clocks for szCopy
89       clocks for lstrcpy
31       clocks for crt_strcpy

[attachment deleted by admin]

Title: Re: lstrcpy vs szCopy
Post by: sinsi on February 10, 2009, 12:41:12 AM

Athlon XP 2600+ (2.13GHz)

Code Select


 alignment: offset src=4214899, dest=4215461
 len of szCopyXMM: 118

Source len=512
688	 clocks for szCopyXMM
451	 clocks for szCopyMMX 	mmx, trashes FPU
455	 clocks for SzCpy10 	mmx, trashes FPU
1827	 clocks for szCopy
1794	 clocks for lstrcpy
742	 clocks for crt_strcpy

Source len=511
683	 clocks for szCopyXMM
420	 clocks for szCopyMMX 	mmx, trashes FPU
455	 clocks for SzCpy10 	mmx, trashes FPU
1822	 clocks for szCopy
1789	 clocks for lstrcpy
740	 clocks for crt_strcpy

Source len=128
198	 clocks for szCopyXMM
142	 clocks for szCopyMMX 	mmx, trashes FPU
136	 clocks for SzCpy10 	mmx, trashes FPU
479	 clocks for szCopy
483	 clocks for lstrcpy
206	 clocks for crt_strcpy

Source len=127
201	 clocks for szCopyXMM
138	 clocks for szCopyMMX 	mmx, trashes FPU
136	 clocks for SzCpy10 	mmx, trashes FPU
475	 clocks for szCopy
479	 clocks for lstrcpy
204	 clocks for crt_strcpy

Source len=31
62	 clocks for szCopyXMM
43	 clocks for szCopyMMX 	mmx, trashes FPU
40	 clocks for SzCpy10 	mmx, trashes FPU
137	 clocks for szCopy
151	 clocks for lstrcpy
58	 clocks for crt_strcpy

Source len=17
43	 clocks for szCopyXMM
48	 clocks for szCopyMMX 	mmx, trashes FPU
31	 clocks for SzCpy10 	mmx, trashes FPU
88	 clocks for szCopy
104	 clocks for lstrcpy
39	 clocks for crt_strcpy

Source len=15
41	 clocks for szCopyXMM
36	 clocks for szCopyMMX 	mmx, trashes FPU
26	 clocks for SzCpy10 	mmx, trashes FPU
81	 clocks for szCopy
97	 clocks for lstrcpy
36	 clocks for crt_strcpy

Title: Re: lstrcpy vs szCopy
Post by: jj2007 on February 10, 2009, 01:23:22 AM

Thanks, Sinsi. Interesting that the algo performs much better for the 15 bytes string than on the Core2. And crt_strcpy is also remarkably good all over the place.

EDIT: Here is the innermost loop of crt_strcpy. Interesting ::)

Code Select

77C160C1               8917                          mov dword ptr [edi], edx
77C160C3               83C7 04                       add edi, 4
77C160C6               BA FFFEFE7E                   mov edx, 7EFEFEFF
77C160CB               8B01                          mov eax, dword ptr [ecx]
77C160CD               03D0                          add edx, eax
77C160CF               83F0 FF                       xor eax, FFFFFFFF
77C160D2               33C2                          xor eax, edx
77C160D4               8B11                          mov edx, dword ptr [ecx]
77C160D6               83C1 04                       add ecx, 4
77C160D9               A9 00010181                   test eax, 81010100
77C160DE              74 E1                         je short msvcrt.77C160C1
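
The two magic constants implement a word-at-a-time zero test. Here is a sketch of the same arithmetic in C, with invented function names. One thing to keep in mind: a zero result guarantees there is no zero byte in the dword, but a nonzero result only means there may be one. A dword whose top byte is 80h is flagged as well, so a hit has to be confirmed byte by byte. The listing shows only the inner loop, so that confirmation code is not visible here.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero result: v MAY contain a zero byte. Zero result: it does not. */
static uint32_t may_have_zero_byte(uint32_t v)
{
    uint32_t sum = v + 0x7EFEFEFFu;     /* mov edx, 7EFEFEFF / add edx, eax   */
    return (sum ^ ~v) & 0x81010100u;    /* xor eax, -1 / xor eax, edx / test  */
}

/* Exact byte-by-byte check, used to confirm a hit. */
static int has_zero_byte(uint32_t v)
{
    for (int i = 0; i < 4; ++i)
        if (((v >> (8 * i)) & 0xFFu) == 0)
            return 1;
    return 0;
}

/* A few spot checks of the behaviour described above. */
static void check_examples(void)
{
    assert(may_have_zero_byte(0x41424344u) == 0);   /* no zero byte          */
    assert(may_have_zero_byte(0x00424344u) != 0);   /* terminator on top     */
    assert(may_have_zero_byte(0x80414141u) != 0);   /* flagged ...           */
    assert(!has_zero_byte(0x80414141u));            /* ... but a false alarm */
}
```

The loop in the listing keeps copying dwords for as long as the test result is zero, which is why it handles four bytes per iteration with only a handful of integer instructions.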

Title: Re: lstrcpy vs szCopy
Post by: donkey on February 10, 2009, 02:33:51 AM

Quote from: MichaelW on February 08, 2009, 05:33:38 PM
Am I the only cheapskate here that is still running a P3?

Well, not quite a P3 here but a PIV, a Sempron and an Athlon 64 X2, but the Sempron is the only one I do any dev work on, the A64 is for work and the PIV is just a file server.

Title: Re: lstrcpy vs szCopy
Post by: sinsi on February 10, 2009, 03:14:32 AM

Code Select


OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 16
simplecopy proc dest:DWORD, src:DWORD
    push ebx
    mov ecx,[esp+8]
    mov edx,[esp+12]
    sub ebx,ebx
 @@: mov al,[edx+ebx]
    mov [ecx+ebx],al
    inc ebx
    test al,al
    jnz @b
    pop ebx
    ret 8
simplecopy endp
OPTION PROLOGUE:PrologueDef 
OPTION EPILOGUE:EpilogueDef

23 bytes, works on a 386

Code Select


Source len=512
535      clocks for szCopyXMM
444      clocks for szCopyMMX   mmx, trashes FPU
469      clocks for SzCpy10     mmx, trashes FPU
559      clocks for szCopy
1071     clocks for lstrcpy
505      clocks for crt_strcpy
559      clocks for simplecopy

Source len=511
544      clocks for szCopyXMM
441      clocks for szCopyMMX   mmx, trashes FPU
470      clocks for SzCpy10     mmx, trashes FPU
554      clocks for szCopy
1071     clocks for lstrcpy
505      clocks for crt_strcpy
554      clocks for simplecopy

Source len=128
145      clocks for szCopyXMM
134      clocks for szCopyMMX   mmx, trashes FPU
125      clocks for SzCpy10     mmx, trashes FPU
174      clocks for szCopy
301      clocks for lstrcpy
136      clocks for crt_strcpy
172      clocks for simplecopy

Source len=127
141      clocks for szCopyXMM
123      clocks for szCopyMMX   mmx, trashes FPU
124      clocks for SzCpy10     mmx, trashes FPU
167      clocks for szCopy
299      clocks for lstrcpy
135      clocks for crt_strcpy
168      clocks for simplecopy

Source len=31
67       clocks for szCopyXMM
57       clocks for szCopyMMX   mmx, trashes FPU
58       clocks for SzCpy10     mmx, trashes FPU
95       clocks for szCopy
108      clocks for lstrcpy
44       clocks for crt_strcpy
94       clocks for simplecopy

Source len=17
65       clocks for szCopyXMM
50       clocks for szCopyMMX   mmx, trashes FPU
34       clocks for SzCpy10     mmx, trashes FPU
53       clocks for szCopy
66       clocks for lstrcpy
26       clocks for crt_strcpy
53       clocks for simplecopy

Source len=15
91       clocks for szCopyXMM
44       clocks for szCopyMMX   mmx, trashes FPU
37       clocks for SzCpy10     mmx, trashes FPU
46       clocks for szCopy
59       clocks for lstrcpy
22       clocks for crt_strcpy
46       clocks for simplecopy

Title: Re: lstrcpy vs szCopy
Post by: jj2007 on February 10, 2009, 07:57:38 AM

Quote from: sinsi on February 10, 2009, 03:14:32 AM
Code Select

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 16
simplecopy proc dest:DWORD, src:DWORD
    push ebx
    mov ecx,[esp+8]
    mov edx,[esp+12]
    sub ebx,ebx
 @@: mov al,[edx+ebx]
    mov [ecx+ebx],al
    inc ebx
    test al,al
    jnz @b
    pop ebx
    ret 8
simplecopy endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

23 bytes, works on a 386

Cute. Celeron M:

Code Select

Source len=511
454      clocks for szCopyXMM
1052     clocks for simplecopy

Source len=17
32       clocks for szCopyXMM
53       clocks for simplecopy

Source len=15
79       clocks for szCopyXMM
47       clocks for simplecopy

I am working on the ultimate solution:

Code Select

.if len(src)>=32
    invoke szCopyXMM, ...
.else
    invoke simplecopy, ...
.endif

But jokes apart: Did you change puter? Your previous timings look a lot different.

EDIT: Sinsi, I was about to accuse you of pulling my leg, because my timings for simplecopy are virtually identical to those of the Masm32lib szCopy algo. But no, the two routines are not identical:

Code Select

szCopy proc src:DWORD,dst:DWORD

    push ebp
    push esi

    mov edx, [esp+12]
    mov ebp, [esp+16]
    mov eax, -1
    mov esi, 1

  @@:
    add eax, esi
    movzx ecx, BYTE PTR [edx+eax]
    mov [ebp+eax], cl
    test ecx, ecx
    jnz @B

    pop esi
    pop ebp

    ret 8

szCopy endp

Title: Re: lstrcpy vs szCopy: destination alignment better than source alignment??
Post by: jj2007 on February 10, 2009, 04:25:35 PM

While chasing the ultimate lstrcpy replacement, I stumbled over an interesting question: In real life, source strings can be aligned to dwords, destinations can be aligned, but we can rarely do both simultaneously. My test bed says that skipping alignment altogether is a no-no, so I decided to compare two versions of the same algo, one pre-aligning the source, the other the destination. To my surprise, there is a difference (MbCopy=src aligned, MbCopyD=dest aligned, timings for a P4):

Code Select

Source len=512
1254     clocks for MbCopy
1022     clocks for MbCopyD

Source len=63
178      clocks for MbCopy
139      clocks for MbCopyD

Source len=55
163      clocks for MbCopy
134      clocks for MbCopyD

Source len=48
150      clocks for MbCopy
128      clocks for MbCopyD

Source len=42
144      clocks for MbCopy
119      clocks for MbCopyD

Source len=37
138      clocks for MbCopy
105      clocks for MbCopyD

Source len=15
38       clocks for MbCopy
62       clocks for MbCopyD              <----- the exception

Source len=7
29       clocks for MbCopy
31       clocks for MbCopyD

Now, is that a well-known phenomenon, and are there established rules to follow??

Here is the algo:

Code Select


NoAlign=	0	; clearly not a good option, but you can test it here
DestAlign=	0	; choose if you want to align the source or the destination
MbCopy proc dest:DWORD, src:DWORD
	push edi
	mov edi, [esp+8]
	mov ecx, [esp+12]
	if NoAlign
		jmp mbcMain 			; neither source nor dest alignment??
	endif
	if DestAlign
	  test edi, 3					; edi=destination address
	else
	  test ecx, 3					; ecx=source address
	endif
	je mbcMain					; dword aligned
@@:	mov al, byte ptr [ecx]		; a byte from src
	inc ecx
	test al, al
	mov byte ptr [edi], al		; does not change the flag, so we can say bye if al was zero
	je mbcBye
	inc edi
	if DestAlign
	  test edi, 3					; edi=destination address
	else
	  test ecx, 3					; ecx=source address
	endif
	jne @B
	jmp mbcMain
	; align 16 no good, costs cycles

@@:				; ------------ innermost loop ------------
	mov [edi], eax
	add edi, 4
mbcMain:	
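	mov eax, 07EFEFEFFh	; magic constant with zero "holes" at bits 8, 16, 24 and 31
	mov edx, [ecx]		; next dword from src
	add eax, edx		; a carry flips each hole unless the byte below it is zero
	xor edx, eax
	xor edx, 0FFFFFFFFh	; edx = not (src xor (src+magic))
	mov eax, [ecx]		; reload the dword for the store
	add ecx, 4
	test edx, 81010100h	; a set hole bit means: possible zero byte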
	je @B			; ------------ innermost loop ------------

	test al, al
	je mbc1
	test ah, ah
	je mbc2
	test eax, 00FF0000h
	je mbc3

mbc4:	
	mov [edi], eax		; bytes 0-2 are non-zero, so byte 3 is assumed to be the terminator
	jmp mbcBye
mbc3:	
	mov byte ptr [edi+2], 0
mbc2:	
	mov word ptr [edi], ax
mbc1:
	mov byte ptr [edi], al
mbcBye:	
	mov edx, [esp+8]	; return start of buffer
	pop edi
	ret 8
MbCopy endp
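
For readers puzzling over the two constants in the innermost loop, here is a minimal C sketch of the same test, assuming 32-bit unsigned arithmetic that wraps around. The function name may_have_zero_byte is made up for this illustration; it is not part of the testbed.

```c
#include <stdint.h>

/* Hypothetical helper, not from the testbed: the same test the inner loop
   performs with add / xor / xor / test. Returns non-zero when the dword
   MAY contain a zero byte (false positives are possible, see below). */
static int may_have_zero_byte(uint32_t x)
{
    uint32_t sum   = x + 0x7EFEFEFFu;    /* add eax, edx (wraps mod 2^32)     */
    uint32_t holes = ~(x ^ sum);         /* xor edx, eax / xor edx, 0FFFFFFFFh */
    return (holes & 0x81010100u) != 0;   /* test edx, 81010100h               */
}
```

A dword such as 44434241h ("ABCD") passes as zero-free, and any dword with a zero byte is flagged. The test is not exact, though: bit 31 is also flagged when the top byte is 80h, so 80434241h is reported even though it contains no zero byte. The tail code above does not re-check the fourth byte, so it seems that such a dword would end the copy early. That does not matter for plain ASCII text, but it is worth knowing before using the trick on other data.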

By the way, on the P4 the algo beats my previously posted XMM/SSE2 algo hands down. Full testbed attached.

[attachment deleted by admin]

Title: Re: lstrcpy vs szCopy: destination alignment better than source alignment??
Post by: Mark Jones on February 10, 2009, 04:50:22 PM

Quote from: jj2007 on February 10, 2009, 04:25:35 PM
While chasing the ultimate lstrcpy replacement...

Careful JJ, you know that such a thing is impossible, right? :toothy

Title: Re: lstrcpy vs szCopy: destination alignment better than source alignment??
Post by: jj2007 on February 10, 2009, 05:28:26 PM

Quote from: Mark Jones on February 10, 2009, 04:50:22 PM
Quote from: jj2007 on February 10, 2009, 04:25:35 PM
While chasing the ultimate lstrcpy replacement...

Careful JJ, you know that such a thing is impossible, right? :toothy

Hmmm... like infinity, right? But I am approaching zero cycles asymptotically. Go ahead, post your timings :bg

Title: Re: lstrcpy vs szCopy
Post by: jj2007 on February 10, 2009, 05:45:56 PM

Quote from: NightWare on February 08, 2009, 03:29:17 AM
jj, glad to see you play with simd stuff :bg

Bad news: I have given up. SSE2 is just too slow, see my postings above for the P4 and below for the Core2... :wink

Code Select

Source len=512
712      clocks for szCopyXMM
1098     clocks for szCopy
641      clocks for MbCopy
745      clocks for MbCopyD

Source len=63
210      clocks for szCopyXMM
201      clocks for szCopy
107      clocks for MbCopy
106      clocks for MbCopyD

MbCopy does not need any of these strange new registers ;-)

Title: Re: lstrcpy vs szCopy
Post by: NightWare on February 10, 2009, 10:45:52 PM

Quote from: sinsi on February 10, 2009, 03:14:32 AM
23 bytes, works on a 386

ebx isn't necessary :wink

Code Select

align 16
simplecopy proc dest:DWORD, src:DWORD
	mov ecx,src
	mov edx,dest
	sub edx,ecx
@@:	mov al,[ecx]
	mov [ecx+edx],al
	inc ecx
	test al,al
	jnz @b
	ret
simplecopy endp
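
The trick that saves the extra register is the sub edx,ecx: only the source pointer is incremented, and the destination address is formed as source plus a fixed distance. Here is a rough C rendering of the same idea, assuming a flat address space where pointers convert to and from uintptr_t without surprises. The name copy_with_delta is hypothetical.

```c
#include <stdint.h>

/* Hypothetical C rendering of the loop above: one moving pointer plus a
   constant distance replaces a second moving pointer. Assumes a flat
   memory model, as on 32-bit Windows. */
static char *copy_with_delta(char *dest, const char *src)
{
    uintptr_t delta = (uintptr_t)dest - (uintptr_t)src;  /* sub edx, ecx      */
    const char *p = src;                                 /* ecx               */
    char c;
    do {
        c = *p;                                          /* mov al, [ecx]     */
        *(char *)((uintptr_t)p + delta) = c;             /* mov [ecx+edx], al */
        p++;                                             /* inc ecx           */
    } while (c != 0);                                    /* test al,al / jnz  */
    return dest;
}
```

The distance may well be "negative" (destination below source). The unsigned subtraction and addition wrap around in the same way, so the sum still lands on the right address.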

Title: Re: lstrcpy vs szCopy
Post by: jj2007 on February 10, 2009, 11:35:06 PM

Quote from: NightWare on February 10, 2009, 10:45:52 PM
Quote from: sinsi on February 10, 2009, 03:14:32 AM
23 bytes, works on a 386
ebx isn't necessary :wink

Sinsi, how dare you waste 3 bytes without any need??? :dazzled:

It's cute, and competitive, too - same timings as the library szCopy. While implementing this, I stumbled over a very odd behaviour of the chr$ macro. It's on line 190 of the attached source:

Code Select

  Invoke Main
.listall

  print chr$(13,10,9, " --------------",13,10)
;  MsgBox 0, str$(eax), offset txHi, MB_OK
  MsgBox 0, str$(eax), chr$("Hello"), MB_OK		; GARBAGE instead of Hello

.nolist

The title of the MsgBox contains the test string, not "Hello". Excerpt from the list file:

Code Select

00000000                   1 .data
00000024                   1 *_TEXT ENDS
00000804                   1 *_DATA SEGMENT
                           1 *ASSUME CS:ERROR
00000804  48656C6C6F00     1 ??001C db "Hello",0
00000000                   1 .code
0000080A                   1 *_DATA ENDS
00000024                   1 *_TEXT SEGMENT
                           1 *ASSUME CS:FLAT
                           invoke MessageBoxA,0,reparg(ADDR ??001B),reparg(OFFSET ??001C),MB_OK
 = A                       1 quot SUBSTR <ADDR ??001B>,1,1
                           1 .data
                           1 ??001D db ADDR ??001B,0        
                           1 .code
                           1 EXITM <ADDR ??001D>      
                           invoke MessageBoxA,0,ADDR ??001B,reparg(OFFSET ??001C),MB_OK
 = O                       1 quot SUBSTR <OFFSET ??001C>,1,1
                           1 .data
                           1 ??001E db OFFSET ??001C,0        
                           1 .code
                           1 EXITM <ADDR ??001E>      
00000024                   invoke MessageBoxA,0,ADDR ??001B,OFFSET ??001C,MB_OK
00000024  6A00             * push MB_OK
00000026  6800000000       * push OFFSET ??001C
0000002B  6800000000       * push offset ??001B
00000030  6A00             * push 0

I tried with ml 6.14, ml 9.0 and JWasm, and they all show the same behaviour. Any clue what could cause this? I thought nothing was more straightforward than chr$()...

[attachment deleted by admin]

Title: Re: lstrcpy vs szCopy
Post by: sinsi on February 10, 2009, 11:58:38 PM

Quote
But jokes apart: Did you change puter? Your previous timings look a lot different

Sorry jj, those timings were on my real computer (q6600), not the athlon.

Quote
Sinsi, how dare you waste 3 bytes without any need???

It was the beer goggles...

NightWare: very clever. heh, I hate you.

Title: Re: lstrcpy vs szCopy
Post by: MichaelW on February 11, 2009, 12:26:42 AM

Quote
The title of the MsgBox contains the test string, not "Hello".

It works correctly for me using MASM32 v9 or v10, ML 6.14, 6.15, or 7.00.

Title: Re: lstrcpy vs szCopy
Post by: NightWare on February 11, 2009, 12:43:50 AM

Quote from: sinsi on February 10, 2009, 11:58:38 PM
NightWare: very clever. heh, I hate you.

well, in fact here (if i remember well) a small correction is necessary => Jdoe : very clever :wink

Title: Re: lstrcpy vs szCopy
Post by: jj2007 on February 11, 2009, 12:58:53 AM

Quote from: MichaelW on February 11, 2009, 12:26:42 AM
Quote
The title of the MsgBox contains the test string, not "Hello".

It works correctly for me using MASM32 v9 or v10, ML 6.14, 6.15, or 7.00.

Odd. Very odd. Even the executable works?? Thanks for testing, Michael...

This is what I get at the bottom - everything after OK has no right to be there:

Source len=7
35 clocks for szCopyXMM
23 clocks for simplecopy
23 clocks for simplecopyNW
24 clocks for szCopy
21 clocks for crt_strcpy
19 clocks for MbCopy
22 clocks for MbCopyD
--- OK ---
gHhIiJjKkLlMMNnOoPpQqRrSsTtUuVvWwXxYyZz Now I Know My ABC's, Won't You Come

Sample String 01234 56789 ABCDEF AaBbCcDdEeFfGgHhIiJjKkLlMMNnOoPpQqRrSsTtUuV
xYyZz Now I Know My ABC's, Won't You Come Play

(On Windows XP SP2, Celeron M)

Title: Re: lstrcpy vs szCopy
Post by: sinsi on February 11, 2009, 01:03:04 AM

Yep, garbage here too (ml614,615,7,8)

Title: Re: lstrcpy vs szCopy
Post by: jj2007 on February 11, 2009, 01:14:38 AM

Quote from: sinsi on February 11, 2009, 01:03:04 AM
Yep, garbage here too (ml614,615,7,8)

So it's not just me...
Thanks, I was about to throw my puter into the garbage. :green

Title: Re: lstrcpy vs szCopy
Post by: sinsi on February 11, 2009, 01:20:42 AM

String1 length=1024
String2 length=16*32

Buffer overrun?

Title: Re: lstrcpy vs szCopy
Post by: MichaelW on February 11, 2009, 01:32:31 AM

Quote
Even the executable works??

I didn't test the EXE, I just tested the few lines of code that you posted with the expectation that the problem was somehow the combination of macros. Now that I do test the complete code, I get garbage, but it's obviously part of the test string. If I pad the end of the data section with:

pad db 200 dup(0)

Then the problem goes away, so something is overwriting the message box title.

Title: Re: lstrcpy vs szCopy
Post by: jj2007 on February 11, 2009, 01:41:04 AM

Quote from: MichaelW on February 11, 2009, 01:32:31 AM

If I pad the end of the data section with:

pad db 200 dup(0)

Then the problem goes away, so something is overwriting the message box title.

Thanxalot, Sinsi & Michael. That was a typical noob error - I somehow assumed that since the MsgBox comes last in the code, its title could not have been overwritten by "previous" code. Plain wrong, of course - the MsgBox title sits in .data from the moment the program is loaded, so a copy that overruns its buffer can trash it long before the MsgBox is shown.
I keep learning... :bg
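
The effect can be reproduced in miniature. The C sketch below is an illustration under stated assumptions, not the testbed: a struct stands in for the .data section, with a too-small destination buffer placed directly in front of a string that is only used later. All names (data_section, copy_into_section) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the .data section: a destination buffer
   followed directly by a string that is only displayed at the very end. */
struct data_section {
    char dest[8];    /* too small for the test string            */
    char title[8];   /* plays the role of the chr$("Hello") text */
};

/* Unbounded byte copy in the style of simplecopy. It treats the whole
   section as one flat byte array and starts writing at `offset`. */
static void copy_into_section(struct data_section *d, size_t offset,
                              const char *src)
{
    char *p = (char *)d + offset;
    while ((*p++ = *src++) != 0) {
        /* keep copying until the terminating zero has been written */
    }
}
```

If title is set to "Hello" and a 13-character string such as "0123456789ABC" is then copied to the start of dest, the copy runs past the 8 bytes of dest and title ends up holding the tail "89ABC". The point in the code where the title is finally used does not matter; only its position in memory does.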

Title: Re: lstrcpy vs szCopy
Post by: askm on February 11, 2009, 04:17:21 AM

Lives depend on correct and speedy code

every day. I need not elaborate.

Let those lives diminished due to code not do

so in vain. QA.

Title: Re: lstrcpy vs szCopy
Post by: donkey on February 11, 2009, 06:32:46 AM

Quote from: askm on February 11, 2009, 04:17:21 AM

Lives depend on correct and speedy code

every day. I need not elaborate.

Let those lives diminished due to code not do

so in vain. QA.

Quote
>>> heart monitor - Beep,beep,beep... <<<

Get the patient's readouts, stat

>>>Indexing files - Please wait<<<<

We need those damn readouts

>>>Indexing files - Please wait<<<<

Use task manager to shut down the damn indexing program !!!

>>>The process is being debugged, access denied<<<

>>> heart monitor - Beeeeeeeeeeeeeee... <<<

I hope my life never depends on Windows :eek

Title: Re: lstrcpy vs szCopy
Post by: sinsi on February 11, 2009, 06:47:20 AM

I'm sure I read in a MS EULA that it wasn't to be used in 'nuclear reactor control systems' or 'hospital intensive care systems'

Title: Re: lstrcpy vs szCopy
Post by: donkey on February 11, 2009, 07:10:41 AM

Quote from: sinsi on February 11, 2009, 06:47:20 AM
I'm sure I read in a MS EULA that it wasn't to be used in 'nuclear reactor control systems' or 'hospital intensive care systems'

I believe that's the JAVA license, it's included in the EULA of some Windows distributions.

Quote
7. NOTE ON JAVA SUPPORT. THE SOFTWARE PRODUCT MAY CONTAIN SUPPORT FOR PROGRAMS WRITTEN IN JAVA.

JAVA TECHNOLOGY IS NOT FAULT TOLERANT AND IS NOT DESIGNED, MANUFACTURED, OR INTENDED FOR USE OR RESALE AS ON-LINE CONTROL EQUIPMENT IN HAZARDOUS ENVIRONMENTS REQUIRING FAIL-SAFE PERFORMANCE, SUCH AS IN THE OPERATION OF NUCLEAR FACILITIES, AIRCRAFT NAVIGATION OR COMMUNICATION SYSTEMS, AIR TRAFFIC CONTROL, DIRECT LIFE SUPPORT MACHINES, OR WEAPONS SYSTEMS, IN WHICH THE FAILURE OF JAVA TECHNOLOGY COULD LEAD DIRECTLY TO DEATH, PERSONAL INJURY, OR SEVERE PHYSICAL OR ENVIRONMENTAL DAMAGE.

Title: Re: lstrcpy vs szCopy
Post by: askm on February 11, 2009, 08:05:08 AM

The code failed when
it realized it was MS dependent.
I put a MS copyright in the data
section and tried again. Don't
forget the $ symbol. It helps
lstrcpy concurrency.
QA saves

Title: Re: lstrcpy vs szCopy
Post by: jj2007 on February 11, 2009, 11:19:15 AM

Folks, you are taking this too seriously. I never recommended the usage of the MybuggyCopy algo in intensive health care systems :naughty:

Title: Re: lstrcpy vs szCopy
Post by: askm on February 11, 2009, 06:52:47 PM

Stick out your tongue and say 'ah'.

I wish no ill health on anyone.

Not 'lives depend' in the narrow sense.

That narrowly implies that ALL code that appears on

these pages can't be compiled elsewhere

for whatever purposes their 'lives' depend.

There was MS-free assembly code before MS you know.

The processor came first, or more to the point,

logic was the progenitor. MS is not Big Brother.

Besides MS runs afoul of its own "corporate operating environment(s)",

but its health is maintained. That's EULA hypocrisy, isn't it?

Now patient you sit on the table here and I'll test your reflexes.

You have come here complaining of difficulty copying strings eh ?

I'll prescribe a copyright to clear it up. Can I get a trial pack doc ?

Title: Impact of using Windows systems in critical health care systems
Post by: farrier on February 14, 2009, 11:16:27 AM

http://www.theregister.co.uk/2009/01/20/sheffield_conficker/

"The decision to disble automatic security updates was taken during Christmas week after PCs in an operating theatre rebooted mid-surgery. Conficker was detected on December 29" [sic]

Oops!

Title: Re: lstrcpy vs szCopy
Post by: herge on March 12, 2009, 02:02:03 AM

Hi All:

Translation: This code was written by lawyers and does Not Work on Microsoft systems without UPS.
And Microsoft systems don't work when the power is OFF!
It was only tested when the power was on.
I knew it only takes 45 minutes to re-boot Windows if a UPS radar system fails!
Also assuming you have no problems re-booting?

Regards herge

The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: jj2007 on February 07, 2009, 11:02:33 PM