Is there a way to set all registers to 0? (without explicitly assigning them val

BlackVortex · April 02, 2009, 04:26:20 AM

@herge
If you tried the snippet that says "lingo second use" it was supposed to fail. That's what JJ was proving, hehe

lingo · April 04, 2009, 08:12:57 AM

"That's what JJ was proving"

JJ posted wrong code (mov eax, offset Null8+8*4),
my code is shortest and OK to run once because no need to "works repeatedly"... :lol
As you can see his level is to "improve" add eax, 1 with inc eax (see above) :lol
JJ's competitor will be tetsu-jp who invented strlen with "SCASW, SCASD, and SCASQ" :lol

jj2007 · April 04, 2009, 10:55:46 AM

Quote from: lingo on April 04, 2009, 08:12:57 AM
JJ posted wrong code (mov eax, offset Null8+8*4)

What's wrong with that code? It works, in contrast to yours ::)

Quote
my code is shortest and OK to run once because no need to "works repeatedly"... :lol

44 bytes? You are kidding :bg

Quote
As you can see his level is to "improve" add eax, 1 with inc eax (see above) :lol

inc is shorter and equally fast on modern CPUs :green

Quote from: lingo on March 14, 2009, 02:16:44 PM
Who makes code optimization for archaic CPUs? IMO sick people... :lol

lingo · April 04, 2009, 02:42:24 PM

"What's wrong with that code? It works,..."

Remember: When Lingo says your code is buggy IT IS BUGGY...
because Lingo is rational rather than emotional like you...
Since we are in "The Campus", I'll try to explain your childish error:

.data?
Null8   dd ?, ?, ?, ?, ?, ?, ?, ?, ?    ; 9*4=36 bytes
hWnd    dd ?
....... dd ?
....... dd ?
hMem    dd ?                            ; next 8*4=32 bytes

.code
mov eax, offset Null8+8*4       ; eax->dword ptr [hWnd-4]
mov [eax+32], esp               ; you save esp in hMem variable
                                ; and that is WRONG
                                ; because you work with memory after hWnd
; Should be:
mov eax, offset Null8           ; eax->dword ptr [Null8]
mov [eax+32], esp               ; you save esp in dword ptr [hWnd-4] variable

"..in contrast to yours"

I'll repeat: my code works OK once because no need to "works repeatedly"
If YOU include it in YOUR test program without knowledge how to use it
I am not guilty about YOUR stupidity...

"44 bytes? You are kidding"

I'll repeat: my code is shortest (just 8 bytes) because when I say CODE
I understand all stuff in .code section rather than in .code plus .data section...and I am not guilty about such stupidity again...

"inc is shorter and equally fast on modern CPUs"
Again stupidity ...because if I am not wrong you have an archaic CPU rather than modern CPU and let's read together:

"16.2 INC and DEC (all Intel processors)
The INC and DEC instructions do not modify the carry flag but they do modify the other
arithmetic flags. Writing to only part of the flags register costs an extra uop on P4 and P4E.
It can cause a partial flags stalls on other Intel processors if a subsequent instruction reads
all the flag bits. Furthermore, it can cause a false dependence on the carry flag from a
previous instruction." by A.Fog

Where are your apologies? :lol

mitchi · April 04, 2009, 03:29:55 PM

Quote from: lingo on April 02, 2009, 02:01:09 AM
"... less than 12 bytes in less than 5 cycles...." :lol

.data
zer dd 0,0,0,0,0,0,0,0, zer
.code
xchg esp, [zer+8*4]
popad
pop esp

00401004 87 25 E4 E4 42 00   xchg esp, dword ptr ds:[42E4E4h]
0040100A 61                  popad
0040100B 5C                  pop esp   ; just 8 bytes

Neat trick ! :U

jj2007 · April 04, 2009, 09:33:23 PM

Quote from: lingo on April 04, 2009, 02:42:24 PM
"What's wrong with that code? It works,..."

Remember: When Lingo says your code is buggy IT IS BUGGY...

Hey, Lingo, you are right! That was incredibly sloppy on my part :red

Quote
I'll repeat: my code works OK once

"44 bytes? You are kidding"

I'll repeat: my code is shortest (just 8 bytes) because when I say CODE
I understand all stuff in .code section rather than in .code plus .data section...and I am not guilty about such stupidity again...

Hey, Lingo, you are wrong! When we talk here about code size, we mean the executable - especially when the code can be used only once. So it's 11 bytes (mine, multiple use) against 44 bytes (yours, single use) :green2

Quote
"inc is shorter and equally fast on modern CPUs"
Again stupidity ...because if I am not wrong you have an archaic CPU rather than modern CPU and let's read together:

"16.2 INC and DEC (all Intel processors)
The INC and DEC instructions do not modify the carry flag but they do modify the other
arithmetic flags. Writing to only part of the flags register costs an extra uop on P4 and P4E.
It can cause a partial flags stalls on other Intel processors if a subsequent instruction reads
all the flag bits. Furthermore, it can cause a false dependence on the carry flag from a
previous instruction." by A.Fog

Where are your apologies? :lol

Where are your timings? :bg

dsouza123 · April 04, 2009, 11:06:38 PM

  xor eax,eax
  cdq
  mov ebx,eax
  mov ecx,eax

7 bytes, only xor modifies flags, only cdq has a dependency.

Is it possible to do this without modifying any flags
and also no extra memory used including the stack ?

  lahf
  xor ebx,ebx
  sahf
  mov ecx,ebx
  mov edx,ebx
  mov eax,ebx

This is the closest I've found, it modifies flags but restores them,
no register stalls that I am aware of.
Are all flags saved/restored or just some ?
Does sahf cause a flag stall ?
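
Note on the flag question above: as far as I know, lahf and sahf only cover SF, ZF, AF, PF and CF, so the overflow flag cleared by the xor is not restored by that snippet. Below is a minimal sketch of one way to carry OF along as well. It assumes a 386 or later (for seto) and that AL can be used as scratch until eax is zeroed at the end. It has not been timed, and mixing AL/AH writes with an AH read may cost a partial-register penalty on some CPUs.

```asm
  lahf              ; AH = SF, ZF, AF, PF, CF
  seto al           ; AL = 1 if OF was set, else 0 (flags unchanged)
  xor ebx, ebx      ; zero ebx (this clobbers the arithmetic flags)
  add al, 127       ; 1+127 overflows a signed byte -> OF=1; 0+127 -> OF=0
  sahf              ; restore SF, ZF, AF, PF, CF from AH; OF is left alone
  mov ecx, ebx      ; mov does not touch flags
  mov edx, ebx
  mov eax, ebx      ; zero eax last, since AH/AL held the saved flags
```

This still uses no memory and no stack, at the price of five extra bytes (seto al is 3 bytes, add al, 127 is 2).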

mitchi · April 04, 2009, 11:22:42 PM

Is there a real way of knowing if INC is faster than ADD , 1?
You could test 1000 INC vs 1000 add, 1. Is that considered a good test?

jj2007 · April 05, 2009, 02:26:46 AM

Quote from: dsouza123 on April 04, 2009, 11:06:38 PM
  xor eax,eax
  cdq
  mov ebx,eax
  mov ecx,eax

7 bytes, only xor modifies flags, only cdq has a dependency.

OK, but the (pretty esoteric) initial question was whether all registers could be cleared. Later, a consensus emerged that esp should be excluded from this challenge :wink

Quote
Is it possible to do this without modifying any flags
and also no extra memory used including the stack ?

Now you are introducing new esoteric rules :naughty:

Quote from: mitchi on April 04, 2009, 11:22:42 PM
Is there a real way of knowing if INC is faster than ADD , 1?
You could test 1000 INC vs 1000 add, 1. Is that considered a good test?

11 cycles for 32*add reg, 1
11 cycles for 32*inc reg

Test yourself:

include \masm32\include\masm32rt.inc
.686
.xmm
include \masm32\macros\timers.asm

	LOOP_COUNT = 1000000

.code
start:
	REPEAT 4
	counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
		REPEAT 8
		  add eax, 1
		  add ebx, 1
		  add ecx, 1
		  add edx, 1
		ENDM
	counter_end
	print str$(eax), 9, "cycles for 32*add reg, 1", 13, 10

	counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
		REPEAT 8
		  inc eax
		  inc ebx
		  inc ecx
		  inc edx
		ENDM
	counter_end
	print str$(eax), 9, "cycles for 32*inc reg", 13, 10, 10
	ENDM
	getkey
	exit
end start

Results for a Celeron M Core CPU. 11/32 means about 0.34 cycles per instruction.
As you can see in the post above that Lingo ridiculed, results for a P4 can be dramatically different.

mitchi · April 05, 2009, 02:39:57 AM

Using your test, INC and add, 1 are evenly matched here (Intel E8500, 2 x 3.16 GHz).

7       cycles for 32*add reg, 1
6       cycles for 32*inc reg

6       cycles for 32*add reg, 1
6       cycles for 32*inc reg

6       cycles for 32*add reg, 1
7       cycles for 32*inc reg

6       cycles for 32*add reg, 1
6       cycles for 32*inc reg

There's no need to make a fuss about this :green

dedndave · April 05, 2009, 04:54:13 PM

INC is a 1-byte instruction for dword registers
that saves time in the queue
