Is there a way to set all registers to 0? (without explicitly assigning them val

Started by starsiege, March 30, 2009, 05:36:22 PM


BlackVortex

@herge
If you tried the snippet that says "lingo second use" it was supposed to fail. That's what JJ was proving, hehe

lingo

"That's what JJ was proving"

JJ posted wrong code (mov eax, offset Null8+8*4),
my code is shortest and OK to run  once because no need to "works repeatedly"... :lol
As you can see his level is to "improve" add eax, 1 with inc eax (see above) :lol
JJ's competitor will be tetsu-jp who invented strlen with "SCASW, SCASD, and SCASQ" :lol

jj2007

Quote from: lingo on April 04, 2009, 08:12:57 AM
JJ posted wrong code (mov eax, offset Null8+8*4)
What's wrong with that code? It works, in contrast to yours ::)

Quote
my code is shortest and OK to run  once because no need to "works repeatedly"... :lol
44 bytes? You are kidding :bg

Quote
As you can see his level is to "improve" add eax, 1 with inc eax (see above) :lol

inc is shorter and equally fast on modern CPUs :green

Quote from: lingo on March 14, 2009, 02:16:44 PM
Who makes code optimization for archaic CPUs?  IMO sick people... :lol

lingo

"What's wrong with that code? It works,..."

Remember: When Lingo say your code is buggy IT IS BUGGY...
because Lingo is rational rather than emotional like you...
Since we are in "The Campus", I'll try to explain your childish error:

.data?
Null8 dd ?, ?, ?, ?, ?, ?, ?, ?, ?   ; 9*4=36 bytes
hWnd dd ?
....... dd ?
....... dd ?
hMem dd ?                            ; next 8*4=32 bytes

.code
mov eax, offset Null8+8*4   ; eax->dword ptr [hWnd-4]
mov [eax+32], esp           ; you save esp in hMem variable
                            ; and that is WRONG
                            ; because you work with memory after hWnd
Should be:
mov eax, offset Null8       ; eax->dword ptr [Null8]
mov [eax+32], esp           ; you save esp in dword ptr [hWnd-4] variable


"..in contrast to yours"

I'll repeat: my code works OK once because no need to "works repeatedly"
If YOU include it in YOUR test program without knowledge how to use it
I am not guilty about YOUR stupidity...

"44 bytes? You are kidding"

I'll repeat: my code is shortest (just 8 bytes) because when I say CODE
I understand all stuff in .code section rather than in .code plus .data section...and I am not guilty about such stupidity again...

"inc is shorter and equally fast on modern CPUs"
Again stupidity ...because if I am not wrong you have an archaic CPU rather than  modern CPU and let read together:

"16.2 INC and DEC (all Intel processors)
The INC and DEC instructions do not modify the carry flag but they do modify the other
arithmetic flags. Writing to only part of the flags register costs an extra uop on P4 and P4E.
It can cause a partial flags stalls on other Intel processors if a subsequent instruction reads
all the flag bits. Furthermore, it can cause a false dependence on the carry flag from a
previous instruction." by A.Fog

Where are your apologies? :lol

mitchi

Quote from: lingo on April 02, 2009, 02:01:09 AM
"... less than 12 bytes in less than 5 cycles...." :lol

.data
zer  dd 0,0,0,0,0,0,0,0, zer

.code
xchg esp, [zer+8*4]
popad
pop esp

00401004 87 25 E4 E4 42 00           xchg        esp, dword ptr ds:[42E4E4h]
0040100A 61                          popad           
0040100B 5C                          pop         esp     ; just 8 bytes




Neat trick ! :U
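
For anyone following along, here is one way to read the snippet above, step by step. It is only an annotated sketch, assuming a flat 32-bit MASM program. The final mov is not from the thread: it is a hypothetical re-arm step, added to show why the original works only once.

```asm
.data
zer  dd 0,0,0,0,0,0,0,0     ; eight zero dwords, one per popad slot
     dd zer                 ; ninth dword: pointer back to zer

.code
xchg esp, [zer+8*4]   ; esp -> zer, old esp parked in the ninth dword
popad                 ; edi, esi, ebp, (esp slot skipped), ebx, edx, ecx, eax = 0
pop esp               ; esp now points at the ninth dword: reload the old esp

; The ninth dword now holds the old esp instead of the address of zer,
; so a second pass would pop the registers from the real stack.
; Hypothetical re-arm (an assumption, not posted in the thread):
mov dword ptr [zer+8*4], offset zer   ; touches no register and no flags
```

The three original instructions encode to 6 + 1 + 1 = 8 bytes, and the nine data dwords add 36 more, which is where the 44-byte figure comes from. The re-arm mov would cost another 10 bytes of code.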

jj2007

Quote from: lingo on April 04, 2009, 02:42:24 PM
"What's wrong with that code? It works,..."

Remember: When Lingo say your code is buggy IT IS BUGGY...

Hey, Lingo, you are right! That was incredibly sloppy on my part :red

Quote
I'll repeat: my code works OK once

"44 bytes? You are kidding"

I'll repeat: my code is shortest (just 8 bytes) because when I say CODE
I understand all stuff in .code section rather than in .code plus .data section...and I am not guilty about such stupidity again...

Hey, Lingo, you are wrong! When we talk here about code size, we mean the executable - especially when the code can be used only once. So it's 11 bytes (mine, multiple use) against 44 bytes (yours, single use) :green2

Quote
"inc is shorter and equally fast on modern CPUs"
Again stupidity ...because if I am not wrong you have an archaic CPU rather than  modern CPU and let read together:

"16.2 INC and DEC (all Intel processors)
The INC and DEC instructions do not modify the carry flag but they do modify the other
arithmetic flags. Writing to only part of the flags register costs an extra uop on P4 and P4E.
It can cause a partial flags stalls on other Intel processors if a subsequent instruction reads
all the flag bits. Furthermore, it can cause a false dependence on the carry flag from a
previous instruction." by A.Fog

Where are your apologies? :lol

Where are your timings? :bg

dsouza123


  xor eax,eax
  cdq
  mov ebx,eax
  mov ecx,eax


7 bytes, only xor modifies flags, only cdq has a dependency.

Is it possible to do this without modifying any flags
and also no extra memory used including the stack ?


  lahf
  xor ebx,ebx
  sahf
  mov ecx,ebx
  mov edx,ebx
  mov eax,ebx


This is the closest I've found: it modifies flags but restores them,
with no register stalls that I am aware of.
Are all flags saved/restored or just some ?
Does sahf cause a flag stall ?
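
One possible answer to the first question, given as a sketch and not as a measured result: lahf and sahf only cover SF, ZF, AF, PF and CF, so OF is not preserved by the sequence above. Since mov never writes the flags, the xor can be swapped for a mov with an immediate, at the cost of size. The choice of eax as the source register is arbitrary.

```asm
  mov eax, 0      ; 5 bytes, immediate form, flags untouched
  mov ebx, eax    ; 2 bytes, flags untouched
  mov ecx, eax    ; 2 bytes
  mov edx, eax    ; 2 bytes
```

That is 11 bytes for four registers, with no flags modified and no memory or stack access. The three register moves all depend on the first instruction.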

mitchi

Is there a real way of knowing if INC is faster than ADD , 1?
You could test 1000 INC vs 1000 add, 1. Is that considered a good test?

jj2007

Quote from: dsouza123 on April 04, 2009, 11:06:38 PM

  xor eax,eax
  cdq
  mov ebx,eax
  mov ecx,eax


7 bytes, only xor modifies flags, only cdq has a dependency.

OK, but the (pretty esoteric) initial question was whether all registers could be cleared. Later, a consensus emerged that esp should be excluded from this challenge :wink

Quote
Is it possible to do this without modifying any flags
and also no extra memory used including the stack ?

Now you are introducing new esoteric rules :naughty:

Quote from: mitchi on April 04, 2009, 11:22:42 PM
Is there a real way of knowing if INC is faster than ADD , 1?
You could test 1000 INC vs 1000 add, 1. Is that considered a good test?

11      cycles for 32*add reg, 1
11      cycles for 32*inc reg

Test yourself:
include \masm32\include\masm32rt.inc
.686
.xmm
include \masm32\macros\timers.asm   ; counter_begin / counter_end macros

LOOP_COUNT = 1000000   ; iterations per timed block

.code
start:
REPEAT 4   ; run the whole comparison four times
counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
REPEAT 8   ; 8 * 4 = 32 add instructions
  add eax, 1
  add ebx, 1
  add ecx, 1
  add edx, 1
ENDM
counter_end   ; cycle count is printed from eax below
print str$(eax), 9, "cycles for 32*add reg, 1", 13, 10

counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
REPEAT 8   ; 8 * 4 = 32 inc instructions
  inc eax
  inc ebx
  inc ecx
  inc edx
ENDM
counter_end
print str$(eax), 9, "cycles for 32*inc reg", 13, 10, 10
ENDM
getkey
exit
end start


Results are for a Celeron M Core CPU. 11/32 means about 0.34 cycles per instruction.
As you can see in the post above that Lingo ridiculed, results for a P4 can be dramatically different.

mitchi

Using your test, inc and add reg, 1 are evenly matched here (Intel E8500, 2 x 3.16 GHz).

7       cycles for 32*add reg, 1
6       cycles for 32*inc reg

6       cycles for 32*add reg, 1
6       cycles for 32*inc reg

6       cycles for 32*add reg, 1
7       cycles for 32*inc reg

6       cycles for 32*add reg, 1
6       cycles for 32*inc reg


There's no need to make a fuss about this  :green

dedndave

INC is a 1-byte instruction for dword registers
that saves time in the queue
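
For reference, the size difference in 32-bit code can be read off the encodings. The listing below is a sketch of what an assembler typically emits for these instructions. The byte values are the standard IA-32 encodings, and eax is picked only for illustration.

```asm
inc eax          ; 40          1 byte  (short form, 32-bit mode)
add eax, 1       ; 83 C0 01    3 bytes (imm8, sign-extended)
dec eax          ; 48          1 byte
sub eax, 1       ; 83 E8 01    3 bytes (imm8, sign-extended)
```

In the timing test above, that means 32 bytes for the inc block against 96 bytes for the add block. Note that in 64-bit mode the 40h..4Fh opcodes are REX prefixes, so the one-byte form is not available there.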