Memory operand problems

mineiro · August 19, 2010, 05:23:01 PM

Sr. dedndave, I come from electronics, hehehe: NOR, XNOR (beer).
With NAND you can build the other Boolean operations (OR, XOR, AND, NOT) as well as NOR, XNOR, and so on. If we had this as a single optimized opcode instead of AND followed by NOT (1+1 cycles)... I thought about this while trying to optimize some code, but it gave me an error that I simply cannot reproduce again.
Thank you for the answer; good times.
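A quick way to check the NAND claim is to build each gate from a single nand() helper and compare it with the ordinary operators. This is a Python sketch for illustration only (the function names are made up here); it works on single bits, not on x86 registers.

```python
def nand(a: int, b: int) -> int:
    """NAND of two bits (each 0 or 1)."""
    return 1 - (a & b)


def not_(a: int) -> int:
    return nand(a, a)                  # NOT a = a NAND a


def and_(a: int, b: int) -> int:
    return not_(nand(a, b))            # AND = NOT (a NAND b)


def or_(a: int, b: int) -> int:
    return nand(not_(a), not_(b))      # De Morgan: a OR b = (NOT a) NAND (NOT b)


def nor_(a: int, b: int) -> int:
    return not_(or_(a, b))


def xor_(a: int, b: int) -> int:
    t = nand(a, b)                     # classic four-NAND XOR
    return nand(nand(a, t), nand(b, t))


def xnor_(a: int, b: int) -> int:
    return not_(xor_(a, b))
```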

mineiro · August 19, 2010, 07:10:13 PM

I did some research on this. The first code I posted doesn't do anything, but this one does what I had in mind.

IF 0  ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                      Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
    include \masm32\include\masm32rt.inc
    .code
start:
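	xor eax,eax		; EAX = 0; MASM encodes this as 33 C0, other tools may emit 31 C0
	xor edx,edx		; EDX = 0
	xor edx,offset start	; EDX = address of start
	xor al,[eax+edx]	; AL = first opcode byte at start (33h or 31h)
	xor al,33h		; result is zero only if that byte was 33h
	je @F
	xor al,33h		; undo the test XOR so AL holds the opcode byte again
	print str$(eax),09h
	print "cannot say about your compiler",13,10
	inkey
	exit
@@:
	xor al,33h		; AL was 0, so this restores 33h (51 decimal)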
	print str$(eax),09h
	print " you compiled with Masm32?(R), opcode 33h", 13, 10
    inkey
    exit
end start

regards.

clive · August 19, 2010, 08:01:58 PM

The 0x31 vs. 0x33 difference is a source vs. destination issue, and comes from the mod/rm encoding:

 000000BB  31 05 00000000 R	        xor     dword ptr [start],eax
 000000C1  33 05 00000000 R	        xor     eax,dword ptr [start]

The same goes for the 11yyyxxx mod/rm encoding for reg,reg: the 31 variant is xor regx,regy, while the 33 variant is xor regy,regx.

000000BB 310500000000           xor     [_start],eax
000000C1 330500000000           xor     eax,[_start]
000000C7 31C0                   xor     eax,eax
000000C9 31C9                   xor     ecx,ecx
000000CB 33C0                   xor     eax,eax
000000CD 33C9                   xor     ecx,ecx
000000CF 31CA                   xor     edx,ecx
000000D1 33CA                   xor     ecx,edx

Like many architectures, the x86 is rife with multiple ways to encode the equivalent function.
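To see where 31 C0 versus 33 C0 comes from, here is a small Python sketch that packs the mod/rm byte (mod in bits 7-6, reg in bits 5-3, rm in bits 2-0) for the two XOR opcodes. It only covers 32-bit register operands and the plain [disp32] form (mod=00, rm=101 in 32-bit code); the helper names are hypothetical.

```python
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]  # register numbers 0..7


def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack a mod/rm byte: mod in bits 7-6, reg in bits 5-3, rm in bits 2-0."""
    return (mod << 6) | (reg << 3) | rm


def encode_xor_reg_reg(dst: str, src: str, opcode: int) -> bytes:
    """Encode XOR dst,src with 31h (XOR r/m32,r32) or 33h (XOR r32,r/m32)."""
    d, s = REGS.index(dst), REGS.index(src)
    if opcode == 0x31:   # destination goes in rm, source in reg
        return bytes([0x31, modrm(0b11, s, d)])
    if opcode == 0x33:   # destination goes in reg, source in rm
        return bytes([0x33, modrm(0b11, d, s)])
    raise ValueError("only 31h and 33h are handled in this sketch")


def encode_xor_disp32(reg: str, disp: int, to_memory: bool) -> bytes:
    """XOR [disp32],reg (31 /r) or XOR reg,[disp32] (33 /r); mod=00, rm=101 means disp32."""
    opcode = 0x31 if to_memory else 0x33
    return bytes([opcode, modrm(0b00, REGS.index(reg), 0b101)]) + disp.to_bytes(4, "little")
```

For example, encode_xor_reg_reg("edx", "ecx", 0x31) and encode_xor_reg_reg("ecx", "edx", 0x33) produce the same mod/rm byte CA, matching the 31CA and 33CA lines in the listing.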

mineiro · August 19, 2010, 08:10:30 PM

Now I get the point, Sr. clive. So how can we make assumptions (as the Intel manual says)? Speaking in a language that is not my native one is difficult for me.
How can we predict (or make predictions about the processor) to get faster code?
Thank you, Sr.
Added later, because I got an English dictionary:
I am now confused by one factor (the xor ambiguity). If the left operand (in the 8086 architecture) always receives the right operand, why does this happen?
xor left, right
xor destination, source
Thanks in advance, Sr's.

clive · August 19, 2010, 08:38:00 PM

When it cost cycles to read opcode bytes, the short forms ran faster; these days that is mostly irrelevant. Increasing code density improves cache utilization, but branch targets like to sit on cache-line boundaries. Assemblers and compilers usually pick the shortest form. The long form may be used when the assembler doesn't know the value (e.g. it is imported from another object module) and the linker has to fix it up, since the linker can't revert to the short form.

Short and long forms of ADD EAX,1

000000D3 83C001                 add     eax,1
000000D6 0501000000             add     eax,1

Short and long forms of ADD EAX,-1

000000DB 83C0FF                 add     eax,0FFFFFFFFh
000000DE 05FFFFFFFF             add     eax,0FFFFFFFFh
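The choice an assembler makes here can be sketched as below (Python, illustration only; force_long is a made-up flag standing in for the case where the value is only known at link time). Opcode 83 /0 takes an 8-bit immediate that the CPU sign-extends, so it can only encode -128..127; opcode 05 is the accumulator form with a full 32-bit immediate.

```python
def encode_add_eax_imm(imm: int, force_long: bool = False) -> bytes:
    """Encode ADD EAX,imm.

    Short form: 83 /0 ib (imm8, sign-extended to 32 bits), 3 bytes.
    Long form:  05 id    (ADD EAX,imm32 accumulator form), 5 bytes.
    """
    imm32 = imm & 0xFFFFFFFF                        # the value as the CPU sees it
    signed = imm32 - (1 << 32) if imm32 & 0x80000000 else imm32
    if not force_long and -128 <= signed <= 127:
        return bytes([0x83, 0xC0, signed & 0xFF])   # mod/rm C0: mod=11, /0, rm=EAX
    return bytes([0x05]) + imm32.to_bytes(4, "little")
```

That sign extension is why -1 (0FFFFFFFFh) still fits the 3-byte form, while +128 needs the 5-byte one. (There is also the general 81 /0 imm32 form, 81 C0 plus four bytes, which is one byte longer than 05 for EAX.)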

clive · August 19, 2010, 08:52:09 PM

Quote from: mineiro
I am now confused by one factor (the xor ambiguity). If the left operand (in the 8086 architecture) always receives the right operand, why does this happen?
xor left, right
xor destination, source

The 8086, "the machine", has one general way to encode a memory address (or register) together with a register. Depending on the opcode, this means REG to MEM/REG or REG from MEM/REG. The assembler syntax always reads "right goes to left" to present a common view to the human; the assembler picks which machine code to use to encode the instruction.

Observe that the two machine instructions are practically identical: both specify the same register and the same memory operand. The first has the memory as the destination, the second has the memory as the source.

000000BB 310500000000           xor     [_start],eax
000000C1 330500000000           xor     eax,[_start]

The 68000 assembler has the reverse view, but again the machine code has a mostly common encoding for the effective address, and separate TO/FROM versions of the opcode encoding.

These are the reverse of each other

000000CF 31CA                   xor     edx,ecx ; EDX = EDX eor ECX
000000D1 33CA                   xor     ecx,edx ; ECX = ECX eor EDX

These are simply equivalent

000000C7 31C0                   xor     eax,eax ; EAX = EAX eor EAX
000000C9 31C9                   xor     ecx,ecx ; ECX = ECX eor ECX
000000CB 33C0                   xor     eax,eax ; EAX = EAX eor EAX
000000CD 33C9                   xor     ecx,ecx ; ECX = ECX eor ECX
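Going the other way, a decoder only needs bit 1 of the opcode (the 8086 direction bit, set in 33h and clear in 31h) to decide whether the reg field is the destination or the source. A Python sketch limited to the forms in this thread (32-bit registers and plain [disp32]); the function name is hypothetical:

```python
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_xor(code: bytes) -> str:
    """Decode 31 /r or 33 /r in 32-bit code; reg,reg and [disp32] operands only."""
    opcode, modrm = code[0], code[1]
    if opcode not in (0x31, 0x33):
        raise ValueError("not a 31h/33h XOR")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    reg_name = REGS[reg]
    if mod == 0b11:                       # register operand in rm
        rm_name = REGS[rm]
    elif mod == 0b00 and rm == 0b101:     # absolute 32-bit displacement
        disp = int.from_bytes(code[2:6], "little")
        rm_name = f"[{disp:08X}h]"
    else:
        raise ValueError("addressing mode not handled in this sketch")
    to_reg = bool(opcode & 0x02)          # direction bit set: reg is the destination
    dst, src = (reg_name, rm_name) if to_reg else (rm_name, reg_name)
    return f"xor {dst},{src}"
```

Both 31 C0 and 33 C0 decode to xor eax,eax, which is exactly why those pairs in the listing are equivalent.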

Interestingly the 68000 has the equivalent to MOVE MEM/REG to MEM/REG, so you can get a memory-to-memory operation that does not use a register.

mineiro · August 19, 2010, 09:13:31 PM

Now I understand, thank you.
When I saw "convert signed dword ASCII to hex", I thought about it this way:
Build the two's complement of one byte (to combine/organize later), transform it into Boolean algebra, and optimize it with a Karnaugh map. That gives the bit-level circuit; then I translate it to the PC.
When I move to the PC, I cannot get the fastest code by thinking this way.
So thanks to everybody who posted; I really appreciate your words.
regards.

clive · August 19, 2010, 09:33:31 PM

I think you are confusing the clever trick of using LEA for addition with doing memory accesses with XOR. The point is that LEA doesn't access memory at all; it just computes the "address" that would have been accessed, and thus won't generate faults.

Multiply EAX by 10, then add the ASCII digit in EBX:

  ADD EAX,EAX ; *2
  LEA EAX,[EAX+EAX*4] ; *5

  LEA EAX,[EAX+EBX-30h] ; EBX = ASCII digit '0'..'9'
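A Python model of that sequence applied digit by digit, as a decimal string parser would use it, with 32-bit wraparound done by masking. It assumes the input contains only '0'..'9' and does no validation; the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def atoi_lea_style(text: str) -> int:
    """Parse decimal digits using the ADD/LEA/LEA steps (32-bit wraparound)."""
    eax = 0
    for ch in text:
        ebx = ord(ch)                       # ASCII '0'..'9' (not validated here)
        eax = (eax + eax) & MASK32          # ADD EAX,EAX           ; *2
        eax = (eax + eax * 4) & MASK32      # LEA EAX,[EAX+EAX*4]   ; *5, so *10 overall
        eax = (eax + ebx - 0x30) & MASK32   # LEA EAX,[EAX+EBX-30h] ; add the digit value
    return eax
```

Because both LEA steps leave the flags alone, a loop like this can keep a flag from an earlier compare live across the arithmetic.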

mineiro · August 19, 2010, 09:51:10 PM

Yes, I figured that.

xor eax,30h ; same as sub 30h for ASCII '0'..'9'
shl eax,1   ; EAX = digit*2
xor ecx,eax ; add it to ECX (assumes ECX was zeroed)
shl eax,2   ; *4, so EAX = digit*8
add ecx,eax ; ECX = digit*2 + digit*8 = digit*10

Or other ways, like "multiply by 5 == divide by 2" in decimal (x*5 = x*10/2), and rearrange afterwards.
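The trick relies on '0'..'9' being 30h..39h: the low nibble holds the digit and bits 4-5 are set, so XOR 30h clears exactly those bits, the same as SUB 30h. A Python check of both that and the shift sequence, assuming ECX starts at zero as the code requires (the function name is hypothetical):

```python
def digit_times_ten(ch: str) -> int:
    """Mimic: xor eax,30h / shl eax,1 / xor ecx,eax / shl eax,2 / add ecx,eax."""
    eax = ord(ch)
    ecx = 0            # the sequence assumes ECX starts at zero
    eax ^= 0x30        # same as SUB 30h for '0'..'9' (30h..39h)
    eax <<= 1          # digit * 2
    ecx ^= eax         # ECX was 0, so XOR acts as a move here
    eax <<= 2          # digit * 8
    ecx += eax         # digit * 2 + digit * 8 = digit * 10
    return ecx
```

Outside that range the two differ: for 'A' (41h), XOR 30h gives 71h while SUB 30h gives 11h.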

mineiro · August 19, 2010, 11:17:58 PM

I don't always write the word "Sr.", but I have it in mind, OK.
I re-read what you posted, clive (and learned a lot from the words here), so in your example:

000000D3 83C001                 add     eax,1
000000D6 0501000000             add     eax,1

Do we get more speed using this than a simple "inc eax"?

004018D0  40            INC EAX

Another question: how did you assemble it to get that long form? In my tests I can do it with "db 05h,01h,00h,00h,00h"; is this the way?
Oh, yes, and what does "align 16" mean?
I understand align 16 as fitting your opcode so that it produces 16 bytes in the end. If this is true, maybe we could write generic pseudo-algorithm code that is always aligned, and get better speed that way?
regards.
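On the ALIGN 16 question: in MASM, ALIGN does not pad each instruction to 16 bytes. It inserts filler bytes (no-op instructions in a code segment) so that the next instruction or data item starts at an address that is a multiple of 16. A Python sketch of how many filler bytes that takes, assuming a power-of-two boundary as MASM requires (the function name is hypothetical):

```python
def align_padding(address: int, boundary: int = 16) -> int:
    """Filler bytes needed so the next item starts on a multiple of boundary."""
    if boundary <= 0 or boundary & (boundary - 1):
        raise ValueError("boundary must be a power of two")
    return (-address) % boundary
```

For example, an instruction that would otherwise start at 000000D3 is pushed to 000000E0 after 13 filler bytes, which is why ALIGN is usually placed only before hot loop or branch targets rather than everywhere.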

clive · August 20, 2010, 01:29:24 AM

ADD beats INC on most CPUs in recent memory. However, INC does not change the carry flag, so it has its uses. LEA doesn't change any flags.

Microsoft (R) Macro Assembler Version 6.15.8803		    08/19/10 20:22:07
test40.asm						     Page 1 - 1


				        .386
				        .MODEL FLAT
 00000000			        .CODE

 00000000			_start:

 00000000  83 C0 01		        add     eax,1
 00000003  83 C0 04		        add     eax,4
 00000006  05 00000001		        add     eax,dword ptr 1
 0000000B  05 00000004		        add     eax,dword ptr 4

				        END     _start
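The flag difference can be modelled in a few lines of Python. Only CF and ZF are tracked here (ADD and INC both also update OF, SF, AF and PF); the function names are made up for the example.

```python
MASK32 = 0xFFFFFFFF


def add_eax_1(eax):
    """ADD EAX,1 -> (result, CF, ZF); CF comes from the carry out of bit 31."""
    full = eax + 1
    result = full & MASK32
    return result, int(full > MASK32), int(result == 0)


def inc_eax(eax, cf):
    """INC EAX -> (result, CF, ZF); same result and ZF, but CF passes through unchanged."""
    result = (eax + 1) & MASK32
    return result, cf, int(result == 0)


def lea_eax_plus_1(eax, cf, zf):
    """LEA EAX,[EAX+1] -> (result, CF, ZF); no flags are touched at all."""
    return (eax + 1) & MASK32, cf, zf
```

Wrapping 0FFFFFFFFh to 0 is where they part ways: ADD sets CF, INC keeps whatever CF was, and LEA keeps both CF and ZF.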

mineiro · August 20, 2010, 02:06:32 AM

Thank you. Now I'm reading some optimization docs (Agner and Mark). My weak point is not the little logic but translating it into the PC world.
I learned a lot from this topic, plenty of brain food for one day. As Kasparov says, there are minimalist and maximalist players. I've reached Karpov; now I want a challenge with Kasparov.
Case closed.
regards.

dedndave · August 20, 2010, 02:07:38 AM

we all have to start someplace
you are doing great :U
i used to play a little chess :P

mineiro · August 20, 2010, 03:09:55 AM

and of course, thank you (with much respect) minimalistic dedndave.
http://chessprogramming.wikispaces.com/General+Setwise+Operations
case closed -<(no more xor ambiguity)

dedndave · August 20, 2010, 03:21:05 AM

http://computer-chess.org/doku.php?id=home
