Memory operand problems

Started by mineiro, August 19, 2010, 05:18:18 AM

mineiro

Sr dedndave, I come from electronics, hehehe: nor, xnor (beer).
With NAND you can build the other boolean functions, "or, xor, and, not" as well as "nor, xnor, ...". If only we had this as one very optimized opcode instead of NOT plus AND (1+1 cycles). I thought of this while trying to optimize some code, but it gave me an error that I simply cannot reproduce again.
Thank you for the answer, and for the good times in this session.
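
As a small illustration of the NAND remark above, here is a sketch in Python rather than MASM. The helper names are made up for this example and each function works on single bits (0 or 1) only.

```python
# Sketch only: every gate below is built from nand() alone.
# All names are illustrative and do not come from the thread.

def nand(a, b):
    """NAND of two single bits (0 or 1)."""
    return 1 - (a & b)

def not_(a):
    return nand(a, a)

def and_(a, b):
    return not_(nand(a, b))

def or_(a, b):
    return nand(not_(a), not_(b))

def xor_(a, b):
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

def nor_(a, b):
    return not_(or_(a, b))

def xnor_(a, b):
    return not_(xor_(a, b))
```

On x86 the nearest equivalent is still two instructions (AND followed by NOT), which is the 1+1 cycles mentioned above.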

mineiro

I did some research on this. The first code that I posted does nothing useful, but this one does what I had in mind.

IF 0  ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                      Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
    include \masm32\include\masm32rt.inc
    .code
start:
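    xor eax,eax             ; first instruction at "start"; its opcode byte is what gets tested
    xor edx,edx
    xor edx,offset start    ; edx = address of start
    xor al,[eax+edx]        ; al = first opcode byte of the "xor eax,eax" above
    xor al,33h              ; result is zero if that byte is 33h
    je @F
    xor al,33h              ; restore the byte that was read
    print str$(eax),09h
    print "cannot say about your compiler",13,10
    inkey
    exit
@@:
    xor al,33h              ; restore the byte that was read (33h)
    print str$(eax),09h
    print " you compiled with Masm32?(R), opcode 33h", 13, 10
    inkey
    exit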
end start

regards.

clive

The 0x31 vs 0x33 difference is a source vs destination issue in the mod/rm encoding:

000000BB  31 05 00000000 R         xor     dword ptr [start],eax
000000C1  33 05 00000000 R         xor     eax,dword ptr [start]


The same applies to the 11yyyxxx mod/rm encoding for reg,reg. The 31 variant is xor regx,regy; the 33 variant is xor regy,regx.

000000BB 310500000000           xor     [_start],eax
000000C1 330500000000           xor     eax,[_start]
000000C7 31C0                   xor     eax,eax
000000C9 31C9                   xor     ecx,ecx
000000CB 33C0                   xor     eax,eax
000000CD 33C9                   xor     ecx,ecx
000000CF 31CA                   xor     edx,ecx
000000D1 33CA                   xor     ecx,edx
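
To make the 11yyyxxx rule concrete, here is a rough Python sketch that decodes the two-byte reg,reg forms from the listing above. The function name is made up for illustration, and it assumes 32-bit operand size and handles nothing but opcodes 31h and 33h with mod = 11.

```python
# Sketch only: decodes "31 /r" and "33 /r" with a register-to-register ModRM byte.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_xor_reg_reg(code):
    """Return the assembler text for a 2-byte reg,reg XOR encoding."""
    opcode, modrm = code[0], code[1]
    if opcode not in (0x31, 0x33):
        raise ValueError("only opcodes 31h and 33h are handled")
    mod = modrm >> 6          # top two bits
    reg = (modrm >> 3) & 7    # yyy
    rm = modrm & 7            # xxx
    if mod != 0b11:
        raise ValueError("only the reg,reg form (mod = 11) is handled")
    if opcode == 0x31:        # XOR r/m32, r32: the rm field is the destination
        dst, src = rm, reg
    else:                     # XOR r32, r/m32: the reg field is the destination
        dst, src = reg, rm
    return f"xor {REG32[dst]},{REG32[src]}"
```

For example, `decode_xor_reg_reg(bytes.fromhex("31CA"))` gives `xor edx,ecx`, while the same ModRM byte under opcode 33h gives `xor ecx,edx`.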


Like many architectures, the x86 is rife with multiple ways to encode equivalent functions.
It could be a random act of randomness. Those happen a lot as well.

mineiro

Now I get the point, Sr clive. So how can we make assumptions (like it says in the Intel manual)? Well, writing in a language that is not my native one is difficult for me.
How can we predict (or make some predictions about) what the processor does, to get faster code?
Thank you, Sr.
Added later, because I got an English dictionary:
I am now confused by some factors (the xor ambiguity). If the left operand (in the 8086 architecture) always receives the right operand, why does this happen?
xor left, right
xor destin, source.
Thanks in advance, Sr's.

clive

When it cost cycles to read opcode bytes, the short forms ran faster; these days it is mostly irrelevant. Increasing code density will improve cache utilization, but branch targets like to be on cache-line boundaries. Assemblers and compilers usually pick the shortest form. The long form may be used when the assembler doesn't know the value (for example, one imported from another object): the linker gets to fix it up, but can't revert to the short form.

Short and long forms of ADD EAX,1

000000D3 83C001                 add     eax,1
000000D6 0501000000             add     eax,1


Short and long forms of ADD EAX,-1

000000DB 83C0FF                 add     eax,0FFFFFFFFh
000000DE 05FFFFFFFF             add     eax,0FFFFFFFFh
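
A rough Python sketch of how an assembler might choose between the two encodings listed above. The function name and the `force_long` flag are made up for illustration; only the opcode bytes come from the listings.

```python
import struct

def encode_add_eax(imm, force_long=False):
    """Encode ADD EAX,imm as raw bytes, preferring the short form when it fits."""
    imm &= 0xFFFFFFFF
    # Reinterpret the 32-bit value as signed to test the imm8 range.
    signed = imm - (1 << 32) if imm & 0x80000000 else imm
    if not force_long and -128 <= signed <= 127:
        # 83 /0 ib: ADD r/m32, sign-extended imm8 (ModRM C0 selects EAX)
        return bytes([0x83, 0xC0, signed & 0xFF])
    # 05 id: ADD EAX, imm32 (little-endian immediate)
    return b"\x05" + struct.pack("<I", imm)
```

`encode_add_eax(-1)` yields the three bytes 83 C0 FF, and `encode_add_eax(-1, force_long=True)` yields 05 FF FF FF FF. A value outside -128..127 can only use the long form.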

clive

Quote from: mineiro
I am now confused by some factors (the xor ambiguity). If the left operand (in the 8086 architecture) always receives the right operand, why does this happen?
xor left, right
xor destin, source.

The 8086, "the machine", has one general way to encode a memory address (or register) together with a register. Depending on the opcode, this refers to either REG to MEM/REG or REG from MEM/REG. The assembler syntax always reads "right goes to left" to present a common view to the human; the assembler picks the machine code used to encode the instruction.

Observe that the two machine instructions are practically identical, as both name the same register and the same memory operand. The first has the memory as the destination, the second has the memory as the source.

000000BB 310500000000           xor     [_start],eax
000000C1 330500000000           xor     eax,[_start]


The 68000 assembler has the reverse view (source first, destination second), but again the machine code has a mostly common encoding for the effective address, and separate TO/FROM versions of the opcode encoding.

These are the reverse of each other

000000CF 31CA                   xor     edx,ecx ; EDX = EDX eor ECX
000000D1 33CA                   xor     ecx,edx ; ECX = ECX eor EDX


These are simply equivalent

000000C7 31C0                   xor     eax,eax ; EAX = EAX eor EAX
000000C9 31C9                   xor     ecx,ecx ; ECX = ECX eor ECX
000000CB 33C0                   xor     eax,eax ; EAX = EAX eor EAX
000000CD 33C9                   xor     ecx,ecx ; ECX = ECX eor ECX
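
A small Python model of what these encodings do to the registers, as a sketch only. The function name is made up, registers are a plain list indexed eax=0, ecx=1, edx=2 and so on, and only the reg,reg forms of opcodes 31h and 33h are modelled.

```python
def run_xor(code, regs):
    """Apply a 2-byte reg,reg XOR (opcode 31h or 33h) to a list of 8 register values."""
    opcode, modrm = code
    if opcode not in (0x31, 0x33) or modrm >> 6 != 0b11:
        raise ValueError("only 31h/33h with mod = 11 are modelled")
    reg, rm = (modrm >> 3) & 7, modrm & 7
    # Opcode 31h writes to the rm field, opcode 33h writes to the reg field.
    dst, src = (rm, reg) if opcode == 0x31 else (reg, rm)
    out = list(regs)
    out[dst] = regs[dst] ^ regs[src]
    return out
```

With different registers in the two fields, 31 CA changes EDX while 33 CA changes ECX. With the same register in both fields (31 C0 and 33 C0), the destination and source coincide, so both encodings leave identical results.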


Interestingly, the 68000 has the equivalent of MOVE MEM/REG to MEM/REG, so you can get a memory-to-memory operation that does not go through a register.

mineiro

Now I understand, thank you.
When I saw the "convert signed dword ascii to hex" topic, I thought about it this way:
Build the two's complement of one byte (to combine/organize later), transform it into boolean algebra, and optimize it with a Karnaugh map. OK, that is the bit circuit; now I translate it to the PC.
When I come to the PC, I cannot get the fastest code by thinking this way.
So, thanks to everybody who has posted; I much appreciate your words.
regards.

clive

I think you are confusing the clever tricks that use LEA to do addition with doing address accesses with XOR. The point is that LEA doesn't access memory; it just computes the "address" that would have been accessed, and thus won't generate faults.

Multiply EAX by 10, then add the ASCII digit in EBX


  ADD EAX,EAX ; *2
  LEA EAX,[EAX+EAX*4] ; *5

  LEA EAX,[EAX+EBX-30h] ; EBX = ASCII '0' - '9'
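
The arithmetic of those three instructions can be checked with a rough Python model. This is a sketch only: the function names are made up, and 32-bit wraparound is imitated with a mask.

```python
MASK32 = 0xFFFFFFFF

def accumulate_digit(eax, ebx):
    """Model one step: EAX = EAX*10 + (EBX - '0'), wrapping at 32 bits."""
    eax = (eax + eax) & MASK32          # ADD EAX,EAX            ; *2
    eax = (eax + eax * 4) & MASK32      # LEA EAX,[EAX+EAX*4]    ; *5, so *10 in total
    eax = (eax + ebx - 0x30) & MASK32   # LEA EAX,[EAX+EBX-30h]  ; add the digit value
    return eax

def ascii_to_dword(text):
    """Run the step over a string of ASCII digits, starting from EAX = 0."""
    eax = 0
    for ch in text:
        eax = accumulate_digit(eax, ord(ch))
    return eax
```

`ascii_to_dword("1234")` returns 1234. No validation is done, so a non-digit character simply produces a wrong sum, just as the instructions would.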


mineiro

Yes, I figured that.

xor eax,30h   ; same as sub 30h for ASCII '0'-'9'
shl eax,1     ; mul by 2
xor ecx,eax   ; add it to zeroed ecx
shl eax,2     ; mul by 8 in total (this step is mul by 4)
add ecx,eax   ; ecx = digit*2 + digit*8 = digit*10

Or other ways, like "mul by 5 == div by 2" in decimal, and rearrange afterwards.
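
A quick Python check of the xor/shl sequence above, as a sketch only. The function name is made up, and ECX is assumed to be zero on entry, as the comment in the code says.

```python
def digit_times_ten(ch):
    """Model the sequence for one ASCII digit character; returns ECX."""
    eax = ord(ch)
    ecx = 0          # assumption: ecx was zeroed beforehand
    eax ^= 0x30      # xor eax,30h  ; '0'..'9' -> 0..9
    eax <<= 1        # shl eax,1    ; digit*2
    ecx ^= eax       # xor ecx,eax  ; same as add, because ecx is zero
    eax <<= 2        # shl eax,2    ; digit*8
    ecx += eax       # add ecx,eax  ; digit*2 + digit*8
    return ecx
```

The XOR with 30h works as a subtraction only because the digit values 0 to 9 never use bits 4 and 5, which are exactly the bits set in 30h.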

mineiro

I don't always write the word "Sr", but I have it in mind, OK.
I re-read what you posted, clive (and learned much from the words here). So, in your example:

000000D3 83C001                 add     eax,1
000000D6 0501000000             add     eax,1

Do we get more speed using this than using a simple "inc eax"?

004018D0  40            INC EAX

And another question: how did you assemble it to get the long form? In my tests I can do it with "db 05h,01h,00h,00h,00h"; is this the way?
Oh, yes, and what does "align 16" mean?
I understand align 16 as fitting your opcodes in such a way that they come out to 16 bytes in the end. So, if this is true, maybe we can have generic pseudo-algorithm code that is always aligned, and get better speed that way?
regards.

clive

ADD beats INC on most CPUs in recent memory. However, INC does not change the carry flag, so it has its uses. LEA doesn't change any flags.
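
The carry-flag difference can be pictured with a tiny Python model. This is a sketch only: the function names are made up, only CF is modelled (both instructions also update other flags such as ZF and SF), and the operands are assumed to be unsigned 32-bit values.

```python
MASK32 = 0xFFFFFFFF

def add32(value, operand):
    """Model ADD: returns (result, CF). CF is set on unsigned overflow."""
    total = value + operand
    return total & MASK32, int(total > MASK32)

def inc32(value, cf):
    """Model INC: returns (result, CF). CF passes through unchanged."""
    return (value + 1) & MASK32, cf
```

`add32(0xFFFFFFFF, 1)` returns `(0, 1)`, while `inc32(0xFFFFFFFF, 0)` returns `(0, 0)`. That is why INC can step a counter or pointer inside an ADC loop without disturbing the carry between iterations.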

Microsoft (R) Macro Assembler Version 6.15.8803     08/19/10 20:22:07
test40.asm      Page 1 - 1


        .386
        .MODEL FLAT
00000000         .CODE

00000000 _start:

00000000  83 C0 01         add     eax,1
00000003  83 C0 04         add     eax,4
00000006  05 00000001         add     eax,dword ptr 1
0000000B  05 00000004         add     eax,dword ptr 4

        END     _start

mineiro

Thank you. Now I am reading some optimization docs (Agner and Mark). My weakness is not working out the little logic, but translating it into the PC world.
I learned a lot from this topic, much brain food for one day. Like Kasparov says, there are minimalist and maximalist guys. I have reached the Karpov; now I want a challenge with Kasparov.
Case closed.
regards.

dedndave

we all have to start someplace
you are doing great   :U
i used to play a little chess   :P

mineiro

and of course, thank you (with much respect) minimalistic dedndave.
http://chessprogramming.wikispaces.com/General+Setwise+Operations
case closed  -<(no more xor ambiguity)