Assembler: xor [pointer], offset addr1 xor offset addr2

Started by raleeper, June 28, 2011, 11:49:17 PM


dedndave

it must resolve the values as hard offsets when you create the DLL
just a guess   :P

raleeper

dedndave:

Quote: ok - i made it look complicated by stack usage   BigGrin
here is a more generalized form that may be used for any dword value in the code section

Thanks, this looks great.  But I wasn't complaining when I said it was more complicated than I want to get into now.  Self-modifying code is complicated no matter how clear the example.  I've filed this whole topic for later.

Incidentally,
In Topic: Self Modify vs Jump Table
http://www.masm32.com/board/index.php?topic=15215.msg123564#msg123564

Redskull said:

Quote: Every single instruction you execute will be a cache miss and a pipeline flush.  It's the anti-optimization.

Would that apply to your
Quote: ;execution code

        xor     kyptp,80000000h
?

Thanks, ral

redskull

If my memory serves, the problem is that assembly-time operators are evaluated only during the first pass, whereas data can be re-evaluated on the second.  This means that while static operations on static addresses would end up being a constant, like you supposed, MASM doesn't know for sure what it is yet when the xor is computed.  I seem to recall hard-to-track-down macro bugs back when this was allowed, because if MASM needed to switch around the data on the second pass, the resolved constant would be out of date and you ended up with the wrong result.  Or maybe I'm just going senile.  Perhaps some of the crusty old-timers can chime in...

Any time you change something that's in the cache you will take a performance hit; self-modifying code is particularly bad, because if the instructions are close together you have to throw out all the work the CPU did starting to decode and execute them.  The jump table for an emulator is a particularly bad idea, because every instruction of the emulated code requires restarting from scratch.  If you only do it once, though, the hit will be negligible.

-r
Strange women, lying in ponds, distributing swords, is no basis for a system of government

dedndave

the idea here is to modify the immediate operand once at program initialization
not related to the cache hit/miss issue

Antariy


xor dword ptr [kyptp], offset kyptbl xor offset kyptb2




offset kyptbl xor offset kyptb2


Offsets are 32-bit values that are set up at *link* time. At assembly time there is no way for the assembler to know what these 32-bit values will contain, i.e. they are not constants at all. Anything that is computed at link time is beyond the assembler's horizon, so the assembler does not know the values at all.

When the assembler finds something like:


mov eax,offset somelabel


it puts into object file the code:


db 0B8h,0,0,0,0 ; = mov eax,0


and marks the DWORD starting at the second byte as a location for the linker to patch with the final pointer at link time, once the base address is known (base address plus the offset itself).


In short: offsets are not constants, unless you're building executable manually.

jj2007

Quote from: Antariy on June 30, 2011, 04:33:12 AM
When the assembler finds something like:


mov eax,offset somelabel


it puts into the object file the code:


db 0B8h,0,0,0,0 ; = mov eax,0


and marks the DWORD starting at the second byte as a location for the linker to patch with the final pointer at link time, once the base address is known (base address plus the offset itself).

In short: offsets are not constants, unless you're building executable manually.

Thanks, Alex - we came to the same conclusion, but it seems your explanation is the correct one :U

redskull

It's worth pointing out that they are "constant enough" if you are only doing addition or subtraction.  The assembler can just store whatever you are adding on as the value itself, and the fix-up by the linker adds the memory address as it becomes known.

-r
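
A rough sketch of the "constant enough" point, not taken from any poster's code: the first instruction leaves a 32-bit field for the linker to fix up, and the second folds the +8 into that same field, so both are still resolved by a single addition at link time. The label ddarr is borrowed from the example later in the thread; the run-time subtraction is only a sanity check, and the MASM32 macros (print, hex$, inkey, exit) are assumed to be available through masm32rt.inc.

```asm
include \masm32\include\masm32rt.inc

.data?
ddarr   dd 4 dup (?)                ; illustrative array (name borrowed from the thread)

.code
start:
        mov     eax, offset ddarr       ; B8 + 32-bit field marked for a linker fix-up
        mov     ecx, offset ddarr + 8   ; B9 + same kind of field, with the +8 folded
                                        ; into the stored value by the assembler
        sub     ecx, eax                ; run-time check: ecx = 8, whatever the final address is
        print   hex$(ecx), 13, 10       ; display the difference
        inkey
        exit

end start
```

An expression such as offset a xor offset b has no equivalent trick: the fix-up can only add an address to the stored field, so there is nothing the assembler could store there that would turn into the XOR of the two final addresses.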

Antariy

Quote from: redskull on June 30, 2011, 11:29:38 PM
It's worth pointing out that they are "constant enough" if you are only doing addition or subtraction.  The assembler can just store whatever you are adding on as the value itself, and the fix-up by the linker adds the memory address as it becomes known.

That's right, but this is possible only if labels are in the same segment.


.nolist
include \masm32\include\masm32rt.inc
.686
.xmm

.data?
dd1 dd ?
ddarr dd 4 dup (?)
dd2 dd ?

newseg segment dword public "DATA"

dd3 dd ?

newseg ends

.code
start:

mov eax,offset dd2-offset dd1
print hex$(eax),13,10,13,10

mov eax,offset dd3-offset dd1 ; this will not assemble: dd3 and dd1 are in different segments
print hex$(eax),13,10


inkey
jmp crt__exit

end start


Ian

This is a nice one. Another poster says "They're constant enough, when you're only adding
or subtracting". There's the clue, though I don't know what "adding" two addresses means.
Look in your listing. The result of "OFFSET" is an address, with attribute of its segment.
To do multiplication or XOR, you need a "NUMBER" type variable. Using TASM (and in others
I hope), when two addresses ("labels") are subtracted, the result is a number, which can then
be the subject of other arithmetic. So I set a label right at offset 0 in the segment, thus ...
ca0: ; Use this to change addresses into numbers.
(ca0 will have an offset of 0 in your listing.)
(It doesn't seem to matter what sort of label it is, so use an instruction label, it takes less typing..)
Then you can write
xor [pointer], (addr1-ca0) XOR (addr2-ca0).
Make sure the ca0 you use is in the same segment as the addrn ..
..

jj2007

Ian,
Great 2nd post :U
This works indeed:
include \masm32\include\masm32rt.inc

.data
AppName db "Masm32:", 0
a1 db "test1", 0
a2 db "test2", 0

.code

start: mov eax, offset AppName
xor [eax], dword ptr (offset a1-AppName) xor (offset a2-AppName)
exit

end start


But warning: AppName is not necessarily at offset 0 in its segment, as include files may contain .data, too.

qWord

I'm still confused: What is the purpose of combining two pointers using XOR? - please clarify.
FPU in a trice: SmplMath
It's that simple!

dedndave

by XOR'ing the pointer with the XOR of the 2 addresses, you can toggle the pointer back and forth between the two

who knows - it might be faster to....
Pointer1 dd Address1
Pointer2 dd Address2
;
;
;
        push    Pointer1
        push    Pointer2
        pop     Pointer1
        pop     Pointer2

then use Pointer1 as the "pointer in use"
XCHG with memory operands is slow   :P
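
Since the assembler can't fold offset addr1 xor offset addr2 into an immediate, one option is to build the mask once at run time and keep it in a variable. This is only a sketch of the toggle described above: tblA, tblB, ptrInUse and xmask are made-up names (MASK itself is a reserved word in MASM, hence xmask), and the MASM32 macros from masm32rt.inc are assumed.

```asm
include \masm32\include\masm32rt.inc

.data
tblA     db "table A", 0            ; hypothetical first target
tblB     db "table B", 0            ; hypothetical second target
ptrInUse dd offset tblA             ; the "pointer in use", starts at tblA
xmask    dd 0                       ; will hold (address of tblA) xor (address of tblB)

.code
start:
        ; one-time setup: both immediates are ordinary relocated addresses,
        ; the XOR itself is executed by the CPU, not by the assembler
        mov     eax, offset tblA
        xor     eax, offset tblB
        mov     xmask, eax

        mov     eax, xmask
        xor     ptrInUse, eax       ; ptrInUse now points to tblB
        print   ptrInUse, 13, 10    ; print the string it points to

        mov     eax, xmask          ; reload: print does not preserve eax
        xor     ptrInUse, eax       ; toggled back to tblA
        print   ptrInUse, 13, 10

        inkey
        exit

end start
```

Each toggle costs one load and one XOR to memory; whether that beats the push/pop swap above would have to be timed.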

Antariy

Quote from: Ian on July 01, 2011, 05:01:26 PM
Look in your listing. The result of "OFFSET" is an address, with attribute of its segment.

The point was: why are offsets not constants, and that is all - as the point of the listing.
Offset is an address which will be setup at link stage, and intersegment linking may have any layout. For DOS this is the segments of the executable, for Win32 this is sections...

Kopi

(Code to get what the asker wants)

Quote from: jj2007 on July 01, 2011, 06:28:32 PM
Ian,
Great 2nd post :U
This works indeed:
include \masm32\include\masm32rt.inc

.data
AppName db "Masm32:", 0
a1 db "test1", 0
a2 db "test2", 0

.code

start: mov eax, offset AppName
xor [eax], dword ptr (offset a1-AppName) xor (offset a2-AppName)
exit

end start


But warning: AppName is not necessarily at offset 0 in its segment, as include files may contain .data, too.


I think the problem is solved by the post I've quoted; nonetheless I wish to summarize, since I ran into the same problem and couldn't figure out what was going on until I found "Reply #19", which stated that addresses are resolved at *link* time.
Right! When an .asm file (with its includes) is assembled, all addresses are calculated relative to the beginning of their segment, taking into account "org" statements and the case where the same segment appears more than once in the same .asm file and its includes, BUT NOT taking into account other (or repeated) segments that appear in other .asm files. Thus the actual absolute address at which a certain "location" in the .asm files will be loaded in memory, and even its offset relative to a segment (when a segment appears in more than one .asm file, i.e. in more than one intermediate .obj file), is not known at assembly time, but only at link time. MASM, however, requires (see error A2026) that the expression be resolved at assembly time.

To experiment and get a visual sense of what happens, try assembling these two sources and linking them together with this command:

ml /AT /Fl /Sa 1.asm 2.asm

(The output will be a .com file, and so the executable contains only the compiled code and it's always loaded at offset 100h)

1.asm

c_ segment
org 100h
g proc
call main
mov ah, 4ch
int 21h
g endp
c_ ends

a segment
b db 1
off_d = $ - a
a ends

c_ segment
main proc near
mov al, off_d
mov bx, offset d + 1
mov cl, off_d XOR 0
ret
main endp
c_ ends

a segment
d db 1
a ends

end g


2.asm

a segment
off_last = $ - a ; Doesn't work well: off_last is relative to the beginning of this file's part of the segment, not to the beginning of "a" in "1.asm"
last db 0ffh
a ends

h segment
i db 0eeh
h ends

end


Eventually, the .com file will contain:
c_ segment (proc g, proc main, padding until the end of the paragraph).
a segment (b, d, padding until the end of the paragraph).
a segment (last, padding until the end of the paragraph).
h segment (i, no padding)


I ran into the same problem with "offset" when I was trying to compute the memory needed to allocate a variable in a .com file with INT 21h / AH = 4Ah,
where the new size for the specified segment is expressed in "paragraphs", and so I wanted to compute the paragraphs needed with: (Offset LastVariable + MyVarSize) / 10h + 1.
I realized that the error "A2026: constant expected" did not come from using the OFFSET operator as such (although OFFSET alone was not considered a constant), but from also using the "/" operator (using "+" or "-" was fine), and I eventually reached this tutorial, which explains MASM's behaviour with great precision:
http://www.arl.wustl.edu/~lockwood/class/cs306/books/artofasm/Chapter_8/CH08-4.html#HEADING4-101

To summarize, the point is that a memory offset/address is neither a constant nor a variable (see my example above), but a "reloc": a symbol that refers to a memory location. Since these symbols are converted to numeric values at link time, only some arithmetic is allowed on these address symbols:
- Addition or subtraction of a constant to or from an offset: it just produces a "shift" in the address, which is stored in the obj file (see the .lst files), and the definitive address is computed at link time.
- Subtraction of one address symbol from another address symbol in the same segment: the result is a regular constant (not a reloc) and can be used in further calculations. That's the way to obtain what we want.
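
One caveat with the label-difference route, sketched here under assumptions: a mask built from differences is the XOR of two segment-relative offsets, not of the two final addresses. The two are guaranteed to agree when the final address of the reference label (base0 here) is aligned to a power of two larger than both offsets, so that it contributes no bits or carries in the positions where the offsets differ; otherwise they can differ. The sketch prints both values so they can be compared; base0, tblA, tblB and ASMMASK are hypothetical names, and the MASM32 macros from masm32rt.inc are assumed.

```asm
include \masm32\include\masm32rt.inc

.data
base0   label byte                  ; hypothetical reference label (not necessarily
                                    ; at offset 0 if the includes also add .data)
tblA    db 16 dup (1)               ; hypothetical first target
tblB    db 16 dup (2)               ; hypothetical second target

; label - label in the same segment is a plain constant,
; so XOR is accepted here at assembly time
ASMMASK = (tblA - base0) xor (tblB - base0)

.code
start:
        mov     eax, offset tblA
        xor     eax, offset tblB    ; XOR of the real addresses, computed at run time
        print   hex$(eax), 13, 10

        mov     eax, ASMMASK        ; XOR of the offsets relative to base0
        print   hex$(eax), 13, 10

        inkey
        exit

end start
```

If the two printed values differ in a given build, the assembly-time mask cannot be used to toggle a real pointer between tblA and tblB.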



And so here's the work-around to obtain the effective address (relative offset in the segment) of a variable at assembly time, and to make calculations on it.
Keep in mind that, for the reasons given above, if the program consists of several .asm sources linked together, and another source contains the same segment and is placed earlier in the link order, this work-around doesn't work correctly.


Code SEGMENT

ORG 100h
Main PROC
MOV BX, 5
ADD BX, MyVarOff / 10h ; or XOR or any other operator
Main ENDP

Code ENDS

Variables SEGMENT
Any_Var DB 1
Any_OtherVar DD ?
MyVarOff = $ - Variables
MyVar DW 4 DUP (0EFh)
Variables ENDS

END main