Disassembly

Started by dncprogrammer, December 08, 2006, 08:15:23 PM

dncprogrammer

Hey all,

I wrote a hello program, compiled, then disassembled. There are a few things that I can't understand.

jmp St
msg     db   'Hello World!', 0ah, 0dh, 0
St:       mov si, msg
PS:       lodsb
           cmp al, 0
           jz Pr
           mov ah, 0eh
           int 10h
           jmp PS
Pr:       mov ah, 00h
           int 20h

no biggie, ok here's disassembly with my interpretations on the right

1-   00      E90F00                jmp (byte 12) from here
2-   03      48                       H
3-   04      656C                   EL
4-   07      6C                       L                       
5-   08      20576F                O WO
6-   0b      726C                   RL
7-   0d      64210A               D!(lf)
8-   10      0D00BE               (cr)(null)(byte 12!)(BE03 means mov si,
9-   13      0301                   (byte 03)(?)         <---------?
10- 15      AC                      lodsb
11- 16      3C00                   cmp al, 0
12- 18      7406                   jz (byte 20)
13- 1a      B40E                   mov ah, Eh
14- 1c      CD10                   int 10
15- 1e      EBF5                   jmp (byte 15)
16- 20      B400                   mov ah, 00
17- 22      CD20                   int 20

Are the bytes in the beginning of the listing grouped together in a strange fashion because the disassembler thinks that they are instructions?
At the arrow on line 9 above, is the 01 part of the AC instruction on the next line, which makes up the LODSB?

thanks a ton!

Tedd

BE 03 01 = "mov si,0103h"

0103h will be the offset of the hello world message when the file is loaded into memory (at offset 0100h)

The numbers are grouped by instruction, but with the msg in the middle of the instruction listing, it goes out of synch - which is why you get 'H' as an instruction :lol
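
To check that arithmetic by hand, here is a small Python sketch. The helper name decode_mov_si is made up for illustration; the bytes come from the listing above, and the 3-byte jump length is read off the E9 0F 00 encoding.

```python
# Illustrative only: decode the 16-bit immediate of "mov si, imm16" (opcode BEh).
def decode_mov_si(code: bytes) -> int:
    """Return the immediate operand of a 3-byte BE lo hi instruction."""
    assert code[0] == 0xBE, "not a mov si, imm16"
    # x86 stores the immediate little-endian: low byte first, then high byte.
    return int.from_bytes(code[1:3], "little")

LOAD_OFFSET = 0x100   # DOS loads a .COM image at offset 100h
JMP_LENGTH = 3        # E9 0F 00 is a 3-byte near jump, so msg starts at file offset 3

msg_address = LOAD_OFFSET + JMP_LENGTH
imm = decode_mov_si(bytes([0xBE, 0x03, 0x01]))
print(hex(imm), hex(msg_address))  # both are 0x103
```

The immediate in the instruction and the run-time offset of msg agree, which is why the bytes read 03 01 and not 01 03.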
No snowflake in an avalanche feels responsible.

dncprogrammer

Ok Tedd,
Sometimes the whole ORG thing confuses me. For DOS (and proper PSP location) I understand the .com file loading at 100h, but the processor seems to know where to find everything if I specify ORG 0000 (or no ORG) and load the image somewhere else. Aren't the physical locations of commands and data calculated at runtime as relative positions from the point of execution? Like jump 20 bytes forward, or read a char 5 bytes backward, instead of specifying physical locations in memory, since they could be at different physical addresses on each run? I was a BASIC programmer for many years and I am still trying to learn to think like the machine and not just GOTO 10 where, to the programmer, 10 is always the same thing. I think the whole abstraction principle makes it a little hard to get used to absolute machine coding. Thanks!
jon

MichaelW

Quote: "Aren't the physical locations of commands and data calculated at runtime as relative positions from the execution."
Near code addresses are encoded as displacements (relative addresses), so in a .COM file short jumps, near jumps, and near calls will work correctly without an ORG 100h directive. The problem is with the data addresses. The ORG directive sets the value of the location counter that MASM uses to assign addresses to labels. The default starting value is zero, so in this case, because the initial jump instruction is 2 bytes in length, MASM would assign the value 2 to the msg label, instead of 102h, which would be the actual address at runtime.
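
A rough Python sketch of that distinction, using a 2-byte short jump (EB 0F) followed by data as in this example. The function names are hypothetical and only model the arithmetic, not what MASM does internally.

```python
def short_jump_target(instr_offset: int, disp8: int) -> int:
    """Target of a short jump: end of the 2-byte instruction plus a signed 8-bit displacement."""
    if disp8 >= 0x80:          # sign-extend the displacement byte
        disp8 -= 0x100
    return (instr_offset + 2 + disp8) & 0xFFFF

def label_address(org: int, offset_in_file: int) -> int:
    """Address assigned to a label: starting location counter plus offset in the file."""
    return (org + offset_in_file) & 0xFFFF

# The jump lands 11h bytes past its own start wherever the code is loaded.
for base in (0x0000, 0x0100):
    print(hex(short_jump_target(base, 0x0F) - base))   # 0x11 both times

# The data label is baked in as an absolute value that depends on ORG.
print(hex(label_address(0x0000, 2)))   # 0x2   (no ORG: wrong at run time under DOS)
print(hex(label_address(0x0100, 2)))   # 0x102 (ORG 100h: matches where DOS loads msg)
```

The displacement survives relocation because it is measured from the instruction itself; the label value does not, because it is measured from wherever the location counter started.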

In your source, ST is a reserved word in MASM (it names the FPU stack register), so it cannot be used as a label, and you need to load the offset address of msg into SI (mov si, OFFSET msg).

If you load your program into DEBUG and use the unassemble command, you will get:

-u
0B59:0100 EB0F          JMP     0111
0B59:0102 48            DEC     AX
0B59:0103 65            DB      65
0B59:0104 6C            DB      6C
0B59:0105 6C            DB      6C
0B59:0106 6F            DB      6F
0B59:0107 20576F        AND     [BX+6F],DL
0B59:010A 726C          JB      0178
0B59:010C 64            DB      64
0B59:010D 210A          AND     [BP+SI],CX
0B59:010F 0D00BE        OR      AX,BE00
0B59:0112 0201          ADD     AL,[BX+DI]
0B59:0114 AC            LODSB
0B59:0115 3C00          CMP     AL,00
0B59:0117 7406          JZ      011F
0B59:0119 B40E          MOV     AH,0E
0B59:011B CD10          INT     10
0B59:011D EBF5          JMP     0114
...


DEBUG cannot tell the difference between code and data, so to correct this problem you must direct DEBUG to skip over the data, like so:

-u 111
0B59:0111 BE0201        MOV     SI,0102
0B59:0114 AC            LODSB
0B59:0115 3C00          CMP     AL,00
0B59:0117 7406          JZ      011F
0B59:0119 B40E          MOV     AH,0E
0B59:011B CD10          INT     10
0B59:011D EBF5          JMP     0114
...


Note that I added an ORG 100h, so the MOV SI instruction is encoded with the correct offset address of msg.
eschew obfuscation

dncprogrammer

Thanks Michael,
I changed my code, removing the ORG 100h and then putting it back, and disassembled both versions. I see that without the ORG, even though DOS loads my program at 100h, my string is addressed at something like 0003, essentially the same result as what you illustrated. Ok.
When I work with my little os project I compile all of my applications with ORG 0, and my loader places those programs at 0901:0000  and they work properly. What is the difference between the way that my loader executes programs and the way that DOS does? Is it because of my 0000 offset? If I loaded to 0901:0002 would that throw off my addresses since they are numbered from 0 by the compiler?
jon

MichaelW

It would not throw off addresses that were encoded as displacements, but it would throw off addresses that were encoded as addresses (data addresses or the destination offset addresses for far jumps and far calls).
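
As a toy illustration in Python (not real loader code): the image below is a hypothetical one assembled with ORG 0, a 2-byte jump followed by the string, and load() is a made-up helper that just copies it into a 64 KB segment.

```python
# Hypothetical image assembled with ORG 0: a 2-byte jump, then the string.
image = bytes([0xEB, 0x0F]) + b"Hello World!\n\r\x00"
MSG_ENCODED = 2            # address of msg as the assembler encoded it (ORG 0)

def load(image: bytes, load_offset: int) -> bytearray:
    """Copy the image into a zero-filled 64 KB segment at the given offset."""
    segment = bytearray(0x10000)
    segment[load_offset:load_offset + len(image)] = image
    return segment

at_zero = load(image, 0)
at_two = load(image, 2)

print(chr(at_zero[MSG_ENCODED]))        # 'H': the encoded address is right
print(hex(at_two[MSG_ENCODED]))         # 0xeb: it now points at the jump opcode
print(chr(at_two[MSG_ENCODED + 2]))     # 'H' actually sits 2 bytes further on
```

Loaded at offset 2, the encoded data address still says 2, so the code would start printing from the jump opcode instead of from the message.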

Skipping most of the details, when DOS loads a .COM file it:

Allocates a block of memory (size = all available memory).

Creates a 256-byte PSP starting at offset 0.

Loads the program starting at offset 100h.

Sets the segment registers (CS, DS, ES, and SS) to the segment address of the block.

Sets SP to zero and pushes a zero word.

Jumps to the instruction at offset 100h.
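
For reference, the segment:offset arithmetic behind those steps can be sketched in Python. The function name linear is my own; the segment 0B59h is just the one from the DEBUG listing earlier in the thread (DOS picks it at load time), and the 20-bit wrap models an 8086 only.

```python
def linear(segment: int, offset: int) -> int:
    """Real-mode address: segment * 16 + offset, wrapped to 20 bits (8086-style)."""
    return ((segment << 4) + offset) & 0xFFFFF

PSP_SIZE = 0x100   # the 256-byte PSP occupies offsets 00h..FFh of the block
seg = 0x0B59       # example segment taken from the DEBUG listing

print(hex(linear(seg, 0x0000)))     # 0xb590: start of the PSP
print(hex(linear(seg, PSP_SIZE)))   # 0xb690: first byte of the program
print(hex(linear(0x0901, 0x0000)))  # 0x9010: the custom loader's 0901:0000
```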

dncprogrammer

Ok, that sounds good.
So it is true that if I loaded my code at an offset greater than 00 in my segment, having compiled it starting at 00, then my data and far addresses would be off, but my short-jump code would execute just fine, and any references to the far material would be unpredictable? Why is the stack initialized with a 0 entry by DOS? And where does DOS's stack reside so that it does not encroach on the application program's block?
thank you so much!

MichaelW

Zero is pushed onto the stack for compatibility with early versions of DOS. One method of terminating a .COM program, that still works today, is to perform a near return after the stack has been returned to its initial condition. The near return pops the zero from the stack and uses it as the return address (CS:0), and execution continues with the Interrupt 20h instruction that the loader placed at the start of the PSP for this purpose.

The stack is at the end of the first 64KB of allocated memory, or at the end of the allocated memory if less than 64KB was allocated (in which case SP is set to 2 more than the number of bytes allocated, instead of to zero). The push instruction decrements SP (by 2 for 16-bit code) before it copies its operand to SS:SP, so assuming at least 64KB was allocated the pushed zero ends up at SS:FFFEh.
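
A toy Python model of that sequence, assuming the full 64KB case with SS equal to CS as in a .COM program. The class Stack16 is invented for illustration and only mimics the wrap-around and byte order.

```python
class Stack16:
    """Toy model of a 16-bit stack inside one 64 KB segment."""
    def __init__(self, sp: int = 0):
        self.mem = bytearray(0x10000)
        self.sp = sp

    def push(self, value: int) -> None:
        self.sp = (self.sp - 2) & 0xFFFF          # decrement first, wrapping 0 -> FFFEh
        self.mem[self.sp] = value & 0xFF          # low byte
        self.mem[(self.sp + 1) & 0xFFFF] = (value >> 8) & 0xFF

    def pop(self) -> int:
        value = self.mem[self.sp] | (self.mem[(self.sp + 1) & 0xFFFF] << 8)
        self.sp = (self.sp + 2) & 0xFFFF
        return value

stack = Stack16(sp=0)
stack.mem[0:2] = bytes([0xCD, 0x20])   # INT 20h at the start of the PSP
stack.push(0)
print(hex(stack.sp))                    # 0xfffe
ip = stack.pop()                        # what a near RET would load into IP
print(hex(ip), stack.mem[ip:ip + 2].hex())   # 0x0 cd20
```

The popped zero becomes IP, and the bytes at offset 0 of the segment are CD 20, the INT 20h at the start of the PSP.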

dncprogrammer

Where do command.com and the .sys files end up in memory when DOS is running? I'm sure the DOS interrupt routines are somewhere in those as well, or are they not?
jon

MichaelW

AFAIK it will vary with the version of DOS. This memory map, listed from the top of memory down, is from The MS-DOS Encyclopedia, Ray Duncan (ed.), Microsoft Press, 1988, p. 78, Figure 2-9:
[FFFF:000F]
ROM BIOS
[F000:0000]
Other ROM and RAM
COMMAND.COM (transient)
Free RAM
COMMAND.COM (resident)
Installable device drivers
File control blocks
Disk buffers
MS-DOS tables
MS-DOS kernel (MSDOS.SYS)
MS-DOS BIOS (IO.SYS)
[0000:0600]
ROM BIOS tables
[0000:0400]
Interrupt vectors
[0000:0000]

dncprogrammer

How can code be transient if far addresses and data addresses can't float with the rest of the code? Is there some sort of correction that is done on the fly by the os?
jon

MichaelW

It's been a long time, so prefix all of this with AFAIK.

The DOS data segment is stored at a fixed location somewhere in the area of the DOS kernel, fixed meaning that the address is fixed at startup. See Interrupt 2Fh, Function 1203h and Interrupt 21h, Function 52h for more information.

COMMAND.COM is split into a resident part and a transient part to reduce its memory footprint. The resident part is fixed in memory. The transient part is loaded into what is effectively free memory, I think at an address that is fixed at startup, and reloaded as necessary whenever it has been overwritten. Even if the address of the transient part is not fixed, accessing code or data through a system of pointers is entirely possible, and in most cases not difficult. The memory map that I posted is the last of multiple maps, each detailing a stage in the (moderately complex) startup process.

TomRiddle

Damn, you're smart, Michael   :dazzled: