Curiosity about data segment initialization

Started by falcon01, July 31, 2011, 10:49:28 AM


falcon01

Hi, while reading some 16-bit code I came across these sequences:

DSEG        segment para public 'DATA'
...
DSEG        ends

...other segment definition...

assume CS:CSEG,DS:DSEG,SS:SSEG


and

mov ax,DSEG
mov ds,ax


Now, what is the point of the second sequence? The assume line already associates the DS register with DSEG, so why load DS again?
And why is this done only for DS? Are CS and SS different? :S

MichaelW

The ASSUME directive sets up an assembly-time association that the assembler needs in order to calculate addresses. Loading the segment registers is a run-time operation. The program loader sets CS to the segment address of the program's code segment (or the starting code segment if there is more than one) and loads the initial SS and SP values from the EXE header, but it leaves DS and ES set to the segment address of the Program Segment Prefix (PSP). For an EXE (but not for a .COM file), DS must be loaded with the segment address of the program's data segment before data can be accessed through it (DS is the default segment register for most instructions that access memory).
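
To make the split between assembly time and run time concrete, here is a minimal sketch of a complete EXE source in MASM-style syntax. The segment names come from the original question; the label Start, the Msg string and the 128-word stack size are only illustrative assumptions.

```asm
; Minimal EXE skeleton (MASM-style syntax assumed).
SSEG    segment para stack 'STACK'
        dw      128 dup (?)             ; hypothetical stack size
SSEG    ends

DSEG    segment para public 'DATA'
Msg     db      'Hello from DSEG$'      ; hypothetical data item
DSEG    ends

CSEG    segment para public 'CODE'
        assume  cs:CSEG, ds:DSEG, ss:SSEG   ; assembly-time promise only

Start:                                  ; on entry: CS and SS:SP set by the loader,
                                        ; DS = ES = PSP segment
        mov     ax, DSEG                ; run time: fetch the relocated segment address
        mov     ds, ax                  ; now DS really addresses DSEG

        mov     dx, offset Msg          ; DS:DX -> '$'-terminated string
        mov     ah, 09h                 ; DOS function 09h: print string
        int     21h

        mov     ax, 4C00h               ; DOS function 4Ch: terminate, return code 0
        int     21h
CSEG    ends
        end     Start
```

If the two instructions that load DS are removed, the source still assembles, but the print call would then read its string from the PSP segment rather than from DSEG.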

eschew obfuscation

dedndave

i haven't seen much of it in here, for some reason
but, it's good to save the original contents of DS or ES
that way, you may access the PSP that Michael referred to
the command line is in there, as well as pointers to the environment block, FCB's,
and even the first 2 command line parms are parsed if they resemble DOS filenames

while it's true that you cannot load a segment register directly with an immediate value,
you can store/load it directly to/from memory...
        .DATA?

PspSeg  dw ?

        .CODE

        mov     ax,@DATA
        mov     ds,ax           ; DS now addresses the data segment
        mov     PspSeg,es       ; ES still holds the PSP segment at startup
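
As a rough sketch of how the saved value might be used later, the fragment below fetches the length of the command tail from the PSP. It assumes PspSeg already holds the PSP segment saved at startup, that DS addresses the data segment when the fragment begins, and that the simplified segment directives are in use (so DGROUP exists). The offsets 80h and 81h are the standard DOS locations of the command-tail length byte and text.

```asm
; Fragment: PspSeg is assumed to hold the PSP segment saved at startup.
        push    ds                      ; keep our data segment
        mov     ds, PspSeg              ; segment register loaded straight from memory
        assume  ds:nothing              ; DS no longer matches our data labels
        mov     cl, ds:[80h]            ; PSP:80h = length byte of the command tail
        xor     ch, ch                  ; CX = number of characters in the tail
        mov     si, 81h                 ; DS:SI -> first character of the tail
        ; ... process CX bytes at DS:SI here ...
        pop     ds                      ; back to our data segment
        assume  ds:DGROUP               ; valid again for .DATA labels
```

The two ASSUME lines generate no code; they only keep the assembler's picture of DS in step with what the PUSH, MOV and POP do at run time.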

falcon01

So, the "assume" is basically a way of saying to the processor: "hey, at execution time I'll give you the right address in DS"?
2Dave:
Thank you for every improvement hint you give ;)

FORTRANS

Hi,

   ASSUME tells the assembler where you expect to find
your data, so it can generate the proper addresses for the code
that follows the ASSUME.  It also enables some error checking.
So if you ASSUME DS:DATASEG, you pretty much have to initialize
the DS register to actually point to DATASEG, or the code the
assembler emits will read and write the wrong memory.

HTH,

Steve N.

falcon01

Mmm...then I can't understand!
If I tell the assembler I expect to find my data at DATASEG, why do I need to set DS=DATASEG manually?
Can't the assembler do it itself after reading DS:DATASEG? Sorry, but it's not very clear...

Moreover, I usually put the assume directive at the END of my code, right before the END directive...is it wrong?

dedndave

it wants to know that DS is pointing to where the data is
it is really a type of "strong checking"

if you reference a label in SOMESEG, at offset 2000h....
        mov     ax,SOMELABEL
it creates the following code....
        mov     ax,[2000h]

for data MOV's of this type, DS is the default segment register
so, it will load AX with the contents at DS:[2000h]
the assembler does not know at assembly time what value will be in DS at runtime   :P
by using ASSUME, you tell the assembler you are referring to data labels in SOMESEG
now, it is happy - lol
at runtime, DS may actually not point to SOMESEG
then, you get whatever is at DS:[2000h]

let's say we have 2 data segments, DSEG1 and DSEG2
i use ASSUME.....
        ASSUME  DS:DSEG1,ES:DSEG2

and, i load the segment registers...
        mov     ax,DSEG1
        mov     ds,ax
        mov     ax,DSEG2
        mov     es,ax


now, let's say that i want to get a value into AX from a label in DSEG2.....
        mov     ax,DSEG2DATA

the assembler knows that ES is pointing to DSEG2
it also knows that DS is not pointing to DSEG2
it knows those things because we told it so with ASSUME
it generates the following code with a segment override....
        mov     ax,ES:[2000h]
a segment override is required because the data is not in the default (DS) segment

FORTRANS

Quote from: falcon01 on July 31, 2011, 07:33:29 PM
Mmm...then I can't understand!
If I tell the assembler I expect to find my data at DATASEG, why do I need to set DS=DATASEG manually?

   Well, one more time for redundancy's sake.


DATASEG SEGMENT PUBLIC

MyData  DW      42

DATASEG ENDS
EXTRASG  SEGMENT PUBLIC
; more data...
EXTRASG  ENDS

        ASSUME  DS:DATASEG
        MOV     AX,SEG DATASEG
        MOV     DS,AX

; Now you can access MyData

        ASSUME  DS:EXTRASG
        MOV     AX,SEG EXTRASG
        MOV     DS,AX

; Now you can NOT access MyData, and if you try
; you will get an error message.


Quote
Can't the assembler do it itself after reading DS:DATASEG?

   Probably it could, but it doesn't.

Quote
Moreover, I usually put the assume directive at the END of my code, right before the END directive...is it wrong?

   If you never change a segment register, you can probably
do it that way.  If it's not broken, you don't have to fix anything.
But it is better to put the ASSUME where you initialize the
segment register(s).  Easier to see what is happening.  At
least for me.

Cheers,

Steve N.

falcon01


mineiro

When you write a routine (procedure), you start it with PROC and end it with ENDP.
Your routine lives inside some segment, so naturally you also need to start and end that segment, with SEGMENT and ENDS.
These names are pseudo-ops ("false operations"): they are directions to the assembler rather than machine instructions, and the assembler simply assumes that what they say is true.

From your point of view, you know you can "call" one procedure from inside another, and you know you can have more than one segment.
And what if you want to call a procedure in another segment? You need to tell the assembler, so that it can do some of the work for you. This is where pseudo-ops such as public, near, far, assume ... come on the scene.
"Assume" gives the assembler information about segments and about how we intend to use the segment registers. To understand "assume", you first need to understand labels and variable names.
Every time you create a label, like "something proc near", or a memory variable, the assembler records not only its symbolic name but also its type (byte, word, ...), its address, and the segment in which it is defined. Assume is concerned with that last piece of information.
The assembler does not automatically assume that all routines are in the same segment.

So, the pseudo code below:
assume cs:code_segment

tells the assembler that CS is pointing to the symbolic name code_segment. Without this information, the assembler just folds its arms if you try a "call someplace", and you get an error message like "No or unreachable cs".
That seems strange, because CS always points to the code segment. Logically we should not need "assume" here, but we need it all the same because of a thing called a "segment override".
The processor generally reads data (as in "mov al,some_variable") from the data segment. But it can also read that variable through another segment register (e.g. ES).
This is why the assembler needs the "assume" pseudo-op: to know which segment register to use.
That way your program can deal with multiple segments, relocations, ... .
Just a last comment: imagine you are writing a program that has data, code, stack and extra segments, so all the segment registers are in use. Now, how can you write to the video address (segment b800h)? You need to give up one segment register temporarily, put the video segment in it, do your work, and then restore the register.
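
The borrow-and-restore idea in that last comment might look roughly like the fragment below. It is only a sketch: it assumes a colour text mode with video memory at segment 0B800h (monochrome adapters use 0B000h), and the character and attribute values are arbitrary.

```asm
; Fragment: temporarily point ES at text-mode video memory.
        push    es                      ; save the segment register we borrow
        mov     ax, 0B800h              ; immediate goes to AX first, never directly to ES
        mov     es, ax
        mov     byte ptr es:[0], 'A'    ; character cell at row 0, column 0
        mov     byte ptr es:[1], 07h    ; attribute: light grey on black
        pop     es                      ; restore the original ES
```

Because every access carries an explicit es: override, no ASSUME for ES is needed in this fragment.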

Some parts of this post were translated by me in my poor English; if you would like a better explanation, see:
"Peter Norton's Assembly Language Book for the IBM PC", by Peter Norton & John Socha, chapter 29 and chapter 11.

MichaelW

BTW, starting with ML 6.0 the ASSUME CS was no longer necessary.

mineiro

Thank you Sr MichaelW, very well noted. I admit that when I was doing that translation, this did not come to mind.


falcon01

A very interesting explanation mineiro, thank you too^^

falcon01

Mmm...ok now a strange thing occurs!
If I simply remove
assume DS:DATASEG
but I leave
mov AX,DATASEG
mov DS,AX

everything just works...

dedndave

that is because you are only using the data area in a simple program   :bg

if you load the offset of a label...
        mov     dx,offset String
the assembler does not care about the DS register
it gives you the offset relative to the beginning of the segment

however, if you use something like....
        mov     ax,SomeData
the assembler wants to know that DS matches the segment or group of SomeData
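
a small sketch of the difference, assuming SomeData is a word defined in a data segment and that DS has been told to assume nothing (MASM-style syntax; the exact error text depends on the assembler version)

```asm
        assume  ds:nothing              ; assembler no longer trusts DS for any segment

        mov     dx, offset SomeData     ; fine: an offset needs no segment register
;       mov     ax, SomeData            ; rejected: no assumed register reaches SomeData
        mov     ax, ds:SomeData         ; accepted: explicit override names the register
```

the explicit ds: form should assemble to the same bytes as the plain form would under a matching ASSUME, because DS is already the default register for this access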