Segments, Offsets, Addresses, Memory in general

f4cepl4nt · August 27, 2007, 04:30:40 AM

Alright guys, I know I'm going to get a few odd looks my way for asking dumb questions like this, but I've recently had a bit of a programming crisis. I've been using C++ for ages now, and about a year ago I figured I needed some really low-level understanding of programming, so I picked up some basic MASM (what could be better than mnemonics for machine code??).

Just lately, I've run into some trouble. Whether that's because I'm getting some old school info from 16 bit tutorials mixed up with 32 bit tutorials, or whether that's just because I'm overcomplicating things in my mind, I don't know. Either way I'm sure you guys could answer it in a snap.

I always thought memory addresses were simply four bytes long, end of story. An EXE file contained its own "reference" addresses for things like code and data, and when the program was run everything was loaded into physical memory, not necessarily with the same address.

Then I read an ASM tutorial, and everything went awry. Now there's segments and offsets - in the form FFFF:FFFF. If you want the physical address of a variable in your program, you take its segment two bytes and shift them over to the left, then add the offset. Huh? I thought processes were loaded into physical memory wherever there was room, not in a specific spot. This question is already partially answered in my mind - this is a DOS style program that uses real mode, correct? Clarify that one for me please.

OK, next up is the data/code/stack/extra segments in ASM. I would assume that each one of these segments has its own most significant 2 bytes in the address, so say if the code segment was A2B0, then code in the program would be stored from A2B00000 to A2B0FFFF in the process's own address space (it may not be stored there in the real physical memory). But when I opened up notepad.exe in a debugger, the CS segment was 023B and most of the code had the address 0190xxxx. Same with the data segment: it was different from the first two bytes of the addresses of variables in the program. And then if each segment has a specific first two bytes in the address, wouldn't that limit you to only 64 kB of instructions / data / stack? :dazzled:

THEN someone told me that these segments overlap, and so from that I'm getting that data and code and stack addresses are all intermingled, not separated...now I'm just plain confused.

Someone, please, give a C++ programmer some help. I almost wish I was back in my bubble, thinking about memory as four bytes of hex and that's that. :lol

hutch-- · August 27, 2007, 04:54:22 AM

f4cepl4nt,

Basically, people who are not familiar with data sizes make mistakes like this. In 32 bit, ALL addressing without fail is 32 bit, 00000000h in size; segment/offset addressing is a leftover from 16 bit. If you really wanted to, you could design an OS that used larger segment/offset values, but why would you bother?

In 32 bit you have the FLAT memory model with an addressing range of 4 GB that is all NEAR addressing. It is simply linear address space: no segments or offsets from segments are required, and 16 bit style segment:offset arithmetic does not work at all.

Tedd · August 27, 2007, 11:36:03 AM

First thing (just to bring back a little sanity for you :wink) addresses are 32-bit and everything you understand from C++ is about right. Phew, the world isn't going to end! :bg
The whole issue of segments is only relevant in 16-bit mode (real mode). Quick explanation: you only have 16-bit offsets to address with, which would limit you to a whopping 64k of memory - wow, so much! So, those clever people at Intel decided to split the memory up into 64k segments, spaced 16 bytes apart (which is why they overlap), and then give us another 16-bit register (the segment registers) to choose which segment to base the offsets from - extending access to a grand total of 1MB!!
As for exes and mapping them into memory, it gets a little more complicated by the use of virtual memory - nothing is really where it says it is :bdg The addresses used are virtual addresses, and as far as each process is concerned it's the only process in memory and has access to the full address range. The truth is that processes do share physical memory, and the OS lies to them by mapping the virtual memory into physical memory where there is space (and uses paging for when there isn't.)
There, now it's much clearer, right? :lol
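
If the overlap is hard to picture, here is a rough sketch in Python (the name `physical_address` is made up for illustration, it isn't from any tutorial or tool). It assumes plain real-mode rules: the segment is multiplied by 16 and the offset is added.

```python
def physical_address(segment, offset):
    """Real-mode address: segment * 16 + offset (both are 16-bit values)."""
    return (segment << 4) + offset

# Consecutive segments start only 16 bytes apart...
print(hex(physical_address(0x1234, 0x0000)))  # 0x12340
print(hex(physical_address(0x1235, 0x0000)))  # 0x12350

# ...so the same byte can be reached through more than one segment:offset pair.
print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
```

So 1234:0010 and 1235:0000 name the very same byte, which is all "overlapping segments" really means.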

MichaelW · August 27, 2007, 06:09:30 PM

The original 8086/8088 design provided a 20-bit (1MB) address space, and the segment-offset address mechanism allowed Intel to do this with 16-bit registers, generating a 20-bit physical address by shifting the segment address left by 4 bits and adding it to the offset address. The address space increased with later processors, but the real-mode segment-offset address mechanism stayed the same. The segment-offset address mechanism was used as the basis for a system that allowed executables to be "relocatable", specifically so the OS could load an executable into memory wherever there was room for it, with the restriction that it had to be loaded on a 16-byte boundary.
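
As a minimal sketch of that relocation idea (Python, purely illustrative): the load segments and the variable offset below are hypothetical values picked for the example, and the 20-bit mask assumes the original 8086/8088 behaviour where addresses wrap at 1MB.

```python
def physical_address(segment, offset):
    """8086 real mode: shift the segment left 4 bits, add the offset, keep 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF

# Hypothetical program: a variable lives at offset 0x0100 inside its segment.
VAR_OFFSET = 0x0100

# The loader picks whatever segment is free; the offset used by the code never changes.
for load_segment in (0x1000, 0x2F3A):
    addr = physical_address(load_segment, VAR_OFFSET)
    print(f"{load_segment:04X}:{VAR_OFFSET:04X} -> {addr:05X}")
```

This prints `1000:0100 -> 10100` and `2F3A:0100 -> 2F4A0`. Only the segment value changes between the two loads, and because a segment base is always the segment value times 16, the program necessarily starts on a 16-byte boundary.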
