Seeming duplication?

bobl · October 22, 2009, 07:20:17 AM

Here's someone's 2nd-stage bootloader code:

mov eax, 10000h
mov eax, [eax]
jmp eax

I'd be grateful if someone could tell me what instruction 2 does that instruction 1 doesn't.

Thx.

evlncrn8 · October 22, 2009, 08:22:03 AM

mov eax, 10000h ; eax = 10000h
mov eax, [eax] ; make eax = dword @ 10000h
jmp eax ; jmp to this value

could also be done like..

mov eax, 10000h
jmp dword ptr [eax]
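To see exactly what instruction 2 adds, here is a small Python model of the sequence (a sketch, not assembly). It assumes flat memory held in a bytearray and a made-up entry point of 0x00100000 stored at 10000h; the loader's real value isn't shown in the thread.

```python
# Toy model (not real assembly) of:
#   mov eax, 10000h      ; eax = the address 10000h
#   mov eax, [eax]       ; eax = the dword stored at 10000h
#   jmp eax              ; jump to that dword
# The entry point 0x00100000 stored at 10000h is a made-up value.

POINTER_ADDR = 0x10000
memory = bytearray(0x20000)  # flat memory, big enough to cover 10000h
memory[POINTER_ADDR:POINTER_ADDR + 4] = (0x00100000).to_bytes(4, "little")


def read_dword(mem, addr):
    """Return the 32-bit little-endian value stored at addr."""
    return int.from_bytes(mem[addr:addr + 4], "little")


def three_instruction_form(mem):
    eax = POINTER_ADDR           # mov eax, 10000h
    eax = read_dword(mem, eax)   # mov eax, [eax]
    return eax                   # jmp eax -> jump target


def two_instruction_form(mem):
    eax = POINTER_ADDR           # mov eax, 10000h
    return read_dword(mem, eax)  # jmp dword ptr [eax] loads and jumps in one go


print(hex(three_instruction_form(memory)))  # prints 0x100000
```

Without instruction 2 the jmp would land on 10000h itself and try to execute the stored pointer bytes as code.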

sinsi · October 22, 2009, 08:40:36 AM

Quite often that 10000h will be defined as a constant, so wherever it appears it will be the same, and it is a lot easier to change one definition than several.
Boot loaders usually relocate code, so the address gets referenced at least twice: once when copying the code there and once for the jump. That's why 'STAGE2ADDRESS equ 10000h' is often used.

Anyway, when you write your own boot code you get the basics going and rarely, if ever, go back and 'optimize' the ancient code (speaking from experience here... :bg)

Simplest code would be 'jmp dword ptr ds:[10000h]'

bobl · October 22, 2009, 09:05:28 AM

Re make eax = dword @ 10000h
So that's it!

I like your solutions much better.

The code was written by someone who was more or less a beginner himself @ the time.
I know this cos he includes a disclaimer saying as much.

Thx both

hutch-- · October 22, 2009, 10:38:27 AM

sinsi's form is the tidiest, though it's a long instruction, not that it would matter in most instances. Just to make it more confusing, here is a small macro for the lazy.

  ;; ---------------
  ;; jump to address
  ;; ---------------
    jta MACRO address
      jmp dword ptr ds:[address]
    ENDM

Which you would use like this.

    jta 10000h

If you are a genuine purist you would use CS rather than DS but MASM converts DS to CS anyway as a PE file has data and code segments mapped to the same address range.

The interesting part is that the version written with the DS override comes out 1 byte shorter when built by MASM.

00401045   FF2500000100         jmp     dword ptr [10000h]
0040104B 2EFF2500000100         jmp     dword ptr cs:[10000h]
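The one-byte difference in the listing is just the 2Eh prefix. Here is a small Python sketch that reproduces both lines (a model of the byte layout, not an assembler). The prefix table is the standard x86 one; the "drop a redundant ds:" rule reflects what the MASM listing above shows, not a claim about every assembler.

```python
# Sketch of the two encodings in the listing (32-bit code):
#   FF 25 disp32 = jmp dword ptr [disp32]  (opcode FF /4, ModRM 25h = [disp32])
# optionally preceded by a one-byte segment-override prefix.

SEGMENT_PREFIX = {"es": 0x26, "cs": 0x2E, "ss": 0x36, "ds": 0x3E, "fs": 0x64, "gs": 0x65}


def encode_jmp_mem(disp32, override="ds"):
    """Encode 'jmp dword ptr seg:[disp32]'.

    DS is already the default segment for a plain [disp32] operand, so a
    ds: override is redundant and, as in the MASM listing, no prefix is
    emitted for it. Any other segment costs one extra prefix byte."""
    prefix = b"" if override == "ds" else bytes([SEGMENT_PREFIX[override]])
    return prefix + bytes([0xFF, 0x25]) + disp32.to_bytes(4, "little")


for seg in ("ds", "cs"):
    code = encode_jmp_mem(0x10000, seg)
    print(seg, code.hex(), len(code), "bytes")
```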

sinsi · October 22, 2009, 11:12:35 AM

hutch, since this is a boot loader's 2nd stage, I would assume that the CPU is already in pmode after booting the 1st stage, so a PE file is way off.
Sounds like 'OS construction' (which we know we can't do with the masm32 licence :bg)

>If you are a genuine purist you would use CS rather than DS but MASM converts DS to CS anyway
With flat it doesn't matter, and the default segment for memory access is DS, so it's the CS override that needs the prefix byte.
The 'ds:' is needed by MASM for some obscure reason, even though 1) DS is the default and 2) no prefix byte is needed for it.

redskull · October 22, 2009, 11:23:22 AM

Just for my edification: if this is bootloader code (assuming it runs before the switch to protected mode), then what are the rules for the size prefixes? Can you use the extended registers without a size prefix in real mode? I'll see if I can't dig it out of an Intel manual when I get some time.

-r

sinsi · October 22, 2009, 11:28:14 AM

16-bit code that uses e.g. EAX always has the 66h (operand size) prefix. 67h is the address size prefix. 'unreal' mode code is usually full of 66s and 67s.
With 32-bit code, the 66h prefix is used for 16-bit regs e.g. AX - the exact opposite.
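sinsi's rule can be written down as a tiny Python sketch (a model of the prefix logic, not a full encoder). It only handles the B8 form of 'mov ax/eax, imm', and the function names are made up for illustration.

```python
# Sketch of the rule: the 66h (operand-size) and 67h (address-size)
# prefixes flip the size away from the code segment's default.
# Only 16/32-bit operands are covered; byte operands use separate opcodes.


def size_prefixes(code_bits, operand_bits, address_bits):
    """Return the prefix bytes needed in a code segment of code_bits (16 or 32)."""
    prefixes = b""
    if operand_bits != code_bits:
        prefixes += b"\x66"
    if address_bits != code_bits:
        prefixes += b"\x67"
    return prefixes


def encode_mov_reg_imm(code_bits, operand_bits, value):
    """Encode 'mov ax/eax, imm' (opcode B8 followed by imm16 or imm32)."""
    prefix = size_prefixes(code_bits, operand_bits, code_bits)
    return prefix + b"\xB8" + value.to_bytes(operand_bits // 8, "little")


print(encode_mov_reg_imm(16, 32, 0x10000).hex())  # mov eax, 10000h in 16-bit code
print(encode_mov_reg_imm(32, 16, 0).hex())        # mov ax, 0 in 32-bit code
```

So the same 'mov eax, 10000h' costs one extra byte in real-mode code, and 'mov ax, 0' costs one extra byte in 32-bit code.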

hutch-- · October 22, 2009, 01:12:33 PM

Yes you are right, at the boot loader stage you tend to roll your own.

Long, long ago, I vaguely remember, the so-called UnReal mode switched to protected mode and back again to get 32-bit addressing. Thank God I don't play with that stuff. :bg

bobl · October 22, 2009, 04:14:33 PM

All very helpful & thx for the macro
You're right, we've already crossed into pmode by the time of the above jmp.

Re CS, this is the only reference I can find in the stage 1 & 2 bootloaders.
It happens in stage 1 (16-bit):
==============================================================.
mov ax, 0 ; set registers to code position
mov ds, ax
mov es, ax
push ax ; segment address onto stack
mov ax, 0h
push ax ; offset address onto stack
retf ; pop offset to ip, pop segment to cs <========================HERE
====================================================================
Again, as a beginner, this looks quite long-winded to me.
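What push/push/retf does can be modelled in a few lines of Python (illustrative only, not the loader's code). RETF pops the offset into IP first and then the segment into CS, so the segment has to be pushed first. Any CS:IP values other than the loader's 0:0 are hypothetical.

```python
# Toy real-mode model of the stage-1 sequence:
#   push ax   ; segment
#   push ax   ; offset
#   retf      ; pop offset into IP, then segment into CS
# The stack is a Python list whose last element is the top of stack.


def far_return(stack):
    """Pop IP first, then CS, as RETF does; return (cs, ip)."""
    ip = stack.pop()
    cs = stack.pop()
    return cs, ip


def physical(cs, ip):
    """Real-mode physical address: segment * 16 + offset."""
    return (cs << 4) + ip


stack = []
stack.append(0x0000)  # push ax (segment, as in the loader)
stack.append(0x0000)  # push ax (offset, as in the loader)
cs, ip = far_return(stack)
print(f"CS:IP = {cs:04X}:{ip:04X} -> physical {physical(cs, ip):05X}h")
```

A direct far JMP to a constant segment:offset would load CS:IP in a single instruction. Push/push/RETF is one common way to do it when the target is only known at run time.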

Whilst searching for 'cs' I came across cseg and noticed that the homemade GDT has 2 selectors, cseg (code?) and dseg (data?).
I notice that

jmp cseg:goforth ; clear prefetch q upon jmp

is the last 16 bit instruction before Pmode and seems to jump to the very next instruction (on the disk image though prolly not in ram) which is the first of a sequence of instructions that

loads the dseg selector's addr into ds, ss, es, fs, gs (ie not cs)

followed by ...
sti

mov al, 0AEh ; enable keyboard
out 64h, al
mov eax, 10000h ;address with address to jump to
mov eax, [eax] ;make eax = dword @ 10000h <=======evlncrn8's explanation above
jmp eax ;start Forth

I'm a bit confused about the roles of cseg and dseg either side of PMode, cos code and data look to be mixed together throughout.
A very simple explanation of what is going on above would be very helpful indeed.

BTW Re the 66h explanation. I was beginning to wonder about the accuracy of my disassembly. You've just restored my faith. Thx
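On the cseg/dseg question, here is a Python sketch of how a protected-mode selector is laid out. The actual cseg/dseg values in this loader aren't shown in the thread, so the GDT layout below (null, code, data) is an assumption, though a very common one.

```python
# Layout of a protected-mode segment selector (what gets loaded into CS, DS, ...):
#   bits 15..3  index of the descriptor in the table
#   bit  2      TI: 0 = GDT, 1 = LDT
#   bits 1..0   RPL: requested privilege level


def make_selector(index, ti=0, rpl=0):
    return (index << 3) | (ti << 2) | rpl


def decode_selector(sel):
    return {"index": sel >> 3, "ti": (sel >> 2) & 1, "rpl": sel & 3}


# Hypothetical GDT layout (not taken from the loader in this thread):
#   entry 0 = mandatory null descriptor, entry 1 = code, entry 2 = data
CSEG = make_selector(1)  # 0x08
DSEG = make_selector(2)  # 0x10

for name, sel in (("cseg", CSEG), ("dseg", DSEG)):
    print(name, hex(sel), decode_selector(sel))
```

So the segment registers hold the selector value itself (an index into the GDT), not an address. The base and limit come from the descriptor it selects. CS must reference an executable code descriptor and SS a writable data descriptor. If, as is common in simple loaders, both descriptors have base 0 and a 4 GB limit, the two segments overlap completely, which would explain why code and data look mixed together.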

bobl · October 22, 2009, 04:30:37 PM

I thought I'd got the answer but on reflection perhaps not!

MichaelW · October 22, 2009, 07:20:17 PM

Quote
I notice that

jmp cseg:goforth ; clear prefetch q upon jmp

is the last 16 bit instruction before Pmode and seems to jump to the very next instruction (on the disk image though prolly not in ram) which is the first of a sequence of instructions that

loads the dseg selector's addr into ds, ss, es, fs, gs (ie not cs)

The inter-segment jump is used to set CS and flush the instruction queue. Unlike the other segment registers, CS cannot be loaded with a MOV or POP instruction. CS is normally loaded with a far JMP, CALL, or RET instruction. The instruction queue must be flushed because it contains instructions that were decoded for execution in real mode.
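A Python sketch of why instructions decoded before the switch are a problem: the same bytes decode differently depending on whether the code segment is 16-bit or 32-bit. Only the B8 opcode is modelled. This illustrates the decoding difference, not how any particular CPU's prefetch queue works internally.

```python
# The bytes below are 'mov eax, 10000h' as assembled for 32-bit code.
# Decoded as 16-bit code they become 'mov ax, 0000h' (3 bytes), and the
# leftover bytes 01 00 would be decoded as a separate 'add [bx+si], ax'.

CODE = bytes([0xB8, 0x00, 0x00, 0x01, 0x00])


def decode_mov_imm(code, code_bits):
    """Decode a B8 instruction; return (register, immediate, length)."""
    if code[0] != 0xB8:
        raise ValueError("only B8 (mov ax/eax, imm) is modelled")
    size = code_bits // 8  # imm16 in 16-bit code, imm32 in 32-bit code
    imm = int.from_bytes(code[1:1 + size], "little")
    reg = "eax" if code_bits == 32 else "ax"
    return reg, imm, 1 + size


print(decode_mov_imm(CODE, 32))
print(decode_mov_imm(CODE, 16))
```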

bobl · October 23, 2009, 08:33:54 AM

That certainly wasn't apparent to me. Thank you for your concise explanation.

dedndave · October 23, 2009, 09:34:08 AM

Quote
CS is normally loaded with a far JMP, CALL, or RET instruction.

i would add INT and IRET to Michael's list
quite often, they set an interrupt vector and use it to switch between modes
either of these instructions can alter the CS reg - very similar to FAR CALL / RETF
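To show how INT and IRET change CS the way a far CALL and RETF do, here is a real-mode Python model (a sketch, not emulator code). The vector number 60h and its handler address 1234h:0010h are hypothetical.

```python
# Toy real-mode model of INT n and IRET.
# INT n: push FLAGS, CS, IP, then load IP from IVT[n*4] and CS from IVT[n*4+2].
#        (It also clears IF and TF; flags are not modelled beyond being saved.)
# IRET:  pop IP, CS, FLAGS, which is the reverse of what INT pushed.


def interrupt(n, ivt, stack, flags, cs, ip):
    """Save FLAGS, CS, IP on the stack and return the new (cs, ip) for INT n."""
    stack.extend([flags, cs, ip])  # FLAGS pushed first, IP ends up on top
    new_ip = int.from_bytes(ivt[n * 4:n * 4 + 2], "little")
    new_cs = int.from_bytes(ivt[n * 4 + 2:n * 4 + 4], "little")
    return new_cs, new_ip


def iret(stack):
    """Pop IP, CS and FLAGS; return (flags, cs, ip)."""
    ip = stack.pop()
    cs = stack.pop()
    flags = stack.pop()
    return flags, cs, ip


# Hypothetical vector 60h pointing at 1234h:0010h.
ivt = bytearray(1024)  # 256 vectors x 4 bytes
ivt[0x60 * 4:0x60 * 4 + 4] = (0x0010).to_bytes(2, "little") + (0x1234).to_bytes(2, "little")

stack = []
cs, ip = interrupt(0x60, ivt, stack, flags=0x0202, cs=0x0000, ip=0x7C40)
print(f"handler at {cs:04X}:{ip:04X}")
print("after IRET:", [hex(v) for v in iret(stack)])
```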

bobl · October 23, 2009, 01:04:03 PM

That's great!

I read 'A crash course in protected mode' @ http://www.geocities.com/siliconvalley/2151/pmode.html.
before starting this thread.

It didn't treat CS any differently from the other registers, but seemingly CS is special.

The article focused more on gdts/selectors which I also need to know about.

Thx for enlightening me further
