I wrote this small protected-mode program and it is giving me a problem: when I run it, the computer restarts. It seems to happen on the far jump directly after the switch to PM. The code is listed below. Any suggestions? Thanks
.MODEL SMALL
.386p
.STACK
GDT_DESCR STRUC
gdt_size WORD 0
gdt_location DWORD 0
GDT_DESCR ENDS
GDT_ENTRY STRUC
segment_size15_0 WORD 0
base_addr15_0 WORD 0
base_addr23_16 BYTE 0
p_dpl_s_type BYTE 0
g_db_0_avl_seg19_16 BYTE 0
base_addr31_24 BYTE 0
GDT_ENTRY ENDS
.DATA
gdt_descriptor GDT_DESCR <127>
gdt GDT_ENTRY <>, \
<0FFFFh, , , 09Ah, 0CFh>, \
<0FFFFh, , , 092h, 08Fh>
pm_jmp DWORD 0
.CODE
MAIN PROC
ORG 0
;Get linear address of GDT
MOV AX, DS
MOVZX EAX, AX
SHL EAX, 4
ADD EAX, OFFSET gdt
MOV gdt_descriptor.gdt_location, EAX
LGDT gdt_descriptor
; Initialize far pointer for mode change
MOV WORD PTR pm_jmp, OFFSET START
MOV WORD PTR pm_jmp[2], 8
; Go to PM
MOV EAX, CR0
OR AL, 01h
MOV CR0, EAX
; Do intersegment jump to set cs and flush instruction queue
JMP DWORD PTR pm_jmp
START:
CLI
HLT
MAIN ENDP
END
- you should set base of GDT descriptor 8, which is used for CS
- descriptor for CS should be a 16-bit descriptor, D-bit should be cleared
- interrupts should be disabled before changing CR0
- instead of using "jmp dword ptr []" you should use 0EAh opcode (jmp ssss:oooo)
> - you should set base of GDT descriptor 8, which is used for CS
The base address is set to zero using the default declarations.
> - descriptor for CS should be a 16-bit descriptor, D-bit should be cleared
Why wouldn't it be 32-bit?
> - interrupts should be disabled before changing CR0
Forgot about that one, sorry.
> - instead of using "jmp dword ptr []" you should use 0EAh opcode (jmp ssss:oooo)
I'll try it and post back if it works, Thanks. :thumbu
Nope, using the straight-up opcode didn't fix it. The program is still rebooting the system.
Here is the updated code:
.MODEL SMALL
.386p
.STACK
GDT_DESCR STRUC
gdt_size WORD 0
gdt_location DWORD 0
GDT_DESCR ENDS
GDT_ENTRY STRUC
segment_size15_0 WORD 0
base_addr15_0 WORD 0
base_addr23_16 BYTE 0
p_dpl_s_type BYTE 0
g_db_0_avl_seg19_16 BYTE 0
base_addr31_24 BYTE 0
GDT_ENTRY ENDS
PM_JUMP MACRO SEGMENT,OFFSET
BYTE 0EAh
WORD OFFSET
WORD SEGMENT
ENDM
.DATA
gdt_descriptor GDT_DESCR <127>
gdt GDT_ENTRY <>, \
<0FFFFh, , , 09Ah, 0CFh>, \
<0FFFFh, , , 092h, 08Fh>
.CODE
MAIN PROC
ORG 0
;Get linear address of GDT
MOV AX, DS
MOVZX EAX, AX
SHL EAX, 4
ADD EAX, OFFSET gdt
MOV gdt_descriptor.gdt_location, EAX
LGDT gdt_descriptor
; Go to PM
CLI
MOV EAX, CR0
OR AL, 01h
MOV CR0, EAX
; Do intersegment jump to set cs and flush instruction queue
PM_JUMP 08h, OFFSET START
START:
CLI
HLT
MAIN ENDP
END
Could it be that the "OFFSET" operator is getting the real-mode offset of "START:", and that once the processor switches to protected mode the offset it obtained in real mode is no longer valid?
As the linker warning indicates, your code does not specify a starting address. Even though this program will work OK with the default starting address, not all program layouts will, so you should generally specify a starting address.
If the program is to actually use data, you must initialize DS so it points to the data segment (and not to the PSP as the loader initializes it). You can do this with something like:
mov ax,@data
mov ds,ax
Or you can just use the .STARTUP directive which will initialize DS and set the program starting address to the address of the directive.
The ORG 0 directive serves no useful purpose.
Beyond that, I can see two problems with your PM setup code. In your jump instruction you are encoding a 16-bit immediate offset address that is calculated relative to your program's code segment. For this to work the segment base address in the code segment descriptor must match the absolute address of your program's code segment, and the D bit must be clear. If your program is to use data from PM then the segment base address in the data segment descriptor will need to be set to match the absolute address of your data segment, and DS will need to be loaded with the data selector after the switch to PM. If your program is to use the stack from PM then the SS register will need to be set appropriately. If you use the .STARTUP directive then DS and SS will be initialized to the same value (the segment address of DGROUP) and SP will be adjusted accordingly, so you can load the data selector into SS as well.
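To make the "segment base must match the absolute address of your code segment" point concrete, here is a minimal sketch in Python (used only for the arithmetic; it is not part of the MASM program). The segment value 1234h is a made-up example, and the helper names are hypothetical; the field names are taken from the GDT_ENTRY structure above.

```python
def real_mode_base(segment: int) -> int:
    """Absolute (linear) address of a real-mode segment: segment * 16."""
    return (segment & 0xFFFF) << 4


def split_base(base: int) -> dict:
    """Split a base address into the three base fields of GDT_ENTRY."""
    return {
        "base_addr15_0": base & 0xFFFF,
        "base_addr23_16": (base >> 16) & 0xFF,
        "base_addr31_24": (base >> 24) & 0xFF,
    }


# Assumption for illustration: the loader placed the code segment at CS = 1234h.
fields = split_base(real_mode_base(0x1234))
for name, value in fields.items():
    print(f"{name} = {value:X}h")
```

With the descriptor base set this way, a 16-bit offset such as OFFSET START means the same thing before and after the switch, because base + offset still lands on the same byte.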
I made some extensions to your little app and now it works.
While in PM it does some screen magic, and pressing ESC should return to RM, so you don't have to reboot.
.MODEL SMALL
.386p
.STACK
GDT_DESCR STRUC
gdt_size WORD 0
gdt_location DWORD 0
GDT_DESCR ENDS
GDT_ENTRY STRUC
segment_size15_0 WORD 0
base_addr15_0 WORD 0
base_addr23_16 BYTE 0
p_dpl_s_type BYTE 0
g_db_0_avl_seg19_16 BYTE 0
base_addr31_24 BYTE 0
GDT_ENTRY ENDS
PM_JUMP MACRO _SEGMENT,_OFFSET
BYTE 0EAh
WORD _OFFSET
WORD _SEGMENT
ENDM
.DATA
gdt_descriptor GDT_DESCR <127>
gdt GDT_ENTRY <>, \
<0FFFFh, , , 09Ah, 08Fh>, \
<0FFFFh, , , 092h, 08Fh>, \
<0FFFFh, , , 092h, 000h> ;a valid 64 kB data descriptor
.CODE
MAIN PROC
mov ax,DGROUP
mov ds,ax
;Get linear address of GDT
MOV AX, DS
MOVZX EAX, AX
SHL EAX, 4
ADD EAX, OFFSET gdt
MOV gdt_descriptor.gdt_location, EAX
;set descriptor 8 to base of CS
MOV AX, CS
MOVZX EAX, AX
SHL EAX, 4
mov [gdt+1*sizeof GDT_ENTRY].base_addr15_0,ax
shr eax,16
mov [gdt+1*sizeof GDT_ENTRY].base_addr23_16,al
mov [gdt+1*sizeof GDT_ENTRY].base_addr31_24,ah
LGDT gdt_descriptor
; Go to PM
CLI
MOV EAX, CR0
OR AL, 01h
MOV CR0, EAX
; Do intersegment jump to set cs and flush instruction queue
PM_JUMP 08h, OFFSET START
START:
mov ax,10h
mov ds,ax
mov bx,0700h
nextloop:
mov ax,bx
mov cx,80*24
mov edi,0B8000h
.while (cx)
mov [edi],ax
inc edi
inc edi
inc al
dec cx
.endw
inc bl
in al,64h
and al,1
jz nextloop
in al,60h
cmp al,1 ;ESC pressed?
jnz nextloop
mov ax,18h
mov ds,ax
mov eax,cr0 ;back to real mode
and al,0FEh
mov cr0,eax
db 0eah
dw offset in_rm_again
dw seg _TEXT
in_rm_again:
sti
mov ax,4c00h
int 21h
CLI
HLT
MAIN ENDP
END MAIN
Thanks for the help guys! :U
This line:
[gdt+1*sizeof GDT_ENTRY]
I'm not understanding it, could you walk through it please.
Also, I was originally trying to set up the GDT so that there would be a code segment that would consist of the first megabyte of memory and a data segment that encompassed the entire 32-bit memory range.
Is this possible?
Thanks again.
r_miele,
[gdt + 1 * sizeof GDT_ENTRY] addresses the second entry in the GDT.
MASM knows byte offsets only, not array indices; that's why "sizeof GDT_ENTRY" has to be multiplied in.
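The same byte-offset idea also explains why the selector 08h picks that second entry. A small Python sketch of the relationship, assuming the standard x86 selector layout (table index in bits 15..3, table indicator and privilege level in the low three bits); the function name is hypothetical:

```python
GDT_ENTRY_SIZE = 8  # bytes per descriptor, i.e. "sizeof GDT_ENTRY"


def selector_to_offset(selector: int) -> int:
    """Byte offset, from the start of the GDT, of the selected descriptor."""
    index = selector >> 3           # drop the TI bit and the two RPL bits
    return index * GDT_ENTRY_SIZE   # gdt + index * sizeof GDT_ENTRY


for selector in (0x08, 0x10, 0x18, 0x20):
    print(f"selector {selector:02X}h -> entry {selector >> 3}, "
          f"gdt + {selector_to_offset(selector)}")
```

For selectors with the low three bits clear, as used in this thread, the selector value and the byte offset are the same number.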
> Also, I was originally trying to set up the GDT so that there would be a code segment that would consist of the first megabyte of memory and a data segment that encompassed the entire 32-bit memory range.
> Is this possible?
It is surely possible, but a bit advanced. To use it you would have to switch to a 32-bit code segment (that is, the D bit of the CS descriptor is set, so EIP is used instead of IP) immediately after switching to protected mode. MASM will force you to place such code in a "use32" code segment.
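To see what the D bit looks like in the descriptor bytes used in this thread, here is a rough Python decoder for the g_db_0_avl_seg19_16 byte. It assumes the layout the field name suggests (G in bit 7, D/B in bit 6, AVL in bit 4, limit bits 19..16 in the low nibble); the function names are made up for the example.

```python
def decode_flags(flags_byte: int) -> dict:
    """Decode the g_db_0_avl_seg19_16 byte of a segment descriptor."""
    return {
        "granularity_4k": bool(flags_byte & 0x80),  # G: limit counted in 4 kB pages
        "default_32bit": bool(flags_byte & 0x40),   # D/B: 32-bit code/stack segment
        "avl": bool(flags_byte & 0x10),             # AVL: free for software use
        "limit_19_16": flags_byte & 0x0F,           # upper 4 bits of the limit
    }


def segment_limit(limit_15_0: int, flags_byte: int) -> int:
    """Highest valid offset in the segment, in bytes."""
    limit = ((flags_byte & 0x0F) << 16) | limit_15_0
    if flags_byte & 0x80:                # page granular: scale by 4 kB
        limit = (limit << 12) | 0xFFF
    return limit


for flags in (0xCF, 0x8F, 0x00):
    print(f"{flags:02X}h", decode_flags(flags),
          f"limit = {segment_limit(0xFFFF, flags):X}h")
```

Read this way, 0CFh describes a 4 GB segment with the D bit set, 08Fh the same size with the D bit clear, and 000h a plain 64 kB 16-bit segment.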
This version uses a 32 bit flat code segment:
.MODEL SMALL
.STACK 2048
.386p
GDT_DESCR STRUC
gdt_size WORD 0
gdt_location DWORD 0
GDT_DESCR ENDS
GDT_ENTRY STRUC
segment_size15_0 WORD 0
base_addr15_0 WORD 0
base_addr23_16 BYTE 0
p_dpl_s_type BYTE 0
g_db_0_avl_seg19_16 BYTE 0
base_addr31_24 BYTE 0
GDT_ENTRY ENDS
PM_JUMP MACRO _SEGMENT,_OFFSET
BYTE 0EAh
WORD _OFFSET
WORD _SEGMENT
ENDM
.data
gdt_descriptor GDT_DESCR <127>
gdt GDT_ENTRY <>, \
<0FFFFh, , , 09Ah, 08Fh>, \ ;08
<0FFFFh, , , 092h, 08Fh>, \ ;10
<0FFFFh, , , 092h, 000h>, \ ;18 a valid 64 kB data descriptor
<0FFFFh, , , 09Ah, 0CFh> ;20 a flat 32 bit code segment
.code
MAIN PROC
mov ax,DGROUP
mov ds,ax
;Get linear address of GDT
MOV AX, DS
MOVZX EAX, AX
SHL EAX, 4
ADD EAX, OFFSET gdt
MOV gdt_descriptor.gdt_location, EAX
;set descriptor 8 to base of CS
MOV AX, CS
MOVZX EAX, AX
SHL EAX, 4
mov [gdt+1*sizeof GDT_ENTRY].base_addr15_0,ax
shr eax,16
mov [gdt+1*sizeof GDT_ENTRY].base_addr23_16,al
mov [gdt+1*sizeof GDT_ENTRY].base_addr31_24,ah
;set call to flat 32 bit code
mov ax, _TEXT32
movzx eax,ax
shl eax,4
mov dx, offset start
movzx edx,dx
add eax, edx
mov cs:[xxx], eax
LGDT gdt_descriptor
; Go to PM
CLI
MOV EAX, CR0
OR AL, 01h
MOV CR0, EAX
; Do intersegment jump to set cs and flush instruction queue
db 66h, 0eah ;jmp fword ptr 20h:start
xxx dd 0
dw 20h
back_in_16_bit::
mov ax,18h
mov ds,ax
mov eax,cr0 ;back to real mode
and al,0FEh
mov cr0,eax
db 0eah
dw offset in_rm_again
dw seg _TEXT
in_rm_again:
sti
mov ax,4c00h
int 21h
MAIN ENDP
_TEXT32 segment use32 dword private 'CODE'
start:
mov ax,10h
mov ds,ax
mov bx,0700h
nextloop:
mov ax,bx
mov cx,80*24
mov edi,0B8000h
.while (cx)
mov [edi],ax
inc edi
inc edi
inc al
dec cx
.endw
inc bl
in al,64h
and al,1
jz nextloop
in al,60h
cmp al,1 ;ESC pressed?
jnz nextloop
db 0eah
dw offset back_in_16_bit ;jmp fword ptr 8:back_in_16_bit
dw 0 ;HIWORD(offset)
dw 8
_TEXT32 ends
END MAIN
Please note: you may get linker errors with MS link or Borland's tlink.
For 32-bit OMF code like this I would suggest using the Digital Mars OMF linker (it is free),
which is by far the best OMF linker I know. Furthermore, it has no problems with a DGROUP size > 64 kB.
Thanks for the quick reply japheth! :thumbu
I have a couple of other questions if you do not mind.
[gdt+1*sizeof GDT_ENTRY]
Correct me if I am wrong, with four entries in the GDT the "SIZEOF" directive will return 256. gdt will provide the offset of the GDT and the plus one will add one byte to that. I can't figure out how that is getting to the second element in the array. I don't know why this is confusing me so badly!
When a small memory model is used do the segments default to 16-bit?
This following code:
;set call to flat 32 bit code
mov ax, _TEXT32
movzx eax,ax
shl eax,4
mov dx, offset start
movzx edx,dx
add eax, edx
mov cs:[xxx], eax
This is going to calculate the linear address of the 32-bit code segment but I've never seen this syntax before:
mov cs:[xxx], eax
could you please explain it to me.
This jump:
; Do intersegment jump to set cs and flush instruction queue
db 66h, 0eah ;jmp fword ptr 20h:start
xxx dd 0
dw 20h
I'm confused with this jump, the Intel Developers Manual Vol 3 states "The instruction prefix 66H can be used to select an operand size other than the default, and the prefix 67H can be used select an address size other than the default."
Is that where the 66h is coming into effect?
Thanks for the help and sorry for so many questions, I want to make sure I understand all that is going on. :U
> Correct me if I am wrong, with four entries in the GDT the "SIZEOF" directive will return 256. gdt will provide the
> offset of the GDT and the plus one will add one byte to that
No (I don't understand how you can figure out a size of 256 for anything).
[gdt+1*sizeof GDT_ENTRY]
(can also be written: [gdt + (1*sizeof GDT_ENTRY)])
"sizeof GDT_ENTRY" is 8
so in total it is "offset gdt + 8"
> When a small memory model is used do the segments default to 16-bit?
it depends:
with:
.286
.model small
you get defaults of 16 bit, with
.386
.model small
you get 32bit defaults. So the .model directive depends on the previous processor directive.
> This is going to calculate the linear address of the 32-bit code segment but I've never seen this syntax before:
This calculation is needed because you wanted a flat, zero-based CS segment (_TEXT32). In this case you have
to calculate the linear address of label start, that is "(segment * 16) + offset"
> mov cs:[xxx], eax
The linear address of label "start" has to be calculated at run time, because the DOS MZ loader is unable to do that. It's done here, so that the intersegment jump to 32-bit code can work.
> Is that where the 66h is coming into effect?
Yes. With the 66h prefix the CPU expects a dword offset in the far jump (which is required here); without it the offset is expected to be a word only.
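As a rough illustration of the two encodings and of the value patched into xxx, here is a Python sketch that just builds the instruction bytes. The real-mode segment 2000h and offset 0010h for label start are invented for the example, and the helper names are hypothetical.

```python
import struct


def linear_address(segment: int, offset: int) -> int:
    """Real-mode segment:offset converted to a linear address."""
    return (segment << 4) + offset


def far_jump_16(selector: int, offset: int) -> bytes:
    """EA, word offset, word selector: the default form in 16-bit code."""
    return struct.pack("<BHH", 0xEA, offset, selector)


def far_jump_32(selector: int, offset: int) -> bytes:
    """66 EA, dword offset, word selector: operand-size prefix in 16-bit code."""
    return struct.pack("<BBIH", 0x66, 0xEA, offset, selector)


# Assumption for illustration: _TEXT32 was loaded at segment 2000h, start at offset 0010h.
target = linear_address(0x2000, 0x0010)
print(far_jump_16(0x08, 0x0042).hex(" "))   # PM_JUMP 08h, <16-bit offset>
print(far_jump_32(0x20, target).hex(" "))   # db 66h, 0eah / xxx dd ... / dw 20h
```

Because the flat code descriptor at selector 20h has base 0, the dword stored in xxx has to be the full linear address of start, which is what the run-time calculation provides.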
[EDIT]
I just tried with MASM 6.15 and this simpler coding works as well:
jmp fword ptr [xxx]
xxx dd 0
dw 20h
So there is no need to use the db 66h, 0eah form.
[/EDIT]
BTW: I find it good that someone is interested in this basic stuff. My personal opinion is that an ASM programmer should have to know it, but that isn't true nowadays.
> Correct me if I am wrong, with four entries in the GDT the "SIZEOF" directive will return 256. gdt will provide the offset of the GDT and the plus one will add one byte to that. I can't figure out how that is getting to the second element in the array. I don't know why this is confusing me so badly!
For a structure, SIZEOF returns the number of bytes in the initializers. For the GDT_ENTRY structure SIZEOF would return 8. For the gdt
variable in japheth's most recent code, SIZEOF would return 40 (5 entries * 8 bytes per entry).
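The 8 and the 40 can be checked by laying the structure out by hand. A minimal Python sketch, assuming the fields are packed without padding in the order GDT_ENTRY declares them (WORD, WORD, BYTE, BYTE, BYTE, BYTE) and that omitted initializers are zero:

```python
import struct

# One GDT_ENTRY: two little-endian WORDs followed by four BYTEs.
GDT_ENTRY = struct.Struct("<HHBBBB")

# The five initializers from the most recent listing (omitted fields are 0).
entries = [
    (0x0000, 0, 0, 0x00, 0x00, 0),  # null descriptor
    (0xFFFF, 0, 0, 0x9A, 0x8F, 0),  # 08h
    (0xFFFF, 0, 0, 0x92, 0x8F, 0),  # 10h
    (0xFFFF, 0, 0, 0x92, 0x00, 0),  # 18h
    (0xFFFF, 0, 0, 0x9A, 0xCF, 0),  # 20h
]
gdt = b"".join(GDT_ENTRY.pack(*entry) for entry in entries)

print("sizeof GDT_ENTRY =", GDT_ENTRY.size)
print("sizeof gdt       =", len(gdt))
print("entry at gdt + 1*sizeof GDT_ENTRY:", gdt[8:16].hex(" "))
```

A limit that covers exactly these five entries would be len(gdt) - 1 = 39; the listings use 127, which is larger than the table actually defined.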
From the MASM 6.0 Programmer's Guide:
The assembler evaluates expressions that contain more than one
operator according to the following rules:
- Operations in parentheses are always performed before any
  adjacent operations.
- Binary operations of highest precedence are performed first.
- Operations of equal precedence are performed from left to right.
- Unary operations of equal precedence are performed right to left.
The order of precedence for all operators is listed in Table 1.3.
Operators on the same line have equal precedence.

Table 1.3 Operator Precedence
Precedence  Operators
 1          (), []
 2          LENGTH, SIZE, WIDTH, MASK
 3          . (structure-field-name operator)
 4          : (segment-override operator), PTR
 5          LROFFSET, OFFSET, SEG, THIS, TYPE
 6          HIGH, HIGHWORD, LOW, LOWWORD
 7          +, - (unary)
 8          *, /, MOD, SHL, SHR
 9          +, - (binary)
10          EQ, NE, LT, LE, GT, GE
11          NOT
12          AND
13          OR, XOR
14          OPATTR, SHORT, .TYPE
Multiplication has a higher precedence than addition (higher meaning higher up in the list), so the multiplication is performed first. Some of these operators are unique to MASM, but the parentheses, arithmetic, relational and logical operators have the same relative order of precedence for most expression evaluators.
http://www.jimloy.com/algebra/some.htm
> When a small memory model is used do the segments default to 16-bit?
See Defining Segments with the SEGMENT Directive, and Setting Segment Word Sizes (80386/486 only):
http://webster.cs.ucr.edu/Page_TechDocs/MASMDoc/ProgrammersGuide/Chap_02.htm
Thanks for the help guys. :U
I read that chapter MichaelW and it answered a couple of questions for me but brought up a few problems.
I think the root of my major problem with assembly is my inability to define the difference between physical segments and logical segments.
For example, I've always thought that when you use the small memory model you have your data and code stored in two separate physical segments. After reading that chapter I think I was wrong, in that they are stored in two different logical segments. I also noticed that when I ran a small memory model program in CodeView, it stated that my data and code segments were in the same physical segment.
Am I correct, or do I misunderstand this? Any insight would be appreciated. :wink
I think the chapter's reference to logical segments just creates unnecessary confusion. If you were coding in hex, or creating an assembler, or certain types of memory managers, you might need to consider the distinction between physical and logical segments. But for most purposes I think you should just visualize segments as "regions" of physical memory, where the segment address specifies the region and the offset address specifies the location within the region.
> Logical segments contain the three components of a program: code, data, and stack. MASM organizes the three parts for you so they occupy physical segments of memory. The segment registers CS, DS, and SS contain the addresses of the physical memory segments where the logical segments reside.
If you examine a small model program in CodeView you will see that DS and ES are both set to the same segment address (that of the PSP) when the program is loaded, but after the normal startup code executes (the code that the .STARTUP directive would generate) DS and SS are set to a different segment address (the segment address of DGROUP), and CS still contains its original value. According to the above quote, the logical code segment is in one physical segment, and the logical data and stack segments are in another physical segment. This arrangement is common to all of the conventional memory models other than TINY (the memory model for a COM file), for which all of the program's logical segments share a single physical segment.
Alright, I finally think all of my Assembly knowledge is starting to come together. :bg
I just have to verify a couple of things with you guys.
In the above code supplied by japheth, this section of code:
;set call to flat 32 bit code
MOV AX, _TEXT32
MOVZX EAX, AX
SHL EAX, 4
MOV DX, OFFSET start
MOVZX EDX, DX
ADD EAX, EDX
MOV CS:[xxx], EAX
;Initializing the GDTR.
LGDT gdt_descriptor
; Go to PM
CLI
MOV EAX, CR0
OR AL, 01h
MOV CR0, EAX
; Do intersegment jump to set cs and flush instruction queue
JMP FWORD PTR [xxx] ;jmp fword ptr 20h:start
xxx DWORD 0
WORD 20h
Correct me if I am wrong. The reason you are declaring "xxx" here and not in a data segment is that when the processor is switched into protected mode it won't be able to access the data segment, because its descriptor hasn't been loaded into the DS register. Normally you can't declare data in a code segment because the processor will fault when it tries to execute a data declaration, but it is possible here because the processor doesn't actually execute the declaration; it just uses it for the jump operation.
This line:
MOV CS:[xxx], EAX
is telling the assembler using a segment override that the "xxx" variable is located in the CS segment and not in the DS segment, correct? Also, why are you dereferencing "xxx" using brackets, wouldn't it have been the same if you didn't use the brackets?
MOV CS:xxx, EAX
The only straight up question I have on this section of code is I thought the assembler would give an error if you tried to access a variable that has not been declared yet (ie. "XXX").
Thanks again. :thumbu