the stack

Kyoy · December 24, 2004, 05:50:23 AM

This post is mainly directed at hutch, since it's his code I am using now. I can't find the old thread since the forum upgrade, so I am starting a new one.

It was a password program where the password is "win32asm", and it used the stack to build the string by pushing 4-character pieces. It went something like this:

push 0
push "msa2"
push "3niw"
mov esi, esp            
mov lpstring, input("Password is : ")    
invoke szCmp,esi,lpstring

What puzzles me is this: if I'm not wrong, ESP always holds the address of the last DWORD pushed,
which is the address of "3niw".
That address is then copied to ESI (mov esi,esp).
So won't invoke szCmp,esi,lpstring
compare only "win3" with the user's input, rather than the whole string "win32asm",
since the pointer only points to the most recently pushed DWORD?

donkey · December 24, 2004, 05:53:49 AM

szCmp is a zero-terminated string function, so it compares bytes until it reaches one that is 0. So in the case above, the string it sees at ESI is "win32asm",0.
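To see why the comparison does not stop after the first DWORD, here is a minimal C sketch of a byte-by-byte comparison of two zero-terminated strings, in the spirit of szCmp. The name `zstr_equal` is hypothetical and this is not the MASM32 library's implementation or return convention; it only shows that such a loop keeps going until it reaches a 0 byte, regardless of where one pushed DWORD ends and the next begins.

```c
#include <stddef.h>

/* Hypothetical helper for illustration (not MASM32's szCmp):
   returns 1 if the two zero-terminated strings are identical, 0 otherwise.
   It walks byte by byte and stops only at a mismatch or at the 0 terminator,
   so DWORD boundaries on the stack make no difference. */
int zstr_equal(const char *a, const char *b)
{
    size_t i = 0;
    while (a[i] != '\0' && a[i] == b[i])
        i++;
    /* Equal only if both strings ended at the same position. */
    return a[i] == b[i];
}
```

Called with a pointer to the 12 bytes the three pushes leave at ESP ('w','i','n','3','2','a','s','m',0,0,0,0) and the user's input, it returns 1 only when the input is the full "win32asm"; an input of "win3" fails at the fifth byte.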

hutch-- · December 24, 2004, 06:05:50 AM

Kyoy,

The trick to remember is that data pushed onto the stack ends up in memory in the reverse order to the way it is written, and each push moves the stack pointer (ESP) down by 4 bytes.

push 0      ; stack at ESP = 0,0,0,0
push "msa2" ; stack at ESP = "2asm",0,0,0,0
push "3niw" ; stack at ESP = "win32asm",0,0,0,0

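It looks a bit confusing, but that's because MASM uses the old convention of writing character data for larger data sizes in reverse: in a 4-character constant the first character becomes the most significant byte, and the processor stores the least significant byte first.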

Now when you read from the address in ESP you get "win32asm" followed by a zero that terminates the string. "push 0" in this context pushes 4 bytes, just like the next 2 pushes.
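The push sequence above can be mimicked in C with a small simulated 32-bit stack: a byte array, an index standing in for ESP that moves down by 4 on each push, and each value written least significant byte first, as x86 does. The names (`sim_stack`, `sim_init`, `sim_push`) are illustrative only, and MASM's character constants are modelled as 32-bit numbers with the first character in the high byte, so "msa2" becomes 0x6D736132 and "3niw" becomes 0x336E6977.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical simulated 32-bit stack: esp starts at the top and moves down. */
typedef struct {
    uint8_t mem[64];
    size_t esp;
} sim_stack;

void sim_init(sim_stack *s)
{
    memset(s->mem, 0xCC, sizeof s->mem); /* filler so stray reads are obvious */
    s->esp = sizeof s->mem;              /* empty stack: ESP at the top */
}

/* Model of PUSH imm32: decrement ESP by 4, then store the value
   little endian (least significant byte at the lowest address). */
void sim_push(sim_stack *s, uint32_t value)
{
    s->esp -= 4;
    for (int i = 0; i < 4; i++)
        s->mem[s->esp + i] = (uint8_t)(value >> (8 * i));
}
```

After `sim_push(&s, 0)`, `sim_push(&s, 0x6D736132)` and `sim_push(&s, 0x336E6977)`, the bytes starting at `s.mem[s.esp]` read "win32asm" followed by zero bytes, and `esp` has moved down by 12.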

Just make sure you balance the stack afterwards, either with the same number of POPs or, once you understand it, by correcting ESP directly.

Kyoy · December 24, 2004, 09:30:10 AM

OK, thanks. That brings me to my next question. I sometimes see people using ret 8 or ret 12 in their code. I know it adds 8 or 12 to ESP after returning, but I don't really know when to use it. Why use it when you can simply use POP?

In this case, where there are 3 pushes, can I use ret 12?

hutch-- · December 24, 2004, 09:36:33 AM

ret (number) means return AND remove that many bytes from the stack, so it's not what you need here. Instead, you can change ESP directly to balance the stack; in this instance it is add esp, 12, since 3 pushes of 4 bytes each moved ESP down by 12.
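A quick way to check that arithmetic is to track ESP as a plain number. This C sketch uses hypothetical helper names and assumes 4-byte pushes and pops, as in 32-bit code; it shows that three POPs and a single add esp, 12 leave ESP at the same value it had before the three pushes.

```c
#include <stdint.h>

/* Hypothetical helpers, 32-bit code assumed:
   each PUSH moves ESP down 4 bytes, each POP moves it back up 4. */
uint32_t after_pushes(uint32_t esp, int count) { return esp - 4u * (uint32_t)count; }
uint32_t after_pops(uint32_t esp, int count)   { return esp + 4u * (uint32_t)count; }

/* add esp, n: discard n bytes in one instruction without reading them. */
uint32_t after_add_esp(uint32_t esp, uint32_t n) { return esp + n; }
```

The difference is cost, not result: each POP also reads a DWORD into a register, while add esp, 12 just moves the pointer.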

For simple code while you are learning, the 3 pops are safe.

In more advanced code where you set up your own stack parameters, you use ret (number) to both return to the caller AND balance the stack.

donkey · December 24, 2004, 09:44:25 AM

Generally you would not add a number after RET, since the assembler does that for you. However, there are times when you want to control the CALL/RET combination more directly, for example to preserve part of the stack after the procedure returns. Some assemblers do not let you simply write RET 8, because inside a procedure RET is effectively a macro that calculates the number of parameter bytes pushed onto the stack and appends that number to the RET opcode. In most assemblers you can override this behaviour by using RETN instead, for example RETN 8. As for POP, it would work, but it needs a separate read and write for each DWORD to do the same job: RET 12 would need 3 POPs and simply waste time. If you no longer care about the data on the stack, i.e. it is a parameter that can be discarded, you would normally just add the number of bytes directly to ESP, which is what RET does internally...

SomeProc PROTO :DWORD, :DWORD

invoke SomeProc, 1, 2 ; <<< pushes 2 then 1 (8 bytes), then CALL pushes the return address

SomeProc PROC Param1:DWORD, Param2:DWORD
LOCAL Hello:DWORD

RET ; assembles as the epilogue plus RET 8; at that point the return address is at ESP
SomeProc ENDP

In this case RET 8 first POPs the return address off the stack, then adds 8 to ESP so that it points just past the two parameters (where ESP was before they were pushed), and then jumps to the return address. This is all done internally by the processor, so there is no need to worry about it.
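The order of operations in RET with an operand can be checked with a small simulation. In this C sketch (all names hypothetical, 32-bit stack slots assumed), CALL pushes a return address after the two parameters, and `ret_n` pops that address first and only then adds n to ESP, so ESP ends up where it was before the parameters were pushed.

```c
#include <stdint.h>

/* Hypothetical model of a 32-bit stack made of DWORD slots. */
typedef struct {
    uint32_t mem[16];   /* 16 slots = 64 bytes */
    uint32_t esp;       /* byte offset of the top of stack */
} cpu;

void push32(cpu *c, uint32_t v) { c->esp -= 4; c->mem[c->esp / 4] = v; }
uint32_t pop32(cpu *c) { uint32_t v = c->mem[c->esp / 4]; c->esp += 4; return v; }

/* Model of RET n: pop the return address into EIP, then add n to ESP. */
uint32_t ret_n(cpu *c, uint32_t n)
{
    uint32_t eip = pop32(c);
    c->esp += n;
    return eip;
}
```

Swapping the two steps (adding 8 first, then popping) would pop Param2 as if it were the return address, which is why the order matters.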

Kyoy · December 24, 2004, 03:00:50 PM

One more question. I know I asked this before, but I really can't understand when the little endian theory applies.

Take for instance

.data
prompt1 BYTE 10h,20h,30h,40h
.code
main PROC
mov ebx,OFFSET prompt1   ; EBX = address of the first byte (10h)
mov eax,[ebx]            ; load 4 bytes as one DWORD
exit
main ENDP

EAX is 40302010h according to my debugger.
This is the effect of little endian, right?
So I assumed that 40h, which is the LSB, would be stored at the offset ([ebx]) and [ebx+3] would hold the MSB, 10h.
But when I tried

mov al,[ebx+3]

my debugger showed 40h in AL, not 10h. [ebx] is actually the one holding 10h.

Why?! When should we actually care about little endian?

MichaelW · December 24, 2004, 04:44:11 PM

I can never recall exactly what the term "little endian" means. You need to be aware of how data is stored in memory whenever you directly access any sub-unit of the data's fundamental type. The fundamental data types for recent x86 processors are byte, word, doubleword, and quadword. For the word, doubleword, and quadword types, the least significant byte/word/doubleword is the one that contains bit 0, and it is stored at the lower address.
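That rule can be written out as a tiny C function that assembles a DWORD the way an x86 load does: the byte at the lowest address becomes bits 0-7. The function name is hypothetical, and it is written with explicit shifts so it gives the same answer on any host, not only on x86.

```c
#include <stdint.h>

/* Hypothetical model of mov eax,[ebx]: the byte at [ebx] becomes the
   least significant byte, the byte at [ebx+3] the most significant. */
uint32_t load_dword_le(const uint8_t *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

With the bytes 10h,20h,30h,40h this yields 40302010h, yet the byte at offset 3 is still 40h: little endian only decides how bytes are combined into a larger value, it never moves the bytes themselves.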

donkey · December 24, 2004, 06:03:31 PM

Little endian makes good sense from the standpoint of numeric data; it starts to fall apart when you try to manipulate strings using numeric opcodes. For numeric data it means a number can be sliced up without changing the address. For example, the number 01020304h...

Stored at address 00000000 as the bytes 04, 03, 02, 01

When you want to access a byte you still get it at offset 0, and the WORD and the DWORD are also at offset 0. Generally, when you store something as a DWORD and need it as a WORD value, you want to truncate its high-order WORD. In big endian you would have to add 2 to the address and then take the WORD; in little endian you simply take the WORD from the same address as the DWORD.
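The truncation point can be sketched in C by storing 01020304h both ways and reading a WORD back. The helper names are hypothetical; the little-endian case finds the low WORD at offset 0, while the big-endian case has to start at offset 2.

```c
#include <stdint.h>

/* Hypothetical helpers: store a DWORD in each byte order. */
void store_le32(uint8_t *p, uint32_t v) { for (int i = 0; i < 4; i++) p[i] = (uint8_t)(v >> (8 * i)); }
void store_be32(uint8_t *p, uint32_t v) { for (int i = 0; i < 4; i++) p[i] = (uint8_t)(v >> (8 * (3 - i))); }

/* Read a WORD back in each byte order. */
uint16_t load_le16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }
uint16_t load_be16(const uint8_t *p) { return (uint16_t)(p[0] << 8 | p[1]); }
```

In little endian the bytes are 04,03,02,01 and `load_le16` at offset 0 gives 0304h; in big endian they are 01,02,03,04, and offset 0 gives the high WORD 0102h, so the low WORD must be read at offset 2.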

Ratch · December 24, 2004, 06:14:52 PM

Kyoy,
Little endian is NOT a theory; it is how the hardware stores data. Current Intel processors store the least significant byte of a number at the lowest memory address. That can drive you nuts when you look at a memory dump, because you are used to seeing the most significant byte of the number on the left, not the right. I have no idea why Intel thinks the hardware is better off doing it that way. Be thankful that those elves at Intel didn't reverse the bits within each byte, or keep that bassackwards format in their registers too. Now the folks who made the Motorola 6800 and 68000 did it right: they use the big endian format in their hardware. The Intel BSWAP instruction was designed to make wrong things right; look it up. MASM and other Intel-compatible assemblers know about this quirk and automatically reverse the bytes when you use a DW, DD or DQ directive. If you build a larger item such as a quadword out of DWs or DDs, you had better reverse the order. The same goes for stack work. Ignore the order of PUSH and POP at your peril. Ratch
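BSWAP (available from the 486 on) reverses the byte order of a 32-bit register. Its effect can be sketched in C with shifts and masks; `bswap32` here is just an illustrative name, not a call into any library or compiler intrinsic.

```c
#include <stdint.h>

/* Hypothetical model of BSWAP on a 32-bit register:
   byte 0 <-> byte 3, byte 1 <-> byte 2. */
uint32_t bswap32(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}
```

Applied to the 40302010h seen in EAX earlier in the thread, it gives 10203040h, the bytes in the order they appear in memory. Applying it twice gives back the original value.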

MANT · December 24, 2004, 06:22:45 PM

Also beware that some assemblers handle four-byte strings differently. In MASM you'd PUSH "3niw", for instance, but I believe in FASM (and possibly some other assemblers) you'd PUSH "win3". The exact same thing is happening at the hardware level; it's just that FASM is reversing the bytes for you to make it easier for us humans to read & write the code. It can definitely get confusing sometimes!
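This can be illustrated by computing the 32-bit immediate each convention produces from a quoted string. It is a C sketch under stated assumptions: MASM is modelled as placing the first character in the most significant byte, and FASM (per the recollection above) as placing it in the least significant byte; both function names are hypothetical.

```c
#include <stdint.h>

/* Assumed MASM-style rule: "3niw" -> 0x336E6977 (first character is the high byte). */
uint32_t imm_first_char_high(const char s[4])
{
    return (uint32_t)(uint8_t)s[0] << 24 | (uint32_t)(uint8_t)s[1] << 16
         | (uint32_t)(uint8_t)s[2] << 8  | (uint32_t)(uint8_t)s[3];
}

/* Assumed FASM-style rule: "win3" -> 0x336E6977 (first character is the low byte). */
uint32_t imm_first_char_low(const char s[4])
{
    return (uint32_t)(uint8_t)s[0]       | (uint32_t)(uint8_t)s[1] << 8
         | (uint32_t)(uint8_t)s[2] << 16 | (uint32_t)(uint8_t)s[3] << 24;
}
```

Both give 336E6977h, so PUSH writes the same four bytes, 'w','i','n','3', to memory either way; only the spelling in the source differs.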

donkey · December 24, 2004, 07:10:35 PM

In GoAsm, pushing a quoted string results in a pointer to that string being pushed. For example, push "Win3" would add "Win3",0 to the data section and push a pointer to it. This behaviour was added to make inline quoted strings work in the INVOKE statement. It also works with Unicode: push L"Win32". MOV works as expected, moving the 4 bytes into whatever destination you choose, but the string is reversed for you, so you would write mov eax,"Win3".

Kyoy · December 25, 2004, 12:19:53 AM

Thanks guys, that was very informative :U

But about my original question:

I still don't really understand why I got 40h instead of 10h when accessing [ebx+3]?

MichaelW · December 25, 2004, 12:53:13 AM

If you are accessing the bytes 10h, 20h, 30h, and 40h as a DWORD, then the least significant byte is 10h. Multi-byte numeric data types are stored with the least significant byte/word/dword at the lowest address. If you had defined the data as DD 10203040h, it would have been stored as 40h, 30h, 20h, 10h. MASM expects numeric values to be written in the "normal" order, with the most significant digit first, and number-to-string conversion routines normally put the digits in this same order.
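To make the difference concrete, here is a C sketch that lays out both declarations byte by byte: `BYTE 10h,20h,30h,40h` is stored exactly in the order written, while `DD 10203040h` is stored least significant byte first. The `as_bytes` array and the `store_le32` helper are hypothetical stand-ins for what MASM emits, not actual MASM output.

```c
#include <stdint.h>

/* Hypothetical model of a DD/DWORD initializer:
   the least significant byte goes to the lowest address. */
void store_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Model of BYTE 10h,20h,30h,40h: bytes are stored exactly in the order written. */
const uint8_t as_bytes[4] = {0x10, 0x20, 0x30, 0x40};
```

In both cases mov al,[ebx+3] simply fetches whatever byte sits at offset 3: 10h for the DWORD declaration and 40h for the BYTE declaration. Little endian only comes into play when a multi-byte load or store combines those bytes into a number.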

Quote from: donkey
When you want to access a byte you still get it at offset 0, a WORD is also offset 0 as is the DWORD.

Yes, I had forgotten the reason behind this design detail. So, to amend my statement:

You need to be aware of how data is stored in memory whenever you directly access any sub-unit of a data item's fundamental type, other than the first sub-unit which will always be stored at the data item's address.

Kyoy · December 25, 2004, 03:24:01 AM

When I declare it as a DWORD:

prompt1 DWORD 10203040H

It is stored as 10h 20h 30h 40h,
where [ebx+3] is 10h.
If it is stored as 10203040, why won't mov al,[ebx+3] give me 40h?

When I declare them as bytes:

prompt1 BYTE 10h,20h,30h,40h

It is stored as 40h 30h 20h 10h,
where [ebx+3] is 40h.
If it is stored as 40302010, why won't mov al,[ebx+3] give me 10h?

This is why i am confused.

Can someone explain this to me in simpler terms, or show me some examples? I would really appreciate it. Sorry, I am new to programming itself :( and I am struggling to understand the concept, and some of the terminology used has only confused me further. Thanks again.
