
Find end of instruction in asm?

Started by www.:).com, November 07, 2010, 12:28:11 AM


www.:).com

This question relates to some of my other posts; if you would like to know what it is for, look at them. I was wondering whether, in memory where the code (machine code) is stored, there is a symbol or number that tells the CPU where one instruction ends and the next begins, like an end marker. I figure there must be, since the CPU has to know where the different instructions are, because instructions vary in size. A disassembler also has to have some way to detect where instructions start and end in order to convert them back to assembly.

dedndave

the CPU's instruction decoder can determine the size from the first 1 or 2 bytes of the opcode

www.:).com

So depending on the first opcode it can determine the size and where it ends?

oex

Quote from: www.:).com on November 07, 2010, 12:33:54 AM
So depending on the first opcode it can determine the size and where it ends?

Depending on the first few bytes of an opcode, it can determine the size and where it ends


theunknownguy

The size of an instruction is determined by its encoding. For example:


- Prefix bytes    (66, F3, etc.) - (operand-size prefix, REP, segment override, etc.)
- Opcode byte 1   (0Fh, the escape for two-byte opcodes)
- Opcode byte 2   (XX)
- ModR/M byte     (XX)
- SIB byte        (XX)
- Displacement    (byte, word, dword)
- Immediate       (byte, word, dword)


But keep in mind that not all instructions use the ModR/M byte. For example, some opcodes encode the register in the low bits of the opcode byte itself, and the special short forms for the EAX register have no ModR/M byte at all.
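As a rough illustration of the layout listed above, here is a minimal Python sketch that records how many bytes each field takes for a few hand-decoded 32-bit instructions and checks that the parts add up to the full length. The `Layout` class and the `EXAMPLES` table are made up for this example; the byte sequences were decoded by hand, not by a real disassembler.

```python
from dataclasses import dataclass


@dataclass
class Layout:
    """Byte counts for each field of one x86 (32-bit mode) instruction."""
    prefixes: int = 0
    opcode: int = 1
    modrm: int = 0
    sib: int = 0
    disp: int = 0
    imm: int = 0

    def length(self) -> int:
        return (self.prefixes + self.opcode + self.modrm
                + self.sib + self.disp + self.imm)


# Hand-decoded examples (hypothetical table, 32-bit mode assumed).
EXAMPLES = {
    # nop: a single opcode byte
    "nop": (bytes.fromhex("90"), Layout()),
    # mov eax, 12345678h: register number sits in the low 3 bits of B8
    "mov eax, imm32": (bytes.fromhex("b878563412"), Layout(imm=4)),
    # add eax, 1: accumulator short form, no ModR/M byte
    "add eax, imm32": (bytes.fromhex("0501000000"), Layout(imm=4)),
    # add ebx, 1: general form, needs a ModR/M byte
    "add ebx, imm32": (bytes.fromhex("81c301000000"), Layout(modrm=1, imm=4)),
    # movzx ax, byte ptr [ebx+ecx*4+10h]: prefix, 0F escape, ModR/M, SIB, disp8
    "movzx ax, mem": (bytes.fromhex("660fb6448b10"),
                      Layout(prefixes=1, opcode=2, modrm=1, sib=1, disp=1)),
}

for name, (raw, layout) in EXAMPLES.items():
    print(f"{name:16} {raw.hex(' '):20} length={layout.length()}")
```

Note that the two ADD forms do the same kind of operation but differ in length by one byte, because only the second one carries a ModR/M byte.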

There are also a lot of exceptions to these rules; for detailed information check:

http://www.sandpile.org/ia32/opc_enc.htm

Finally, I believe you should work this out by trial and error, since some opcodes don't follow the common encoding. Makes you wonder what Intel was thinking...  :U

dedndave

i think all of them can be determined by the first 1 or 2 bytes
that does not include override prefixes like size, segment, REP, etc
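Since the prefixes come before the opcode, a length finder would have to step over them first. A minimal Python sketch, assuming 32-bit mode (so no REX prefixes) and using a made-up helper name `count_prefixes`:

```python
# Legacy prefix bytes in 16/32-bit mode.
LEGACY_PREFIXES = frozenset({
    0xF0,                                 # LOCK
    0xF2, 0xF3,                           # REPNE, REP/REPE
    0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65,   # CS, SS, DS, ES, FS, GS overrides
    0x66,                                 # operand-size override
    0x67,                                 # address-size override
})


def count_prefixes(code: bytes, pos: int = 0) -> int:
    """Return how many prefix bytes precede the opcode starting at pos."""
    n = 0
    while pos + n < len(code) and code[pos + n] in LEGACY_PREFIXES:
        n += 1
    return n


print(count_prefixes(bytes.fromhex("f3a4")))      # rep movsb
print(count_prefixes(bytes.fromhex("662e8b07")))  # two prefixes, then 8B 07
print(count_prefixes(bytes.fromhex("90")))        # nop, no prefix
```

The sketch only counts bytes; it does not check that the prefix combination is a sensible one.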

theunknownguy

Quote from: dedndave on November 07, 2010, 01:35:14 AM
i think all of them can be determined by the first 1 or 2 bytes
that does not include override prefixes like size, segment, REP, etc

Yes, by 1-2 bytes you mean the opcode byte + ModR/M; with those you can check whether it has a displacement, whether it has a scaled index (SIB byte), and so on...
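To make the ModR/M check concrete, here is a small Python sketch that works out how many bytes (SIB plus displacement) follow a ModR/M byte. It assumes 32-bit addressing with no 67h prefix, and `modrm_tail_length` is just a name picked for this example.

```python
def modrm_tail_length(code: bytes, pos: int = 0) -> int:
    """Bytes following the ModR/M byte at code[pos]: SIB + displacement.

    Assumes 32-bit addressing (no address-size prefix).
    """
    modrm = code[pos]
    mod = modrm >> 6      # top 2 bits
    rm = modrm & 7        # low 3 bits
    if mod == 3:          # register operand: nothing follows
        return 0
    extra = 0
    if rm == 4:           # rm=100 means a SIB byte follows
        extra += 1
        sib_base = code[pos + 1] & 7
        if mod == 0 and sib_base == 5:   # no base register, disp32 instead
            extra += 4
    elif mod == 0 and rm == 5:           # [disp32] with no register
        extra += 4
    if mod == 1:          # disp8
        extra += 1
    elif mod == 2:        # disp32
        extra += 4
    return extra


for sample in ("c3", "05", "448b", "0425", "80", "048b"):
    print(sample, modrm_tail_length(bytes.fromhex(sample)))
```

The `mod == 0` special cases are the sort of exception mentioned in this thread: the same `rm` value means "register" in one mode and "disp32 follows" in another.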

With an opcode table, the length of fixed-size opcodes can be found by just checking the first (opcode) byte. Of course the prefixes are another matter...

I still think Intel messed some things up: some opcodes don't follow this encoding and you need to treat them as "unique". For example, nothing in the ENTER opcode's encoding tells you it has 2 immediates, where one is a word and the other is a byte.
I find this pretty lame; when writing the opcode analyzer I had to waste time checking for those "special" opcodes.  :snooty:
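A table lookup of this kind might look like the Python sketch below. The table is a tiny hypothetical sample of opcodes that take no ModR/M byte, assuming 32-bit mode with no prefixes; ENTER is simply entered as a special case with 3 immediate bytes (word + byte).

```python
# opcode -> number of immediate bytes after the opcode byte
# (32-bit mode, no prefixes; a small sample, not a complete table)
FIXED_IMM_BYTES = {
    0x90: 0,  # NOP
    0xC3: 0,  # RET
    0xC2: 2,  # RET imm16
    0x6A: 1,  # PUSH imm8
    0x68: 4,  # PUSH imm32
    0xEB: 1,  # JMP rel8
    0xE8: 4,  # CALL rel32
    0xC8: 3,  # ENTER imm16, imm8  (the "special" one: word + byte)
}


def fixed_length(opcode: int):
    """Total instruction length for a fixed-size opcode, or None if unknown."""
    imm = FIXED_IMM_BYTES.get(opcode)
    return None if imm is None else 1 + imm


print(fixed_length(0xC8))  # ENTER
print(fixed_length(0x90))  # NOP
print(fixed_length(0x8B))  # not in this table (needs ModR/M decoding)
```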

www.:).com

When you check the first 1-2 bytes, does that require knowledge of the processor's opcodes, or will it hold true for all instructions? If so, is there an easier way to detect the size of the instruction?

theunknownguy

Quote from: www.:).com on November 07, 2010, 02:05:07 AM
When you check the first 1-2 bytes, does that require knowledge of the processor's opcodes, or will it hold true for all instructions? If so, is there an easier way to detect the size of the instruction?

No. When you check the first byte of the instruction, you usually check bitfields that represent something. For example, in many opcodes the lowest bit of the opcode byte (the w bit) tells whether the operation is 8-bit or full-size (16 or 32 bit).

The second-lowest bit of the opcode byte (the d bit) gives the SOURCE/DEST order.

The ModR/M byte encodes 1 or 2 registers (depending on the opcode), and its mod and r/m fields tell whether the opcode has a memory operand and a SIB byte.
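Here is a short Python sketch of pulling those bitfields apart. The helper names `opcode_bits` and `modrm_fields` are invented for this example, and the d/w interpretation only applies to opcodes that follow this pattern (such as the MOV forms 88h-8Bh).

```python
def opcode_bits(opcode: int) -> dict:
    """Direction (d) and width (w) bits of an opcode that uses them."""
    return {"d": (opcode >> 1) & 1, "w": opcode & 1}


def modrm_fields(modrm: int) -> dict:
    """Split a ModR/M byte into its mod (2 bits), reg (3) and rm (3) fields."""
    return {"mod": modrm >> 6, "reg": (modrm >> 3) & 7, "rm": modrm & 7}


# Two different encodings of "mov eax, ebx" (register numbers: eax=0, ebx=3).
# 8B C3: d=1, so reg is the destination (eax) and rm is the source (ebx).
# 89 D8: d=0, so reg is the source (ebx) and rm is the destination (eax).
for raw in (bytes.fromhex("8bc3"), bytes.fromhex("89d8")):
    print(raw.hex(" "), opcode_bits(raw[0]), modrm_fields(raw[1]))
```

Both byte pairs have mod = 3, which means a register operand with no SIB byte or displacement, so each instruction is exactly 2 bytes long.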

So you have to check all of that to get the size of the instruction, and no, there is no easy way. Some instructions need to be treated as unique, and the encoding is different for them.

www.:).com

So the first 1-2 bytes of each instruction in memory describe the format of the rest of the instruction?

dedndave

that's about the size of it   :lol

if you are interested in this stuff, i suggest you have a look at the Intel manuals...

http://website.masm32.com/reference.htm

www.:).com

Ok, I will take a look at the manuals.

redskull

Decoding instructions is not nearly as easy as "the first few bytes determine the size".  Just to give you an idea of how complex it really is, an instruction can have up to four optional prefixes from four different groups of prefixes, even before the opcode itself starts.  After that, different opcodes have different lengths, and different bits within them determine how many bytes follow.  It is not a trivial task, and every hard-and-fast rule has exceptions.

-r

www.:).com

I realize that it will not be easy, but I'm not looking to decode them, just isolate them by using their size.
- that is, I don't want to know what the instruction is, just the size of it.
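For what it is worth, isolating instructions by size still needs a table that says, per opcode, whether a ModR/M byte and immediates follow. Below is a deliberately tiny Python sketch of such a length walker. It assumes 32-bit mode, knows only the handful of one-byte opcodes in its made-up `OPCODES` table, and refuses the 66h/67h size prefixes rather than guessing, so it is nowhere near a complete length disassembler.

```python
# Prefixes this sketch steps over (LOCK, REP/REPNE, segment overrides).
SIMPLE_PREFIXES = frozenset({0xF0, 0xF2, 0xF3, 0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65})

# opcode -> (has_modrm, immediate_bytes); hypothetical mini-table
OPCODES = {
    0x55: (False, 0),  # PUSH EBP
    0x90: (False, 0),  # NOP
    0xC3: (False, 0),  # RET
    0xC9: (False, 0),  # LEAVE
    0xB8: (False, 4),  # MOV EAX, imm32
    0xE8: (False, 4),  # CALL rel32
    0x89: (True, 0),   # MOV r/m32, r32
    0x8B: (True, 0),   # MOV r32, r/m32
    0x83: (True, 1),   # ADD/SUB/... r/m32, imm8
}


def insn_length(code: bytes, pos: int = 0) -> int:
    """Length in bytes of the instruction starting at code[pos]."""
    start = pos
    while code[pos] in SIMPLE_PREFIXES:
        pos += 1
    if code[pos] in (0x66, 0x67):
        raise NotImplementedError("size-override prefixes are not handled")
    opcode = code[pos]
    pos += 1
    if opcode not in OPCODES:
        raise ValueError(f"opcode {opcode:02X} is not in the mini-table")
    has_modrm, imm_bytes = OPCODES[opcode]
    if has_modrm:
        modrm = code[pos]
        pos += 1
        mod, rm = modrm >> 6, modrm & 7
        if mod != 3 and rm == 4:               # SIB byte follows
            sib = code[pos]
            pos += 1
            if mod == 0 and (sib & 7) == 5:    # SIB with disp32, no base
                pos += 4
        elif mod == 0 and rm == 5:             # [disp32]
            pos += 4
        if mod == 1:                           # disp8
            pos += 1
        elif mod == 2:                         # disp32
            pos += 4
    return pos + imm_bytes - start


def split_lengths(code: bytes) -> list:
    """Walk the buffer from offset 0 and return each instruction's length."""
    lengths, pos = [], 0
    while pos < len(code):
        n = insn_length(code, pos)
        lengths.append(n)
        pos += n
    return lengths


# push ebp / mov ebp,esp / sub esp,10h / mov eax,[esp+8] / mov eax,1 / leave / ret
SAMPLE = bytes.fromhex("55 8bec 83ec10 8b442408 b801000000 c9 c3")
print(split_lengths(SAMPLE))
```

The walk only works if it starts on a real instruction boundary; starting in the middle of an instruction would give different, wrong splits, which is another reason there is no "end marker" shortcut.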