Find end of instruction in asm?

Started by www.:).com, November 07, 2010, 12:28:11 AM

theunknownguy

Quote from: redskull on November 07, 2010, 04:19:13 PM
Decoding instructions is not nearly as easy as "the first few bytes determine the size".  Just to give you an idea of how complex it really is, an instruction can have up to four optional prefixes from four different groups of prefixes, even before the opcode itself starts.  After that, different opcodes have different lengths, and different bits within them determine how many bytes follow.  It is not a trivial task, and every hard-and-fast rule has exceptions.

-r

Those prefixes still take 1 byte each, so it's nothing complex... With just an opcode table you can get the size of the fixed-size opcodes, and by analyzing the bitfields of the opcode base + MOD/REG byte you can work out the displacement and the SIB byte.

The problem comes, for example, when getting the immediate size, or with some instructions that don't follow the normal encoding.
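
To show what "1 byte each" means in practice, here is a minimal sketch in Python (picked only because it is quick to try out; the name count_prefixes is made up for this example). It assumes 32-bit code and only the classic one-byte legacy prefixes, nothing newer:

```python
# Legacy one-byte prefixes for 32-bit x86 code, one set entry per byte value.
LEGACY_PREFIXES = {
    0xF0, 0xF2, 0xF3,                    # group 1: LOCK, REPNE, REP/REPE
    0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65,  # group 2: segment overrides
    0x66,                                # group 3: operand-size override
    0x67,                                # group 4: address-size override
}

def count_prefixes(code: bytes) -> int:
    """Return how many leading bytes of `code` are legacy prefixes."""
    n = 0
    while n < len(code) and code[n] in LEGACY_PREFIXES:
        n += 1
    return n
```

For example, count_prefixes(bytes([0x66, 0xB8, 0x34, 0x12])) gives 1 (mov ax, 1234h), so the opcode byte starts at offset 1.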

Quote
I realize that it will not be easy, but I'm not looking to decode them, just isolate them by using their size.
- such as I don't want to know what the instruction is, just the size of it.

Well, getting their size is decoding them... You can't really expect to do this without any information or a close reading of the Intel encoding...

Here you have an example of getting an opcode size (it's not complete of course, it's just a snippet, I won't do your homework):


Mov Eax, OpcodeAddr
Xor Ecx, Ecx
.If (Byte Ptr [Eax] == 0Fh)          ;Check 2 byte Opcode Base
    Add Ecx, 1                       ;+1 to the opcode size
    Add Eax, 1                       ;Skip the 0Fh escape so [Eax+1] is the MOD/REG byte
.EndIf
Add Ecx, 2                           ;+2 (Opcode Base Byte + MOD/REG byte)
Movzx Edx, Byte Ptr [Eax+1]          ;Get MOD/REG Byte
.If (Edx < 40h)                      ;MOD = 00: memory operand, no displacement by default
    And Edx, 7                       ;Keep the R/M field
    .If (Edx == 5)
        Add Ecx, 4                   ;Its literal address DWORD PTR DS:[100000]
    .Elseif (Edx == 4)
        Add Ecx, 1                   ;+1 to the opcode size (SIB Byte)
        Movzx Edx, Byte Ptr [Eax+2]  ;Get the SIB byte
        And Edx, 7                   ;Keep the SIB base field
        .If (Edx == 5)               ;[REG32*SCALAR+DISPLACEMENT]
            Add Ecx, 4               ;+4 opcode size
        .EndIf
    .EndIf
.EndIf


And so on: you have to analyze the bitfields of the opcode bytes to get the extra information. While you're doing this you'll realise that some opcodes have their own encoding; that's why you'll need an opcode table, so you can process fixed-size opcodes without analyzing them (faster).
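
The same bitfield analysis for the remaining MOD values can be sketched like this. It is a rough Python sketch, assuming 32-bit addressing only (no 67h prefix); modrm_extra_bytes is a made-up name, and it counts only what follows the MOD/REG byte (SIB + displacement), not immediates:

```python
def modrm_extra_bytes(code: bytes, modrm_pos: int) -> int:
    """Bytes following the ModRM byte (SIB + displacement), 32-bit addressing."""
    modrm = code[modrm_pos]
    mod = modrm >> 6              # top two bits
    rm = modrm & 7                # low three bits
    if mod == 3:                  # register operand: nothing follows
        return 0
    extra = 0
    base = rm
    if rm == 4:                   # R/M = 100b means a SIB byte is present
        extra += 1
        base = code[modrm_pos + 1] & 7
    if mod == 0:
        if base == 5:             # no base register: a disp32 follows
            extra += 4
    elif mod == 1:
        extra += 1                # disp8
    else:                         # mod == 2
        extra += 4                # disp32
    return extra
```

With the bytes 8B 05 00 00 10 00 (mov eax, [100000h]) and modrm_pos = 1 this returns 4; with 8B 44 24 08 (mov eax, [esp+8]) it returns 2 (SIB + disp8).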

Here you have an excellent link that helped me with the task of making the opcode table:

http://ref.x86asm.net/geek32.html
     
PS: I used PRE-formatted text and my comments are still not aligned  :(

clive

Well it will certainly take a lot of work to get it to work flawlessly for the 386 through the i7. What exactly is the point of this exercise?
It could be a random act of randomness. Those happen a lot as well.

clive

Quote from: theunknownguy on November 07, 2010, 05:34:50 PM
PS: I used PRE-formatted text and my comments are still not aligned  :(

Stop using HARD TABS, my guess is that it expands them to 8 space alignment, and you're using something else. Try using a soft tab setting on your editor, with your own indentation preference.

theunknownguy

Quote from: clive on November 07, 2010, 05:46:59 PM
Well it will certainly take a lot of work to get it to work flawlessly for the 386 through the i7. What exactly is the point of this exercise?

I think he wants to build something with self-modifying code (not a virus, I don't know), probably related to his thread in DOS...

I have a flawless version of this. It supports XMM/FPU/SSE (not SSE4, I haven't added support). But I can't share it, it's part of my work, so I just gave him a snippet that handles a literal address encoding and a scalar one as an example. (It's better to give something to start with than nothing.)

Quote
Stop using HARD TABS, my guess is that it expands them to 8 space alignment, and you're using something else. Try using a soft tab setting on your editor, with your own indentation preference.

Soft Tab setting?... Any suggestions?

www.:).com

Thank you for the example code (answer for theunknownguy).  :clap:
- the purpose of this code is to be part of a dual thread handler (answer for clive)

theunknownguy

This is an old proc that I found years ago on the internet:
Credits to: I don't know (wish I could remember the site)  :dazzled:

Local Modrm:Byte
Local Prefix:Byte
Push Esi
Push Edi
Push Ebx
Xor Ebx, Ebx
Xor Eax, Eax
Xor Ecx, Ecx
Mov Prefix, Al ;Clean Prefix lame way
Mov Edx, API ;EDX = pMemory
PrefixLoop:
.If (Eax == 8)
.Else
    Mov Esi, Offset OpcodeTable
    Mov Cl, Byte Ptr [Edx]
    Add Esi, Ecx
    .If Byte Ptr [Esi] == 40H
        .If Cl == 66H
            Mov Prefix, 1H
        .ElseIf Cl == 67H
            Mov Prefix, 2H
        .EndIf
        Add Eax, 1H
        Add Edx, 1H
        Jmp PrefixLoop
    .EndIf
    Mov Esi, Offset OpcodeTable
    Mov Cl, Byte Ptr [Edx]
    Add Esi, Ecx
    Mov Bl, Byte Ptr [Esi]
    .If Cl == 0F6H || Cl == 0F7H
        Mov Cl, Byte Ptr [Edx + 1H]
        Mov Edi, Ecx
        And Edi, 38H
        .If Edi != NULL
            Mov Ebx, 20H
        .EndIf
    .EndIf
    .If Prefix == 1H
        .If Ebx & 10H
            Push Eax
            Mov Eax, 10H
            Not Eax
            And Ebx, Eax
            Or Ebx, 8H
            Pop Eax
        .EndIf
    .EndIf
    .If Prefix == 2H
        Push Eax
        Mov Eax, 10H
        Not Eax
        And Ebx, Eax
        Mov Eax, 4H
        Not Eax
        And Ebx, Eax
        Or Ebx, 8H
        Pop Eax
    .EndIf
    .If Ebx == 2H
        Add Eax, 1H
        Add Edx, 1H
        Mov Esi, Offset OpcodeTable
        Add Esi, 0FFH
        Mov Cl, Byte Ptr [Edx]
        Add Esi, Ecx
        Mov Bl, Byte Ptr [Esi]
    .EndIf
    .If Ebx == 1H
        Add Eax, 1H
    .EndIf
    .If Ebx == 4H
        Add Eax, 2H
    .EndIf
    .If Ebx == 8H
        Add Eax, 3H
    .EndIf
    .If Ebx == 0CH
        Add Eax, 4H
    .EndIf
    .If Ebx == 10H
        Add Eax, 5H
    .EndIf
    .If Ebx == 18H
        Add Eax, 7H
    .EndIf
    .If EBX & 20h
        Add Eax, 2H
        Add Edx, 1H
        Mov Cl, Byte Ptr [Edx]
        Mov Modrm, Cl
        Shr Cl, 6H ;CL = MOD field (top 2 bits of the MOD/REG byte)
        .If Cl == NULL
            Mov Dl, Modrm
            And Dl, 7H
            .If Dl == 5H
                Add Eax, 4H
            .EndIf
        .EndIf
        .If Cl != 3H
            Mov Dl, Modrm
            And Dl, 7H
            .If Dl == 4H
                Add Eax, 1H
                .If CL == 1h
                    Mov Dl, Modrm
                    And Dl, 7H
                    .If Dl == 4H
                        Add Eax, 1H
                    .EndIf
                .EndIf
                .If Cl == 2H
                    Mov Dl, Modrm
                    And Dl, 7H
                    .If Dl == 4H
                        Add Eax, 4H
                    .EndIf
                .EndIf
            .Endif
        .EndIf
        .If Modrm >= 40H && Modrm <= 7FH
            Mov Dl, Modrm
            And Dl, 7H
            .If Dl != 4H
                Add Eax, 1H
            .EndIf
        .EndIf
        .If Modrm >= 80H && Modrm <= 0BFH
            Mov Dl, Modrm
            And Dl, 7H
            .If Dl != 4H
                Add Eax, 4H
            .EndIf
        .EndIf
        .If Ebx & 10H
            Add Eax, 4H
        .EndIf
        .If Ebx & 8H
            Add Eax, 2H
        .EndIf
        .If Ebx & 4H
            Add Eax, 1H
        .EndIf
    .EndIf
    .If Ebx == NULL
        Mov Eax, 0FFFFFFFFH
    .EndIf
.EndIf
Pop Ebx
Pop Edi
Pop Esi
Ret


I don't have the opcode table, but I'm sure it will help you. I doubt it has XMM or SSE support.
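
Since the table is missing, here is my reading of how the proc seems to use the table flags for opcodes without a MOD/REG byte, as a small Python sketch. The flag names and the SAMPLE_TABLE entries are my own guesses for illustration, not the original table:

```python
# Flag bits as the proc above appears to interpret them (assumed, the real
# OpcodeTable is not available).
F_OPCODE_ONLY = 0x01   # one-byte instruction, nothing follows
F_IMM8 = 0x04          # 1-byte immediate follows
F_IMM16 = 0x08         # 2-byte immediate follows
F_IMM32 = 0x10         # 4-byte immediate follows

# Hypothetical entries for a handful of one-byte opcodes.
SAMPLE_TABLE = {
    0x90: F_OPCODE_ONLY,       # nop
    0x6A: F_IMM8,              # push imm8
    0xC2: F_IMM16,             # ret imm16
    0xC8: F_IMM16 | F_IMM8,    # enter imm16, imm8
    0xB8: F_IMM32,             # mov eax, imm32
    0xEA: F_IMM32 | F_IMM16,   # jmp far ptr16:32
}

def length_without_modrm(opcode: int) -> int:
    """Length of an instruction with no MOD/REG byte, or -1 if unknown."""
    flags = SAMPLE_TABLE.get(opcode, 0)
    if flags == 0:
        return -1              # same idea as the proc returning 0FFFFFFFFh
    length = 1                 # the opcode byte itself
    if flags & F_IMM32:
        length += 4
    if flags & F_IMM16:
        length += 2
    if flags & F_IMM8:
        length += 1
    return length
```

This gives 1, 2, 3, 4, 5 and 7 for the six sample opcodes, the same totals the proc adds to Eax for the flag values 1H, 4H, 8H, 0CH, 10H and 18H.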

clive

Quote from: www.:).com
- the purpose of this code is to be part of a dual thread handler (answer for clive)

Yeah, I'm convinced you have conceived the most inefficient way of doing that possible, perhaps you could trace/single-step the code and have the CPU work out the instruction length for you. It would be faster.

theunknownguy

Quote from: clive on November 07, 2010, 06:00:02 PM
Quote from: www.:).com
- the purpose of this code is to be part of a dual thread handler (answer for clive)

Yeah, I'm convinced you have conceived the most inefficient way of doing that possible, perhaps you could trace/single-step the code and have the CPU work out the instruction length for you. It would be faster.

:lol :lol

Don't blame the guy for trying crazy ideas, at least he will learn a lot  :green

dedndave

after he's been writing code for a while, he will know how many bytes each instruction takes   :bg
but, i can understand his curiosity
interesting how we each have a different mode when we are learning   :P

clive

Quote from: theunknownguy
I have a flawless version of this. It supports XMM/FPU/SSE (not SSE4, I haven't added support).

And I have decoders through SSE5/VMX, but I'm pretty sure they aren't flawless and don't handle some undocumented cases.

Quote
Soft Tab setting?... Any suggestions?

It's about how the editor indents: with hard tabs (ASCII 9), and at what column settings, or whether it expands the TAB key into a number of space (ASCII 32) characters to achieve the desired alignment. Most editors or IDEs have options; for assembler I usually use 8, for C 2, but it depends on personal preference, which is why most provide configuration options. Where and how in the tools you are using, I don't know. Assume the forum is using 8.
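
To see why hard tabs break the alignment, here is a small sketch in Python (chosen just because str.expandtabs makes it easy to show; comment_column is a made-up helper, and the two lines are padded by hand for an editor set to 4-column tabs):

```python
def comment_column(line: str, tab_width: int) -> int:
    """Column where the ';' comment starts once hard tabs are expanded."""
    return line.expandtabs(tab_width).index(";")

# Two lines whose comments line up when tab stops are every 4 columns.
LINES = [
    "Add Ecx, 1\t\t\t\t;+1 to the opcode size",
    "Mov Eax, OpcodeAddr\t\t;start address",
]

for width in (4, 8):
    print(width, [comment_column(line, width) for line in LINES])
```

At width 4 both comments start at column 24; at width 8 they land at columns 40 and 32, so the alignment is gone. With soft tabs the editor stores spaces, and the columns come out the same everywhere.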

Also if I was decoding instruction lengths, over half the benefit from all the work would be to know what the instruction was. For validation tasks I'd rather have the processor provide the length information, by tracing or other means.

dedndave

a debugger will do it for you   :bg

clive

Quote from: theunknownguy
Don't blame the guy for trying crazy ideas, at least he will learn a lot

This might have been interesting years ago, but the non-linear code processing might stick a spanner in the works. Perhaps modelling how the instructions pair, assign to different execution units, pipeline, hyperthread, stall, select/retire registers, etc might give a clearer view of what's actually happening internally in a single cycle.

Or I could just use two task stacks and switch context between them.

theunknownguy

Quote from: clive on November 07, 2010, 06:11:13 PM
Quote from: theunknownguy
I have a flawless version of this. It supports XMM/FPU/SSE (not SSE4, I haven't added support).

And I have decoders through SSE5/VMX, but I'm pretty sure they aren't flawless and don't handle some undocumented cases.

Quote
Soft Tab setting?... Any suggestions?

It's about how the editor indents: with hard tabs (ASCII 9), and at what column settings, or whether it expands the TAB key into a number of space (ASCII 32) characters to achieve the desired alignment. Most editors or IDEs have options; for assembler I usually use 8, for C 2, but it depends on personal preference, which is why most provide configuration options. Where and how in the tools you are using, I don't know. Assume the forum is using 8.

Also if I was decoding instruction lengths, over half the benefit from all the work would be to know what the instruction was. For validation tasks I'd rather have the processor provide the length information, by tracing or other means.

I haven't encountered any flaws up to SSE3... I still have to add a lot of support, maybe I'll find problems later  :lol

I still haven't found any virus maker that uses SSE5... They barely use the FPU.

Thanks for the TAB tip, I will try it  :clap:

Quote
This might have been interesting years ago, but the non-linear code processing might stick a spanner in the works. Perhaps modelling how the instructions pair, assign to different execution units, pipeline, hyperthread, stall, select/retire registers, etc might give a clearer view of what's actually happening internally in a single cycle.

Sounds like, instead of going through the NASA training program... just jumping into the rocket...

I think to get to that point you must learn a lot of things first, instead of just jumping into that area...


dedndave

nahhhh
we are assembly programmers!
we strap in and learn the ropes when we get to the outer atmosphere   :lol

clive

Quote from: dedndave on November 07, 2010, 06:06:12 PM
after he's been writing code for a while, he will know how many bytes each instruction takes   :bg
but, i can understand his curiosity
interesting how we each have a different mode when we are learning   :P

I recognize that, but half the trick with software development is to find the easiest/quickest/most efficient way of doing something, and to realize quickly when a particular way of attacking a problem is a time-sink/dead-end.

Counting instruction bytes has no useful bearing on execution speed in 2010.