how to insert opcodes directly into .code segment

allynm · June 20, 2011, 06:26:00 PM

Hello everyone,

Yet another novice question....

From time to time the experts in this forum appear to insert the byte representation of an assembled instruction directly into the .code segment in contrast to the normal operation, destination, source syntax. Unassembly in debuggers frequently shows lines such as "db xxxx". As a proud and persistent novice I would like to understand what the rules are regarding direct insertion of instruction bytes in the text. Specifically, what are the syntactic requirements for writing instructions in this style? I assume that a one-byte instruction would be preceded by a "db", a two-byte instruction by "dw", a 4-byte by "dd", but what about a 3-byte instruction, or a 5-byte instruction--? I've fooled around quite a bit trying to make them work and deducing from the program's action what the rules are. Unsuccessful. The program either doesn't assemble, or if it does, then it doesn't run correctly.

A related question is this. Are there opcode generators around that convert a given assembly instruction into its bytes. I know the simplest way is to just look at the pgrm listing, but there must be other -- perhaps interactive-- tools that would take an arbitrary console string, e. g. "mov ax, 6", and report its instruction bytes?

Thanks,
Mark Allyn

qWord · June 20, 2011, 06:32:00 PM

Quote from: allynm on June 20, 2011, 06:26:00 PM
Are there opcode generators around that convert a given assembly instruction into its bytes.

yes: e.g. ml.exe,jwasm.exe,... :U
For details on instruction encoding see Intel's or AMD's manuals.
(BTW: OllyDbg has an 'interactive' assembler)

jj2007 · June 20, 2011, 06:44:29 PM

Let's take an example from Olly:
0040101F |. DB0C24 fisttp dword ptr [local.0]

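You can simply write this sequence as db 0DBh, 0Ch, 24h. A db line takes any number of bytes, so a three-byte or five-byte instruction needs no special directive.
If it were a four-byte instruction, you could use the dd syntax, but mind the endianness: x86 is little-endian, so dd stores the lowest byte first and the instruction bytes must be written in reverse order inside the dword.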
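
To make the db/dw/dd rule concrete, here is a minimal sketch. It is not from the original posts; it assumes the MASM32 include paths used elsewhere in this thread, and the instructions are arbitrary picks whose 32-bit encodings are standard.

```asm
.686
.model flat, stdcall
option casemap:none
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

.code
start:
    db 90h                  ; 1 byte:  nop
    db 33h, 0C0h            ; 2 bytes: xor eax, eax
    db 83h, 0C0h, 01h       ; 3 bytes: add eax, 1
    db 0B8h                 ; 5 bytes: mov eax, 12345678h (opcode byte ...
    dd 12345678h            ;          ... plus a little-endian imm32)
    dw 0C033h               ; xor eax, eax again: dw stores 33h first, then 0C0h
    invoke ExitProcess, eax ; eax is 0 here
end start
```

The instruction length never has to match a directive size: db with a comma-separated list covers every case, and dw/dd are only a convenience when part of the instruction is a natural word or dword, such as an immediate operand.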

mineiro · June 20, 2011, 10:14:20 PM

Quote from: allynm on June 20, 2011, 06:26:00 PM
Are there opcode generators around that convert a given assembly instruction into its bytes. I know the simplest way is to just look at the pgrm listing, but there must be other -- perhaps interactive-- tools that would take an arbitrary console string, e. g. "mov ax, 6", and report its instruction bytes?

Well, the example below may not be great, but it follows your idea. You need to pay attention when you are dealing with "offset" (relocations?) and with the assembler's optimizations...

.586
.model flat,stdcall
option casemap:none
include \masm32\include\user32.inc
include \masm32\include\kernel32.inc

includelib \masm32\lib\user32.lib
includelib \masm32\lib\kernel32.lib

.data
buffer db 64 dup (?)

.code
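start   proc uses esi edi ebx
        jmp     front           ; skip over the sample instructions
;----------
ptr_1:
        ; put here the mnemonic(s) to get the opcode
        .if al >= "9"
            add al, 7
        .endif
sz_1 = $ - ptr_1                ; number of bytes assembled above
;----------

front:
        ; convert each byte to two ASCII hex digits
        lea     esi, ptr_1      ; source: the assembled bytes
        lea     edi, buffer     ; destination: 64 bytes, enough for 31 opcode bytes
        mov     ecx, sz_1
again:
        sub     eax, eax
        lodsb                   ; al = next opcode byte
        shl     ax, 4           ; ah = high nibble
        shr     al, 4           ; al = low nibble
        add     ax, "00"        ; turn both nibbles into ASCII digits
        .if ah > "9"
            add ah, 7           ; adjust to "A".."F"
        .endif
        .if al > "9"
            add al, 7
        .endif
        ror     ax, 8           ; high digit goes out first
        stosb
        rol     ax, 8           ; then the low digit
        stosb
        loop    again

        invoke  MessageBoxA, 0, addr buffer, addr buffer, 0
        invoke  ExitProcess, 0
start   endp
end start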

allynm · June 20, 2011, 10:28:21 PM

Hi Mineiro-

WOW! That is exactly what I had in mind! Thanks very much indeed.

Mark

mineiro · June 20, 2011, 10:35:37 PM

Sr. allynm, I have updated the code above; the previous version had a small fault.
regards.

jj2007 · June 20, 2011, 10:44:38 PM

Here is another snippet - not sure if you can create 16-bit opcodes that way, though.

include \masm32\MasmBasic\MasmBasic.inc   ; download
   Init
   Open "O", #1, "Opcodes.hex"         ; we write to a file
   Print #1, "TheOps", Tb$, "db "         ; some decoration
   mov esi, op_start
   .Repeat
      movsx ecx, byte ptr [esi]         ; get a single byte
      Let edi=Right$(Hex$(ecx), 2)+"h"   ; add the trailing h
      .if byte ptr [edi]>"9"            ; check if we need a leading zero
         Print #1, "0"
      .endif
      Print #1, edi
      inc esi
      .if esi<op_end
         Print #1, ", "               ; if it's not the last byte, we need a comma
      .endif
   .Until esi>=op_end
   Close
   Launch "NotePad.exe Opcodes.hex"   ; let's have a look
   Exit
op_start:   movaps xmm0, [edi]      ; here is the
      fimul dword ptr [edi]      ; opcodes zone
      nops 3
op_end:

end start

Output:
TheOps db 0Fh, 28h, 07h, 0DAh, 0Fh, 90h, 90h, 90h

dedndave · June 20, 2011, 10:45:39 PM

back in the 16-bit days, i used to write a lot of self-modifying code
memory space was a big issue, and you could make code smaller and sometimes faster

        db      0B8h          ;mov eax,nnnnnnnn
Oprand  dd      0
;
;
;
;
;
        mov     Oprand,0FFFFFFFFh

or

        mov     eax,0
Oprand  LABEL   DWORD
;
;
;
;
;
        mov     Oprand-4,0FFFFFFFFh

you had to make sure that the modified code was not in the pre-fetch queue when modified
not too hard, back then - you had to get the hang of it, though :P

nowadays, things are very different
not sure there is much advantage in doing it anymore
the code segments are protected - you need to set the attribute to read/write/execute (VirtualProtect)
not so sure about pre-fetched instructions - they are cached now - very different
someone mentioned that this isn't a problem, because the caching mechanism trashes modified blocks for you
memory space is not nearly as much an issue, either
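
For the VirtualProtect point, a minimal sketch of the first variant under 32-bit Windows might look like this. It is an illustration under assumptions, not dedndave's code: it assumes the MASM32 include paths used in this thread, the name oldProt is made up, and the patch goes through a register so the store is an ordinary data write.

```asm
.686
.model flat, stdcall
option casemap:none
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

.data?
oldProt dd ?                    ; receives the previous page protection

.code
start:
    ; make the page holding the operand writable as well as executable
    invoke VirtualProtect, offset Oprand, 4, PAGE_EXECUTE_READWRITE, offset oldProt
    test eax, eax
    jz   done                   ; call failed: leave the code untouched
    mov  edx, offset Oprand
    mov  dword ptr [edx], 2Ah   ; patch the immediate operand
done:
    db   0B8h                   ; mov eax, nnnnnnnn
Oprand dd 0
    invoke ExitProcess, eax     ; 2Ah if the patch succeeded, otherwise 0
end start
```

Without the VirtualProtect call the store into the code section would raise an access violation, because the section is mapped read/execute by default.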

MichaelW · June 21, 2011, 05:17:15 AM

My tests have shown a very large penalty (>300 cycles for a P3) for self-modifying code, just as Agner Fog predicted in his optimizing_assembly PDF.

hutch-- · June 21, 2011, 11:31:29 AM

You can use JWASM to generate the direct binary output from normal assembler code, but you cannot use external sources or API functions. Somewhere in the MASM32 subforum I posted a toy that encapsulates JWASM just to perform this task. It was called CODE2DB.EXE.

allynm · June 21, 2011, 04:55:11 PM

Hello Hutch,

I looked high and low for CODE2DB.EXE on the forum and on Google but had no luck. I did download JWASM and played with it a bit. JWASM has a bunch of options that are terrific, but I was looking for one that might be employed in the manner you imply in your response to my post. Can you recall any more details about how you wrote CODE2DB? I'm quite willing to try to duplicate your toy!

The method that mineiro used above is functionally exactly what I had in mind. I haven't yet tried out what JJ sent, but it looks intriguing. qWord's point about using assemblers to produce the code bytes is humorous, of course, but I can't figure out how to force them to generate just a string of bytes for a specific instruction. What I'd like would be some kind of shell macro or .bat that would read the instruction as a parameter, then call the assembler, and then somehow extract the bytes from the assembler output or the listing file and print them. I was guessing that something like this was the approach you took in CODE2DB.

Regards,
Mark

qWord · June 21, 2011, 06:51:14 PM

hi,
here is a quick tool using jwasm:

C:\Users\xyz\asm\>asm "mov eax,DS:[-1]"
db 0A1h,0FFh,0FFh,0FFh,0FFh

allynm · June 21, 2011, 08:40:13 PM

Hi qWord:

Blimey! You have done it. Bravo! I hope others will find this program helpful. I looked the code over, and I must say it is extremely clever, way beyond my humble powers.

Thanks.

Mark

hutch-- · June 22, 2011, 01:14:46 AM

Try this.

Type code something like,

    mov eax, [esp+4]
    sub eax, 1
  lbl0:
    add eax, 1
    cmp BYTE PTR [eax], 0
    jne lbl0
    sub eax, [esp+4]
    ret 4

into the edit window then click the gear button on the toolbar.

You get output like this.

    db 139,68,36,4,131,232,1,131,192,1,128,56,0,117,248,43
    db 68,36,4,194,4,0
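
To show how such output can be pasted back into a program, here is a minimal sketch that calls the byte block above as a function. The names StrLenDb and sample are made up for the illustration, and the MASM32 include paths are assumed as elsewhere in this thread.

```asm
.686
.model flat, stdcall
option casemap:none
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

.data
sample db "hello", 0

.code
; Note the colon: a code label, so "call StrLenDb" is a direct call.
; Writing "StrLenDb db ..." would make it a data label instead.
StrLenDb:
    db 139,68,36,4,131,232,1,131,192,1,128,56,0,117,248,43
    db 68,36,4,194,4,0

start:
    push offset sample          ; the routine takes one stack argument
    call StrLenDb               ; ret 4 inside removes the argument
    invoke ExitProcess, eax     ; exit code = string length (5)
end start
```

The short jump inside the block (117,248) is relative, so the bytes can be placed anywhere in the code section without adjustment.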

clive · June 22, 2011, 02:17:53 AM

You could also use what is commonly called a "line assembler", used by monitors and debuggers (DDT, SID, DEBUG, SoftICE, et al). These are much simpler to write, and cruder, than a full blown assembler.

How do you generate the codes? You basically have to understand the x86 encoding schemes. You can use a regular assembler to generate the code, or something close to what you want, and then morph that into what you need, or make a hybrid.

The classic reason to use DB, DW, DD directly in the code segment is to address errors or support issues with the assembler. Say you wanted to support MMX or SSE instructions before such an assembler was released. Or you wanted to use an alternate form of an instruction (long, non-optimal, atypical) that the assembler won't naturally use, but which is nevertheless a valid x86 encoding (e.g. PUSH 0).
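
The PUSH 0 case can be sketched as follows. This is an illustration rather than anything clive posted, and it assumes the MASM32 include paths used in this thread.

```asm
.686
.model flat, stdcall
option casemap:none
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

.code
start:
    push 0                  ; assembler picks the short form: 6Ah, 00h
    db 68h                  ; long form by hand: push imm32 ...
    dd 0                    ; ... with a full four-byte zero
    add esp, 8              ; discard both pushed dwords
    invoke ExitProcess, 0
end start
```

Both forms push the same dword; the hand-written one is simply five bytes long instead of two, which can matter when a fixed instruction length is needed for patching or alignment.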

If you want to add support for a new instruction set, it can often be done with macros, or overloading existing instructions, and back-patching.
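
A minimal sketch of the macro approach, using RDTSC as a stand-in for an instruction the assembler does not know. The macro name my_rdtsc is hypothetical, and the MASM32 include paths are assumed as elsewhere in this thread.

```asm
.686
.model flat, stdcall
option casemap:none
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

my_rdtsc MACRO              ; hypothetical name, for illustration only
    db 0Fh, 31h             ; encoding of RDTSC: time stamp in edx:eax
ENDM

.code
start:
    my_rdtsc                ; expands to the two raw bytes
    invoke ExitProcess, 0
end start
```

Instructions with operands need more work, since the macro then has to compute the ModR/M byte from its arguments.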

There was a thread a while back about pushing floating point constants. Compilers do this, the assembler is less inclined, as it has to split a constant across multiple assembler instructions.
