Hello everyone,
Yet another novice question....
From time to time the experts in this forum appear to insert the byte representation of an assembled instruction directly into the .code segment in contrast to the normal operation, destination, source syntax. Unassembly in debuggers frequently shows lines such as "db xxxx". As a proud and persistent novice I would like to understand what the rules are regarding direct insertion of instruction bytes in the text. Specifically, what are the syntactic requirements for writing instructions in this style? I assume that a one-byte instruction would be preceded by a "db", a two-byte instruction by "dw", a 4-byte by "dd", but what about a 3-byte instruction, or a 5-byte instruction--? I've fooled around quite a bit trying to make them work and deducing from the program's action what the rules are. Unsuccessful. The program either doesn't assemble, or if it does, then it doesn't run correctly.
A related question: are there opcode generators around that convert a given assembly instruction into its bytes? I know the simplest way is to just look at the program listing, but there must be other, perhaps interactive, tools that would take an arbitrary console string, e.g. "mov ax, 6", and report its instruction bytes?
Thanks,
Mark Allyn
Quote from: allynm on June 20, 2011, 06:26:00 PM
Are there opcode generators around that convert a given assembly instruction into its bytes.
yes: e.g. ml.exe,jwasm.exe,... :U
For details on instruction encoding see Intel's (http://www.intel.com/products/processor/manuals/) or AMD's (http://developer.amd.com/documentation/guides/pages/default.aspx#manuals) manuals.
(BTW: OllyDbg has an 'interactive' assembler)
Let's take an example from Olly:
0040101F |. DB0C24 fisttp dword ptr [local.0]
You can simply write this sequence as db 0DBh, 0Ch, 24h
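If it were a four-byte instruction, you could also use the dd syntax, but mind the endianness: dd stores its value low byte first, so the instruction bytes must appear in reverse order inside the constant. With db you simply list the bytes in memory order, and that works for any length (3 bytes, 5 bytes, ...).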
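To make the db versus dd point concrete, here is a small Python sketch (Python is just a convenient scratchpad here, it is not something used in this thread). The helper names `masm_hex`, `as_db` and `as_dd` are made up for the illustration, and the 4-byte sample is the encoding of mov eax, [esp+4].

```python
import struct

def masm_hex(value: int, digits: int = 2) -> str:
    """Format a number as a MASM hex literal; add a leading 0 if it starts with a letter."""
    text = f"{value:0{digits}X}h"
    return text if text[0].isdigit() else "0" + text

def as_db(code: bytes) -> str:
    """List the bytes in memory order; works for any instruction length."""
    return "db " + ", ".join(masm_hex(b) for b in code)

def as_dd(code: bytes) -> str:
    """Pack exactly four bytes into one dd constant (value is little-endian)."""
    if len(code) != 4:
        raise ValueError("dd holds exactly four bytes")
    (value,) = struct.unpack("<I", code)
    return "dd " + masm_hex(value, 8)

# 3-byte instruction from the Olly line above: fisttp dword ptr [esp]
print(as_db(bytes([0xDB, 0x0C, 0x24])))

# 4-byte sample: mov eax, [esp+4] is 8B 44 24 04 in memory order
instruction = bytes([0x8B, 0x44, 0x24, 0x04])
print(as_db(instruction))
print(as_dd(instruction))
```

The dd line comes out with the bytes reversed inside the constant, which is why db is usually the less error-prone choice for hand-written opcodes.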
Quote from: allynm on June 20, 2011, 06:26:00 PM
Are there opcode generators around that convert a given assembly instruction into its bytes. I know the simplest way is to just look at the pgrm listing, but there must be other -- perhaps interactive-- tools that would take an arbitrary console string, e. g. "mov ax, 6", and report its instruction bytes?
Well, the example below may not be great, but it follows your idea. You need to pay attention if you are dealing with "offset" (relocation?) and with the assembler's optimizations...
.586
.model flat,stdcall
option casemap:none
include \masm32\include\user32.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\user32.lib
includelib \masm32\lib\kernel32.lib
.data
buffer db 64 dup (?)
.code
start proc uses esi edi ebx;:
jmp front
;----------
ptr_1:
;put here the mnemonic to get the opcode
.if al >="9"
add al,7
.endif
sz_1 = $ - ptr_1
;----------
front:
;convert to ascii hex
lea esi,ptr_1
lea edi,buffer
mov ecx,sz_1
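again:
sub eax,eax ; clear eax so ah starts at zero
lodsb ; al = next opcode byte
shl ax,4 ; ah = high nibble, al = low nibble shl 4
shr al,4 ; al = low nibble
add ax,"00" ; turn both nibbles into ASCII digits
.if ah >"9"
add ah,7 ; adjust to "A".."F"
.endif
.if al > "9"
add al,7 ; adjust to "A".."F"
.endif
ror ax,8 ; swap so the high digit is stored first
stosb
rol ax,8 ; swap back, al = low digit
stosb
loop again
mov byte ptr [edi],0 ; terminate the string for MessageBoxA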
invoke MessageBoxA,0,addr buffer,addr buffer,0
invoke ExitProcess,0
start endp
end start
Hi Mineiro-
WOW! That is exactly what I had in mind! Thanks very much indeed.
Mark
Sr allynm, I have updated the code above; the previous version had a small fault.
regards.
Here is another snippet - not sure if you can create 16-bit opcodes that way, though.
include \masm32\MasmBasic\MasmBasic.inc ; download (http://www.masm32.com/board/index.php?topic=12460)
Init
Open "O", #1, "Opcodes.hex" ; we write to a file
Print #1, "TheOps", Tb$, "db " ; some decoration
mov esi, op_start
.Repeat
movsx ecx, byte ptr [esi] ; get a single byte
Let edi=Right$(Hex$(ecx), 2)+"h" ; add the trailing h
.if byte ptr [edi]>"9" ; check if we need a leading zero
Print #1, "0"
.endif
Print #1, edi
inc esi
.if esi<op_end
Print #1, ", " ; if it's not the last byte, we need a comma
.endif
.Until esi>=op_end
Close
Launch "NotePad.exe Opcodes.hex" ; let's have a look
Exit
op_start: movaps xmm0, [edi] ; here is the
fimul dword ptr [edi] ; opcodes zone
nops 3
op_end:
end start
Output:
TheOps
db 0Fh, 28h, 07h, 0DAh, 0Fh, 90h, 90h, 90h
back in the 16-bit days, i used to write a lot of self-modifying code
memory space was a big issue, and you could make code smaller and sometimes faster
db 0B8h ;mov eax,nnnnnnnn
Oprand dd 0
;
;
;
;
;
mov Oprand,0FFFFFFFFh
or
mov eax,0
Oprand LABEL DWORD
;
;
;
;
;
mov Oprand-4,0FFFFFFFFh
you had to make sure that the modified code was not in the pre-fetch queue when modified
not too hard, back then - you had to get the hang of it, though :P
nowadays, things are very different
not sure there is much advantage in doing it anymore
the code segments are protected - you need to set the attribute to read/write/execute (VirtualProtect)
not so sure about pre-fetched instructions - they are cached now - very different
someone mentioned that this isn't a problem, because the caching mechanism trashes modified blocks for you
memory space is not nearly as much an issue, either
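For anyone who wants to see the byte layout behind the two snippets above without touching real code pages, here is a Python simulation. It only models the bytes: mov eax, imm32 is opcode 0B8h followed by a 4-byte little-endian immediate. The names `code`, `patch_operand` and `end_label` are invented for this sketch.

```python
import struct

# "mov eax, 0": opcode B8h followed by a 4-byte little-endian immediate
code = bytearray([0xB8]) + bytearray(struct.pack("<I", 0))

def patch_operand(buf: bytearray, offset: int, value: int) -> None:
    """Overwrite the 4-byte immediate at the given offset (what 'mov Oprand,...' does)."""
    buf[offset:offset + 4] = struct.pack("<I", value & 0xFFFFFFFF)

# First variant: the label sits directly on the immediate, one byte after the opcode.
oprand = 1

# Second variant: the label follows the instruction, so the immediate is at label-4.
end_label = len(code)
assert end_label - 4 == oprand

patch_operand(code, oprand, 0xFFFFFFFF)
print(code.hex())  # the instruction is now "mov eax, 0FFFFFFFFh"
```

Both variants end up patching the same four bytes; they differ only in where the label is placed.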
My tests have shown a very large penalty (>300 cycles for a P3) for self-modifying code, just as Agner Fog predicted in his optimizing_assembly PDF.
You can use JWASM to generate the direct binary output from normal assembler code but you cannot use external sources or API functions. Somewhere in the MASM32 subforum I posted a toy that encapsulates JWASM just to perform this task. It was called CODE2DB.EXE.
Hello Hutch,
I looked high and low for CODE2DB.EXE on the forum and on GOOGLE but had no luck. I did download JWASM and played with it a bit. JWASM has a bunch of options that are terrific, but I was looking for one that might be employed in the manner you imply in your response to my post. Can you recall any more details about how you wrote the CODE2DB. I'm quite willing to try to duplicate your toy!
The method that Mineiro used above is functionally exactly what I had in mind. I haven't tried out yet what JJ sent, but it looks intriguing. Qword's point about using assemblers to produce the code bytes is humorous, of course, but I can't figure out how to force them to generate just a string of bytes for a specific instruction. What I'd like would be some kind of shell macro or .bat that would read the instruction as a parameter, then call the assembler, and then somehow extract the bytes from the assembler output or the listing file and then print them. I was guessing that something like this was the approach you took in CODE2DB.
Regards,
Mark
hi,
here is a quick tool using jwasm:
C:\Users\xyz\asm\>asm "mov eax,DS:[-1]"
db 0A1h,0FFh,0FFh,0FFh,0FFh
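For readers who would like to script this themselves, the general idea (wrap the instruction in a tiny source file, let the assembler emit a raw binary, print the bytes as db) might look roughly like the Python sketch below. This is not qWord's tool. The source skeleton, the function names, and the JWasm switches used (-bin for raw binary output, -Fo to name the output file) are assumptions that should be checked against the JWasm manual before relying on them.

```python
import subprocess
import tempfile
from pathlib import Path

# Assumed skeleton for a flat 32-bit snippet; 16-bit code needs a different one.
TEMPLATE = ".386\n.model flat\n.code\n{body}\nend\n"

def build_source(instruction: str) -> str:
    """Wrap one instruction (or several lines) in a minimal source file."""
    return TEMPLATE.format(body=instruction)

def format_db(code: bytes) -> str:
    """Format raw bytes as a MASM db line with a leading 0 where needed."""
    parts = []
    for b in code:
        text = f"{b:02X}h"
        parts.append(text if text[0].isdigit() else "0" + text)
    return "db " + ",".join(parts)

def assemble(instruction: str, assembler: str = "jwasm") -> bytes:
    """Assemble to raw bytes; needs the assembler on PATH (not run in this sketch)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "snippet.asm"
        out = Path(tmp) / "snippet.bin"
        src.write_text(build_source(instruction))
        # Assumption: -bin selects raw binary output, -Fo names the output file.
        subprocess.run([assembler, "-bin", f"-Fo{out}", str(src)],
                       check=True, capture_output=True)
        return out.read_bytes()

# Formatting the bytes shown above, without calling the assembler:
print(format_db(bytes([0xA1, 0xFF, 0xFF, 0xFF, 0xFF])))
```

With an assembler installed, `print(format_db(assemble("mov eax, ds:[-1]")))` would be the whole "console string in, bytes out" loop.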
Hi qWord:
Blimey! You have done it. Bravo! I hope others will find this program helpful. I looked the code over, and I must say it is extremely clever, way beyond my humble powers.
Thanks.
Mark
Try this.
Type code something like,
mov eax, [esp+4]
sub eax, 1
lbl0:
add eax, 1
cmp BYTE PTR [eax], 0
jne lbl0
sub eax, [esp+4]
ret 4
into the edit window then click the gear button on the toolbar.
You get output like this.
db 139,68,36,4,131,232,1,131,192,1,128,56,0,117,248,43
db 68,36,4,194,4,0
You could also use what is commonly called a "line assembler", used by monitors and debuggers (DDT, SID, DEBUG, SoftICE, et al). These are much simpler to write, and cruder, than a full blown assembler.
How do you generate the codes? You basically have to understand the x86 encoding schemes. You can use a regular assembler to generate the code, or something close to what you want, and then morph that into what you need, or make a hybrid.
The classic reason to use DB, DW, DD directly in the code segment is to address errors or support issues with the assembler. Say you wanted to support MMX or SSE instructions before such an assembler was released. Or you wanted to use an alternate form of an instruction (long, non-optimal, atypical) that the assembler won't naturally use, but that is nevertheless a valid x86 encoding (e.g. PUSH 0).
If you want to add support for a new instruction set, it can often be done with macros, or overloading existing instructions, and back-patching.
There was a thread a while back about pushing floating point constants. Compilers do this, the assembler is less inclined, as it has to split a constant across multiple assembler instructions.
Quote from: hutch-- on June 22, 2011, 01:14:46 AM
Try this.
Really cute, but 192k and no source... :naughty:
Apart from encapsulating JWASM which it writes to disk, the source may be too BASIC for you. I wrote the tool in PB. To satisfy the author, JWASM was built from its source code with no modifications to the code then the binary was modified in the PE header using EDITBIN to convert it to a GUI application and thus, no console displayed. It reads the output log file to test if it worked correctly and converts the output binary to DB format if it was successful.
Hi Hutch,
JJ is right. Very cute indeed. Out of curiosity, why did you write the output as decimal numbers rather than hex? What is really cool about your "toy" is that it accepts multiple lines of assembly code.
Thanks for sharing this!
Mark
Mark,
The decimal format takes up less room than hex and it's a consideration when you convert large amounts of data. I have used the tool from time to time and it works fine. You can do much the same directly from MASM if you put a label at the beginning and end of a procedure, then write that binary data to disk, then you just convert the binary data to whatever format you like.
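The size argument is easy to check with a few lines of Python. The sketch below takes the 22 bytes from the output posted earlier in the thread and formats them both ways, 16 values per line. The helper names are made up for the illustration, and the hex variant assumes the usual MASM notation with a trailing h and a leading 0 before A..F.

```python
# The 22 bytes from the decimal db output shown earlier in the thread
code = bytes([139, 68, 36, 4, 131, 232, 1, 131, 192, 1, 128, 56, 0, 117, 248, 43,
              68, 36, 4, 194, 4, 0])

def dec(b: int) -> str:
    """Decimal notation: one to three characters per byte."""
    return str(b)

def hex_h(b: int) -> str:
    """MASM hex notation: three or four characters per byte."""
    text = f"{b:02X}h"
    return text if text[0].isdigit() else "0" + text

def db_lines(data: bytes, fmt, per_line: int = 16) -> str:
    """Emit db lines with a fixed number of values per line."""
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append("db " + ",".join(fmt(b) for b in chunk))
    return "\n".join(lines)

decimal_text = db_lines(code, dec)
hex_text = db_lines(code, hex_h)
print(decimal_text)
print(hex_text)
print(len(decimal_text), "characters in decimal,", len(hex_text), "in hex")
```

A byte never needs more than three characters in decimal, while the hex form needs at least three, so the decimal listing can only come out shorter or equal.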
Good morning, Hutch.
Thanks for the explanation on decimal/hex decision.
Please forgive my naivete. I'm trying to understand what you mean when you write:
Quote
you can do much the same directly from MASM if you put a label at the beginning and end of a procedure then write that binary data to disk, then you just convert the binary data to whatever format you like
In thinking this technique through (it interests me), what I understand you to mean is that arbitrary labels like BEGIN: ... STOP: would be inserted at the beginning and end of the proc. Then the proc would be assembled to, e.g., a COFF or OMF file. Then some kind of OMF2HEX or COFF2HEX (or similar 2OTHER) program would convert the COFF or OMF format to something recognizable as ASCII hex or what have you. Am I getting this right?
If so, can you suggest an OMF or COFF converter program? I've GOOGLED around to find one and am not successful. I know there is Agner Fog's OBJCONV program (I've used it) but it provides much more than just a simple conversion into hex strings.
Thanks for all your help on this project. And thanks as well to Mineiro and JJ2007.
Mark
Quote from: allynm on June 23, 2011, 03:59:58 PM
In thinking this technique thru--it interests me-- what I understand you to mean is that an arbitrary label like BEGIN:,,,,, STOP: would be inserted at the beginning and end of the proc.
See Reply #6, op_start: and op_end: labels. The proggie simply reads what is between the labels, and translates it to hex values. This works for opcodes (which is what you want anyway) but I have not tested it for code that is subject to relocation.
Erol has a program named bin2coff.exe...
http://www.vortex.masmcode.com/
direct d/l link...
http://www.vortex.masmcode.com/files/bin2coff10.zip
you provide it with a filename (binary data) and a label name to be used as a public symbol
Hi 'Dave
Thanks. I suppose I should have known Vortex or MichaelW would have done something like this.
Regards,
Mark
Edgar (donkey) may have something, too :bg
Mark,
If you want to perform this task manually, you write your own proc in MASM, then put a label at both the start and finish of the proc. That gives you both the starting address and, with a little arithmetic, the length in bytes. After you assemble the source code, you then write the opcodes between the 2 labels directly to memory and convert them to either HEX or decimal output notation. The tool I posted uses JWASM to write a raw binary file from the opcodes you write into it, which involves writing it to disk and then converting it to the notation you require.
yah - as always, the more information we have to work with, the better the advice will be :P
if we knew what you were trying to accomplish, you will get the best answers :U
Hi Hutch & 'dave
Hutch - thanks for expanding on your earlier post. It was very helpful.
'dave -
Quote
if we knew what you were trying to accomplish, you will get the best answers
Assuming that this was directed towards me, and not Hutch, all I can say is: as far as I can tell, everyone -- mineiro, jj, and Hutch-- understood exactly what I was seeking and each one came up with a different solution that works.
The only problem I ran into was getting BIN2COFF.exe to work. When I run the program on a test .asm file, it produces an odd-looking "hybrid" .obj that doesn't link successfully because the linker can't find mainCRTstartup. I looked around on google to see if others had similar experiences but without success. Maybe I don't understand what is meant by the phrase "any binary file" in the documentation for the program where the input file is described.
So, where I am now is trying out JJ's basic approach and following Hutch's explanation on how to use labels.
Thanks, everyone.
Mark
One more for the road - suitable also for 16-bit code:
1. Insert 4-byte "label text" in your code, e.g. Ciao and Bye#, protected by a jmp:
.Model small ; credits to DednDave (http://www.masm32.com/board/index.php?topic=12621.msg97236#msg97236)
.Stack 512
.686
.Data
MsgText db "Hello 16-bit World", 13, 10, 10
db "Hit any key to quit ...", 13, 10, "$"
.Code
_main proc FAR
; set the DS register to DGROUP (will fail with some Masm versions - use ml 6.15 or higher, or JWasm)
mov ax, @DATA
mov ds, ax
; display the message
mov dx, offset MsgText
mov ah, 9
int 21h
jmp @F
db "Ciao"
push 123
fild word ptr [esp]
fldpi
fmulp
fistp word ptr [esp]
pop ax
db "Bye#"
@@:
; wait for a key
mov ah, 0
int 16h
; the DOS equivalent to ExitProcess
mov ax, 4C00h
int 21h
_main endp
end _main
2. Track those labels down:
include \masm32\MasmBasic\MasmBasic.inc ; download (http://www.masm32.com/board/index.php?topic=12460)
Init
Let esi=FileRead$("Ciao16.exe")
mov ecx, LastFileSize ; MasmBasic knows that you need to know the size of this file ;-)
.Repeat
dec ecx
mov eax, [esi+ecx]
.Until Sign? || eax==Mirror$("Bye#")
.if !Sign?
lea ebx, [esi+ecx] ; end address
.Repeat
dec ecx
mov eax, [esi+ecx]
.Until Sign? || eax==Mirror$("Ciao")
.if !Sign?
lea esi, [esi+ecx+4] ; start address: first byte behind "Ciao"
Open "O", #1, "Opcodes.hex" ; we write to a file
Print #1, "TheOps", Tb$, "db " ; some decoration
.Repeat
movsx ecx, byte ptr [esi] ; get a single byte
Let edi=Right$(Hex$(ecx), 2)+"h" ; add the trailing h
.if byte ptr [edi]>"9" ; check if we need a leading zero
Print #1, "0"
.endif
Print #1, edi
inc esi
.if esi<ebx
Print #1, ", " ; if it's not the last byte, we need a comma
.endif
.Until esi>=ebx
Close
Launch "NotePad.exe Opcodes.hex" ; let's have a look
.else
Inkey "No start label found"
.endif
.else
Inkey "No end label found"
.endif
Exit
end start
Oh... and by the way, the approach might look a bit basic, but it's pure Masm :bg
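For those who do not have MasmBasic installed, the marker search in step 2 can be sketched in Python as well. This is only an illustration of the idea (scan backwards for "Bye#", then backwards again for "Ciao", and keep what lies between). The function name `extract_between` and the sample bytes are invented, and the sample is a hand-built byte string rather than a real Ciao16.exe.

```python
def extract_between(image: bytes, start: bytes = b"Ciao", end: bytes = b"Bye#") -> bytes:
    """Return the bytes between the last end marker and the start marker before it."""
    end_pos = image.rfind(end)
    if end_pos < 0:
        raise ValueError("No end label found")
    start_pos = image.rfind(start, 0, end_pos)
    if start_pos < 0:
        raise ValueError("No start label found")
    return image[start_pos + len(start):end_pos]

# Hand-built sample: a short jmp over 12 bytes, the markers, and two FPU
# instructions in between (D9 EB = fldpi, DE C9 = fmulp).
sample = b"\xEB\x0C" + b"Ciao" + bytes([0xD9, 0xEB, 0xDE, 0xC9]) + b"Bye#" + b"\xB4\x00"

opcodes = extract_between(sample)
print("db " + ", ".join(f"{b:02X}h" for b in opcodes))
```

On a real file you would pass `open("Ciao16.exe", "rb").read()` instead of the sample, and the marker strings must of course not occur anywhere else after the code you want.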
Hi jj-
I just had an opportunity to try out your first MasmBasic version. Works fine for me! I don't know BASIC, which is a bit limiting, I imagine. It would be nice to be able to see the Notepad output arrayed in rows corresponding to the specific asm instructions. But, that's just a quibble.
I'll try the second version later today. I kind of flip back and forth between 16-bit and 32-bit programs (confuses the daylights out of me, by the way) so something that works with 16 bits is very handy.
I'm still trying to figure out Hutch's method. Haven't wasted my time in this pursuit because it has helped me learn some stuff about reading coff files I didn't know.
Regards,
Mark
hiya Mark
here is an example of what Hutch is talking about...
Hi again JJ and 'dave
JJ: Having spent the afternoon perusing masmbasic.inc I now see why I was really grossly misunderstanding the meaning of "basic" in your MasmBasic package. I believe I owe you an apology! If you have a moment or can direct me to a relevant link I would like to understand better the origin of MASMBASIC and how to use its many macros! Is there a primer/tute around that documents them?
'dave: As always: Thanks. I sometimes wonder why you and the other experts assembled here are willing to put up with my naivete.
Regards,
Mark
:bg
thanks, but i have much to learn before i am in the same zip-code as "expert" :P
Quote from: allynm on June 26, 2011, 09:19:07 PM
I would like to understand better the origin of MASMBASIC and how to use its many macros! Is there a primer/tute around that documents them?
Mark,
Thanks for the flowers :bg
The origin is that I needed a 32-bit dialect that resembles 16-bit GfaBasic. The tutor is \masm32\MasmBasic\MbGuide.rtf, which can be opened in WordPad or \masm32\RichMasm\RichMasm.exe. Let me know if you need more.
Hi 'dave,
I'm still studying the code you sent in your zipfile. It's short but extremely subtle for a beginner like me. The code ran fine, btw. I have two questions:
In the MAIN_PROC, why do you push eax and pop edx around the call to VirtualProtect? What is happening that requires these instructions? I can see you are using esp for its address to the code segment, but I can't see why this requires the push/pop.
Thanks,
Mark
the VirtualProtect function wants to write a dword to memory that represents the previous attribute for the block
we have no need of that value, so it's "discardable"
you could do it this way...
PrevAttr dd ?
;
;
;
INVOKE VirtualProtect,Dest,1024,PAGE_EXECUTE_READWRITE,offset PrevAttr
that method wastes about 6 bytes, because we do not intend to restore the previous setting :P
as for PUSH EAX and POP EDX - any register will do - i just like to keep the stack balanced
push eax
INVOKE VirtualProtect,Dest,1024,PAGE_EXECUTE_READWRITE, esp
pop edx
Old elegant trick: the last parameter is the out parameter PDWORD lpflOldProtect.
Dave creates a dword variable with push eax, i.e. esp decreases by 4. Then esp, which now equals lpflOldProtect, is pushed first by the invoke macro, because it is the last parameter and parameters are pushed right to left. When the call has finished, you just pop the returned OldProtect value into a register, in this case edx.
it works great when it's the last parameter for INVOKE
each INVOKE parameter that gets pushed changes the ESP register
so, if it is not the last parm, you have to do something like this...
push eax
mov edx,esp
INVOKE SomeFunction,edx,Parm2
pop ecx
things may go nuts if you try to use ESP to point to parm2 :P
INVOKE SomeFunction,esp,Parm2
:naughty:
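Why esp only works as the last parameter is easier to see with a toy model of the stack. The Python sketch below is a simulation and nothing more: the `Stack` class, the `ESP` sentinel and the `invoke` helper are made up for the illustration, and the only x86 fact built in is that push esp stores the value esp had before the push.

```python
class Stack:
    """Tiny model of a 32-bit stack that grows downwards."""
    def __init__(self, esp: int = 0x1000):
        self.esp = esp
        self.mem = {}

    def push(self, value) -> None:
        self.esp -= 4
        self.mem[self.esp] = value

    def push_esp(self) -> None:
        # The argument is evaluated before push() decrements esp,
        # so the old esp value is stored, as on the real CPU.
        self.push(self.esp)

ESP = object()  # sentinel standing for "esp" in an INVOKE argument list

def invoke(stack: Stack, *args):
    """Push args right to left like INVOKE; return the values the callee sees."""
    seen = []
    for arg in reversed(args):
        if arg is ESP:
            stack.push_esp()
        else:
            stack.push(arg)
        seen.append(stack.mem[stack.esp])
    seen.reverse()
    return seen

# esp as the LAST parameter: it is pushed first and still points at our dword
good = Stack()
good.push(0)                      # push eax: reserve the dword
good_slot = good.esp
good_args = invoke(good, "Parm1", ESP)

# esp as the FIRST parameter: Parm2 was pushed before it, so esp has moved
bad = Stack()
bad.push(0)
bad_slot = bad.esp
bad_args = invoke(bad, ESP, "Parm2")

print(hex(good_slot), hex(good_args[1]))   # pointer equals the reserved slot
print(hex(bad_slot), hex(bad_args[0]))     # pointer is 4 bytes too low
```

In the second case the callee gets the address of the pushed Parm2 instead of the reserved dword, which is exactly the "things may go nuts" scenario.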
Hi JJ and 'Dave,
I think I understand what you are both saying. What is confusing to me is the stack balancing bit. I thought that INVOKE took care of cleaning up the stack and keeping it balanced. Did I get this wrong? It's obviously a non-trivial bit of ignorance I am wandering about with. If we don't care about the OldProtect, then why are we popping/pushing anything? Why don't we just go straightaway to INVOKE VirtualProtect, blah, blah, blah, blah? I can see how if you push eax you need to pop to something (why not eax?) after the INVOKE macro gets finished cleaning up to keep it balanced, but since we don't care about OldProtect, then why bother pushing eax in the first place? Maybe I need to get Olly out and watch exactly what's happening.
JJ: Thanks for alerting me to MbGuide--just what I had in mind. Needed it to understand what your program (reference #6, I believe) was doing.
BTW, I already have used JJ's solution to discover that RET is 0C3h in 'Dave's code....very very cool!
Thanks,
Mark
ok - sorry to have confused you, but it's good to learn from, anyways
as i said before, you CAN go straight to INVOKE...
PrevAttr dd ?
;
;
;
INVOKE VirtualProtect,Dest,1024,PAGE_EXECUTE_READWRITE,offset PrevAttr
have a look at the MSDN docs for VirtualProtect
http://msdn.microsoft.com/en-us/library/aa366898%28v=vs.85%29.aspx
as you can see, it requires 4 parameters
the last parameter is defined as follows:
Quote
lpflOldProtect [out]
A pointer to a variable that receives the previous access protection value of the first page in the specified region of pages. If this parameter is NULL or does not point to a valid variable, the function fails.
so, you can create a dword variable and pass the address of it to the function as shown above
however, we do not care what the value is - we have no need to store it for future use, in this case
so, we can save a few bytes by creating a temporary (LOCAL) variable on the stack and pass a pointer to that
if we wanted to, we could use LOCAL to create it
however, that requires the initialization of EBP as a stack frame pointer
it is just a shortcut
we PUSH a dword register onto the stack
that is a single byte instruction and fast, too
it does not matter which register we use - we just want a dword space on the stack
at that point, the stack pointer (ESP) holds the address of the dword space we just created
it's very convenient - we reserved space - and we have a pointer to it in a register
and we did all that with a fast single-byte instruction
when we use INVOKE...
INVOKE SomeFunction,Parm1,Parm2,Parm3
the assembler generates code that actually looks something like this...
push Parm3
push Parm2
push Parm1
call SomeFunction
if we had created a dword in the data segment and pushed the address, that would be a 5-byte PUSH (1 byte instruction, 4-byte address)
but, because the address we want to pass is in register, it is a single byte
INVOKE SomeFunction,Parm1,Parm2,esp
push esp
push Parm2
push Parm1
call SomeFunction
now - you are correct in that the StdCall convention will balance the stack for us
that means that the 3 parameters will be automatically discarded when the function returns
however, we pushed a dword onto the stack ourselves - that needs to be balanced out
so, we pop it into a register - again, a single byte instruction
if we wanted to use the value immediately, we can use the same technique
the value filled in by the function is now in whichever register we elected to POP it into
so, it saves some space in the data segment (4 bytes)
and it saves some space in the code segment (2 bytes)
you can use the same method to retrieve values quickly
i use the technique quite often for WriteFile and ReadFile for the NumberOfBytes parameter
here is another example...
the QueryPerformanceCounter function requires a pointer to a QWORD (8 byte) variable
quite often, we want the value in registers, rather than in some memory location
so - we can use the same deal....
push edx
push eax
INVOKE QueryPerformanceCounter,esp
pop eax
pop edx
now, the qword value is in EDX:EAX :bg
it also comes in handy for GetProcessAffinityMask, as well as several other functions
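The reason the two pops land in the right registers is plain little-endian layout: the function writes the qword at [esp], low dword first. A short Python check of that layout, using a made-up counter value:

```python
import struct

# Hypothetical counter value that the function writes at [esp]
counter = 0x0000012345ABCDEF

# Memory at esp after the call: 8 bytes, little-endian
memory_at_esp = struct.pack("<Q", counter)

# pop eax reads the dword at the lower address, pop edx the one above it
eax, edx = struct.unpack("<II", memory_at_esp)

print(f"EDX:EAX = {edx:08X}:{eax:08X}")
assert (edx << 32) | eax == counter
```

That also explains the push order: edx is pushed first so that it ends up at the higher address, where the high dword of the result will be.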
Hi 'Dave,
Thank you for your detailed explanation of what's going on. The new things you taught me about the stack in this tutorial are very useful, but it was also reassuring to know that INVOKE does what I thought it did. Whew!
I didn't know about the VirtualProtect API until you brought it up. I looked it over on MSDN Library last night but for some reason the discussion there is quite abstract and hard to associate with this particular application (or any other, for that matter).
Regards,
Mark
well - the windows operating system places different access privilege flags on different sections of the EXE
for example, the CONST section is read-only
the DATA and DATA? sections are read-write, no execute
the CODE section is execute only (or maybe read/execute)
if you violate these "rules" it raises an exception and crashes the program
i am sure you have seen the little Dr Watson windows :P
in order to alter the privilege level for a specific block of memory, you can use VirtualProtect
we want to write into the CODE section, so we can alter the setting for a block
the access is controlled in "pages", which means if you alter the setting for 1 byte of memory,
you alter the privilege for the entire page that it is in
memory is organized in pages of 4 kB, each
another way to go is to modify the EXE PE header and set the flags for that section
that is messy, and is not as secure - but it can be done
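The page granularity can be worked out with a little arithmetic. The Python sketch below assumes the 4 kB page size mentioned above and uses made-up addresses. It lists the page base addresses touched by a request covering `size` bytes starting at `address`, i.e. every page that contains at least one byte of the range.

```python
PAGE_SIZE = 4096  # 4 kB pages, as described above

def affected_pages(address: int, size: int) -> list:
    """Base addresses of every page containing at least one byte of the range."""
    first = address & ~(PAGE_SIZE - 1)
    last = (address + size - 1) & ~(PAGE_SIZE - 1)
    return list(range(first, last + PAGE_SIZE, PAGE_SIZE))

# A single byte still changes the whole page it lives in
print([hex(p) for p in affected_pages(0x401010, 1)])

# 1024 bytes starting near the end of a page spill into the next one
print([hex(p) for p in affected_pages(0x401F00, 1024)])
```

So a 1024-byte request can change the protection of one page or two, depending on where the block starts.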
Good morning, 'Dave and JJ2007:
'Dave: OK, I didn't realize that this was what VirtualProtect does. Clear enough, now that you've connected the dots for me. BTW, looked on OLLY and the .code section for your CopyCode proc is RE. So, if you had chosen to write in the .data block there would be no need for this function.
JJ: In your modified MasmBasic approach you use a jmp @F / @@: pair to go round the code snippet in the MAIN proc whose opcodes you want to read and print. I don't understand why you need to jump around the Ciao-Bye labels. I apologize for not having tested your code and observed it in OLLY before asking the question.
Regards,
Mark
Quote from: allynm on June 28, 2011, 12:41:15 PM
JJ: In your modified MasmBasic approach you use a jmp @F / @@: pair to go round the code snippet in the MAIN proc whose opcodes you want to read and print. I don't understand why you need to jump around the Ciao-Bye labels.
Mark,
My editor has the bad habit of assembling, linking and running the code when I hit the magic button. So it would run right into db "Ciao", which according to Olly means imul esp, [ecx+6F], -17AE8496 to the CPU... that's why I prefer to isolate this section with a little jmp :bg
jmp @F
db "Ciao"
fild word ptr [esp]
fldpi
fmulp
fistp word ptr [esp]
db "Bye#"
@@:
JJ-
Thanks. I understand.
Regards,
Mark