In genetics each "codon" takes up multiple bases, and in assembly each instruction takes up multiple bytes.
IN a reading frame shift mutation, you start at a byte that is in the middle of an instruction, yielding a different protein or different sequence of instructions.
With this technique, it is possible to overlap two procedures on top of each other, making super-small (and hard to RE) code.
Lets make a collection of code snippets which do this.
You mean something like the following?
0040D241 7A03 jpe @C0040D246
@C0040D243:
0040D243 050102 BYTE 005h,001h,002h ; add eax,40D0201
@C0040D246:
0040D246 0D04 BYTE 00Dh,004h ; or eax,4031504
0040D248 150304 BYTE 015h,003h,004h ; adc eax,41D0403
0040D24B 1D04 BYTE 01Dh,004h ; sbb eax,4032504
0040D24D 250304 BYTE 025h,003h,004h ; and eax,12D0403
0040D250 2D01 BYTE 02Dh,001h ; sub eax,4033501
0040D252 350304 BYTE 035h,003h,004h ; xor eax,43D0403
0040D255 3D04 BYTE 03Dh,004h ; cmp eax,403A904
0040D257 A90304 BYTE 0A9h,003h,004h ; test eax,42410403
0040D25A 41 inc ecx
0040D25B 42 inc edx
0040D25C 7BE5 jpo @C0040D243
I actually meant something more like this:
.686
.model flat,stdcall
option casemap:none
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\user32.inc
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\user32.lib
.code
start:
mov eax,2630594228
add eax,1668571474
push dword ptr 0
push eax
push eax
push dword ptr 0
call MessageBox
push dword ptr 0
call ExitProcess
end start
but with code instead of a string.
It's not exactly what you're thinking of but I recall seeing examples of something along those lines when people are trying to achieve the smallest possible code using things like:
org 100h
mov dx, OFFSET stSomeMessage
mov ah, 9
int 21h
; etc. etc.
add ebx, [cs:0] ;rather than hard-coding a DWORD value for this instruction
regards,
-Brent
Do you mean SMC?
doomsday: what do you mean?
roticv: no. why do you ask?
you mean if I have a PROC that starts with a label
call L1 does one thing, but you have hidden a second PROC in the first PROC
first instruction maybe takes 3 bytes, so when you make a call L1+1 starts execution in the middle of the first instruction
and the proc is carefully constructed so it works different, but doesnt crash or cause GPF
Yes! Hard to explain though, and even harder to write. It is easier hiding a string amongst the instructions.
The point of this thread is: if anyone decides to spend time figuring out even small algorithms which use this technique, then post them here so that we can achieve a level of optimisation no compiler can ever perform.
The biggest problem we'd have to deal with is that x86 opcodes are "self-repairing" really quickly, ie. the two procedures may quickly converge to the same opcodes.
Also, both procedures must be fully relocatable (ie. no absolute offsets), otherwise relocating the code would break at least one of the procedures.
I'm eager to see such a piece of code!!
Very interesting idea AeroASM. Something like :
mov ebx, 32
xor edx, edx
xor ecx, ecx
test edx, edx
je tst_ecx
shl ebx, 6
add ebx, 10000000h
jmp ok
tst_ecx:
jecxz ok
add ebx, 10000000h
ok:
mov [result], ebx
could become:
mov ebx, 32
xor edx, edx
xor ecx, ecx
test edx, edx
je ins_shift+1
ins_shift:
shl ebx, 6
add ebx, 10000000h
mov [result], ebx
I guess this sliding code idea could be useful in fooling disassembles around.
Quote from: AeroASMdoomsday: what do you mean?
You roughly answered it in one of your following posts
QuoteIt is easier hiding a string amongst the instructions.
That in some cases, people trying to write very small code use cs:[address] references for strings or numeric constants if the desired string/constant happens to appear in the code segment as a result of the instructions assembled there. Sorry for not making it clearer in my previous post.
regards,
-Brent
Aero,
Not exactly what you are referring to but the method is a very old one. Jump tables used to be coded that way years ago, especially using the 6809e precursor to the Motorola 680x0. Just an interesting fact. Hiding code is something that many of us have been doing for years, on many processors, but not anything as involved as a procedure. In fact, in all of my years of x86 coding, I never bothered to use procedures until I moved into the WIN32 environment.
Good luck and show us some code someday.
Paul
Quote from: AeroASM on June 29, 2005, 10:18:09 AM
In genetics each "codon" takes up multiple bases, and in assembly each instruction takes up multiple bytes.
IN a reading frame shift mutation, you start at a byte that is in the middle of an instruction, yielding a different protein or different sequence of instructions.
With this technique, it is possible to overlap two procedures on top of each other, making super-small (and hard to RE) code.
Lets make a collection of code snippets which do this.
This trick is as old as the CPU. I remember seeing this trick employed in 8080 code, hiding two-byte instructions inside LXI instructions (which load a 16-bit constant into the HL register pair) throughout CP/M code in the 1970s. It is also one of the more insidious tricks that makes in impossible for an automated disassembler to successfully disassemble certain files. BTW, don't get the impression this trick started with microprocessors. As I said, it's as old as the CPU (well, Von Neumann CPU, to be exact).
Cheers,
Randy Hyde
Quote from: AeroASM on June 29, 2005, 09:01:03 PM
Yes! Hard to explain though, and even harder to write. It is easier hiding a string amongst the instructions.
The point of this thread is: if anyone decides to spend time figuring out even small algorithms which use this technique, then post them here so that we can achieve a level of optimisation no compiler can ever perform.
The main reason I've seen for this trick is to save a few bytes in memory-constrained situations. Typically, the trick was used for some purpose like this:
jmp <indirectly through a jump table>
case0: <two-byte instr>
byte <opcode for a three byte instruction>
case1: <two-byte instr>
byte <opcode for the three-byte instruction>
case2: <two-byte instr>
.
.
.
byte <opcode for the three-byte instruction>
casen: <two-byte instr>
;end of cases.
This allows you to use only one byte of overhead for each case rather than the two or three bytes a branch or jump would normally take. Of course, it is important that the three-byte instruction not do anything bad when interpreting the two-byte opcode as an operand (usually, as in the case of the 8080 LXI instruction, the three-byte instruction loads a constant into 16-bit register that we can afford to trash).
Do keep in mind the flipside to this trick -- it save space at the expense of execution time. If you execute case0, you wind up executing a sequence of <three-byte instruction>s until you get to the end of the list.
Of of the more successful uses of this trick I've seen is to skip over an INC (or DEC) instruction at the beginning of a loop, e.g.,
byte <2 byte instruction opcode>
loopLabel: inc eax
<<code for loop body>>
Jcc loopLabel
This skips over the INC instruction on the first iteration, and executes it thereafter. Sometimes, this is a very handy trick. Of course, a JMP instruction is more readable and maintainable, but it's still a slick trick if saming memory is what you want to do.
Cheers,
Randy Hyde
In this article there are a few byte-saving tricks which also help to deflect reverse engineers. (It's about DOS ASM programming but the tricks can be adapted.)
www.assembly-journal.com/viewarticle.php?id=30&layout=html (http://www.assembly-journal.com/viewarticle.php?id=30&layout=html)
Another way to hide a critical algo in an Win32 program: store it as data in a scrambled state, and, when you want to run it, unscramble it to the stack, which is executable in Win32. Not many debuggers disassemble the stack :green