News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Overlapping Code

Started by AeroASM, June 29, 2005, 10:18:09 AM

Previous topic - Next topic

AeroASM

In genetics each "codon" takes up multiple bases, and in assembly each instruction takes up multiple bytes.
IN a reading frame shift mutation, you start at a byte that is in the middle of an instruction, yielding a different protein or different sequence of instructions.

With this technique, it is possible to overlap two procedures on top of each other, making super-small (and hard to RE) code.

Lets make a collection of code snippets which do this.

MazeGen

You mean something like the following?


0040D241 7A03                           jpe     @C0040D246
                                       @C0040D243:
0040D243 050102                         BYTE   005h,001h,002h   ; add     eax,40D0201
                                       @C0040D246:
0040D246 0D04                           BYTE   00Dh,004h        ; or      eax,4031504
0040D248 150304                         BYTE   015h,003h,004h   ; adc     eax,41D0403
0040D24B 1D04                           BYTE   01Dh,004h        ; sbb     eax,4032504
0040D24D 250304                         BYTE   025h,003h,004h   ; and     eax,12D0403
0040D250 2D01                           BYTE   02Dh,001h        ; sub     eax,4033501
0040D252 350304                         BYTE   035h,003h,004h   ; xor     eax,43D0403
0040D255 3D04                           BYTE   03Dh,004h        ; cmp     eax,403A904
0040D257 A90304                         BYTE   0A9h,003h,004h   ; test    eax,42410403
0040D25A 41                             inc     ecx
0040D25B 42                             inc     edx
0040D25C 7BE5                           jpo     @C0040D243

AeroASM

I actually meant something more like this:

.686
.model flat,stdcall
option casemap:none

include \masm32\include\windows.inc

include \masm32\include\kernel32.inc
include \masm32\include\user32.inc
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\user32.lib

.code

start:

mov eax,2630594228
add eax,1668571474
push dword ptr 0
push eax
push eax
push dword ptr 0
call MessageBox
push dword ptr 0
call ExitProcess

end start

but with code instead of a string.

doomsday

It's not exactly what you're thinking of but I recall seeing examples of something along those lines when people are trying to achieve the smallest possible code using things like:
org 100h
mov dx, OFFSET stSomeMessage
mov  ah, 9
int 21h
; etc. etc.
add ebx, [cs:0]   ;rather than hard-coding a DWORD value for this instruction


regards,
-Brent

roticv


AeroASM

doomsday: what do you mean?
roticv: no. why do you ask?

daydreamer

you mean if I have a PROC that starts with a label
call L1 does one thing, but you have hidden a second PROC in the first PROC
first instruction maybe takes 3 bytes, so when you make a call L1+1 starts execution in the middle of the first instruction
and the proc is carefully constructed so it works different, but doesnt crash or cause GPF

AeroASM

Yes! Hard to explain though, and even harder to write. It is easier hiding a string amongst the instructions.

The point of this thread is: if anyone decides to spend time figuring out even small algorithms which use this technique, then post them here so that we can achieve a level of optimisation no compiler can ever perform.

chep

The biggest problem we'd have to deal with is that x86 opcodes are "self-repairing" really quickly, ie. the two procedures may quickly converge to the same opcodes.

Also, both procedures must be fully relocatable (ie. no absolute offsets), otherwise relocating the code would break at least one of the procedures.


I'm eager to see such a piece of code!!

arafel

Very interesting idea AeroASM. Something like :

  mov   ebx, 32
  xor   edx, edx
  xor   ecx, ecx

  test  edx, edx
  je    tst_ecx

  shl   ebx, 6
  add   ebx, 10000000h
  jmp   ok

tst_ecx:
  jecxz ok
  add   ebx, 10000000h

ok:
  mov   [result], ebx


could become:

  mov   ebx, 32
  xor   edx, edx
  xor   ecx, ecx

  test  edx, edx
  je    ins_shift+1

ins_shift:
  shl   ebx, 6
  add   ebx, 10000000h
  mov   [result], ebx


I guess this sliding code idea could be useful in fooling disassembles around.

doomsday

Quote from: AeroASMdoomsday: what do you mean?

You roughly answered it in one of your following posts

QuoteIt is easier hiding a string amongst the instructions.

That in some cases, people trying to write very small code use cs:[address] references for strings or numeric constants if the desired string/constant happens to appear in the code segment as a result of the instructions assembled there.  Sorry for not making it clearer in my previous post.

regards,
-Brent

Rifleman

Aero,
Not exactly what you are referring to but the method is a very old one.  Jump tables used to be coded that way years ago, especially using the 6809e precursor to the Motorola 680x0.  Just an interesting fact.  Hiding code is something that many of us have been doing for years, on many processors, but not anything as involved as a procedure.  In fact, in all of my years of x86 coding, I never bothered to use procedures until I moved into the WIN32 environment.

Good luck and show us some code someday.
Paul

Randall Hyde

Quote from: AeroASM on June 29, 2005, 10:18:09 AM
In genetics each "codon" takes up multiple bases, and in assembly each instruction takes up multiple bytes.
IN a reading frame shift mutation, you start at a byte that is in the middle of an instruction, yielding a different protein or different sequence of instructions.

With this technique, it is possible to overlap two procedures on top of each other, making super-small (and hard to RE) code.

Lets make a collection of code snippets which do this.

This trick is as old as the CPU. I remember seeing this trick employed in 8080 code, hiding two-byte instructions inside LXI instructions (which load a 16-bit constant into the HL register pair) throughout CP/M code in the 1970s. It is also one of the more insidious tricks that makes in impossible for an automated disassembler to successfully disassemble certain files.  BTW, don't get the impression this trick started with microprocessors. As I said, it's as old as the CPU (well, Von Neumann CPU, to be exact).
Cheers,
Randy Hyde


Randall Hyde

Quote from: AeroASM on June 29, 2005, 09:01:03 PM
Yes! Hard to explain though, and even harder to write. It is easier hiding a string amongst the instructions.

The point of this thread is: if anyone decides to spend time figuring out even small algorithms which use this technique, then post them here so that we can achieve a level of optimisation no compiler can ever perform.

The main reason I've seen for this trick is to save a few bytes in memory-constrained situations. Typically, the trick was used for some purpose like this:

jmp <indirectly through a jump table>

case0: <two-byte instr>
      byte <opcode for a three byte instruction>
case1: <two-byte instr>
     byte <opcode for the three-byte instruction>
case2: <two-byte instr>
    .
    .
    .
    byte <opcode for the three-byte instruction>
casen: <two-byte instr>

;end of cases.

This allows you to use only one byte of overhead for each case rather than the two or three bytes a branch or jump would normally take. Of course, it is important that the three-byte instruction not do anything bad when interpreting the two-byte opcode as an operand (usually, as in the case of the 8080 LXI instruction, the three-byte instruction loads a constant into 16-bit register that we can afford to trash).

Do keep in mind the flipside to this trick -- it save space at the expense of execution time. If you execute case0, you wind up executing a sequence of <three-byte instruction>s until you get to the end of the list.

Of of the more successful uses of this trick I've seen is to skip over an INC (or DEC) instruction at the beginning of a loop, e.g.,

   byte <2 byte instruction opcode>
loopLabel: inc eax
    <<code for loop body>>
    Jcc loopLabel

This skips over the INC instruction on the first iteration, and executes it thereafter. Sometimes, this is a very handy trick. Of course, a JMP instruction is more readable and maintainable, but it's still a slick trick if saming memory is what you want to do.
Cheers,
Randy Hyde

Larry Hammick

In this article there are a few byte-saving tricks which also help to deflect reverse engineers. (It's about DOS ASM programming but the tricks can be adapted.)
www.assembly-journal.com/viewarticle.php?id=30&layout=html
Another way to hide a critical algo in an Win32 program: store it as data in a scrambled state, and, when you want to run it, unscramble it to the stack, which is executable in Win32. Not many debuggers disassemble the stack  :green