News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Branchless Conditional Code

Started by bozo, November 09, 2007, 10:47:48 PM

Previous topic - Next topic

bozo

Can anyone here point me in the right direction of branchless code examples?
i think it would be useful for various purposes.
some good examples i wrote were lost in hard drive accident.

say conditional jumps, where the length of bytes to jump is stored in a register, rather than as code label

normal conditional test, pretty bad example, off top of my head.

test_result:
    test eax,eax
    je false

true:
   mov eax,12345678h
   jmp end_routine
false:
  mov eax,0abcdefh
end_routine:
  ret


OR

test_result:
    call @get_eip
@get_eip:
    pop ebp

    shl eax,shift_byte      ; multiples of 2, depending on how many bytes required for true condition.
    lea ecx,[ebp + eax + (@get_eip - false)]
    jmp ecx
true:
   mov eax,12345678h
   jmp end_routine
   ; padded bytes            ; depends on number of bytes executed for true condition
false:
  mov eax,0abcdefh
end_routine:
  ret


hopefully might make some sense to someone...i'll try upload real working examples later.
but the only problem is calculating the correct offsets/padding out bytes.

the purpose? optimizations, anti-debug are 2 i can think of.

drizz

Quote from: Kernel_Gaddafi on November 09, 2007, 10:47:48 PM
test_result:
    test eax,eax
    je false

true:
   mov eax,12345678h
   jmp end_routine
false:
  mov eax,0abcdefh
end_routine:
  ret


cmp eax,1
sbb eax,eax
and eax,0abcdefh-12345678h
add eax,12345678h
The truth cannot be learned ... it can only be recognized.

bozo

you gave good answer, drizz - its just that i gave bad example.
still didn't get around to creating a good example yet, didn't have time..

Rockoon

as far as the BRANCHING methodology, this form is superior:

test eax, eax
mov eax, truevalue
jne @@skip
mov eax, falsevalue
@@skip:


only a single branch (here, only branches when true), and with one less instruction it is also smaller ..
When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.

Mark Jones

If the jump is normally taken, a small cache improvement can be made in loops by prefixing the conditional jump with a branch hint, i.e.:


T MACRO _jump:VARARG                    ; Branch Hint: Taken
    DB 2Eh
    _jump
ENDM

NT MACRO _jump:VARARG                   ; Branch Hint: Not Taken
    DB 3Eh
    _jump
ENDM

.code
start:
    mov eax,input("Enter a number: ")
    cmp byte ptr [eax+1],10             ; break on pressing enter
NT  jz gb                               ; and exit
    cmp byte ptr [eax],"-"              ; do not allow negative numbers
NT  jz start                            ;
...
T   jl sl                               ; loop if factor less than nVal
    test si,si                          ; were there any factors?
T   jnz dn                              ; if so, jump to done
    print "PRIME! ",0                   ; else show there were no factors
dn: print chr$(8,20h,10,10)             ; done factoring, newlines
    jmp start                           ; and back to beginning
   
gb: ret                                 ; exit gracefully


Although this obviously uses one additional byte.
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

bozo

sorry still don't have good example yet, currently at work.
the aim is to eliminate all conditional jumps/calls..anything that addresses labels.
it is possible, but a little tricky..all adresses are created at runtime, rather than etched in memory.
i'll try do something tonight...

Rockoon

When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.

Alloy

The book "32/64-bit 80x86 Assembly Language Architecture" by James Leiterman has a chapter on branchless code. Along with the branch hints it discussed using  bit manipulations to avoid branches. It looks like alot more code than using hints or jump tables.
We all used to be something else. Nature has always recycled.

bozo


WIN32_LEAN_AND_MEAN equ 1

.686
.xmm
.model flat,stdcall

include <windows.inc>
include <msvcrt.inc>
include <stdio.inc>

includelib <kernel32.lib>

do_padding macro code_size, flag
   local pad_size
   local count

   pad_size = 1
   count = 0                         ; would hold how many bits to shift

   while pad_size lt code_size
      pad_size = (pad_size SHL 1)
      count = count + 1
   endm

   if flag eq 1
      db (1 shl count) - code_size dup (90h)
   endif

endm

pCreateFileA  equ <esi+4*0>
pWriteFile    equ <esi+4*1>
pCloseHandle  equ <esi+4*2>
pprintf       equ <esi+4*3>
pExitProcess  equ <esi+4*4>

    .code

main:
   sub esp,4*5
   mov edi,esp
   push edi
   mov eax,[CreateFileA]          ; these api addresses could optionally be calculated
   stosd                          ; at runtime
   mov eax,[WriteFile]
   stosd
   mov eax,[CloseHandle]
   stosd
   mov eax,[printf]
   stosd
   mov eax,[ExitProcess]
   stosd
   pop esi

   call @geip                      ; only call once
@geip:
   pop ebp
   lea ebp,[ebp + (begin_call - @geip)]
   ; ebp = current eip
   lea ebp,[ebp + (end_call - begin_call)]
   ; ebp = end_call address
begin_call:
   push 0
   push FILE_ATTRIBUTE_NORMAL
   push OPEN_EXISTING
   push 0
   push FILE_SHARE_WRITE
   push GENERIC_WRITE
   push CStr(<'file.txt'>)
   push ebp                                    ; save return address on stack
   jmp dword ptr [pCreateFileA]
end_call:
   ; we end back here..

   test eax,eax                                ; test for INVALID_HANDLE_VALUE
   xchg eax,ebx                                ; save handle in ebx

   sets al                                     ; js if not opened
   and eax,0ffh                                ; clear upper 24-bits

   ; this is part i haven't been able to pre-calculate ..unless you know?
   ; set to 6, for 64 bytes

   shl eax, 6                                  ; multiply depending on pad_size
                                               ; do_padding macro calculates this value
                                               ; but no way to forward reference it!
   lea eax,[ebp + eax + (@opened - end_call)]
   jmp eax                                     ; jump on condition
                                               ; between @opened and @not_opened
                                               ; we have space for 32 bytes of instructions
@opened:
   lea ebp,[eax + (@end_write - @opened)]
   push 0
   push esp
   push 14                                     ; write 14 bytes to file
   push CStr(<'Hello,World!',13,10>)
   push ebx
   push ebp
   jmp dword ptr [pWriteFile]
@end_write:
   pop ecx                                     ; length of bytes written

   lea ebp,[ebp + (@exit_code - @end_write)]   ; return to @not_opened address
   push ebx
   push ebp
   jmp dword ptr [pCloseHandle]
@end_opened:

   do_padding (@end_opened - @opened), 1

@not_opened:
   lea ebp,[eax + (@exit_code - @not_opened)]

   push CStr(<10,'file not opened'>)
   push ebp
   jmp dword ptr [pprintf]

@exit_code:
   push 0
   jmp dword ptr [pExitProcess]

   end main


  i know what some of you are thinking... :P

  what the hell is that?? well, i wanted to write some examples that were anti-decompile without using complex
  polymorphic/metamorphic or compression code - i guess i would have to write my own assembler to make the job
  easier..just wouldn't make sense to do all this manually.

  if you process through decompiler, it won't give meaningful results, since they usually rebuild loop/conditional structures
  based on address labels.its not possible to analyse code with flowchart unless there are address labels for each conditional test.

  of course, it would probably be quite easy for someone to re-calculate the appropriate addresses, and insert JE/JNE/JS/JNS..etc where applicable, but that wouldn't be easy for everyone.

maybe it is still not the best of examples, but its best i could do for now..

Quote32/64-bit 80x86 Assembly Language Architecture

i must check this out!

drizz

Hi Kernel_Gaddafi,

nice ideas, you could simplify it by using macros.

_JS macro yes:req, no:req
cdq
and edx,offset yes-offset no
lea eax,[edx+offset no]
jmp eax
endm

_JZ macro yes:req, no:req
cmp eax,1
sbb eax,eax
and eax,offset yes-offset no
add eax,offset no
jmp eax
endm

start:
int 3

mov eax,-1;;1;;result from createfile
_JS notopened,opened;
opened: nop
jmp @F
notopened: nop
@@:
mov eax,0
_JZ _zero,not0
not0: nop
jmp @F
_zero: nop
@@:
end start
The truth cannot be learned ... it can only be recognized.

bozo

nice one, drizz.
i'll test this out and see how well it works.
cheers!

Shell

@Kernel_Gaddafi: Hello Momar, erm I mean Kernel  :green2 Having seen your last few topics, it appears we are both after the same thing - code obfuscation. Now I'm not sure what you intend to do with these little gems (posts) but my current project is a garden variety "YET ANOTHER PE PROTECTOR" :eek albeit with a few twists - my favorite among which is total abuse of SEH/VEH for everything from function calling to anti-tricks, to ring0 switching, etc.

This topic has opened my eyes to new posibilties. I hope you don't mind if I "borrow" some of these ideas  :bg Forget about the Leiterman book (unless you're looking for a few laughs). Your example is several degrees of magnitude more advanced than what the book's author achieved with the entire chapter (all three source listings worth  ::) )

@drizz: Elegant and straight to the point as always  :U

@Mark Jones: Thanks for the branch hint example usage. They will come in very handy.

u

What about the CMOVXX instructions?
Please use a smaller graphic in your signature.

bozo

@Shell

if you can use the code i write for anything, i've no problem with that.
as far as using the obfuscated code for anything, i was just curious about how to make it work - no use for it really.
i just think it would be interesting to write code that had nothing but direct jmps, with the addresses calculated dynamically, stored in a register.
it would be alot easier on x64 cpu, since you can access rip register directly.