Can anyone here point me in the right direction of branchless code examples?
i think it would be useful for various purposes.
some good examples i wrote were lost in hard drive accident.
say conditional jumps, where the length of bytes to jump is stored in a register, rather than as code label
normal conditional test, pretty bad example, off top of my head.
test_result:
test eax,eax
je false
true:
mov eax,12345678h
jmp end_routine
false:
mov eax,0abcdefh
end_routine:
ret
OR
test_result:
call @get_eip
@get_eip:
pop ebp
shl eax,shift_byte ; multiples of 2, depending on how many bytes required for true condition.
lea ecx,[ebp + eax + (@get_eip - false)]
jmp ecx
true:
mov eax,12345678h
jmp end_routine
; padded bytes ; depends on number of bytes executed for true condition
false:
mov eax,0abcdefh
end_routine:
ret
hopefully might make some sense to someone...i'll try upload real working examples later.
but the only problem is calculating the correct offsets/padding out bytes.
the purpose? optimizations, anti-debug are 2 i can think of.
Quote from: Kernel_Gaddafi on November 09, 2007, 10:47:48 PM
test_result:
  test eax,eax
  je false
true:
  mov eax,12345678h
  jmp end_routine
false:
 mov eax,0abcdefh
end_routine:
 ret
cmp eax,1
sbb eax,eax
and eax,0abcdefh-12345678h
add eax,12345678h
you gave good answer, drizz - its just that i gave bad example.
still didn't get around to creating a good example yet, didn't have time..
as far as the BRANCHING methodology, this form is superior:
test eax, eax
mov eax, truevalue
jne @@skip
mov eax, falsevalue
@@skip:
only a single branch (here, only branches when true), and with one less instruction it is also smaller ..
If the jump is normally taken, a small cache improvement can be made in loops by prefixing the conditional jump with a branch hint, i.e.:
T MACRO _jump:VARARG ; Branch Hint: Taken
DB 2Eh
_jump
ENDM
NT MACRO _jump:VARARG ; Branch Hint: Not Taken
DB 3Eh
_jump
ENDM
.code
start:
mov eax,input("Enter a number: ")
cmp byte ptr [eax+1],10 ; break on pressing enter
NT jz gb ; and exit
cmp byte ptr [eax],"-" ; do not allow negative numbers
NT jz start ;
...
T jl sl ; loop if factor less than nVal
test si,si ; were there any factors?
T jnz dn ; if so, jump to done
print "PRIME! ",0 ; else show there were no factors
dn: print chr$(8,20h,10,10) ; done factoring, newlines
jmp start ; and back to beginning
gb: ret ; exit gracefully
Although this obviously uses one additional byte.
sorry still don't have good example yet, currently at work.
the aim is to eliminate all conditional jumps/calls..anything that addresses labels.
it is possible, but a little tricky..all adresses are created at runtime, rather than etched in memory.
i'll try do something tonight...
you mean like jump tables?
The book "32/64-bit 80x86 Assembly Language Architecture" by James Leiterman has a chapter on branchless code. Along with the branch hints it discussed using bit manipulations to avoid branches. It looks like alot more code than using hints or jump tables.
WIN32_LEAN_AND_MEAN equ 1
.686
.xmm
.model flat,stdcall
include <windows.inc>
include <msvcrt.inc>
include <stdio.inc>
includelib <kernel32.lib>
do_padding macro code_size, flag
  local pad_size
  local count
  pad_size = 1
  count = 0             ; would hold how many bits to shift
  while pad_size lt code_size
   pad_size = (pad_size SHL 1)
   count = count + 1
  endm
  if flag eq 1
   db (1 shl count) - code_size dup (90h)
  endif
endm
pCreateFileAÂ equ <esi+4*0>
pWriteFile  equ <esi+4*1>
pCloseHandle equ <esi+4*2>
pprintf    equ <esi+4*3>
pExitProcess equ <esi+4*4>
  .code
main:
  sub esp,4*5
  mov edi,esp
  push edi
  mov eax,[CreateFileA]     ; these api addresses could optionally be calculated
  stosd             ; at runtime
  mov eax,[WriteFile]
  stosd
  mov eax,[CloseHandle]
  stosd
  mov eax,[printf]
  stosd
  mov eax,[ExitProcess]
  stosd
  pop esi
  call @geip           ; only call once
@geip:
  pop ebp
  lea ebp,[ebp + (begin_call - @geip)]
  ; ebp = current eip
  lea ebp,[ebp + (end_call - begin_call)]
  ; ebp = end_call address
begin_call:
  push 0
  push FILE_ATTRIBUTE_NORMAL
  push OPEN_EXISTING
  push 0
  push FILE_SHARE_WRITE
  push GENERIC_WRITE
  push CStr(<'file.txt'>)
  push ebp                  ; save return address on stack
  jmp dword ptr [pCreateFileA]
end_call:
  ; we end back here..
  test eax,eax                ; test for INVALID_HANDLE_VALUE
  xchg eax,ebx                ; save handle in ebx
  sets al                   ; js if not opened
  and eax,0ffh                ; clear upper 24-bits
  ; this is part i haven't been able to pre-calculate ..unless you know?
  ; set to 6, for 64 bytes
  shl eax, 6                 ; multiply depending on pad_size
                        ; do_padding macro calculates this value
                        ; but no way to forward reference it!
  lea eax,[ebp + eax + (@opened - end_call)]
  jmp eax                   ; jump on condition
                        ; between @opened and @not_opened
                        ; we have space for 32 bytes of instructions
@opened:
  lea ebp,[eax + (@end_write - @opened)]
  push 0
  push esp
  push 14                   ; write 14 bytes to file
  push CStr(<'Hello,World!',13,10>)
  push ebx
  push ebp
  jmp dword ptr [pWriteFile]
@end_write:
  pop ecx                   ; length of bytes written
  lea ebp,[ebp + (@exit_code - @end_write)]  ; return to @not_opened address
  push ebx
  push ebp
  jmp dword ptr [pCloseHandle]
@end_opened:
  do_padding (@end_opened - @opened), 1
@not_opened:
  lea ebp,[eax + (@exit_code - @not_opened)]
  push CStr(<10,'file not opened'>)
  push ebp
  jmp dword ptr [pprintf]
@exit_code:
  push 0
  jmp dword ptr [pExitProcess]
  end main
 i know what some of you are thinking... :P
 what the hell is that?? well, i wanted to write some examples that were anti-decompile without using complex
 polymorphic/metamorphic or compression code - i guess i would have to write my own assembler to make the job
 easier..just wouldn't make sense to do all this manually.
 if you process through decompiler, it won't give meaningful results, since they usually rebuild loop/conditional structures
 based on address labels.its not possible to analyse code with flowchart unless there are address labels for each conditional test.
of course, it would probably be quite easy for someone to re-calculate the appropriate addresses, and insert JE/JNE/JS/JNS..etc where applicable, but that wouldn't be easy for everyone.
maybe it is still not the best of examples, but its best i could do for now..
Quote32/64-bit 80x86 Assembly Language Architecture
i must check this out!
Hi Kernel_Gaddafi,
nice ideas, you could simplify it by using macros.
_JS macro yes:req, no:req
cdq
and edx,offset yes-offset no
lea eax,[edx+offset no]
jmp eax
endm
_JZ macro yes:req, no:req
cmp eax,1
sbb eax,eax
and eax,offset yes-offset no
add eax,offset no
jmp eax
endm
start:
int 3
mov eax,-1;;1;;result from createfile
_JS notopened,opened;
opened: nop
jmp @F
notopened: nop
@@:
mov eax,0
_JZ _zero,not0
not0: nop
jmp @F
_zero: nop
@@:
end start
nice one, drizz.
i'll test this out and see how well it works.
cheers!
@Kernel_Gaddafi: Hello Momar, erm I mean Kernel :green2 Having seen your last few topics, it appears we are both after the same thing - code obfuscation. Now I'm not sure what you intend to do with these little gems (posts) but my current project is a garden variety "YET ANOTHER PE PROTECTOR" :eek albeit with a few twists - my favorite among which is total abuse of SEH/VEH for everything from function calling to anti-tricks, to ring0 switching, etc.
This topic has opened my eyes to new posibilties. I hope you don't mind if I "borrow" some of these ideas :bg Forget about the Leiterman book (unless you're looking for a few laughs). Your example is several degrees of magnitude more advanced than what the book's author achieved with the entire chapter (all three source listings worth ::) )
@drizz: Elegant and straight to the point as always :U
@Mark Jones: Thanks for the branch hint example usage. They will come in very handy.
What about the CMOVXX instructions?
@Shell
if you can use the code i write for anything, i've no problem with that.
as far as using the obfuscated code for anything, i was just curious about how to make it work - no use for it really.
i just think it would be interesting to write code that had nothing but direct jmps, with the addresses calculated dynamically, stored in a register.
it would be alot easier on x64 cpu, since you can access rip register directly.