Dynamically creating and loading ASM DLLs during runtime from .Net

Started by braincell, January 03, 2012, 03:37:12 PM


donkey

Quote from: braincell on January 03, 2012, 07:49:38 PM
Somebody said earlier that it is naive to think it's NOT a monumental task, but maybe we misunderstood each other. That's a good idea, or at least it's worth a shot. I'll think about it and do a bit of research on the topic.

It's not monumental, since you control the input and grammar and can restrict them. Writing a complete macro assembler with generic parsing and expression evaluation is a large undertaking, but the most complex parts of an assembler are the parsing mechanisms, and you shouldn't need those.

I chose COM because I prefer it as an interop mechanism, since it's available to scripts etc., but I agree that a DLL and the flat API is a much easier route.
"Ahhh, what an awful dream. Ones and zeroes everywhere...[shudder] and I thought I saw a two." -- Bender
"It was just a dream, Bender. There's no such thing as two". -- Fry
-- Futurama

Donkey's Stable

vanjast

Quote from: BogdanOntanu on January 03, 2012, 04:02:53 PM
Quote
2- In .Net the addition operator "a=a+b" takes 20 CPU cycles. I am assuming under ASM it will take only 1 CPU cycle. Is this correct?

Yes, ADD EAX,ECX will take 1 cycle or 0.5 cycles or 0.25 cycles or even less IF you pair the instructions nicely and it depends on the CPU model.


Maybe it's a memory-referenced operation and the cycles (or equivalent time) are taken up with getting a & b to and from memory.
Cache hits/misses, or a crappy compiler...
:8)

baltoro

This is one of those great programming questions that you never see discussed either here at the MASM forum or at the Microsoft Technical Forums.
Years ago, I read Adam Nathan's ".NET and COM: The Complete Interoperability Guide", which is an extraordinary book, and it never once mentioned Assembly Language compatibility with the .NET Framework.
What you have suggested is intelligent, but you have chosen the most difficult architectural configuration to implement your application's functionality. As EDGAR (donkey) has suggested, COM interop is the most accessible way to accomplish this, since the syntactical conventions already exist and are fairly well documented. You can actually write a completely functional COM interface in Assembly Language that is compliant with the COM specification and create a .NET Framework runtime-callable wrapper (RCW) for it using TlbImp.exe (see ".NET Interop: Get Ready for Microsoft .NET by Using Wrappers to Interact with COM-based Applications", 2001), but that's an awful lot of complexity to go through just to compile an executable.
I use Visual Studio myself, and have found the marshalling aspect of COM to .NET components tedious and often confusing, so I just code in C++ and Assembly Language, which are a lot more compatible.
Baltoro

jj2007

Quote from: braincell on January 03, 2012, 07:53:53 PM
Edit: just saw your Basic lib, and RichMasm. So I guess the IDE can have colours after all and some quick help. The lib also looks great, and I think I'll use it. Thanks.

Here is a first attempt at simulating your procedure - the fmul, fadd etc are randomly changed:
Quote
include \masm32\MasmBasic\MasmBasic.inc   ; RichMasm: Press F6 to assemble & link
.data
TheInputs   REAL8 123.456, 11.11, 22.22, 33.33, 44.44, 55.55, 66.66, 77.77
TheOpcodeBytes   db 43h, 4bh, 73h, 63h   ; *)

   Init
   push eax   ; create lpflOldProtect
   invoke VirtualProtect, TheProc, 1024, PAGE_EXECUTE_READWRITE, esp
   mov ecx, 20
   mov esi, offset TheOpcodeBytes
   Print "Testing the proc:"
   Open "O", #1, "Genetic.log"
   Rand()
   .Repeat
      movzx eax, byte ptr [esi+Rand(4)]
      mov byte ptr Pos1[1], al
      movzx eax, byte ptr [esi+Rand(4)]
      mov byte ptr Pos2[1], al
      movzx eax, byte ptr [esi+Rand(4)]
      mov byte ptr Pos3[1], al
      movzx eax, byte ptr [esi+Rand(4)]
      mov byte ptr Pos4[1], al
      call TheProc
      Print Str$("\nLoop   %i:\t", ecx), Str$(ST(0))
      Print #1, Str$("\nLoop   %i:\t", ecx), Str$(ST(0))
      fstp st
      dec ecx
   .Until Sign?
   Close #1
   mov eax, [esp]   ; get lpflOldProtect
   invoke VirtualProtect, TheProc, 1024, eax, esp
   pop eax
   Inkey CrLf$, "Done"
   Exit
TheProc proc
; int 3      ; activate to produce the disassembly below
  mov ebx, offset TheInputs-4
Pos0::
  fld REAL8 ptr [ebx+4]
Pos1::
  fadd REAL8 ptr [ebx+4+8]
Pos2::
  fmul  REAL8 ptr [ebx+4+16]
Pos3::
  fdiv  REAL8 ptr [ebx+4+24]
Pos4::
  fsub  REAL8 ptr [ebx+4+32]
  ret
TheProc endp

end start

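*)
The bytes in TheOpcodeBytes are the second byte of each FPU instruction, as seen in the hex dump (taken after one random round):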

Address         Hex dump                Command                             Comments
004011C1        ┌$  CC                  int3
004011C2        │.  BB FC3F4000         mov ebx, 00403FFC
004011C7        │.  DD43 04             fld qword ptr [ebx+4]
004011CA        │.  DC4B 0C             fmul qword ptr [ebx+0C]
004011CD        │.  DC43 14             fadd qword ptr [ebx+14]
004011D0        │.  DC43 1C             fadd qword ptr [ebx+1C]
004011D3        │.  DC63 24             fsub qword ptr [ebx+24]
004011D6        └.  C3                  retn
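
For readers who want to see why those four bytes map to fadd, fmul, fdiv and fsub, here is a rough Python sketch (not part of the attached source). It assumes the instructions always have the form DCh + ModRM + 8-bit displacement, as in the dump above; the helper names `decode_modrm` and `describe` are made up for this illustration.

```python
# Sketch: decode the second byte (ModRM) of a "DC xx disp8" x87 instruction.
# With opcode byte DCh and a memory operand, bits 5..3 of ModRM pick the operation.
X87_DC_OPS = {0: "fadd", 1: "fmul", 2: "fcom", 3: "fcomp",
              4: "fsub", 5: "fsubr", 6: "fdiv", 7: "fdivr"}
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) fields."""
    return byte >> 6, (byte >> 3) & 7, byte & 7


def describe(modrm_byte, disp8):
    """Render 'DC modrm disp8' as text; only [reg+disp8] without SIB is handled."""
    mod, reg, rm = decode_modrm(modrm_byte)
    if mod != 1 or rm == 4:
        raise ValueError("sketch only handles [reg+disp8] without a SIB byte")
    return f"{X87_DC_OPS[reg]} qword ptr [{REG32[rm]}+{disp8:02X}]"


# The four bytes from TheOpcodeBytes, shown with an example displacement of 0Ch
for b in (0x43, 0x4B, 0x73, 0x63):
    print(f"{b:02X}h -> {describe(b, 0x0C)}")
```

Since only the middle three bits differ between the four bytes, overwriting that single byte swaps the operation while leaving the base register and the instruction length untouched.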


Output:
Testing the proc:
Loop   20:      68.03300
Loop   19:      1998676.0
Loop   18:      22.22215
Loop   17:      84.92533
Loop   16:      86.25867
Loop   15:      99614.14
Loop   14:      -0.6137247
Loop   13:      6473.841
Loop   12:      124.0790
Loop   11:      68.03300
Loop   10:      3700.052
Loop   9:       49.14405
Loop   8:       44.44215
Loop   7:       -6.053924
Loop   6:       1338.266
Loop   5:       2574.098
Loop   4:       20.57600
Loop   3:       209.0480
Loop   2:       167.8960
Loop   1:       83247.06
Loop   0:       -325.7846
Done


Source attached, including a timings switch:

437 ms for 1000000 loops

braincell

Quote from: jj2007 on January 03, 2012, 09:59:42 PM


Here is a first attempt at simulating your procedure - the fmul, fadd etc are randomly changed:


Again, I'm very, very impressed. I've just spent about 30 minutes looking at it and trying to understand it (I didn't know what the movzx and ptr keywords were doing, I'm a noob), but I think I get it and it looks ingenious.

How would other more complex operators work though? For example "if cflag = true : a = b".
Does an equivalent to that have an OpCode in ASM? I've seen it being used in the ASM interpreted code of the application I'm learning from, so I guess that there is one (at least they say those were all ASM operators - I just saw the results of the operators when translated to C with their translator, not any ASM code itself).
Edit: I looked through the Intel Opcodes and Mnemonics chm within MASM and I couldn't find an operator that is supposed to do that. This means some of the so-called "operators" in my code would really be two or more instructions. Could that then be handled just as easily or would the variable byte-size be a severe constraint?


@donkey, baltoro
Thanks, you guys have given me plenty to think about. Just figuring out the best way to approach this is half the work, I guess.
I'm considering simply moving a slightly larger portion of my code into pure ASM, then I won't have to interop at all - especially if what jj is saying can really be done like I imagined it.
I mean I'd simply use an ASM DLL, which I've already tried: .Net can access it and get values from it, it works.

jj2007

Quote from: braincell on January 03, 2012, 11:35:13 PM
How would other more complex operators work though? For example "if cflag = true : a = b".
Does an equivalent to that have an OpCode in ASM?

The assembler equivalent would be branches:
.if cflag
  fadd REAL8 ptr [ebx+4+8]
.else
  fsub REAL8 ptr [ebx+4+8]
.endif
In "pure" assembler, this would be the jz and jnz opcodes, and again it would be easy to poke the appropriate bytes.
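
As a rough illustration of "poking the appropriate bytes" for a branch, the Python sketch below flips a short jz into a jnz and back. The opcode values 74h (jz rel8) and 75h (jnz rel8) are the standard x86 encodings; the byte snippet and the helper name `flip_branch` are hypothetical and only stand in for real code memory.

```python
JZ_SHORT = 0x74   # jz / je rel8
JNZ_SHORT = 0x75  # jnz / jne rel8


def flip_branch(code, offset):
    """Toggle a short jz <-> jnz at `offset` in a mutable byte buffer."""
    if code[offset] not in (JZ_SHORT, JNZ_SHORT):
        raise ValueError("no short jz/jnz at this offset")
    code[offset] ^= 1  # the two opcodes differ only in the lowest bit
    return code


# Hypothetical snippet:
#   cmp ecx,-1 / jnz +5 / fadd [ebx+2C] / jmp +3 / fmul [ebx+2C] / ret
snippet = bytearray.fromhex("83F9FF 7505 DC432C EB03 DC4B2C C3")
print("before:", snippet.hex())
flip_branch(snippet, 3)  # the jnz opcode sits at offset 3
print("after: ", snippet.hex())
```

The jump distance byte stays the same, so inverting the condition costs a single byte write and does not change the size of the procedure.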

Re DLL: If .NET supports message handling, then WM_COPYDATA would be the most elegant solution. The ASM part would send it whenever it has found an algo that is closer to the desired result.

jj2007

Attached a sample using WM_COPYDATA - SendData/CopyData$() in MasmBasic speak.

GeneticProcServer sends a string as follows:
Quote
SendData "NetClient", Cat$(Str$("Loop %i:\n", NumLoops-ecx)+Str$("Target=    \t%i", DesiredOutput)+Str$("\nBest proxy=\t%f", BestProxy))

NetClient receives the string:
Quote
WndProc proc hWnd:HWND, uMsg:UINT, wParam:WPARAM, lParam:LPARAM
   SWITCH uMsg
   CASE WM_COPYDATA
      SetWin$ hEdit=CopyData$

Launch NetClient.exe first, then GeneticProcServer.exe. NetClient is actually an ordinary MasmBasic/Masm32 window receiving the WM_COPYDATA message - it should not be too difficult to do the same in .NET, see e.g. "How to migrate SendMessage with WM_COPYDATA to .Net Framework".

What is still missing is to copy the "successful" proc into a buffer and disassemble it.

braincell

Quote from: jj2007 on January 04, 2012, 09:07:13 AM
What is still missing is to copy the "successful" proc into a buffer and disassemble it.

It might not be missing, because I had a brainwave last night. The randomization of operations is not speed critical, so I could simply pass an array with OperationCount and OperationID() to the DLL.
Then if OperationID(0)=1, add. If OperationID(0)=2, subtract, etc. Then do that for all the operations up to OperationCount.
That way I know the operation IDs (via the passed array) and can preserve the successful process by preserving the array and reconstructing it later.
In other words, I'd simply not use the Rand() you used, but the OperationID array!
It might use a few more jumps and ifs, but that's perfectly alright with me.
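
The OperationID idea can be sketched in a few lines of Python. This is only an illustration of the dispatch scheme, not the DLL itself: IDs 1 (add) and 2 (subtract) follow the post, while IDs 3 (multiply) and 4 (divide) and the function name `run_operations` are assumptions made for the example. The inputs are the first five values of TheInputs from the earlier listing.

```python
import operator

# IDs 1 and 2 follow the post above; 3 and 4 are assumed here for illustration.
OPERATIONS = {
    1: operator.add,
    2: operator.sub,
    3: operator.mul,
    4: operator.truediv,
}


def run_operations(inputs, operation_ids):
    """Start from inputs[0], then apply one operation per remaining input."""
    if len(inputs) != len(operation_ids) + 1:
        raise ValueError("need exactly one more input than operations")
    acc = inputs[0]
    for value, op_id in zip(inputs[1:], operation_ids):
        acc = OPERATIONS[op_id](acc, value)
    return acc


inputs = [123.456, 11.11, 22.22, 33.33, 44.44]
ops = [3, 1, 1, 2]  # multiply, add, add, subtract
result = run_operations(inputs, ops)
print(ops, "->", result)
```

Keeping the winning `ops` list is enough to reproduce the procedure later, which is the point made above about not needing a disassembly step.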

I'll look into WM_COPYDATA as well, thanks.

I'm quite optimistic I'll be able to pull all of this off without any interop stuff and make it fast enough for my needs. This is great news.

jj2007

Well, try one more. NASM has a disassembler that works on snippets:

Loop 321:
Target=    99999
Best proxy= 99988.89

00000000  BB0C504000        mov ebx,0x40500c
00000005  DD4304            fld qword [ebx+0x4]
00000008  DC630C            fsub qword [ebx+0xc]
0000000B  DC4314            fadd qword [ebx+0x14]
0000000E  DC631C            fsub qword [ebx+0x1c]
00000011  DC6324            fsub qword [ebx+0x24]
00000014  83F9FF            cmp ecx,byte -0x1
00000017  7405              jz 0x1e
00000019  DC432C            fadd qword [ebx+0x2c]
0000001C  EB03              jmp short 0x21
0000001E  DC4B2C            fmul qword [ebx+0x2c]
00000021  C3                ret


The original looks like this - you can see above how the code has changed:

Quote
TheProc proc
; int 3      ; activate to watch changes in Olly
  mov ebx, offset TheInputs-4
Pos0::
  fld REAL8 ptr [ebx+4]
Pos1::
  fadd REAL8 ptr [ebx+4+8]
Pos2::
  fmul  REAL8 ptr [ebx+4+16]
Pos3::
  fdiv  REAL8 ptr [ebx+4+24]
Pos4::
  fsub  REAL8 ptr [ebx+4+32]
  cmp ecx, -1   ; never zero?
Pos5::
  .if Zero?      ; we poke into Pos5[0]
   fadd  REAL8 ptr [ebx+4+40]
  .else
   fmul  REAL8 ptr [ebx+4+40]
  .endif
  ret
TheProc endp
TheProc_endp:

I hope the attached archive is complete - let me know. Extract to the root of your masm32 drive with "use folder names", then launch \masm32\RichMasm\NetClient.exe

EDIT: The archive contains a modified source using macros and a structure to poke the appropriate bytes:

Quote
      mov cseq.cs_fmul1, CalcOP()   ; replace
      mov cseq.cs_fadd1, CalcOP()   ; fadd, fsub etc
      mov cseq.cs_fadd2, CalcOP()   ; opcodes in
      mov cseq.cs_fadd3, CalcOP()   ; TheProc
      mov cseq.cs_fmul2, CalcOP()   ; with random
      mov cseq.cs_fsub1, CalcOP()   ; new opcodes

One interesting result is that, even with six FPU instructions and one jump now, the results are often not very close to the target, and they are only reached after a few thousand iterations.

braincell

Yes it's complete, at least it looks that way. I will analyze it bit by bit. :)

My values are in the 0-5000 range with about two decimals, so at most 6 digits (positive and negative).
I just figured I could simply convert them to 32-bit integers and skip the FPU completely. I'd only need to convert the final result back to float, and that's not speed critical.
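
A minimal Python sketch of that integer idea, assuming a fixed scale factor of 100 (two decimals). The names `to_fixed`, `from_fixed`, `fixed_mul` and `fits_int32` are invented for the example. It also shows the main catch: sums stay comfortably inside 32 bits, but the raw product of two scaled values may not.

```python
SCALE = 100  # two decimal places, so 4999.99 is stored as 499999
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def to_fixed(x):
    """Convert a float to a scaled integer, rounding to the nearest cent."""
    return round(x * SCALE)


def from_fixed(n):
    """Convert a scaled integer back to a float (only needed for the final result)."""
    return n / SCALE


def fits_int32(n):
    return INT32_MIN <= n <= INT32_MAX


def fixed_mul(a, b):
    # The raw product carries SCALE twice, so one factor is divided out again.
    # Note that a * b itself can exceed 32 bits even when the final result fits.
    return (a * b) // SCALE


a, b = to_fixed(4999.99), to_fixed(2.5)
print("sum:", a + b, "->", from_fixed(a + b))
print("product of scaled values fits in 32 bits:", fits_int32(a * a))
```

Addition and subtraction work directly on the scaled integers. Multiplication and division need the extra rescaling step, and in assembly the intermediate product would have to be kept in 64 bits (a one-operand imul leaves it in edx:eax), so the saving over the FPU is smaller than it first looks.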

Thanks.

jj2007

Quote from: braincell on January 04, 2012, 03:42:29 PM
Yes it's complete, at least it looks that way. I will analyze it bit by bit. :)

DisAs.bat needs a little adjustment (i.e., no path):
ndisasm.exe DisAsTmp.exe -b32 >DisAsTmp.asm

Quote
I just figured I could simply convert them to 32-bit integers and skip the FPU completely. I'd only need to convert the final result back to float, and that's not speed critical.

The FPU is not that bad, and gives you more flexibility.