Dynamically creating and loading ASM DLLs during runtime from .Net

braincell · January 03, 2012, 03:37:12 PM

Hi,

I'm a .Net programmer with 5 years of full-time coding experience and knowledge of multiple languages, but I'm new to ASM. I've read some of the tutorials and I completely understand the basics: registers, memory management, and how it's all supposed to work. I also know a lot of theory of how low-level code works from past experience (including coding HLSL and GPU-specific code).

I've written an application which simulates Assembly programs, complete with registers and basic operator instructions. This was an attempt to mimic an existing application that dynamically created small programs and ran them via a process called genetic programming. However, the process of "simulating" assembly is now becoming too slow as my research and development needs are increasing, so I understand I'll somehow need to take a step back, "unsimulate" and move into ASM directly.

My programs are fairly simple. They have up to 8 registers (f0 to f7) and 100 input values (v0 to v99).
The input values are a 100-element array of type Double (8 bytes each).
The program takes the input values and does only the basic operations on them such as +,-,*,/,sin,cos, FPREM, bit shift, register swap, if, and ">" comparison.

An example program might be as follows (f is register, v is input value)

Code:

f0 = f0 + v23
f0 = f0 * v11
f1 = f1 + v76
f1 = f1 / f0
f0 = sin(f0)

That's it, that simple. The outputs of the calculation are the registers, which are then read by the rest of the program (outside of the assembly simulation).
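For reference, the "simulated" version of such a program can be sketched as a tiny interpreter. This is a minimal Python sketch rather than the original .Net code; the tuple layout, the operator tables and the `run` function are assumptions made purely for illustration.

```python
import math

# Hypothetical encoding of one program line: (op, dest register, source).
# The source is ("f", i) for a register, ("v", i) for an input value,
# or None for unary operators such as sin.
PROGRAM = [
    ("add", 0, ("v", 23)),  # f0 = f0 + v23
    ("mul", 0, ("v", 11)),  # f0 = f0 * v11
    ("add", 1, ("v", 76)),  # f1 = f1 + v76
    ("div", 1, ("f", 0)),   # f1 = f1 / f0
    ("sin", 0, None),       # f0 = sin(f0)
]

BINARY_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b,  # raises ZeroDivisionError when b == 0
}
UNARY_OPS = {"sin": math.sin, "cos": math.cos}


def run(program, inputs, num_registers=8):
    """Execute the program and return the final register values f0..f7."""
    regs = [0.0] * num_registers
    for op, dest, src in program:
        if op in UNARY_OPS:
            regs[dest] = UNARY_OPS[op](regs[dest])
        else:
            kind, index = src
            operand = regs[index] if kind == "f" else inputs[index]
            regs[dest] = BINARY_OPS[op](regs[dest], operand)
    return regs


# Usage: 100 input values, three of them non-zero.
inputs = [0.0] * 100
inputs[23], inputs[11], inputs[76] = 2.0, 3.0, 12.0
registers = run(PROGRAM, inputs)
```

Every line goes through a table lookup and a function call, which is the per-instruction overhead that native code would avoid.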

Each line of the program is dynamically created during runtime, and in my current "simulated" assembly, each line is run via a .Net interface that returns a value after operating. This means that the operators and the registers/input values being operated on are selected randomly by the program, which then evaluates how that program is performing. In the above example there are 5 lines, but in reality there are usually about 200. Genetic programming tries to create random programs and evolve them with a beam-search-analogue method to arrive at a program that is most capable of producing a set of output values based on the known input values. The loop for testing those values is a very tight loop, but some of my runs may need several days or even weeks to finish running and testing millions of programs in my current setup to find a good solution.

So I thought about doing this:
1-Create .asm file in .Net via text stream output, save to file.
2-Call the makeit.bat from .Net
3-Load the .dll file created into .Net and use the function for faster operation.

However, I tested step 2 and it takes about 300ms on my CPU (to make and link the dll). Anything below 500 milliseconds is an improvement, but I was looking for something better than 300.

My next idea was to create the .dll directly via output of a byte() array. What I mean by that is, since the programs are fairly simple, I can figure out how to reconstruct the binary code directly for each operator/register combination by looking at the code. What I'm talking about is creating a .dll file on disk with 1s and 0s, completely bypassing compilation. I'd have to figure out how each operator/register combination is written, but I like hackish stuff so that's ok as long as it gives me the speed. I'd write a few simple dlls to see what bytes change when registers/operators are changed and added, and then translate that into code. That's lower level stuff than ASM, I know, but it also leads me to:

Questions:

1.1- Is the idea of outputting binary arrays to disk, and so creating a dll file directly, possible and feasible, or does the linker do some complicated business and addressing which I am likely to find too difficult to do? Consider the simplicity of the programs I'm trying to generate.

1.2- My assumption is that adding lines of ASM code translates simply to adding more bytes to the compiled program, without messing up some complicated addressing procedure. Is this a naive assumption?

2- In .Net the addition operator "a=a+b" takes 20 CPU cycles. I am assuming under ASM it will take only 1 CPU cycle. Is this correct?

3- Can you provide me with any examples of ASM DLLs being called from any other language?

4- Are there examples of an ASM DLL being loaded and a function being called which is passed an array of 8 byte values and returns an array of 8 byte values? In other words, an example of passing an array of Double to the DLL and returning an array of Double?

5- What libs will I need to use to perform operations on double floating point variables, or can that be done in ASM without any libs? I ask because a register is 32 bits (4 bytes), and I'm looking at 8 bytes of data.

6- Finally, do you have any better/different solutions for the type of problem I'm encountering? If there is a way to get a lower clock cycle count via some pre-generated assembly operator DLL which can operate on input values, it might prove to be an adequate speed-up on its own (i.e. a DLL that will do a=a+b and other operations).

Any answers or pointers will be highly appreciated, thanks.

BogdanOntanu · January 03, 2012, 04:02:53 PM

Quote from: braincell on January 03, 2012, 03:37:12 PM
Hi,
...
Questions:

1.1- Is the idea of outputting binary arrays to disk, and so creating a dll file directly, possible and feasible, or does the linker do some complicated business and addressing which I am likely to find too difficult to do? Consider the simplicity of the programs I'm trying to generate.

Not very useful IMHO. Generating and then loading a DLL from disk is slow. It might also get complicated... depends on your level of knowledge and understanding.

Quote
1.2- My assumption is that adding lines of ASM code translates simply to adding more bytes to the compiled program, without messing up some complicated addressing procedure. Is this a naive assumption?

Yes, most of the time it is naive.

Quote
2- In .Net the addition operator "a=a+b" takes 20 CPU cycles. I am assuming under ASM it will take only 1 CPU cycle. Is this correct?

Yes, ADD EAX,ECX will take 1 cycle or 0.5 cycles or 0.25 cycles or even less IF you pair the instructions nicely and it depends on the CPU model.

Quote
3- Can you provide me with any examples of ASM DLLs being called from any other language?

Yes, check MASM32 examples and forum.

Quote
4- Are there examples of an ASM DLL being loaded and a function being called which is passed an array of 8 byte values and returns an array of 8 byte values? In other words, an example of passing an array of Double to the DLL and returning an array of Double?

Not exactly what you want I guess .... but it is kind of trivial.

Quote
5- What libs will I need to use to perform operations on double floating point variables, or can that be done in ASM without any libs? I ask because a register is 32bits, 4 bytes, and i'm looking at 8 bytes of data.

Yes, FPU can handle double floats and ASM instructions are available for this without libs.

However please note that the FPU code is "slow". SSE should be faster... or if possible you can design a form of 4 + 4 integer fixed point math to speed up.
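The "4 + 4" fixed-point idea is not spelled out in the thread. One possible reading is a 64-bit value with 4 bytes of integer part and 4 bytes of fraction (32.32 fixed point). A minimal Python sketch under that assumption follows; the function names are hypothetical, and Python's unbounded integers hide the overflow handling that real 64-bit code would need.

```python
# Assumption: "4 + 4" means 32 integer bits + 32 fraction bits.
FRACTION_BITS = 32
ONE = 1 << FRACTION_BITS  # fixed-point representation of 1.0


def to_fixed(x):
    """Convert a float to 32.32 fixed point (rounded to nearest)."""
    return int(round(x * ONE))


def to_float(f):
    """Convert a 32.32 fixed-point value back to a float."""
    return f / ONE


def fx_add(a, b):
    # Addition needs no rescaling: both operands share the same scale.
    return a + b


def fx_mul(a, b):
    # The raw product carries 64 fraction bits; shift back down to 32.
    return (a * b) >> FRACTION_BITS


def fx_div(a, b):
    # Pre-scale the dividend so the quotient keeps 32 fraction bits.
    # Floor division rounds toward negative infinity for negative values.
    return (a << FRACTION_BITS) // b


product = to_float(fx_mul(to_fixed(1.5), to_fixed(2.25)))
quotient = to_float(fx_div(to_fixed(3.375), to_fixed(2.25)))
```

Add and subtract map to plain integer instructions, while multiply and divide need a double-width intermediate, which is where most of the complexity comes from.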

Quote
6- Finally, do you have any better/different solutions for the type of problem I'm encountering? If there is a way to get a lower clock cycle count via some pre-generated assembly operator DLL which can operate on input values, it might prove to be an adequate speed-up on its own (i.e. a DLL that will do a=a+b and other operations).

Yes, generate the code directly in memory, VirtualProtect it to allow execution, execute... much faster, easier, more logical.
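To make that concrete, here is a minimal sketch of the "bytes in memory, mark executable, call" sequence. It is written in Python with ctypes purely for illustration and is not code from this thread. It assumes an x86 or x86-64 CPU; on Windows it uses VirtualAlloc and VirtualProtect, elsewhere an executable mmap stands in for them. The helper names `is_x86` and `run_native` are hypothetical.

```python
import ctypes
import mmap
import platform
import sys

# Machine code (valid on x86 and x86-64) for:  mov eax, 42 ; ret
CODE = bytes([0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3])


def is_x86():
    """True when the CPU can run the bytes above (assumption: x86 family)."""
    return platform.machine().lower() in ("x86_64", "amd64", "i386", "i686", "x86")


def run_native(code):
    """Copy code into executable memory, call it as int f(void), return the result."""
    func_type = ctypes.CFUNCTYPE(ctypes.c_int)
    if sys.platform == "win32":
        k32 = ctypes.windll.kernel32
        k32.VirtualAlloc.restype = ctypes.c_void_p
        k32.VirtualAlloc.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                                     ctypes.c_ulong, ctypes.c_ulong]
        k32.VirtualProtect.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                                       ctypes.c_ulong, ctypes.POINTER(ctypes.c_ulong)]
        k32.VirtualFree.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_ulong]
        MEM_COMMIT_RESERVE, MEM_RELEASE = 0x3000, 0x8000
        PAGE_READWRITE, PAGE_EXECUTE_READ = 0x04, 0x20
        # 1. Allocate writable memory and copy the generated bytes into it.
        addr = k32.VirtualAlloc(None, len(code), MEM_COMMIT_RESERVE, PAGE_READWRITE)
        if not addr:
            raise OSError("VirtualAlloc failed")
        try:
            ctypes.memmove(addr, code, len(code))
            # 2. Flip the page to executable, as suggested above.
            old = ctypes.c_ulong()
            if not k32.VirtualProtect(addr, len(code), PAGE_EXECUTE_READ,
                                      ctypes.byref(old)):
                raise OSError("VirtualProtect failed")
            # 3. Treat the address as a function pointer and call it.
            return func_type(addr)()
        finally:
            k32.VirtualFree(addr, 0, MEM_RELEASE)
    # Non-Windows stand-in: an anonymous mapping that is readable,
    # writable and executable from the start.
    buf = mmap.mmap(-1, mmap.PAGESIZE,
                    prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
    buf.write(code)
    view = ctypes.c_char.from_buffer(buf)  # pins the buffer so we can take its address
    try:
        return func_type(ctypes.addressof(view))()
    finally:
        del view
        buf.close()


if is_x86():
    print(run_native(CODE))
```

No file, assembler or linker is involved, so the cost per generated program is roughly a memory copy plus a protection change.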

Anyway... it took this Universe 14 billion years to evolve a genetic algorithm... using speed of light operations at quantum / atomic level....

I assume a super computer...programmed by a genius in ASM ( using light CPU ) might solve this problem in approx 30 billion years at best... unfortunately by that time the Universe might be dead :D

But do not let me stop you ;)

braincell · January 03, 2012, 04:32:40 PM

Quote from: BogdanOntanu on January 03, 2012, 04:02:53 PM

Yes, check MASM32 examples and forum.

I just found an example with an integer array. I'm all good.

Quote
Yes, FPU can handle double floats and ASM instructions are available for this without libs.
However please note that the FPU code is "slow". SSE should be faster... or if possible you can design a form of 4 + 4 integer fixed point math to speed up.

Ok I need to find out how to use SSE. Any early linkage before I start my search wouldn't hurt.
The 4+4 was an idea i already tried with shaders once. It's quite complicated, i'm not sure i want to try it again tbh... :/

Quote
Yes, generate the code directly in memory, VirtualProtect it to allow execution, execute... much faster, easier, more logical.

I've used .Net Reflection and the Assembly methods within it, but I always assumed it would only accept MSIL assemblies, not native code. It accepts an array of bytes for creation too, but to do that I'd need some way of compiling, so I'm back to square 1. Disregarding those questions, maybe you meant something else? I mean, what would I have to search for to find out how to do what you suggest (in .Net)?

Quote
Anyway... it took this Universe 14 billion years to evolve a genetic algorithm... using speed of light operations at quantum / atomic level....
I assume a super computer...programmed by a genius in ASM ( using light CPU ) might solve this problem in approx 30 billion years at best... unfortunately by that time the Universe might be dead :D
But do not let me stop you ;)

I'm not insane to be trying to evolve the universe :) i'm trying to evolve a much simpler predictive math function.

dedndave · January 03, 2012, 05:03:31 PM

the FPU isn't too bad :P
what may be slow is moving stuff in and out of it
but, if you want to add 2 floats, it's not so bad

braincell · January 03, 2012, 05:49:07 PM

Is this possible in .Net or any other platform:

-Have ASM code in memory (as string or something)
-call some process which would compile that code into a DLL
-return it as byte array or whatever
-put that binary program in protected memory
-get the address to a method within that program and call it

Obviously, the ".Net" part could be a problem since I have no clue how I would even get a newly compiled array of bits/bytes to run straight from memory, or what/how Reflection does it (or if it allows non-.Net assembly loading in the first place). Anyway, if it's possible on any other platform like C++ I'd like to know. Thanks.

jj2007 · January 03, 2012, 06:09:26 PM

It's certainly possible in pure assembler using a series of nops that you overwrite "on the fly" using VirtualProtect, as Bogdan suggested.

donkey · January 03, 2012, 06:21:59 PM

Quote from: braincell on January 03, 2012, 05:49:07 PM
Is this possible in .Net or any other platform:

-Have ASM code in memory (as string or something)
-call some process which would compile that code into a DLL
-return it as byte array or whatever
-put that binary program in protected memory
-get the address to a method within that program and call it

Obviously, the ".Net" part could be a problem since I have no clue how I would even get a newly compiled array of bits/bytes to run straight from memory, or what/how Reflection does it (or if it allows non-.Net assembly loading in the first place). Anyway, if it's possible on any other platform like C++ I'd like to know. Thanks.

Doing that in .NET is going to be really complicated, if it can be done at all. You can use the Reflection.Emit namespace and emit MSIL using an ILGenerator class. Running raw assembly language code would be pretty much out of the question: how would you obtain the results in a managed environment? I don't claim to know a lot about .NET, only what I use in my projects, but it doesn't sound like it's going to work.

braincell · January 03, 2012, 06:31:32 PM

Quote from: jj2007 on January 03, 2012, 06:09:26 PM
It's certainly possible in pure assembler using a series of nops that you overwrite "on the fly" using VirtualProtect, as Bogdan suggested.

I'm not sure what nops are, but on an average/slow CPU (ie Core2Duo) how many milliseconds would it approximately take to convert the ASM code (for example the 5 lines of code mentioned in my first post) into bits and store it in memory for calling?

Quote from: donkey on January 03, 2012, 06:21:59 PM
Doing that in .NET is going to be really complicated, if it can be done at all. You can use the Reflection.Emit namespace and emit MSIL using an ILGenerator class. Running raw assembly language code would be pretty much out of the question: how would you obtain the results in a managed environment? I don't claim to know a lot about .NET, only what I use in my projects, but it doesn't sound like it's going to work.

Yeah... The more I read about reflection, the more I'm convinced it wouldn't work. I even thought about going into unmanaged C++, then getting the DLL and result from there, and then interfacing it back to .Net. I'm not sure THAT would work either, because how would I compile ASM code on the fly and return results to memory in unmanaged C++ anyway? This is harder than I thought.

donkey · January 03, 2012, 06:47:19 PM

How about writing a skeleton unmanaged interop, like a COM object that receives the emitted assembly, executes it, then returns the result through its methods? .NET can interface directly to unmanaged COM through the interop layer so it should be do-able, actually maybe not even that difficult if you spend enough time and the odd braincell (pun intended) writing the interface. After all, COM can use pointers and has the ability to manipulate memory, so it should work. What you would end up doing is instantiating the COM interface, building your machine code in the .NET application, then piping the data to the COM object, where it would create the necessary memory buffer, execute it and return a result.

http://msdn.microsoft.com/en-us/library/ms973872.aspx

braincell · January 03, 2012, 06:59:23 PM

Hmm... That's kind of what i was going for when i said ".Net to unmanaged" in my previous post, but your idea is more specific (and also good).

What I'm unclear about though is this part:

Quote from: donkey on January 03, 2012, 06:47:19 PM
build your machine code using the .NET application then pipe the data to the COM object.

How would I build my machine code? Can I call ml.exe and link.exe and make them return the created DLL to memory instead of writing to disk?
Previously I tested compiling a dll to file and it took around 300ms for an empty DLL, and that's maybe still too slow because I would have to keep recreating the evolving programs.
What I need is a way to convert ASM code to a compiled byte array and have it in memory in as little time as possible (preferably around 50ms), so how would I do that?

donkey · January 03, 2012, 07:13:52 PM

Well, machine code is essentially just a string of BYTES that you can throw together, .NET doesn't need to know what they are and it would be easier to construct with a higher level language, but you could take any route to get the final byte code. For example FLD for a simple FLOAT by reference is always 0xD9, 0x05 followed by the offset of the FLOAT in memory which would be "fixed up" in the COM loader (ie offset from the beginning of the data block + address of the data block in memory). If as you said in the first post you are limiting the complexity of the machine language portion, you are probably a lot better off to just assemble the byte code yourself and bypass the assembler/linker completely. It would not be a monumental task in C# and would speed things up considerably as well as cutting external dependencies. If you want to avoid the work of writing a simple assembler, the IDebugControl interface can do it for you and you have no need of ML or LINK.
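As an illustration of assembling the bytes yourself, here is a minimal Python sketch that turns the five-line example from the first post into 32-bit x87 machine code. The opcode bytes are the standard encodings for the absolute-address (disp32) memory forms, using the double-precision opcodes (DD 05 for FLD of a double, where D9 05 loads a single-precision FLOAT). The base addresses, the memory layout (registers f0..f7 and inputs v0..v99 kept as arrays of doubles) and all function names are assumptions made for the sketch.

```python
import struct

# Opcode + ModRM for the 32-bit absolute-address form, 64-bit double operand.
# (In 64-bit mode the same ModRM byte means RIP-relative, so this is 32-bit only.)
FLD_M64 = b"\xDD\x05"   # fld  qword ptr [addr]
FSTP_M64 = b"\xDD\x1D"  # fstp qword ptr [addr]
BINARY_OPS = {
    "add": b"\xDC\x05",  # fadd qword ptr [addr]
    "sub": b"\xDC\x25",  # fsub qword ptr [addr]
    "mul": b"\xDC\x0D",  # fmul qword ptr [addr]
    "div": b"\xDC\x35",  # fdiv qword ptr [addr]
}
UNARY_OPS = {"sin": b"\xD9\xFE", "cos": b"\xD9\xFF"}  # fsin, fcos
RET = b"\xC3"

PROGRAM = [
    ("add", 0, ("v", 23)),  # f0 = f0 + v23
    ("mul", 0, ("v", 11)),  # f0 = f0 * v11
    ("add", 1, ("v", 76)),  # f1 = f1 + v76
    ("div", 1, ("f", 0)),   # f1 = f1 / f0
    ("sin", 0, None),       # f0 = sin(f0)
]


def assemble(program, regs_base, inputs_base):
    """Emit machine code; each line becomes load dest, operate, store dest."""
    def location(kind, index):
        base = regs_base if kind == "f" else inputs_base
        return struct.pack("<I", base + 8 * index)  # little-endian disp32

    code = bytearray()
    for op, dest, src in program:
        dest_addr = location("f", dest)
        code += FLD_M64 + dest_addr
        if op in UNARY_OPS:
            code += UNARY_OPS[op]
        else:
            code += BINARY_OPS[op] + location(*src)
        code += FSTP_M64 + dest_addr
    code += RET
    return bytes(code)


# Hypothetical base addresses; a loader would "fix up" the real ones.
machine_code = assemble(PROGRAM, regs_base=0x00401000, inputs_base=0x00402000)
print(machine_code.hex())
```

Each binary operation is a fixed 18 bytes and each unary one 14 bytes, so adding a line to the program really is just appending bytes, with the addresses being the only part that depends on where the data ends up in memory.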

qWord · January 03, 2012, 07:36:42 PM

I would suggest writing a DLL with a flat API interface. The DLL, which creates the code and has a basic execution/debugging environment, could be a modification of japheth's JWASM.

jj2007 · January 03, 2012, 07:48:09 PM

Here are two snippets demonstrating how an executable can be "patched":

- the code that performs the calculations - we leave some nops that later can be used to insert arithmetic calculations:

Code:

include \masm32\include\masm32rt.inc
.code
start:
  nop			; 90
  xchg eax, ecx		; 91 to find the location, we use
  xchg eax, ecx		; 91 a rare opcode combination ;-)
  nop			; 90
  mov eax, 123		; 5+1 bytes
  nop
  mov ecx, 321		; 5+1 bytes
  nop
  add ecx, 4000		; 6 bytes 81C1 A00F0000
  nops 8
  add eax, ecx
  print str$(eax), 9
  inkey "ok?"
  exit
end start

- the "patcher":

Code:

include \masm32\MasmBasic\MasmBasic.inc   ; Download the library
Init
Let esi=FileRead$("GeneticCalc.exe")
mov ecx, LastFileSize
.Repeat
     mov eax, [esi+ecx]   ; search until we find...
     dec ecx
.Until Sign? || eax==90919190h   ; ...the rare opcode combination
.if Zero?
   lea ecx, [esi+ecx+24]
   invoke MbCopy, ecx, Chr$(0B9h, 7bh, 0,0,0), 5   ; the code we inject
   FileWrite "TheCalcModified.exe", esi, LastFileSize
   sub ecx, esi
   Inkey "File modified at pos ", Hex$(ecx)
.else
   Inkey "Magic string not found, sorry"
.endif
Exit
end start

This example is pretty simple, and too slow for practical purposes. There are a number of techniques to pass results to .net, such as pipes or memory-mapped files. 50 ms is a lot of time, don't be surprised if you can cut that down to some nanoseconds.
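The search-and-patch step can also be sketched on an in-memory buffer, which avoids the file round trip. The Python below is an illustration only: the sample byte buffer is a hand-built stand-in for the calculator's code section, and `patch_image` and the offset constant are hypothetical names.

```python
# Marker: nop / xchg eax,ecx / xchg eax,ecx / nop
MARKER = bytes([0x90, 0x91, 0x91, 0x90])
# Distance from the start of the marker to the patch site inside the 8 nops
# (the patcher above lands on the same byte with its "+24" after the extra dec).
PATCH_OFFSET = 23
PATCH = bytes([0xB9, 0x7B, 0x00, 0x00, 0x00])  # mov ecx, 123


def patch_image(image, marker=MARKER, offset=PATCH_OFFSET, patch=PATCH):
    """Return (patched_bytes, patch_position), or None if the marker is absent.

    Unlike the assembly version, this searches from the start of the buffer.
    """
    pos = image.find(marker)
    if pos < 0:
        return None
    start = pos + offset
    if start + len(patch) > len(image):
        raise ValueError("patch would run past the end of the image")
    patched = bytearray(image)
    patched[start:start + len(patch)] = patch
    return bytes(patched), start


# Stand-in for the calculator's code bytes, laid out like the first snippet.
SAMPLE_CODE = (
    MARKER
    + bytes([0xB8, 0x7B, 0x00, 0x00, 0x00])        # mov eax, 123
    + b"\x90"                                      # nop
    + bytes([0xB9, 0x41, 0x01, 0x00, 0x00])        # mov ecx, 321
    + b"\x90"                                      # nop
    + bytes([0x81, 0xC1, 0xA0, 0x0F, 0x00, 0x00])  # add ecx, 4000
    + b"\x90" * 8                                  # nops 8
    + bytes([0x01, 0xC8])                          # add eax, ecx
)
image = b"\x00" * 64 + SAMPLE_CODE  # 64 zero bytes stand in for file headers
result = patch_image(image)
```

Overwriting nops keeps every other instruction at its original offset, so nothing else in the image has to be relocated.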

braincell · January 03, 2012, 07:49:38 PM

Quote from: donkey on January 03, 2012, 07:13:52 PM
It would not be a monumental task in C# and would speed things up considerably as well as cutting external dependencies. If you want to avoid the work of writing a simple assembler, the IDebugControl interface can do it for you and you have no need of ML or LINK.

Somebody said earlier that it is naive to think it's NOT a monumental task, but maybe we misunderstood each other. That's a good idea, or at least it's worth a shot. I'll think about it and do a bit of research on the topic.

Quote from: qWord on January 03, 2012, 07:36:42 PM
I would suggest writing a DLL with a flat API interface. The DLL, which creates the code and has a basic execution/debugging environment, could be a modification of japheth's JWASM.

Hmm, thanks. I'll have a look.

braincell · January 03, 2012, 07:53:53 PM

Quote from: jj2007 on January 03, 2012, 07:48:09 PM
Here are two snippets demonstrating how an executable can be "patched":

Wow, thanks jj. I think I'll need to get past the basic ASM tutorials to really see how it works, but I'll keep this thread bookmarked.

So far ASM seems more interesting and "fun" than I thought, but the resources for learning are all so different, people use different standards which are confusing for the noob, the IDEs have no intellisense or colouring, and there seem to be 9 ways of doing the same thing, so I can't decipher what the "best practices" are quite yet. So it's certainly difficult, not because of the language but because of a lack of any kind of official support, tutorials or books.

Edit: just saw your Basic lib, and RichMasm. So I guess the IDE can have colours after all and some quick help. The lib also looks great, and I think I'll use it. Thanks.
