Hi,
I'm a .Net programmer with 5 years of full-time coding experience and knowledge of multiple languages, but I'm new to ASM. I've read some of the tutorials and I understand the basics: registers, memory management, and how it's all supposed to work. I also know a lot of the theory of how low-level code works from past experience (including coding HLSL and GPU-specific code).
I've written an application which simulates assembly programs, complete with registers and basic operator instructions. This was an attempt to mimic an existing application that dynamically creates small programs and runs them via a process called genetic programming. However, "simulating" assembly is now becoming too slow as my research and development needs increase, so I understand I'll somehow need to take a step back, "unsimulate" and move into ASM directly.
My programs are fairly simple. They have up to 8 registers (f0 to f7) and 100 input values (v0 to v99).
The input values are an array of 100 values of type Double (8 bytes each).
The program takes the input values and does only the basic operations on them such as +,-,*,/,sin,cos, FPREM, bit shift, register swap, if, and ">" comparison.
An example program might be as follows (f is register, v is input value)
f0 = f0 + v23
f0 = f0 * v11
f1 = f1 + v76
f1 = f1 / f0
f0 = sin(f0)
That's it, that simple. The output of the calculation is the registers, which are then read by the rest of the program (outside of the assembly simulation).
Each line of the program is dynamically created at runtime, and in my current "simulated" assembly, each line is run via a .Net interface that returns a value after operating. This means that the operators and the registers/input values being operated on are selected randomly by the program, which then evaluates how that program is performing. In the above example there are 5 lines, but in reality there are usually about 200. Genetic programming tries to create random programs and evolve them with a beam-search-analogue method to arrive at the program that is most capable of producing a set of output values from the known input values. The loop for testing those values is very tight, but some of my runs may need several days or weeks to finish running and testing millions of programs in my current setup to find a good solution.
So I thought about doing this:
1-Create .asm file in .Net via text stream output, save to file.
2-Call the makeit.bat from .Net
3-Load the .dll file created into .Net and use the function for faster operation.
However, I tested step 2 and it takes about 300ms on my CPU (to make and link the dll). Anything below 500 milliseconds is an improvement, but I was looking for something better than 300.
My next idea was to create the .dll directly via output of a byte() array. What I mean by that is, since the programs are fairly simple, I can figure out how to reconstruct the binary code directly for each operator/register combination by looking at the code. What I'm talking about is creating a .dll file on disk with 1s and 0s, completely bypassing compilation. I'd have to figure out how each operator/register combination is written, but I like hackish stuff so that's ok as long as it gives me the speed. I'd write a few simple dlls to see what bytes change when registers/operators are changed and added, and then translate that into code. That's lower level stuff than ASM, I know, but it also leads me to:
Questions:
1.1- Is the idea of outputting binary arrays to disk, and so creating a dll file directly, possible and feasible, or does the linker do some complicated business and addressing which I am likely to find too difficult to do? Consider the simplicity of the programs I'm trying to generate.
1.2- My assumption is that adding lines of ASM code translates simply to adding more bytes to the compiled program, without messing up some complicated addressing procedure. Is this a naive assumption?
2- In .Net the addition operator "a=a+b" takes 20 CPU cycles. I am assuming under ASM it will take only 1 CPU cycle. Is this correct?
3- Can you provide me with any examples of ASM DLLs being called from any other language?
4- Are there examples of an ASM DLL being loaded and a function being called which is passed an array of 8 byte values and returns an array of 8 byte values? In other words, an example of passing an array of Double to the DLL and returning an array of Double?
5- What libs will I need to use to perform operations on double floating point variables, or can that be done in ASM without any libs? I ask because a register is 32bits, 4 bytes, and i'm looking at 8 bytes of data.
6- Finally, do you have any better/different solutions for the type of problem I'm encountering? If there is a way to use a lower clock cycle count via some pre-generated assembly operator DLL which can operate on input values, it might prove to be an adequate speed-up on its own (i.e. a DLL that will do a=a+b and other operations).
Any answers or pointers will be highly appreciated, thanks.
Quote from: braincell on January 03, 2012, 03:37:12 PM
Hi,
...
Questions:
1.1- Is the idea of outputting binary arrays to disk, and so creating a dll file directly, possible and feasible, or does the linker do some complicated business and addressing which I am likely to find too difficult to do? Consider the simplicity of the programs I'm trying to generate.
Not very useful IMHO. Generating and then loading a DLL from disk is slow. It might also get complicated... depends on your level of knowledge and understanding.
Quote
1.2- My assumption is that adding lines of ASM code translates simply to adding more bytes to the compiled program, without messing up some complicated addressing procedure. Is this a naive assumption?
Yes, most of the time it is naive.
Quote
2- In .Net the addition operator "a=a+b" takes 20 CPU cycles. I am assuming under ASM it will take only 1 CPU cycle. Is this correct?
Yes, ADD EAX,ECX will take 1 cycle or 0.5 cycles or 0.25 cycles or even less IF you pair the instructions nicely and it depends on the CPU model.
Quote
3- Can you provide me with any examples of ASM DLLs being called from any other language?
Yes, check MASM32 examples and forum.
Quote
4- Are there examples of an ASM DLL being loaded and a function being called which is passed an array of 8 byte values and returns an array of 8 byte values? In other words, an example of passing an array of Double to the DLL and returning an array of Double?
Not exactly what you want I guess .... but it is kind of trivial.
Quote
5- What libs will I need to use to perform operations on double floating point variables, or can that be done in ASM without any libs? I ask because a register is 32bits, 4 bytes, and i'm looking at 8 bytes of data.
Yes, FPU can handle double floats and ASM instructions are available for this without libs.
However please note that the FPU code is "slow". SSE should be faster... or if possible you can design a form of 4 + 4 integer fixed point math to speed up.
Quote
6- Finally, do you have any better/different solutions for the type of problem I'm encountering? If there is a way to use a lower clock cycle count via some pre-generated assembly operator DLL which can operate on input values, it might prove to be an adequate speed-up on its own (i.e. a DLL that will do a=a+b and other operations).
Yes, generate the code directly in memory, call VirtualProtect on it to allow execution, then execute it... much faster, easier, more logical.
Anyway... it took this Universe 14 billion years to evolve a genetic algorithm... using speed of light operations at quantum / atomic level....
I assume a super computer...programmed by a genius in ASM ( using light CPU ) might solve this problem in approx 30 billion years at best... unfortunately by that time the Universe might be dead :D
But do not let me stop you ;)
Quote from: BogdanOntanu on January 03, 2012, 04:02:53 PM
Yes, check MASM32 examples and forum.
I just found an example with an integer array. I'm all good.
Quote
Yes, FPU can handle double floats and ASM instructions are available for this without libs.
However please note that the FPU code is "slow". SSE should be faster... or if possible you can design a form of 4 + 4 integer fixed point math to speed up.
Ok I need to find out how to use SSE. Any early linkage before I start my search wouldn't hurt.
The 4+4 was an idea i already tried with shaders once. It's quite complicated, i'm not sure i want to try it again tbh... :/
Quote
Yes, generate the code directly in memory, Virtual protect it to allow execution, execute... much faster, more easy, more logical.
I've used the .Net Reflection and the Assembly methods within it but I always assumed it would only accept MSIL assembly, not native. It accepts an array of bytes for creation too, and to do that I'd need some way of compiling, so I'm back to square 1. Disregarding those questions, maybe you meant something else? I mean, what would i have to search for to get my answers as to how i would do what you suggest (in .Net)?
Quote
Anyway... it took this Universe 14 billion years to evolve a genetic algorithm... using speed of light operations at quantum / atomic level....
I assume a super computer...programmed by a genius in ASM ( using light CPU ) might solve this problem in approx 30 billion years at best... unfortunately by that time the Universe might be dead :D
But do not let me stop you ;)
I'm not insane to be trying to evolve the universe :) i'm trying to evolve a much simpler predictive math function.
the FPU isn't too bad :P
what may be slow is moving stuff in and out of it
but, if you want to add 2 floats, it's not so bad
Is this possible in .Net or any other platform:
-Have ASM code in memory (as string or something)
-call some process which would compile that code into a DLL
-return it as byte array or whatever
-put that binary program in protected memory
-get the address to a method within that program and call it
Obviously, the ".Net" part could be a problem since i have no clue how i would even get a newly compiled array of bits/bytes to run straight from memory, or what/how the Reflection does it (or if it allows non-.Net assembly loading in the first place). Anyway, if it's possible on any other platform like c++ i'd like to know. Thanks.
It's certainly possible in pure assembler using a series of nops that you overwrite "on the fly" using VirtualProtect, as Bogdan suggested.
Quote from: braincell on January 03, 2012, 05:49:07 PM
Is this possible in .Net or any other platform:
-Have ASM code in memory (as string or something)
-call some process which would compile that code into a DLL
-return it as byte array or whatever
-put that binary program in protected memory
-get the address to a method within that program and call it
Obviously, the ".Net" part could be a problem since i have no clue how i would even get a newly compiled array of bits/bytes to run straight from memory, or what/how the Reflection does it (or if it allows non-.Net assembly loading in the first place). Anyway, if it's possible on any other platform like c++ i'd like to know. Thanks.
Doing that in .NET is going to be really complicated, if it can be done at all. You can use the Reflection.Emit namespace and emit MSIL using an ILGenerator class. Running raw assembly language code would be pretty much out of the question: how would you obtain the results in a managed environment? I don't claim to know a lot about .NET, only what I use in my projects, but it doesn't sound like it's going to work.
Quote from: jj2007 on January 03, 2012, 06:09:26 PM
It's certainly possible in pure assembler using a series of nops that you overwrite "on the fly" using VirtualProtect, as Bogdan suggested.
I'm not sure what nops are, but on an average/slow CPU (ie Core2Duo) how many milliseconds would it approximately take to convert the ASM code (for example the 5 lines of code mentioned in my first post) into bits and store it in memory for calling?
Quote from: donkey on January 03, 2012, 06:21:59 PM
Doing that in .NET is going to be really complicated, if it can be done at all. You can use the Reflection.Emit namespace and emit MSIL using an ILGenerator class. Running raw assembly language code would be pretty much out of the question: how would you obtain the results in a managed environment? I don't claim to know a lot about .NET, only what I use in my projects, but it doesn't sound like it's going to work.
Yeah...The more i read about reflection, the more i'm convinced it wouldn't work. I even thought about going into unmanaged c++ , then getting the DLL and result from there, and then interfacing it back to .Net. I'm not sure THAT would work either because - how would i compile ASM code on the fly and return results to memory in unmanaged c++ anyway? This is harder than i thought.
How about writing a skeleton unmanaged interop, like a COM object that receives the emitted assembly, executes it, then returns the results through its methods? .NET can interface directly to unmanaged COM through the interop layer, so it should be doable, actually maybe not even that difficult if you spend enough time and the odd braincell (pun intended) writing the interface. After all, COM can use pointers and has the ability to manipulate memory, so it should work. What you would end up doing is instantiating the COM interface, build your machine code using the .NET application then pipe the data to the COM object where it would create the necessary memory buffer, execute it and return a result.
http://msdn.microsoft.com/en-us/library/ms973872.aspx
Hmm... That's kind of what i was going for when i said ".Net to unmanaged" in my previous post, but your idea is more specific (and also good).
What I'm unclear about though is this part:
Quote from: donkey on January 03, 2012, 06:47:19 PM
build your machine code using the .NET application then pipe the data to the COM object.
How would I build my machine code? Can I call ml.exe and link.exe and make them return the created DLL to memory instead of writing to disk?
Previously I tested compiling a dll to file and it took around 300ms for an empty DLL and that's maybe still too slow because I would have to keep recreating the evolving programs.
What I need is a way to convert ASM code to a compiled byte array and have it in memory in as little time as possible (preferably around 50ms), so how would I do that?
Well, machine code is essentially just a string of BYTES that you can throw together. .NET doesn't need to know what they are, and it would be easier to construct with a higher level language, but you could take any route to get the final byte code. For example, FLD for a simple FLOAT (REAL4) by reference is always 0xD9, 0x05 (0xDD, 0x05 for a REAL8 double) followed by the offset of the value in memory, which would be "fixed up" in the COM loader (i.e. offset from the beginning of the data block + address of the data block in memory). If, as you said in the first post, you are limiting the complexity of the machine language portion, you are probably a lot better off just assembling the byte code yourself and bypassing the assembler/linker completely. It would not be a monumental task in C# and would speed things up considerably as well as cutting external dependencies. If you want to avoid the work of writing a simple assembler, the IDebugControl (http://msdn.microsoft.com/en-us/library/ff538121%28v=vs.85%29.aspx) interface can do it for you and you have no need of ML or LINK.
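A minimal sketch of "assembling the byte code yourself", written in Python for brevity rather than C#. The function and table names are made up for illustration; the encodings are the standard x87 memory forms (DD /0 for fld, DC /0, /1, /4, /6 for fadd, fmul, fsub, fdiv on a REAL8) with an [ebx+displacement] operand, which is one possible addressing choice and not something the posts above prescribe:

```python
import struct

# reg field (/digit) of the ModRM byte for the DC-prefixed REAL8 memory forms
DC_REG = {"fadd": 0, "fmul": 1, "fsub": 4, "fdiv": 6}

def modrm_ebx(reg, disp):
    """ModRM byte plus displacement for an [ebx+disp] operand (r/m = 011b)."""
    if -128 <= disp <= 127:
        return bytes([0x40 | (reg << 3) | 0x03]) + struct.pack("<b", disp)  # disp8
    return bytes([0x80 | (reg << 3) | 0x03]) + struct.pack("<i", disp)      # disp32

def assemble(base_addr, ops):
    """ops: list of (mnemonic, byte offset from ebx). Returns raw 32-bit x86 code."""
    code = bytearray(b"\xBB" + struct.pack("<I", base_addr))   # mov ebx, base_addr
    for mnemonic, disp in ops:
        if mnemonic == "fld":
            code += b"\xDD" + modrm_ebx(0, disp)                # fld qword [ebx+disp]
        else:
            code += b"\xDC" + modrm_ebx(DC_REG[mnemonic], disp)
    code += b"\xC3"                                             # ret
    return bytes(code)

# Hypothetical base address and program, chosen only for the demonstration
program = [("fld", 4), ("fsub", 0x0C), ("fadd", 0x14), ("fsub", 0x1C), ("fsub", 0x24)]
code = assemble(0x0040500C, program)
print(code.hex(" "))
```

Each operation adds a fixed number of bytes, and no address inside the block depends on how many instructions precede it, which is why a restricted instruction set like this does not need a linker. Note the switch to a 4-byte displacement once the offset no longer fits in a signed byte; with 100 doubles (800 bytes) that case will come up.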
I would suggest writing a DLL with a flat API interface. The DLL, which creates the code and has a basic execution/debugging environment, could be a modification of japheth's JWASM (http://www.japheth.de/JWasm.html).
Here are two snippets demonstrating how an executable can be "patched":
- the code that performs the calculations - we leave some nops that later can be used to insert arithmetic calculations:
include \masm32\include\masm32rt.inc
.code
start:
nop ; 90
xchg eax, ecx ; 91 to find the location, we use
xchg eax, ecx ; 91 a rare opcode combination ;-)
nop ; 90
mov eax, 123 ; 5+1 bytes
nop
mov ecx, 321 ; 5+1 bytes
nop
add ecx, 4000 ; 6 bytes 81C1 A00F0000
nops 8
add eax, ecx
print str$(eax), 9
inkey "ok?"
exit
end start
- the "patcher":
include \masm32\MasmBasic\MasmBasic.inc ; Download the library (http://www.masm32.com/board/index.php?topic=12460)
Init
Let esi=FileRead$("GeneticCalc.exe")
mov ecx, LastFileSize
.Repeat
mov eax, [esi+ecx] ; search until we find...
dec ecx
.Until Sign? || eax==90919190h ; ...the rare opcode combination
.if Zero?
lea ecx, [esi+ecx+24]
invoke MbCopy, ecx, Chr$(0B9h, 7bh, 0,0,0), 5 ; the code we inject
FileWrite "TheCalcModified.exe", esi, LastFileSize
sub ecx, esi
Inkey "File modified at pos ", Hex$(ecx)
.else
Inkey "Magic string not found, sorry"
.endif
Exit
end start
This example is pretty simple, and too slow for practical purposes. There are a number of techniques to pass results to .net, such as pipes or memory-mapped files. 50 ms is a lot of time, don't be surprised if you can cut that down to some nanoseconds.
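The search-and-overwrite step of the patcher, sketched in Python on a byte buffer that stands in for the file contents. The 16 zero bytes playing the role of a header, the helper name patch and the offset 22 (the first of the eight nops in this stand-in layout) are assumptions for illustration; a real executable has headers and sections around the code:

```python
MARKER = bytes([0x90, 0x91, 0x91, 0x90])      # nop / xchg eax,ecx / xchg eax,ecx / nop

# Stand-in for the file: a fake header followed by the code bytes of the first snippet
image = bytearray(
    b"\x00" * 16                              # pretend header (assumption)
    + MARKER
    + b"\xB8\x7B\x00\x00\x00" + b"\x90"       # mov eax, 123 ; nop
    + b"\xB9\x41\x01\x00\x00" + b"\x90"       # mov ecx, 321 ; nop
    + b"\x81\xC1\xA0\x0F\x00\x00"             # add ecx, 4000
    + b"\x90" * 8                             # nops 8 (room for injected code)
    + b"\x03\xC1"                             # add eax, ecx
)

def patch(buf, payload, offset_from_marker):
    """Overwrite len(payload) bytes at a fixed distance from the marker."""
    pos = buf.find(MARKER)
    if pos < 0:
        raise ValueError("magic string not found")
    start = pos + offset_from_marker
    buf[start:start + len(payload)] = payload   # same length in, same length out
    return start

payload = bytes([0xB9, 0x7B, 0x00, 0x00, 0x00])  # mov ecx, 123
start = patch(image, payload, 22)
print(f"patched at offset {start:#x}")
```

The buffer keeps its length, so nothing after the nops moves. That is the whole point of reserving the nops up front.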
Quote from: donkey on January 03, 2012, 07:13:52 PM
It would not be a monumental task in C# and would speed things up considerably as well as cutting external dependencies. If you want to avoid the work of writing a simple assembler, the IDebugControl (http://msdn.microsoft.com/en-us/library/ff538121%28v=vs.85%29.aspx) interface can do it for you and you have no need of ML or LINK.
Somebody said earlier that it is naive to think it's NOT a monumental task, but maybe we misunderstood each other. That's a good idea, or at least it's worth a shot. I'll think about it and do a bit of research on the topic.
Quote from: qWord on January 03, 2012, 07:36:42 PM
I would suggest writing a DLL with a flat API interface. The DLL, which creates the code and has a basic execution/debugging environment, could be a modification of japheth's JWASM (http://www.japheth.de/JWasm.html).
Hmm, thanks. I'll have a look.
Quote from: jj2007 on January 03, 2012, 07:48:09 PM
Here are two snippets demonstrating how an executable can be "patched":
Wow thanks jj. I think i'll need to get past the basic ASM tutorials to really see how it works but I'll keep this thread bookmarked.
So far ASM seems more interesting and "fun" than I thought, but the resources for learning are all so different: people use different standards, which is confusing for the noob, the IDEs have no intellisense or colouring, and there seem to be 9 ways of doing the same thing, but I can't decipher what the "best practices" are quite yet. So it's certainly difficult, not because of the language but because of a lack of any kind of official support, tutorials or books.
Edit: just saw your Basic lib, and RichMasm. So I guess the IDE can have colours after all and some quick help. The lib also looks great, and I think I'll use it. Thanks.
Quote from: braincell on January 03, 2012, 07:49:38 PM
Somebody said earlier that it is naive to think it's NOT a monumental task, but maybe we misunderstood each other. That's a good idea, or at least it's worth a shot. I'll think about it and do a bit of research on the topic.
It's not monumental, since you are controlling the input and grammar and restricting it. Writing a complete macro assembler with generic parsing and expression evaluation is a large undertaking, but the most complex parts of an assembler are the parsing mechanisms, and you shouldn't need those.
I chose COM because I prefer it as an interop mechanism, since it's available to scripts etc... but I agree that a DLL and the flat API is a much easier route.
Quote from: BogdanOntanu on January 03, 2012, 04:02:53 PM
Quote
2- In .Net the addition operator "a=a+b" takes 20 CPU cycles. I am assuming under ASM it will take only 1 CPU cycle. Is this correct?
Yes, ADD EAX,ECX will take 1 cycle or 0.5 cycles or 0.25 cycles or even less IF you pair the instructions nicely and it depends on the CPU model.
Maybe it's a memory referenced operation and the cycles (or equivalent time) are taken up with getting a & b to and from memory.
Cache hit/miss Or a crappy compiler..
:8)
This is one of those great programming questions that you never see discussed either here at the MASM forum or at the Microsoft Technical Forums (http://www.microsoft.com/communities/forums/default.mspx).
Years ago, I read Adam Nathan's (http://blogs.msdn.com/b/adam_nathan/) ".NET and COM: The Complete Interoperability Guide", which is an extraordinary book,...and, it never once mentioned Assembly Language compatibility with the .NET Framework.
What you have suggested is intelligent, but, you have chosen the most difficult architectural configuration to implement your application's functionality. As EDGAR (donkey) has suggested, COM interop is the most accessible way to accomplish this, since the syntactical conventions already exist and are fairly well documented. You can actually write a completely functional COM interface in Assembly Language that is compliant with the COM specification, create a .NET Framework runtime-callable wrapper (RCW) for it using TlbImp.exe (.NET Interop: Get Ready for Microsoft .NET by Using Wrappers to Interact with COM-based Applications, 2001 (http://msdn.microsoft.com/en-us/magazine/cc301750.aspx)),...but, that's an awful lot of complexity to go through just to compile an executable.
I use Visual Studio myself, and have found the marshalling aspect of COM to .NET components tedious and often confusing,...so, I just code in C++ and Assembly Language, which are a lot more compatible.
Quote from: braincell on January 03, 2012, 07:53:53 PM
Edit: just saw your Basic lib, and RichMasm. So I guess the IDE can have colours after all and some quick help. The lib also looks great, and I think I'll use it. Thanks.
Here is a first attempt at simulating your procedure - the fmul, fadd etc are randomly changed:
include \masm32\MasmBasic\MasmBasic.inc ; RichMasm (http://www.masm32.com/board/index.php?topic=12460): Press F6 to assemble & link
.data
TheInputs REAL8 123.456, 11.11, 22.22, 33.33, 44.44, 55.55, 66.66, 77.77
TheOpcodeBytes db 43h, 4bh, 73h, 63h ; *)
Init
push eax ; create lpflOldProtect
invoke VirtualProtect, TheProc, 1024, PAGE_EXECUTE_READWRITE, esp
mov ecx, 20
mov esi, offset TheOpcodeBytes
Print "Testing the proc:"
Open "O", #1, "Genetic.log"
Rand()
.Repeat
movzx eax, byte ptr [esi+Rand(4)]
mov byte ptr Pos1[1], al
movzx eax, byte ptr [esi+Rand(4)]
mov byte ptr Pos2[1], al
movzx eax, byte ptr [esi+Rand(4)]
mov byte ptr Pos3[1], al
movzx eax, byte ptr [esi+Rand(4)]
mov byte ptr Pos4[1], al
call TheProc
Print Str$("\nLoop %i:\t", ecx), Str$(ST(0))
Print #1, Str$("\nLoop %i:\t", ecx), Str$(ST(0))
fstp st
dec ecx
.Until Sign?
Close #1
mov eax, [esp] ; get lpflOldProtect
invoke VirtualProtect, TheProc, 1024, eax, esp
pop eax
Inkey CrLf$, "Done"
Exit
TheProc proc
; int 3 ; activate to produce the disassembly below
mov ebx, offset TheInputs-4
Pos0::
fld REAL8 ptr [ebx+4]
Pos1::
fadd REAL8 ptr [ebx+4+8]
Pos2::
fmul REAL8 ptr [ebx+4+16]
Pos3::
fdiv REAL8 ptr [ebx+4+24]
Pos4::
fsub REAL8 ptr [ebx+4+32]
ret
TheProc endp
end start
*) Used in TheOpcodeBytes (the second byte of each fadd/fmul/fdiv/fsub below):
Address   Hex dump     Command                   Comments
004011C1  CC           int3
004011C2  BB FC3F4000  mov ebx, 00403FFC
004011C7  DD43 04      fld qword ptr [ebx+4]
004011CA  DC4B 0C      fmul qword ptr [ebx+0C]
004011CD  DC43 14      fadd qword ptr [ebx+14]
004011D0  DC43 1C      fadd qword ptr [ebx+1C]
004011D3  DC63 24      fsub qword ptr [ebx+24]
004011D6  C3           retn
Output:
Testing the proc:
Loop 20: 68.03300
Loop 19: 1998676.0
Loop 18: 22.22215
Loop 17: 84.92533
Loop 16: 86.25867
Loop 15: 99614.14
Loop 14: -0.6137247
Loop 13: 6473.841
Loop 12: 124.0790
Loop 11: 68.03300
Loop 10: 3700.052
Loop 9: 49.14405
Loop 8: 44.44215
Loop 7: -6.053924
Loop 6: 1338.266
Loop 5: 2574.098
Loop 4: 20.57600
Loop 3: 209.0480
Loop 2: 167.8960
Loop 1: 83247.06
Loop 0: -325.7846
Done
Source attached, including a timings switch:
437 ms for 1000000 loops
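Why those four values in TheOpcodeBytes select the operation: the second byte of each instruction is a ModRM byte, and its middle three bits pick the x87 operation for the DC opcode. A small Python sketch that decodes them (decode_dc is a hypothetical helper, handling only the [ebx+disp8] form used here):

```python
# reg field of the ModRM byte -> operation, for opcode DC with a REAL8 memory operand
DC_OPS = {0: "fadd", 1: "fmul", 2: "fcom", 3: "fcomp",
          4: "fsub", 5: "fsubr", 6: "fdiv", 7: "fdivr"}

def decode_dc(instr):
    """Decode a 3-byte 'DC modrm disp8' instruction with an [ebx+disp8] operand."""
    opcode, modrm, disp = instr
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0xDC or mod != 1 or rm != 3:
        raise ValueError("not a DC /r [ebx+disp8] instruction")
    if disp >= 0x80:                 # disp8 is a signed byte
        disp -= 0x100
    return f"{DC_OPS[reg]} qword ptr [ebx{disp:+#x}]"

for modrm in (0x43, 0x4B, 0x73, 0x63):          # the values in TheOpcodeBytes
    print(f"{modrm:02X}h -> {decode_dc(bytes([0xDC, modrm, 0x0C]))}")
```

So poking one byte per instruction is enough to switch between add, multiply, divide and subtract while the instruction length stays at three bytes.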
Quote from: jj2007 on January 03, 2012, 09:59:42 PM
Here is a first attempt at simulating your procedure - the fmul, fadd etc are randomly changed:
Again, i'm very very impressed. I've just spent about 30 minutes looking at it and trying to understand it (i didn't know what movzx and ptr keywords were doing, i'm a noob), but i think i get it and it looks ingenious.
How would other more complex operators work though? For example "if cflag = true : a = b".
Does an equivalent to that have an OpCode in ASM? I've seen it being used in the ASM interpreted code of the application i'm learning from, so I guess that there is one (at least they say those were all ASM operators - i just saw the results of the operators when translated to C with their translator, not any ASM code itself).
Edit: I looked through the Intel Opcodes and Mnemonics chm within MASM and I couldn't find an operator that is supposed to do that. This means some of the so-called "operators" in my code would really be two or more instructions. Could that then be handled just as easily or would the variable byte-size be a severe constraint?
@donkey, baltoro
Thanks, you guys have given me plenty to think about. Just figuring out what the best way to approach this is half the work I guess.
I'm considering simply moving a slightly larger portion of my code into pure ASM, then i won't have to interop at all - especially if what jj is saying can really be done like i imagined it.
I mean i'd simply use a ASM DLL which i've already tried and .Net can access it and get values from it, it works.
Quote from: braincell on January 03, 2012, 11:35:13 PM
How would other more complex operators work though? For example "if cflag = true : a = b".
Does an equivalent to that have an OpCode in ASM?
The assembler equivalent would be branches:
.if cflag
fadd REAL8 ptr [ebx+4+8]
.else
fsub REAL8 ptr [ebx+4+8]
.endif
In "pure" assembler, this would be the jz and jnz opcodes, and again it would be easy to poke the appropriate bytes.
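A sketch of what "poking the appropriate bytes" for a branch could look like, in Python with a made-up helper name. It assumes short jumps (rel8, so each arm must stay under 128 bytes) and the usual .if/.else layout, i.e. a conditional jump over the first arm and an unconditional jump over the second:

```python
def if_zero_else(then_code, else_code):
    """Encode: jnz else / then_code / jmp end / else: else_code / end:"""
    jmp_len = 2                                      # EB rel8
    if len(else_code) > 127 or len(then_code) + jmp_len > 127:
        raise ValueError("arm too long for a short jump")
    jmp_end = bytes([0xEB, len(else_code)])          # skip the else arm
    jnz_else = bytes([0x75, len(then_code) + jmp_len])  # rel8 counts from the next instruction
    return jnz_else + then_code + jmp_end + else_code

fadd = bytes([0xDC, 0x43, 0x0C])                     # fadd qword ptr [ebx+0Ch]
fsub = bytes([0xDC, 0x63, 0x0C])                     # fsub qword ptr [ebx+0Ch]
branch = if_zero_else(fadd, fsub)
print(branch.hex(" "))

flipped = bytearray(branch)
flipped[0] ^= 0x01                                   # 75h (jnz) <-> 74h (jz)
print(flipped.hex(" "))
```

Because the jump distances are computed from the lengths of the two arms, an "operator" that expands to several instructions of different sizes is not a problem as long as the displacements are recalculated whenever an arm changes. Flipping the condition afterwards is a one-byte poke, since 75h (jnz) and 74h (jz) differ only in the lowest bit.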
Re DLL: If .net supports message handling, then WM_COPYDATA would be the most elegant solution. The ASM part would send it whenever it has found an algo that is closer to the desired result.
Attached a sample using WM_COPYDATA - SendData/CopyData$() in MasmBasic speak.
GeneticProcServer sends a string as follows:
SendData "NetClient", Cat$(Str$("Loop %i:\n", NumLoops-ecx)+Str$("Target= \t%i", DesiredOutput)+Str$("\nBest proxy=\t%f", BestProxy))
NetClient receives the string:
WndProc proc hWnd:HWND, uMsg:UINT, wParam:WPARAM, lParam:LPARAM
SWITCH uMsg
CASE WM_COPYDATA
SetWin$ hEdit=CopyData$
Launch NetClient.exe first, then GeneticProcServer.exe. NetClient is actually an ordinary MasmBasic/Masm32 window receiving the WM_COPYDATA message - it should not be too difficult to do the same in .net, see e.g. How to migrate SendMessage with WM_COPYDATA to .Net Framework (http://social.msdn.microsoft.com/forums/en-US/vbinterop/thread/62d0c25f-dadb-4b24-9679-f9f75717456c/)
What is still missing is to copy the "successful" proc into a buffer and disassemble it.
Quote from: jj2007 on January 04, 2012, 09:07:13 AM
What is still missing is to copy the "successful" proc into a buffer and disassemble it.
It might not be missing because i had a brainwave last night. The randomization of operations is not speed critical so i could simply pass an array with OperationCount and OperationID() to the DLL.
Then if Operation(0)=1, add. If Operation(0)=2 subtract, etc. Then do that for all the operations up to operation count.
That way i know the operation IDs (via the passed array) and can preserve the successful process by preserving the array, and reconstructing it later.
In other words, i'd simply not use the Rand() you used, but the OperationID array!
It might use a few more jumps and ifs but that's perfectly alright with me imho.
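A rough sketch of that OperationID idea, in Python rather than ASM just to show the control flow. The ID numbering, the tuple layout and the division-by-zero rule are all assumptions for illustration; the real thing would be a compare-and-jump chain or a jump table inside the DLL:

```python
import math

# Hypothetical operation IDs; each takes (dest value, source value)
OPS = {
    1: lambda a, b: a + b,
    2: lambda a, b: a - b,
    3: lambda a, b: a * b,
    4: lambda a, b: a / b if b != 0.0 else 0.0,   # assumption: x/0 yields 0
    5: lambda a, b: math.sin(a),                  # unary, source is ignored
}

def run(program, inputs, n_regs=8):
    """program: list of (op_id, dest register, source kind 'f' or 'v', source index)."""
    regs = [0.0] * n_regs
    for op_id, dst, kind, src in program:
        b = regs[src] if kind == "f" else inputs[src]
        regs[dst] = OPS[op_id](regs[dst], b)
    return regs

# The five-line example from the first post, as an operation array
program = [
    (1, 0, "v", 23),   # f0 = f0 + v23
    (3, 0, "v", 11),   # f0 = f0 * v11
    (1, 1, "v", 76),   # f1 = f1 + v76
    (4, 1, "f", 0),    # f1 = f1 / f0
    (5, 0, "f", 0),    # f0 = sin(f0)
]
inputs = [0.0] * 100
inputs[23], inputs[11], inputs[76] = 2.0, 3.0, 12.0   # made-up test values
regs = run(program, inputs)
print(regs[0], regs[1])
```

The array is the program, so keeping a successful candidate just means keeping its array. The cost is one dispatch per operation on every run, which the patched-code approach avoids.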
I'll look into WM_COPYDATA as well, thanks.
I'm quite optimistic i'll be able to pull all of this off without any interop stuff and make it fast enough for my needs. This is great news.
Well, try one more. NASM has a disassembler that works on snippets:
Loop 321:
Target= 99999
Best proxy= 99988.89
00000000 BB0C504000 mov ebx,0x40500c
00000005 DD4304 fld qword [ebx+0x4]
00000008 DC630C fsub qword [ebx+0xc]
0000000B DC4314 fadd qword [ebx+0x14]
0000000E DC631C fsub qword [ebx+0x1c]
00000011 DC6324 fsub qword [ebx+0x24]
00000014 83F9FF cmp ecx,byte -0x1
00000017 7405 jz 0x1e
00000019 DC432C fadd qword [ebx+0x2c]
0000001C EB03 jmp short 0x21
0000001E DC4B2C fmul qword [ebx+0x2c]
00000021 C3 ret
The original looks like this - you can see above how the code has changed:
TheProc proc
; int 3 ; activate to watch changes in Olly (http://www.ollydbg.de/version2.html)
mov ebx, offset TheInputs-4
Pos0::
fld REAL8 ptr [ebx+4]
Pos1::
fadd REAL8 ptr [ebx+4+8]
Pos2::
fmul REAL8 ptr [ebx+4+16]
Pos3::
fdiv REAL8 ptr [ebx+4+24]
Pos4::
fsub REAL8 ptr [ebx+4+32]
cmp ecx, -1 ; never zero?
Pos5::
.if Zero? ; we poke into Pos5[0]
fadd REAL8 ptr [ebx+4+40]
.else
fmul REAL8 ptr [ebx+4+40]
.endif
ret
TheProc endp
TheProc_endp:
I hope the attached archive is complete - let me know. Extract to the root of your masm32 drive with "use folder names", then launch \masm32\RichMasm\NetClient.exe
EDIT: The archive contains a modified source using macros and a structure to poke the appropriate bytes:
mov cseq.cs_fmul1, CalcOP() ; replace
mov cseq.cs_fadd1, CalcOP() ; fadd, fsub etc
mov cseq.cs_fadd2, CalcOP() ; opcodes in
mov cseq.cs_fadd3, CalcOP() ; TheProc
mov cseq.cs_fmul2, CalcOP() ; with random
mov cseq.cs_fsub1, CalcOP() ; new opcodes
One interesting result is that in spite of now 6 FPU instructions and one jump, the results are often not very close to the target, and are reached after a few thousand iterations.
Yes it's complete, at least it looks that way. I will analyze it bit by bit. :)
My values are in the 0-5000 range with about two decimals, so at most 6 digits (positive and negative).
I just figured i could simply convert them to 32bit integer and skip FPU completely. I'd only need to convert the final result back to float, and that's not speed critical.
Thanks.
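Before committing to integers, a quick check of the ranges may be worthwhile. The Python sketch below assumes a scale factor of 100 (two decimals) and a multiply that rescales after the product; both are assumptions, not something stated above:

```python
SCALE = 100                       # two decimals (assumed)
INT32_MAX = 2**31 - 1
INT32_MIN = -2**31

def to_fixed(x):
    """Convert a float to a scaled integer, e.g. 12.34 -> 1234."""
    return round(x * SCALE)

def fixed_mul(a, b):
    """Multiply two scaled integers; the raw product carries SCALE twice."""
    return a * b // SCALE

def fits_int32(n):
    return INT32_MIN <= n <= INT32_MAX

a, b = to_fixed(5000.00), to_fixed(4999.99)
print("sum fits:             ", fits_int32(a + b))
print("raw product fits:     ", fits_int32(a * b))
print("rescaled product fits:", fits_int32(fixed_mul(a, b)))
```

With values near 5000, both the intermediate product and the rescaled result exceed what a signed 32-bit register holds, so multiplication would need a 64-bit intermediate plus some overflow policy. Addition and subtraction of two in-range values stay well within 32 bits.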
Quote from: braincell on January 04, 2012, 03:42:29 PM
Yes it's complete, at least it looks that way. I will analyze it bit by bit. :)
DisAs.bat needs a little adjustment (i.e., no path):
ndisasm.exe DisAsTmp.exe -b32 >DisAsTmp.asm
Quote
I just figured i could simply convert them to 32bit integer and skip FPU completely. I'd only need to convert the final result back to float, and that's not speed critical.
The FPU is not that bad, and gives you more flexibility.