Excuse me, I have a question. I have two methods for doing the same thing, let me show them:
.If (OpSize == 1)
Add [Eax], Edx
.Else
.If (Esi == 1)
Add Word Ptr [Eax], Dx
.Else
Add Byte Ptr [Eax], Dl
.EndIf
.EndIf
Where:
OpSize == 1 means the opcode is 32-bit.
If ESI == 1 the opcode is Word (66 prefix)
If ESI == 0 the opcode is Byte
Now I can use this other method:
Mov Ebx, OpSize
Imul Ebx, Ebx, 0FFFFFFh
Imul Esi, Esi, 0FFh
Rol Esi, 8
Rol Ebx, 8
Add Esi, 0FFh
Add Ebx, Esi
Add Dword Ptr [Eax], Edx
And Dword Ptr [Eax], Ebx
Where:
OpSize == 1 (0FFFFFFFFh)
ESI == 1 (0FFFFh)
ESI == 0 (0FFh)
(Yes, there will never be OpSize == 1 and ESI == 1. Prefix 66 is ignored in this case)
So which is faster?
I know the Imul and Rol operations cost some clocks, but avoiding the branches, the cmps and the partial register operations might speed things up. *PROBABLY*.
Thanks :U
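A minimal C sketch of the mask arithmetic from the second method, to make the three resulting values easy to check. The names `rol32` and `size_mask` are hypothetical, and the sketch assumes, as the post states, that OpSize == 1 and ESI == 1 never occur together (that combination would overflow the final addition).

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (0 < n < 32), like x86 ROL. */
static uint32_t rol32(uint32_t v, unsigned n) {
    return (v << n) | (v >> (32u - n));
}

/* op_size:  1 for a 32-bit operation, otherwise 0.
   prefix66: 1 when the 66 prefix was seen (word), otherwise 0.
   Mirrors the IMUL / ROL / ADD sequence from the post. */
static uint32_t size_mask(uint32_t op_size, uint32_t prefix66) {
    uint32_t ebx = rol32(op_size * 0x00FFFFFFu, 8);     /* 0 or 0FFFFFF00h  */
    uint32_t esi = rol32(prefix66 * 0xFFu, 8) + 0xFFu;  /* 0FFh or 0FFFFh   */
    return ebx + esi;                                   /* final AND mask   */
}
```

The sketch only reproduces the mask. Whether ANDing memory with that mask is safe is a separate question.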
as you have them written, the if-then-else is probably faster
that doesn't mean it's the fastest way :bg
do EAX, EDX or ESI have to be preserved ?
what are the other possible values of ESI ? (if it is not=1)
in fact, tell us what all the values might be - lol
Quote from: dedndave on November 03, 2010, 12:57:23 AM
as you have them written, the if-then-else is probably faster
that doesn't mean it's the fastest way :bg
do EAX, EDX or ESI have to be preserved ?
what are the other possible values of ESI ? (if it is not=1)
EAX == holds the destination of the emulated opcode
EDX == holds the source of the emulated opcode
ESI == holds the prefix of the emulated opcode
Of course I can use other regs, but why do you ask?
ESI can only be 1 or 0 (66 prefix found or not)
Example:
Add Dword ptr ds:[Eax], 10 (To be emulated)
EAX == Addr of [EAX]
EDX == 10
ESI == 0
give us some values
what is the range of values of [EAX] and EDX
for ESI, you could use
dec esi
now, it is either all 0's or all 1' :U
DEC ESI is a single byte opcode and is quite fast
Quote from: dedndave on November 03, 2010, 01:03:41 AM
give us some values
what is the range of values of [EAX] and EDX
That depends, because I am writing an opcode emulator (Intel architecture).
So I analyze another process and get the opcode; usually I check if the address is valid. Another example:
Add Word ptr ds:[1000000h], 1000
EAX == [1000000h]
EDX == 1000
ESI == 1
So I don't actually know what's inside 1000000h, I just perform the operation.
IMO, the conditional moves are your best way to optimize things; off the top of my head, something like this (I'm doing this quick and I'm full of gin, so it's a suggestion only):
movzx ebx, dl ; byte source, zero-extended (mov ebx, dl does not assemble)
movzx ecx, dx ; word source, zero-extended
cmp esi,1
cmovnz ecx, ebx ; no 66 prefix: use the byte source
cmp OpSize,1
cmovz ecx, edx ; 32-bit operation: use the full source
add [eax], ecx
On the large scale, the unnecessary moves are worth not having the branch mispredictions. Take it as you will, I'm sure there are mistakes.
-r
Quote from: dedndave on November 03, 2010, 01:05:28 AM
for ESI, you could use
dec esi
now, it is either all 0's or all 1' :U
DEC ESI is a single byte opcode and is quite fast
If ESI == 0, yes, the sub esi, 1 (instead of dec esi, I think sub is faster) is a great idea, but:
if ESI == 1 I need it to be 0FFFFh, because it represents the WORD prefix (66).
I also need this calculation to work with the EBX register, since:
EBX == 1: I need it to be 0FFFFFFFFh (it represents a 32-bit reg operation)
EBX == 0: then I take the ESI value into consideration.
All of this because I want to avoid branches, cmps and the partial register emulation, so I can treat everything as a DWORD and finally AND it with the actual operation size.
Hope I haven't confused you guys >.<
PS: Here is a more detailed explanation, since I think I am confusing everyone:
Opcodes have an operation-size bit, for example:
ADD BYTE PTR DS:[EAX],DL
OpcodeNumber: 00
ADD DWORD PTR DS:[EAX],EDX
OpcodeNumber: 01
So the WORD form, at least in the Win32 environment I am trying to emulate, needs the 66 prefix:
ADD WORD PTR DS:[EAX],DX
PrefixOpcode: 66
OpcodeNumber: 01
You guys know all about this, so I'll get to the main point:
ESI == PrefixOpcode (if 66 then 1, if not then 0)
EBX == operation size bit (if 32-bit then 1, if 8-bit then 0)
EAX == destination of the emulated opcode
EDX == source of the emulated opcode
So instead of doing cmps to execute the ADD instruction (or many others) with its partial registers, I wanted to avoid the branches and cmps, do everything as a 32-bit operation, and finally AND the result with the real size of the operation.
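One branch-free way to get from the encoding described above to a size mask is a four-entry table indexed by the opcode's low bit and the prefix flag. This is only a sketch in C: `SIZE_MASK` and `operand_mask` are hypothetical names, and it assumes a 32-bit default operand size (the Win32 case in the post) and that the 66 prefix has no effect on byte-sized opcodes.

```c
#include <stdint.h>

/* Index = (w_bit << 1) | prefix66, where w_bit is bit 0 of the opcode. */
static const uint32_t SIZE_MASK[4] = {
    0x000000FFu,  /* w=0, no prefix: byte (opcode 00)          */
    0x000000FFu,  /* w=0, 66 prefix: still byte                */
    0xFFFFFFFFu,  /* w=1, no prefix: dword (opcode 01)         */
    0x0000FFFFu   /* w=1, 66 prefix: word (66 01)              */
};

/* opcode:   the opcode byte, e.g. 00h or 01h for ADD r/m, r.
   prefix66: nonzero when the 66 prefix was decoded. */
static uint32_t operand_mask(uint8_t opcode, int prefix66) {
    unsigned w = opcode & 1u;
    return SIZE_MASK[(w << 1) | (unsigned)(prefix66 != 0)];
}
```

A table lookup replaces the IMUL/ROL arithmetic with one indexed load, at the cost of a possible cache miss on first use.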
Quote from: redskull on November 03, 2010, 01:08:46 AM
IMO, the conditional moves are your best way to optimize things; off the top of my head, something like this (I'm doing this quick and I'm full of gin, so it's a suggestion only):
movzx ebx, dl ; byte source, zero-extended (mov ebx, dl does not assemble)
movzx ecx, dx ; word source, zero-extended
cmp esi,1
cmovnz ecx, ebx ; no 66 prefix: use the byte source
cmp OpSize,1
cmovz ecx, edx ; 32-bit operation: use the full source
add [eax], ecx
On the large scale, the unnecessary moves are worth not having the branch mispredictions. Take it as you will, I'm sure there are mistakes.
-r
Thx redskull, I think conditional moves will work for this, but I thought they were slower than arithmetic operations like IMUL or ROL... I'll give it a try :wink
dec esi
not esi
or esi,0FFh
and edx,esi
add [eax],dx
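For reference, a small C model of what this five-instruction sequence computes, assuming ESI is exactly 0 or 1 on entry. The function names are hypothetical. The point it illustrates is that the final add is always 16 bits wide; only the source is narrowed.

```c
#include <stdint.h>

/* Mask built by: dec esi / not esi / or esi,0FFh */
static uint32_t dx_mask(uint32_t prefix66) {
    uint32_t esi = prefix66 - 1u;  /* dec esi: 1 -> 0, 0 -> 0FFFFFFFFh     */
    esi = ~esi;                    /* not esi: 1 -> 0FFFFFFFFh, 0 -> 0     */
    return esi | 0xFFu;            /* or esi,0FFh: 0FFFFFFFFh or 0FFh      */
}

/* and edx,esi / add [eax],dx: a word add with a possibly narrowed source. */
static uint16_t add_word_masked(uint16_t mem, uint32_t edx, uint32_t prefix66) {
    uint16_t src = (uint16_t)(edx & dx_mask(prefix66));
    return (uint16_t)(mem + src);
}
```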
Quote from: dedndave on November 03, 2010, 01:33:21 AM
dec esi
not esi
or esi,0FFh
and edx,esi
add [eax],dx
ESI == 0 (0FFh)
ESI == 1 (0FFFFFFFFh)
But it needs to be:
ESI == 1 (0FFFFh) (WORD)
nahhhh
the upper word of EDX isn't used
however, there might be a problem if [EAX] + DL generates a carry into the second byte
Quote from: theunknownguy on November 03, 2010, 01:41:40 AM
ESI == 0 (0FFh)
ESI == 1 (0FFFFFFFFh)
But it needs to be:
ESI == 1 (0FFFFh) (WORD)
Strange rules, anyway :green2
Quote from: dedndave on November 03, 2010, 01:44:53 AM
nahhhh
the upper word of EDX isn't used
however, there might be a problem if [EAX] + DL generates a carry into the second byte
What happens then when EDX has a 32-bit source?
Example:
Add Dword Ptr ds:[Eax], 10000000 (Trying to emulate)
EDX == 10000000
Also, the idea was to avoid the partial register emulation, since using 32 bit is faster than using 16 or 8...
Quote from: Antariy on November 03, 2010, 01:45:59 AM
Quote from: theunknownguy on November 03, 2010, 01:41:40 AM
ESI == 0 (0FFh)
ESI == 1 (0FFFFFFFFh)
But it needs to be:
ESI == 1 (0FFFFh) (WORD)
Strange rules, anyway :green2
Oh nvm, the example that you deleted gives me:
ESI == 1 (0FFFFh) (ok cool)
ESI == 0 (0) (wrong, I need it to be 0FFh)
LoL...
Quote from: theunknownguy on November 03, 2010, 01:59:39 AM
neg esi
shl esi, 16
rol esi, 16
If ESI is 0 - you will get an AND EAX,0 => 0
If ESI is 1 - you will get an AND EAX,FFFF => loss of the high part anyway
Quote from: Antariy on November 03, 2010, 02:03:23 AM
Quote from: theunknownguy on November 03, 2010, 01:59:39 AM
neg esi
shl esi, 16
rol esi, 16
If ESI is 0 - you will get an AND EAX,0 => 0
If ESI is 1 - you will get an AND EAX,FFFF => loss of the high part anyway
Yes, and I need:
If ESI is 0 - the operation is 8 bit (so in the end I need to AND with 0FFh)
If ESI is 1 - the operation is 16 bit (so in the end I need to AND with 0FFFFh)
Please just try to understand that I am trying to emulate the following opcode:
ADD DWORD PTR DS:[ADDRESS], VALUE
And the "ADD" opcode has its 8, 16 and 32-bit versions.
So in order to emulate it I could simply do:
ADD BYTE PTR DS:[ADDRESS], VALUE-8BIT
ADD WORD PTR DS:[ADDRESS], VALUE-16BIT
ADD DWORD PTR DS:[ADDRESS], VALUE-32BIT
Instead of using 8 and 16-bit emulation, which I've read is slower than 32 bit, I can do:
ADD DWORD PTR DS:[ADDRESS], VALUE-32BIT
AND DWORD PTR DS:[ADDRESS], SIZE_OF_OPERATION
you are going to have troubles if [EAX] + DL > 255
seeing as we do not know the range of these two values, we have to be on the safe side....
dec esi
jz label1
add [eax],dl
jmp short label2
label1: add [eax],dx
label2:
Quote from: dedndave on November 03, 2010, 02:13:14 AM
you are going to have troubles if [EAX] + DL > 255
seeing as we do not know the range of these two values, we have to be on the safe side....
dec esi
jz label1
add [eax],dl
jmp short label2
label1: add [eax],dx
label2:
Isn't the AND at the end supposed to fix that problem?
And [Eax], SIZE_OF_OPERATION
Example of emulation with this problem:
Add Al, 1 (Trying to emulate)
My EAX virtual register == 0FFh
So when doing the emulation in 32 bit (to avoid partial registers and make it faster) it would get:
EAX virtual register == 100h
BUT with the AND opcode at the end it would be:
EAX virtual register == 0.
let me give you an example where it may cause trouble
the byte values at [EAX] are: 0FFh,0 (word = 00FFh)
the value in DX is 1
if we ADD [EAX],DL, the bytes will be: 0,0 (word = 0000h)
if we ADD [EAX],DX, the bytes will be: 0,1 (word = 0100h)
the if-then-else code you posted originally avoids this problem
but, if we simply clear out DH, we have not avoided it
as i said, if we knew the range of these values, we might be able to use slicker code
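The difference can be modelled in a few lines of C. This is a sketch under assumptions, not code from the thread: `add_then_and` models the proposed ADD followed by AND on a dword, and `add_merge` is a swapped-in alternative that keeps the bits outside the operand untouched by merging the masked sum back into the old value. Both function names are hypothetical, and memory is modelled as a little-endian dword value, so bytes 0FFh,0 are the word 00FFh.

```c
#include <stdint.h>

/* ADD dword then AND with the size mask: every bit outside the mask
   ends up cleared, whatever was stored there before. */
static uint32_t add_then_and(uint32_t mem, uint32_t src, uint32_t mask) {
    return (mem + src) & mask;
}

/* Masked merge: the sum is truncated to the operand size and only the
   bits under the mask are replaced, so neighbouring bytes survive and
   a carry out of the operand is discarded. */
static uint32_t add_merge(uint32_t mem, uint32_t src, uint32_t mask) {
    return (mem & ~mask) | ((mem + src) & mask);
}
```

With mask 0FFh the merge gives bytes 0,0 for the example above, and with mask 0FFFFh it gives 0,1, matching the two partial-register adds. Note that either variant still reads and writes four bytes even for a byte operation, which may touch memory beyond the operand, and neither produces the flags of the narrower ADD.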
Quote from: dedndave on November 03, 2010, 02:29:18 AM
let me give you an example where it may cause trouble
the byte values at [EAX] are: 0FFh,0
the value in DX is 1
if we ADD [EAX],DL, the bytes will be: 0,0
if we ADD [EAX],DX, the bytes will be: 0,1
the code you posted originally avoids this problem
but, if we simply clear out DH, we have not avoided it
as i said, if we knew the range of these values, we might be able to use slicker code
Oh I get it. Unluckily for me, I can't know the range of these values, since I am just emulating those opcodes...
So the solution would be to use a conditional branch? :'(
PS: I am testing for this bug like crazy and still can't reproduce it, testing with ollydbg
004010C0 TestExe.<ModuleEntryPoint> 0110 ADD DWORD PTR DS:[EAX],EDX
004010C2 2118 AND DWORD PTR DS:[EAX],EBX
Where:
[EAX] == 0FFh
EDX == 1
EBX == 0FFh (in case of 8 bit) or 0FFFFh (in case of 16 bit)
i would say so - it also yields the proper resultant flags
this is a sure thing - i may think of something later, though :P
dec esi
jz label1
add [eax],dl
jmp short label2
label1: add [eax],dx
label2:
Quote from: dedndave on November 03, 2010, 02:42:56 AM
i would say so - it also yields the proper resultant flags
this is a sure thing - i may think of something later, though :P
dec esi
jz label1
add [eax],dl
jmp short label2
label1: add [eax],dx
label2:
Thanks dedndave, you have reminded me about partial registers like AH, CH, DH, BH...
I need to add support for those :dazzled:
PS: You are also right about the flags, the AND at the end would change them (silly me). But I can still use PUSHF for that problem
PS2: Thx also for the bug, I finally reproduced it, now I need to code a solution.
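One caveat on the PUSHF idea: flags saved after a 32-bit ADD describe the 32-bit addition, so for 0FFh + 1 they show no carry and a nonzero result, while the real 8-bit ADD sets both CF and ZF. A rough C sketch of computing the result and three of the flags at the operand width follows. `add_result` and `add_sized` are hypothetical names, and OF, AF and PF are left out to keep it short.

```c
#include <stdint.h>

typedef struct {
    uint32_t result;  /* sum truncated to the operand size        */
    int cf;           /* carry out of the operand's top bit       */
    int zf;           /* truncated result is zero                 */
    int sf;           /* top bit of the truncated result          */
} add_result;

/* mask is 0FFh, 0FFFFh or 0FFFFFFFFh for byte, word or dword. */
static add_result add_sized(uint32_t a, uint32_t b, uint32_t mask) {
    add_result r;
    uint64_t wide = (uint64_t)(a & mask) + (uint64_t)(b & mask);
    r.result = (uint32_t)wide & mask;
    r.cf = wide > mask;
    r.zf = r.result == 0;
    r.sf = (r.result & (mask ^ (mask >> 1))) != 0;  /* isolates the sign bit */
    return r;
}
```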