Hello,
Is there a way (in 32-bit MASM) to make this work?
Quote
gros equ 1 SHL 38 ; = 0, because constant expressions are only 32 bits wide
Only with SSE2:
include \masm32\include\masm32rt.inc
.686
.xmm
.data
MyQword dq -1
MyCt dd 38
.code
start:
movq xmm2, MyQword
movd xmm3, MyCt
psllq xmm2, xmm3
movq MyQword, xmm2
exit
end start
Using SSE is one solution.
I was thinking of a macro that stores the result in a qword via [qword] and [qword+4].
Quote
; shift > 32 1ui64 = 1 in 64 bits
SPFEI_FLAGCHECK equ <( 1ui64 SHL SPEI_RESERVED1) OR( 1ui64 SHL SPEI_RESERVED2)>
SPFEI_ALL_TTS_EVENTS equ < 0000FFFEh OR SPFEI_FLAGCHECK>
SPFEI_ALL_SR_EVENTS equ < 0001FFFFC00000000h OR SPFEI_FLAGCHECK>
SPFEI_ALL_EVENTS equ < 0EFFFFFFFFFFFFFFFh>
SPFEI MACRO SPEI_ord
local Value
Value equ <( 1ui64 SHL SPEI_ord ) OR SPFEI_FLAGCHECK>
EXITM <Value>
ENDM
sample:
qwordSPFEI MACRO SPEI_ord,Aqword
...
ENDM
Here are two simple macros. Usage:
shl64 MyQw, 3
shr64 MyQw, 3
include \masm32\include\masm32rt.inc
.686
.xmm
shl64 MACRO arg, ct
movq xmm0, arg
push ct
movd xmm1, dword ptr [esp]
psllq xmm0, xmm1
movq arg, xmm0
add esp, 4
ENDM
shr64 MACRO arg, ct
movq xmm0, arg
push ct
movd xmm1, dword ptr [esp]
psrlq xmm0, xmm1
movq arg, xmm0
add esp, 4
ENDM
.data
MyQw dq 256
.code
start:
print "MyQw, original = ", 9
print uqword$(MyQw), 13, 10
shl64 MyQw, 3
print "MyQw, shl3 = ", 9
print uqword$(MyQw), 13, 10
shr64 MyQw, 3
print "MyQw, shr3 = ", 9
print uqword$(MyQw), 13, 10
getkey
exit
end start
i guess if you are shifting left for example. then you can just shift the first dword normally. then the second dword test its most significant bit. if it is 1 then add 1 to the first dword. then shift that dword right as well. this method can be applied similarly for shifting right
Quote from: Slugsnack on June 11, 2009, 12:08:25 PM
i guess if you are shifting left for example. then you can just shift the first dword normally. then the second dword test its most significant bit. if it is 1 then add 1 to the first dword. then shift that dword right as well. this method can be applied similarly for shifting right
Can you post an example? Just use the code above as a skeleton.
How about:
shl64 macro arg,cnt
if cnt lt 32
mov eax,dword ptr [arg]
shld dword ptr [arg+4],eax,cnt
shl dword ptr [arg],cnt
else
sub eax,eax
xchg eax,dword ptr [arg]
shl eax,cnt-32
mov dword ptr [arg+4],eax
endif
endm
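The two branches of the macro above can be checked against a plain 64-bit shift. The following is only a Python model of the idea, not MASM; the names `shl64_split` and `check` are made up for illustration, and the `cnt == 0` case is handled separately because a zero count leaves both halves unchanged.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def shl64_split(lo, hi, cnt):
    """Shift the 64-bit value hi:lo left by cnt (0..63) using 32-bit halves."""
    if cnt == 0:
        return lo, hi
    if cnt < 32:
        # models "shld [arg+4], eax, cnt": the high dword receives the
        # top cnt bits of the low dword
        hi = ((hi << cnt) | (lo >> (32 - cnt))) & MASK32
        # models "shl [arg], cnt"
        lo = (lo << cnt) & MASK32
    else:
        # the low dword moves wholesale into the high dword,
        # shifted by the remaining cnt-32 bits; the low dword becomes 0
        hi = (lo << (cnt - 32)) & MASK32
        lo = 0
    return lo, hi

def check(value, cnt):
    """Run the split shift on a 64-bit value and reassemble the result."""
    lo, hi = shl64_split(value & MASK32, value >> 32, cnt)
    return (hi << 32) | lo

if __name__ == "__main__":
    print(hex(check(1, 38)))
```

In this model, `check(1, 38)` reassembles to `1 SHL 38`, the value the original question was after.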
The only trouble is that you need a qword in memory. I think ToutEnMasm wants an equate.
Quote
The only trouble is that you need a qword in memory. I think ToutEnMasm wants an equate.
I would have to guess that ToutEnMasm will have to abandon the idea of having it as an equate on a 32-bit box and settle to keep it as a qword variable in the .data section. :( :'(
Quote from: jj2007 on June 11, 2009, 12:10:38 PM
Quote from: Slugsnack on June 11, 2009, 12:08:25 PM
i guess if you are shifting left for example. then you can just shift the first dword normally. then the second dword test its most significant bit. if it is 1 then add 1 to the first dword. then shift that dword right as well. this method can be applied similarly for shifting right
Can you post an example? Just use the code above as a skeleton.
at college right now and the computers don't have masm32 installed but i imagine it might be something like this ?
mov eax, offset MyQword
mov ecx, dword ptr ds:[eax]
shl ecx, 1
mov dword ptr ds:[eax], ecx
mov ecx, dword ptr ds:[eax+4]
test ecx, 10000000h
jnz @f
mov byte ptr ds:[eax+3], 1
@@:
shl ecx, 1
mov dword ptr ds:[eax+4], ecx
that would be for big endian + unsigned. it has to be modified for little endian + 2's complement
i have a feeling i misunderstood what he wants..
gonna get back in about 20 mins and i'll recode it for little endian + 2's complement
include \masm32\include\masm32rt.inc
ShiftLeft proto :DWORD
.data?
qwNum qword ?
.code
Start:
xor ebx, ebx
invoke AllocConsole
mov edi, input("Input your qword here : ")
mov eax, a2uq(edi)
mov ecx, offset qwNum
push dword ptr [eax] ; copy the low dword
pop dword ptr [ecx]
push dword ptr [eax+4] ; copy the high dword
pop dword ptr [ecx+4]
invoke ShiftLeft, addr qwNum
print uqword$(qwNum), 13, 10
inkey
invoke FreeConsole
invoke ExitProcess, ebx
ShiftLeft proc lpqwNum:DWORD
mov eax, lpqwNum
mov ecx, [eax+4]
shl ecx, 1
mov [eax+4], ecx
mov ecx, [eax]
test ecx, 80000000h ; is the top bit of the low dword set?
jz @f
inc dword ptr [eax+4] ; yes: carry it into bit 0 of the shifted high dword
@@:
shl ecx, 1
mov [eax], ecx
ret
ShiftLeft endp
end Start
for unsigned qwords. pretty easy to change to work for signed. to make it shift more than 1 place, just call the function more times. or add a new parameter for the count.
also.. yeah i know it's coded bad. i coded it straight from how i would do it in my head for readability.
For constants:
HIDWORD = 0
LODWORD = 0
N = N AND 63
IF N LT 32
HIDWORD = VALUE SHR (32-N)
LODWORD = (VALUE SHL N) AND 0FFFFFFFFh
ELSE
HIDWORD = (VALUE SHL (N-32)) AND 0FFFFFFFFh
LODWORD = 0
ENDIF
myval label QWORD
DWORD LODWORD,HIDWORD
...but forget about full 64bit expression evaluation.
Hello,
Thanks for answer.
What I want is just a satisfactory solution to the problem, to put in the SAPI sample I posted.
Shifting dword by dword means the top bit has to be propagated into the other dword.
Quote
shit left of 10000000h need to pass 1 to the next dword
I had a look at the rol instruction to avoid using SSE, but it loses some bits.
Are you wanting an equate (EQU) or a memory qword? What context are you using this in?
Quote from: ToutEnMasm on June 12, 2009, 06:30:46 AM
Quote
shit left of 10000000h need to pass 1 to the next dword
ah, merde :lol
Quote from: sinsi on June 12, 2009, 06:43:36 AM
Quote from: ToutEnMasm on June 12, 2009, 06:30:46 AM
Quote
shit left of 10000000h need to pass 1 to the next dword
ah, merde :lol
Didn't know you speak French over there... ::)
Anyway, it can be done:
include \masm32\include\masm32rt.inc
.data
MyQw dq 0000FFFFFFFF0000h
.code
start:
print "Before: ", 9
mov eax, dword ptr MyQw
mov edx, dword ptr MyQw[4]
pushad
print xqword$(MyQw), 13, 10, "After: ", 9
popad
mov ecx, 8
.Repeat
shl edx, 1
shl eax, 1
adc edx, 0 ; yep ;-)
dec ecx
.Until Zero?
mov dword ptr MyQw, eax
mov dword ptr MyQw[4], edx
print xqword$(MyQw), 13, 10
exit
end start
:bg
Quote from: jj2007 on June 12, 2009, 06:54:31 AMDidn't know you speak French over there... ::)
Yep, we're real multicultural over here, we've got SBS.
P.S. Why the 'roll eyes'?
OK, things are clearer now.
I preferred (perhaps I am wrong about that) the shl on two dwords to the SSE solution.
Quote
Are you wanting an equate (EQU) or a memory qword? What context are you using this in?
This is to get speech recognition (SAPI) working. The sample I am following uses the instruction below.
SPFEI is a macro (SDK spi51.h or .sdk) that shifts qword-sized constants.
If it cannot be done purely in a declaration, I have no choice: I will use a qword in .data as the solution.
Quote
cpRecoCtxt->SetInterest(SPFEI(SPEI_RECOGNITION), SPFEI(SPEI_RECOGNITION));
SPEI_RECOGNITION = 38 ; SPFEI does a shl by 38
Here's a full set of choices :bg
include \masm32\include\masm32rt.inc
.686
.xmm
;; simple and non-SSE2, 29 bytes **************
shl64 MACRO arg, ct
mov eax, dword ptr arg
mov edx, dword ptr &arg&+4
shld edx, eax, ct
shl eax, ct
mov dword ptr arg, eax
mov dword ptr &arg&+4, edx
ENDM
shr64 MACRO arg, ct
mov eax, dword ptr arg
mov edx, dword ptr &arg&+4
shrd eax, edx, ct
shr edx, ct
mov dword ptr arg, eax
mov dword ptr &arg&+4, edx
ENDM
;; elegant but SSE2 only, 30 bytes **************
shl64x MACRO arg, ct
movq xmm0, arg
push ct
movd xmm1, dword ptr [esp]
psllq xmm0, xmm1
movq arg, xmm0
add esp, 4
ENDM
shr64x MACRO arg, ct
movq xmm0, arg
push ct
movd xmm1, dword ptr [esp]
psrlq xmm0, xmm1
movq arg, xmm0
add esp, 4
ENDM
;; the complicated way, non-SSE2, 37 bytes ********
shl64z MACRO arg, ct
mov ecx, ct
mov eax, dword ptr arg
mov edx, dword ptr &arg&+4
.Repeat
shl edx, 1
shl eax, 1
adc edx, 0 ; yep ;-)
dec ecx
.Until Zero?
mov dword ptr arg, eax
mov dword ptr &arg&+4, edx
ENDM
shr64z MACRO arg, ct
mov ecx, ct
mov eax, dword ptr arg
mov edx, dword ptr &arg&+4
.Repeat
shr eax, 1
shr edx, 1
jnc @F
btc eax, 31
@@: dec ecx
.Until Zero?
mov dword ptr arg, eax
mov dword ptr &arg&+4, edx
ENDM
.data
MyQw dq 0000FFFFFFFF0000h
.code
start:
print "MyQw, original = ", 9
print xqword$(MyQw), 13, 10
shl64 MyQw, 3
print "MyQw, shl3 = ", 9
print xqword$(MyQw), 13, 10
shr64 MyQw, 3
print "MyQw, shr3 = ", 9
print xqword$(MyQw), 13, 10, 10
print "MyQw, original = ", 9
print xqword$(MyQw), 13, 10
shl64x MyQw, 3
print "MyQw, shl3 = ", 9
print xqword$(MyQw), 13, 10
shr64x MyQw, 3
print "MyQw, shr3 = ", 9
print xqword$(MyQw), 13, 10, 10
print "MyQw, original = ", 9
print xqword$(MyQw), 13, 10
shl64z MyQw, 3
print "MyQw, shl3 = ", 9
print xqword$(MyQw), 13, 10
shr64z MyQw, 3
print "MyQw, shr3 = ", 9
print xqword$(MyQw), 13, 10, 10
getkey
exit
end start
marvellous :U
Hi,
Instead of;
.Repeat
shl edx, 1
shl eax, 1
adc edx, 0 ; yep ;-)
dec ecx
.Until Zero?
.Repeat
shr eax, 1
shr edx, 1
jnc @F
btc eax, 31
@@: dec ecx
.Until Zero?
Why not
.Repeat
shl eax, 1
RCL edx, 1
dec ecx
.Until Zero?
.Repeat
shr edx, 1
RCR eax, 1
dec ecx
.Until Zero?
Seems nicer to me?
Steve N.
Edit, out of order registers.
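The shl/rcl and shr/rcr pairs suggested above pass the bit that falls out of one dword into the other through the carry flag. A small Python model of that idea (not MASM; the function names are invented for this sketch) makes it easy to compare against an ordinary 64-bit shift:

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def shl64_rcl(lo, hi, cnt):
    """One bit per iteration: 'shl eax, 1' followed by 'rcl edx, 1'."""
    for _ in range(cnt):
        carry = lo >> 31                      # bit shifted out of the low dword
        lo = (lo << 1) & MASK32               # shl eax, 1
        hi = ((hi << 1) & MASK32) | carry     # rcl edx, 1 rotates the carry in
    return lo, hi

def shr64_rcr(lo, hi, cnt):
    """One bit per iteration: 'shr edx, 1' followed by 'rcr eax, 1'."""
    for _ in range(cnt):
        carry = hi & 1                        # bit shifted out of the high dword
        hi >>= 1                              # shr edx, 1
        lo = (lo >> 1) | (carry << 31)        # rcr eax, 1 rotates the carry in
    return lo, hi

def join(lo, hi):
    return (hi << 32) | lo

if __name__ == "__main__":
    v = 0x0000FFFFFFFF0000
    print(hex(join(*shl64_rcl(v & MASK32, v >> 32, 8))))
    print(hex(join(*shr64_rcr(v & MASK32, v >> 32, 8))))
```

One difference from the assembly loops: a `.Repeat ... .Until Zero?` loop entered with ecx = 0 decrements to 0FFFFFFFFh and keeps going, whereas this model simply does nothing for a count of 0.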
Not so marvellous and not so nice after all: a small problem occurs.
The SHL cannot go beyond a shift of 31, so a shift of 38 fails.
This one works:
Quote
QSHL MACRO aqword, ct
local value
IF ct GE 32
value equ ct - 31
mov eax, dword ptr aqword[0]
mov edx, dword ptr aqword[+4]
shld edx, eax, value
shl eax, value
mov dword ptr aqword[0], eax
mov dword ptr aqword[4], edx
;-------------------------
mov eax, dword ptr aqword[0]
mov edx, dword ptr aqword[+4]
shld edx, eax, 31
shl eax, 31
mov dword ptr aqword[0], eax
mov dword ptr aqword[4], edx
ELSE
mov eax, dword ptr aqword[0]
mov edx, dword ptr aqword[+4]
shld edx, eax, ct
shl eax, ct
mov dword ptr aqword[0], eax
mov dword ptr aqword[4], edx
ENDIF
ENDM
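Why a count of 38 fails with a single shld/shl pair: for 32-bit operands the CPU uses only the low 5 bits of the shift count, so 38 behaves like 6. The Python model below (a sketch with invented names, not MASM) shows that, and also what the two-step QSHL approach does.

```python
MASK32 = 0xFFFFFFFF

def shld_shl_pair(lo, hi, ct):
    """Models 'shld edx, eax, ct' + 'shl eax, ct' with the count masked to 5 bits."""
    c = ct & 31                     # the CPU ignores the upper bits of the count
    if c:
        hi = ((hi << c) | (lo >> (32 - c))) & MASK32
        lo = (lo << c) & MASK32
    return lo, hi

def qshl_two_step(lo, hi, ct):
    """Models the 'ct GE 32' branch of QSHL: shift by ct-31, then by 31."""
    lo, hi = shld_shl_pair(lo, hi, ct - 31)
    return shld_shl_pair(lo, hi, 31)

if __name__ == "__main__":
    print(shld_shl_pair(1, 0, 38))   # single pair: behaves like a shift by 6
    print(qshl_two_step(1, 0, 38))   # two steps: 7 + 31 = 38
    print(qshl_two_step(1, 0, 63))   # edge case: ct-31 = 32 is masked to 0
```

If this model is right, the two-step version is fine for counts 32 to 62, but a count of 63 would come out as a shift by 31, because the first step's count of 32 is masked to zero.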
Maybe it could be shortened a bit?
IF ct GE 32
value equ ct - 32 ; the low dword moves up by 32 bits, so only ct-32 bits remain to shift
mov edx, dword ptr aqword[0]
shl edx, value
mov dword ptr aqword[0], 0
mov dword ptr aqword[4], edx
ELSE
Quote
Maybe it could be shortened a bit?
No, I have tested it with the value 1: shl by 38, then shr by 38, and the final result is 1.
Quote from: ToutEnMasm on June 12, 2009, 03:44:19 PM
Quote
Maybe it could be shortened a bit?
No,i have tested it whith 1 38 shl , then 38 shr , finish = 1
I did not doubt that your version works, but look e.g. at this part:
mov dword ptr aqword[0], eax
mov dword ptr aqword[4], edx
;-------------------------
mov eax, dword ptr aqword[0]
mov edx, dword ptr aqword[+4]
Doesn't it look, ehm, a little bit redundant? Furthermore, shifting 32+ bits out of eax means that eax must be zero. Shifting exactly 32 bits out means edx=eax :wink
:bdg
Quote
Doesn't it look, ehm, a little bit redundant?
Not a little, really redundant. I have deleted these lines in the talkback sample I am uploading.
It could also be done with the FPU as follows as long as the shift does not result in an unsigned integer exceeding 63 bits.
finit ;should be done once at start of program
;to ensure the full 64 bits for the mantissa
fild count ;should obviously be less than 63
fild aqword ;declared as a qword
fscale
fistp aqword ;replaces the previous value by the shifted one
fstp st ;cleanup
Hello,
Is it possible to do a right shift with the FPU?
Quote from: raymond on June 13, 2009, 03:52:06 AM
It could also be done with the FPU as follows as long as the shift does not result in an unsigned integer exceeding 63 bits.
Which means it fails already for
aqword dq 1234567812345678h
and shiftcount>2
Quote from: ToutEnMasm on June 13, 2009, 04:57:30 AM
Is it possible to made a shift right with the FPU ?
Probably it works by inverting the number before and after the left shift.
But see above... the behaviour is very different from a normal shl.
Quote from: ToutEnMasm on June 13, 2009, 04:57:30 AM
Hello,
Is it possible to made a shift right with the FPU ?
Divide by 2?
Here is a snippet using the FPU. But as mentioned above, make sure the resulting value fits in a qword. There is no "shift bits out" with the FPU...
MyCt = 8
MyQw4 dq 12345678h
shl64f MyQw4, MyCt
print xqword$(MyQw4), 13, 10
shr64f MyQw4, MyCt
print xqword$(MyQw4), 13, 10
...
shl64f MACRO aqword, count
push count
fild dword ptr [esp] ; should obviously be less than 63
fild aqword ; declared as a qword
mov eax, dword ptr aqword
mov edx, dword ptr aqword[4]
fscale
fistp qword ptr aqword ; replaces the previous value by the shifted one
mov eax, dword ptr aqword
mov edx, dword ptr aqword[4]
fstp st ;cleanup
pop eax
ENDM
shr64f MACRO aqword, count
fld1
fild aqword ; declared as a qword
mov eax, dword ptr aqword
mov edx, dword ptr aqword[4]
fdivr st, st(1) ; 1/x
push count
fild dword ptr [esp] ; should obviously be less than 63
fxch
fscale
fld1
fdiv st, st(1) ; 1/x
fistp qword ptr aqword ; replaces the previous value by the shifted one
mov eax, dword ptr aqword
mov edx, dword ptr aqword[4]
REPEAT 3
fstp st ; cleanup
ENDM
pop eax
ENDM
Of course! :red
change the exponent by 1 - don't need an FPU for that
Quote from: ToutEnMasm on June 13, 2009, 04:57:30 AM
Hello,
Is it possible to made a shift right with the FPU ?
Simple. One additional opcode. However, the stored integer would be rounded according to the setting of the rounding control of the FPU (if not modified, the default is to round to the nearest integer). For example, 0A2345Dh shifted right by 4 with the FPU would return 0A2346h, as compared to shifting with the shr instruction which would return 0A2345h.
finit ;should be done once at start of program
;to insure the full 64 bits for the mantissa
fild count ;should obviously be less than 63
fchs ;make the count negative
fild aqword ;declared as a qword
fscale
fistp aqword ;replaces the previous value by the shifted one
fstp st ;cleanup
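The rounding difference described above can be reproduced with integers only. This Python sketch assumes the FPU's default rounding mode (round to nearest, ties to even) and a non-negative input; `shr_round_nearest_even` is a name invented for the example.

```python
def shr_round_nearest_even(value, n):
    """Divide a non-negative integer by 2**n, rounding to nearest (ties to even),
    as fistp does under the default rounding control."""
    if n == 0:
        return value
    q = value >> n                      # what a plain shr would return
    rem = value & ((1 << n) - 1)        # the bits that shr throws away
    half = 1 << (n - 1)
    if rem > half or (rem == half and q & 1):
        q += 1                          # round up
    return q

if __name__ == "__main__":
    x = 0x0A2345D
    print(hex(x >> 4))                          # plain shr
    print(hex(shr_round_nearest_even(x, 4)))    # FPU-style rounding
```

With 0A2345Dh the discarded nibble is 0Dh, which is more than half, so the rounded result is one higher than the truncating shr.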
Of course - I had not thought of that. Thanks, Raymond. Here are the correct macros using the FPU:
COMMENT @ Usage:
.data
MyQw4 dq 12345678h
MyCt = 8
.code
shl64f MyQw4, MyCt
print xqword$(MyQw4), 13, 10 ; immediate
shr64f MyQw4, MyCt
print xqword$(MyQw4), 13, 10
mov ecx, MyCt
shl64f MyQw4, ecx ; passed as dword
print xqword$(MyQw4), 13, 10
mov ecx, MyCt
shr64f MyQw4, cl ; passed as byte
print xqword$(MyQw4), 13, 10, 10
@
shl64f MACRO aqword, count
LOCAL oa, tmp$
oa = (opattr count) AND 127
if oa eq 36 ;; immediate
push count
else
tmp$ SUBSTR <count>, 2
ifidni tmp$, <L>
movzx eax, count
push eax
else
push count
endif
endif
fild dword ptr [esp] ; should obviously be less than 63
fild aqword ; declared as a qword
fscale
fistp qword ptr aqword ; replaces the previous value by the shifted one
fstp st ;cleanup
pop eax
ENDM
shr64f MACRO aqword, count
LOCAL oa, tmp$
oa = (opattr count) AND 127
if oa eq 36 ;; immediate
push -count
else
tmp$ SUBSTR <count>, 2
ifidni tmp$, <L>
movzx eax, count
neg eax
push eax
else
neg count
push count
endif
endif
fild dword ptr [esp] ;; should obviously be less than 63
fild aqword ;; declared as a qword
fscale
fistp qword ptr aqword ;; replaces the previous value by the shifted one
fstp st ;; cleanup
pop eax
ENDM
Nonetheless, it seems faster and easier without the FPU.
Hello,
With the FPU, a shift has to be done as a division (right shift) or a multiplication (left shift) by a power of 2.
divider = 1 * 2^(number of shifts)
then
result = number / divider
The first job is to turn the shift count into an operand (a divider or a multiplier).
With a shift count of two, the operand is 4, that is, 1 shifted left by 2.
That gives the formula for the operand: operand = 2^(number of shifts)
This needs to be verified.
Quote from: ToutEnMasm on June 14, 2009, 07:42:25 AM
With the fpu , a shift need to be a division (right) or a multiply (left) by a power of 2.
FSCALE
Scales by powers of 2 by calculating the function Y = Y * 2^X. X is the scaling factor taken from ST(1), and Y is the value to be scaled from ST. The scaled result replaces the value in ST; the scaling factor remains in ST(1). If the scaling factor is not an integer, it will be truncated toward zero before the scaling.
Nonetheless, a 64 bit shift should be done with shld, not with the FPU - full code attached.
; simplest version:
shl64 MyQw3, 24 ; pass qword and immediate shift count
print xqword$(MyQw3), 9, "shl64 24", 13, 10
shr64 MyQw3, 24 ; back to original value
print xqword$(MyQw3), 9, 9, "shr64 24", 13, 10, 10, "Shift 40 tests:", 13, 10
shiftLeft = 40
shiftRight = 40
print "Original value: ", 13, 10
print xqword$(MyQw1), 13, 10
mov eax, offset MyQw1 ; pass ptr to qword in a register
shl64 eax, shiftLeft
print xqword$(MyQw1), 13, 10
mov eax, offset MyQw1
shr64 eax, shiftRight
print xqword$(MyQw1), 13, 10
mov ecx, shiftLeft
shl64 MyQw2, cl ; pass qword directly, shiftcount in byte register
print xqword$(MyQw2), 13, 10
mov ecx, shiftRight
shr64 MyQw2, cl
print xqword$(MyQw2), 13, 10
shl64 offset MyQw3, shiftLeft ; pass offset qword, shiftcount as immediate
print xqword$(MyQw3), 13, 10
shr64 offset MyQw3, shiftRight
print xqword$(MyQw3), 13, 10, 10
[attachment deleted by admin]
Hello,
This one works without problems
for shift counts 0 to 63.
Here a shift of 48 returns 1; any greater count returns 0.
Quote
MyQw qword 0000FFFFFFFF0000h
NumberOfShit equ 48
mov eax,NumberOfShit
push eax
fild dword ptr [esp]
pop eax
fld1
fscale
fild qword ptr MyQw
fdiv st(0),st(1)
fistp qword ptr MyQw
finit
Quote
NumberOfShit equ 48
mov eax,NumberOfShit
i found a problem Yves
where's the "f" ?
Quote from: dedndave on June 15, 2009, 08:06:09 AM
Quote
NumberOfShit equ 48
mov eax,NumberOfShit
i found a problem Yves
where's the "f" ?
Oh Shift!!(but it's not the only problem...)
If you want,you can change it by NombreDeDecalages.
:P
that's ok - lol
i knew what you meant - good thing the thread title is right, though
i used to live in a small desert town named Bouse, Arizona - lol - Yves will get that
Quote from: ToutEnMasm on June 15, 2009, 07:46:50 AM
Hello,
This one is without problem
shift 0 to 63
I have tested it with other numbers and get strange results:
1234567812345678 original value
3456781234567800 shl64 MyQw1, 8
0012345678123456 fpu, ToutEnMasm
1234567812345678 original value
5678123456780000 shl64 MyQw1, 16
0000123456781234 fpu, ToutEnMasm
1234567812345678 original value
7812345678000000 shl64 MyQw1, 24
0000001234567812 fpu, ToutEnMasm
1234567812345678 original value
1234567800000000 shl64 MyQw1, 32
8000000000000000 fpu, ToutEnMasm
1234567812345678 original value
3456780000000000 shl64 MyQw1, 40
8000000000000000 fpu, ToutEnMasm
1234567812345678 original value
5678000000000000 shl64 MyQw1, 48
8000000000000000 fpu, ToutEnMasm
1234567812345678 original value
7800000000000000 shl64 MyQw1, 56
8000000000000000 fpu, ToutEnMasm
Initially, the algo does a right shift, then it gets stuck... :(
Testbed attached, including a significant change to the shl64 and shr64 macros: You can now optionally supply a destination qword.
Quote
shl64 MyQw1, 3 ; qword, immediate counter
mov ecx, 3
shl64 MyQw1, cl ; qword, reg8 counter
mov edx, offset MyQw1 ; ptr to src in edx
shl64 edx, 3 ; ptr, imm8 counter
mov eax, offset MyQw1 ; ptr to src in eax
mov ecx, 3
shl64 eax, cl ; ptr, reg8 counter
shl64 MyQw1, 3, MyQwRes ; result to MyQwRes
mov eax, offset MyQw1 ; ptr to src in eax
mov ecx, 3
shl64 eax, cl, MyQwRes ; result to MyQwRes
EDIT: shl in example replaced with shr, fpu stack saved - see new attachment
[attachment deleted by admin]
Bad test.
The method cannot fail unless the FPU isn't in a normal state. Perhaps a finit is missing beforehand, or something else such as a bad print.
I do not get the same results as you.
Also try comparing a right shift with a right shift; that will be clearer.
Quote from: ToutEnMasm on June 15, 2009, 03:16:51 PM
Bad test.
The method couldn't failed ecxept if the FPU isn't in a normal state.Perhaps a finit missing before or other thing like bad print.
I have not the same results as you.
Try also to compare a right shift with a right shift,this will be more clear.
OK, I replaced shl64 with shr64, see attachment above. Your code still fails, simply because it trashes the FPU. You have to clean up:
fistp qword ptr MyQwRes
fstp st
fstp st
With this little change, it works correctly, but it needs 32 instead of 25 bytes and is a lot slower, too.
Testing qword shift macros:
1234567812345678 original value
0012345678123456 shr64 MyQw1, 8, MyQwRes
0012345678123456 fpu, ToutEnMasm
1234567812345678 original value
0000123456781234 shr64 MyQw1, 16, MyQwRes
0000123456781234 fpu, ToutEnMasm
1234567812345678 original value
0000001234567812 shr64 MyQw1, 24, MyQwRes
0000001234567812 fpu, ToutEnMasm
1234567812345678 original value
0000000012345678 shr64 MyQw1, 32, MyQwRes
0000000012345678 fpu, ToutEnMasm
1234567812345678 original value
0000000000123456 shr64 MyQw1, 40, MyQwRes
0000000000123456 fpu, ToutEnMasm
1234567812345678 original value
0000000000001234 shr64 MyQw1, 48, MyQwRes
0000000000001234 fpu, ToutEnMasm
1234567812345678 original value
0000000000000012 shr64 MyQw1, 56, MyQwRes
0000000000000012 fpu, ToutEnMasm
Code sizes:
fpu 32
alu 25
Sorry, but it is your code that is responsible. It leaves the FPU in an indeterminate state. Add FINIT before the call to my routine and all will be OK, except for yours. My routine ends with a finit, so it cannot trash anything; surely a bad copy.
Quote from: ToutEnMasm on June 15, 2009, 04:39:39 PM
Sorry , it is your code who is responsible .He leave the fpu in an undeterminate state.Add FINIT before the call of my routine,and all wil be ok,except for yours.The end of my routine is a finit,could'nt trashes anyting,surely a bad copy.
When you launch a program, you get a fresh FPU. You can use finit to initialise it, but it is not necessary.
Your code leaves two registers (ST0, ST1) with valid entries on the stack. When you call it a second time, 4 registers are left valid. After four calls, it fails because ST0 is no longer in a free state.
EDIT: More precisely, in call #4, fild qword ptr MyQw1 fails because ST7 is not empty. You can watch that in OllyDbg (http://www.ollydbg.de/version2.html).
Of course, you can use finit for each call, but first, it is horribly slow, and second, other code parts might use the FPU, too, so it is not good programming practice to trash it with a low level instruction such as a shift.
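The stack argument above can be followed with a simple counter. The Python sketch below only models how many of the eight x87 registers are occupied after each instruction of the quoted routine (without its trailing finit); the class and function names are invented for illustration, and the model raises an exception where real hardware would flag a stack fault.

```python
class FpuStackModel:
    """Counts occupied x87 registers; the real stack has eight (ST0..ST7)."""
    SIZE = 8

    def __init__(self):
        self.depth = 0

    def push(self):
        if self.depth == self.SIZE:
            raise OverflowError("x87 stack overflow")
        self.depth += 1

    def pop(self):
        if self.depth == 0:
            raise IndexError("x87 stack underflow")
        self.depth -= 1

def run_routine(fpu, cleanup_pops):
    fpu.push()      # fild dword ptr [esp]   (shift count)
    fpu.push()      # fld1
                    # fscale: result replaces ST0, depth unchanged
    fpu.push()      # fild qword ptr MyQw
                    # fdiv st(0), st(1): depth unchanged
    fpu.pop()       # fistp qword ptr MyQw
    for _ in range(cleanup_pops):
        fpu.pop()   # fstp st

def first_failing_call(cleanup_pops, max_calls=100):
    """Return the number of the first call that overflows, or None."""
    fpu = FpuStackModel()
    for call in range(1, max_calls + 1):
        try:
            run_routine(fpu, cleanup_pops)
        except OverflowError:
            return call
    return None

if __name__ == "__main__":
    print(first_failing_call(0))   # no cleanup
    print(first_failing_call(2))   # two fstp st after fistp
```

Without cleanup, each call leaves two entries behind, so the fourth call starts with six occupied registers and its third load no longer fits. On real hardware with exceptions masked, such a load produces the indefinite value rather than stopping the program, which would be consistent with the 8000000000000000 results reported earlier in the thread.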
EDIT: Just to give you an idea of the difference in speed:
20 cycles for 10*shr64
584 cycles for 10*FPU, fstp*2
1089 cycles for 10*FPU, finit
I love the FPU, but it's just not a good idea to use it for a 64-bit shift...
Seems you need to review it.
Quote
MyQw qword 0000FFFFFFFF0000h
NumberOfShit equ 48
mov eax,NumberOfShit
push eax
fild dword ptr [esp]
pop eax
fld1
fscale
fild qword ptr MyQw
fdiv st(0),st(1)
fistp qword ptr MyQw
finit
The finit is there to leave the FPU in its original state. The routine can be repeated without problems.
Quote
Your code leaves two registers (ST0, ST1) with valid entries on the stack. When you call it a seond time, 4 registers are left valid. After four calls, it fails because ST0 is no longer in a free state.
NOT TRUE
Hey jj, quit worrying so much about the absolute fastest speed. You are obsessed, and you expect everyone else to be the same way. I don't worry about code speed unless I have to.
Using finit is a good idea (most of the time).
Quote from: Greg on June 15, 2009, 06:41:25 PM
Hey jj, quit worrying so much about the absolute fastest speed. You are obsessed, and you expect everyone else to be the same way. I don't worry about code speed unless I have to.
Using finit is a good idea (most of the time).
Greg, I may seem obsessed, but where is the point in choosing an algo that is 20% to 50% longer and a factor 55 slower? Sure, you can use Visual Basic too, but after all, why are we here, in an assembler forum?
> The finit is here to leave the fpu in is original state.
Noobs might actually believe this, therefore I correct it:
finit initialises the FPU, i.e. everything that happened to be in the eight ST registers is gone. That has nothing to do with "original state".
i think it is always good to know the fastest way to do things
that is one of the things that makes writing in assembler unique from all other languages
we all understand that not every routine has to be written for speed
but, when it comes time to speed up a repetitive task, the forum is a good place to search
each algo they research adds one more chapter to the reference book
Quote
When you launch a program, you get a fresh FPU.
TRUE
Quote
You can use finit to initialise it, but it is not necessary.
TRUE
HOWEVER, (at least under Windows) if you don't use finit, the Precision Control is set to double-precision, i.e. 64 bits with 53 bits for the mantissa. Using finit sets the PC bits of the Control Word to extended double-precision, i.e. the full 80-bit precision.
Although finit is slow, it can be done as the first instruction at the start of the program and will be performed in parallel with the other initializing code which does not need the FPU. Then you simply keep the FPU registers clean.
Hello,
Quote
When you launch a program, you get a fresh FPU.
FALSE, the status word isn't clear, and this can produce wrong results. If you want, I can post a visual sample of this.
A FINIT at the start of a program is very good practice.
Quote from: ToutEnMasm on June 16, 2009, 08:41:30 AM
Hello,
Quote
When you launch a program, you get a fresh FPU.
FALSE,the status word isn't clear and this can made false result.If you want i can post a visual sample off this.
Go check in Olly. The status word is clear at startup; the precision control field of the control word is set to 53 bits of mantissa, i.e. a slightly reduced precision. With finit, you set it to the full 64 bits of precision.
Quote
The FINIT at the start of a program is a very good practice.
Yes indeed. I am glad that you gave up the idea of trashing the FPU after each and every call. I am afraid, however, that it does not solve the inherent problem of using the FPU for a 64-bit shift. It works for certain values but fails for others, irrespective of the FPU precision setting. Higher precision just means that you can use somewhat higher source figures.
Results below are for full 64-bit precision, and a correctly balanced FPU stack.
Quote
1234567812345678 original value 1
0123456781234567 shr64 MyQw1, 4, MyQwRes
0123456781234568 shr64TeM MyQw1, 4, MyQwRes <<<<<<<<<<<< almost OK
8FFFFFFFFFFFFFFF original value 2
08FFFFFFFFFFFFFF shr64 MyQw2, 4, MyQwRes
F900000000000000 shr64TeM MyQw2, 4, MyQwRes <<<<<<<<<<<< ????
1234567812345678 original value 1
0012345678123456 shr64 MyQw1, 8, MyQwRes
0012345678123456 shr64TeM MyQw1, 8, MyQwRes <<<<<<<<<<<< OK
8FFFFFFFFFFFFFFF original value 2
008FFFFFFFFFFFFF shr64 MyQw2, 8, MyQwRes
FF90000000000000 shr64TeM MyQw2, 8, MyQwRes <<<<<<<<<<<< ????
Code sizes:
fpu 32
alu 25
Timings (Prescott P4):
86 cycles for 10 * shl64, non-FPU
952 cycles for 10 * shl64 FPU, 2*fstp ST
13865 cycles for 10 * shl64 FPU, finit
Full code attached.
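The "????" rows above are what one would expect if the qword is loaded as a signed integer and the quotient is rounded to nearest. The Python model below is a sketch under those two assumptions (fild treats the qword as signed; default round-to-nearest-even at full 64-bit precision); `fpu_shr_model` is an invented name, not one of the macros in the attachment.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def fpu_shr_model(bits, n):
    """Model of 'load qword as signed, divide by 2**n, store rounded to nearest'."""
    # fild qword: bit 63 is the sign bit
    v = bits - (1 << 64) if bits >> 63 else bits
    mag = abs(v)
    q = mag >> n
    if n > 0:
        rem = mag & ((1 << n) - 1)
        half = 1 << (n - 1)
        if rem > half or (rem == half and q & 1):
            q += 1                      # round to nearest, ties to even
    result = -q if v < 0 else q
    return result & MASK64              # bit pattern that fistp would store

if __name__ == "__main__":
    for value in (0x1234567812345678, 0x8FFFFFFFFFFFFFFF):
        for n in (4, 8):
            print(f"{value:016X} >> {n}: shr {value >> n:016X}  "
                  f"model {fpu_shr_model(value, n):016X}")
```

Under these assumptions the model gives the same bit patterns as the shr64TeM rows in the table, including the off-by-one for a shift of 4 (a tie rounded to even) and the F9.../FF9... results for the value with bit 63 set.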
[attachment deleted by admin]
If you don't want to review your test, just post clear code that can be easily understood.
Quote from: ToutEnMasm on June 16, 2009, 10:08:25 AM
If don't want to review your test,just put a clear code hat can be clearly understanding
Yves, before complaining about the lack of clarity of my code, you might at least look at it: Download counter is at 1, and that was my own test download. Anyway, here is the macro of your code, in case you want to verify yourself:
shr64TeM MACRO qwarg, NumberOfShifts, destarg
LOCAL dest
ifb <destarg>
dest equ qwarg
else
dest equ destarg
endif
mov eax, NumberOfShifts
push eax
; int 3 ; set a breakpoint for Olly
fild dword ptr [esp]
pop eax
fld1
fscale
fild qword ptr qwarg
fdiv st(0),st(1)
fistp qword ptr dest
; mov eax, dword ptr dest ; check in Olly what
; mov edx, dword ptr dest[4] ; happens to your FPU
fstp st ; balance the
fstp st ; FPU stack correctly
; finit ; not a good idea
ENDM
Usage:
shr64TeM MyQw1, 40 [, MyDestinationQword]
It is not very difficult to see where the problem is: the finit at the end is commented out.
You also don't show how it is used.
Clear code comes with .code, start: and ExitProcess.
Anything else is a waste of time.
Quote from: ToutEnMasm on June 16, 2009, 10:39:57 AM
Not very difficult to see where is the problem,the finit is put in comment at the end.
You don't show also how it's use.
A clear code is with .code start: and exitprocess
Other thing is lose of time
You are inconsistent: Just a few posts above, you had agreed that once is enough. Which is what I did.
Suggestion: Uncomment the finit, assemble and run. Post the results here. I am really curious if your CPU produces a different result.
Quote
You are inconsistent: Just a few posts above, you had agreed that once is enough. Which is what I did.
Test my routine as it is, not how you want it to be. It also seems that you assert many false things.
Quote
you had agreed that once is enough
Where is it??? The rule is: on entry to the routine, the FPU is in a normal state; at the end it must be in a normal state. These two conditions must be satisfied.
The routine I posted has been seriously tested. I repeat: use it as it is and under normal conditions. It works perfectly.
Quote from: ToutEnMasm on June 16, 2009, 12:25:07 PM
The routine i have posted have been tested seriously.I repeat use it as it is and in normal conditions.She works perfectly.
Great. So what do you get when you shift 8FFFFFFFFFFFFFFFh one nibble (=4 bits) to the right?
08FFFFFFFFFFFFFFh like for my routine, or something else?
I say that is the first useful thing you have said in a while.
The qword is interpreted as a negative number, and that shows the limits of the function.
That is a clear thing to say, no?
Quote from: ToutEnMasm on June 16, 2009, 02:11:16 PM
i say that is the first thing useful you say since a while.
The qword is interpreted as a negative number and show the limits of the function.
That is something clear to say,Not ?
OK, try again with MyQw2s dq 00fffffffffffffffh (positive)
(http://www.welcometopixelton.com/wp-content/uploads/2007/11/funny-pictures-of-cats-dot-info-011.jpg)
Seriously, lighten up gentlemen. Life's too short to waste on arguing. "Joie de vivre!"
Oh! A little cat.
Perhaps he wants to go to another post.
Quote
http://www.masm32.com/board/index.php?topic=11673.0