Hi to all!
This is my first post. I have a problem with a MASM routine called from Fortran. I started from a working example and modified it: the goal was to start using SSE2 to make the Fortran routine run faster.
I use Microsoft (R) Macro Assembler Version 8.00.50727.42 to assemble the asm source.
The assembler reports no errors, and the Fortran code also seems right.
At run time, however, I get this error message:
forrtl: severe (157): Program Exception - access violation
I think that I'm doing something wrong when accessing the parameters in the asm routine, but I don't know where the error is. This is the asm code:
.586
.MODEL FLAT
.mmx
.xmm
.CODE
PUBLIC _inprod@24
_inprod@24 PROC
SUB ESP,16 ; VALUE SUBTRACTED IS 4 * (5 - NUMBER OF PUSHES NEEDED)
PUSH EBX ; PRESERVE EBX
MOV EDX, DWORD PTR 16[ESP] ; REF TO A (FIRST ARGUMENT)
MOV EAX, DWORD PTR 20[ESP] ; REF TO B (SECOND ARGUMENT)
MOV EBX, DWORD PTR 24[ESP] ; REF TO CC (THIRD ARGUMENT)
MOV ECX, DWORD PTR 28[ESP] ; N (FOURTH ARGUMENT, PASSED BY VALUE)
MOV ESI, DWORD PTR 32[ESP] ; REF TO N3 (FIFTH ARGUMENT)
MOV EDI, DWORD PTR 36[ESP] ; REF TO N2 (SIXTH ARGUMENT)
movapd xmm0,REAL8 PTR [EDX]
movapd xmm1,[EAX]
movapd xmm2,REAL8 PTR [EDX+32]
movapd xmm3,REAL8 PTR [EAX+32]
mulpd xmm0,xmm1
movapd [EDI],xmm0
POP EBX ; RESTORE EBX THAT WAS PRESERVED ABOVE
ADD ESP,16 ; VALUE MUST CORRESPOND WITH INITIAL ESP SUBTRACTION.
RET 24 ; NUMBER OF BYTES IN THE ARGUMENTS (SAME AS DECORATION PART OF THE PROCEDURE NAME)
_inprod@24 ENDP
END
and this is the fortran .for file:
!dec$ attributes STDCALL, ALIAS:'_inprod@24' :: inprod
interface
subroutine inprod (a, b, cc, n, n3, n2)
!dec$ attributes REFERENCE :: a
real*8 a(4)
real*8 b(4)
!dec$ attributes REFERENCE :: cc
real*8 cc
integer*4 n
real*8 n3
!dec$ attributes REFERENCE :: n2
real*8 n2(2)
end subroutine
end interface
real*8 a(4)
real*8 b(4)
real*8 cc
integer*4 n
real*8 n3
real*8 n2(2)
data a /2.,2.,3.,4./
data b /1.,2.,3.,4./
cc = 0.
n = 3
n3 = 4.5
data n2 /4.,6./
call inprod (a,b,cc,n,n3,n2)
write(*,*) cc
write(*,*) n2
end
I hope someone can help me! :U
Thanks in advance!
Try preserving 'esi' and 'edi' as well as ebx - it's the usual convention.
My other thought would be to check that you're accessing the arguments correctly.
The other thing to check is that extra stack cleanup code isn't being inserted - it's generally done automatically when you use proc - since you're handling it yourself, that could mess you up a little.
Insert..
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
..on the line before the proc, and..
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
..after endp (the latter shouldn't really be needed - it just restores the default.)
One possibility is that the SIMD instructions are accessing memory operands that are not aligned on 16-byte boundaries.
Thank you all for your advice.
I've modified the code following Tedd's advice, so I now push and pop EDI and ESI as well, but I'm getting the same error...
Quote from: MichaelW on October 25, 2007, 01:16:59 PM
One possibility is that the SIMD instructions are accessing memory operands that are not aligned on 16-byte boundaries.
From the Intel manual describing the basic architecture (253665), it seems that using an "aligned" SIMD instruction on unaligned operands (instead of the unaligned form), and vice versa, only results in a speed penalty...
From IA-32 Intel® Architecture Software Developer's Manual
Volume 2A: Instruction Set Reference, A-M (25366615):
MOVAPD Move Aligned Packed Double-Precision Floating-Point Values
...
Protected Mode Exceptions
#GP(0)
For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.
If a memory operand is not aligned on a 16-byte boundary, regardless of segment.
You could substitute MOVUPD, which does not have the alignment requirement, and comment out the MULPD, and see if that, in combination with Tedd's suggestions, eliminates the access violation.
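The alignment rule quoted from the manual can be checked on the caller's side before the routine is ever entered. What follows is only a minimal sketch in C (not MASM, so that it can be compiled and run anywhere); `is_aligned16` and `pad_to_16` are hypothetical helper names, not part of any library mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p lies on a 16-byte boundary, which is what MOVAPD
   requires of a memory operand; returns 0 otherwise. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Number of bytes (0..15) to skip from p to reach the next
   16-byte boundary. */
static size_t pad_to_16(const void *p)
{
    size_t pad = (size_t)((16u - ((uintptr_t)p & 15u)) & 15u);
    assert(is_aligned16((const char *)p + pad));
    return pad;
}
```

Note that a real*8 element is only 8 bytes long, so even when the first element of an array is 16-byte aligned, every second element is not: an address 8 bytes past an aligned one fails the check, while 16 bytes past it passes.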
I tried replacing the MOVAPD instructions with MOVUPD, but the result is the same...
MULPD also requires that the memory operands be aligned, and AFAIK there is no unaligned form, and that was why I suggested that you comment it out for the test.
SUB ESP,16 ; VALUE SUBTRACTED IS 4 * (5 - NUMBER OF PUSHES NEEDED)
PUSH EBX ; PRESERVE EBX
MOV EDX, DWORD PTR 16[ESP] ; REF TO A (FIRST ARGUMENT)
I am wondering how this can possibly access your first argument...
When your procedure is called, esp + 4 points to your first argument. Then you subtract 16 from esp, so your first argument is now at esp + 20. Then you push ebx, which subtracts another 4 from esp, so the first argument is now at esp + 24.
Edited to add:
If your calculations here are incorrect and the parameters are pointers, then you are pretty much guaranteed to get access violations
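The offset arithmetic in this post can be written down as one small formula. The sketch below is in C purely so the numbers can be checked; `arg_offset` is a hypothetical name, and it assumes a 32-bit routine with no frame pointer in which every argument occupies exactly 4 bytes on the stack.

```c
#include <assert.h>

/* Offset from the current ESP of the k-th stack argument (k = 1 is the
   first argument).
     local_bytes: bytes reserved so far with SUB ESP,n
     pushes:      number of 4-byte PUSH instructions executed so far
   The return address sits at the original [ESP], so the first argument
   starts 4 bytes above it. Assumes every argument takes 4 bytes. */
static int arg_offset(int k, int local_bytes, int pushes)
{
    assert(k >= 1 && local_bytes >= 0 && pushes >= 0);
    return 4 + local_bytes + 4 * pushes + 4 * (k - 1);
}
```

With SUB ESP,16 and one PUSH, `arg_offset(1, 16, 1)` gives 24, which matches the esp + 24 worked out above rather than the 16[ESP] used in the routine.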
Quote from: MichaelW on October 26, 2007, 12:03:20 AM
MULPD also requires that the memory operands be aligned, and AFAIK there is no unaligned form, and that was why I suggested that you comment it out for the test.
In my last post I didn't mention that I had also commented out MULPD.
Quote from: Rockoon on October 26, 2007, 05:14:26 AM
SUB ESP,16 ; VALUE SUBTRACTED IS 4 * (5 - NUMBER OF PUSHES NEEDED)
PUSH EBX ; PRESERVE EBX
MOV EDX, DWORD PTR 16[ESP] ; REF TO A (FIRST ARGUMENT)
I am wondering how this can possibly access your first argument...
When your procedure is called, esp + 4 points to your first argument. Then you subtract 16 from esp, so your first argument is now at esp + 20. Then you push ebx, which subtracts another 4 from esp, so the first argument is now at esp + 24.
Edited to add:
If your calculations here are incorrect and the parameters are pointers, then you are pretty much guaranteed to get access violations
Now I'll try to modify the code following your advice.
OK! As Rockoon said, I was using the wrong stack offsets when loading the argument addresses into my 32-bit registers.
I was also wrong about how the real*8 n3 is passed. I thought (I read the Compaq Fortran manual too quickly) that arguments whose type is longer than 4 bytes (n3 is 8 bytes long) were passed by reference by default. This is true only if you use the default calling convention. If you use the C or STDCALL option (as in my case), the default changes to passing almost all data by value, except arrays. So the variable "n3" was passed by value and occupied 8 bytes on the stack.
Here is the code that seems to work!
First the Fortran .for file (note: now I force "n3" to be passed by REFERENCE):
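To make the consequence concrete: the @24 decoration and RET 24 only match if the arguments really add up to 24 bytes. The C sketch below just does that bookkeeping. `struct arg` and `stack_bytes` are hypothetical names, and it assumes that a by-reference argument pushes a 4-byte address and that a by-value argument is padded to a multiple of 4 bytes, as is usual for 32-bit STDCALL.

```c
#include <assert.h>
#include <stddef.h>

struct arg {
    int    by_reference;  /* nonzero: a 4-byte address is pushed      */
    size_t value_size;    /* size in bytes when passed by value       */
};

/* Total bytes of arguments on the stack, i.e. the number that belongs
   after the '@' in the decorated name and in the RET instruction. */
static size_t stack_bytes(const struct arg *args, size_t count)
{
    size_t total = 0;
    assert(args != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (args[i].by_reference)
            total += 4;
        else
            total += (args[i].value_size + 3) / 4 * 4;  /* round up to 4 */
    }
    return total;
}
```

Under these assumptions the first interface (a, b, cc and n2 by reference, n as a 4-byte value, n3 as an 8-byte value) comes to 28 bytes, not 24, and everything after n3 sits 4 bytes higher than the routine expected. With n3 forced to REFERENCE the total is 24 again.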
!dec$ attributes STDCALL, ALIAS:'_inprod@24' :: inprod
interface
subroutine inprod (a, b, cc, n, n3, n2)
!dec$ attributes REFERENCE :: a
real*8 a(4) !split.v_retta(1)
real*8 b(4) !parz_res(I_PATH).COORD(1,1)
c Note the following is necessary if a result is desired in a scalar argument.
c In this case, with the STDCALL conventions, a and b are arrays so they are
c automatically passed by reference, but cc is an 8-byte scalar (real*8) so
c would normally be passed by value without the following line.
!dec$ attributes REFERENCE :: cc
real*8 cc !parz_res(I_PATH+1).V_INC(1)
integer*4 n !parz_res(I_PATH).N_VERT
!dec$ attributes REFERENCE :: n3
real*8 n3 !parz_res(I_PATH).COORD(1,I_FIN)
!dec$ attributes REFERENCE :: n2
real*8 n2(2) !parz_res(I_PATH).NORM_EST(1,1)
end subroutine
end interface
real*8 a(4)
real*8 b(4)
real*8 cc
integer*4 n
real*8 n3
real*8 n2(2)
data a /2.,2.,3.,4./
data b /1.,2.,3.,4./
cc = 0.
n = 3
n3 = 4.5
data n2 /4.,6./
call inprod (a,b,cc,n,n3,n2)
write(*,*) cc
write(*,*) n2
end
and then the ASM code (it also seems to work with MOVUPD):
.586
.MODEL FLAT
.mmx
.xmm
.CODE
; Note the suffix integer on the name (decoration) must be same as number
; of bytes in the arguments.
; In this case there are 6 arguments with 4 bytes each, hence 24.
PUBLIC _inprod@24
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
_inprod@24 PROC
SUB ESP,4 ; VALUE SUBTRACTED IS 4 * (5 - NUMBER OF PUSHES NEEDED)
PUSH EBX ; PRESERVE EBX
PUSH ESI ; PRESERVE ESI
PUSH EDI ; PRESERVE EDI
PUSH ECX ; PRESERVE ECX
MOV EDX, DWORD PTR 24[ESP] ; REF TO A (FIRST ARGUMENT)
MOV EAX, DWORD PTR 28[ESP] ; REF TO B (SECOND ARGUMENT)
MOV EBX, DWORD PTR 32[ESP] ; REF TO CC (THIRD ARGUMENT)
MOV ECX, DWORD PTR 36[ESP] ; N (FOURTH ARGUMENT, PASSED BY VALUE)
MOV ESI, DWORD PTR 40[ESP] ; REF TO N3 (FIFTH ARGUMENT)
MOV EDI, DWORD PTR 44[ESP] ; REF TO N2 (SIXTH ARGUMENT)
movapd xmm0,REAL8 PTR [EDX]
movapd xmm1,REAL8 PTR [EAX]
movapd xmm2,REAL8 PTR [EDX+16] ; A(3),A(4) ARE AT OFFSET 16 (OFFSET 32 IS PAST THE END OF A)
movapd xmm3,REAL8 PTR [EAX+16] ; B(3),B(4)
mulpd xmm0,xmm1
movapd [EDI],xmm0
POP ECX ; RESTORE ECX THAT WAS PRESERVED ABOVE
POP EDI ; RESTORE EDI THAT WAS PRESERVED ABOVE
POP ESI ; RESTORE ESI THAT WAS PRESERVED ABOVE
POP EBX ; RESTORE EBX THAT WAS PRESERVED ABOVE
ADD ESP,4 ; VALUE MUST CORRESPOND WITH INITIAL ESP SUBTRACTION.
RET 24 ; NUMBER OF BYTES IN THE ARGUMENTS (SAME AS DECORATION PART OF THE PROCEDURE NAME)
_inprod@24 ENDP
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
END
Thank you very much for all your advice!! :clap:
I hope that I won't have any other problems with the rest of the asm code in the routine. :U
I also hope that this topic will be useful for other "mad boys" who try to optimize Fortran code by calling an ASM routine.
As I continue writing code, I have a question about addressing modes.
For example, in Fortran I have a two-dimensional array: real*8 b(3,40)
Memory layout of b is:
b(1,1)
b(2,1)
b(3,1)
b(1,2)
b(2,2)
b(3,2)
...
b(1,40)
b(2,40)
b(3,40)
This code doesn't assemble:
- the assembler doesn't recognize the jumps;
- do you think that I'm addressing the data correctly?
mov esi,0 ; initialisation of counter I (=0) to reproduce the fortran statement DO I=1,N .... END DO
dec ecx ; value of N-1 in ecx
Loop: ; start the loop
mov eax,esi ; this represents an IF statement to know the value of a variable
inc eax
cmp si,cx
jne Endif
mov eax,0
Endif:
mov ebx,24 ; 24 => each column is composed by 3 real*8 value (8byte each one)
mul ebx
movupd xmm4,[edi+eax] ; edi contains the address of b(1,1) (eax=0 => 1st column, eax=24 => 2nd column,...)
movsd xmm5,REAL8 PTR [edi+eax+16] ; 16 => to load the third value of the column
mov eax,esi
mul ebx
movupd xmm6,[edi+eax]
movsd xmm7,REAL8 PTR [edi+eax+16]
subpd xmm6,xmm4
subsd xmm7,xmm5
movupd [ebp],xmm6
movsd REAL8 PTR 16[ebp],xmm7
inc esi
cmp esi,ecx
jne Loop
I found the errors: I had used reserved names as label names.
Now I'm stuck on a run-time error, an access violation.
This code works well:
subpd xmm6,xmm4
subsd xmm7,xmm5
movupd [ebp],xmm6
movsd REAL8 PTR 16[ebp],xmm7
whereas in this code the last "movsd" instruction causes the run-time error:
mulpd xmm2,xmm0
movupd xmm6,xmm2
movsd REAL8 PTR [edx+eax+16],xmm2
I don't understand why...
The only difference that seems relevant to me is that ebp points to the first element of a vector (a(3)) and edx points to the first element of a two-dimensional array (n2(3,40)). Do you think that I have to access the two-dimensional array in another way?
probably edx, eax, or both, are wrong.
..and why add 16 if you are using pointers?
Because I want to write to the third element of column I.
For example, if I=3 (4th column), eax = 24 (3 double-precision values per column) * 3 (value of I), so [edx+eax] points to the first element of the 4th column (n2(1,4) in the example).
To access n2(3,4) I add 16 to [edx+eax].
Do you think I'm wrong?
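For what it's worth, the arithmetic in this post agrees with Fortran's column-major layout. Here is a small C sketch of the general formula, only to check the numbers; `colmajor_offset` is a hypothetical name, and i and j are the 1-based Fortran indices (the asm counter I in the post is 0-based, so I=3 corresponds to j=4).

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of element (i, j) of a column-major array with `rows`
   rows, measured from element (1, 1). Indices are 1-based, as in
   Fortran. Elements of one column are contiguous; consecutive columns
   are rows * elem_size bytes apart. */
static size_t colmajor_offset(int i, int j, int rows, size_t elem_size)
{
    assert(i >= 1 && i <= rows && j >= 1);
    return ((size_t)(j - 1) * (size_t)rows + (size_t)(i - 1)) * elem_size;
}
```

For n2(3,40) with 8-byte elements, `colmajor_offset(3, 4, 3, 8)` is 88, which is the 24*3 + 16 computed above, so the offset itself looks right and the base register is the more likely suspect.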
Solution: a previous MUL instruction had overwritten edx even though there was no overflow (MUL always writes the high half of the product to edx). So I now reload the correct address into edx after the MUL, and there are no more access violations! :U
Now I have to write some values into the vector to test whether the routine writes the results to the correct positions!
How can I calculate the absolute value of a double-precision floating-point value stored in the low part of an XMM register? I've seen that there are the FABS and PABSB/D/W instructions, but FABS acts on ST(0) and PABS only on integer values.
Do you have any suggestion?
Should just be a simple bitwise AND with 7FFFFFFFh on float32's ..
Quote from: Rockoon on October 28, 2007, 06:46:32 AM
Should just be a simple bitwise AND with 7FFFFFFFh on float32's ..
Last night I thought of this simple solution too, but I was no longer at my computer to check the manual. So I can put the right 128-bit mask in an XMM register and perform an ANDPD with my data in another XMM register. I think that this should be the right solution; I will test it.
Thank you one more time Rockoon!
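For a double the sign is bit 63, so the per-element mask is 7FFFFFFFFFFFFFFFh rather than the 7FFFFFFFh used for a float32. The C sketch below does on one 64-bit value what ANDPD would do on each half of the register; `abs_by_mask` is a hypothetical name, and it assumes IEEE 754 doubles, which is what SSE2 uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Absolute value of a double obtained by clearing the sign bit
   (bit 63), the same operation ANDPD performs with a mask of
   7FFFFFFFFFFFFFFFh in the corresponding 64-bit lane. */
static double abs_by_mask(double x)
{
    uint64_t bits;
    assert(sizeof(double) == sizeof(uint64_t));
    memcpy(&bits, &x, sizeof bits);          /* reinterpret the bits */
    bits &= UINT64_C(0x7FFFFFFFFFFFFFFF);    /* clear the sign bit   */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

If only the low double should change, the upper 64 bits of the 128-bit mask can be all ones so that ANDPD leaves the high element untouched.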