Title: Fortran + MASM (using SSE2)
Post by: desmo3re on October 25, 2007, 09:22:34 AM

Hi to all!
This is my first post, and I have a problem with a MASM routine called from Fortran. I started from a working example and modified it; the goal was to start using SSE2 to make the Fortran routine run faster.
I use Microsoft (R) Macro Assembler Version 8.00.50727.42 to assemble the asm source.
The assembler reports no errors, and the Fortran code also seems right.
At run time, however, I get this error message:
forrtl: severe (157): Program Exception - access violation

I think I'm doing something wrong when accessing the parameters in the asm routine, but I don't know where the error is. This is the asm code:

Code Select

          
           .586
           .MODEL FLAT
           .mmx
           .xmm
           .CODE

           PUBLIC  _inprod@24
_inprod@24 PROC

           SUB     ESP,16  ; VALUE SUBTRACTED IS 4 * (5 - NUMBER OF PUSHES NEEDED)
           PUSH    EBX    ; PRESERVE EBX
           MOV     EDX, DWORD PTR 16[ESP] ; REF TO A (FIRST ARGUMENT)
           MOV     EAX, DWORD PTR 20[ESP] ; REF TO B (SECOND ARGUMENT)
           MOV     EBX, DWORD PTR 24[ESP] ; REF TO CC (THIRD ARGUMENT)
           MOV     ECX, DWORD PTR 28[ESP] ; N (FOURTH ARGUMENT, PASSED BY VALUE)
           MOV     ESI, DWORD PTR 32[ESP] ; REF TO N3 (FIFTH ARGUMENT)
           MOV     EDI, DWORD PTR 36[ESP] ; REF TO N2 (SIXTH ARGUMENT)

			movapd	xmm0,REAL8 PTR [EDX]
			movapd	xmm1,[EAX]
			movapd	xmm2,REAL8 PTR [EDX+32]
			movapd	xmm3,REAL8 PTR [EAX+32]
			mulpd	xmm0,xmm1
			movapd	[EDI],xmm0

           POP     EBX     ; RESTORE EBX THAT WAS PRESERVED ABOVE
           ADD     ESP,16   ; VALUE MUST CORRESPOND WITH INITIAL ESP SUBTRACTION.
           RET     24      ; NUMBER OF BYTES IN THE ARGUMENTS (SAME AS DECORATION PART OF THE PROCEDURE NAME)
_inprod@24 ENDP
           END

and this is the fortran .for file:

Code Select

!dec$ attributes STDCALL, ALIAS:'_inprod@24' :: inprod
        interface 
	subroutine inprod (a, b, cc, n, n3, n2)
!dec$ attributes REFERENCE :: a
        real*8 a(4)         
        real*8 b(4)         
!dec$ attributes REFERENCE :: cc
        real*8 cc           
        integer*4 n        
        real*8 n3                  
!dec$ attributes REFERENCE :: n2
        real*8 n2(2)        
        end subroutine
      end interface

      real*8 a(4)
      real*8 b(4)
      real*8 cc
      integer*4 n
      real*8 n3
      real*8 n2(2)
      data a /2.,2.,3.,4./
      data b /1.,2.,3.,4./
      cc = 0.
      n = 3
      n3 = 4.5
      data n2 /4.,6./
      call inprod (a,b,cc,n,n3,n2)
      write(*,*) cc
      write(*,*) n2
      end

I hope someone can help me! :U
Thanks in advance!

Title: Re: Fortran + MASM (using SSE2)
Post by: Tedd on October 25, 2007, 12:32:34 PM

Try preserving 'esi' and 'edi' as well as ebx - it's the usual convention.

My other first thought would be to check that you're accessing the arguments correctly.

The other thing to check is that extra stack cleanup code isn't being inserted - it's generally done automatically when you use proc - since you're handling it yourself, that could mess you up a little.
Insert..

Code Select

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

..on the line before the proc, and..

Code Select

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

..after endp (the latter shouldn't really be needed - it just restores the default.)

Title: Re: Fortran + MASM (using SSE2)
Post by: MichaelW on October 25, 2007, 01:16:59 PM

One possibility is that the SIMD instructions are accessing memory operands that are not aligned on 16-byte boundaries.

Title: Re: Fortran + MASM (using SSE2)
Post by: desmo3re on October 25, 2007, 02:17:02 PM

Thank you all for your advice.
I've modified the code following Tedd's advice, pushing and popping EDI and ESI as well, but I'm getting the same error...

Quote from: MichaelW on October 25, 2007, 01:16:59 PM
One possibility is that the SIMD instructions are accessing memory operands that are not aligned on 16-byte boundaries.

Reading the Intel manual that describes the basic architecture (253665), it seems that using the "aligned" SIMD instructions on operands that are not aligned (instead of the unaligned forms), and vice versa, results only in a speed penalty...

Title: Re: Fortran + MASM (using SSE2)
Post by: MichaelW on October 25, 2007, 02:45:29 PM

From IA-32 Intel® Architecture Software Developer's Manual
Volume 2A: Instruction Set Reference, A-M (25366615):

MOVAPD Move Aligned Packed Double-Precision Floating-Point Values
...
Protected Mode Exceptions
#GP(0)
For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.
If a memory operand is not aligned on a 16-byte boundary, regardless of segment.

You could substitute MOVUPD, which does not have the alignment requirement, and comment out the MULPD, and see if that, in combination with Tedd's suggestions, eliminates the access violation.
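
To make the alignment rule quoted above concrete, here is a minimal Python sketch (not from the thread) of the address test that decides whether MOVAPD raises #GP. The helper name `is_aligned16` and the sample addresses are made up for illustration; the real addresses depend on where the Fortran compiler places the arrays.

```python
def is_aligned16(address: int) -> bool:
    """Return True if the address is a multiple of 16 (its low 4 bits are zero)."""
    return (address & 0xF) == 0


# Hypothetical address of a REAL*8 array that starts on an 8-byte boundary
# but not on a 16-byte one.
base = 0x00403008

print(is_aligned16(base))       # False: MOVAPD on [base] would fault
print(is_aligned16(base + 8))   # True: the next element is 16-byte aligned
print(is_aligned16(base + 16))  # False: adding 16 never changes the alignment
```

An 8-byte element size only suggests 8-byte alignment, so a test like this is a reason to fall back to MOVUPD while debugging, as suggested above.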

Title: Re: Fortran + MASM (using SSE2)
Post by: desmo3re on October 25, 2007, 11:08:24 PM

I tried replacing the MOVAPD instruction with MOVUPD, but the result is the same...

Title: Re: Fortran + MASM (using SSE2)
Post by: MichaelW on October 26, 2007, 12:03:20 AM

MULPD also requires that the memory operands be aligned, and AFAIK there is no unaligned form, and that was why I suggested that you comment it out for the test.

Title: Re: Fortran + MASM (using SSE2)
Post by: Rockoon on October 26, 2007, 05:14:26 AM

SUB ESP,16 ; VALUE SUBTRACTED IS 4 * (5 - NUMBER OF PUSHES NEEDED)
PUSH EBX ; PRESERVE EBX
MOV EDX, DWORD PTR 16[ESP] ; REF TO A (FIRST ARGUMENT)

I am wondering how this can possibly access your first argument...

When your procedure is called, esp + 4 points to your first argument, and then you subtract 16 from esp so now your first argument is at esp + 20, then you push ebx which subtracts another 4 from esp so the first argument is now at esp + 24

Edited to add:

If your calculations here are incorrect and the parameters are pointers, then you are pretty much guaranteed to get access violations
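
The offset arithmetic described in this post can be written down as a small Python sketch. This is an illustration, not code from the thread: the function name `arg_offset` is made up, and it assumes every argument occupies exactly 4 bytes on the stack.

```python
def arg_offset(arg_index: int, sub_bytes: int, pushes: int) -> int:
    """Offset from ESP of the arg_index-th (0-based) 4-byte stack argument.

    At entry [ESP] holds the return address, so the first argument sits at
    ESP+4. Every byte subtracted from ESP and every 4-byte PUSH moves the
    arguments that much further away from the current ESP.
    """
    return 4 + sub_bytes + 4 * pushes + 4 * arg_index


# SUB ESP,16 followed by one PUSH, as in the first listing of the thread.
print(arg_offset(0, sub_bytes=16, pushes=1))                      # 24
print([arg_offset(i, sub_bytes=16, pushes=1) for i in range(6)])  # [24, 28, 32, 36, 40, 44]
```

With these numbers the first argument is at 24[ESP], not 16[ESP], so the original loads picked up stack slots that are not the intended pointers.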

Title: Re: Fortran + MASM (using SSE2)
Post by: desmo3re on October 26, 2007, 08:16:11 AM

Quote from: MichaelW on October 26, 2007, 12:03:20 AM
MULPD also requires that the memory operands be aligned, and AFAIK there is no unaligned form, and that was why I suggested that you comment it out for the test.

In the last post I didn't mention that I had also commented out MULPD.

Quote from: Rockoon on October 26, 2007, 05:14:26 AM
SUB ESP,16 ; VALUE SUBTRACTED IS 4 * (5 - NUMBER OF PUSHES NEEDED)
PUSH EBX ; PRESERVE EBX
MOV EDX, DWORD PTR 16[ESP] ; REF TO A (FIRST ARGUMENT)

I am wondering how this can possibly access your first argument...

When your procedure is called, esp + 4 points to your first argument, and then you subtract 16 from esp so now your first argument is at esp + 20, then you push ebx which subtracts another 4 from esp so the first argument is now at esp + 24

Edited to add:

If your calculations here are incorrect and the parameters are pointers, then you are pretty much guaranteed to get access violations

Now I'll try to modify the code following your advice.

Title: Re: Fortran + MASM (using SSE2)
Post by: desmo3re on October 26, 2007, 09:01:30 AM

Ok! As Rockoon said, I was wrong in the way I loaded the addresses into my 32-bit registers.
I was also wrong about how the real*8 n3 is passed: I thought (I read the Compaq Fortran manual too quickly) that arguments longer than 4 bytes (n3 is 8 bytes long) were passed by reference by default. This is true only if you use the default calling convention. If you use the C or STDCALL option (as in my case), the default changes to passing almost all data by value, except arrays. So the variable "n3" was passed by value and occupied 8 bytes on the stack.
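
As a rough check of the paragraph above, the following Python sketch (an illustration, not part of the original post) adds up the stack bytes taken by the six arguments. The helper name and the tuple layout are made up; it assumes a by-reference argument is a 4-byte pointer and a by-value argument takes its own size rounded up to a multiple of 4.

```python
def stdcall_arg_bytes(args):
    """Sum the stack bytes of (name, size_in_bytes, by_reference) tuples."""
    total = 0
    for _name, size, by_reference in args:
        # A reference is a 4-byte pointer on 32-bit x86; a value occupies
        # its own size, rounded up to a whole number of 4-byte slots.
        total += 4 if by_reference else (size + 3) // 4 * 4
    return total


# n3 (real*8) passed by value, as in the first version of the interface.
first_version = [("a", 32, True), ("b", 32, True), ("cc", 8, True),
                 ("n", 4, False), ("n3", 8, False), ("n2", 16, True)]
# n3 forced to be passed by reference.
fixed_version = [("a", 32, True), ("b", 32, True), ("cc", 8, True),
                 ("n", 4, False), ("n3", 8, True), ("n2", 16, True)]

print(stdcall_arg_bytes(first_version))  # 28
print(stdcall_arg_bytes(fixed_version))  # 24
```

Under these assumptions only the second layout matches the 24 in the decorated name and in RET 24; in the first layout the pointer to n2 sits one slot further up than the asm routine expects.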
Here is the code that seems to work!
First the Fortran .for file (note: now I force "n3" to be passed by REFERENCE):

Code Select

!dec$ attributes STDCALL, ALIAS:'_inprod@24' :: inprod
      interface 
	subroutine inprod (a, b, cc, n, n3, n2)
!dec$ attributes REFERENCE :: a
        real*8 a(4)         !split.v_retta(1)
        real*8 b(4)         !parz_res(I_PATH).COORD(1,1)
c Note: the following is necessary if a result is desired in a scalar argument.
c In this case, with the STDCALL conventions, a and b are arrays so they are
c automatically passed by reference, but cc is an 8-byte scalar so it would
c normally be passed by value without the following line.
!dec$ attributes REFERENCE :: cc
        real*8 cc           !parz_res(I_PATH+1).V_INC(1)
        integer*4 n         !parz_res(I_PATH).N_VERT
!dec$ attributes REFERENCE :: n3        
        real*8 n3           !parz_res(I_PATH).COORD(1,I_FIN)
!dec$ attributes REFERENCE :: n2
        real*8 n2(2)        !parz_res(I_PATH).NORM_EST(1,1)
        end subroutine
      end interface

      real*8 a(4)
      real*8 b(4)
      real*8 cc
      integer*4 n
      real*8 n3
      real*8 n2(2)
      data a /2.,2.,3.,4./
      data b /1.,2.,3.,4./
      cc = 0.
      n = 3
      n3 = 4.5
      data n2 /4.,6./
      call inprod (a,b,cc,n,n3,n2)
      write(*,*) cc
      write(*,*) n2
      end

and then the ASM code (it also seems to work with MOVUPD):

Code Select

           .586
           .MODEL FLAT
           .mmx
           .xmm
           .CODE
; Note the suffix integer on the name (decoration) must be same as number
; of bytes in the arguments.
; In this case there are 6 arguments with 4 bytes each, hence 24.
           PUBLIC  _inprod@24
           OPTION PROLOGUE:NONE
           OPTION EPILOGUE:NONE
_inprod@24 PROC

           SUB     ESP,4  ; VALUE SUBTRACTED IS 4 * (5 - NUMBER OF PUSHES NEEDED)
           PUSH    EBX    ; PRESERVE EBX
           PUSH    ESI    ; PRESERVE ESI
           PUSH    EDI    ; PRESERVE EDI
           PUSH    ECX    ; PRESERVE ECX
           MOV     EDX, DWORD PTR 24[ESP] ; REF TO A (FIRST ARGUMENT)
           MOV     EAX, DWORD PTR 28[ESP] ; REF TO B (SECOND ARGUMENT)
           MOV     EBX, DWORD PTR 32[ESP] ; REF TO CC (THIRD ARGUMENT)
           MOV     ECX, DWORD PTR 36[ESP] ; N (FOURTH ARGUMENT, PASSED BY VALUE)
           MOV     ESI, DWORD PTR 40[ESP] ; REF TO N3 (FIFTH ARGUMENT)
           MOV     EDI, DWORD PTR 44[ESP] ; REF TO N2 (SIXTH ARGUMENT)

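			movapd	xmm0,REAL8 PTR [EDX]	; xmm0 = a(1), a(2)
			movapd	xmm1,REAL8 PTR [EAX]	; xmm1 = b(1), b(2)
			movapd	xmm2,REAL8 PTR [EDX+16]	; xmm2 = a(3), a(4); a(4) spans only 32 bytes, so the second pair starts at +16
			movapd	xmm3,REAL8 PTR [EAX+16]	; xmm3 = b(3), b(4)
			mulpd	xmm0,xmm1		; xmm0 = a(1)*b(1), a(2)*b(2)
			movapd	[EDI],xmm0		; store both products in n2(1), n2(2)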

           POP     ECX     ; RESTORE ECX THAT WAS PRESERVED ABOVE
           POP     EDI     ; RESTORE EDI THAT WAS PRESERVED ABOVE
           POP     ESI     ; RESTORE ESI THAT WAS PRESERVED ABOVE
           POP     EBX     ; RESTORE EBX THAT WAS PRESERVED ABOVE
           ADD     ESP,4   ; VALUE MUST CORRESPOND WITH INITIAL ESP SUBTRACTION.
           RET     24      ; NUMBER OF BYTES IN THE ARGUMENTS (SAME AS DECORATION PART OF THE PROCEDURE NAME)
_inprod@24 ENDP
		   OPTION PROLOGUE:PrologueDef
		   OPTION EPILOGUE:EpilogueDef
           END

Thank you very much for all your advice!! :clap:
I hope I won't have any other problems with the rest of the asm code in the routine. :U
I also hope this topic will be useful for other "mad boys" who try to optimize Fortran code by calling an ASM routine.

Title: Re: Fortran + MASM (using SSE2)
Post by: desmo3re on October 26, 2007, 03:51:32 PM

As I continue writing the code, I have a question about addressing modes.
For example, in Fortran I have a two-dimensional array: real*8 b(3,40)
The memory layout of b is:
b(1,1)
b(2,1)
b(3,1)
b(1,2)
b(2,2)
b(3,2)
...
b(1,40)
b(2,40)
b(3,40)

This code doesn't assemble:
- the assembler doesn't recognize the jumps;
- do you think that I'm addressing the data correctly?

Code Select

mov		esi,0					; initialisation of counter I (=0) to reproduce the fortran statement DO I=1,N  ....  END DO
		dec		ecx				; value of N-1 in ecx
Loop:								; start the loop
		mov		eax,esi				; this represents an IF statement to know the value of a variable
		inc		eax
		cmp 	si,cx
		jne		Endif
		mov		eax,0
Endif:	
		mov		ebx,24				; 24 => each column is composed by 3 real*8 value (8byte each one)
		mul		ebx
		movupd	xmm4,[edi+eax]			; edi contains the address of b(1,1) (eax=0 => 1st column, eax=24 => 2nd column,...)
		movsd	xmm5,REAL8 PTR [edi+eax+16]	;  16 => to load the third value of the column
		mov		eax,esi
		mul		ebx
		movupd	xmm6,[edi+eax]
		movsd	xmm7,REAL8 PTR [edi+eax+16]
		subpd	xmm6,xmm4
		subsd	xmm7,xmm5
		movupd	[ebp],xmm6
		movsd	REAL8 PTR 16[ebp],xmm7
		inc		esi
		cmp		esi,ecx
		jne		Loop

Title: Re: Fortran + MASM (using SSE2)
Post by: desmo3re on October 26, 2007, 06:19:27 PM

I found the error: I had used reserved names (Loop, Endif) as label names.

Title: Re: Fortran + MASM (using SSE2)
Post by: desmo3re on October 27, 2007, 09:37:36 AM

I'm stuck on this run-time error, an access violation.
This code works well:

Code Select

subpd	xmm6,xmm4
		subsd	xmm7,xmm5
		movupd	[ebp],xmm6
		movsd	REAL8 PTR 16[ebp],xmm7

whereas in this code the last "movsd" instruction causes the run-time problem:

Code Select

mulpd	xmm2,xmm0
		movupd	xmm6,xmm2
		movsd	REAL8 PTR [edx+eax+16],xmm2

I don't understand why...
The "only" difference that seems possibly relevant to me is that ebp points to the first element of a vector (a(3)) and edx points to the first element of a two-dimensional array (n2(3,40)). Do you think I have to access the two-dimensional array in another way?

Title: Re: Fortran + MASM (using SSE2)
Post by: Rockoon on October 27, 2007, 10:08:25 AM

probably edx, eax, or both, are wrong.

..and why add 16 if you are using pointers?

Title: Re: Fortran + MASM (using SSE2)
Post by: desmo3re on October 27, 2007, 10:20:52 AM

Because I want to write to the third element of column I.
For example, if I=3 (4th column), eax = 24 (3 double-precision values per column) * 3 (value of I), so [edx+eax] would point to the first element of the 4th column (n2(1,4) in the example).
To access n2(3,4) I add 16 to [edx+eax].
Do you think I'm wrong?
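
The arithmetic in this post matches Fortran's column-major layout. A minimal Python sketch of the general formula follows (an illustration, not from the thread; the function name `column_major_offset` is made up, and it assumes 1-based indices and 8-byte real*8 elements):

```python
def column_major_offset(i: int, j: int, rows: int, elem_size: int = 8) -> int:
    """Byte offset of element (i, j), 1-based, in a Fortran array with `rows` rows."""
    return ((j - 1) * rows + (i - 1)) * elem_size


# real*8 n2(3,40): each column holds 3 values of 8 bytes, i.e. 24 bytes.
print(column_major_offset(1, 4, rows=3))  # 72 = 24 * 3
print(column_major_offset(3, 4, rows=3))  # 88 = 72 + 16
```

So adding 16 to reach the third element of a column agrees with the layout; if the access still faults, the base or index register is the more likely suspect.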

Title: Re: Fortran + MASM (using SSE2)
Post by: desmo3re on October 27, 2007, 10:55:55 AM

Solution: a previous MUL operation also wrote to edx, even though there was no overflow. So I reload the correct address into edx and there are no more access violations! :U
Now I have to write some values into the vector to test whether the routine writes the results in the correct positions!
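
The behaviour behind this fix: the one-operand MUL with a 32-bit operand always writes the 64-bit product to EDX:EAX, so EDX is overwritten even when the result fits in EAX. A minimal Python model of that (an illustration, not from the thread; the name `mul32` is made up):

```python
def mul32(eax: int, operand: int):
    """Model of the one-operand `MUL r/m32`: return the new (EDX, EAX) pair."""
    product = (eax & 0xFFFFFFFF) * (operand & 0xFFFFFFFF)
    # High 32 bits go to EDX, low 32 bits to EAX, whatever EDX held before.
    return (product >> 32) & 0xFFFFFFFF, product & 0xFFFFFFFF


# Column index 3 times 24 bytes per column, as in the addressing code.
edx, eax = mul32(3, 24)
print(edx, eax)  # 0 72
```

In this model an address formed as [edx+eax+16] afterwards would be 88, nowhere near the array, which is consistent with the access violation. One alternative, offered only as a suggestion, is the two-operand form `imul eax, ebx`, which writes only its destination register.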

Title: Re: Fortran + MASM (using SSE2)
Post by: desmo3re on October 27, 2007, 12:08:39 PM

How can I calculate the absolute value of a double-precision floating-point number stored in the low part of an XMM register? I've seen that there are FABS and PABSB/D/W instructions, but FABS acts on ST(0) and PABS only on integer values.
Do you have any suggestions?

Title: Re: Fortran + MASM (using SSE2)
Post by: Rockoon on October 28, 2007, 06:46:32 AM

Should just be a simple bitwise AND with 7FFFFFFFh on float32's ..

Title: Re: Fortran + MASM (using SSE2)
Post by: desmo3re on October 28, 2007, 08:50:36 AM

Quote from: Rockoon on October 28, 2007, 06:46:32 AM
Should just be a simple bitwise AND with 7FFFFFFFh on float32's ..

Last night I thought of this simple solution, but I was no longer at my computer to check the manual. So I can write the correct 128-bit mask into an XMM register and perform an ANDPD instruction with my data held in another XMM register. I think this should be the right solution; I will test it.
Thank you once again, Rockoon!
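
A note on the mask width: 7FFFFFFFh clears the sign bit of a 32-bit float, while a double needs a 64-bit mask, 7FFFFFFFFFFFFFFFh, in each 64-bit lane of the XMM register. The following Python sketch (an illustration, not from the thread; the names `ABS_MASK_F64` and `abs_by_mask` are made up) applies the same bitwise AND to a single double:

```python
import struct

ABS_MASK_F64 = 0x7FFFFFFFFFFFFFFF  # every bit set except the sign bit (bit 63)


def abs_by_mask(x: float) -> float:
    """Clear the sign bit of an IEEE 754 double with a bitwise AND."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (result,) = struct.unpack("<d", struct.pack("<Q", bits & ABS_MASK_F64))
    return result


print(abs_by_mask(-4.5))  # 4.5
print(abs_by_mask(6.0))   # 6.0
```

For ANDPD the 128-bit constant would hold this 64-bit mask twice, once per packed double; that layout is an assumption about how the constant would be declared, since the thread does not show it.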

The MASM Forum Archive 2004 to 2012

General Forums => The Workshop => Topic started by: desmo3re on October 25, 2007, 09:22:34 AM