
Fortran + MASM (using SSE2)

Started by desmo3re, October 25, 2007, 09:22:34 AM


desmo3re

Hi to all!
This is my first post, and I have a problem with a MASM routine called from Fortran. I started from a working example and modified it: the goal was to start using SSE2 to make the Fortran routine run faster.
I use Microsoft (R) Macro Assembler Version 8.00.50727.42 to assemble the asm source.
The assembler reports no errors, and the Fortran code also seems right.
At run time, however, I get this error message:
forrtl: severe (157): Program Exception - access violation

I think I'm doing something wrong when accessing the parameters in the asm routine, but I don't know where the error is. This is the asm code:
         
           .586
           .MODEL FLAT
           .mmx
           .xmm
           .CODE

           PUBLIC  _inprod@24
_inprod@24 PROC

           SUB     ESP,16  ; VALUE SUBTRACTED IS 4 * (5 - NUMBER OF PUSHES NEEDED)
           PUSH    EBX    ; PRESERVE EBX
           MOV     EDX, DWORD PTR 16[ESP] ; REF TO A (FIRST ARGUMENT)
           MOV     EAX, DWORD PTR 20[ESP] ; REF TO B (SECOND ARGUMENT)
           MOV     EBX, DWORD PTR 24[ESP] ; REF TO CC (THIRD ARGUMENT)
           MOV     ECX, DWORD PTR 28[ESP] ; N (FOURTH ARGUMENT, PASSED BY VALUE)
           MOV     ESI, DWORD PTR 32[ESP] ; REF TO N3 (FIFTH ARGUMENT)
           MOV     EDI, DWORD PTR 36[ESP] ; REF TO N2 (SIXTH ARGUMENT)

movapd xmm0,REAL8 PTR [EDX]
movapd xmm1,[EAX]
movapd xmm2,REAL8 PTR [EDX+32]
movapd xmm3,REAL8 PTR [EAX+32]
mulpd xmm0,xmm1
movapd [EDI],xmm0

           POP     EBX     ; RESTORE EBX THAT WAS PRESERVED ABOVE
           ADD     ESP,16   ; VALUE MUST CORRESPOND WITH INITIAL ESP SUBTRACTION.
           RET     24      ; NUMBER OF BYTES IN THE ARGUMENTS (SAME AS DECORATION PART OF THE PROCEDURE NAME)
_inprod@24 ENDP
           END


and this is the fortran .for file:
!dec$ attributes STDCALL, ALIAS:'_inprod@24' :: inprod
        interface
subroutine inprod (a, b, cc, n, n3, n2)
!dec$ attributes REFERENCE :: a
        real*8 a(4)         
        real*8 b(4)         
!dec$ attributes REFERENCE :: cc
        real*8 cc           
        integer*4 n       
        real*8 n3                 
!dec$ attributes REFERENCE :: n2
        real*8 n2(2)       
        end subroutine
      end interface

      real*8 a(4)
      real*8 b(4)
      real*8 cc
      integer*4 n
      real*8 n3
      real*8 n2(2)
      data a /2.,2.,3.,4./
      data b /1.,2.,3.,4./
      cc = 0.
      n = 3
      n3 = 4.5
      data n2 /4.,6./
      call inprod (a,b,cc,n,n3,n2)
      write(*,*) cc
      write(*,*) n2
      end


I hope someone can help me! :U
Thanks in advance!

Tedd

Try preserving 'esi' and 'edi' as well as ebx - it's the usual convention.

My first other thought would be to check that you're accessing the arguments correctly.

The other thing to check is that extra stack cleanup code isn't being inserted - it's generally done automatically when you use proc - since you're handling it yourself, that could mess you up a little.
Insert..
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

..on the line before the proc, and..
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

..after endp (the latter shouldn't really be needed - it just restores the default.)
No snowflake in an avalanche feels responsible.

MichaelW

One possibility is that the SIMD instructions are accessing memory operands that are not aligned on 16-byte boundaries.
eschew obfuscation

desmo3re

Thank you all for your advice.
I've modified the code as Tedd suggested, pushing and popping EDI and ESI as well, but I still get the same error...
Quote from: MichaelW on October 25, 2007, 01:16:59 PM
One possibility is that the SIMD instructions are accessing memory operands that are not aligned on 16-byte boundaries.
Reading the Intel manual on the basic architecture (253665), it seems that using the "aligned" SIMD instructions on unaligned operands (instead of the unaligned ones), and vice versa, results only in a speed penalty...

MichaelW

From IA-32 Intel® Architecture Software Developer's Manual
Volume 2A: Instruction Set Reference, A-M (25366615):

MOVAPD Move Aligned Packed Double-Precision Floating-Point Values
...
Protected Mode Exceptions
#GP(0)
For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.
If a memory operand is not aligned on a 16-byte boundary, regardless of segment.

You could substitute MOVUPD, which does not have the alignment requirement, and comment out the MULPD, and see if that, in combination with Tedd's suggestions, eliminates the access violation.
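
To see whether alignment is the culprit, the addresses passed in can be tested against the 16-byte boundary that MOVAPD requires for memory operands. A minimal sketch in C follows; `is_aligned16` and `ptr_is_aligned16` are hypothetical helper names, not part of any library.

```c
#include <assert.h>   /* only needed for the usage checks mentioned below */
#include <stdint.h>

/* Returns 1 if addr is a multiple of 16, the boundary MOVAPD needs
   for a memory operand; returns 0 otherwise. */
int is_aligned16(uintptr_t addr)
{
    return (addr & 15u) == 0;
}

/* Hypothetical convenience wrapper: the same check for a real pointer. */
int ptr_is_aligned16(const void *p)
{
    return is_aligned16((uintptr_t)p);
}
```

For example, `is_aligned16(0x1008)` is 0, so MOVAPD on that address would raise the #GP(0) exception quoted above, while MOVUPD would accept it.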

desmo3re

I tried replacing the MOVAPD instructions with MOVUPD, but the result is the same...

MichaelW

MULPD also requires that the memory operands be aligned, and AFAIK there is no unaligned form, and that was why I suggested that you comment it out for the test.


Rockoon

           SUB     ESP,16  ; VALUE SUBTRACTED IS 4 * (5 - NUMBER OF PUSHES NEEDED)
           PUSH    EBX    ; PRESERVE EBX
           MOV     EDX, DWORD PTR 16[ESP] ; REF TO A (FIRST ARGUMENT)

I am wondering how this can possibly access your first argument...

When your procedure is called, esp + 4 points to your first argument. Then you subtract 16 from esp, so your first argument is now at esp + 20. Then you push ebx, which subtracts another 4 from esp, so the first argument is now at esp + 24.

Edited to add:

If your calculations here are incorrect and the parameters are pointers, then you are pretty much guaranteed to get access violations
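
The offset arithmetic described above can be written as a small formula. This is only a sketch, assuming every argument occupies 4 bytes on the stack and that no frame pointer is used; `arg_offset` is a hypothetical name chosen for illustration.

```c
#include <assert.h>

/* Offset from ESP to the k-th 4-byte stack argument (k starts at 1).
   4 bytes are for the return address pushed by CALL, sub_bytes is the
   amount removed by SUB ESP, and each PUSH lowers ESP by another 4. */
unsigned arg_offset(unsigned sub_bytes, unsigned pushes, unsigned k)
{
    assert(k >= 1);
    return 4u + sub_bytes + 4u * pushes + 4u * (k - 1u);
}
```

With `SUB ESP,16` and one `PUSH`, `arg_offset(16, 1, 1)` gives 24, not the 16 used in the posted code.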



When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.

desmo3re

Quote from: MichaelW on October 26, 2007, 12:03:20 AM
MULPD also requires that the memory operands be aligned, and AFAIK there is no unaligned form, and that was why I suggested that you comment it out for the test.


In my last post I didn't mention that I had also commented out MULPD.
Quote from: Rockoon on October 26, 2007, 05:14:26 AM
           SUB     ESP,16  ; VALUE SUBTRACTED IS 4 * (5 - NUMBER OF PUSHES NEEDED)
           PUSH    EBX    ; PRESERVE EBX
           MOV     EDX, DWORD PTR 16[ESP] ; REF TO A (FIRST ARGUMENT)

I am wondering how this can possibly access your first argument...

When your procedure is called, esp + 4 points to your first argument. Then you subtract 16 from esp, so your first argument is now at esp + 20. Then you push ebx, which subtracts another 4 from esp, so the first argument is now at esp + 24.

Edited to add:

If your calculations here are incorrect and the parameters are pointers, then you are pretty much guaranteed to get access violations

I'll now try modifying the code following your advice.

desmo3re

OK! As Rockoon said, I was wrong about the offsets when loading the addresses into my 32-bit registers.
I was also wrong about how the real*8 n3 is passed: I thought (I read the Compaq Fortran manual too quickly) that arguments longer than 4 bytes (n3 is 8 bytes long) were passed by reference by default. This is true only if you use the default calling convention for your routines. If you use the C or STDCALL option (as in my case), the default changes to passing almost all data by value, except arrays. So the variable "n3" was passed by value and occupied 8 bytes on the stack.
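
The effect on the stack size can be checked with simple arithmetic. The sketch below assumes 4 bytes for each reference and for the integer*4 passed by value, and 8 bytes for a real*8 passed by value; `stack_bytes` is a hypothetical helper, and the per-argument sizes are supplied by hand rather than taken from any compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Sums the bytes that the arguments occupy on the 32-bit stack.
   sizes[i] is 4 for a reference or an integer*4 by value,
   and 8 for a real*8 by value. */
unsigned stack_bytes(const unsigned *sizes, size_t count)
{
    unsigned total = 0;
    for (size_t i = 0; i < count; ++i)
        total += sizes[i];
    return total;
}
```

For (a, b, cc, n, n3, n2) with n3 by value the sizes are 4, 4, 4, 4, 8, 4, which sum to 28, so neither the `@24` decoration nor `RET 24` matches and the reference to n2 sits 4 bytes further up than expected. With n3 by reference the sum is 24.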
Here is the code that seems to work!
First the Fortran .for file (note: I now force "n3" to be passed by REFERENCE):
!dec$ attributes STDCALL, ALIAS:'_inprod@24' :: inprod
      interface
subroutine inprod (a, b, cc, n, n3, n2)
!dec$ attributes REFERENCE :: a
        real*8 a(4)         !split.v_retta(1)
        real*8 b(4)         !parz_res(I_PATH).COORD(1,1)
c Note: the following is necessary if a result is desired in a scalar argument.
c In this case, with the STDCALL conventions, a and b are arrays, so they are
c automatically passed by reference, but cc is an 8-byte scalar, so it would
c normally be passed by value without the following line.
!dec$ attributes REFERENCE :: cc
        real*8 cc           !parz_res(I_PATH+1).V_INC(1)
        integer*4 n         !parz_res(I_PATH).N_VERT
!dec$ attributes REFERENCE :: n3       
        real*8 n3           !parz_res(I_PATH).COORD(1,I_FIN)
!dec$ attributes REFERENCE :: n2
        real*8 n2(2)        !parz_res(I_PATH).NORM_EST(1,1)
        end subroutine
      end interface

      real*8 a(4)
      real*8 b(4)
      real*8 cc
      integer*4 n
      real*8 n3
      real*8 n2(2)
      data a /2.,2.,3.,4./
      data b /1.,2.,3.,4./
      cc = 0.
      n = 3
      n3 = 4.5
      data n2 /4.,6./
      call inprod (a,b,cc,n,n3,n2)
      write(*,*) cc
      write(*,*) n2
      end

and then the ASM code (it also seems to work with MOVUPD):
           .586
           .MODEL FLAT
           .mmx
           .xmm
           .CODE
; Note the suffix integer on the name (decoration) must be same as number
; of bytes in the arguments.
; In this case there are 6 arguments with 4 bytes each, hence 24.
           PUBLIC  _inprod@24
           OPTION PROLOGUE:NONE
           OPTION EPILOGUE:NONE
_inprod@24 PROC

           SUB     ESP,4  ; VALUE SUBTRACTED IS 4 * (5 - NUMBER OF PUSHES NEEDED)
           PUSH    EBX    ; PRESERVE EBX
           PUSH    ESI    ; PRESERVE ESI
           PUSH    EDI    ; PRESERVE EDI
           PUSH    ECX    ; PRESERVE ECX
           MOV     EDX, DWORD PTR 24[ESP] ; REF TO A (FIRST ARGUMENT)
           MOV     EAX, DWORD PTR 28[ESP] ; REF TO B (SECOND ARGUMENT)
           MOV     EBX, DWORD PTR 32[ESP] ; REF TO CC (THIRD ARGUMENT)
           MOV     ECX, DWORD PTR 36[ESP] ; N (FOURTH ARGUMENT, PASSED BY VALUE)
           MOV     ESI, DWORD PTR 40[ESP] ; REF TO N3 (FIFTH ARGUMENT)
           MOV     EDI, DWORD PTR 44[ESP] ; REF TO N2 (SIXTH ARGUMENT)

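movapd xmm0,REAL8 PTR [EDX]      ; xmm0 = a(1:2)
movapd xmm1,REAL8 PTR [EAX]      ; xmm1 = b(1:2)
movapd xmm2,REAL8 PTR [EDX+32]   ; NOTE: a(4) spans bytes 0..31, so +32 is past its end; a(3:4) is at +16
movapd xmm3,REAL8 PTR [EAX+32]   ; NOTE: same for b; b(3:4) is at +16
mulpd xmm0,xmm1                  ; xmm0 = a(1:2) * b(1:2), element by element
movapd [EDI],xmm0                ; store both products in n2(1:2)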

           POP     ECX     ; RESTORE ECX THAT WAS PRESERVED ABOVE
           POP     EDI     ; RESTORE EDI THAT WAS PRESERVED ABOVE
           POP     ESI     ; RESTORE ESI THAT WAS PRESERVED ABOVE
           POP     EBX     ; RESTORE EBX THAT WAS PRESERVED ABOVE
           ADD     ESP,4   ; VALUE MUST CORRESPOND WITH INITIAL ESP SUBTRACTION.
           RET     24      ; NUMBER OF BYTES IN THE ARGUMENTS (SAME AS DECORATION PART OF THE PROCEDURE NAME)
_inprod@24 ENDP
   OPTION PROLOGUE:PrologueDef
   OPTION EPILOGUE:EpilogueDef
           END


Thank you very much for all your advice!! :clap:
I hope that I won't have any other problems with the rest of the asm code in the routine. :U
I also hope that this topic will be useful for other "mad boys" who try to optimize Fortran code by calling an ASM routine.

desmo3re

As I continue writing code, I have a question about addressing modes.
For example, in Fortran I have a two-dimensional array:  real*8 b(3,40)
The memory layout of b is:
b(1,1)   
b(2,1)   
b(3,1)   
b(1,2)
b(2,2)
b(3,2)
...
b(1,40)
b(2,40)
b(3,40)
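
The layout listed above is Fortran's column-major order, so the byte offset of b(i,j) from b(1,1) follows directly from the number of rows. A minimal sketch, with `colmajor_offset` as a hypothetical helper name and 1-based indices as in Fortran:

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of element (i, j) from element (1, 1) in a column-major
   array with `rows` rows and elements of elem_size bytes.
   Indices are 1-based, as in Fortran. */
size_t colmajor_offset(size_t i, size_t j, size_t rows, size_t elem_size)
{
    assert(i >= 1 && i <= rows && j >= 1);
    return ((j - 1) * rows + (i - 1)) * elem_size;
}
```

For real*8 b(3,40), each column is 3 * 8 = 24 bytes, so `colmajor_offset(1, 2, 3, 8)` is 24 and `colmajor_offset(3, 1, 3, 8)` is 16.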

This code doesn't assemble:
- the assembler doesn't recognize the jumps;
- do you think that I'm addressing the data correctly?

mov esi,0 ; initialisation of counter I (=0) to reproduce the fortran statement DO I=1,N  ....  END DO
dec ecx ; value of N-1 in ecx
Loop: ; start the loop
mov eax,esi ; this represents an IF statement to know the value of a variable
inc eax
cmp si,cx
jne Endif
mov eax,0
Endif:
mov ebx,24 ; 24 => each column is composed by 3 real*8 value (8byte each one)
mul ebx
movupd xmm4,[edi+eax] ; edi contains the address of b(1,1) (eax=0 => 1st column, eax=24 => 2nd column,...)
movsd xmm5,REAL8 PTR [edi+eax+16] ;  16 => to load the third value of the column
mov eax,esi
mul ebx
movupd xmm6,[edi+eax]
movsd xmm7,REAL8 PTR [edi+eax+16]
subpd xmm6,xmm4
subsd xmm7,xmm5
movupd [ebp],xmm6
movsd REAL8 PTR 16[ebp],xmm7
inc esi
cmp esi,ecx
jne Loop

desmo3re

I found the error: I had used reserved words (Loop and Endif) as label names.

desmo3re

I'm stuck on this run-time error regarding an access violation.
This code works well:
subpd xmm6,xmm4
subsd xmm7,xmm5
movupd [ebp],xmm6
movsd REAL8 PTR 16[ebp],xmm7

whereas in this code the last "movsd" instruction causes the run-time problem:
mulpd xmm2,xmm0
movupd xmm6,xmm2
movsd REAL8 PTR [edx+eax+16],xmm2

I don't understand why...
The only difference that seems possibly relevant to me is that ebp points to the first element of a vector (a(3)) and edx points to the first element of a two-dimensional array (n2(3,40)). Do you think I have to access the two-dimensional array in another way?

Rockoon

probably edx, eax, or both, are wrong.

..and why add 16 if you are using pointers?

desmo3re

Because I want to write to the third element of column I.
For example, if I=3 (4th column), eax=24 (3 double-precision values per column) * 3 (value of I), so [edx+eax] would point to the first element of the 4th column (n2(1,4) in the example).
To access n2(3,4) I add 16 to [edx+eax].
Do you think I'm wrong?
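
The offset arithmetic itself is consistent: 24 * 3 + 16 = 88 bytes is where n2(3,4) sits. One thing that may be worth checking, assuming eax is still computed with `mul ebx` as in the loop posted earlier, is that the one-operand MUL writes its 64-bit product to EDX:EAX, so whatever pointer was in edx is overwritten. The sketch below models that behaviour in C; the `Regs` struct and `mul32` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Just the two registers that the one-operand MUL modifies. */
typedef struct {
    uint32_t eax;
    uint32_t edx;
} Regs;

/* Models `mul src` with a 32-bit operand: EDX:EAX = EAX * src.
   The previous content of EDX is lost. */
Regs mul32(Regs r, uint32_t src)
{
    uint64_t product = (uint64_t)r.eax * src;
    r.eax = (uint32_t)product;          /* low 32 bits  */
    r.edx = (uint32_t)(product >> 32);  /* high 32 bits */
    return r;
}
```

Starting with eax = 3 and a pointer in edx, `mul32` with 24 leaves eax = 72 and edx = 0, so `[edx+eax+16]` would address byte 88 of memory rather than n2(3,4). If that is what is happening, the three-operand form `imul eax, esi, 24` leaves edx untouched, as would keeping the base address in a different register.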