
Issue accessing an array on the stack

Started by JPlayer, December 11, 2006, 08:36:51 PM


JPlayer

Hi. It's been ages since I last posted here, and it's been quite a while since I've used assembly; however, my current job requires that I go down to the assembly level to make the program as fast as possible.

I created an array on the stack (I think); however, whenever I try accessing it (even the 0th element), I seg fault:
        leal    (%eax, %ecx), %edx
        sall    $4, %edx
        movapd  (%esp, %edx), %xmm2

Sorry for the AT&T syntax, but I'm sticking my function directly into GCC-generated asm, and AT&T syntax is what I'm most familiar with now. %eax contains the value of j*lenA, %ecx contains the value of k, and %esp SHOULD be where X is located. My goal is to put the value at X[j*lenA + k] into xmm2 so that I can do the work with it. The sall is needed because the array contains complex doubles (16 bytes each, so shifting left by 4 turns the element index into a byte offset). I seg fault on the movapd line. What is confusing is that this code right before it works fine:
        leal    (%ebx, %ecx), %edx
        sall    $4, %edx
        movl    8(%ebp), %edi
        movapd  (%edi, %edx), %xmm1

That tells me that it's an issue with the stack...I probably am not understanding something about it. I THOUGHT that %esp contains an address of where the top of the stack is so that (%esp) gives you the contents of the top of the stack...but this code seems to imply that that isn't true. Any help would be greatly appreciated. Thanks.

EDIT: I don't think this post is making much sense. Sorry about that; I'm in unbearable pain right now and can barely even see from the pain, let alone think clearly enough to communicate my thoughts. Here is some more info:

I "created" the array by subtracting the appropriate amount from %esp. The array should be at the top of the stack. The goal is to get both R[i*lenA + k] and X[j*lenA + k] into xmm registers so I can multiply them. R was passed into the function as an Ipp64fc* (this is just a complex double pointer) so that's why I have the "movl    8(%ebp), %edi" in the code that works. I think that's all the important info. Let me know if you need more. Thanks.
END OF EDIT

PS. Hutch, the animated emoticons when creating a topic are VERY distracting and cause my blinking cursor to go crazy for some reason. Maybe put them in a dropdown list instead?

Tedd

(Quick idea..)
Don't xmm accesses to memory have to be aligned (to 16 bytes)?? That would explain the exception - you might get lucky and the stack is aligned at the right time, but mostly it won't be.
Try 'creating' the array so its size is as big as you need it, plus the correct amount to align the new value of esp (to 16 bytes.)
No snowflake in an avalanche feels responsible.

MichaelW

I think this is a correct translation to Intel syntax:

leal    (%ebx, %ecx), %edx
sall    $4, %edx
movl    8(%ebp), %edi
movapd  (%edi, %edx), %xmm1

lea     edx, [ebx+ecx]
sal     edx, 4
mov     edi, [ebp+8]
movapd  xmm1, [edi+edx]

leal    (%eax, %ecx), %edx
sall    $4, %edx
movapd  (%esp, %edx), %xmm2

lea     edx, [eax+ecx]
sal     edx, 4
movapd  xmm2, [esp+edx]


If the fault is an alignment problem, I think you should be able to eliminate it by substituting MOVUPD.
eschew obfuscation

JPlayer

Oh yeah. I forgot to align the stack to a 16-byte boundary (movupd worked). My code has the local variables aligned (assuming an aligned stack) but not the actual stack. I COULD use movupd, but I have a feeling that movapd would be faster (is that correct?), which means I need to align my stack. Which brings me to a new question: how do I align the stack to a 16-byte boundary successfully? I thought about ANDing the stack with -14 (1111.....11110000 in binary) after subtracting off the amount needed for local data, but now I'm seg faulting for a new reason :) lol. So how can I align the stack to a 16-byte boundary and successfully restore it at the end of the function? Thanks in advance.

MichaelW

I don't know what you are doing, but if GCC is setting up the stack and putting the data you are accessing on the stack, then I think you may need to align it from the GCC end.

There may be some useful information here.

hutch--

JPlayer ,

Quote
PS. Hutch, the animated emoticons when creating a topic are VERY distracting and cause my blinking cursor to go crazy for some reason. Maybe put them in a dropdown list instead?

Maybe not. I can access this forum using K-MELEON, Netscape, IE, Firefox and every other browser I have ever tried without any problems at all. You may have a display problem with your computer.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

JPlayer

Quote
I don't know what you are doing, but if GCC is setting up the stack and putting the data you are accessing on the stack, then I think you may need to align it from the GCC end.

Actually, GCC isn't setting up the stack in this function (however, I DO tell gcc to align everything it creates on a 16-byte boundary). Here's what I'm doing: I wrote a program in C, and it has a function that is VERY slow (a runtime of about 37.7 microseconds), while my program's runtime needs to be around 500 microseconds (it's a realtime application). This function is called many times, and it is the primary factor keeping me from my runtime goal; my runtime right now is about 1.8 milliseconds. So I'm going to write a highly optimized version in assembly. Please don't reply about how I'm not going to be able to pull it off, because I know exactly how I'm going to pull it off and my bosses are willing to wait. I am writing every single line of this function myself (even setting up the stack).

My question now is about aligning the stack properly. I take back my -14 number; I should AND with -16, correct? How do I restore the stack afterwards, though? I've tried pushing %esp before the AND and then popping it at the end (then doing the regular stuff like adding and popping other stuff off), but it still crashed. Any help would be appreciated. Thanks in advance, and thanks for the replies so far.

PS. Hutch, I wasn't really complaining about the blinking cursor (although it goes crazy on my home computer too), I was mainly referring to all those animated smileys moving nonstop...it's very distracting and annoying :) lol

MichaelW

To align to 16 I think you would need to first add 15, and then and with -16. Restoring ESP by popping it will not work if ESP does not have the value it had after the push. The normal method is to store ESP in a register. Could you perhaps solve both problems (speed and alignment) by in-lining the function?

BTW, I have never noticed any problem with the emoticons, and I have a very strong dislike of things moving on the screen (other than things that I move).

Tedd

On the emoticon/cursor problem: I noticed the same thing when using K-Meleon - the cursor flashes in time with the animation update of the emoticons (which is too fast for the cursor update.) It does the same with IE and Firefox, though it seems a lot less noticeable.

stanhebben

Rounding an address up to a multiple of 16 is done with add 15, then and -16.
Rounding an address down is done with and -16 alone.

Because ESP is a stack pointer, you'll have to round it down. (and esp, -16)

As for the emoticons, I have no complaints at all. (using firefox)

JPlayer

Hey. Thanks for the replies. I have an aligned stack now, and I have aligned data on that aligned stack :). My next question is a pretty newbie one that I've forgotten over time. The function that gcc created (the one I'm replacing with my function) pushed %edi, %esi, and %ebx and popped them at the end. However, I didn't include those pushes and pops in my function, and it works fine. So the question is this: am I getting lucky here? Whose responsibility is it to preserve the registers, the caller's or the callee's? I know I'm responsible for %ebp and probably %esp, but am I responsible for anything else? Just a reminder, if it helps: my program is written in C and compiled with GCC (possibly the Intel C++ Compiler in the future), and then I'm sticking my assembly function into the compiler-generated assembly code to replace the assembly it created for this function. Thanks for the help.

Tedd

It's the callee's responsibility to preserve the callee-saved registers it messes with. In the usual 32-bit x86 C calling convention (cdecl, which gcc follows), those are ebx, esi, edi and ebp, plus esp itself; eax, ecx and edx may be clobbered freely.
Skipping the saves only works if nothing up the call chain still needs those values, so saving them is good practice anyway :wink
So, you're either lucky because the code calling the function doesn't use these values (unlikely), or it's overwriting them after the call anyway (hmmm), or gcc is magically detecting their use and saving them for you (you'll have to compile and disassemble to check what's really happening.)

JPlayer

Hey. Sorry for the late reply; I've been out sick for a day and a half. In case you were interested, gcc IS replacing the registers with new values after I return. Which is a little odd, because I'm telling it to be as fast as possible, and yet it's pushing and popping registers that it ends up overwriting anyway. I finished implementing my function, and the total runtime is about 20 microseconds slower, which means that my function's runtime is less than a microsecond slower per call than the compiler-generated function, and mine isn't fully optimized yet. So this looks promising :). Thanks for the help.

Tedd

I think gcc doesn't make any attempt to optimize 'across' hand-written asm. So your function will be left as is, and callers will treat it with suspicion (playing it safe, making no assumptions - hence no optimize.)

JPlayer

Actually, gcc has no awareness of my hand-written code. I compile the C code (including the function I rewrote), and then I just remove that function from the generated assembly and put in my own. The function I remove is supposed to be highly optimized, and since gcc doesn't know I'm going to replace it, it should know that it's pushing and popping registers that the caller is going to overwrite after the function finishes. So it's still a little odd.