Frame pointer register?? / Opt. Advice/Opinions

Started by robione, December 31, 2009, 04:11:33 PM

robione

I was kinda curious as to what this is. I'm continuing my fiddling with CV and coded up a Laplace edge detection SSE/MMX version. I wanted to start debugging my logic, but unfortunately I can't even get that far. I'm getting this warning in a few places:

D:\Visual Studio Projects\edge\edge.cpp(247) : warning C4731: 'Laplace' : frame pointer register 'ebx' modified by inline assembly code

My code crashes before it enters the main loop of the function, here (mov ecx,height):


void Laplace(BYTE *src, BYTE *dest, int width, int height) {
__declspec(align(16)) short sv25[16];
__declspec(align(16)) short svSum[20];

__asm {
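mov ecx,0x00190019      // two packed 16-bit words, each 0x19 = 25
pinsrw xmm6,ecx,0       // low word of ecx (25) into word 0 of xmm6
pinsrw xmm6,ecx,1       // low word of ecx (25) into word 1 of xmm6
movd mm6,ecx            // low dword of mm6 = 25,25
mov edi,dest
mov ebx,width           // ebx is the register named in warning C4731
mov ecx,height          // the crash happens here
mov eax,ebx
mov edx,ebx
shl eax,1               // eax = 2*width
sub ecx,2               // ecx = height-2
mov esi,src
lea edi,[edi+eax+2]     // edi = dest + 2*width + 2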
// ......


BTW if anyone wants to check out my SSE Sobel code follow the link in my sig. Got some impressive times I think :)

I was also curious about optimization advice. Laplace is sort of similar to Sobel. It uses one 5x5 mask. So I load up 5 rows of image data, 16 pixels each, into the SSE registers, unpack them, and add them up. Those are the values for svSum[0-15]. I take the middle line and multiply it by 25. Then I psrldq half the data by 4 to get rid of the two shorts I don't need. These values will be stored at sv25[0-13]. In MMX registers I move the next 4 pixels in and essentially do the same thing. That gives me the values for svSum[16-19] and sv25[14-15]. Once I have the values computed I write them to memory.

Then I read them back into the GPRs so I can sum up the first 5 values of svSum and write that back to svSum[0]. I then loop through the array, subtracting the leftmost value and adding in the rightmost (of the 5x5 mask). This "shrinks" the data in the svSum array into 16 elements.
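The running-sum step described above can be sketched in plain C. This is only an illustration of the idea, not the assembly from this thread: the name sliding_sum5 and the separate output array are assumptions, and it assumes every sum fits in a short (5 rows x 5 columns x 255 = 6375 does).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: out[i] = in[i] + in[i+1] + in[i+2] + in[i+3] + in[i+4]
   for i in [0, n_out). `in` must hold n_out + 4 elements.
   Assumes each 5-element sum fits in a short. */
void sliding_sum5(const short *in, short *out, size_t n_out)
{
    assert(in != NULL && out != NULL);
    if (n_out == 0)
        return;

    /* Full sum for the first window only. */
    int sum = 0;
    for (size_t k = 0; k < 5; ++k)
        sum += in[k];
    out[0] = (short)sum;

    /* Every later window: drop the leftmost value, add the new rightmost. */
    for (size_t i = 1; i < n_out; ++i) {
        sum -= in[i - 1];
        sum += in[i + 4];
        out[i] = (short)sum;
    }
}
```

Writing to a separate array keeps the value that has to be subtracted intact. An in-place version would need to hold the old leftmost value in a register before its slot is overwritten.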

Then I load up svSum and sv25 in xmm registers, do the final subtraction and pack the data back up. I didn't post the code 'cause I thought a couple of paragraphs would be easier to read than ~150 lines of assembly. A quick primer on Sobel/Laplace can be found here: http://www.pages.drexel.edu/~weg22/edge.html about two thirds of the way down. I know my description of it is lacking in this post. You essentially multiply the pixel values by the corresponding value in the 5x5 mask and sum them. The mask is all -1 except the center value, which is 24.
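Because the mask is -1 everywhere with 24 in the center, the result for one pixel is 25 times the center pixel minus the sum of all 25 pixels in the window, which is why one array holds the x25 values and the other holds the window sums. A minimal C sketch of that identity follows; the function names and the stride parameter are hypothetical, and no clamping of the result to 0..255 is shown.

```c
#include <assert.h>

/* Direct form: apply the 5x5 mask (-1 everywhere, +24 at the center).
   (x, y) must be at least 2 pixels away from every image border. */
int laplace5x5_direct(const unsigned char *img, int stride, int x, int y)
{
    assert(x >= 2 && y >= 2);
    int acc = 0;
    for (int dy = -2; dy <= 2; ++dy) {
        for (int dx = -2; dx <= 2; ++dx) {
            int w = (dx == 0 && dy == 0) ? 24 : -1;
            acc += w * img[(y + dy) * stride + (x + dx)];
        }
    }
    return acc;
}

/* Factored form: 25 * center - (sum of all 25 pixels in the window). */
int laplace5x5_factored(const unsigned char *img, int stride, int x, int y)
{
    assert(x >= 2 && y >= 2);
    int sum = 0;
    for (int dy = -2; dy <= 2; ++dy)
        for (int dx = -2; dx <= 2; ++dx)
            sum += img[(y + dy) * stride + (x + dx)];
    return 25 * img[y * stride + x] - sum;
}
```

The factored form is the one that maps onto a sliding window sum, since neighbouring pixels share most of their 25-pixel window.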

I just am not sure if the moving of data around between the MMX, SSE and GP registers via memory is the most efficient. I could've used less data in the SSE registers (computing 12 pixels in SSE alone instead of 16 using SSE/MMX) but doing it like I am now I can use movaps instead of movups. Once I figure out this frame pointer thing I can determine which route to take.... but in either one I still do the "shrinking" of svSum the same. Which is below..... is there a better way to do the same?


//svSum and sv25 values computed. Store them
movaps sv25,xmm1
movups [sv25+12],xmm4
movaps svSum,xmm0
movaps [svSum+16],xmm3
movq [svSum+32],mm0
movd [sv25+28],mm3

//Read svSum back :/. Compute the "real" value of the summation for 1st pixel
mov ebx,dword ptr [svSum+2]
mov edx,dword ptr [svSum+6]
movzx eax,word ptr [svSum]
add ebx,edx
mov edx,ebx
add ebx,eax
shr edx,16
add ebx,edx
mov svSum,bx

mov ecx,1
sums:
//Compute the "real" value of the summation for remaining pixels
inc ecx
sub ebx,eax
movzx eax,word ptr [svSum+ecx*2+10]
add ebx,eax
mov word ptr [svSum+ecx*2],bx
cmp ecx,16
jb sums

//Read the two arrays back for final computation
movaps xmm0,sv25
movaps xmm1,[sv25+16]
movaps xmm2,svSum
movaps xmm3,[svSum+16]


jj2007

ebx is the register that serves for local variables. You can use it if you have no local variables but you must save and restore it:

push ebx
... do stuff
pop ebx

BogdanOntanu

Quote from: jj2007 on December 31, 2009, 04:28:49 PM
ebx is the register that serves for local variables. You can use it if you have no local variables but you must save and restore it:

push ebx
... do stuff
pop ebx


I guess you wanted to write EBP not EBX ;)
Ambition is a lame excuse for the ones not brave enough to be lazy.
http://www.oby.ro

jj2007

Quote from: BogdanOntanu on December 31, 2009, 05:01:21 PM
I guess you wanted to write EBP not EBX ;)

Well, yes and no. I meant ebp, but he gets a warning for ebx. I have never used C - does it use ebx instead of ebp? In any case, robione should preserve the non-trashable registers (ebx, esi, edi, ebp).

robione

Interestingly by inserting push ebp, push ebx and their corresponding pops on exit..... I go from 8 warnings to 10. :/

Why do I have problems with this function and none of my other assembly versions of stuff?

[Time passes...]

Ok as it turns out, if I remove the __declspec(align(16)) for my locals I no longer get warnings...... why would this be? Can't I just get wasted space in the stack that corresponds to the padding needed to align the local vars properly instead of a program that crashes?

jj2007

The easiest way to understand would be to insert an int 3 at the top and run the exe through OllyDbg.
I can't help you a lot because I don't know whether C really uses ebx as frame pointer, or ebp as in Masm. Either way you could not use simultaneously the frame pointer and local variables. Try replacing ebp/ebx with a local var - it will be slower but if the warning is valid it should no longer crash...

GregL

Quote: I have never used C - does it use ebx instead of ebp?

No, it normally uses EBP as the frame pointer.





jj2007

Quote from: Greg Lyon on January 01, 2010, 12:59:18 AM
Quote: I have never used C - does it use ebx instead of ebp?

No, it normally uses EBP as the frame pointer.


OK, so "warning C4731: 'Laplace' : frame pointer register 'ebx' modified by inline assembly code" must be a typo.

GregL

jj,

Maybe, but not necessarily. You would really need to look at the assembly listing from the compiler to see what it is doing.

Here is the MSDN info for warning C4731.


robione

Thanks guys. I did some register shuffling and am not using ebx anymore... lo and behold, it works fine. Thx JJ for the advice re: the variables. I can begin my "official" debugging now :)

I had looked at the MSDN site before. I just didn't see how what I was doing matched what was in the article. Is it possible to use two frame pointers at the same time? Maybe I was lucky on my other functions and they were already 16-byte aligned and I escaped this problem, i.e. EBP is used "normally" whereas EBX is used only when 16-byte alignment is needed? Is that a possibility?

MichaelW


; Listing generated by Microsoft (R) Optimizing Compiler Version 13.10.3077

TITLE test.c
.386P
include listing.inc
if @Version gt 510
.model FLAT
else
_TEXT SEGMENT PARA USE32 PUBLIC 'CODE'
_TEXT ENDS
_DATA SEGMENT DWORD USE32 PUBLIC 'DATA'
_DATA ENDS
CONST SEGMENT DWORD USE32 PUBLIC 'CONST'
CONST ENDS
_BSS SEGMENT DWORD USE32 PUBLIC 'BSS'
_BSS ENDS
$$SYMBOLS SEGMENT BYTE USE32 'DEBSYM'
$$SYMBOLS ENDS
_TLS SEGMENT DWORD USE32 PUBLIC 'TLS'
_TLS ENDS
FLAT GROUP _DATA, CONST, _BSS
ASSUME CS: FLAT, DS: FLAT, SS: FLAT
endif

INCLUDELIB LIBC
INCLUDELIB OLDNAMES

PUBLIC _Laplace
; Function compile flags: /Odt
_TEXT SEGMENT
_svSum$ = -80 ; size = 40
_sv25$ = -32 ; size = 32
_src$ = 8 ; size = 4
_dest$ = 12 ; size = 4
_width$ = 16 ; size = 4
_height$ = 20 ; size = 4
_Laplace PROC NEAR
; File c:\program files\microsoft visual c++ toolkit 2003\my\robione\test.c
; Line 2
push ebx
mov ebx, esp
push ecx
and esp, -16 ; fffffff0H
add esp, 4
push ebp
mov ebp, esp
sub esp, 84 ; 00000054H
push ebx
push esi
push edi
; Line 6
mov ecx, 1638425 ; 00190019H
; Line 7
pinsrw xmm6, ecx, 0
; Line 8
pinsrw xmm6, ecx, 1
; Line 9
movd mm6, ecx
; Line 10
mov edi, DWORD PTR _dest$[ebx]
; Line 11
mov ebx, DWORD PTR _width$[ebx]
; Line 12
mov ecx, DWORD PTR _height$[ebx]
; Line 13
mov eax, ebx
; Line 14
mov edx, ebx
; Line 15
shl eax, 1
; Line 16
sub ecx, 2
; Line 17
mov esi, DWORD PTR _src$[ebx]
; Line 18
lea edi, DWORD PTR [edi+eax+2]
; Line 20
movaps XMMWORD PTR _sv25$[ebp], xmm1
; Line 21
movups XMMWORD PTR _sv25$[ebp+12], xmm4
; Line 22
movaps XMMWORD PTR _svSum$[ebp], xmm0
; Line 23
movaps XMMWORD PTR _svSum$[ebp+16], xmm3
; Line 24
movq MMWORD PTR _svSum$[ebp+32], mm0
; Line 25
movd DWORD PTR _sv25$[ebp+28], mm3
; Line 28
mov ebx, DWORD PTR _svSum$[ebp+2]
; Line 29
mov edx, DWORD PTR _svSum$[ebp+6]
; Line 30
movzx eax, WORD PTR _svSum$[ebp]
; Line 31
add ebx, edx
; Line 32
mov edx, ebx
; Line 33
add ebx, eax
; Line 34
shr edx, 16 ; 00000010H
; Line 35
add ebx, edx
; Line 36
mov WORD PTR _svSum$[ebp], bx
; Line 38
mov ecx, 1
$sums$74004:
; Line 41
inc ecx
; Line 42
sub ebx, eax
; Line 43
movzx eax, WORD PTR _svSum$[ebp+ecx*2+10]
; Line 44
add ebx, eax
; Line 45
mov WORD PTR _svSum$[ebp+ecx*2], bx
; Line 46
cmp ecx, 16 ; 00000010H
; Line 47
jb SHORT $sums$74004
; Line 50
movaps xmm0, XMMWORD PTR _sv25$[ebp]
; Line 51
movaps xmm1, XMMWORD PTR _sv25$[ebp+16]
; Line 52
movaps xmm2, XMMWORD PTR _svSum$[ebp]
; Line 53
movaps xmm3, XMMWORD PTR _svSum$[ebp+16]
; Line 56
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
mov esp, ebx
pop ebx
ret 0
_Laplace ENDP
_TEXT ENDS
eschew obfuscation
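
Reading the listing above, as far as it can be followed: the compiler addresses the parameters through ebx (for example _dest$[ebx]) and the aligned locals through ebp. The source line `mov ebx,width` overwrites ebx, so the following `_height$[ebx]` is read relative to the width value instead of the stack, which fits the crash reported at `mov ecx,height`. The prologue arithmetic can be replayed in C to see why two base registers are in play. The Frame struct and the name simulate_prologue are made up for this sketch, and the entry value of ESP is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t ebx;  /* parameter base: [ebx+8] is the first argument */
    uint32_t ebp;  /* local base: expected to be 16-byte aligned */
    uint32_t esp;  /* stack pointer after the locals are reserved */
} Frame;

/* Replays the stack-pointer arithmetic of the prologue in the listing.
   esp_at_entry is ESP on entry, where it points at the return address. */
Frame simulate_prologue(uint32_t esp_at_entry)
{
    Frame f;
    uint32_t esp = esp_at_entry;
    assert(esp % 4 == 0);      /* stack is assumed 4-byte aligned on entry */

    esp -= 4;                  /* push ebx      */
    f.ebx = esp;               /* mov ebx, esp  */
    esp -= 4;                  /* push ecx      */
    esp &= ~(uint32_t)15;      /* and esp, -16  */
    esp += 4;                  /* add esp, 4    */
    esp -= 4;                  /* push ebp      */
    f.ebp = esp;               /* mov ebp, esp  */
    esp -= 84;                 /* sub esp, 84   */
    f.esp = esp;
    return f;
}
```

For any 4-byte-aligned entry ESP, ebp comes out 16-byte aligned, so the locals at ebp-32 and ebp-80 are aligned too, while ebx+8 still points at the first argument. The distance between ebx and ebp varies with the entry alignment, which is presumably why the parameters are not reached through ebp here.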

robione

I feel like I'm stuck in a time warp, working with VC++ 98. I'll have to spend the time and see if I can bring all my libraries over into VC++ '08 Express.

GregL

robione,

Since your original post asked for opinions, I would recommend that you write your functions in MASM and then link those modules into your C/C++ program. You can do a whole lot more with MASM than you can with inline assembly.  Inline assembly is pretty limited.