Title: Frame pointer register?? / Opt. Advice/Opinions
Post by: robione on December 31, 2009, 04:11:33 PM

I was kinda curious as to what this is? I'm continuing my fiddling with CV and coded up a Laplace edge detection SSE/MMX version and I wanted to start debugging my logic.... unfortunately I can't even get that far. I'm getting this warning in a few places:

D:\Visual Studio Projects\edge\edge.cpp(247) : warning C4731: 'Laplace' : frame pointer register 'ebx' modified by inline assembly code

My code crashes before I enter the main loop of the function.... here (mov ecx,height)

void Laplace(BYTE *src, BYTE *dest, int width, int height) {
	__declspec(align(16)) short sv25[16];
	__declspec(align(16)) short svSum[20];

	__asm {
		mov ecx,0x00190019
		pinsrw xmm6,ecx,0
		pinsrw xmm6,ecx,1
		movd mm6,ecx
		mov edi,dest
		mov ebx,width
		mov ecx,height
		mov eax,ebx
		mov edx,ebx
		shl eax,1
		sub ecx,2
		mov esi,src
		lea edi,[edi+eax+2]
// ......

BTW if anyone wants to check out my SSE Sobel code follow the link in my sig. Got some impressive times I think :)

I was also curious about optimization advice. Laplace is sort of similar to Sobel. It uses one 5x5 mask. So I load up 5 rows of image data of 16 pixels each in the SSE registers, unpack them, add them up. Those are the values for svSum[0-15]. I take the middle line and multiply it by 25. Then psrldq 4 half the data to get rid of the two shorts I don't need. These values will be stored at sv25[0-13]. In MMX registers I move the next 4 pixels in, and essentially do the same thing. That gives me the values for svSum[16-19] and sv25[14-15]. Once I have the values computed I write them to memory.

Then I read them back into the GPRs. So I can sum up the 1st 5 values of svSum and write that back to svSum[0]. I then loop through the array, subtracting the left most value and adding in the rightmost (of the 5x5 mask). This "shrinks" the data in the svSum array into 16 elements.

Then I load up svSum and sv25 in xmm registers, do the final subtraction and pack the data back up. I didn't post the code 'cause I thought a couple of paragraphs would be easier to read than ~150 lines of assembly. A quick primer on Sobel/Laplace can be found here: http://www.pages.drexel.edu/~weg22/edge.html about 2/3s down. I know my description of it is lacking in this post. You essentially multiply the pixel values by the corresponding value in the 5x5 mask and sum them..... The mask is all -1 except the center value, which is 24.
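
For reference, the mask arithmetic described above can be written as plain C. This is only a sketch to check values against, not the SSE code from this thread: `laplace5x5_at` is a made-up name, the `height` parameter exists only for the bounds check, and the clamp to 0..255 is an assumption standing in for whatever the final pack step does. It relies on the same identity as the description: 25 times the center minus the sum of all 25 pixels equals 24 times the center minus the other 24.

```c
#include <assert.h>

typedef unsigned char BYTE;

/* 5x5 Laplace response at (x, y).
   The mask is -1 everywhere except +24 at the center, so the result is
   25*center - (sum of all 25 pixels). Clamping to 0..255 is an assumption. */
static BYTE laplace5x5_at(const BYTE *src, int width, int height, int x, int y)
{
    /* The 5x5 window must lie fully inside the image. */
    assert(x >= 2 && x < width - 2 && y >= 2 && y < height - 2);

    int sum = 0;
    for (int dy = -2; dy <= 2; ++dy)
        for (int dx = -2; dx <= 2; ++dx)
            sum += src[(y + dy) * width + (x + dx)];

    int v = 25 * src[y * width + x] - sum;
    if (v < 0)   v = 0;
    if (v > 255) v = 255;
    return (BYTE)v;
}
```

On a flat 5x5 patch the result is 0, and a lone center pixel of 10 on a black background gives 240, which makes it easy to spot-check a vectorized version pixel by pixel.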

I just am not sure if the moving of data around between the MMX, SSE and GP registers via memory is the most efficient. I could've used less data in the SSE registers (computing 12 pixels in SSE alone instead of 16 using SSE/MMX) but doing it like I am now I can use movaps instead of movups. Once I figure out this frame pointer thing I can determine which route to take.... but in either one I still do the "shrinking" of svSum the same. Which is below..... is there a better way to do the same?

		//svSum and sv25 values computed. Store them
		movaps sv25,xmm1
		movups [sv25+12],xmm4
		movaps svSum,xmm0
		movaps [svSum+16],xmm3
		movq [svSum+32],mm0
		movd [sv25+28],mm3

		//Read svSum back :/. Compute the "real" value of the summation for 1st pixel
		mov ebx,dword ptr [svSum+2]
		mov edx,dword ptr [svSum+6]
		movzx eax,word ptr [svSum]
		add ebx,edx
		mov edx,ebx
		add ebx,eax
		shr edx,16
		add ebx,edx
		mov svSum,bx

		mov ecx,1
sums:
		//Compute the "real" value of the summation for remaining pixels
		inc ecx
		sub ebx,eax
		movzx eax,word ptr [svSum+ecx*2+10]
		add ebx,eax
		mov word ptr [svSum+ecx*2],bx 
		cmp ecx,16
		jb sums

		//Read the two arrays back for final computation
		movaps xmm0,sv25
		movaps xmm1,[sv25+16]
		movaps xmm2,svSum
		movaps xmm3,[svSum+16]
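
A plain C reference for the "shrinking" step can help when debugging the loop above. This is a sketch under assumptions, not a translation of the assembly: `window5_sums_inplace` is a hypothetical name, and it assumes the array holds `n_out + 4` column sums (20 for 16 output pixels) that each fit in an unsigned short. The one detail worth noting is that the leftmost value has to be saved before its slot is overwritten with the window sum.

```c
#include <assert.h>
#include <stddef.h>

/* Turn n_out + 4 column sums into n_out sliding 5-wide window sums, in place.
   Afterwards s[i] holds the sum of the original s[i] .. s[i+4]. */
static void window5_sums_inplace(unsigned short *s, size_t n_out)
{
    assert(s != NULL && n_out > 0);

    /* Full sum for the first window. */
    unsigned window = (unsigned)s[0] + s[1] + s[2] + s[3] + s[4];

    for (size_t i = 0; i < n_out; ++i) {
        unsigned leftmost = s[i];          /* save before overwriting the slot */
        s[i] = (unsigned short)window;
        if (i + 1 < n_out)                 /* slide one column to the right */
            window = window - leftmost + s[i + 5];
    }
}
```

Calling it with `n_out = 16` on a 20-element array leaves elements 16 to 19 untouched, which matches the svSum layout described in the post.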

Title: Re: Frame pointer register?? / Opt. Advice/Opinions
Post by: jj2007 on December 31, 2009, 04:28:49 PM

ebx is the register that serves for local variables. You can use it if you have no local variables but you must save and restore it:

push ebx
... do stuff
pop ebx

Title: Re: Frame pointer register?? / Opt. Advice/Opinions
Post by: BogdanOntanu on December 31, 2009, 05:01:21 PM

Quote from: jj2007 on December 31, 2009, 04:28:49 PM
ebx is the register that serves for local variables. You can use it if you have no local variables but you must save and restore it:

push ebx
... do stuff
pop ebx

I guess you wanted to write EBP not EBX ;)

Title: Re: Frame pointer register?? / Opt. Advice/Opinions
Post by: jj2007 on December 31, 2009, 05:20:40 PM

Quote from: BogdanOntanu on December 31, 2009, 05:01:21 PM
I guess you wanted to write EBP not EBX ;)

Well, yes and no. I meant ebp, but he gets a warning for ebx. I have never used C - does it use ebx instead of ebp? In any case, robione should preserve the non-trashable registers (ebx, esi, edi, ebp).

Title: Re: Frame pointer register?? / Opt. Advice/Opinions
Post by: dedndave on December 31, 2009, 06:46:44 PM

and don't forget the DF :P

Title: Re: Frame pointer register?? / Opt. Advice/Opinions
Post by: robione on December 31, 2009, 09:49:58 PM

Interestingly by inserting push ebp, push ebx and their corresponding pops on exit..... I go from 8 warnings to 10. :/

Why do I have problems with this function and none of my other assembly versions of stuff?

[Time passes......]

Ok as it turns out, if I remove the __declspec(align(16)) for my locals I no longer get warnings...... why would this be? Can't I just get wasted space in the stack that corresponds to the padding needed to align the local vars properly instead of a program that crashes?

Title: Re: Frame pointer register?? / Opt. Advice/Opinions
Post by: jj2007 on December 31, 2009, 11:40:48 PM

The easiest way to understand would be to insert an int 3 at the top and run the exe through OllyDbg.
I can't help you a lot because I don't know whether C really uses ebx as frame pointer, or ebp as in Masm. Either way you could not use simultaneously the frame pointer and local variables. Try replacing ebp/ebx with a local var - it will be slower but if the warning is valid it should no longer crash...

Title: Re: Frame pointer register?? / Opt. Advice/Opinions
Post by: GregL on January 01, 2010, 12:59:18 AM

Quote: I have never used C - does it use ebx instead of ebp?

No, it normally uses EBP as the frame pointer.

Title: Re: Frame pointer register?? / Opt. Advice/Opinions
Post by: jj2007 on January 01, 2010, 01:16:41 AM

Quote from: Greg Lyon on January 01, 2010, 12:59:18 AM
Quote: I have never used C - does it use ebx instead of ebp?

No, it normally uses EBP as the frame pointer.

OK, so "warning C4731: 'Laplace' : frame pointer register 'ebx' modified by inline assembly code" must be a typo.

Title: Re: Frame pointer register?? / Opt. Advice/Opinions
Post by: GregL on January 01, 2010, 01:29:12 AM

jj,

Maybe, but not necessarily. You would really need to look at the assembly listing from the compiler to see what it is doing.

Here is the MSDN info for warning C4731 (http://msdn.microsoft.com/en-us/library/ywz8xf2a.aspx).

Title: Re: Frame pointer register?? / Opt. Advice/Opinions
Post by: robione on January 01, 2010, 02:16:50 AM

Thanks guys. I did some register shuffling and am not using ebx anymore... lo and behold it works fine. Thx JJ for the advice re: the variables. I can begin my "official" debugging now :)

I had looked at the msdn site before. I just didn't see how what I was doing matched what was in the article. Is it possible to use two frame pointers at the same time? Maybe I was lucky on my other functions and they were already 16-byte aligned and I escaped this problem. i.e. EBP is used "normally" whereas EBX is 16-byte aligned only? Is that a possibility?

Title: Re: Frame pointer register?? / Opt. Advice/Opinions
Post by: MichaelW on January 01, 2010, 02:55:05 AM

; Listing generated by Microsoft (R) Optimizing Compiler Version 13.10.3077 

	TITLE	test.c
	.386P
include listing.inc
if @Version gt 510
.model FLAT
else
_TEXT	SEGMENT PARA USE32 PUBLIC 'CODE'
_TEXT	ENDS
_DATA	SEGMENT DWORD USE32 PUBLIC 'DATA'
_DATA	ENDS
CONST	SEGMENT DWORD USE32 PUBLIC 'CONST'
CONST	ENDS
_BSS	SEGMENT DWORD USE32 PUBLIC 'BSS'
_BSS	ENDS
$$SYMBOLS	SEGMENT BYTE USE32 'DEBSYM'
$$SYMBOLS	ENDS
_TLS	SEGMENT DWORD USE32 PUBLIC 'TLS'
_TLS	ENDS
FLAT	GROUP _DATA, CONST, _BSS
	ASSUME	CS: FLAT, DS: FLAT, SS: FLAT
endif

INCLUDELIB LIBC
INCLUDELIB OLDNAMES

PUBLIC	_Laplace
; Function compile flags: /Odt
_TEXT	SEGMENT
_svSum$ = -80						; size = 40
_sv25$ = -32						; size = 32
_src$ = 8						; size = 4
_dest$ = 12						; size = 4
_width$ = 16						; size = 4
_height$ = 20						; size = 4
_Laplace PROC NEAR
; File c:\program files\microsoft visual c++ toolkit 2003\my\robione\test.c
; Line 2
	push	ebx
	mov	ebx, esp
	push	ecx
	and	esp, -16				; fffffff0H
	add	esp, 4
	push	ebp
	mov	ebp, esp
	sub	esp, 84					; 00000054H
	push	ebx
	push	esi
	push	edi
; Line 6
	mov	ecx, 1638425				; 00190019H
; Line 7
	pinsrw	xmm6, ecx, 0
; Line 8
	pinsrw	xmm6, ecx, 1
; Line 9
	movd	mm6, ecx
; Line 10
	mov	edi, DWORD PTR _dest$[ebx]
; Line 11
	mov	ebx, DWORD PTR _width$[ebx]
; Line 12
	mov	ecx, DWORD PTR _height$[ebx]
; Line 13
	mov	eax, ebx
; Line 14
	mov	edx, ebx
; Line 15
	shl	eax, 1
; Line 16
	sub	ecx, 2
; Line 17
	mov	esi, DWORD PTR _src$[ebx]
; Line 18
	lea	edi, DWORD PTR [edi+eax+2]
; Line 20
	movaps	XMMWORD PTR _sv25$[ebp], xmm1
; Line 21
	movups	XMMWORD PTR _sv25$[ebp+12], xmm4
; Line 22
	movaps	XMMWORD PTR _svSum$[ebp], xmm0
; Line 23
	movaps	XMMWORD PTR _svSum$[ebp+16], xmm3
; Line 24
	movq	MMWORD PTR _svSum$[ebp+32], mm0
; Line 25
	movd	DWORD PTR _sv25$[ebp+28], mm3
; Line 28
	mov	ebx, DWORD PTR _svSum$[ebp+2]
; Line 29
	mov	edx, DWORD PTR _svSum$[ebp+6]
; Line 30
	movzx	eax, WORD PTR _svSum$[ebp]
; Line 31
	add	ebx, edx
; Line 32
	mov	edx, ebx
; Line 33
	add	ebx, eax
; Line 34
	shr	edx, 16					; 00000010H
; Line 35
	add	ebx, edx
; Line 36
	mov	WORD PTR _svSum$[ebp], bx
; Line 38
	mov	ecx, 1
$sums$74004:
; Line 41
	inc	ecx
; Line 42
	sub	ebx, eax
; Line 43
	movzx	eax, WORD PTR _svSum$[ebp+ecx*2+10]
; Line 44
	add	ebx, eax
; Line 45
	mov	WORD PTR _svSum$[ebp+ecx*2], bx
; Line 46
	cmp	ecx, 16					; 00000010H
; Line 47
	jb	SHORT $sums$74004
; Line 50
	movaps	xmm0, XMMWORD PTR _sv25$[ebp]
; Line 51
	movaps	xmm1, XMMWORD PTR _sv25$[ebp+16]
; Line 52
	movaps	xmm2, XMMWORD PTR _svSum$[ebp]
; Line 53
	movaps	xmm3, XMMWORD PTR _svSum$[ebp+16]
; Line 56
	pop	edi
	pop	esi
	pop	ebx
	mov	esp, ebp
	pop	ebp
	mov	esp, ebx
	pop	ebx
	ret	0
_Laplace ENDP
_TEXT	ENDS
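
The prologue in this listing shows two base registers at work: the parameters are addressed through EBX (`_dest$[ebx]`) and the aligned locals through EBP (`_sv25$[ebp]`). The C sketch below only replays the prologue's stack arithmetic to show why that split gives 16-byte aligned locals. `Frame` and `replay_prologue` are made-up names, and the entry ESP values are arbitrary examples.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t ebx;    /* parameter base: [ebx+8] is src ... [ebx+20] is height */
    uint32_t ebp;    /* base for the aligned locals */
    uint32_t sv25;   /* address of sv25  (ebp - 32) */
    uint32_t svSum;  /* address of svSum (ebp - 80) */
} Frame;

/* Replays the listing's prologue for a given ESP value at function entry. */
static Frame replay_prologue(uint32_t esp)
{
    Frame f;
    assert(esp % 4 == 0);       /* assume a dword-aligned stack on entry */

    esp -= 4;                   /* push ebx       */
    f.ebx = esp;                /* mov  ebx, esp  */
    esp -= 4;                   /* push ecx       */
    esp &= ~(uint32_t)15;       /* and  esp, -16  */
    esp += 4;                   /* add  esp, 4    */
    esp -= 4;                   /* push ebp       */
    f.ebp = esp;                /* mov  ebp, esp  */

    f.sv25  = f.ebp - 32;       /* _sv25$  = -32  */
    f.svSum = f.ebp - 80;       /* _svSum$ = -80  */
    return f;
}
```

Whatever the entry ESP is, EBP and both arrays come out 16-byte aligned, but the distance between EBX and EBP changes with the entry value. That would explain why the parameters cannot be reached through EBP here and why warning C4731 names ebx for this function.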

Title: Re: Frame pointer register?? / Opt. Advice/Opinions
Post by: jj2007 on January 01, 2010, 08:31:12 AM

Line 11 & 12 look cute :wink
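
What happens on those two lines can be shown with a toy model in C. This is a sketch, not real stack access: `load_dword`, `load_height_as_compiled` and `load_height_fixed` are hypothetical names, the 64-byte array stands in for memory, and a failed bounds check stands in for an access violation (the dword-alignment check is a simplification of the model, not an x86 rule). Picking EDX in the second variant is just one possible register choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { OFF_WIDTH = 16, OFF_HEIGHT = 20, STACK_BYTES = 64 };

/* Toy memory read: 'addr' is a byte offset into 'stack'. */
static bool load_dword(const uint32_t *stack, uint32_t addr, uint32_t *out)
{
    if (addr % 4 != 0 || addr > STACK_BYTES - 4)
        return false;                     /* stands in for a crash */
    *out = stack[addr / 4];
    return true;
}

/* Lines 11 and 12 as compiled: EBX is overwritten, then used as a base. */
static bool load_height_as_compiled(const uint32_t *stack, uint32_t ebx,
                                    uint32_t *height)
{
    if (!load_dword(stack, ebx + OFF_WIDTH, &ebx))       /* mov ebx, _width$[ebx]  */
        return false;
    return load_dword(stack, ebx + OFF_HEIGHT, height);  /* mov ecx, _height$[ebx] */
}

/* Same loads with width kept in another register, so EBX stays intact. */
static bool load_height_fixed(const uint32_t *stack, uint32_t ebx,
                              uint32_t *height)
{
    uint32_t edx;
    if (!load_dword(stack, ebx + OFF_WIDTH, &edx))       /* mov edx, _width$[ebx]  */
        return false;
    (void)edx;                                           /* width is not needed here */
    return load_dword(stack, ebx + OFF_HEIGHT, height);  /* mov ecx, _height$[ebx] */
}
```

With a width of 640 the compiled order tries to read from address 660, far outside the model's memory, which is consistent with the crash reported at `mov ecx,height`. With a small width it silently reads some other stack slot instead.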

Title: Re: Frame pointer register?? / Opt. Advice/Opinions
Post by: robione on January 01, 2010, 04:14:04 PM

I feel like I'm stuck in a time warp..... working with VC++ 98. ..... I'll have to spend the time and see if I can bring over all my libraries into VC++ '08 Express.

Title: Re: Frame pointer register?? / Opt. Advice/Opinions
Post by: GregL on January 01, 2010, 11:41:42 PM

robione,

Since your original post asked for opinions, I would recommend that you write your functions in MASM and then link those modules into your C/C++ program. You can do a whole lot more with MASM than you can with inline assembly. Inline assembly is pretty limited.

Title: Re: Frame pointer register?? / Opt. Advice/Opinions
Post by: robione on January 03, 2010, 12:01:58 PM

Thx Greg.... I was actually trying to avoid that option LOL. Mainly because when I start to get into something I find a gazillion "rabbit holes" to go down as far as "stuff to learn" goes. So I'm fairly good at getting little done (looks at this vector class I've been "updating" for days, trying to squeeze out speed with SSE). This particular case would be learning MASM's interface and how to link modules with MSVC++6.

Though I'm not against it if MASM optimizes the assembly code by reordering. (I can't imagine an x86 compiler that could do much more)

If not I'll stick to the inline assembler. I can use all instructions up to and including SSE2. That said maybe I'm not getting the picture because IDK something that you might as far as architecture, program segments, etc. go. I can't code for like 20-30 minutes w/o having to look up something online... so that's a good possibility LOL :) My soon-to-be new thread might be a good example.

The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: robione on December 31, 2009, 04:11:33 PM