I was kinda curious as to what this is. I'm continuing my fiddling with CV and coded up an SSE/MMX version of Laplace edge detection. I wanted to start debugging my logic, but unfortunately I can't even get that far. I'm getting this warning in a few places:
D:\Visual Studio Projects\edge\edge.cpp(247) : warning C4731: 'Laplace' : frame pointer register 'ebx' modified by inline assembly code
My code crashes before I enter the main loop of the function, here (mov ecx,height):
void Laplace(BYTE *src, BYTE *dest, int width, int height) {
__declspec(align(16)) short sv25[16];
__declspec(align(16)) short svSum[20];
__asm {
mov ecx,0x00190019
pinsrw xmm6,ecx,0
pinsrw xmm6,ecx,1
movd mm6,ecx
mov edi,dest
mov ebx,width
mov ecx,height
mov eax,ebx
mov edx,ebx
shl eax,1
sub ecx,2
mov esi,src
lea edi,[edi+eax+2]
// ......
BTW if anyone wants to check out my SSE Sobel code follow the link in my sig. Got some impressive times I think :)
I was also curious about optimization advice. Laplace is sort of similar to Sobel. It uses one 5x5 mask. So I load up 5 rows of image data of 16 pixels each in the SSE registers, unpack them, and add them up. Those are the values for svSum[0-15]. I take the middle line and multiply it by 25. Then I psrldq half of the data by 4 to get rid of the two shorts I don't need. These values will be stored at sv25[0-13]. In MMX registers I move the next 4 pixels in, and essentially do the same thing. That gives me the values for svSum[16-19] and sv25[14-15]. Once I have the values computed I write them to memory.
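For readers who want to check the SIMD results against something simple, here is a minimal scalar sketch in C of the step described above. The function name column_sums, the stride parameter and the exact index mapping (sv25[i] holding 25 times the middle-row pixel at column i+2) are assumptions made for illustration, not the poster's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar stand-in for the SSE/MMX stage.
   top points at the first of 5 consecutive image rows, stride is the
   row pitch in bytes. Assumed mapping: svSum[x] is the sum of the 5
   rows at column x (20 columns), sv25[x] is 25 * middle row at x + 2. */
void column_sums(const unsigned char *top, size_t stride,
                 short svSum[20], short sv25[16])
{
    assert(top && svSum && sv25 && stride >= 20);

    for (int x = 0; x < 20; ++x) {
        int s = 0;
        for (int r = 0; r < 5; ++r)
            s += top[(size_t)r * stride + (size_t)x];
        svSum[x] = (short)s;                 /* at most 5 * 255 = 1275 */
    }

    const unsigned char *mid = top + 2 * stride;
    for (int x = 0; x < 16; ++x)
        sv25[x] = (short)(25 * mid[x + 2]);  /* at most 25 * 255 = 6375 */
}
```

Both value ranges fit comfortably in a signed 16-bit short, which is why unpacking the bytes to words is enough for this stage.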
Then I read them back into the GPRs so I can sum up the first 5 values of svSum and write that back to svSum[0]. I then loop through the array, subtracting the leftmost value and adding in the rightmost (of the 5x5 mask). This "shrinks" the data in the svSum array into 16 elements.
Then I load up svSum and sv25 in xmm registers, do the final subtraction and pack the data back up. I didn't post the code 'cause I thought a couple of paragraphs would be easier to read than ~150 lines of assembly. A quick primer on Sobel/Laplace can be found here: http://www.pages.drexel.edu/~weg22/edge.html about two thirds of the way down. I know my description of it is lacking in this post. You essentially multiply the pixel values by the corresponding value in the 5x5 mask and sum them. The mask is all -1 except the center value, which is 24.
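As a plain reference for the mask just described, a scalar version might look like the sketch below. Since every tap is -1 except the center 24, each output is 25 times the center pixel minus the sum of all 25 pixels, which is the sv25 minus svSum subtraction. The function name laplace5x5_ref, leaving the 2-pixel border untouched, and clamping to 0..255 are assumptions of this sketch, not something stated in the post.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference for the 5x5 mask (all -1, center 24).
   Border pixels are left untouched and results are clamped to 0..255;
   both choices are assumptions made for this sketch. */
void laplace5x5_ref(const unsigned char *src, unsigned char *dest,
                    int width, int height)
{
    assert(src && dest && width >= 5 && height >= 5);

    for (int y = 2; y < height - 2; ++y) {
        for (int x = 2; x < width - 2; ++x) {
            int sum = 0;
            for (int dy = -2; dy <= 2; ++dy)
                for (int dx = -2; dx <= 2; ++dx)
                    sum += src[(size_t)(y + dy) * (size_t)width
                               + (size_t)(x + dx)];

            /* 24*center - (24 neighbours) == 25*center - (all 25) */
            int v = 25 * src[(size_t)y * (size_t)width + (size_t)x] - sum;
            if (v < 0)   v = 0;
            if (v > 255) v = 255;
            dest[(size_t)y * (size_t)width + (size_t)x] = (unsigned char)v;
        }
    }
}
```

Something this slow is only useful for comparing against the assembly output pixel by pixel while debugging.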
I just am not sure if the moving of data around between the MMX, SSE and GP registers via memory is the most efficient. I could've used less data in the SSE registers (computing 12 pixels in SSE alone instead of 16 using SSE/MMX) but doing it like I am now I can use movaps instead of movups. Once I figure out this frame pointer thing I can determine which route to take.... but in either one I still do the "shrinking" of svSum the same. Which is below..... is there a better way to do the same?
//svSum and sv25 values computed. Store them
movaps sv25,xmm1
movups [sv25+12],xmm4
movaps svSum,xmm0
movaps [svSum+16],xmm3
movq [svSum+32],mm0
movd [sv25+28],mm3
//Read svSum back :/. Compute the "real" value of the summation for 1st pixel
mov ebx,dword ptr [svSum+2]
mov edx,dword ptr [svSum+6]
movzx eax,word ptr [svSum]
add ebx,edx
mov edx,ebx
add ebx,eax
shr edx,16
add ebx,edx
mov svSum,bx
mov ecx,1
sums:
//Compute the "real" value of the summation for remaining pixels
inc ecx
sub ebx,eax
movzx eax,word ptr [svSum+ecx*2+10]
add ebx,eax
mov word ptr [svSum+ecx*2],bx
cmp ecx,16
jb sums
//Read the two arrays back for final computation
movaps xmm0,sv25
movaps xmm1,[sv25+16]
movaps xmm2,svSum
movaps xmm3,[svSum+16]
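To pin down what the "shrinking" step is meant to compute, here is a small C sketch of a 5-wide running sum over the 20 column sums. The name window_sums5 and the separate output array are assumptions for illustration. Writing to a separate array sidesteps the question of whether an in-place update overwrites a value that still has to be subtracted later.

```c
#include <assert.h>

/* Hypothetical scalar version of the "shrinking" step:
   out[i] = in[i] + in[i+1] + in[i+2] + in[i+3] + in[i+4], i = 0..15.
   Each step drops the leftmost column and adds the next one on the right. */
void window_sums5(const short in[20], short out[16])
{
    assert(in && out);

    int acc = in[0] + in[1] + in[2] + in[3] + in[4];
    out[0] = (short)acc;

    for (int i = 1; i < 16; ++i) {
        acc += in[i + 4] - in[i - 1];   /* add rightmost, drop leftmost */
        out[i] = (short)acc;            /* at most 25 * 255 = 6375 */
    }
}
```

Comparing the assembly loop's output against this for a known input is a cheap way to check the indexing before worrying about speed.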
ebx is the register that serves for local variables. You can use it if you have no local variables but you must save and restore it:
push ebx
... do stuff
pop ebx
Quote from: jj2007 on December 31, 2009, 04:28:49 PM
ebx is the register that serves for local variables. You can use it if you have no local variables but you must save and restore it:
push ebx
... do stuff
pop ebx
I guess you wanted to write EBP not EBX ;)
Quote from: BogdanOntanu on December 31, 2009, 05:01:21 PM
I guess you wanted to write EBP not EBX ;)
Well, yes and no. I meant ebp, but he gets a warning for ebx. I have never used C - does it use ebx instead of ebp? In any case, robione should preserve the non-trashable registers (ebx, esi, edi, ebp).
and don't forget the DF :P
Interestingly by inserting push ebp, push ebx and their corresponding pops on exit..... I go from 8 warnings to 10. :/
Why do I have problems with this function and none of my other assembly versions of stuff?
[Time passes......]
Ok as it turns out, if I remove the __declspec(align(16)) for my locals I no longer get warnings...... why would this be? Can't I just get wasted space in the stack that corresponds to the padding needed to align the local vars properly instead of a program that crashes?
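One way to get the "wasted space" behaviour by hand is to over-allocate an ordinary local and round a pointer up to the next 16-byte boundary, so the compiler has no reason to realign the frame. The sketch below is C and makes several assumptions: the helper names align16 and aligned_locals_demo are made up, stdint.h (C99) is available, which may not hold on older compilers, and the inline assembly would then have to load the pointer into a register instead of naming the array directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round a pointer up to the next 16-byte boundary. */
static void *align16(void *p)
{
    return (void *)(((uintptr_t)p + 15u) & ~(uintptr_t)15u);
}

/* Hypothetical stand-in for a function with a 16-short aligned local.
   A short array is 2-byte aligned, so rounding up skips at most
   14 bytes, i.e. 7 shorts of padding. */
int aligned_locals_demo(void)
{
    short raw[16 + 7];
    short *sv25 = (short *)align16(raw);

    assert(((uintptr_t)sv25 & 15u) == 0);   /* safe target for movaps */
    assert(sv25 + 16 <= raw + 16 + 7);      /* still inside raw */

    sv25[0] = 25;
    sv25[15] = 25;
    return sv25[0] + sv25[15];
}
```

The cost is one extra register to hold the aligned pointer, which matters in a routine that is already short on registers.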
The easiest way to understand would be to insert an int 3 at the top and run the exe through OllyDbg.
I can't help you a lot because I don't know whether C really uses ebx as frame pointer, or ebp as in Masm. Either way you cannot modify the frame pointer and still access local variables through it. Try replacing ebp/ebx with a local var - it will be slower but if the warning is valid it should no longer crash...
Quote: I have never used C - does it use ebx instead of ebp?
No, it normally uses EBP as the frame pointer.
Quote from: Greg Lyon on January 01, 2010, 12:59:18 AM
Quote: I have never used C - does it use ebx instead of ebp?
No, it normally uses EBP as the frame pointer.
OK, so "warning C4731: 'Laplace' : frame pointer register 'ebx' modified by inline assembly code" must be a typo.
jj,
Maybe, but not necessarily. You would really need to look at the assembly listing from the compiler to see what it is doing.
Here is the MSDN info for warning C4731 (http://msdn.microsoft.com/en-us/library/ywz8xf2a.aspx).
Thanks guys. I did some register shuffling and am not using ebx anymore... lo and behold it works fine. Thx JJ for the advice re: the variables. I can begin my "official" debugging now :)
I had looked at the msdn site before. I just didn't see how what I was doing matched what was in the article. Is it possible to use two frame pointers at the same time? Maybe I was lucky on my other functions and they were already 16-byte aligned and I escaped this problem. i.e. EBP is used "normally" whereas EBX is used only when 16-byte alignment is needed? Is that a possibility?
; Listing generated by Microsoft (R) Optimizing Compiler Version 13.10.3077
TITLE test.c
.386P
include listing.inc
if @Version gt 510
.model FLAT
else
_TEXT SEGMENT PARA USE32 PUBLIC 'CODE'
_TEXT ENDS
_DATA SEGMENT DWORD USE32 PUBLIC 'DATA'
_DATA ENDS
CONST SEGMENT DWORD USE32 PUBLIC 'CONST'
CONST ENDS
_BSS SEGMENT DWORD USE32 PUBLIC 'BSS'
_BSS ENDS
$$SYMBOLS SEGMENT BYTE USE32 'DEBSYM'
$$SYMBOLS ENDS
_TLS SEGMENT DWORD USE32 PUBLIC 'TLS'
_TLS ENDS
FLAT GROUP _DATA, CONST, _BSS
ASSUME CS: FLAT, DS: FLAT, SS: FLAT
endif
INCLUDELIB LIBC
INCLUDELIB OLDNAMES
PUBLIC _Laplace
; Function compile flags: /Odt
_TEXT SEGMENT
_svSum$ = -80 ; size = 40
_sv25$ = -32 ; size = 32
_src$ = 8 ; size = 4
_dest$ = 12 ; size = 4
_width$ = 16 ; size = 4
_height$ = 20 ; size = 4
_Laplace PROC NEAR
; File c:\program files\microsoft visual c++ toolkit 2003\my\robione\test.c
; Line 2
push ebx
mov ebx, esp
push ecx
and esp, -16 ; fffffff0H
add esp, 4
push ebp
mov ebp, esp
sub esp, 84 ; 00000054H
push ebx
push esi
push edi
; Line 6
mov ecx, 1638425 ; 00190019H
; Line 7
pinsrw xmm6, ecx, 0
; Line 8
pinsrw xmm6, ecx, 1
; Line 9
movd mm6, ecx
; Line 10
mov edi, DWORD PTR _dest$[ebx]
; Line 11
mov ebx, DWORD PTR _width$[ebx]
; Line 12
mov ecx, DWORD PTR _height$[ebx]
; Line 13
mov eax, ebx
; Line 14
mov edx, ebx
; Line 15
shl eax, 1
; Line 16
sub ecx, 2
; Line 17
mov esi, DWORD PTR _src$[ebx]
; Line 18
lea edi, DWORD PTR [edi+eax+2]
; Line 20
movaps XMMWORD PTR _sv25$[ebp], xmm1
; Line 21
movups XMMWORD PTR _sv25$[ebp+12], xmm4
; Line 22
movaps XMMWORD PTR _svSum$[ebp], xmm0
; Line 23
movaps XMMWORD PTR _svSum$[ebp+16], xmm3
; Line 24
movq MMWORD PTR _svSum$[ebp+32], mm0
; Line 25
movd DWORD PTR _sv25$[ebp+28], mm3
; Line 28
mov ebx, DWORD PTR _svSum$[ebp+2]
; Line 29
mov edx, DWORD PTR _svSum$[ebp+6]
; Line 30
movzx eax, WORD PTR _svSum$[ebp]
; Line 31
add ebx, edx
; Line 32
mov edx, ebx
; Line 33
add ebx, eax
; Line 34
shr edx, 16 ; 00000010H
; Line 35
add ebx, edx
; Line 36
mov WORD PTR _svSum$[ebp], bx
; Line 38
mov ecx, 1
$sums$74004:
; Line 41
inc ecx
; Line 42
sub ebx, eax
; Line 43
movzx eax, WORD PTR _svSum$[ebp+ecx*2+10]
; Line 44
add ebx, eax
; Line 45
mov WORD PTR _svSum$[ebp+ecx*2], bx
; Line 46
cmp ecx, 16 ; 00000010H
; Line 47
jb SHORT $sums$74004
; Line 50
movaps xmm0, XMMWORD PTR _sv25$[ebp]
; Line 51
movaps xmm1, XMMWORD PTR _sv25$[ebp+16]
; Line 52
movaps xmm2, XMMWORD PTR _svSum$[ebp]
; Line 53
movaps xmm3, XMMWORD PTR _svSum$[ebp+16]
; Line 56
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
mov esp, ebx
pop ebx
ret 0
_Laplace ENDP
_TEXT ENDS
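Lines 11 & 12 look cute :wink
Line 11 loads width into ebx, which the compiler is using as the base for the parameters, so line 12 reads height through the wrong address.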
I feel like I'm stuck in a time warp..... working with VC++ 98. ..... I'll have to spend the time and see if I can bring over all my libraries into VC++ '08 Express.
robione,
Since your original post asked for opinions, I would recommend that you write your functions in MASM and then link those modules into your C/C++ program. You can do a whole lot more with MASM than you can with inline assembly. Inline assembly is pretty limited.
Thx Greg.... I was actually trying to avoid that option LOL. Mainly because when I start to get into something I find a gazillion "rabbit holes" to go down as far as "stuff to learn" goes. So I'm fairly good at getting little done (looks at this vector class I've been "updating" for days, trying to squeeze out speed with SSE). This particular case would be learning MASM's interface and how to link modules with MSVC++6.
Though I'm not against it if MASM optimizes the assembly code by reordering. (I can't imagine an x86 compiler that could do much more)
If not I'll stick to the inline assembler. I can use all instructions up to and including SSE2. That said maybe I'm not getting the picture because IDK something that you might as far as architecture, program segments, etc. go. I can't code for like 20-30 minutes w/o having to look up something online... so that's a good possibility LOL :) My soon-to-be new thread might be a good example.