
Would need a bit of testing help

Started by Gunther, August 26, 2010, 01:23:31 AM


Gunther

I'm new to this forum and need a bit of time to get oriented. The material presented here is overwhelming, and so is the help on offer. So I'm not sure whether this is the appropriate place for my question. But we'll see.

I have to optimize an application for fractal image compression (approximately 180 000 lines of code). That's not much, because it's a pure console application, written in C++ with the GNU tool chain. It runs under several 32-bit operating systems (Windows, Linux, BSD and Intel-based Mac OS X). The 64-bit port is on the way. So far, so good, but a small part of the encoder is tricky, or better said, time critical. I did a serious code walkthrough and intensive profiling and found that the bottleneck is 6 procedures. For example, given an image of 256x256 pixels, those routines are called over 30 million times. What is the purpose of the functions? They have to calculate several numerical and statistical parameters, especially the scalar product (dot product).

The last resort to squeeze out the last few nanoseconds per function call is assembly language, of course. I've searched for a fast implementation of the dot product. Therefore, I've written a small test suite which computes the scalar product in several variations. I think the best way would be to present the code here for testing on other machines and different platforms. What is the best place to do that? I don't want to cross-post. The only thing I still have to do is make another version for an external assembler for better readability, because at the moment I've implemented the assembly language part with the GCC inline assembler (AT&T syntax).
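As an illustration only, a plain C++ baseline for such a test suite might look like the sketch below. The names `dot_scalar` and `dot_unrolled4` and the `double` element type are assumptions made for the example; the thread does not show the actual code or data type.

```cpp
#include <cassert>
#include <cstddef>

// Reference version: one multiply-add per iteration.
double dot_scalar(const double* a, const double* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Variation: four independent accumulators shorten the dependency
// chain on the floating-point add.
double dot_unrolled4(const double* a, const double* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; ++i) {  // leftover elements when n is not a multiple of 4
        s0 += a[i] * b[i];
    }
    return (s0 + s1) + (s2 + s3);
}
```

Because the two versions add the products in a different order, their results can differ in the last bits for general floating-point input, so a test suite would probably compare them with a small tolerance rather than for exact equality.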

Gunther
Forgive your enemies, but never forget their names.

dedndave

perhaps you can use a compiler switch to generate an assembly listing
then, you can examine the asm code that is generated and see where there is room for improvement
offhand, i would say that SSE2 code will be your friend

oh - and, welcome to the forum   :U

jj2007

Hi Gunther,
Welcome to the Forum :thumbu
Create a thread in the Laboratory, and start with posting some of the inner loops. You will see that several members are just waiting eagerly to demonstrate that a compiler can be beaten any time by human intelligence :green2

hutch--

Hi Gunther,

I have played with GAS which allows Intel notation, is that possible in the GCC inline format? If so, it would make testing a lot easier, as most here work in MASM and similar assemblers that use the historical Intel notation. Otherwise a disassembly may be the easiest way to handle a critical algo for testing.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

Gunther

Thank you for your warm welcome.

Quote from: dedndave
i would say that SSE2 code will be your friend

Without any doubt: it's my friend and the way to go. Both arrays (vectors) are properly aligned on 16-byte boundaries, so it's already in use.
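To make the alignment point concrete, here is a minimal sketch of an SSE2 dot product using compiler intrinsics. It is not the code from the application: the function name `dot_sse2` and the `double` element type are assumptions, and the sketch falls back to a scalar loop when the compiler does not define `__SSE2__`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Dot product of two double arrays. Both pointers are assumed to be
// 16-byte aligned, which is what the aligned load _mm_load_pd requires.
double dot_sse2(const double* a, const double* b, std::size_t n) {
    assert(reinterpret_cast<std::uintptr_t>(a) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(b) % 16 == 0);
    std::size_t i = 0;
    double sum = 0.0;
#if defined(__SSE2__)
    __m128d acc = _mm_setzero_pd();          // two partial sums
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_load_pd(a + i);     // aligned load of 2 doubles
        __m128d vb = _mm_load_pd(b + i);
        acc = _mm_add_pd(acc, _mm_mul_pd(va, vb));
    }
    double lanes[2];
    _mm_storeu_pd(lanes, acc);               // horizontal sum of both lanes
    sum = lanes[0] + lanes[1];
#endif
    for (; i < n; ++i) {                     // tail, or whole loop without SSE2
        sum += a[i] * b[i];
    }
    return sum;
}
```

A caller would declare the arrays with `alignas(16)` (or allocate them with an aligned allocator) so that the aligned loads are valid.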

Quote from: jj2007
Create a thread in the Laboratory, and start with posting some of the inner loops.

Okay, I'll do that. And you're right: the famous innermost loops ...

Quote from: jj2007
You will see that several members are just waiting eagerly to demonstrate that a compiler can be beaten any time by human intelligence

I'm sure. That's my point of view, too.

Quote from: hutch--
I have played with GAS which allows Intel notation, is that possible in the GCC inline format?

Yes, it has been possible since GCC version 3.3.? or so and should make life easier. That was the good news. But: it is well known that Mac OS X is a BSD clone. So Apple patched not only BSD, but also GCC. Since then, Apple's GCC doesn't "understand" the Intel switch. That's a bad joke. Therefore, I've decided to do the inline assembly part in AT&T syntax for seamless compilation under Mac OS X. On the other hand, that syntax isn't so bad and in some cases clearer than the Intel syntax. I'm at home in both worlds. But I know: the readability for other guys...
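For readers who mostly work in Intel notation, the sketch below shows what a tiny piece of GCC inline assembly in AT&T syntax can look like. It is an illustration under stated assumptions, not one of the three functions from the application: the names `mul_add_att` and `dot_inline_asm` are hypothetical, the element type is assumed to be `double`, and a portable fallback is used when the compiler is not GCC-compatible or SSE2 is unavailable.

```cpp
#include <cassert>
#include <cstddef>

// Returns acc + x * y using two scalar SSE2 instructions.
// AT&T operand order is "source, destination", the reverse of Intel.
double mul_add_att(double acc, double x, double y) {
#if defined(__GNUC__) && defined(__SSE2__) && \
    (defined(__i386__) || defined(__x86_64__))
    __asm__("mulsd %2, %1\n\t"   // x   = x * y
            "addsd %1, %0"       // acc = acc + x
            : "+x"(acc), "+x"(x) // read-write operands in XMM registers
            : "x"(y));           // read-only operand in an XMM register
    return acc;
#else
    return acc + x * y;          // portable fallback
#endif
}

// Dot product built on the helper above, one element per call.
double dot_inline_asm(const double* a, const double* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum = mul_add_att(sum, a[i], b[i]);
    }
    return sum;
}
```

In Intel notation the same two instructions would read `mulsd xmm1, xmm2` and `addsd xmm0, xmm1`, with the destination first.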

Quote from: hutch--
Otherwise a disassembly may be the easiest way to handle a critical algo for testing.

It's possible, but awkward. GCC comes up with AT&T syntax and that doesn't help. For pre-compiled code, one has to use objconv or a similar tool. The inline part consists of 3 functions. I've written a separate file in NASM syntax for those procedures. That's more readable and one knows what happens there. I think that's a good compromise.

Gunther

hutch--

Gunther,

If you don't mind building separate modules, GAS using Intel notation is probably a good option. Separate modules have another advantage: recent C compilers usually perform a lot of internal optimisation, and inlining assembler code often makes a mess of that internal optimisation. I don't know if you can use GAS under a MAC OS but if it was possible it would save you a lot of translation.

Gunther

Quote from: hutch--
I don't know if you can use GAS under a MAC OS but if it was possible it would save you a lot of translation.

It's possible to use GAS under Mac OS X, but without ".intel_syntax noprefix", which enables the widespread Intel syntax. With that line, a compilation error occurs.

The translation isn't a problem, because it's done. I only have to insert appropriate comments for better readability. Should I include the running Win32 EXE in the archive for members who don't have GCC installed? For Linux, BSD and Mac OS X that's not an issue, because GCC is there by default. What do you think?

By the way, do you sleep sometimes?  :wink You're always online when I'm online too, but there are several thousand miles between us. It's amazing.

Gunther