Faster instruction timing?

indiocolifa · September 21, 2005, 06:28:48 AM

I want to increase CX by one.

Which is faster on modern (Pentium and later) CPUs?

INC (CX);

or

ADD (1,cx);

Thank you very much.

Sevag.K · September 21, 2005, 09:24:07 PM

A while back I did some timing tests, and the ADD instruction proved faster on modern Pentiums. Apparently INC was faster on older Pentiums and pre-Pentium CPUs (I don't know where the cut-off point is).

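But my test used the 32-bit forms of INC and ADD. The 16-bit (word) forms, although they operate on smaller values, might be slower still. In 32-bit code they also need an extra operand-size prefix byte.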

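The trade-off is that ADD (imm, reg) is bigger than INC (reg). In 32-bit code, INC (ECX) encodes in 1 byte and ADD (1, ECX) in 3 bytes, and each 16-bit form is one prefix byte longer.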
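Size aside, the two instructions also affect the flags differently. INC leaves the carry flag (CF) unchanged, while ADD sets it from the result. That difference is the usual explanation for INC being slower on the Pentium 4: an instruction that preserves part of the flags register must wait for whichever earlier instruction last wrote it. Below is a minimal C sketch of the difference. It assumes a simplified model that tracks only CX and CF; the `Cpu` type and the function names are hypothetical, and the other flags are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model: only CX and the carry flag. */
typedef struct {
    uint16_t cx;
    int cf; /* carry flag: 0 or 1 */
} Cpu;

/* INC (CX): CX wraps modulo 2^16; CF is left unchanged. */
static void inc_cx(Cpu *c) {
    c->cx = (uint16_t)(c->cx + 1u);
}

/* ADD (1, CX): same value in CX, but CF is set on unsigned overflow
   and cleared otherwise. */
static void add1_cx(Cpu *c) {
    uint32_t wide = (uint32_t)c->cx + 1u;
    c->cx = (uint16_t)wide;
    c->cf = wide > 0xFFFFu;
}

/* Self-check: at CX = 0xFFFF both wrap to 0, but only ADD writes CF. */
void check_inc_vs_add(void) {
    Cpu a = { 0xFFFF, 0 };
    Cpu b = { 0xFFFF, 0 };
    inc_cx(&a);
    add1_cx(&b);
    assert(a.cx == 0 && b.cx == 0);
    assert(a.cf == 0); /* INC left CF alone */
    assert(b.cf == 1); /* ADD set CF        */
}
```

For any starting value, both functions leave the same value in CX. They can differ only in CF.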

Randall Hyde · September 30, 2005, 03:49:18 PM

Quote from: indiocolifa on September 21, 2005, 06:28:48 AM
I want to increase CX by one.

What is faster in modern (Pentium + CPUs)?

INC (CX);

or

ADD (1,cx);

Thank you very much.

The Pentium 4 (PIV) optimization manual recommends using ADD rather than INC.
Cheers,
Randy Hyde

indiocolifa · October 02, 2005, 07:23:06 AM

Tell me if I'm right about the following:

I'm using the following code to get the byte offset of an element in an array of records into EBX, where EAX holds the index:


INTMUL (@size(recordType), EAX, EBX);

My record size is 9 bytes.

Since this multiply instruction takes many cycles, maybe I should use the following for faster operation (since 9*EAX = (8*EAX)+EAX):


SHL (3,eax);      // eax * 8
ADD (eax,eax);    // ( eax * 8 ) + eax
MOV (eax,ebx);    // eax to ebx

Is this correct?

Second approach (if I pad the record size to 16 bytes):


SHL (4,eax); // eax * 16
MOV (eax,ebx);

Maybe the last method is better.

V Coder · October 03, 2005, 04:47:41 PM

Of course you realise that adding eax to itself doubles eax, so you don't get (eax*8)+eax... you get (eax*8)+(eax*8), which is eax*16.
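You can confirm this by modeling the posted sequence in C, one statement per HLA instruction. HLA operands are source, destination, so `add(eax, eax)` is `eax += eax`. `uint32_t` stands in for the 32-bit registers, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The sequence as posted:
     shl(3, eax); add(eax, eax); mov(eax, ebx);  */
static uint32_t posted_sequence(uint32_t eax) {
    eax <<= 3;           /* eax = 8*x                 */
    eax += eax;          /* eax = 16*x (doubles it)   */
    uint32_t ebx = eax;  /* mov(eax, ebx)             */
    return ebx;          /* 16*x, not the intended 9*x */
}

/* The intended 8*x + x needs a copy of the original x. */
static uint32_t times_nine(uint32_t x) {
    return (x << 3) + x;
}

void check_posted_sequence(void) {
    for (uint32_t x = 0; x < 1000; ++x) {
        assert(posted_sequence(x) == 16u * x);
        assert(times_nine(x) == 9u * x);
    }
}
```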

If you pad the record, distribute the extra space with the largest elements first. For speed, you should really align all data to its size in bytes: align 8-byte data to 8 bytes and 4-byte data to 4 bytes.
So if your record consists of 4 bytes, followed by 3 bytes, followed by 2 bytes, it may be better to use a 12-byte record: 4 bytes, 3 bytes (+1 byte of padding), 2 bytes (+2 bytes of padding). This ensures that an access to the 4-byte value is never split across a 4-byte boundary, and so on, which speeds up memory accesses.
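The 12-byte layout described above can be written as a C struct with explicit padding members. This is a sketch under some assumptions: the member names are hypothetical, and the 3-byte field is modeled as a `uint8_t` array. The C11 `static_assert` checks also assume a typical x86 ABI where `uint32_t` is 4-byte aligned, so the compiler adds no hidden padding of its own.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof            */
#include <stdint.h>

/* Hypothetical 9-byte payload (4 + 3 + 2) padded to 12 bytes so that
   each field starts on a multiple of its own size. */
struct rec12 {
    uint32_t a;        /* 4 bytes at offset 0                  */
    uint8_t  b[3];     /* 3 bytes at offset 4                  */
    uint8_t  pad1;     /* 1 byte of padding                    */
    uint16_t c;        /* 2 bytes at offset 8                  */
    uint8_t  pad2[2];  /* 2 bytes of padding -> 12-byte record */
};

/* Compile-time checks; these hold on common x86 ABIs. */
static_assert(offsetof(struct rec12, a) == 0, "a at offset 0");
static_assert(offsetof(struct rec12, b) == 4, "b at offset 4");
static_assert(offsetof(struct rec12, c) == 8, "c at offset 8");
static_assert(sizeof(struct rec12) == 12, "record is 12 bytes");
```

Because 12 is a multiple of 4, the 4-byte member `a` of element i sits at offset 12*i. That offset is always 4-byte aligned.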

If you have 6-byte data followed by 3-byte data, you need to align the 6-byte data to an 8-byte boundary and the 3-byte data to a 4-byte boundary: 6 bytes (+2 bytes of padding), 3 bytes (+5 bytes of padding). I expanded the record to 16 bytes to ensure that the 6-byte data is always aligned to an 8-byte boundary. All records should probably be 16-byte aligned if you can spare the space.
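The sketch below shows what the extra padding buys. It counts how many records in a back-to-back array place a field across an alignment boundary, which is the split access described above. It assumes the array itself starts on a 16-byte boundary; `straddles` and `count_straddling` are hypothetical helper names.

```c
#include <assert.h>

/* Nonzero if bytes [off, off + size) cross a multiple of `boundary`. */
static int straddles(unsigned off, unsigned size, unsigned boundary) {
    return off / boundary != (off + size - 1) / boundary;
}

/* How many of the first n records have a field (field_size bytes at
   field_off within the record) that crosses a `boundary`-byte line,
   when records of rec_size bytes are packed back to back. */
static unsigned count_straddling(unsigned rec_size, unsigned field_off,
                                 unsigned field_size, unsigned boundary,
                                 unsigned n) {
    unsigned count = 0;
    for (unsigned i = 0; i < n; ++i)
        if (straddles(i * rec_size + field_off, field_size, boundary))
            ++count;
    return count;
}

void check_record_sizes(void) {
    /* 6-byte field at offset 0, checked against 8-byte boundaries. */
    assert(count_straddling(12, 0, 6, 8, 4) == 2); /* records 1 and 3 */
    assert(count_straddling(16, 0, 6, 8, 4) == 0);
    /* The original 9-byte record: 4-byte field vs 4-byte boundaries. */
    assert(count_straddling(9, 0, 4, 4, 4) == 3);  /* records 1, 2, 3 */
    assert(count_straddling(12, 0, 4, 4, 4) == 0);
}
```

With 12-byte records, every second record's 6-byte field is split across an 8-byte boundary. With 16-byte records, none are. The same check on a packed 9-byte record shows its 4-byte field split in three out of every four records.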

With 16-byte records, you can simply compute the offset with shl (4, eax);
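With a 16-byte stride, the byte offset of element i is i * 16, which equals i << 4. The sketch below checks that against the addresses a C compiler computes for an array of 16-byte records. The struct follows the 6 + 2 + 3 + 5 byte layout described above, and its member names are hypothetical. It uses only byte arrays so that, on common ABIs, it has no hidden padding. The `static_assert` requires C11.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte record: 6-byte field, 2 pad, 3-byte field, 5 pad. */
struct rec16 {
    uint8_t wide[6];
    uint8_t pad1[2];
    uint8_t narrow[3];
    uint8_t pad2[5];
};
static_assert(sizeof(struct rec16) == 16, "record must be 16 bytes");

/* Byte offset of element i; corresponds to shl(4, eax) in the HLA code. */
static size_t rec16_offset(size_t i) {
    return i << 4;
}

/* Compare the shift against the offsets the compiler uses for indexing. */
void check_rec16_offsets(void) {
    struct rec16 table[8];
    for (size_t i = 0; i < 8; ++i)
        assert((size_t)((char *)&table[i] - (char *)table) == rec16_offset(i));
}
```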

On the other hand, if you must conserve space, this multiplies by nine properly (the result ends up in EBX):
mov (eax, ebx);   // ebx = x
shl (3, eax);     // eax = 8*x
add (eax, ebx);   // ebx = x + 8*x = 9*x

Or, to leave EAX unchanged, you can use:
lea (ebx, [eax*8 + eax]);   // ebx = 9*x, eax unchanged
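Both multiply-by-nine variants can be modeled in C. The model confirms that each computes 9 * x (modulo 2^32, like a 32-bit register) and shows which registers each one changes. HLA operands are source, destination, so `add(eax, ebx)` means `ebx += eax`. The `Regs` struct and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t eax, ebx; } Regs; /* only the two registers used */

/* mov(eax, ebx); shl(3, eax); add(eax, ebx);  -> ebx = 9*x, eax = 8*x */
static Regs nine_by_shift(uint32_t x) {
    Regs r = { x, 0 };
    r.ebx = r.eax;     /* mov(eax, ebx) */
    r.eax <<= 3;       /* shl(3, eax)   */
    r.ebx += r.eax;    /* add(eax, ebx) */
    return r;
}

/* lea(ebx, [eax*8 + eax]);  -> ebx = 9*x, eax untouched */
static Regs nine_by_lea(uint32_t x) {
    Regs r = { x, 0 };
    r.ebx = r.eax * 8u + r.eax;
    return r;
}

void check_times_nine(void) {
    const uint32_t samples[] = { 0u, 1u, 7u, 12345u, 0xFFFFFFFFu };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        uint32_t x = samples[i];
        Regs s = nine_by_shift(x);
        Regs l = nine_by_lea(x);
        assert(s.ebx == 9u * x && l.ebx == 9u * x);
        assert(s.eax == x << 3);  /* shift variant clobbers eax */
        assert(l.eax == x);       /* lea variant preserves eax  */
    }
}
```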

If speed is what you need, three adds are probably faster than the shift, at least on the Pentium 4:
mov (eax, ebx);   // ebx = x
add (eax, eax);   // eax = 2*x
add (eax, eax);   // eax = 4*x
add (eax, eax);   // eax = 8*x
add (eax, ebx);   // ebx = 9*x

or

mov (eax, ebx);   // ebx = x
add (eax, eax);   // eax = 2*x
add (eax, eax);   // eax = 4*x
add (eax, ebx);   // ebx = 5*x
add (eax, ebx);   // ebx = 9*x
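The two add-only sequences can also be checked with a small C model. HLA's `add(eax, ebx)` means `ebx += eax`, so the second sequence adds 4*x to EBX twice (x + 4x + 4x). The function names are hypothetical. The model checks only the arithmetic; which sequence is actually fastest on a given CPU has to be measured.

```c
#include <assert.h>
#include <stdint.h>

/* mov(eax,ebx); add(eax,eax) x3; add(eax,ebx);  -> x + 8x */
static uint32_t nine_by_three_adds(uint32_t eax) {
    uint32_t ebx = eax;   /* ebx = x  */
    eax += eax;           /* eax = 2x */
    eax += eax;           /* eax = 4x */
    eax += eax;           /* eax = 8x */
    ebx += eax;           /* ebx = 9x */
    return ebx;
}

/* mov(eax,ebx); add(eax,eax) x2; add(eax,ebx) x2;  -> x + 4x + 4x */
static uint32_t nine_by_two_plus_two(uint32_t eax) {
    uint32_t ebx = eax;   /* ebx = x  */
    eax += eax;           /* eax = 2x */
    eax += eax;           /* eax = 4x */
    ebx += eax;           /* ebx = 5x */
    ebx += eax;           /* ebx = 9x */
    return ebx;
}

void check_add_chains(void) {
    for (uint32_t x = 0; x < 100000u; x += 7u) {
        assert(nine_by_three_adds(x) == 9u * x);
        assert(nine_by_two_plus_two(x) == 9u * x);
    }
}
```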

indiocolifa · October 03, 2005, 06:14:01 PM

Thank you very much!
