I have updated my manual once again. Now covering everything about the new Intel Core 2 processor including a detailed study of the pipeline and execution units and complete lists of instruction timings.
This time my manual has come before the official manuals from Intel. Their software manuals for the Core 2 are not out yet. Thank you to a friendly person who gave me remote access to a prerelease sample of the Core 2. This enabled me to test almost everything.
The execution core is more powerful than anything we have seen until now. It can do up to three full 128-bit vector calculations per clock cycle. Unfortunately, the instruction fetch and predecode stage has not been expanded enough to keep up with the rest of the pipeline, so this is a serious bottleneck in many situations.
The section on AMD microarchitecture in my manual has also been revised, thanks to help from Andreas Kaiser.
http://www.agner.org/optimize/
Your work is really unique. Thanks for your hard work, agner :U
Very nice, thanks for your work.
Would it be okay to add the asm manual to my mirror (link in my signature)?
Thank you very much, Agner! :U
Thank you very much ;)
Agner,
Very nice work :U
I found a little error in the optimization manual:
At page 74, you state that the only way to load unaligned data into XMM registers is to use MOVDQU, MOVUPS, MOVUPD or LDDQU. You can also use MOVLPD/MOVHPD pairs, which results in faster code.
Stan
Hello agner!
Thanks for your work! It is very nice to see people who do a lot of work in a specific area and are ready to share the gained knowledge! Thank you again!
I had a look at your website. There are interesting topics! This Cultural Selection Theory looks very exciting...
Greets, Gábor
Great work, thanks!
I thought Mark Larson's "Code Optimization" (http://www.mark.masmcode.com/) was already the best optimization document I'd got. I guess multiple "the best" documents are always better than having a single "the best" document :U
-chris
Your link to Mark Larson doesn't work. Where is the document you are referring to?
Agner,
The link works just fine. Please try it again. The server may have been down when you last tried.
Paul
Agner, thank you for your work! :U
Agner you rulz ! thanks so much my favorite read for optimization :dance:
you guys have woken a 4 year old thread :P
Quote from: dedndave
you guys have woken a 4 year old thread :P
Can we dig up the zombie P4 designers that removed the barrel shifter, and beat them to death again?
-Clive
Quote from: clive on April 14, 2010, 02:30:58 PM
Can we dig up the zombie P4 designers that removed the barrel shifter, and beat them to death again?
Try imul instead, it's blazingly fast on modern CPUs.
Quote from: jj2007
Try imul instead, it's blazingly fast on modern CPUs.
I'm familiar with the math. I'd be willing to bet the latency is still significantly worse than a barrel shifter. On the P4 it was 14 cycles as I recall, and Intel was recommending ADD/LEA instruction combinations. The latency is still 3-5 cycles; a barrel shifter should be 1.
IMUL doesn't work well with right shifts, rotates, rotates through carry, using/setting carry. IDIV will eat 3 registers, has similar carry and rotate issues, and even more hideous latency.
The Core and Atom have significantly better shifting performance, AMD always has been good, some of the P4 issues were addressed in Prescott. Still the 64-bit right shifts are dogs in the entire P4 family. The P4 Willamette was a totally awful implementation, and Northwood wasn't much better, which is why I recommended beating them around the head. Whoosh!
-Clive
Quote from: clive on April 14, 2010, 04:33:28 PM
Quote from: jj2007
Try imul instead, it's blazingly fast on modern CPUs.
I'm familiar with the math. I'd be willing to bet the latency is still significantly worse than a barrel shifter.
We aren't into betting and guessing here, Clive. Celeron M:
145 cycles for 100*imul 8
95 cycles for 100*shl 3
145 cycles for 100*imul 10
149 cycles for 100*lea mul10
Quote from: jj2007
We aren't into betting and guessing here, Clive. Celeron M:
145 cycles for 100*imul 8
95 cycles for 100*shl 3
145 cycles for 100*imul 10
149 cycles for 100*lea mul10
Then do me a favour and measure the latency. I bet, because it beats wasting a lot of time benchmarking crap when I understand how to build barrel shifters and multipliers in hardware. I'd rather bet and get the right/better results, than measure and get the wrong one.
Presler (Pentium D)
96 cycles for 100*imul 8
81 cycles for 100*shl 3
98 cycles for 100*imul 10
167 cycles for 100*lea mul10
10 Tests, x30000
IMUL eax, 2
186555 Ticks
186555 Ticks
186555 Ticks
186585 Ticks
186555 Ticks
186555 Ticks
186570 Ticks
186555 Ticks
186555 Ticks
243900 Ticks
186555 Ticks (min), 6.218500
SHL eax, 1
37065 Ticks
37050 Ticks
37170 Ticks
37065 Ticks
37050 Ticks
37035 Ticks
37035 Ticks
37035 Ticks
37035 Ticks
37020 Ticks
37020 Ticks (min), 1.234000
SHL eax, 8
68985 Ticks
31770 Ticks
31695 Ticks
31650 Ticks
31830 Ticks
31635 Ticks
31740 Ticks
31890 Ticks
31635 Ticks
31680 Ticks
31635 Ticks (min), 1.054500
Your test suggests that IMUL is a single cycle, when I'm pretty damn sure it has a 6-cycle latency (i.e. resources actually consumed, time to get a result). On a Prescott it's 5 compared to ~1 for shifts, and the divide clocks in at 79 cycles. Blazingly fast? I'm not convinced.
-Clive
Quote from: clive on April 14, 2010, 05:38:51 PM
Then do me a favour and measure the latency.
Version 2 - I added a mov esi, eax after each multiplication to create a dependency chain. If you have a different definition of latency, please post code.
counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
REPEAT 100
mov eax, 1000
imul eax, eax, 8
mov esi, eax
ENDM
counter_end
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
173 cycles for 100*imul 8
287 cycles for 100*mul 8
149 cycles for 100*shl 3
157 cycles for 100*shl 1
173 cycles for 100*imul 10
288 cycles for 100*mul 10
218 cycles for 100*lea mul10
172 ticks for 100*imul 8
287 ticks for 100*mul 8
150 ticks for 100*shl 3
157 ticks for 100*shl 1
172 ticks for 100*imul 10
287 ticks for 100*mul 10
218 ticks for 100*lea mul10
That's hardly a dependency chain, you throw away the result, and light off another IMUL into the pipeline.
This should give you a measurement of latency, rather than dispatch/throughput. Pretty much zero effort has been applied to aligning or optimizing; it's designed to give ballpark numbers.
-Clive
DWORD imultest(void)
{
DWORD ticks;
_asm
{
rdtsc
push eax
mov ecx,1000
mov eax,1
loop_mul:
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
dec ecx
jnz loop_mul
rdtsc
pop ecx
sub eax,ecx
mov ticks,eax
}
return(ticks);
}
DWORD idivtest(void)
{
DWORD ticks;
_asm
{
rdtsc
push eax
mov ecx,1000
mov eax,1
xor edx,edx
loop_div:
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
dec ecx
jnz loop_div
rdtsc
pop ecx
sub eax,ecx
mov ticks,eax
}
return(ticks);
}
Quote from: clive on April 14, 2010, 08:37:56 PM
That's hardly a dependency chain, you throw away the result, and light off another IMUL into the pipeline.
Interesting. So the processor "knows" that I don't really need the result when I put mov esi, eax there?
Can you please post your code in assembler, so that everybody can test it? Thanks.
The processor uses register renaming, and out-of-order execution. The value moved to ESI is just retired when the IMUL completes. As no one cares about the value, when this happens nothing is holding up the processor. Loading EAX breaks the chain for the next IMUL. Using ADD ESI,EAX would have more consequence.
-Clive
Here in MASM assembler.
.586
.MODEL FLAT, C
.CODE
; ml -c -Fl imultest.asm
; Time 30000 instructions, returns estimated clock ticks in EAX (divide by 30000.0 to get estimated clocks)
; The looping code accounts for 3%, if that, of the total
imultest PROC
rdtsc
push eax
mov ecx,1000
mov eax,1
ALIGN 16
loop_mul:
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
imul eax,1
dec ecx
jnz loop_mul
rdtsc
pop ecx
sub eax,ecx
ret
imultest ENDP
idivtest PROC
rdtsc
push eax
mov ecx,1000
mov eax,1
xor edx,edx
ALIGN 16
loop_div:
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
idiv eax
dec ecx
jnz loop_div
rdtsc
pop ecx
sub eax,ecx
ret
idivtest ENDP
END
In your HLA
counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
mov eax, 1 ; the imul below computes eax = eax * 1, so EAX remains one, but the processor doesn't know this
REPEAT 100 ; I'd probably do more, to make the dependency chain maximal
imul eax, eax, 1
ENDM
counter_end
counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
mov eax, 1 ; clever constant selection eax = edx:eax / eax, edx = edx:eax % eax. EDX remains zero, EAX remains one, but the processor doesn't know this
xor edx,edx
REPEAT 100
idiv eax
ENDM
counter_end
I use C as my framework rather than just calling CRT routines. I'll post C source and executables if they're helpful, but this should be a simple cut-and-paste exercise. Basically I am trying to refute that IMUL is a single-cycle instruction.
-Clive
Edit: Attached C/EXE
Prescott P4 instruction latency
Empty
1612 Ticks (min), 0.054 cycles
IMUL eax, 1
298478 Ticks (min), 9.949 cycles
IDIV eax, 1
2398350 Ticks (min), 79.945 cycles
SHL eax, 1 / SHR eax, 1
28448 Ticks (min), 0.948 cycles
SHL eax, 8 / SHR eax, 8
28448 Ticks (min), 0.948 cycles
XCHG eax, [mem]
2752365 Ticks (min), 91.746 cycles
Presler P4
Empty
2025 Ticks (min), 0.068 cycles
IMUL eax, 1
373095 Ticks (min), 12.437 cycles
IDIV eax, 1
2397930 Ticks (min), 79.931 cycles
SHL eax, 1 / SHR eax, 1
28035 Ticks (min), 0.935 cycles
SHL eax, 8 / SHR eax, 8
28035 Ticks (min), 0.935 cycles
XCHG eax, [mem]
2744310 Ticks (min), 91.477 cycles
Atom
Empty
4056 Ticks (min), 0.135 cycles
IMUL eax, 1
145992 Ticks (min), 4.866 cycles
IDIV eax, 1
1828008 Ticks (min), 60.934 cycles
SHL eax, 1 / SHR eax, 1
25992 Ticks (min), 0.866 cycles
SHL eax, 8 / SHR eax, 8
27888 Ticks (min), 0.930 cycles
XCHG eax, [mem]
177120 Ticks (min), 5.904 cycles
I don't have a Willamette P4 to hand, but the IMUL should have latency ~14, and the SHR/SHL 8 probably ~8. I'll dig one up at the weekend.
Quote from: clive on April 14, 2010, 09:10:54 PM
The value moved to ESI is just retired when the IMUL completes. As no one cares about the value, when this happens nothing is holding up the processor. Loading EAX breaks the chain for the next IMUL. Using ADD ESI,EAX would have more consequence.
There is no evidence that ADD ESI,EAX has "more consequence": On my Celeron, timings are identical (taking into account the extra half cycle of add vs mov); on my P4, the ADD ESI,EAX version is even faster.
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
48 cycles for 100*mov esi
95 cycles for 100*add esi
172 cycles for 100*imul 8, mov esi
223 cycles for 100*imul 8, add esi
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
45 cycles for 100*mov esi
96 cycles for 100*add esi
223 cycles for 100*imul 8, mov esi
211 cycles for 100*imul 8, add esi
282 cycles for 100*mul 8
150 cycles for 100*shl 3
The comments on PIVs sound right. I had PIVs from an early 1.5 up to the last 3.8, across three core types, and they were all quirky in comparison to earlier and later processors. LEA slowed down by a long way, which impacted fixed multiplications, and this more or less matched the Intel PIV data that recommended using ADD instead up to about a factor of 3 before LEA was faster. The Core quad has a fast LEA, but one of the few things the last PIV was faster with was shifts and rotates. The late Prescott PIVs had a long pipeline that made stalls very expensive; the Northwood core was shorter and recovered from stalls faster.
The Core2 quad is much more like a PIII in its preferred style of code, just faster. I have only just got the new Nehalem going, so I have yet to do any serious timing on it, but a quick play earlier today says it has different characteristics to the Core series. There appears to be a shift to SSE instructions in the Core series, as comparisons to the PIVs I have showed much bigger improvements with SSE code on the Quad than on any of the PIVs. I would imagine that the finer tracking in the very recent Intel cores allows a bit more room for hardcoded instructions, and it appears that the extra space is going into SSE instructions rather than the older integer instructions.
This much I have learnt about core design: I don't really want to know the details, as each core is substantially different, they tend to be RISC under the hood, and depending on the core model they have different preferred instruction sets which follow from the RISC primitives below the published interface.
Prescott P4
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
232 cycles for 100*imul 8
312 cycles for 100*mul 8
138 cycles for 100*shl 3
149 cycles for 100*shl 1
255 cycles for 100*imul 10
303 cycles for 100*mul 10
249 cycles for 100*lea mul10
85 cycles for 100*or
1030 cycles for 100*imul 1 - dependent
1242 cycles for 100*mul 1 - dependent
8172 cycles for 100*idiv 1 - dependent
101 cycles for 100*or - dependent
102 cycles for 100*rol 1 - dependent
104 cycles for 100*rol 8 - dependent
118 ticks for 100*imul 8
160 ticks for 100*mul 8
78 ticks for 100*shl 3
72 ticks for 100*shl 1
120 ticks for 100*imul 10
162 ticks for 100*mul 10
136 ticks for 100*lea mul10
45 ticks for 100*or
540 ticks for 100*imul 1 - dependent
654 ticks for 100*mul 1 - dependent
4338 ticks for 100*idiv 1 - dependent
53 ticks for 100*or - dependent
54 ticks for 100*rol 1 - dependent
55 ticks for 100*rol 8 - dependent
NewCastle
AMD Athlon(tm) 64 Processor 3200+ (SSE2)
100 cycles for 100*imul 8
200 cycles for 100*mul 8
97 cycles for 100*shl 3
97 cycles for 100*shl 1
99 cycles for 100*imul 10
198 cycles for 100*mul 10
140 cycles for 100*lea mul10
62 cycles for 100*or
301 cycles for 100*imul 1 - dependent
300 cycles for 100*mul 1 - dependent
4051 cycles for 100*idiv 1 - dependent
96 cycles for 100*or - dependent
96 cycles for 100*rol 1 - dependent
97 cycles for 100*rol 8 - dependent
80 ticks for 100*imul 8
160 ticks for 100*mul 8
78 ticks for 100*shl 3
78 ticks for 100*shl 1
80 ticks for 100*imul 10
159 ticks for 100*mul 10
112 ticks for 100*lea mul10
47 ticks for 100*or
243 ticks for 100*imul 1 - dependent
238 ticks for 100*mul 1 - dependent
3272 ticks for 100*idiv 1 - dependent
75 ticks for 100*or - dependent
75 ticks for 100*rol 1 - dependent
79 ticks for 100*rol 8 - dependent
Quote from: clive on April 14, 2010, 04:33:28 PM
Quote from: jj2007
Try imul instead, it's blazingly fast on modern CPUs.
I'm familiar with the math. I'd be willing to bet the latency is still significantly worse than a barrel shifter. On the P4 it was 14 cycles as I recall, and Intel was recommending ADD/LEA instruction combinations. The latency is still 3-5 cycles,
Quote from: clive on April 14, 2010, 05:38:51 PM
Then do me a favour and measure the latency. I bet, because it beats wasting a lot of time benchmarking crap when I understand how to build barrel shifters and multipliers in hardware. I'd rather bet and get the right/better results, than measure and get the wrong one.
Quote from: clive on April 15, 2010, 04:06:25 PM
Prescott P4
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
138 cycles for 100*shl 3
255 cycles for 100*imul 10
303 cycles for 100*mul 10
249 cycles for 100*lea mul10
1030 cycles for 100*imul 1 - dependent
1242 cycles for 100*mul 1 - dependent
...
AMD Athlon(tm) 64 Processor 3200+ (SSE2)
99 cycles for 100*imul 10
301 cycles for 100*imul 1 - dependent
I see we are converging a little bit - imul latency is down from over 14 to 10 for a P4 and 3 for the NewCastle. Good.
Now the interesting question is perhaps whether latency is more relevant than throughput. Of course, since you are programming in C, you depend very much on what your compiler knows about it. Assembler coders can optimise their code.
Imagine this (rather useless) sequence:
Quote
mov eax, esi
REPEAT 100
imul eax, eax, 10
ENDM
mov esi, eax
Overall speed depends, as you rightly assume, on latency: the next imul can only start when the previous one has finished.
However, nobody in Masmland stops you from doing other things in parallel:
Quote
mov eax, esi
REPEAT 100
imul eax, eax, 10
mov edi, ebx
mov ebx, esi
inc ecx
dec edx
dec ebx
ENDM
mov esi, eax
And, surprise surprise, both sequences run in four cycles on my slow cheap Celeron M - and imul is the best overall choice...:
Quote
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
150 cycles for 100*shl 3
410 cycles for 100*shl 3 with five extra instructions
173 cycles for 100*imul 10
287 cycles for 100*mul 10
219 cycles for 100*lea mul10
471 cycles for 100*lea mul10 with five extra instructions
397 cycles for 100*imul 10 - dependent
410 cycles for 100*imul 10 - dependent with five extra instructions
496 cycles for 100*mul eax - dependent
576 cycles for 100*mul eax - dependent with five extra instructions
The problem with MUL / IMUL is with blended-model programming: everything from the i486 up to the late PIVs was expensive with multiplication operations. Later cores are supposed to handle muls differently, but if you want code that runs OK on most things you need to address how it works on PIVs and below. One of the few blessings of 64-bit OS versions is that the hardware must be late enough to run it, which means a reasonably late model processor, so you are safe with many later, faster instructions.
Code you write with some certainty about the lowest processor that will run it is a lot easier to write and should be faster than code that must run on everything.
Quote from: jj2007
I see we are converging a little bit - imul latency is down from over 14 to 10 for a P4 and 3 for the NewCastle. Good.
Yes, and the 14 is a Willamette (1st Pentium 4, and source of much derision) number, from the notes I have. I'll demonstrate if you insist. The box is at the out-laws, so not readily accessible, I could get P3 and PPro numbers more readily.
This paper has some good analysis
http://gmplib.org/~tege/x86-timing.pdf
"When compiling with /G7, we produce the faster (but longer) sequence that avoids using the imul instruction, which has a 14-cycle latency on the Intel Pentium 4:"
http://msdn.microsoft.com/en-us/library/aa290055%28VS.71%29.aspx
The problem with optimizing for one architecture is that the installed base is so divergent, so unless the application warrants multiple code paths, and the hideous amount of regression testing required to exercise all paths, one generally focuses on a reasonable solution. It makes sense for things like folding, Seti, encryption cracking, etc., but I wouldn't classify these as commercial applications.
-Clive
Quote from: jj2007
Imagine this (rather useless) sequence:
mov eax, esi
REPEAT 100
imul eax, eax, 10
ENDM
mov esi, eax
You do understand that your whole REPEAT 100 construct is totally contrived, right? Unless the fragment you test is self-contained and dependent, all you end up doing is slipstreaming the whole repeated sequence into the pipeline, and the timing of the actual sequence gets lost. You'd get a better real-world timing using a single iteration. In any situation where your timing suggests the execution speed of a sequence containing an IMUL is faster than the demonstrated speed of the instruction, you either a) do not have a dependency on the result, or b) are not getting an accurate timing.
The idea of the 100 iterations is to push the looping instruction times into the ~1% of the total measurement, but unless the fragments are carefully crafted this is not what is happening.
I code in C because it's convenient, I've been coding in about a dozen assembly languages for as long as you have.
Quote
Overall speed depends, as you rightly assume, on latency
I haven't assumed, I know this, and don't need to benchmark everything to recognize when it might occur. Many algorithms in C and ASM can be discarded with static analysis.
-Clive
Quote from: clive on April 16, 2010, 02:56:25 AM
This paper has some good analysis
http://gmplib.org/~tege/x86-timing.pdf
And it confirms that for a modern CPU, e.g. a Core 2, the latency is 3 cycles, while the throughput is 1 cycle for imul.
Quote
"When compiling with /G7, we produce the faster (but longer) sequence that avoids using the imul instruction, which has a 14-cycle latency on the Intel Pentium 4:"
http://msdn.microsoft.com/en-us/library/aa290055%28VS.71%29.aspx
That was written 8 years ago...
Quote from: clive on April 16, 2010, 04:39:50 AM
Quote
Overall speed depends, as you rightly assume, on latency
I haven't assumed, I know this, and don't need to benchmark everything
You quote rather selectively. My "overall" related to your isolated 100*imul eax, eax, 1 example, and the text continues as follows:
Quote
Overall speed depends, as you rightly assume, on latency: the next imul can only start when the previous one has finished.
However, nobody in Masmland stops you from doing other things in parallel
Which implies that latency is pretty irrelevant in an optimised Masm app. Throughput is the interesting variable.
Pick your Intel core, pick a result.
PIV 3.8 gig.
220
220
220
220
328 MS leamul10
406 MS imul10
516 MS shl10
312 MS add10
328 MS leamul10
407 MS imul10
500 MS shl10
328 MS add10
328 MS leamul10
406 MS imul10
531 MS shl10
313 MS add10
328 MS leamul10
406 MS imul10
516 MS shl10
328 MS add10
344 MS leamul10
406 MS imul10
500 MS shl10
328 MS add10
328 MS leamul10
406 MS imul10
532 MS shl10
328 MS add10
312 MS leamul10
407 MS imul10
484 MS shl10
328 MS add10
328 MS leamul10
406 MS imul10
516 MS shl10
328 MS add10
Average algo timing
328 MS leamul10
406 MS imul10
511 MS shl10
324 MS add10
Press any key to continue ...
Core2 Quad 3.0 gig
220
220
220
265 MS leamul10
235 MS imul10
437 MS shl10
234 MS add10
266 MS leamul10
234 MS imul10
438 MS shl10
234 MS add10
266 MS leamul10
234 MS imul10
438 MS shl10
234 MS add10
266 MS leamul10
234 MS imul10
438 MS shl10
234 MS add10
266 MS leamul10
234 MS imul10
438 MS shl10
234 MS add10
266 MS leamul10
234 MS imul10
437 MS shl10
235 MS add10
265 MS leamul10
235 MS imul10
437 MS shl10
250 MS add10
266 MS leamul10
234 MS imul10
438 MS shl10
234 MS add10
Average algo timing
265 MS leamul10
234 MS imul10
437 MS shl10
236 MS add10
Press any key to continue ...
2.8 i7 860 quad
220
220
234 MS leamul10
249 MS imul10
390 MS shl10
312 MS add10
234 MS leamul10
250 MS imul10
390 MS shl10
296 MS add10
250 MS leamul10
234 MS imul10
390 MS shl10
312 MS add10
234 MS leamul10
234 MS imul10
374 MS shl10
297 MS add10
234 MS leamul10
234 MS imul10
374 MS shl10
297 MS add10
218 MS leamul10
218 MS imul10
390 MS shl10
281 MS add10
234 MS leamul10
234 MS imul10
374 MS shl10
297 MS add10
234 MS leamul10
234 MS imul10
374 MS shl10
296 MS add10
Average algo timing
234 MS leamul10
235 MS imul10
382 MS shl10
298 MS add10
Press any key to continue ...
This is the test piece.
IF 0 ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include\masm32rt.inc
leamul10 PROTO :DWORD
imul10 PROTO :DWORD
shl10 PROTO :DWORD
add10 PROTO :DWORD
.code
start:
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
call main
inkey
exit
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
main proc
LOCAL tim1 :DWORD
LOCAL tim2 :DWORD
LOCAL tim3 :DWORD
LOCAL tim4 :DWORD
mov tim1, 0
mov tim2, 0
mov tim3, 0
mov tim4, 0
iteration equ <100000000>
push esi
mov eax, 22
invoke leamul10,eax
print sstr$(eax),13,10
mov eax, 22
invoke imul10,eax
print sstr$(eax),13,10
mov eax, 22
invoke shl10,eax
print sstr$(eax),13,10
mov eax, 22
invoke add10,eax
print sstr$(eax),13,10
REPEAT 8
; ----------------------------------------------
invoke GetTickCount
push eax
mov esi, iteration
@@:
invoke leamul10,eax
sub esi, 1
jnz @B
invoke GetTickCount
pop ecx
sub eax, ecx
add tim1, eax
print str$(eax)," MS leamul10",13,10
; ----------------------------------------------
invoke GetTickCount
push eax
mov esi, iteration
@@:
invoke imul10,eax
sub esi, 1
jnz @B
invoke GetTickCount
pop ecx
sub eax, ecx
add tim2, eax
print str$(eax)," MS imul10",13,10
; ----------------------------------------------
invoke GetTickCount
push eax
mov esi, iteration
@@:
invoke shl10,eax
sub esi, 1
jnz @B
invoke GetTickCount
pop ecx
sub eax, ecx
add tim3, eax
print str$(eax)," MS shl10",13,10
; ----------------------------------------------
invoke GetTickCount
push eax
mov esi, iteration
@@:
invoke add10,eax
sub esi, 1
jnz @B
invoke GetTickCount
pop ecx
sub eax, ecx
add tim4, eax
print str$(eax)," MS add10",13,10
; ----------------------------------------------
ENDM
print "Average algo timing",13,10
shr tim1, 3
print str$(tim1)," MS leamul10",13,10
shr tim2, 3
print str$(tim2)," MS imul10",13,10
shr tim3, 3
print str$(tim3)," MS shl10",13,10
shr tim4, 3
print str$(tim4)," MS add10",13,10
pop esi
ret
main endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
leamul10 proc num:DWORD
mov eax, [esp+4]
lea eax, [eax+eax*4]
add eax, eax
ret 4
leamul10 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
imul10 proc num:DWORD
mov eax, [esp+4]
imul eax, 10
ret 4
imul10 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
shl10 proc num:DWORD
mov eax, [esp+4]
shl DWORD PTR [esp+4], 3
add eax, eax
add eax, [esp+4]
ret 4
shl10 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
add10 proc num:DWORD
mov eax, [esp+4]
mov ecx, [esp+4]
add ecx, ecx
add eax, eax
add eax, eax
add eax, eax
add eax, ecx
ret 4
add10 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end start
Quote from: hutch-- on April 16, 2010, 12:07:39 PM
Pick your Intel core, pick a result.
Hutch, the ABI!!!! :naughty:
mov eax, [esp+4]
shl DWORD PTR [esp+4], 3 ; <<<<<<
add eax, eax
add eax, [esp+4]
:bg
Prescott:
Average algo timing
545 MS leamul10
568 MS imul10
767 MS shl10
599 MS add10
(now let's do the same for mul123 :toothy)
JJ,
> Hutch, the ABI!!!! [esp] is a memory operand. :bg
You were too quick finding the test; it had 2 blunders in it, 2 memory reads in the shift and add algos. Now it is one read, and it more closely matches the first two. Mainly this demo shows core differences.
PIV Prescott 3.8 gig.
220
220
220
220
328 MS leamul10
406 MS imul10
516 MS shl10
312 MS add10
328 MS leamul10
407 MS imul10
500 MS shl10
328 MS add10
328 MS leamul10
406 MS imul10
531 MS shl10
313 MS add10
328 MS leamul10
406 MS imul10
516 MS shl10
328 MS add10
344 MS leamul10
406 MS imul10
500 MS shl10
328 MS add10
328 MS leamul10
406 MS imul10
532 MS shl10
328 MS add10
312 MS leamul10
407 MS imul10
484 MS shl10
328 MS add10
328 MS leamul10
406 MS imul10
516 MS shl10
328 MS add10
Average algo timing
328 MS leamul10
406 MS imul10
511 MS shl10
324 MS add10
Press any key to continue ...
Core2 Quad 3.0 gig
220
220
220
220
265 MS leamul10
235 MS imul10
265 MS shl10
266 MS add10
266 MS leamul10
234 MS imul10
266 MS shl10
250 MS add10
265 MS leamul10
235 MS imul10
265 MS shl10
250 MS add10
266 MS leamul10
234 MS imul10
266 MS shl10
250 MS add10
266 MS leamul10
234 MS imul10
266 MS shl10
250 MS add10
265 MS leamul10
235 MS imul10
265 MS shl10
250 MS add10
266 MS leamul10
234 MS imul10
266 MS shl10
250 MS add10
265 MS leamul10
235 MS imul10
265 MS shl10
235 MS add10
Average algo timing
265 MS leamul10
234 MS imul10
265 MS shl10
250 MS add10
Press any key to continue ...
Nehalem core i7 quad 2.8 gig
220
220
220
220
250 MS leamul10
234 MS imul10
281 MS shl10
327 MS add10
234 MS leamul10
250 MS imul10
250 MS shl10
327 MS add10
265 MS leamul10
234 MS imul10
266 MS shl10
327 MS add10
234 MS leamul10
234 MS imul10
234 MS shl10
312 MS add10
234 MS leamul10
234 MS imul10
249 MS shl10
312 MS add10
234 MS leamul10
234 MS imul10
250 MS shl10
328 MS add10
218 MS leamul10
218 MS imul10
250 MS shl10
327 MS add10
234 MS leamul10
234 MS imul10
234 MS shl10
328 MS add10
Average algo timing
237 MS leamul10
234 MS imul10
251 MS shl10
323 MS add10
Press any key to continue ...
The test piece.
IF 0 ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include\masm32rt.inc
leamul10 PROTO :DWORD
imul10 PROTO :DWORD
shl10 PROTO :DWORD
add10 PROTO :DWORD
.code
start:
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
call main
inkey
exit
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
main proc
LOCAL tim1 :DWORD
LOCAL tim2 :DWORD
LOCAL tim3 :DWORD
LOCAL tim4 :DWORD
mov tim1, 0
mov tim2, 0
mov tim3, 0
mov tim4, 0
iteration equ <100000000>
push esi
mov eax, 22
invoke leamul10,eax
print sstr$(eax),13,10
mov eax, 22
invoke imul10,eax
print sstr$(eax),13,10
mov eax, 22
invoke shl10,eax
print sstr$(eax),13,10
mov eax, 22
invoke add10,eax
print sstr$(eax),13,10
REPEAT 8
; ----------------------------------------------
invoke GetTickCount
push eax
mov esi, iteration
@@:
invoke leamul10,eax
sub esi, 1
jnz @B
invoke GetTickCount
pop ecx
sub eax, ecx
add tim1, eax
print str$(eax)," MS leamul10",13,10
; ----------------------------------------------
invoke GetTickCount
push eax
mov esi, iteration
@@:
invoke imul10,eax
sub esi, 1
jnz @B
invoke GetTickCount
pop ecx
sub eax, ecx
add tim2, eax
print str$(eax)," MS imul10",13,10
; ----------------------------------------------
invoke GetTickCount
push eax
mov esi, iteration
@@:
invoke shl10,eax
sub esi, 1
jnz @B
invoke GetTickCount
pop ecx
sub eax, ecx
add tim3, eax
print str$(eax)," MS shl10",13,10
; ----------------------------------------------
invoke GetTickCount
push eax
mov esi, iteration
@@:
invoke add10,eax
sub esi, 1
jnz @B
invoke GetTickCount
pop ecx
sub eax, ecx
add tim4, eax
print str$(eax)," MS add10",13,10
; ----------------------------------------------
ENDM
print "Average algo timing",13,10
shr tim1, 3
print str$(tim1)," MS leamul10",13,10
shr tim2, 3
print str$(tim2)," MS imul10",13,10
shr tim3, 3
print str$(tim3)," MS shl10",13,10
shr tim4, 3
print str$(tim4)," MS add10",13,10
pop esi
ret
main endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
leamul10 proc num:DWORD
mov eax, [esp+4]
lea eax, [eax+eax*4]
add eax, eax
ret 4
leamul10 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
imul10 proc num:DWORD
mov eax, [esp+4]
imul eax, 10
ret 4
imul10 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
shl10 proc num:DWORD
mov eax, [esp+4]
mov ecx, eax
shl eax, 3
add eax, ecx
add eax, ecx
ret 4
shl10 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
add10 proc num:DWORD
mov eax, [esp+4]
mov ecx, eax
add ecx, ecx
add eax, eax
add eax, eax
add eax, eax
add eax, ecx
ret 4
add10 endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end start
Yep, for a P4 and mul10, imul is not the best solution; it's 25% slower. Size-wise it wins for everything other than powers of 2, of course.
Average algo timing
Prescott
365 MS leamul10
449 MS imul10
361 MS shl10
361 MS add10
Celeron M
536 MS leamul10
542 MS imul10
576 MS shl10
839 MS add10
Code size, mul10 only:
12 bytes for add10
9 bytes for shl10
5 bytes for leamul10
3 bytes for imul10
Let's examine the benchmarking of code fragments.
Method A
L = Loop time in cycles
T = Test unit in cycles
t = measured cycles per unit
Testing 100 units
| T | T | T | .. | T | T | L |
| t |
t = ((T * 100) + L) / 100
If T = 10, L = 2, then t = 10.02 cycles, error from looping is 0.2%
Method B
L = Loop time in cycles
X = Test unit in cycles billed
Y = Test unit in cycles deferred (ie pipelined)
T = X + Y Test unit in cycles actual
t = measured cycles per unit
Testing 100 units
| T |
| X | Y |
| X | Y |
| X | Y |
| X | Y |
..
| X | Y |
| X | Y | L |
| t |
t = ((X * 100) + Y + L) / 100
If X = 4, Y = 6, L = 2, then T = 10, yet t = 4.08 cycles, error 59.2% (nominally (Y / T) * 100)
Method B is not how to benchmark a code fragment; you would get better results by doing a single iteration. It is only useful for loop unwinding, and much code that processes streams of data has dependencies anyway.
The purpose of doing 100 iterations is to drop the looping overhead to 1% or below.
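The arithmetic behind Methods A and B above can be checked with a short script (function names are mine, chosen to mirror the text):

```python
def method_a(T, L, n=100):
    """n back-to-back units share one loop overhead L: t = (T*n + L) / n."""
    return (T * n + L) / n

def method_b(X, Y, L, n=100):
    """Only the billed portion X accumulates per unit; the deferred portion Y
    overlaps the next unit, so it is paid only once: t = (X*n + Y + L) / n."""
    return (X * n + Y + L) / n

# Figures from the text:
print(method_a(10, 2))                   # 10.02 -> looping error 0.2%
t_b = method_b(4, 6, 2)
print(t_b)                               # 4.08
print(round((10 - t_b) / 10 * 100, 1))   # 59.2 (% error versus T = 10)
```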
Latency is important because dependent operations cannot really happen before the answer is provided. Throughput only defines how fast you can stuff something into a pipeline: you can usually stuff a new thing in every cycle (with multiple execution units, more than one operation can start per cycle), but it will still take multiple cycles for each answer to appear.
-Clive
Clive,
I find the idea interesting, as I understand the difference between throughput and latency, but I don't know a good way to test it. The benchmark I posted above puts each method in a separate procedure and they all have the same stack and data load overhead, but it's a test of throughput, not instruction latency.
Michael has developed a set of timing macros that address small instruction counts, and within the limits of ring-3 access they work reasonably well. The only method I know that will handle small instruction counts with high precision is the one Agner Fog developed a few years ago, which ran from a boot disk after switching the processor into protected mode.
I would certainly be interested in a technique that can accurately time instruction latency over small instruction counts. Where I have had to deal with this in code, I usually set up a test piece as close to the real-world requirement as I can get, then time it against something else, usually the original algo against a modified one, so I can measure any difference with the modified version.
Quote from: clive
I don't have a Willamette P4 to hand, but the IMUL should have latency ~14, and the SHR/SHL 8 probably ~8. I'll dig one up at the weekend.
Ok, so here are results from a Willamette P4 (Celeron 1.8 GHz). Note the 14-cycle IMUL, the 4-cycle shifts, and the totally egregious processor-stalling XCHG with memory. AMD made a lot of money on the back of this brain-numbingly defective P4 design.
Empty
1596 Ticks (min), 0.053 cycles
IMUL eax, 1
418528 Ticks (min), 13.951 cycles
IDIV eax, 1
1708472 Ticks (min), 56.949 cycles
SHL eax, 1 / SHR eax, 1
118532 Ticks (min), 3.951 cycles
SHL eax, 8 / SHR eax, 8
118532 Ticks (min), 3.951 cycles
XCHG eax, [mem]
3727636 Ticks (min), 124.255 cycles
Intel(R) Celeron(R) CPU 1.80GHz (SSE2)
464 cycles for 100*imul 8
1011 cycles for 100*mul 8
99 cycles for 100*shl 3
93 cycles for 100*shl 1
464 cycles for 100*imul 10
1081 cycles for 100*mul 10
202 cycles for 100*lea mul10
57 cycles for 100*or
1482 cycles for 100*imul 1 - dependent
1616 cycles for 100*mul 1 - dependent
5860 cycles for 100*idiv 1 - dependent
37 cycles for 100*or - dependent
403 cycles for 100*rol 1 - dependent
394 cycles for 100*rol 8 - dependent
419 ticks for 100*imul 8
986 ticks for 100*mul 8
82 ticks for 100*shl 3
79 ticks for 100*shl 1
413 ticks for 100*imul 10
911 ticks for 100*mul 10
170 ticks for 100*lea mul10
49 ticks for 100*or
1255 ticks for 100*imul 1 - dependent
1455 ticks for 100*mul 1 - dependent
5224 ticks for 100*idiv 1 - dependent
33 ticks for 100*or - dependent
356 ticks for 100*rol 1 - dependent
355 ticks for 100*rol 8 - dependent
Quote from: clive
AMD made a lot of money on the back of this brain-numbingly defective P4 design.
It surprised me that Intel failed to anticipate the "wall" they would hit on clock-speed increases, and that they stuck with the plan until AMD was kicking their butt in almost every benchmark.