Standard strstr implementation performance - what am I missing?

Started by Xan, April 30, 2008, 05:59:03 AM

Xan

Hi Everyone,
first-time poster, just getting back into some assembly programming. For reasons too long-winded to explain, I thought I'd enjoy recreating some standard C functions for a particular application I wrote. More precisely, at least for now, I decided to rewrite the implementation of strstr that ships with Visual Studio. Because I know certain things about my application, I could make assumptions that I thought would make it faster, as I didn't have to handle various potential error conditions ( such as knowing the size of the buffer is always a multiple of two, so I can safely read a word at a time ).

I have attached the standard implementation of strstr as well as my pretty basic rewrite of it. Some notable differences are that I read a word at a time in the main loop ( vs a byte at a time in the standard version ) and I've 'unraveled the loop' a bit in my main comparison section ( strs_floop ). Here's the kicker - on a machine at work ( P4 ), my version is as fast as, if not faster than, the standard version. At home on my machine ( Core 2 Duo 6400 ), my version is measurably slower than the standard! I've tried various manipulations & reordering and I can't for the life of me figure it out.  :(

Note that to test the two implementations I created a 40 MB repeating string ( "This is a test.." ) and at the end of said string, I have "This is a te!\r\n". My search phrase is "This is a test..This is a te!". It should also be noted that the application I'm writing this for parses data that is delimited by a carriage return ( 0Dh ), hence the need to test for that byte in my version instead of 00h to denote the end of string. On my home machine, the results in tick counts are 125 for mine versus 109 for the standard implementation.
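For anyone who wants to reproduce the test data, here is a rough C sketch of the setup described above. The function names build_test_buffer and time_strstr are made up for illustration, and clock() is only a stand-in for whatever tick counter produced the 125 and 109 figures, so the numbers will not be directly comparable.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Build `repeats` copies of "This is a test.." followed by
   "This is a te!\r\n". Returns a NUL-terminated heap buffer that the
   caller must free, or NULL if the allocation fails.
   (Hypothetical helper, not taken from the attachment.) */
static char *build_test_buffer(size_t repeats)
{
    static const char unit[] = "This is a test..";   /* 16 bytes */
    static const char tail[] = "This is a te!\r\n";  /* 15 bytes */
    size_t unit_len = sizeof unit - 1;
    size_t tail_len = sizeof tail - 1;

    char *buf = malloc(repeats * unit_len + tail_len + 1);
    if (buf == NULL)
        return NULL;

    for (size_t i = 0; i < repeats; i++)
        memcpy(buf + i * unit_len, unit, unit_len);

    /* tail_len + 1 also copies the terminating NUL. */
    memcpy(buf + repeats * unit_len, tail, tail_len + 1);
    return buf;
}

/* Time a single strstr call in clock() ticks and store the match
   pointer in *hit. (Hypothetical helper; clock() is an assumption.) */
static clock_t time_strstr(const char *hay, const char *needle,
                           const char **hit)
{
    clock_t start = clock();
    *hit = strstr(hay, needle);
    return clock() - start;
}
```

With repeats set to 40 * 1024 * 1024 / 16 the buffer comes out at roughly 40 MB, and the only match for the search phrase starts at the last copy of the repeating unit, so the search has to walk the whole buffer.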

Any ideas what's going on?

Thanks!
Xan

[attachment deleted by admin]

hutch--

Xan,

Probably plain hardware differences. If you showed us your implementation, especially in the Laboratory, we could probably help you get it faster. There is another approach, though: try a Boyer-Moore search algorithm. On normal plain text they are very fast, especially with longer search strings, where the step is larger each time.
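To give an idea of the skipping behaviour, here is a minimal C sketch of the Horspool simplification of Boyer-Moore (bad-character shifts only, not the full algorithm). The name bmh_search and the length-delimited interface are assumptions for illustration. A version for CR-terminated data would need to find the 0Dh byte first to get the length.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Boyer-Moore-Horspool search over a length-delimited buffer.
   Returns a pointer to the first match, or NULL if there is none.
   (Illustrative sketch; the name and signature are hypothetical.) */
static const char *bmh_search(const char *hay, size_t hay_len,
                              const char *needle, size_t needle_len)
{
    size_t skip[UCHAR_MAX + 1];

    if (needle_len == 0)
        return hay;
    if (needle_len > hay_len)
        return NULL;

    /* Default shift: a byte that is not in the pattern lets us jump
       the whole pattern length. */
    for (size_t i = 0; i <= UCHAR_MAX; i++)
        skip[i] = needle_len;

    /* For every pattern byte except the last, shift so that its
       rightmost occurrence lines up with the text byte that currently
       sits under the end of the pattern. */
    for (size_t i = 0; i + 1 < needle_len; i++)
        skip[(unsigned char)needle[i]] = needle_len - 1 - i;

    size_t pos = 0;
    while (pos <= hay_len - needle_len) {
        if (memcmp(hay + pos, needle, needle_len) == 0)
            return hay + pos;
        /* Shift by the table entry for the text byte under the
           pattern's last position. */
        pos += skip[(unsigned char)hay[pos + needle_len - 1]];
    }
    return NULL;
}
```

The longer the search string, the larger the maximum shift, which is where the speed on plain text comes from. On highly repetitive data the shifts can be much smaller than the pattern length.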

Xan

Thanks Hutch! I'll repost over in that section. FYI, my previous post does have my source included in the zip file attachment. :)