
Efficiency in a scanner

Started by Seb, November 09, 2006, 12:51:35 AM

Seb

Hey there!

This is a general purpose question. I'm building a malware scanner in Assembler for my own personal use, so my question goes: what'd be the most efficient way of "scanning" a file? The way I see it, I have two options:

1. Use CRC32 to compare to a hardcoded checksum (may not be secure though).
2. Read a byte array of size N at file offset X and compare to a hardcoded byte array.

What'd be more efficient here, do you guys reckon? If anyone has any other ideas for cleverly detecting malware, please don't hesitate to post them. :U

Thanks guys,
Seb
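
The two options above can be sketched roughly as follows. This is a minimal illustration in Python rather than assembler, and the helper names (`crc32_of_file`, `matches_checksum`, `matches_signature`) as well as the offset and signature values are hypothetical, not taken from any real scanner.

```python
import zlib

def crc32_of_file(path, chunk_size=1 << 16):
    """Option 1: CRC32 over the whole file, read in chunks."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc

def matches_checksum(path, known_crcs):
    """True if the file's CRC32 is in a set of hardcoded checksums."""
    return crc32_of_file(path) in known_crcs

def matches_signature(path, offset, signature):
    """Option 2: read len(signature) bytes at a fixed offset and compare."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(len(signature)) == signature
```

Option 1 has to read the whole file, while option 2 touches only N bytes at offset X, so the fixed-offset compare does less I/O per file. On the other hand it only matches when the bytes sit at exactly that offset.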

ecube

Well, I recommend the checksum method with CRC32 and MD5 (or others), but also focus on specific byte signatures for detection; that's how AVs operate. It'd also be a good idea to have your application try to recognize things about the file: if it imports Winsock or WinINet functions it can get online, if it imports advapi32 functions it can write to the registry, and so on. Of course that's not foolproof, because everything can be loaded dynamically with LoadLibrary/GetProcAddress. PE protectors/compressors can also change the code, so for your malware detector to be any good it should have some sort of "CPU emulator", like various anti-virus programs have, to be able to unpack files safely and find the signatures. At the very minimum it should be able to unpack UPX safely; there's released code that can do that.
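
As a rough illustration of the "recognize things about the file" idea, here is a very crude triage sketch in Python. It only searches the raw bytes for DLL names and typical UPX markers; it does not parse the PE import table, and the names `CAPABILITY_HINTS`, `PACKER_MARKERS` and `triage` are made up for this example. A packed or dynamically loading sample can easily evade it, as the post already points out.

```python
# Hypothetical lookup tables: DLL name (lowercase) -> what it hints at.
CAPABILITY_HINTS = {
    b"wsock32.dll": "network (Winsock)",
    b"ws2_32.dll": "network (Winsock)",
    b"wininet.dll": "network (WinINet)",
    b"advapi32.dll": "registry (advapi32)",
}

# Strings commonly seen in UPX-packed files; they can be renamed or stripped.
PACKER_MARKERS = (b"UPX0", b"UPX1", b"UPX!")

def triage(data):
    """Return coarse hints about a file's bytes (substring search only)."""
    lowered = data.lower()
    hints = sorted({label for name, label in CAPABILITY_HINTS.items()
                    if name in lowered})
    return {
        "capabilities": hints,
        "maybe_upx": any(marker in data for marker in PACKER_MARKERS),
        # Both API names present suggests imports may be resolved at run time.
        "dynamic_loading": b"loadlibrary" in lowered
                           and b"getprocaddress" in lowered,
    }
```

A real scanner would walk the import directory of the PE header instead of grepping for strings, so treat the result as a hint only.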

Seb

Thanks, E^cube. I'm using CRC32 as a test at the moment and it's fairly good. However, if I were to add detection for a PE-infecting virus, I'd have to stick with signature detection. What I'm uncertain about when it comes to byte signatures is: where would I search for one? The code section? And how large should a byte signature generally be?

Regards,
Seb

Mark Jones

I might suggest, if you want to use only a checksum algorithm like CRC32, modify the algorithm slightly or consider something else like Blowfish. This is because there is a chance that the file could be infected by an intelligent virus and modified enough to attain an identical CRC32 checksum. It should be difficult and rare, but I've heard rumors. :wink
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

evlncrn8

well, to patch something and 'fix' the crc32 really only requires 4 bytes, so i wouldn't use crc32 at all.
use a good hash, maybe xor the hash with the crc32 or something, but definitely do not rely on crc32 as being good or safe...
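
To make the "4 bytes" point concrete, here is a small Python sketch of the well-known CRC32 fix-up trick, assuming the standard CRC-32 that `zlib.crc32` implements. The function names are hypothetical. It computes four bytes that, when appended to arbitrary data, force the CRC32 to any wanted value, and uses SHA-256 from `hashlib` as one example of a "good hash" (my choice for the sketch; the thread does not name one, and Blowfish, mentioned earlier, is a block cipher rather than a hash).

```python
import hashlib
import zlib

def _make_tables():
    """Build the standard CRC-32 table and a top-byte -> index lookup."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    # The top byte of each table entry is unique, so it identifies the index.
    reverse = {value >> 24: index for index, value in enumerate(table)}
    return table, reverse

TABLE, REVERSE = _make_tables()

def four_byte_fix(data, wanted_crc):
    """Return 4 bytes so that zlib.crc32(data + fix) == wanted_crc."""
    # Walk backwards from the wanted final state to find the 4 table
    # indexes that the last 4 update steps must use.
    state = wanted_crc ^ 0xFFFFFFFF
    indexes = []
    for _ in range(4):
        k = REVERSE[state >> 24]
        indexes.append(k)
        state = ((state ^ TABLE[k]) << 8) & 0xFFFFFFFF
    # Walk forwards from the real state after `data`, choosing each byte
    # so that the update step hits the required table index.
    state = zlib.crc32(data) ^ 0xFFFFFFFF
    fix = bytearray()
    for k in reversed(indexes):
        fix.append((state ^ k) & 0xFF)
        state = TABLE[k] ^ (state >> 8)
    return bytes(fix)

def sha256_hex(data):
    """A cryptographic hash for comparison; not fixable with 4 bytes."""
    return hashlib.sha256(data).hexdigest()
```

Given some `original` bytes and a `patched` variant, `patched + four_byte_fix(patched, zlib.crc32(original))` has the same CRC32 as `original`, while `sha256_hex` of the two still differs. That is why a CRC32 match alone says little about whether a file was tampered with.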

Seb

Thanks for the replies, guys. :U I have another question in store: what'd be the fastest way of reading a file, regular file reads or a file mapping?

evlncrn8

purely depends on the file size.. if it's small, read it into a buffer, if it's big, try mapping it first...
you could then time each method on the same file to determine what's good for you...
its really a question you could have answered yourself :)
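
A minimal way to run that comparison, sketched in Python with the standard `mmap` and `time` modules (in a MASM program the equivalents would be ReadFile versus CreateFileMapping/MapViewOfFile). The function names are invented for this example, and the timings will depend heavily on file size and on whether the file is already in the OS cache.

```python
import mmap
import time
import zlib

def crc_by_read(path, chunk_size=1 << 16):
    """Checksum the file with ordinary buffered reads."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc

def crc_by_mmap(path):
    """Checksum the file through a read-only memory mapping."""
    with open(path, "rb") as f:
        try:
            view = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            # An empty file cannot be mapped; its CRC32 is 0 either way.
            return 0
        with view:
            return zlib.crc32(view)

def timed(func, path):
    """Return (result, elapsed seconds) for one call of func(path)."""
    start = time.perf_counter()
    result = func(path)
    return result, time.perf_counter() - start
```

Calling `timed(crc_by_read, path)` and `timed(crc_by_mmap, path)` on the same file, several times each, gives a rough comparison. The first run of either method may pay for the disk read, so repeat the measurement before drawing conclusions.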