I just wanted to know: why don't most of the functions in masm32lib use bounds checks?
example:
readline source :DWORD, buffer :DWORD, spos :DWORD
I'd like to use:
readline source :DWORD, buffer :DWORD, bufSize :DWORD, spos :DWORD
Otherwise anyone can crash the program by feeding it data with exceedingly long lines. That seems bad, or did I miss something?
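What I imagine is roughly this (an untested sketch; I'm guessing at readline's exact return semantics, and readline2 is just my name for it, but it shows the bufSize idea):

readline2 proc uses esi edi source:DWORD, buffer:DWORD, bufSize:DWORD, spos:DWORD
    mov esi, source
    add esi, spos           ; start of the current line
    mov edi, buffer
    mov ecx, bufSize
    dec ecx                 ; keep one byte for the null (assumes bufSize >= 1)
nextch:
    mov al, [esi]
    inc esi
    test al, al
    jz eos                  ; end of source data
    cmp al, 13              ; CR ends the line
    je eol_cr
    cmp al, 10              ; bare LF ends the line too
    je eol
    test ecx, ecx
    jz nextch               ; buffer full: keep scanning, stop storing
    mov [edi], al
    inc edi
    dec ecx
    jmp nextch
eol_cr:
    cmp byte ptr [esi], 10  ; swallow the LF of a CRLF pair
    jne eol
    inc esi
eol:
    mov byte ptr [edi], 0   ; the buffer is always null-terminated
    mov eax, esi
    sub eax, source         ; new position: start of the next line
    ret
eos:
    mov byte ptr [edi], 0
    xor eax, eax            ; 0 = no more data
    ret
readline2 endp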
-----------------
Is there a way to do signed integer comparisons with the high-level .IF/.ELSEIF/.ELSE statements
other than writing ".IF SDWORD PTR eax < 0"?
It's a drag to type that long statement every time.
Maybe a macro could simplify it? I don't really know how to write
macros effectively, but maybe something like the sketch below?
Thanks for your help.
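(The idea, untested:)

SIGNED MACRO arg:REQ
    EXITM <SDWORD PTR arg>
ENDM

; which would shrink the test to
.IF SIGNED(eax) < 0
    ; handle the negative case
.ENDIF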
I think I read somewhere that Hutch is slowly working on these issues. Bounds exploits are something to watch out for, so be careful.
Ingrid
It might be a matter of laziness,
because I've found that it takes much longer to write bounds-checking code.
If someone is interested I can upload some sources.
And please, can anyone also give a workaround hint for the comparison issue?
Thanks
You're right when you say that bounds checks are a big performance drawback,
and you're also right that the programmer should be advanced enough
not to need the dynamic allocation methods of HLLs.
That's why I use lstrcpy when I know for sure that the data won't
overflow the buffer.
But if I can't predict the incoming data exactly, I don't just assume
the data will have the format or length I'd like or expect it to have.
In other words: I don't need bounds checks when I have control over or
influence on the incoming data, but when I don't, I use a function that
does boundary checking.
example:
xstrcpy proto dest :dword, destSize :dword, src :dword
;instead of
lstrcpy proto dest :dword, src :dword
This is fine for most simple programs (all simple GUI tools, etc.),
because they have enough time to scan the src buffer first
and then copy only as many bytes as fit into destBuf by using lstrcpyn.
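For illustration, a minimal sketch of how such an xstrcpy could look (untested; this version checks while it copies instead of scanning the source first, which saves the extra pass):

xstrcpy proc uses esi edi dest:DWORD, destSize:DWORD, src:DWORD
    mov ecx, destSize
    test ecx, ecx
    jz no_room              ; zero-size buffer: nothing we can safely write
    mov esi, src
    mov edi, dest
    dec ecx                 ; reserve one byte for the null
    jz terminate
copy_loop:
    lodsb
    test al, al
    jz store_null           ; source ended, al already holds the null
    stosb
    dec ecx
    jnz copy_loop
terminate:
    xor al, al              ; destination full: truncate
store_null:
    stosb                   ; the buffer is always null-terminated
no_room:
    ret
xstrcpy endp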
Now imagine the following situation:
you release some tool which uses an md5list file, and the readline function
is used. The tool checks whether the listed files are original, and the
md5list file is secured by a non-public signature. The format of an
md5list entry is (I think you know it already):
md5hash[32], 0x20, '*' or ' ', path[MAX_PATH], (0x0D,0x0A)
len = 32 + 2 + 260 + 2 = 296 bytes
So how big will the md5list file get?
200 MB? Generally not.
Quote
If you are trying to parse text that is not in the form of either LF or CRLF
line terminations, then you write a different algo and make the buffer size
the length of the source that you are reading.
So this is the solution you gave for incoming data whose format is unknown,
but I can't see it being an effective measure.
Imagine there are 80,000 entries (which is actually my L: drive):
80,000 * 296 = 23,680,000 bytes = 23.7 MB.
I won't be able to use such freakishly large buffers,
because you would hate a world where every application allocates
every buffer at 20 MB just to lower the probability of overflowing it.
As you yourself said, the function is intended to read
text which is short in practice,
and as you should have noticed by now,
that is just an assumption, and a computer can't work on assumptions:
it must know, and if it doesn't, it must check.
Quote
With the example you have mentioned with "readline", see what it was designed
to do. In most instances text does not go over 100 characters each line ..
but regardless of what size you allocate for the buffer, your application
can always be crashed, and maybe even exploited, by anyone!
Isn't that ridiculous for a tool that was intended to provide security?
The md5list file can easily be replaced by one which crashes the
check tool, or, if the buffer is local, it might even be exploitable,
enabling the attacker to make the tool report the files as original
while they are not!
Generally I really love buffer overflows, just not in my own software.
The xstrcpy function scans the size of the input buffer first,
which of course takes some time, but if you are that fussy,
you may not even use lstrcat, because it has to scan over the destination
buffer before the actual operation. Instead of that, one could use a
modified lstrcpy function, let's name it zstrcpy, which returns the pointer
to the terminating null it just wrote, enabling the use
of eax for sequential calls:
mov edi, OFFSET bfDest
invoke zstrcpy, edi, OFFSET bfSource1
invoke zstrcpy, eax, OFFSET bfSource2
invoke zstrcpy, eax, OFFSET bfSource3
invoke zstrcpy, eax, OFFSET bfSource4
invoke zstrcpy, eax, OFFSET bfSource5
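Such a zstrcpy would only be a few lines (a sketch; note it does no bounds checking, so the same caveats as lstrcpy apply):

zstrcpy proc uses esi edi dest:DWORD, src:DWORD
    mov esi, src
    mov edi, dest
next_byte:
    lodsb
    stosb                   ; copy up to and including the null
    test al, al
    jnz next_byte
    lea eax, [edi-1]        ; return the address of the null just written
    ret
zstrcpy endp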
But the MS lstrcpy function doesn't return this pointer,
not even the size, and I bet you don't care.
There are other examples where a returned size could be used to append
data but where a cat function is used instead, because the address
calculation is more elaborate to program and not as cushy as the cat
functions. Also, many functions are called with -1 as a parameter to let
Windows determine the buffer size via lstrlen, where the size could instead
be memorized while performing the copy functions in the fashion used above.
As you see, and as you also said previously,
it's up to the programmer how fast the code finally is.
But if I were a lazy coder I wouldn't even care about the
boundary-check issue, so teach me how to use my efforts
more effectively; I'm certainly not against changing my
programming style.
Quote
The whole idea of low level components is so programmers can build
their own designs then apply whatever checking method they like with
their complete code block design.
But what should this look like in the md5 example, which is just
normal and simple data processing? Teach me.
The following is true, but it only applies
if you know what data you will get:
Quote
The most reliable memory allocation checker or array bounds checker
is a programmer who knows what they are doing
And I think there is a difference between really knowing what data you will get
and assuming what data you will get, because that is exactly why buffers overflow!
And a programmer who can say how big a buffer must be
without knowing the length of the data being fed at runtime must be a prophet.
And just so you don't get me wrong:
I generally like to create fast code, otherwise I wouldn't love assembly or
spend my time being an assembly enthusiast,
but would use stuff like VB, Delphi, .NET, etc. instead.
But I don't want assembly to present the image of crap software which
always crashes or is easily exploitable!
And I think that normal user software, like GUI tools performing
several small utility functions, must use bounds checks by the means presented above.
The things I got, I summarize:
[ x ] there is a performance loss from using bounds checks
[ x ] bounds checks aren't acceptable for CPU-intensive or mass-data algorithms
[ x ] don't use bounds checks if the data length/type is 100% predictable (I do that already)
[ x ] use bounds checks every time the data length/type isn't 100% predictable (I do that already)
[ x ] there is a tradeoff between programming convenience and runtime speed
cheers
The problem is that most of these things are not easily canned, which means there is no catch-all, automatic way to do everything. Text in almost all instances uses a line length that can be displayed on the screen, which is rarely much over 100 characters, so it's no big deal to allocate a far larger buffer to handle it. If you have no way of knowing what size of data you are going to load, then you read chunks of it and process it in pieces, but this would be very rare with text data and far more likely with raw binary data.
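A rough sketch of that chunked approach, assuming the usual masm32 includes for the ReadFile prototype (ProcessChunk is a made-up name for whatever your application does with each piece):

CHUNK_SIZE equ 4096
ProcessChunk PROTO :DWORD, :DWORD   ; hypothetical handler: address, length

read_in_chunks proc hFile:DWORD
    LOCAL bytesRead:DWORD
    LOCAL chunk[CHUNK_SIZE]:BYTE
next_chunk:
    invoke ReadFile, hFile, ADDR chunk, CHUNK_SIZE, ADDR bytesRead, NULL
    test eax, eax
    jz done                 ; read error
    cmp bytesRead, 0
    je done                 ; end of file
    invoke ProcessChunk, ADDR chunk, bytesRead
    jmp next_chunk
done:
    ret
read_in_chunks endp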
Library design is a lot more than a single instance of a procedure; it has to do with how you combine a larger number of procedures into a call tree of your own making, and if you need some form of array or memory control, you place it around the call tree, not at every stage within it, or you end up with very slow junk code. Writing low-level code has a lot to do with understanding what is happening and writing your code in a minimalist, efficient way, something you cannot do with theories of catch-all hand-holding.
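In the md5list example above, that could mean one validation pass around the whole call tree instead of a check inside every call, something like this sketch (ValidateMd5List is a made-up name; once it returns 1, the fast unchecked procs can safely run on the buffer):

MD5LINE_MAX equ 296

ValidateMd5List proc uses esi pList:DWORD, cbList:DWORD
    mov esi, pList
    mov edx, cbList
    xor ecx, ecx            ; bytes seen on the current line
scan:
    test edx, edx
    jz ok                   ; whole buffer checked
    inc ecx
    cmp ecx, MD5LINE_MAX
    ja bad                  ; overlong line: refuse the file
    lodsb
    dec edx
    cmp al, 10              ; LF ends a line
    jne scan
    xor ecx, ecx            ; start counting the next line
    jmp scan
ok:
    mov eax, 1
    ret
bad:
    xor eax, eax
    ret
ValidateMd5List endp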
Languages like VB, Java, and to some extent VC make compromises of this type, but it is done at a serious performance penalty which is almost exclusively unacceptable to the vast majority of assembler programmers. You finally pick the development vehicle that you want, and if sloppy bloated code will do the job, then use it; but if you want low-level efficient code, you make the effort to write it that way, and very little of it can be abused or used incorrectly.