Assembler hangs for big buffers in .data? section

Started by jj2007, January 25, 2008, 02:47:41 PM


jj2007

Maybe this is a stupid newbie question, that's why I put it here  :green

I fumbled with the fast SSE zeromem algo posted by NightWare.
The algo seems to work (although I have not counted the zeroes  ::), but when I had the bright idea to allocate simply a few bytes in the .data? section, my assembler told me off - see attached example.
Dear old ml.exe just kept working hard whenever I increased the size of my buffer to values above, say, 40000h. The generated code stays put at 1536 bytes, so that is not the reason for ML's hard work...


BufLen EQU 40000h ; not a lot... but for 100000h it takes ML.exe over 2 minutes!

.data?
buffer db BufLen dup(?)

COMMENT @ ^ ^ Here is trouble. Assembly (ml.exe) times:

100000h 2 MINUTES
80000h 9 secs
80000h 6 secs
60000h 3 secs
40000h 2 secs
4000h 0 secs

@

[attachment deleted by admin]

BogdanOntanu

Yes, it is a well known bug of MASM.

I can explain what is probably happening, but there is no solution unless MASM is updated.

Hence the most simple solution is to dynamically allocate big buffers at runtime by using GlobalAlloc, or HeapAlloc or VirtualAlloc.

Anyway that is what the PE loader would do for that buffer.

At least this way you have more control at runtime: you can realloc or emit a meaningful error message.
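
A minimal sketch of that approach, assuming the MASM32 SDK is installed in \masm32 (the include paths, the label names and the message texts are my assumptions, not taken from the attached example). Only a pointer stays in .data?, and the buffer itself comes from VirtualAlloc:

```asm
; Sketch only: allocate the big buffer at run time instead of in .data?
.386
.model flat, stdcall
option casemap:none

include \masm32\include\windows.inc     ; assumed MASM32 SDK location
include \masm32\include\kernel32.inc
include \masm32\include\user32.inc
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\user32.lib

BufLen EQU 100000h                      ; 1 MB, the size that took ml.exe over 2 minutes

.data
szTitle db "Buffer test", 0
szFail  db "Could not allocate the buffer.", 0

.data?
pBuffer dd ?                            ; only this 4-byte pointer lives in .data?

.code
start:
    ; reserve and commit BufLen bytes; Windows hands out committed pages zero-filled
    invoke VirtualAlloc, NULL, BufLen, MEM_COMMIT or MEM_RESERVE, PAGE_READWRITE
    test eax, eax
    jz alloc_failed
    mov pBuffer, eax

    ; ... work with the buffer through pBuffer here ...

    invoke VirtualFree, pBuffer, 0, MEM_RELEASE
    invoke ExitProcess, 0

alloc_failed:
    ; our own error message, which a buffer in .data? cannot give us
    invoke MessageBox, NULL, addr szFail, addr szTitle, MB_OK or MB_ICONERROR
    invoke ExitProcess, 1
end start
```

With this layout ml.exe only has to emit a single dword in .data?, so the buffer size no longer affects assembly time.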

Quote
so that is not the reason for ML's hard work...

Really? Are you not able to guess what is logically happening in there?
Ambition is a lame excuse for the ones not brave enough to be lazy.
http://www.oby.ro

jj2007

Quote from: BogdanOntanu on January 25, 2008, 04:11:33 PM
Yes, it is a well known bug of MASM.
Thanks - good to know it's not me  :bg

Quote
Hence the most simple solution is to dynamically allocate big buffers at runtime by using GlobalAlloc, or HeapAlloc or VirtualAlloc.
That's what I usually do. I just had a lazy moment. However, when reading elsewhere that PE's get by default 1MB of stack, I wonder whether using that free space has any disadvantages??

Quote
so that is not the reason for ML's hard work...
Quote
Really? Are you not able to guess what is logically happening in there?
Don't have the faintest idea. Assembly time grows exponentially, it seems, but otherwise it's a mystery how inserting a pointer into the code can take such a long time. But since you say it's a known bug, somebody might be able to resolve the mystery... I hope...

BogdanOntanu

Quote
However, when reading elsewhere that PE's get by default 1MB of stack, I wonder whether using that free space has any disadvantages??

It depends on which compiler sets up the "default". It is 1M of reserved storage, but not committed storage. Hence yes, you will have some problems there also.

Quote
Don't have the faintest idea. Assembly time grows exponentially, it seems, but otherwise it's a mystery how inserting a pointer into the code can take such a long time.

But give it a try... think a little.

It happens, so there must be a reason. You assume that "there should not be a reason", but obviously there is one, so you are wrong... The world is not what you want it to be, it is what it is... so?

It looks very much related to your own allocation issue here:

1) What happens when you insert a relatively large amount of data into a section?
2) Hint: Can MASM preallocate a "large enough" memory buffer that will hold an unknown amount of data ?
3) Hint: What does MASM "have to do" when you go beyond its current buffer size?
4) What common "beginner optimization" mistake could be made at that point that would cause the exponential slowdown?

Answer Question 4 and you have your MASM bug defined... not very hard is it?




jj2007

Quote
2) Hint: Can MASM preallocate a "large enough" memory buffer that will hold an unknown amount of data ?
First, we are talking about translating opcodes in machine language, not about actually allocating a buffer at runtime; second, it's a known (and tiny) amount of data.

Since I was not in the mood for riddles, I googled a bit. Seems I am not the first one to stumble over this bug, see this thread.
IMHO there is no plausible explanation for this bug other than a tired nerd.

BogdanOntanu

There is a very plausible and common explanation if I may say so...

hutch--

While it is a commonly known MASM bug, as Bogdan has explained, there is little chance of it ever being changed, as it is not a good technique to allocate large amounts of memory in the uninitialised data section. Dynamic memory allocation is far more flexible and can be adjusted to suit runtime conditions that cannot be known at assembly time.

One way you can set large amounts of memory at assembly time is to set the linker options to make a very large stack which is often done if the application uses very deep recursion.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

jdoe

Quote from: BogdanOntanu on January 25, 2008, 08:31:11 PM
Answer Question 4 and you have your MASM bug defined... not very hard is it?

Bogdan,

In fact, for someone like me who doesn't know how all the things are made under the hood, yes, it's hard to have an idea why.
Please, share your explanation. You piqued my curiosity about this bug and my thirst for knowledge is still not filled.

Like hutch said, I know it's not a good technique to do so but we all tried that one time or another while learning MASM.

Please, please, please   :P


sinsi

Quote from: hutch-- on January 25, 2008, 10:34:23 PM
... it is not a good technique to allocate large amounts of memory in the uninitialised data section.
If you need a buffer or three of a known size, what's wrong with doing it?
Light travels faster than sound, that's why some people seem bright until you hear them.

BogdanOntanu

Quote from: jdoe on January 26, 2008, 02:27:58 AM
Quote from: BogdanOntanu on January 25, 2008, 08:31:11 PM
Answer Question 4 and you have your MASM bug defined... not very hard is it?

Bogdan,
In fact, for someone like me who doesn't know how all the things are made under the hood, yes, it's hard to have an idea why.
Please, share your explanation. You piqued my curiosity about this bug and my thirst for knowledge is still not filled.

Like hutch said, I know it's not a good technique to do so but we all tried that one time or another while learning MASM.

Please, please, please   :P



First of all... let me say that you overestimate my knowledge. I do not know how ALL things work "under the hood"...only 3/4 of the Universe :P

However I do understand. The process of understanding can not be shared. Knowledge can be shared but by doing so you kill the intelligence of your interlocutor.

Then let me say that this understanding about MASM "well known bug" is not factual, I have no inside information on this issue. Instead it is a simple and plausible logical deduction... so to speak: "Elementary dear Watson.." ...

But since you ask so nicely... I will try to share my thought process with you. Maybe this way you will learn how to learn and understand the Universe "under the hood". Knowledge is irrelevant, understanding is everything.

Let us recapitulate our input data:
=========================

Fact 1)
======
When a large uninitialized buffer is allocated in MASM source code, MASM stalls and delays the assembly process "abnormally" for the task at hand.

Fact 2)
=======
This allocation is typically performed by something like this:


my_data db 400000h dup (?)



Fact 3)
=======
We all know this should not happen but it does.


Now the Analysis:
=============

Use info (3)
========
The problem must not be obvious for a programmer at first glance.
It must look like the correct solution. It can not be spotted with ease.

This means it is a logical bug and not a code bug. These are the hardest kind of bugs to find because the compiler has no idea.

Use info (2)
========

This allocation makes use of DUP operator.
You have to know your tools. In MASM, DUP behaves much like a macro: it can be nested. In fact you can write something like this:


buffer db 100 dup ( 5 dup (1,2,3,4) )


This means that the DUP operation will not be performed in one single step.
And it also means that the premise of jj2007 that the amount of data is "known" is FALSE.

It is most likely that MASM will generate 40000h of db ? in a buffer and assemble that buffer. Other open-source assemblers behave exactly like this (but they name the keyword "TIMES").

Anyway this is only half of the problem.

The other half was suggested by my hints.

MASM has to preallocate a buffer for its sections at startup. This buffer cannot be fixed and static, because this is "bad programming practice" in professional software development, and anyway source code is dynamic: the programmer can and will at times generate or allocate any amount of data or code in any section.

Because of this the buffer for the section has to be dynamic and has to be REALLOCATED sometimes.

NOW, guess when this buffer will be reallocated, my dear Watson?

Yes, it is more likely to be reallocated when a large amount of data is generated by the source code.
Something like the buffer of jj2007 here.

AHA... now we are getting close...

But still...Why this big slow down?
Surely  a simple re allocation and associated data copy might slow down things a little but  NOT THAT MUCH ?

Well my dear Watson...here comes Joe the Programmer into action.

Like all programmers, Joe is obsessed with optimizations. So he decides he will not waste space with this reallocation; he will do the "wise" thing: ALLOCATE EXACTLY AS MUCH MEMORY AS NEEDED! Not even one extra dword or, God forbid, 4K... otherwise his colleagues will make fun of him.

And there is nothing more damaging to your intelligence than a group. They all "know" how things should be done!
Of course nobody understands...but that is another issue... who needs understanding anyway ..when you can have "instant knowledge" with no need to think.

Anyway back to Sherlock...

So, do you remember the DUP operator?
Do you now realize what MASM has to do when it assembles 4M of "db ?" in a repetitive manner?

The re allocation will be done for 4 bytes (or anyway a small amount) in order not to waste memory. But then another DB ? will come and the buffer will need reallocation AGAIN... and AGAIN and AGAIN  and AGAIN ... And data will have to be copied again and again and again....

Of course none of this would happen if Joe The Programmer had decided to allocate a bigger buffer at each reallocation and "waste" some memory in the process... but such is life.
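
To put rough numbers on this guess: a sketch, assuming (hypothetically) that the section buffer grows by a fixed small step of c bytes per reallocation and that each reallocation copies everything written so far. N is the final buffer size; neither c nor the copy behaviour is known for ml.exe, they only stand in for the scenario above.

```latex
\[
\text{bytes copied (fixed step } c\text{)}
  \;=\; \sum_{k=0}^{N/c-1} k\,c
  \;=\; \frac{c}{2}\cdot\frac{N}{c}\left(\frac{N}{c}-1\right)
  \;\approx\; \frac{N^{2}}{2c}
\]
\[
\text{bytes copied (capacity doubled each time)}
  \;=\; 1 + 2 + 4 + \dots + \frac{N}{2}
  \;<\; N
\]
```

Under this model, doubling the buffer size roughly quadruples the copying work with a fixed step, while doubling the capacity at each reallocation keeps the total copying below N. The timings at the top of the thread grow at least that fast, but this is only a model, not a statement about what ml.exe really does.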

Of course all this is just logical guess work... it could be a completely different reason. And of course you will say it is a stupid mistake and blah blah blah... but how many times  have YOU done "early optimizations" without really understanding the logical consequences of your actions?

However, this looks simple, plausible and has exactly the observed effect... IMHO it is plausible under the circumstances.

It was not very hard...was it? Elementary my dear Watson....

But then again I have written my own assembler. It is very important to re-invent the wheel... otherwise you will never ever have a clue... and you will believe that the world is what you want it to be when in fact it is something else.

Hope it helps... but then I understand that it does not.




BogdanOntanu

Quote from: sinsi on January 26, 2008, 02:54:54 AM
Quote from: hutch-- on January 25, 2008, 10:34:23 PM
... it is not a good technique to allocate large amounts of memory in the uninitialised data section.
If you need a buffer or three of a known size, what's wrong with doing it?

Nothing wrong IF the assembler does not have a bug exactly on this issue.

Then again, IF you do this kind of memory allocation then you can not present the user with an error message if the allocation fails.

The PE loader will perform the allocation for you, so you lose some freedom... depends on your requirements.

sinsi

Is it a bug though? Doesn't it just take a long time to build?
From what I remember (when I did it once) it assembled and linked OK, just took forever...
Quote from: BogdanOntanu on January 26, 2008, 05:26:59 AM
Then again, IF you do this kind of memory allocation then you can not present the user with an error message if the allocation fails.
Wouldn't windows come up with an error? insufficient memory,not a win32 exe, etc?


BogdanOntanu

Quote from: sinsi on January 26, 2008, 06:02:07 AM
Is it a bug though? Doesn't it just take a long time to build?
From what I remember (when I did it once) it assembled and linked OK, just took forever...

Yes it will take a very long time but eventually it will finish.
For me it was enough when it reached 2 minutes; I switched to TASM and now to my own assembler. But then I had very big projects. Clearly I could have just used dynamic memory allocation. And TASM is very similar to MASM anyway. My own assembler is similar to MASM and TASM ;)

Depends on how you define "a bug". A bug does not have to crash the application. In a manner of speaking: IF it is not written in the manual then it is a bug.


Quote
Wouldn't windows come up with an error? insufficient memory,not a win32 exe, etc?

Yes, windows does bring up an error message in such situations but it is not your message and you can not handle it ...

No argument here: you can surely use this method of allocation if you like it and you can live with the consequences.
I sure did and do use it when I feel like ...


jdoe

Bogdan,

Thanks a lot for your very detailed post.

I'm not that stupid after all. Because I learned programming all by myself, I often think that I miss a lot of theory. Multiple reallocations was my first guess about that bug, but I couldn't have explained why if I had to. You did perfectly.


Quote from: BogdanOntanu on January 26, 2008, 05:07:36 AM

Hope it helps... but then I understand that it does not.


I'm not that sure. Knowing more about DUP, workarounds to allocate a larger buffer faster are easier to find. For example, instead of allocating 100000h of BYTE, which takes 2 minutes, why not allocate 40000h of DWORD? Much faster. And why not 20000h of QWORD?
With less reallocations (which seems to be closely related to the data type), now, for a 4M buffer it takes just a little bit more than 1 minute...

.DATA?
Buffer QWORD 40000h DUP (?)
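
A small sketch of the same idea, keeping the size in bytes in one constant and letting the assembler derive the element count (the name BufBytes is made up for this example):

```asm
BufBytes EQU 100000h                ; hypothetical name: total buffer size in bytes (1 MB)

.data?
Buffer DWORD BufBytes/4 DUP (?)     ; 40000h DWORDs = 100000h bytes
```

SIZEOF Buffer should then still give the size in bytes, so code that used the BYTE version does not need to change.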

Understanding a bug doesn't make it vanish, but it sure helps   :wink


Quote from: BogdanOntanu on January 26, 2008, 05:07:36 AM

Knowledge can be shared but by doing so you kill the intelligence of your interlocutor.


I'm not sure I understand why you say that. Sometimes a small detail is missing to have a bigger view on something. Finding the missing piece of the puzzle can be hard for an individual, and input from others is important. Being stuck is not fun, even more so after 1 month or more. To me, keeping someone in ignorance is more like killing the intelligence. I'm not sure I got your point though.


Anyway, thanks again for your post.