Combining Identical String Literals

Started by msqweasm, June 08, 2011, 11:46:59 AM

msqweasm

If I have what C++ would call a "string literal" in my source file, I know I should define it like this:

.data
mymsg DB 'hello world', 0aH, 00H

My code later on will print this hello world message using the mymsg label. But if many source files each contain the same "hello world" string literal, there will be many duplicate constant strings. Is it possible to instruct the linker (MS Visual C++ 2010) to combine them automatically into one constant string? The C++ compiler can do this automatically. How do we do this automatically in assembly? I don't want to manually verify that the messages are the same and manually reference the same external label. I just want the linker to recognize identical constant strings and combine them into one.

hutch--

Nope. Do as you have shown: make a single common string, then reference it by its address from as many places as you need.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

jj2007

Save this as MyStrings.asm and assemble it to MyStrings.obj:
.386
.model flat

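; no language type in .model, so these names are emitted exactly as written;
; the caller's stdcall model adds the same leading underscore to AppName and Hello
ExternDef _AppName:BYTE
ExternDef _Hello:BYTE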

.data
_AppName db "Masm32:", 0
_Hello db "Hello World", 0

.code
dummy:
end dummy


Use the strings like this, adding MyStrings.obj to the linker's command line:
include \masm32\include\masm32rt.inc

ExternDef AppName:BYTE
ExternDef Hello:BYTE

.code
start: MsgBox 0, addr Hello, addr AppName, MB_OK
exit

end start

drizz

You must use macros for this. There are ready-made macros by Four-F: "Strings.mac", part of KmdKit.
http://www.freewebs.com/four-f/

Quote
The following macros try to eliminate duplicate strings, so only a single copy of identical strings
will be present in the program image, resulting in smaller programs.

$TA / $CTA / $TW / $CTW / $T / $CT
$TA0 / $CTA0 / $TW0 / $CTW0 / $T0 / $CT0

Every time you define a string using one of the macros listed above, the string is stored in a database.
If somewhere later in your code you use the very same macro to define the very same string, the macro
will reuse its offset from the database instead of defining it a second time.
The truth cannot be learned ... it can only be recognized.

jj2007

Quote from: drizz on June 08, 2011, 08:06:01 PM
You must use macros for this.

MasmBasic does it by default for some functions:
Quote
include \masm32\MasmBasic\MasmBasic.inc   ; Download
   
Init
   .if Exist("\Masm32\include\Windows.inc")
      MsgBox 0, "\Masm32\include\Windows.inc", "Load this file?", MB_YESNO
      .if eax==IDYES
         Recall "\Masm32\include\Windows.inc", L$()
         Print Str$("%i lines read", eax), CrLf$
         Print Str$("%i bytes read", Lof("\Masm32\include\Windows.inc")), CrLf$, CrLf$
         For_ n=0 To 9
            .if Len(L$(n))
               PrintLine Str$(n), Tb$, L$(n)
            .endif
         Next
      .endif
   .endif
   Inkey Str$("\n\n%i identical strings condensed to one - launch Olly and look for 'push 404000' ...", found+1)
   
Exit
end start

4 identical strings condensed to one - launch Olly and look for 'push 404000' ...

xandaz

   Lol... dummy as a starting point.

jj2007

Quote from: xandaz on June 08, 2011, 10:07:31 PM
   Lol... dummy as a starting point.

It won't assemble without a .code section... if anybody knows how to create a pure .data object file, please show up.

msqweasm

I've done some research and found that MS VC++ combines identical strings by placing some information into something called a COMDAT section. It is the linker that combines the strings using the COMDAT section. I don't know all the details yet. But does ML.exe support this COMDAT section thing?

dedndave

that might cause problems
what if 2 strings are identical at assembly-time, but one of them gets modified during execution
the linker would have no way of knowing this

jj2007

Quote from: msqweasm on June 09, 2011, 06:59:12 AM
I've done some research and found that MS VC++ combines identical strings by placing some information into something called a COMDAT section. It is the linker that combines the strings using the COMDAT section.

See above, reply #2: add MyStrings.obj to the linker's command line.

drizz

Quote from: msqweasm on June 09, 2011, 06:59:12 AM
I've done some research and found that MS VC++ combines identical strings by placing some information into something called a COMDAT section. It is the linker that combines the strings using the COMDAT section. I don't know all the details yet. But does ML.exe support this COMDAT section thing?
I think it doesn't. You would still need to use macros if it did.
There is an interesting topic about it here.
Unfortunately, the tool mentioned there for marking COMDATs is no longer available:
http://web.archive.bibalex.org/web/20040609211043/http://launcherasm.com/technical/comdats.html

Japheth is probably the best person to ask about this. Feel free to start a discussion on COMDAT segments for JWasm here.

Quote from: jj2007 on June 08, 2011, 10:28:26 PM
It won't assemble without a .code section... if anybody knows how to create a pure .data object file, please show up.
You don't need entry point labels for every module. You mustn't put any when linking with the C runtime.

This is the smallest asm file that assembles: end

sinsi

One way to do it is to create a separate .asm for each string, assemble each one to a .obj, then add each .obj to a .lib.
Link with the .lib and the linker will pull in only the strings your code references.
Light travels faster than sound, that's why some people seem bright until you hear them.