Hello everyone,
I'm having massive trouble trying to get this to work...
I'd like to build a procedure that will read a file, one character at a time, and then pass that character into an array that stores its value by frequency.
I've experimented, I have a proc that can build a freq table from a string.
I have a proc that can read in a file one character at a time.
My attempt to merge the two have failed.
This is a partial segment of my code.
I know, I'm doing something wrong... obviously, b/c its not working :p
Any feedback would be awesome.
Haven't test this but it should be about what your looking for.
CountFreq PROC uses edi esi pszFile:DWORD, pFreqBins:DWORD
LOCAL fsh:DWORD
LOCAL hFile:DWORD
LOCAL pFileData:DWORD
LOCAL cbFile:DWORD
invoke RtlZeroMemory,[pFreqBins],256
invoke CreateFile,[pszFile],GENERIC_READ,NULL,NULL,OPEN_EXISTING,NULL,NULL
mov [hFile],eax
invoke GetFileSize,[hFile],addr fsh
mov [cbFile], eax
invoke VirtualAlloc,NULL,eax,MEM_COMMIT,PAGE_READWRITE
mov [pFileData],eax
invoke ReadFile,[hFile],[pFileData],[cbFile],addr fsh,NULL
invoke CloseHandle,[hFile]
mov ecx,[fsh]
mov edi,[pFileData]
mov esi,[pFreqBins]
@@:
mov eax,[edi+ecx]
add DWORD PTR[esi+eax*4],1
dec ecx
jns @B
@@:
invoke VirtualFree,[pFileData],NULL,MEM_RELEASE
RET
CountFreq ENDP
It will fill a buffer of "bins" with the frequency that each one is encountered. For example A is 42h, so bin 42h will have the number of times character A appears in the file.
Edgar
Its like it partially works.. something is off...
So I test it with a file that contains:
ABCDEEFDABDEDAE
And when I use this method in conjunction with my sort proc i get:
D
A
B
C
F
I should get:
D
E
A
B
C
F
Well, I'd check the sort proc to make sure it isn't the problem. try dumping the array to make sure that the data for E is present and if it is revisit the sort proc. Here's the same thing with mapping so you don't have to allocate so much memory:
CountFreq2 PROC uses edi esi pszFile:DWORD, pFreqBins:DWORD
LOCAL fsh:DWORD
LOCAL hFile:DWORD
LOCAL MapFileOffset:DWORD
LOCAL hMapFile:DWORD
invoke RtlZeroMemory,[pFreqBins],1024
invoke CreateFile,[pszFile],GENERIC_READ,NULL,NULL,OPEN_EXISTING,NULL,NULL
mov [hFile],eax
invoke CreateFileMapping,[hFile],NULL,PAGE_READONLY,0,0,NULL
mov [hMapFile],eax
invoke MapViewOfFile,[hMapFile],FILE_MAP_READ,0,0,0
mov [MapFileOffset],eax
invoke GetFileSize,[hFile],addr fsh
mov ecx,eax
mov edi,[MapFileOffset]
mov esi,[pFreqBins]
xor eax,eax
@@:
mov al,[edi+ecx]
add DWORD PTR[esi+eax*4],1
dec ecx
jns @B
invoke UnmapViewOfFile,[MapFileOffset]
invoke CloseHandle,[hMapFile]
invoke CloseHandle,[hFile]
RET
CountFreq2 ENDP
Well, I tried it and dumped a file containing ABCDEFHIJA, the result was correct:
00403BE8: 00 02 01 01-01 01 01 00-01 01 01 00-00 00 00 00 ................
I cut out most of the dump because it was just filled with 0's. I used byte sized bins instead of DWORD sized ones for the test but I tested it with DWORD sized ones and it works fine as well.
@@: mov al,[edi]
inc dword ptr [esi+eax*4]
inc edi
dec ecx
jnz @B
This was an interesting little project, thanks. I have tested a sort for the bins and found that it is easier to store the character in the bin as well, so I defined a structure for the bin and recycled a sort routine for it. A quick test shows it works though since I wrote it pretty quickly I can't vouch for the speed...
CHARBIN STRUCT
sum dd ?
char dd ?
CHARBIN ENDS
CountFreq2 PROC uses edi esi pszFile:DWORD, pFreqBins:DWORD
LOCAL fsh:DWORD
LOCAL hFile:DWORD
LOCAL MapFileOffset:DWORD
LOCAL hMapFile:DWORD
invoke RtlZeroMemory,[pFreqBins],2048
mov eax,255
mov edi,[pFreqBins]
@@:
; Load the array
mov [edi+eax*8+4],eax
dec eax
jns @B
invoke CreateFile,[pszFile],GENERIC_READ,NULL,NULL,OPEN_EXISTING,NULL,NULL
mov [hFile],eax
invoke CreateFileMapping,hFile,NULL,PAGE_READONLY,0,0,NULL
mov [hMapFile],eax
invoke SetLastError,0
invoke MapViewOfFile,[hMapFile],FILE_MAP_READ,0,0,0
mov [MapFileOffset],eax
invoke GetFileSize,[hFile],addr fsh
mov ecx,eax
dec ecx
mov edi,[MapFileOffset]
mov esi,[pFreqBins]
xor eax,eax
@@:
mov al,[edi]
add DWORD PTR[esi+eax*8],1
inc edi ; thanks Dave
dec ecx
jns @B
invoke UnmapViewOfFile,[MapFileOffset]
invoke CloseHandle,[hMapFile]
invoke CloseHandle,[hFile]
RET
CountFreq2 ENDP
SortBins proc uses esi edi ebx edx pData:DWORD
LOCAL done :DWORD
LOCAL Jump :DWORD
LOCAL nItems :DWORD
LOCAL i :SDWORD
mov [nItems],256
mov Jump,255
mov eax,pData
.WHILE Jump > 1
shr Jump,1
.REPEAT
mov edi,nItems
sub edi,Jump
mov i,edi
mov edx,1
mov done,TRUE
.WHILE SDWORD PTR edx <= i
mov esi,Jump
add esi,edx
mov ebx,[eax+esi*8-8]
mov ecx,[eax+edx*8-8]
.IF ecx < ebx
mov [eax+edx*8-8],ebx
mov [eax+esi*8-8],ecx
;Swap the char data
mov ebx,[eax+esi*8-4]
mov ecx,[eax+edx*8-4]
mov [eax+edx*8-4],ebx
mov [eax+esi*8-4],ecx
mov done,FALSE
.ENDIF
inc edx
.ENDW
.UNTIL done
.ENDW
ret
SortBins endp
You call it like this:
Bins CHARBIN 256 DUP (<?>)
invoke CountFreq2,offset FileName,offset Bins
invoke SortBins,offset Bins
I made one small change to the CountFreq2 proc, it decrements the file size by one so we don't get the NULL at the end.
Tested on the following:
AABBCDDEFHIBBJA
result:
00403050: 04 00 00 00-42 00 00 00-03 00 00 00-41 00 00 00 ....B.......A...
00403060: 02 00 00 00-44 00 00 00-01 00 00 00-43 00 00 00 ....D.......C...
00403070: 01 00 00 00-46 00 00 00-01 00 00 00-49 00 00 00 ....F.......I...
00403080: 01 00 00 00-45 00 00 00-01 00 00 00-4A 00 00 00 ....E.......J...
00403090: 01 00 00 00-48 00 00 00-00 00 00 00-01 00 00 00 ....H...........
Edgar
Scalemail Ted,
The actual error is:
Quote mov al, sourcefile_buffer
That simple.
The problem is that you load only the last byte of a 4 byte register (the other three upper bytes remain unchanged), then you use the entire register to index the location to update. You can fix either by:
xor eax,eax
to clear eax BEFORE you load al, or use:
movzx eax,BYTE PTR sourcefile_buffer
which clears it while loading the LoByte.
Dave.
Thanks everyone, I have it working now! :U
I do have another question pertaining to the register sizes and writing data to a file.
What exactly is occurring when you write to a file. Do you have to convert your data into byte sized char values to read them into the buffer so that you can write it out to a file?
So far I can read a file, index all of its characters, sort the characters from most common to most rare, and now I want to write that out to a new file. but the newly created files contain inconsistent chars.
Im assuming I need to to convert the data before I can write it to a file?
Quotebut the newly created files contain inconsistent chars
the data you have created is binary, in this case dwords
what you'd probably like to see is a list of decimal numbers that represent the counts for each of the 256 ASCII characters
something like...
000 123
001 456
.
.
255 1234567890
those are decimal ASCII strings, seperated by a space, and followed by a carriage return/line feed sequence
the first column can just be generated - no need to convert - and can include the space character
the second column needs a little routine to convert binary dwords into decimal ASCII strings
a dword can hold unsigned values up to 4294967295 (10 decimal digits)
you can use a macro to do this conversion, but you don't really learn anything that way
use the forum search tool to see if you can find a dword to decimal ASCII routine
you can create the entire list in a buffer area, then write it to disk all at once
Quote from: dedndave on March 17, 2010, 11:55:45 AM
Quotebut the newly created files contain inconsistent chars
the data you have created is binary, in this case dwords
what you'd probably like to see is a list of decimal numbers that represent the counts for each of the 256 ASCII characters
something like...
000 123
001 456
.
.
255 1234567890
those are decimal ASCII strings, seperated by a space, and followed by a carriage return/line feed sequence
the first column can just be generated - no need to convert - and can include the space character
the second column needs a little routine to convert binary dwords into decimal ASCII strings
a dword can hold unsigned values up to 4294967295 (10 decimal digits)
you can use a macro to do this conversion, but you don't really learn anything that way
use the forum search tool to see if you can find a dword to decimal ASCII routine
you can create the entire list in a buffer area, then write it to disk all at once
awesome.
Just to make sure I understand properly. If I have an integer value stored within a DWORD (say for example 9000) and if I wanted to write it to a file I would use this latter method and convert it into a new decimal ascii string && then if I wanted to read that file later I would run a similar routine to reconvert that string back into my original number?
well - if you want the data in binary form, you may want to save it that way and skip the conversion steps
i figured you wanted a text file that you could view with notepad or something
but, you can store the raw binary data, then look at it with a hex editor or something to view it
Unless you need to read the string in ASCII outside of your code with something else, just write the DWORD data 4 bytes at a time and read it back the same way. Saves conversions in both directions.
yah - what Hutch said :P
Let me explain what I want to do...
It seems so simple its stupid to ask... but it has me stuck right now and I dont know how to deal with it yet.
I'm building a file compressor... and so now i want to start to write the zipped file.
the first thing i wanted to do was write a header containing the numerical count of total characters in the file and then my ascii/keycode index.
seems simple, but my counter which tallies up the character count is a dword... my test case was 9258 characters. How do I wirte this to a file so that when i decompress it my program will recogonize it?
Am i limited to just the char in write file...will i need to reanlate my keys in the header.... or will the computer know what the values are when it reads the file again with no additional actions from me?
write it in binary just as it is
understand, though, you won't be able to recognize the ASCII characters in the binary file if you use "type" at the prompt
to verify the values are being written correctly, use a hex editor
i use HxD - a very nice free one by Mael Horz
http://mh-nexus.de/en/
it is available in installable and portable (for bootable CD) versions
Awesome. Thanks!!!
ok... I have another question... I wanted to ask the user for the address to save the output file and then I wanted to ask then for a filename... then concanate the two ... I tried to create a var to store the first string then offset the edi but its not working... how do u concanate ???
To concatenate 2 strings you can do this (ANSI only)
mov edi,offset output
mov esi,offset string1
@@:
mov al,[esi]
mov [edi],al
inc edi
inc esi
test al,al
jnz @B
dec edi
mov esi,offset string2
@@:
mov al,[esi]
mov [edi],al
inc edi
inc esi
test al,al
jnz @B
Or if you want to use the API, PathAppend works well for your purposes and will add a backslash if necessary.
Edgar
Thanks!!!