News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Character Tree code question

Started by hutch--, March 19, 2005, 01:47:23 PM

Previous topic - Next topic

Tedd

Quote from: gabor on March 24, 2005, 08:26:44 AM
Tedd, please don't have mercy on me and hit me with an explanation or link to resources about the Finite State Recognisers!

Just for you :lol

Strings to recognize
mov movs movsb movsd movsw movsx movzx

Table for the recognizer

  .abcdefghijklmnopqrstuvwxyz
0 -------------1-------------
1 ---------------2-----------
2 ----------------------3----
3 #------------------4-------
4 #-#-#------------------##-5
5 ------------------------#--

the . means null (end of string) - it's required when a string is also the start of another string (eg. mov is the start of movs/movsb/etc)

Examples
input = "movsw"
start @ state 0
(m,0) = 1              ;in state 0, with char 'm', therefore move to state 1
(o,1) = 2
(v,2) = 3
(s,3) = 4
(w,4) = #              ;reached accept-state, so "movsw" is recognized

input = "mov"
start @ state 0
(m,0) = 1
(o,1) = 2
(v,2) = 3
(.,3) = #              ;actually no more input, so check '.' ==> recognize

input = "moo"
start @ state 0
(m,0) = 1
(o,1) = 2
(o,2) = -              ;failed!! (not recognized)

input = "jmp"
start @ state 0
(j,0) = -              ;failed!! (not recognized)




for general instruction set (no fpu/mmx/sse) ~ 260 instructions, alphabet = {null,a..z,0..9}: table is 220 states :- size < 8kB

(inserting strings by length, rather than alphabetically, allows to use state indices as offsets from current - allowing >256 states; while still allowing indices to be bytes)
No snowflake in an avalanche feels responsible.

QvasiModo

Hutch--, the biggest problem with this trees is not speed but size. Maybe you could "compact" the tree by storing variable length strings instead of just characters (I think this variant of trie is called "patricia").

hutch--

I have the test piece nearly working but its not much related to any of the tree structures I have so far seen. With the example by Rea and what I could scour off the internet, it seems what I had in mind is different to the common notion of a binary tree. The one I have nearly going is iterative, not recursive and it does the comparison on match/nomatch, not greater/less than.

Its shape will be dictated by the order of words written to it and it does not look like it will balance all that well but as an iterative algo it should be fast enough anyway but I won't know that until its up and reliable so I can benchmark it.

What is critical to its operation is that is can be used while scanning text to both add new words and test words so running a balancing phase is not really possible. More to come later.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php