Html Parser

Started by ragdog, September 14, 2008, 11:21:14 AM

ragdog

Hi

I have a question about an HTML parser: does anyone have an example or an idea?

How can I get the string "Welcome Username" and the date "12.08.2008 13:58" and show them in a message box?


Here is a snippet from the HTML file:


<tr>
<td class="row2" width="100%">
<table style="border: 1px solid rgb(224, 224, 224);" cellpadding="2" cellspacing="0" width="100%">

                <tbody><tr>
                          <td style="text-align: center;" align="center" width="100%">
                                Welcome Username
                    </td>
                            </tr>
                <tr>
                    <td style="text-align: center;" align="center" width="100%">
                                <div align="center"><img src="index.php-Dateien/photo5019.jpg" alt="" border="0" height="48" width="64"></div>
                    </td>
                </tr>
                <tr>
                    <td style="text-align: center;" align="center" width="100%">
                                <span style="color: red; font-weight: bold;">Last visited:</span>
                    </td>
                </tr>
                <tr>
                    <td style="text-align: center;" id="tdblock" align="center" width="100%">
                                12.08.2008
                    </td>
                </tr>
               
                <tr>
                    <td align="left" width="100%">


Can you help me, please?

regards
ragdog


hutch--

I have not made any great effort to parse HTML, but it looks like a job for a byte scanner that is geared to scan for opening and closing angle brackets ("<" and ">"), record the data within the tags, and match opening and closing tags. What will make that fun is the way HTML allows multiple forms within a pair of angle brackets.
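As a rough illustration of the byte scanner described above, here is a minimal Python sketch (not MASM, and not code from this thread). The names `scan`, `text_with_context`, and `VOID_TAGS`, as well as the sample string, are hypothetical. It assumes that ">" never appears inside an attribute value, which real HTML does not guarantee.

```python
# Assumption: a small set of tags that never have a closing partner.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


def scan(html):
    """Yield ("tag", body) for text inside <...> and ("text", body) for text between tags."""
    pos = 0
    while pos < len(html):
        if html[pos] == "<":
            end = html.find(">", pos)
            if end == -1:            # unterminated tag: stop scanning
                break
            yield "tag", html[pos + 1:end]
            pos = end + 1
        else:
            end = html.find("<", pos)
            if end == -1:
                end = len(html)
            text = html[pos:end].strip()
            if text:                 # skip runs that are only whitespace
                yield "text", text
            pos = end


def text_with_context(html):
    """Pair each text run with the list of tags that are open around it."""
    stack, result = [], []
    for kind, body in scan(html):
        if kind == "text":
            result.append((list(stack), body))
        elif body.startswith("/"):
            name = body[1:].strip().lower()
            if stack and stack[-1] == name:   # close matches the innermost open tag
                stack.pop()
        else:
            parts = body.split()
            name = parts[0].lower() if parts else ""
            if name and name not in VOID_TAGS:
                stack.append(name)
    return result


sample = ('<tr><td align="center">Welcome Username</td></tr>'
          '<tr><td><img src="x.jpg" alt=""></td></tr>')
for open_tags, text in text_with_context(sample):
    print(open_tags, text)   # ['tr', 'td'] Welcome Username
```

The stack is what matches opening and closing tags; a production parser would also need to handle comments, quoted ">" characters, and unclosed tags.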

If you are after a crude and simple method, try the module in the masm32 library that strips out everything from "<" through ">", brackets included; what you have left appears to be the text you want.


StripRangeI proc lpszSource:DWORD,lpszDest:DWORD,stByte:BYTE,enByte:BYTE
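The stripping idea can be sketched in a few lines of Python. This is not the masm32 library routine and makes no claim about how it is implemented; `strip_range` and `visible_lines` are hypothetical names, and the sample is a shortened copy of the HTML from the first post.

```python
def strip_range(source, start_char, end_char):
    """Remove every span from start_char through end_char, delimiters included."""
    out = []
    inside = False
    for ch in source:
        if inside:
            if ch == end_char:       # end of the span: resume copying after it
                inside = False
        elif ch == start_char:       # start of a span: stop copying
            inside = True
        else:
            out.append(ch)
    return "".join(out)


def visible_lines(html):
    """Strip the tags, then keep only the lines that still contain text."""
    stripped = strip_range(html, "<", ">")
    return [line.strip() for line in stripped.splitlines() if line.strip()]


sample = """<tr>
  <td style="text-align: center;" align="center" width="100%">
        Welcome Username
  </td>
</tr>
<tr>
  <td style="text-align: center;" align="center" width="100%">
        <span style="color: red; font-weight: bold;">Last visited:</span>
  </td>
</tr>
<tr>
  <td style="text-align: center;" id="tdblock" align="center" width="100%">
        12.08.2008
  </td>
</tr>"""

print(visible_lines(sample))
# ['Welcome Username', 'Last visited:', '12.08.2008']
```

After stripping, the wanted strings are simply the non-empty lines, so the date can be picked out as the line that follows "Last visited:".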

hoverlees

You can try a regular expression, although it's not the best way.

The regular expression

<tr>[\s]*\r\n[\s]*<td [^>]*>[\s]*\r\n[\s]*([\w ]*)[\s]*\r\n[\s]*</td>[\s]*\r\n[\s]*</tr>

matches

<tr>
                          <td style="text-align: center;" align="center" width="100%">
                                Welcome Username
                    </td>
                            </tr>

and the first captured group is "Welcome Username".
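To check the pattern quickly, here is a small sketch using Python's `re` module rather than the Perl-compatible library mentioned in this thread. The sample is joined with CRLF line endings because the pattern requires a literal `\r\n`. The `EXTENDED` pattern is an assumption added for illustration, not part of the original suggestion.

```python
import re

# The pattern exactly as posted above.
PATTERN = re.compile(
    r"<tr>[\s]*\r\n[\s]*<td [^>]*>[\s]*\r\n[\s]*([\w ]*)[\s]*\r\n[\s]*</td>[\s]*\r\n[\s]*</tr>"
)

# Assumption: widening the capture class with "." and ":" so dates and times match too.
EXTENDED = re.compile(
    r"<tr>[\s]*\r\n[\s]*<td [^>]*>[\s]*\r\n[\s]*([\w .:]*)[\s]*\r\n[\s]*</td>[\s]*\r\n[\s]*</tr>"
)

lines = [
    '<tbody><tr>',
    '          <td style="text-align: center;" align="center" width="100%">',
    '                Welcome Username',
    '    </td>',
    '            </tr>',
    '<tr>',
    '    <td style="text-align: center;" id="tdblock" align="center" width="100%">',
    '                12.08.2008',
    '    </td>',
    '</tr>',
]
html = "\r\n".join(lines)    # the pattern only matches CRLF line endings

match = PATTERN.search(html)
print(match.group(1))            # Welcome Username
print(PATTERN.findall(html))     # ['Welcome Username']
print(EXTENDED.findall(html))    # ['Welcome Username', '12.08.2008']
```

The posted pattern skips the date row because `[\w ]` does not include ".", so getting the date needs either a wider class like the one above or a second pattern.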

hoverlees

Here is some simple code I wrote; you can download the Perl regular expression library from here.
http://www.masm32.com/board/index.php?topic=1922.0



[attachment deleted by admin]

ragdog

Many thanks to Hutch and hoverlees for this nice example :U

First of all, I must now study how it works.

best regards