News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

HTTP protocol or coding error?

Started by white scorpion, October 13, 2005, 10:55:43 AM

Previous topic - Next topic

white scorpion

Hi all,

I'm writing a simple program which should be able to retrieve predefined info from a predefined site, and for this i need to connect to the site using the HTTP protocol.
Although the program is working like it should on most sites, on some sites it only returns half of the data: only the http header.

When i use a similar technique with telnet or netcat to do the same manually it does return the complete page.

I've tested this on the following sites:

www.google.nl/  --> works fine
www.start.nl/  --> works fine
www.white-scorpion.nl/old/ --> only http header
www.masmforum.com/simple/index.php --> only http header or half the site.
www.white-scorpion.nl/forums/index.php --> only http header + half of page
www.masm32.com/mlicence.htm --> works fine.

here's the code:


.686
.model flat,stdcall
option casemap:none

include \masm32\include\windows.inc

include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

include \masm32\include\user32.inc
includelib \masm32\lib\user32.lib

include \masm32\include\user32.inc
includelib \masm32\lib\user32.lib

include \masm32\include\wsock32.inc
includelib \masm32\lib\wsock32.lib

MakeConnection PROTO

.DATA
send1       db "GET ",0
send2       db " HTTP/1.1",13,10
            db "Host: ",0
send3       db 13,10,"User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) WebAlert",13,10
            db "Accept: */*",13,10
            db "Accept-Language: en-us",13,10
            db "Proxy-Connection: Keep-Alive",13,10,13,10,0
           
 

DomainUrl   db "www.masm32.com",0
DirUrl      db "/mlicence.htm",0     
PortNumber  DWORD 80


.DATA?

wsa             WSADATA     <>
sin             sockaddr_in <>
connectsock     SOCKET ?
SendBuf         db 4096 dup (?)
RecvBuf         db 24096 dup (?) ; *** Should be sufficient ***



.CODE
start:
    invoke MakeConnection
    invoke ExitProcess,0

;# The actual connection procedure
;#################################################
MakeConnection PROC
    invoke lstrcpy,addr SendBuf,addr send1
    invoke lstrcat,addr SendBuf,addr DirUrl
    invoke lstrcat,addr SendBuf,addr send2
    invoke lstrcat,addr SendBuf,addr DomainUrl
    invoke lstrcat,addr SendBuf,addr send3


    invoke MessageBoxA,NULL,addr SendBuf,addr DomainUrl,MB_OK
   
    invoke WSAStartup,0101h,addr wsa
    invoke socket,AF_INET,SOCK_STREAM,0
    .IF eax==INVALID_SOCKET
        invoke WSACleanup
        mov eax,1
        ret
    .ENDIF   
    mov connectsock,eax
    mov sin.sin_family,AF_INET
    invoke htons,PortNumber
    mov sin.sin_port,ax

    invoke gethostbyname,addr DomainUrl
    .IF eax==0
        invoke WSACleanup
        mov eax,1
        ret
    .ENDIF   
    mov eax,[eax+12]
    mov eax,[eax]
    mov eax,[eax]
    mov sin.sin_addr,eax

    invoke connect,connectsock,addr sin,sizeof sin
    .IF eax==SOCKET_ERROR
        invoke WSACleanup
        mov eax,1
        ret
    .ENDIF

    invoke send,connectsock,addr SendBuf,sizeof SendBuf,0
    .IF eax==SOCKET_ERROR
        invoke WSACleanup
        mov eax,1
        ret
    .ENDIF
    invoke recv,connectsock,addr RecvBuf,sizeof RecvBuf,0
        .IF eax==SOCKET_ERROR
            invoke WSACleanup
            mov eax,1
            ret
        .ENDIF
    invoke closesocket,connectsock       
    invoke MessageBoxA,NULL,addr RecvBuf,addr DomainUrl,MB_OK 
    xor eax,eax
    ret
MakeConnection ENDP

end start   


I don't really see any logic in it.
Some pages change their response when you change the order in which the http header members are send:

e.g.

get / http/1.1
host:
user-agent:
accept:


or


get / http/1.1
accept:
user-agent:
host:


i've also tried adding and removing the proxy-settings and everything else that is send by or Internet explorer or Firefox when requesting a page.

Any ideas on this? I've tried everything i can think of so any help is appreciated!!!




QvasiModo

Hello White Scorpion :)

First, I'd recommend you using the wininet or winhttp libraries, they're much easier to use and solve a lot of problems. Unless you want to use sockets for some reason (learning, for example).

About the code, I see the following problems:

1) The "Proxy-Connection" header is only meant to be used with proxies, not webservers. It surely does no harm but there's no reason to put it there - specially if you're not implementing Keep-Alive to begin with (you're closing the connection when you're done). For more info on the HTTP protocol you can go straight to the specs (boring) or read "HTTP The Definitive Guide" (much better).

2) The "send" and "recv" functions are meant to be called in a loop. That's because they may not send all of the data, nor fill the whole buffer, in a single call. Maybe your program isn't sending the whole request, that would explain the different server responses when changing the order of the headers (that shouldn't happen). Check out MSDN for more info.

3) You're not reading the "Content-Length" header at all from the response. Then how do you know when to stop reading? Same problem as above, you have to call recv() in a loop. But you also have to do some memory management, since you don't know the size of the response.

http://msdn.microsoft.com/library/en-us/winsock/winsock/recv_2.asp
http://msdn.microsoft.com/library/en-us/winsock/winsock/send_2.asp

Hope that helps!  :U

white scorpion

Thanks for your response!

I know i don't need the Proxie part, i also know i should try looping it, but since i need to have the whole content in one buffer this makes things a bit more difficult, although i have tried so.

I can't imagine the HTTP protocol is the problem, but still, i am interested in the book ;-)
I will take a look at wininet and winhttp libraries since i never heared of them before and it might be a perfect solution  :U

I will let you know how it worked out.



Flax

Hello

When you are connected to the server, send the request all at once, as a single string,
terminated with a double CRLF: this will "tell" the server this is the end of the request.
Knowing this he will determine what is the host, the ressource, its location, and send it in packets.

Try these 3 requests:

GET /? HTTP/1.0 CRLF
Host: www.white-scorpion.nl CRLFCRLF

GET /old? HTTP/1.0 CRLF
Host: www.white-scorpion.nl CRLF
User-Agent: Agent_WhiteScorpion_TestingPurposeBeta1.0 CRLFCRLF

GET /old/index.php?action=1&id=28 CRLF
Host: www.white-scorpion.nl CRLFCRLF


Additional notes
- "HTTP/1.0" value will be handled correctly by a HTTP/1.1 compliant server. Opposite is not sure.
- If you specify the user-agent, tell the real one to the server (Unless you're writing the Mozilla browser lol!)

Next step for you is to parse client-side the HTTP header from the server.
See RFC 2616 for additional exhaustive informations.

Best Regards,
Flax

hutch--

If you just want the file rather than displaying it in a browser, try this, it seems to work fine.


; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

comment * -----------------------------------------------------
                        Build this  template with
                       "CONSOLE ASSEMBLE AND LINK"
        ----------------------------------------------------- *

    include \masm32\include\urlmon.inc
    includelib \masm32\lib\urlmon.lib

    .code

start:
   
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

    call main

    exit

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

main proc

    fn URLDownloadToFile,0, \
                    "http://www.masm32.com/index.htm", \
                    "masm32.htm",0,0


    ret

main endp

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

end start
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

QvasiModo

Quote from: Flax on October 30, 2005, 08:53:10 AM
- If you specify the user-agent, tell the real one to the server (Unless you're writing the Mozilla browser lol!)

Actually, it's alright to use "Mozilla compatible" user-agents. Even Internet Explorer does this... :wink

Flax

Hi!
The section 14.43 of the RFC 2616 explains it quite welll... Of course as a programmer you can make the program say what you want but as a web administrator it's just an indication.
That being said it's both a protocol and Winsock error i guess he's welcome to check madwizzard.org for socks examples (although they're not HTTP compliant from my point of view).
Ciao,
Flax