Hi all,
I'm writing a simple program that should retrieve predefined info from a predefined site, and for this I need to connect to the site using the HTTP protocol.
Although the program works as it should on most sites, on some sites it returns only part of the data: just the HTTP header.
When I use a similar technique with telnet or netcat to do the same thing manually, it does return the complete page.
I've tested this on the following sites:
www.google.nl/ --> works fine
www.start.nl/ --> works fine
www.white-scorpion.nl/old/ --> only http header
www.masmforum.com/simple/index.php --> only http header or half the site.
www.white-scorpion.nl/forums/index.php --> only http header + half of page
www.masm32.com/mlicence.htm --> works fine.
Here's the code:
.686
.model flat,stdcall
option casemap:none
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib
include \masm32\include\user32.inc
includelib \masm32\lib\user32.lib
include \masm32\include\wsock32.inc
includelib \masm32\lib\wsock32.lib
MakeConnection PROTO
.DATA
send1 db "GET ",0
send2 db " HTTP/1.1",13,10
db "Host: ",0
send3 db 13,10,"User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) WebAlert",13,10
db "Accept: */*",13,10
db "Accept-Language: en-us",13,10
db "Proxy-Connection: Keep-Alive",13,10,13,10,0
DomainUrl db "www.masm32.com",0
DirUrl db "/mlicence.htm",0
PortNumber DWORD 80
.DATA?
wsa WSADATA <>
sin sockaddr_in <>
connectsock SOCKET ?
SendBuf db 4096 dup (?)
RecvBuf db 24096 dup (?) ; *** Should be sufficient ***
.CODE
start:
invoke MakeConnection
invoke ExitProcess,0
;# The actual connection procedure
;#################################################
MakeConnection PROC
invoke lstrcpy,addr SendBuf,addr send1
invoke lstrcat,addr SendBuf,addr DirUrl
invoke lstrcat,addr SendBuf,addr send2
invoke lstrcat,addr SendBuf,addr DomainUrl
invoke lstrcat,addr SendBuf,addr send3
invoke MessageBoxA,NULL,addr SendBuf,addr DomainUrl,MB_OK
invoke WSAStartup,0101h,addr wsa
invoke socket,AF_INET,SOCK_STREAM,0
.IF eax==INVALID_SOCKET
invoke WSACleanup
mov eax,1
ret
.ENDIF
mov connectsock,eax
mov sin.sin_family,AF_INET
invoke htons,PortNumber
mov sin.sin_port,ax
invoke gethostbyname,addr DomainUrl
.IF eax==0
invoke WSACleanup
mov eax,1
ret
.ENDIF
mov eax,[eax+12]
mov eax,[eax]
mov eax,[eax]
mov sin.sin_addr,eax
invoke connect,connectsock,addr sin,sizeof sin
.IF eax==SOCKET_ERROR
invoke WSACleanup
mov eax,1
ret
.ENDIF
invoke send,connectsock,addr SendBuf,sizeof SendBuf,0
.IF eax==SOCKET_ERROR
invoke WSACleanup
mov eax,1
ret
.ENDIF
invoke recv,connectsock,addr RecvBuf,sizeof RecvBuf,0
.IF eax==SOCKET_ERROR
invoke WSACleanup
mov eax,1
ret
.ENDIF
invoke closesocket,connectsock
invoke MessageBoxA,NULL,addr RecvBuf,addr DomainUrl,MB_OK
xor eax,eax
ret
MakeConnection ENDP
end start
I don't really see any logic in it.
Some pages change their response when you change the order in which the HTTP header fields are sent:
e.g.
get / http/1.1
host:
user-agent:
accept:
or
get / http/1.1
accept:
user-agent:
host:
I've also tried adding and removing the proxy settings and everything else that is sent by either Internet Explorer or Firefox when requesting a page.
Any ideas on this? I've tried everything I can think of, so any help is appreciated!!!
Hello White Scorpion :)
First, I'd recommend using the wininet or winhttp libraries; they're much easier to use and solve a lot of problems. Unless you want to use sockets for some reason (learning, for example).
About the code, I see the following problems:
1) The "Proxy-Connection" header is only meant to be used with proxies, not web servers. It surely does no harm, but there's no reason to put it there, especially if you're not implementing Keep-Alive to begin with (you're closing the connection when you're done). For more info on the HTTP protocol you can go straight to the specs (boring) or read "HTTP: The Definitive Guide" (http://www.oreilly.com/catalog/httptdg/) (much better).
2) The "send" and "recv" functions are meant to be called in a loop, because a single call may not send all of the data or fill the whole buffer. Maybe your program isn't sending the whole request; that would explain the different server responses when changing the order of the headers (which shouldn't happen). Check out MSDN for more info.
3) You're not reading the "Content-Length" header at all from the response. Then how do you know when to stop reading? Same problem as above, you have to call recv() in a loop. But you also have to do some memory management, since you don't know the size of the response.
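To illustrate point 3, here is a rough sketch of pulling the Content-Length value out of a response header. It is written in C rather than MASM only to keep it short; the helper names has_prefix_ci and content_length are made up for this example, and it assumes the header block is already in a zero-terminated buffer. Not every response carries this header (chunked or close-delimited bodies don't), so the sketch returns -1 in that case.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Case-insensitive prefix test; HTTP header names are case-insensitive. */
static int has_prefix_ci(const char *s, const char *prefix)
{
    while (*prefix) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Returns the Content-Length value found in a zero-terminated response,
   or -1 if the header block contains no such field. */
static long content_length(const char *response)
{
    const char *line = response;
    while (*line) {
        /* An empty line marks the end of the header block: stop there
           so that body text is never mistaken for a header. */
        if (line[0] == '\r' && line[1] == '\n')
            break;
        if (has_prefix_ci(line, "Content-Length:"))
            return strtol(line + 15, NULL, 10); /* 15 = length of the name plus ':' */
        /* Advance to the start of the next header line. */
        while (*line && !(line[0] == '\r' && line[1] == '\n'))
            line++;
        if (*line)
            line += 2;
    }
    return -1;
}
```

Once the value is known, the receive loop can stop as soon as it has read that many bytes past the blank line that ends the header.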
http://msdn.microsoft.com/library/en-us/winsock/winsock/recv_2.asp
http://msdn.microsoft.com/library/en-us/winsock/winsock/send_2.asp
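The loops from point 2 could look roughly like the sketch below. It is in C rather than MASM, but send and recv are the same Winsock functions the posted code reaches through invoke. The names send_all, recv_until_close and sock_t are invented for this example, and the receive side assumes the server closes the connection after the response (as with HTTP/1.0 or "Connection: close"); with Keep-Alive you would stop at Content-Length instead.

```c
#include <stddef.h>
#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET sock_t;
#else
#include <sys/types.h>
#include <sys/socket.h>
typedef int sock_t;
#endif

/* Calls send() repeatedly until all len bytes have gone out.
   Returns 0 on success, -1 on error. */
static int send_all(sock_t s, const char *buf, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        int n = (int)send(s, buf + sent, (int)(len - sent), 0);
        if (n <= 0)
            return -1;
        sent += (size_t)n;
    }
    return 0;
}

/* Calls recv() repeatedly until the peer closes the connection or the
   buffer is full. Zero-terminates buf and returns the number of bytes
   stored, or -1 on error. */
static long recv_until_close(sock_t s, char *buf, size_t cap)
{
    size_t got = 0;
    if (cap == 0)
        return -1;
    while (got + 1 < cap) {                 /* keep one byte for the terminator */
        int n = (int)recv(s, buf + got, (int)(cap - 1 - got), 0);
        if (n < 0)
            return -1;
        if (n == 0)                         /* 0 means the peer closed the connection */
            break;
        got += (size_t)n;
    }
    buf[got] = '\0';
    return (long)got;
}
```

A fixed buffer keeps the sketch short; for pages of unknown size you would grow the buffer instead of stopping when it is full.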
Hope that helps! :U
Thanks for your response!
I know I don't need the proxy part, and I also know I should try looping it, but since I need to have the whole content in one buffer this makes things a bit more difficult, although I have tried it.
I can't imagine the HTTP protocol is the problem, but still, I am interested in the book ;-)
I will take a look at the wininet and winhttp libraries, since I had never heard of them before and they might be a perfect solution :U
I will let you know how it worked out.
Hello
Once you are connected to the server, send the request all at once, as a single string
terminated with a double CRLF: this "tells" the server that this is the end of the request.
Knowing this, it will determine the host, the resource, and its location, and send the resource back in packets.
Try these 3 requests:
GET /? HTTP/1.0 CRLF
Host: www.white-scorpion.nl CRLFCRLF
GET /old? HTTP/1.0 CRLF
Host: www.white-scorpion.nl CRLF
User-Agent: Agent_WhiteScorpion_TestingPurposeBeta1.0 CRLFCRLF
GET /old/index.php?action=1&id=28 HTTP/1.0 CRLF
Host: www.white-scorpion.nl CRLFCRLF
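As a rough sketch of "one string, terminated with a double CRLF", the request can be built in a buffer and its length kept for the send call. This is C rather than MASM, and build_request is a made-up helper name. The point of returning the length: only that many bytes should be sent. The code posted at the top of the thread passes sizeof SendBuf to send, which is the size of the whole 4096-byte buffer rather than the length of the request text.

```c
#include <stdio.h>
#include <string.h>

/* Writes a minimal HTTP/1.0 request into out and returns its length in
   bytes, or -1 if it does not fit into cap bytes. */
static int build_request(char *out, size_t cap, const char *host, const char *path)
{
    int n = snprintf(out, cap,
                     "GET %s HTTP/1.0\r\n"
                     "Host: %s\r\n"
                     "\r\n",               /* the empty line ends the request */
                     path, host);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

The returned value is what goes into the length argument of send (in the MASM version, the result of lstrlen on the buffer would play the same role).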
Additional notes
- An "HTTP/1.0" request will be handled correctly by an HTTP/1.1 compliant server. The opposite is not guaranteed.
- If you specify the user-agent, tell the real one to the server (Unless you're writing the Mozilla browser lol!)
The next step for you is to parse, on the client side, the HTTP header returned by the server.
See RFC 2616 for exhaustive additional information.
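A first pass at parsing the response header might look like the sketch below (C rather than MASM; status_code and body_start are invented names, and the response is assumed to sit in a zero-terminated buffer). It only reads the status code and finds where the body begins; anything more should follow the RFC.

```c
#include <stdio.h>
#include <string.h>

/* Reads the status code from a status line such as "HTTP/1.1 200 OK".
   Returns -1 if the response does not start with a valid status line. */
static int status_code(const char *response)
{
    int major, minor, code;
    if (sscanf(response, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    return code;
}

/* Returns a pointer to the first byte of the body, or NULL if the empty
   line that ends the header has not been received yet. */
static const char *body_start(const char *response)
{
    const char *end = strstr(response, "\r\n\r\n");
    return end ? end + 4 : NULL;
}
```

A NULL result from body_start is a hint that more data still has to be read from the socket before the header is complete.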
Best Regards,
Flax
If you just want the file rather than displaying it in a browser, try this, it seems to work fine.
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
include \masm32\include\masm32rt.inc
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
comment * -----------------------------------------------------
Build this template with
"CONSOLE ASSEMBLE AND LINK"
----------------------------------------------------- *
include \masm32\include\urlmon.inc
includelib \masm32\lib\urlmon.lib
.code
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
call main
exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
main proc
fn URLDownloadToFile,0, \
"http://www.masm32.com/index.htm", \
"masm32.htm",0,0
ret
main endp
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start
Quote from: Flax on October 30, 2005, 08:53:10 AM
- If you specify the user-agent, tell the real one to the server (Unless you're writing the Mozilla browser lol!)
Actually, it's alright to use "Mozilla compatible" user-agents. Even Internet Explorer does this... :wink
Hi!
Section 14.43 of RFC 2616 explains it quite well... Of course, as a programmer you can make the program say whatever you want, but to a web administrator it's just an indication.
That being said, it's both a protocol and a Winsock error, I guess. He's welcome to check madwizzard.org for socket examples (although they're not HTTP compliant, from my point of view).
Ciao,
Flax