I have always wondered about the best way to buffer an incoming TCP Winsock stream.
I have seen several different methods. What I used to do is receive until I had one full "data packet", then process it, then shuffle the whole buffer down for each one. That does not seem quite right. A circular buffer would be awesome, except when a packet wraps around the end :) so I have come up with what I have below. It is just in a made-up pseudo language, and it is designed around a 6-byte header: the first byte is 0x2a, the second byte is the channel, and the length is a WORD located after the first DWORD, at offset 4. (OSCAR :) I know this is a MASM forum and not a Winsock forum, but does anyone have advice on how they manage their buffers with minimal shuffling?
GlobalBuffer
GBSize
GBLength
-----
Receive Procedure
LocalBuffer ; Begin
LBSize ; End
LBPos ; Position
LBLength ; Position filled to
COPY GlobalBuffer -> LocalBuffer
LBPos = 0
LBLength = GBLength
Received = Recv(LocalBuffer+LBLength, LBSize-LBLength) ; Append after the saved bytes
IF (Received < 1) ; Winsock error or disconnect
    QUIT
LBLength += Received
WHILE TRUE
    IF (LBLength - LBPos) < 6 ; Not a Full HEADER
        STORE IN GlobalBuffer ; Keep the leftover bytes for the next call
        BREAK
    IF (BYTE PTR[LocalBuffer+LBPos] != 0x2a) ; Not a Valid HEADER
        QUIT
    CHANNEL = BYTE PTR[LocalBuffer+LBPos+1]
    DataLength = WORD PTR[LocalBuffer+LBPos+4]
    IF (DataLength > LBLength-LBPos-6) ; Not a Full Packet
        STORE IN GlobalBuffer
        BREAK
    IF (CHANNEL == 2)
        Process2(LocalBuffer+LBPos+6, DataLength) ; Process Without HEADER
    ELSEIF (CHANNEL == 1)
        Process1(LocalBuffer+LBPos+6, DataLength) ; Process Without HEADER
    ELSEIF (CHANNEL == 3)
        Process3(LocalBuffer+LBPos+6, DataLength) ; Process Without HEADER
    ELSEIF (CHANNEL == 4)
        Process4(LocalBuffer+LBPos+6, DataLength) ; Process Without HEADER
    INCREASE LBPos by DataLength + 6
ENDWHILE
END PROCEDURE
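For reference, the header check described above can be sketched in C. This is only an illustration: the Header struct and parse_header are made-up names, and the sketch assumes the length word arrives most significant byte first (network byte order), in which case a plain WORD PTR read on x86 would see the two bytes swapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { HDR_SIZE = 6, HDR_MAGIC = 0x2a };

/* Hypothetical view of the 6-byte header; field names are illustrative. */
typedef struct {
    uint8_t  channel;  /* byte at offset 1 */
    uint16_t length;   /* word at offset 4: payload bytes that follow */
} Header;

/* Returns 1 and fills *out when buf starts with a complete, valid header,
   0 when fewer than 6 bytes are available, -1 when the first byte is wrong. */
static int parse_header(const uint8_t *buf, size_t avail, Header *out)
{
    assert(buf != NULL && out != NULL);
    if (avail < HDR_SIZE)
        return 0;
    if (buf[0] != HDR_MAGIC)
        return -1;
    out->channel = buf[1];
    /* Assumption: the length is sent most significant byte first. */
    out->length = (uint16_t)((buf[4] << 8) | buf[5]);
    return 1;
}
```

Separating "not enough bytes yet" (0) from "garbage" (-1) keeps the caller's loop simple: 0 means go back to recv, -1 means drop the connection.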
Use a single buffer into which you accumulate incoming data. It needs to be large enough to hold the largest chunk of data you need. You can't control the size of the pieces that arrive over the socket; the TCP/IP stack will packetize and fragment the stream as it sees fit.
Use a sliding window within that buffer to access the data. When you reach a threshold, pull the remaining data down to the beginning of the buffer with a single memcpy (memmove if the regions can overlap), adjust the end-of-buffer pointer, and start accumulating data from the socket again.
Alternatively, use a state machine to track whether you are waiting for header data or collecting a chunk of channel data. When you have accumulated enough data for the header, process it. Once you know the channel data length, start filling the channel buffer, and once you have all the channel data, feed it to the processing routine and go back to collecting a header. Repeat. Requesting exactly the number of bytes the current state needs minimizes unnecessary copying.
-Clive
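The state-machine variant described above might look roughly like this in C. It is a sketch under assumptions, not anyone's actual code: Parser, parser_dest, parser_advance, feed and record_packet are hypothetical names, the length word is assumed to be big-endian, and feed only stands in for a loop around recv so the logic can be exercised without a socket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { HDR_SIZE = 6, MAX_BODY = 65535 };
typedef enum { WANT_HEADER, WANT_BODY } State;
typedef void (*PacketFn)(uint8_t channel, const uint8_t *body, size_t len, void *user);

typedef struct {
    State   state;
    size_t  have;             /* bytes collected in the current state */
    size_t  need;             /* bytes required to finish the state */
    uint8_t hdr[HDR_SIZE];
    uint8_t body[MAX_BODY];
} Parser;

static void parser_init(Parser *p)
{
    p->state = WANT_HEADER; p->have = 0; p->need = HDR_SIZE;
}

/* Where the next recv should write and how many bytes it should ask for. */
static uint8_t *parser_dest(Parser *p, size_t *room)
{
    uint8_t *base = (p->state == WANT_HEADER) ? p->hdr : p->body;
    *room = p->need - p->have;
    return base + p->have;
}

/* Call after n bytes were stored at parser_dest().
   Returns 1 when a packet was delivered, 0 when more data is needed,
   -1 on an invalid header. */
static int parser_advance(Parser *p, size_t n, PacketFn fn, void *user)
{
    assert(n <= p->need - p->have);
    p->have += n;
    if (p->have < p->need)
        return 0;
    if (p->state == WANT_HEADER) {
        if (p->hdr[0] != 0x2a)
            return -1;
        /* Assumption: big-endian length word at offset 4. */
        p->need = (size_t)((p->hdr[4] << 8) | p->hdr[5]);
        p->have = 0;
        p->state = WANT_BODY;
        if (p->need != 0)
            return 0;
        /* Zero-length body: deliver immediately instead of asking for 0 bytes. */
    }
    fn(p->hdr[1], p->body, p->need, user);
    parser_init(p);
    return 1;
}

/* Stand-in for the recv loop: copies from an in-memory stream. */
static int feed(Parser *p, const uint8_t *data, size_t len, PacketFn fn, void *user)
{
    int packets = 0;
    while (len > 0) {
        size_t room;
        uint8_t *dst = parser_dest(p, &room);
        size_t n = len < room ? len : room;
        memcpy(dst, data, n);
        data += n; len -= n;
        int r = parser_advance(p, n, fn, user);
        if (r < 0)
            return -1;
        packets += r;
    }
    return packets;
}

/* Example callback that only records what it saw. */
typedef struct { int count; size_t total_len; uint8_t last_channel; } Seen;
static void record_packet(uint8_t channel, const uint8_t *body, size_t len, void *user)
{
    Seen *s = user;
    (void)body;
    s->count++; s->total_len += len; s->last_channel = channel;
}
```

With a real socket, the pointer and size from parser_dest would go straight into recv, so each packet body lands in its final place and nothing has to be slid around afterwards. The price is more recv calls than the single-buffer approach.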
So maybe more like this. I could move a few more things inside the loop instead of after it, but then it would not be as clear to read.
BuffBegin
BuffEnd
WindBegin
WindEnd
ReceiveProc
    size = Recv(WindEnd, BuffEnd - WindEnd)
    if (size < 1)
        Quit ; Winsock Error or Disconnect
    endif
    WindEnd += size ; Advance end of window
    while (true)
        need = 0 ; Nothing known about the next packet yet
        if (WindEnd - WindBegin) < 6
            break ; Not a Full Header
        endif
        if (BYTE PTR[WindBegin] != 0x2a)
            Quit ; Not a Valid Header
        endif
        need = WORD PTR[WindBegin+4]
        have = WindEnd - WindBegin - 6
        if (have < need)
            break ; Not Enough Data
        endif
        WindBegin += 6 ; Skip Header
        Process(WindBegin, need)
        WindBegin += need ; Advance Begin of window
    endwhile
    if (WindBegin == WindEnd)
        WindBegin = BuffBegin ; empty window so reset to
        WindEnd = BuffBegin ; beginning of buffer
        return
    endif
    if (WindBegin + 6 + need > BuffEnd) ; Pending packet would run past the buffer
        if (WindBegin == BuffBegin)
            Quit ; Data cannot fit in buffer
        endif
        moving = WindEnd - WindBegin ; Get size of window
        memmove(BuffBegin, WindBegin, moving) ; Slide Data to beginning of buffer
        WindBegin = BuffBegin ; Reset pointers to beginning
        WindEnd = BuffBegin + moving ; of buffer
    endif
End ReceiveProc
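The window pseudocode above translates fairly directly to C. The following is a hedged sketch: Window, window_consume, window_feed and record_packet are hypothetical names, offsets are used instead of raw pointers, the length word is assumed to be big-endian, and window_feed replaces the real recv call with a copy from memory so the sliding logic can be tried in isolation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { HDR_SIZE = 6 };
typedef void (*PacketFn)(uint8_t channel, const uint8_t *body, size_t len, void *user);

typedef struct {
    uint8_t *buf;   /* start of the buffer */
    size_t   cap;   /* total buffer size */
    size_t   begin; /* offset of the first unprocessed byte */
    size_t   end;   /* offset one past the last received byte */
} Window;

/* Call after n bytes were received at buf + end.
   Returns packets handled, or -1 on a bad header or a packet that cannot fit. */
static int window_consume(Window *w, size_t n, PacketFn fn, void *user)
{
    int packets = 0;
    size_t need = 0;
    assert(n <= w->cap - w->end);
    w->end += n;
    for (;;) {
        need = 0;
        if (w->end - w->begin < HDR_SIZE)
            break;                              /* not a full header */
        const uint8_t *h = w->buf + w->begin;
        if (h[0] != 0x2a)
            return -1;                          /* not a valid header */
        need = (size_t)((h[4] << 8) | h[5]);    /* assumed big-endian */
        if (w->end - w->begin - HDR_SIZE < need)
            break;                              /* not enough data */
        fn(h[1], h + HDR_SIZE, need, user);
        w->begin += HDR_SIZE + need;
        packets++;
    }
    if (w->begin == w->end) {                   /* empty window: rewind */
        w->begin = w->end = 0;
        return packets;
    }
    if (w->begin + HDR_SIZE + need > w->cap) {  /* pending packet would overrun */
        if (w->begin == 0)
            return -1;                          /* packet larger than the buffer */
        size_t moving = w->end - w->begin;
        memmove(w->buf, w->buf + w->begin, moving);
        w->begin = 0;
        w->end = moving;
    }
    return packets;
}

/* Stand-in for the recv loop: copies from an in-memory stream. */
static int window_feed(Window *w, const uint8_t *data, size_t len, PacketFn fn, void *user)
{
    int packets = 0;
    while (len > 0) {
        size_t room = w->cap - w->end;
        size_t n = len < room ? len : room;
        memcpy(w->buf + w->end, data, n);
        data += n; len -= n;
        int r = window_consume(w, n, fn, user);
        if (r < 0)
            return -1;
        packets += r;
    }
    return packets;
}

/* Example callback that only records what it saw. */
typedef struct { int count; size_t total_len; uint8_t last_channel; } Seen;
static void record_packet(uint8_t channel, const uint8_t *body, size_t len, void *user)
{
    Seen *s = user;
    (void)body;
    s->count++; s->total_len += len; s->last_channel = channel;
}
```

The memmove only runs when the pending packet would not fit in the space left, so with a buffer several times larger than a typical packet the data is moved rarely, and only the unprocessed tail is moved.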
Like Clive said, cut and slice, which can be done in many ways.
If you need a send routine, here is one I made that you can use:
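; Sends DataLen bytes from DataPtr, calling send until all of them are accepted.
; Returns SOCKET_ERROR on failure and 0 on success
TransferData PROC TheSock:DWORD, DataPtr:DWORD, DataLen:DWORD
    push esi
    push ebx
    push edi
    push ebp                ; save the frame pointer before reusing ebp
    xor esi, esi            ; esi = 0: send flags and the success return value
    mov ebx, DataLen        ; ebx = bytes still to send
    mov edi, DataPtr        ; edi = next byte to send
    mov ebp, TheSock        ; loaded last: with the default PROC prologue the arguments are addressed through ebp
    jmp again
    ALIGN 16
pre:
    add edi, eax            ; skip the bytes send accepted
again:
    INVOKE send, ebp, edi, ebx, esi
    test eax, eax
    js abandon              ; negative result: return SOCKET_ERROR in eax
    sub ebx, eax
    jnz pre                 ; partial send: go around again
    mov eax, esi            ; everything sent: return 0
abandon:
    pop ebp
    pop edi
    pop ebx
    pop esi
    ret
TransferData ENDP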
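For anyone who finds the loop easier to follow in C, here is a rough equivalent of the routine above. It is a sketch with assumptions: SendFn is a hypothetical wrapper type standing in for Winsock's send so the loop can be run without a socket, FakeSock and fake_send exist only for demonstration, and unlike the assembly this version treats a return of 0 as a failure so it cannot spin forever.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for send(): returns bytes accepted, or a negative value on error. */
typedef int (*SendFn)(void *sock, const char *data, int len);

/* Returns 0 once every byte was accepted, -1 on failure. */
static int transfer_data(SendFn send_fn, void *sock, const char *data, int len)
{
    assert(send_fn != NULL && len >= 0);
    while (len > 0) {
        int sent = send_fn(sock, data, len);
        if (sent <= 0)
            return -1;      /* error, or no progress (deviation from the asm) */
        data += sent;       /* skip what was accepted */
        len -= sent;        /* and retry with the remainder */
    }
    return 0;
}

/* Demonstration sender that accepts at most 3 bytes per call. */
typedef struct { char out[64]; int used; int calls; } FakeSock;

static int fake_send(void *sock, const char *data, int len)
{
    FakeSock *f = sock;
    int n = len < 3 ? len : 3;
    if (f->used + n > (int)sizeof f->out)
        return -1;          /* pretend the connection failed */
    memcpy(f->out + f->used, data, (size_t)n);
    f->used += n;
    f->calls++;
    return n;
}
```

The point of the loop is the same in both versions: send is allowed to accept fewer bytes than requested, so the caller has to advance the pointer and try again until nothing is left.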
A sliding window offers no benefit over a circular buffer when it comes to avoiding overlap. The solution is simply to ensure that your buffer is big enough (in both cases).
The sliding window saves a little complexity at the price of repeatedly copying the data (to push it back to the start of the buffer).
With the circular buffer you avoid the copying, but you need to deal with wrapping, which results in an extra recv call. That can often be avoided by resetting the read and write pointers whenever the buffer becomes empty (which should happen regularly, as network data comes in bursts).
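A minimal circular buffer along those lines could be sketched like this in C. The names (Ring, ring_write_span, ring_commit, ring_peek, ring_skip) and the tiny 16-byte capacity are made up for illustration. ring_write_span returns only the contiguous free region, which is why a wrap costs a second recv, and ring_skip rewinds both offsets when the buffer drains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { RING_CAP = 16 };

typedef struct {
    uint8_t data[RING_CAP];
    size_t  rd;    /* read offset */
    size_t  wr;    /* write offset */
    size_t  used;  /* bytes currently stored */
} Ring;

/* Contiguous free region at the write offset: what one recv could fill. */
static uint8_t *ring_write_span(Ring *r, size_t *room)
{
    size_t free_total = RING_CAP - r->used;
    size_t to_edge = RING_CAP - r->wr;
    *room = free_total < to_edge ? free_total : to_edge;
    return r->data + r->wr;
}

/* Record that n bytes were written into the span returned above. */
static void ring_commit(Ring *r, size_t n)
{
    assert(n <= RING_CAP - r->used && n <= RING_CAP - r->wr);
    r->wr = (r->wr + n) % RING_CAP;
    r->used += n;
}

/* Copy n stored bytes to dst without consuming them, handling the wrap.
   Returns 0 if fewer than n bytes are stored. */
static int ring_peek(const Ring *r, uint8_t *dst, size_t n)
{
    if (n > r->used)
        return 0;
    size_t first = RING_CAP - r->rd;
    if (first > n)
        first = n;
    memcpy(dst, r->data + r->rd, first);
    memcpy(dst + first, r->data, n - first);
    return 1;
}

/* Consume n bytes; rewind to the start when the buffer becomes empty. */
static void ring_skip(Ring *r, size_t n)
{
    assert(n <= r->used);
    r->used -= n;
    r->rd = (r->rd + n) % RING_CAP;
    if (r->used == 0)
        r->rd = r->wr = 0;
}
```

Note that ring_peek copies the data out, so a packet that spans the wrap point is still double-handled. The reset in ring_skip is what keeps that case rare when the traffic is bursty.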
Well, it all depends. The pull-down method works exceedingly well in situations where you consume the majority of the buffer, and also if you have to process the data in place using other functions that can't handle the fragmentation. The code to handle the roll-over point in a circular buffer can be quite significant and pervasive, and frankly I've seen far too many examples where coders simply fail to handle buffer-boundary-spanning conditions properly. The cost of moving a few hundred bytes once every few dozen KB can in many circumstances be far cheaper than checking and handling the boundary cases for every byte processed. It is certainly far cheaper than having to double-buffer the data all the time.
People should pick a method based on what is most effective for the data they are handling, and on the nature of that data and its flow. That, and what they can reliably and robustly implement and test. In many cases it's far more important that code functions correctly and is delivered on time than that it is the optimum solution.
Circular buffers do work exceedingly well in hardware designs, where the wrapping can be totally transparent.
Another trick with circular buffers is to have a slightly larger buffer that extends beyond the natural wrap point, and to copy data from the front into the extended portion. You pay the cost of copying a few dozen bytes from known locations instead of masking the address/offset on every access. Or one can simply handle the wrap/span case by copying the data, in that situation only, to a secondary linear buffer.
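The extended-buffer trick can be illustrated with a few lines of C. This is a sketch only: MirrorRing, linear_view, CAP and EXT are hypothetical names and sizes, and it assumes no record is longer than the extension allows. When a record starts near the end and wraps, the wrapped part is copied from the front of the buffer into the extension, so the caller gets one contiguous pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { CAP = 16, EXT = 8 };  /* ring size and extra room past the wrap point */

typedef struct {
    uint8_t data[CAP + EXT];
} MirrorRing;

/* Returns a pointer to n contiguous bytes starting at ring offset rd.
   If the record wraps, the wrapped bytes are first copied from the front
   of the ring into the extension area. */
static const uint8_t *linear_view(MirrorRing *m, size_t rd, size_t n)
{
    assert(rd < CAP && n <= CAP);
    if (rd + n > CAP) {
        size_t spill = rd + n - CAP;   /* bytes that live at the front */
        assert(spill <= EXT);          /* record must fit the extension */
        memcpy(m->data + CAP, m->data, spill);
    }
    return m->data + rd;
}
```

Only the records that actually straddle the wrap point pay for a copy, and the copy is at most EXT bytes. Everything else is read in place with no offset masking.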
When a burst stops, the ping time between your client and the server can be 1 ms, for example. In that short amount of time you can probably copy a megabyte of data around in memory while waiting for the next burst. You just have to know when a stream is complete (not when a burst within a stream is complete).
As was said above, it depends on what you are going to use it for. Some protocols work best with small chunks of data. The IRC protocol, for example, limits messages to 512 characters. I'm not sure about telnet or FTP, but taking small chunks of data and interpreting them in a fixed manner and with a fixed delay can give better responsiveness on the client.
EDIT: 100 MB of data using masm32 MemCopy in 78 ms, which is 1.28 MB of data per ms or 1282 bytes per microsecond. That is certainly plenty of time to parse IRC messages of 512 characters :naughty: