Developing a Winsock read procedure

Started by joemc, March 02, 2010, 08:02:38 PM

joemc

I have always wondered about the best way to buffer an incoming TCP Winsock stream, and I have seen several different methods. What I used to do was receive until I had one full "data packet", then process it, then shuffle the whole buffer down for each one. That does not seem right. A circular buffer would be awesome except when the data wraps around the end :) so I have come up with what I have below. It is just in a made-up pseudo language, and it is designed around a 6-byte header: the first byte is 0x2a, the second byte is the channel, and the length is a WORD located after the first DWORD, i.e. at offset 4 (OSCAR :). I know this is a MASM forum and not a Winsock forum, but does anyone have advice on how they manage their buffers with minimal shuffling?

GlobalBuffer
GBSize
GBLength
-----
Receive Procedure

  LocalBuffer; Begin
  LBSize     ; End
  LBPos      ; Position
  LBLength   ; Position filled to

  COPY GlobalBuffer -> LocalBuffer
  LBPos=0;
  LBLength=GBLength

  Received = Recv(LocalBuffer+LBLength,LBSize-LBLength)  ; Append after the saved bytes

  IF (Received < 1)  ; Winsock error or disconnect
    QUIT

  LBLength += Received

  WHILE TRUE

    IF (LBLength-LBPos) < 6  ; Not a Full HEADER
      STORE remaining bytes IN GlobalBuffer
      BREAK

    IF (BYTE PTR[LocalBuffer+LBPos] != 0x2a) ; Not a Valid HEADER
      QUIT

    Channel    = BYTE PTR[LocalBuffer+LBPos+1]
    DataLength = WORD PTR[LocalBuffer+LBPos+4]

    IF (DataLength > LBLength-LBPos-6) ; Not a Full Packet
      STORE remaining bytes IN GlobalBuffer
      BREAK

    IF (Channel==2)
      Process2(LocalBuffer+LBPos+6,DataLength)  ; Process Without HEADER
    ELSEIF (Channel==1)
      Process1(LocalBuffer+LBPos+6,DataLength)  ; Process Without HEADER
    ELSEIF (Channel==3)
      Process3(LocalBuffer+LBPos+6,DataLength)  ; Process Without HEADER
    ELSEIF (Channel==4)
      Process4(LocalBuffer+LBPos+6,DataLength)  ; Process Without HEADER

    INCREASE LBPos by DataLength + 6

  ENDWHILE
END PROCEDURE

clive

Use a single buffer into which you accumulate incoming data. It needs to be large enough to hold the largest chunk of data you need to process. You can't control the size of the pieces that arrive over the socket; the TCP/IP stack will packetize and fragment the stream.

Use a sliding window within that buffer to access the data. When you reach a threshold, pull the remaining data down to the beginning of the buffer with a single memcpy (memmove if the source and destination can overlap), adjust the end-of-buffer pointer, and start accumulating data from the socket again.
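The pull-down step described above might look roughly like this in C. It is only a sketch: the `RecvWindow` type, the 4096-byte size, and the function names are made up for illustration, and the threshold is whatever suits your traffic. It uses memmove rather than memcpy because the two ranges can overlap.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: one linear buffer with a window of unconsumed bytes. */
typedef struct {
    unsigned char data[4096]; /* must hold at least the largest message */
    size_t begin;             /* offset of the first unconsumed byte */
    size_t end;               /* offset one past the last received byte */
} RecvWindow;

/* Space that the next recv() call may fill, starting at data + end. */
static size_t window_free(const RecvWindow *w)
{
    return sizeof w->data - w->end;
}

/* Pull the unconsumed bytes down to the start of the buffer once the
   window has drifted to or past `threshold`. An empty window is simply
   reset, which costs nothing. */
static void window_compact(RecvWindow *w, size_t threshold)
{
    if (w->begin == w->end) {
        w->begin = w->end = 0;
    } else if (w->begin >= threshold) {
        size_t pending = w->end - w->begin;
        memmove(w->data, w->data + w->begin, pending); /* ranges may overlap */
        w->begin = 0;
        w->end = pending;
    }
}
```

A receive loop would call `window_compact` after parsing, then pass `data + end` and `window_free()` to recv.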

Alternatively, use a state machine to track whether you are waiting for header data or processing a chunk of channel data. When you have accumulated enough data for the header, process it. Once you have channel data, start filling the channel buffer; once you have all the channel data, feed it off to the processing routine, and then go back to collecting a header. Repeat. State-specific data request sizes would minimize the amount of unnecessary copying required.
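A rough C sketch of that state machine, for the 6-byte header described in the first post (0x2a, channel, length WORD at offset 4). All names here (`Parser`, `parser_feed`, `count_packet`, `Stats`) are hypothetical. The length is decoded as big-endian (network order), which is an assumption; the pseudocode in this thread reads it with a native WORD load, so adjust to your protocol.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { HEADER_SIZE = 6, MAX_PAYLOAD = 65535 };

typedef void (*PacketHandler)(uint8_t channel, const uint8_t *payload,
                              size_t len, void *user);

typedef struct {
    enum { WANT_HEADER, WANT_PAYLOAD } state;
    uint8_t header[HEADER_SIZE];
    uint8_t payload[MAX_PAYLOAD];
    size_t have;  /* bytes collected for the current state */
    size_t need;  /* bytes the current state requires */
} Parser;

static void parser_init(Parser *p)
{
    p->state = WANT_HEADER;
    p->have = 0;
    p->need = HEADER_SIZE;
}

/* Feed whatever recv() returned. Returns 0 on success, -1 on a bad header. */
static int parser_feed(Parser *p, const uint8_t *in, size_t n,
                       PacketHandler on_packet, void *user)
{
    for (;;) {
        uint8_t *dst = (p->state == WANT_HEADER) ? p->header : p->payload;
        size_t take = p->need - p->have;
        if (take > n)
            take = n;
        memcpy(dst + p->have, in, take);
        p->have += take;
        in += take;
        n -= take;
        if (p->have < p->need)
            return 0;                       /* input exhausted, wait for more */

        if (p->state == WANT_HEADER) {
            if (p->header[0] != 0x2a)
                return -1;                  /* not a valid header */
            /* Assumption: length field is big-endian. */
            p->need = ((size_t)p->header[4] << 8) | p->header[5];
            p->state = WANT_PAYLOAD;
        } else {
            on_packet(p->header[1], p->payload, p->need, user);
            p->need = HEADER_SIZE;
            p->state = WANT_HEADER;
        }
        p->have = 0;
    }
}

/* Example handler (hypothetical): just records what it saw. */
typedef struct { int packets; uint8_t last_channel; size_t last_len; } Stats;

static void count_packet(uint8_t channel, const uint8_t *payload,
                         size_t len, void *user)
{
    Stats *s = user;
    (void)payload;
    s->packets++;
    s->last_channel = channel;
    s->last_len = len;
}
```

Because each state knows exactly how many bytes it still needs, a real receive loop could also ask recv for only `need - have` bytes and write straight into the header or channel buffer, skipping the intermediate copy.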

-Clive
It could be a random act of randomness. Those happen a lot as well.

joemc

So maybe more like this. I could move a few more things inside the loop instead of after it, but that would not be as clear to read.


BuffBegin
BuffEnd
WindBegin
WindEnd


ReceiveProc

  size = Recv(WindEnd,BuffEnd - WindEnd);

  if (size < 1)
    Quit ;Winsock Error or Disconnect
  endif
 
  WindEnd +=size  ; Advance end of window

  while(true)

    if (WindEnd - WindBegin) < 6
      break ; Not a Full Header
    endif

    if (BYTE PTR[WindBegin] != 0x2a)
      Quit  ;Not a Valid Header
    endif
 
    need = WORD PTR[WindBegin+4]
    have = WindEnd - WindBegin - 6
    if ( have < need)
      break  ;Not Enough Data
    endif
   
    WindBegin += 6 ; Skip Header
   
    Process(WindBegin, need)
   
    WindBegin += need  ; Advance Begin of window

  endwhile
 
  if ( WindBegin == WindEnd)
    WindBegin = BuffBegin  ; empty window so reset to
    WindEnd = BuffBegin    ; beginning of buffer
    return
  endif

  if (WindEnd - WindBegin < 6)
    need = 0                            ; Header incomplete, length not known yet
  endif

  if (WindBegin + 6 + need > BuffEnd)   ; Pending header + data will not fit
    if (WindBegin == BuffBegin)
      Quit                              ; Data cannot fit in buffer
    endif
    moving = WindEnd-WindBegin          ; Get size of window
    memmove(BuffBegin,WindBegin,moving) ; Slide Data to beginning of buffer
    WindBegin=BuffBegin                 ; Reset pointers to beginning
    WindEnd=BuffBegin+moving            ;   of buffer
  endif 
End ReceiveProc

zemtex

Like clive said, cut and slice, which can be done in many ways.

If you need a senddata routine I have made one that you can use:


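; Sends DataLen bytes from DataPtr, calling send until all of them are accepted.
; Returns SOCKET_ERROR on failure and 0 on success
TransferData PROC TheSock:DWORD, DataPtr:DWORD, DataLen:DWORD

push esi
push ebx
push edi
push ebp                ; save the frame pointer; ebp is reused below
xor esi, esi            ; esi = 0: send flags and the success return value
mov ebx, DataLen        ; ebx = bytes still to send
mov edi, DataPtr        ; edi = next byte to send
mov ebp, TheSock        ; ebp = socket (parameters are unreachable after this)
jmp again

ALIGN 16
pre:
add edi, eax            ; skip the bytes just sent
again:
INVOKE send, ebp, edi, ebx, esi
test eax, eax
js abandon              ; negative result: return SOCKET_ERROR in eax
sub ebx, eax
jnz pre                 ; partial send: go round again
mov eax, esi            ; everything sent: return 0

abandon:
pop ebp                 ; restore the frame pointer before the epilogue
pop edi
pop ebx
pop esi
ret

TransferData ENDP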



I have been puzzling with lego bricks all my life. I know how to do this. When Peter, at age 6 is competing with me, I find it extremely neccessary to show him that I can puzzle bricks better than him, because he is so damn talented that all that is called rational has gone haywire.

Tedd

A sliding window offers no benefit over a circular buffer when it comes to avoiding overlapping. The solution is simply to ensure that your buffer is big enough (in both cases).
The sliding window avoids a little complexity at the cost of repeatedly copying the data (to push it back to the start of the buffer).
With the circular buffer you avoid the copying but need to deal with wrapping, which results in an extra recv call. That extra call can often be avoided by resetting the read and write pointers when the buffer becomes empty (which should happen regularly, as network data comes in bursts).
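A minimal C sketch of the circular-buffer bookkeeping described above, with hypothetical names (`Ring`, `ring_write_span`, and so on) and an arbitrary 8192-byte size. It shows the two points in question: one recv can only fill the contiguous span up to the wrap point, and an empty buffer is rewound so the next recv gets the whole buffer in one call.

```c
#include <stddef.h>

enum { RING_SIZE = 8192 };

typedef struct {
    unsigned char data[RING_SIZE];
    size_t rd;    /* index of the next byte to read */
    size_t wr;    /* index of the next byte to write */
    size_t used;  /* bytes currently stored */
} Ring;

/* Largest contiguous region a single recv() call could fill. */
static size_t ring_write_span(const Ring *r)
{
    size_t free_total = RING_SIZE - r->used;
    size_t to_edge = RING_SIZE - r->wr;
    return free_total < to_edge ? free_total : to_edge;
}

/* Where that recv() call should write. */
static unsigned char *ring_write_ptr(Ring *r)
{
    return r->data + r->wr;
}

/* Record n bytes just written by recv(); n <= ring_write_span(). */
static void ring_commit(Ring *r, size_t n)
{
    r->wr = (r->wr + n) % RING_SIZE;
    r->used += n;
}

/* Drop n processed bytes; n <= used. */
static void ring_consume(Ring *r, size_t n)
{
    r->rd = (r->rd + n) % RING_SIZE;
    r->used -= n;
    if (r->used == 0)
        r->rd = r->wr = 0;  /* empty: rewind so the next span is the whole buffer */
}
```

When `ring_write_span` is smaller than the total free space, the data has to be fetched with a second recv after the write index wraps, which is the extra call mentioned above.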
No snowflake in an avalanche feels responsible.

clive

Well, it all depends. The pull-down method works exceedingly well in situations where you consume the majority of the buffer, and for that matter if you have to process the data in place using other functions that can't handle the fragmentation. The code to handle the roll-over point in a circular buffer can be quite significant and pervasive, and frankly I've seen far too many examples where coders simply fail to handle buffer-boundary-spanning conditions properly. The cost of moving a few hundred bytes, once every few dozen KB, can in many circumstances be far cheaper than checking/handling the boundary cases for every byte processed. It is certainly far cheaper than having to double-buffer the data all the time.

People should pick a method based on what is most effective for the data they are handling, and the nature of that data and its flow. That, and what they can reliably/robustly implement and test. In many cases it's far more important that the code functions correctly and is delivered on time than that it is the optimum solution.

Circular buffers do work exceedingly well in hardware designs where the wrapping can be totally transparent.

Another trick with circular buffers is to have a slightly larger buffer that extends beyond the natural wrap point, and copy data from the front to the extended portion. You trade the cost of copying a few dozen bytes from known locations against masking the address/offset on every access. Or one can simply handle the wrap/span case by copying the data, in that situation only, to a secondary linear buffer.
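The extended-buffer trick could be sketched in C as follows. The names (`MirrorRing`, `mirror_view`) and sizes are invented for the example, and `EXTRA` is assumed to be at least as large as the biggest record that must be seen as one contiguous run.

```c
#include <stddef.h>
#include <string.h>

enum { RING = 4096, EXTRA = 64 };

typedef struct {
    /* The last EXTRA bytes lie beyond the natural wrap point. */
    unsigned char data[RING + EXTRA];
} MirrorRing;

/* Return a pointer to `len` contiguous bytes starting at ring offset `pos`.
   If the record spans the wrap point, the part that wrapped to the front
   is copied into the extension so the caller never sees the split.
   Returns NULL if the request cannot be satisfied. */
static const unsigned char *mirror_view(MirrorRing *m, size_t pos, size_t len)
{
    if (pos >= RING || len > EXTRA)
        return NULL;
    if (pos + len > RING) {
        size_t spill = pos + len - RING;          /* bytes that wrapped */
        memcpy(m->data + RING, m->data, spill);   /* front -> extension */
    }
    return m->data + pos;
}
```

Only records that actually straddle the wrap point pay for a copy, and the copy is bounded by `EXTRA` bytes.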


zemtex

When a burst stops, the ping time between your client and the server might be 1 ms, for example. In that short amount of time you can probably copy a megabyte of data around in memory while waiting for the next burst. You just have to know when a stream is complete (not when a burst within a stream is complete).

As was said above, it depends on what you are going to use it for. Some protocols work best with small chunks of data; the IRC protocol, for example, works with messages of up to 512 characters. I'm not sure about telnet or FTP, but taking small chunks of data and interpreting them in a fixed manner and with a fixed delay can give better responsiveness on the client.

EDIT: 100 MB of data copied with the masm32 MemCopy in 78 ms, which is 1.28 MB of data per ms or 1282 bytes per microsecond. That is certainly plenty of time to parse IRC messages of 512 characters  :naughty:
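The arithmetic behind those figures, written out as a small C sketch. The function names are made up, and 1 MB is taken as 10^6 bytes, which is the interpretation that matches the numbers quoted.

```c
/* Copy rate in bytes per microsecond, from a byte count and elapsed milliseconds. */
static double bytes_per_us(double bytes, double ms)
{
    return bytes / (ms * 1000.0);
}

/* Microseconds needed to copy n bytes at the given rate. */
static double copy_time_us(double n, double rate_bytes_per_us)
{
    return n / rate_bytes_per_us;
}
```

With 100e6 bytes in 78 ms, `bytes_per_us` gives roughly 1282, so moving a 512-byte message costs well under half a microsecond.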