Hello,
I'm learning how to use websockets, using Java for the server. I'm admittedly not an expert in socket programming, and I'm having some problems understanding how to read the incoming data in a reliable way in some cases.
Some quick info:
- The server receives websocket messages sent from a web browser.
- A message consists of one or more frames.
- Every frame tells the server if it should expect more frames or not.
- Every frame carries the length of the payload (the body; the "actual" content) in its early bytes (among other things), followed by the payload itself (or part of it).
- The web browser may send control frames unexpectedly in between frames that are related to each other.
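For reference, this is how I currently understand the start of each frame, as a minimal Java sketch. It assumes the header layout from RFC 6455, section 5.2. `FrameHeaderSketch` and its field names are my own invention, not from any library, and it does no validation beyond the length field.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helper (not a library class): decodes one frame header,
// assuming the layout described in RFC 6455, section 5.2.
final class FrameHeaderSketch {
    final boolean fin;        // true if this is the final frame of a message
    final int opcode;         // 0x0 continuation, 0x1 text, 0x2 binary, 0x8 close, 0x9 ping, 0xA pong
    final boolean masked;     // frames sent by a browser are expected to be masked
    final long payloadLength; // the length as STATED by the sender
    final byte[] maskingKey;  // 4 bytes if masked, otherwise empty

    private FrameHeaderSketch(boolean fin, int opcode, boolean masked,
                              long payloadLength, byte[] maskingKey) {
        this.fin = fin;
        this.opcode = opcode;
        this.masked = masked;
        this.payloadLength = payloadLength;
        this.maskingKey = maskingKey;
    }

    // Reads exactly the header bytes and leaves the stream positioned at the payload.
    static FrameHeaderSketch read(DataInputStream in) throws IOException {
        int b0 = in.readUnsignedByte();
        int b1 = in.readUnsignedByte();
        boolean fin = (b0 & 0x80) != 0;
        int opcode = b0 & 0x0F;
        boolean masked = (b1 & 0x80) != 0;
        long length = b1 & 0x7F;
        if (length == 126) {
            length = in.readUnsignedShort();   // 16-bit extended length
        } else if (length == 127) {
            length = in.readLong();            // 64-bit extended length
            if (length < 0) {
                throw new IOException("invalid 64-bit payload length");
            }
        }
        byte[] key = new byte[masked ? 4 : 0];
        in.readFully(key);
        return new FrameHeaderSketch(fin, opcode, masked, length, key);
    }
}
```

After `read` returns, the next `payloadLength` bytes on the stream are treated as the payload, which is exactly the part I'm unsure I can trust.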
When receiving a message split into multiple frames, I'm not sure how to determine where the first one actually ends and where the second one actually starts.
What I think I know and assume:
- A socket acts only as a queue of bytes, should always be waiting for another byte to add to the queue, and doesn't expect/demand any type of structure.
- Hence, the socket adds no delimiters between frames.
- Incoming data should always be expected to be "faked" and sent from malicious software.
So it seems like the only way to find out where the first frame ends is to trust the payload length stated in the frame, which may itself have been set maliciously.
The scenario I'm worried about, perhaps unlikely, is that the stated payload length of the first frame is longer than its actual payload, so that the server keeps reading the second frame, or a sudden control frame, as part of the first frame's payload.
Example:
- The "user" sends a message split into 2 frames.
- The server receives frame 1 with a stated payload length of 10 bytes, but with an actual payload of only 2 bytes.
- The server suddenly receives an unrelated control frame, with a total length of 8 bytes.
- The server has now interpreted the entire control frame as bytes 3-10 of frame 1's payload.
- The server receives frame 2.
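To make the example concrete, here is a minimal sketch of a reader that simply trusts the stated length. The class and method names are hypothetical, the masking key is all zeros only to keep the bytes readable, and only payload lengths below 126 are handled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: reproduces the byte stream from the example
// and reads it the way a length-trusting server would.
final class OverstatedLengthDemo {

    static byte[] buildStream() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] key = {0, 0, 0, 0}; // assumption: all-zero masking key for readability
        // Frame 1: FIN=0, opcode=text, MASK=1, STATED length 10, actual payload 2 bytes.
        out.write(0x01); out.write(0x80 | 10); out.write(key, 0, 4);
        out.write('h'); out.write('i');
        // Control frame (ping): FIN=1, opcode=0x9, MASK=1, length 2, so 8 bytes in total.
        out.write(0x89); out.write(0x80 | 2); out.write(key, 0, 4);
        out.write('p'); out.write('!');
        // Frame 2: FIN=1, opcode=continuation, MASK=1, length 2.
        out.write(0x80); out.write(0x80 | 2); out.write(key, 0, 4);
        out.write('!'); out.write('!');
        return out.toByteArray();
    }

    // Returns the opcodes the reader believes it saw, in order.
    static List<Integer> readOpcodes(byte[] stream) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream));
        List<Integer> opcodes = new ArrayList<>();
        while (in.available() > 0) {
            int b0 = in.readUnsignedByte();
            int b1 = in.readUnsignedByte();
            int length = b1 & 0x7F;            // sketch: lengths below 126 only
            if ((b1 & 0x80) != 0) {
                in.readFully(new byte[4]);     // skip the masking key
            }
            in.readFully(new byte[length]);    // trust the stated length
            opcodes.add(b0 & 0x0F);
        }
        return opcodes;
    }
}
```

With this input the reader reports only a text frame and a continuation frame: the 8 bytes of the ping are consumed as bytes 3-10 of frame 1's payload and the ping is never seen as a frame.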
Frame 1 says that it is not the last frame.
Frame 2 only says that it is a "continuation frame", but not how many frames there are.
Frame 3 says whether it is the last frame.
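This is roughly how I would track those flags, as a sketch only. It assumes the fragmentation rules in RFC 6455, section 5.4 (control frames have the high bit of the opcode set, are never fragmented, and may appear between fragments). `FragmentTracker` is a hypothetical name of my own.

```java
// Hypothetical helper: tracks whether a fragmented message is in progress,
// based only on the FIN flag and opcode of each frame.
final class FragmentTracker {
    // True between a non-final data frame and the final continuation frame.
    private boolean inMessage = false;

    // Returns true when this frame completes a data message.
    // Throws if the frame sequence breaks the assumed fragmentation rules.
    boolean accept(boolean fin, int opcode) {
        boolean control = (opcode & 0x8) != 0;
        if (control) {
            if (!fin) {
                throw new IllegalStateException("control frames must not be fragmented");
            }
            return false; // allowed between fragments; message state is unchanged
        }
        if (opcode == 0x0) {
            if (!inMessage) {
                throw new IllegalStateException("continuation frame without a started message");
            }
        } else if (inMessage) {
            throw new IllegalStateException("new data frame before the previous message finished");
        }
        inMessage = !fin;
        return fin;
    }
}
```

This only checks that the order of frames makes sense. It does nothing about a payload length that is simply wrong, which is the part I'm asking about.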
Am I by any chance missing something obvious? What should I rely on?
Thanks!