Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Framing Should use a length based approach #1

Open
aikar opened this issue Mar 16, 2011 · 10 comments
Open

New Framing Should use a length based approach #1

aikar opened this issue Mar 16, 2011 · 10 comments

Comments

@aikar
Copy link

aikar commented Mar 16, 2011

Currently Socket.IO spec has listed as FRAMEMARKER,DATA,FRAMEMARKER

Ideally, a message frame should include the length of the data payload so that the parser will know exactly how long a message is and does not have to 'scan' for the ending marker.

so:
MARKER,LENGTHOFDATA,DATA,MARKER

so: 0xFF0xFD3,foo0xFF0xFD

Can identify that fhe data set will be a length of 3 before the end marker.

So if you receive 0xFF0xFD3,f
as a buffer you know you need 2 more characters before you should even expect to see the end marker.

This lets the parser simply check the length and not look for the end marker until it knows it has all of the data and then it knows exactly where the endmarker is.

This is of course better for performance, and very trivial to implement.

@tj
Copy link

tj commented Mar 16, 2011

+1

@rauchg
Copy link
Contributor

rauchg commented Mar 16, 2011

This is how we are right now, I just wanted to keep it more consistent with the websocket protocol.

@aikar
Copy link
Author

aikar commented Mar 16, 2011

well I meant anywhere your doing custom framing (like HTTP connections instead of WebSocket) implemented.

On the websocket side, its of course forced and yeah we dont need double framing (someone should recommend updating the spec to be length based since weve already broke capabilities to do raw connections anyways :()

but as I read the frame identifiers in the spec, anywhere this framing is implemented outside of websocket should use length.

@madari
Copy link

madari commented Mar 16, 2011

@aikar: I didn't quite understand what you meant by your last comment.
Just out of curiosity, you suggested MARKER LENGTH DATA MARKER. Why would
you want to keep the MARKERs?

@aikar
Copy link
Author

aikar commented Mar 16, 2011

markers still help provide error testing and recovery.

Without them, you have no way to detect if the frame matches up,

With them, your not 'scanning' for the end marker but you know exactly where it SHOULD be, and doing a simple buffer[index] check to ensure it matches the marker symbol ensures the data is valid.

If you check for where a marker should be, and find it isnt, you can add error recovery to then fallback to a scan of the next start marker and continue the parse, and throw a warning/alert that data was malformed, and continue its business.

Where as w/o them, theres no way to detect a corrupted/malformed message and itll destroy the entire parser.

@madari
Copy link

madari commented Mar 17, 2011

@aikar: That's a good point from the streams point of view. Although it doesn't guarantee that the message isn't malformed/corrupted (that's what TCP does) but it does guarantee that the stream of messages is still flowing as it should be. And the overhead is negligible too. So I agree with your original suggestions: FRAMEMARKER LENGTH DATA FRAMEMARKER.

@madari
Copy link

madari commented Mar 17, 2011

Also the LENGTH should be in bytes not in utf-8 codepoints (as it has been previously). If you decide not to add LENGTH to the messages then the spec should state how to escape FRAMEMARKERs.

@madari
Copy link

madari commented Mar 17, 2011

Another variation would be FRAMEMARKER LENGTH DATA where the FRAMEMARKER would be a protocol identifier (e.g. 0xFFFD as proposed) and explictly
indicate the beginning of a frame. That would enable recovery (by always looking for the next FRAMEMARKER) and the trailing FRAMEMARKER could be dropped.

@tj
Copy link

tj commented Mar 17, 2011

@madari +1 to that suggestion, makes sense, no reason you could not recover on the frame instead of a trailing marker, probably even easier to implement this way actually

@3rd-Eden
Copy link

For the sake of simplified parsing across different platforms, +1 for @madari 's suggestion.

highlight @guille

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants