
Response body that could possibly fail #84

Open
kevinresol opened this issue May 12, 2017 · 18 comments

Comments

@kevinresol (Member) commented May 12, 2017

I was trying to fix the static file middleware, where the file read stream could possibly fail (e.g. someone deletes the file while it is being transferred, or the file is on a network drive and the network fails). Since the body of OutgoingResponse is an IdealStream, we have to somehow rescue the unsafe file read. But we can't simply rescue with an empty source, because a client expecting a <content-length>-sized body would then see the transmission as "paused" in the middle. Worse, if the connection is reused, the next response may be treated as a continuation of the body.

There really isn't much we can do after the headers have been sent, except forcibly kill the connection. One possible solution is to make the body of OutgoingResponse a RealStream and let the container deal with the failure.

@kevinresol (Member Author)

I am now more and more inclined to put RealStream in OutgoingRequest and OutgoingResponse.

For OutgoingRequest, it is mostly used in Client.request, which returns a Promise and is thus able to handle the Error coming from the RealStream body.

For OutgoingResponse, a RealStream body means the Container would have a chance to deal with the error (as explained in the previous post).

Does anyone have an objection? @back2dos @benmerckx @gene-pavlovsky

@back2dos (Member) commented Oct 4, 2018

Hmm, objection is a bit of a strong word. However, it seems to me that regardless of ideal/real stream, the container MUST close the connection if content-length was sent and the body turns out too short (which could always happen by some mistake). With that behavior in place, we could stick with the body being an IdealStream, and the user can recover by just cutting it off.

@kevinresol (Member Author)

> Hmm, objection is a bit of a strong word.

Sorry for my bad English. (The equivalent word in my primary language isn't as strong; it seems I translated it literally.)

Wouldn't it be easier if the Container got a RealStream and terminated the connection upon an Error?
Node, for example, sends out the body in a chunked manner by default, in which case the container has no concept of content-length.

@kevinresol (Member Author) commented Oct 4, 2018

Another reason for the proposed change is that the idealize API is not very user-friendly. Consider this code:

fetch(request.url, {
	method: request.method,
	body: source.idealize(e -> Source.EMPTY),
}).all();

The error basically gets swallowed in the process, assuming the remote blindly accepts the early-terminated body.

In order to properly capture the error, the user would have to write something like this:

Future.async(function(cb) {
	fetch(request.url, {
		method: request.method,
		body: source.idealize(e -> {
			cb(Failure(e));
			Source.EMPTY;
		}),
	}).all().handle(cb);
});

In short, it is just a bit too cumbersome, which somewhat contradicts the "don't encourage users to skip error handling" philosophy that runs through the tink libs.

Also, IdealStream is compatible with RealStream, so one can still opt to handle the error explicitly (thus producing an ideal stream) even if http accepts the latter.

@back2dos (Member) commented Oct 4, 2018

Nah, your English is just fine! What I meant is that I have a remark rather than an actual objection ;)

So ok, assuming the container gets an error from the outgoing body. What should it do?

@kevinresol (Member Author) commented Oct 4, 2018

Close the connection. I think there is nothing else the container could do.

However, if the user has concerns about that behaviour, they can still use idealize to capture the error and try to recover.

On the other hand, I doubt one can ever really idealize a stream. Take an FS read as an example: if for any reason the read halts somewhere in the middle, we might try to open the file again and re-read from the last position. But the result is still just a RealStream, not an ideal one. This leads me to think that idealize is just an error-swallowing API, because it basically produces an error-free but incorrect stream.

@back2dos (Member) commented Oct 4, 2018

Hmm ... well ok, for one thing idealize should be fixed in several ways:

  1. it should tell you how many bytes you've written, so that you can resume.
  2. it should tell you the number of retries already performed (it rescues recursively until you return a stream that can be depleted without error).
  3. it should be protected against stack overflows from retries.

And all of that needs to be fixed in tink_streams, to make matters more complicated. If done properly, it could be nice. You could recover from an SQL query where the connection goes down while you're in the middle of the result set: just rerun the same query, adjusted so that you get only the rest of the result set (starting from the last received ID or something, which is not necessarily completely accurate, but probably still better than just stopping in the middle, I guess?). You could resume an HTTP download if the connection closes. Etc. There are use cases. But it's a headache-inducing thing to get right, and we should be pragmatic about it. Better to get stable in the foreseeable future and make a major version later, if anyone can summon the energy to dig through all that.
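Points 1-3 above could be sketched as a richer rescue callback. To be clear, nothing like this exists in tink_streams today; RescueContext, bytesWritten, retries, and readFileFrom are all hypothetical names used only for illustration:

```haxe
// Hypothetical shape for a retry-aware rescue API (a sketch,
// NOT the actual tink_streams API).
typedef RescueContext = {
  final error:Error;      // the failure that interrupted the stream
  final bytesWritten:Int; // how far we got, so the rescuer can resume (point 1)
  final retries:Int;      // how many rescues have already happened (point 2)
}

function rescue(path:String, ctx:RescueContext):RealSource {
  // Bailing out after a few attempts is what would guard against
  // unbounded recursive retries (point 3).
  return
    if (ctx.retries > 3) Source.EMPTY
    else readFileFrom(path, ctx.bytesWritten); // hypothetical resume helper
}
```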

As for the matter at hand: yes, idealize can easily be used to just swallow the error, but at least the user has to make that choice explicitly and may take a moment to think about the implications. Accepting a real source makes it even easier not to think about it at all. I kinda like that, but then again it's not a game changer.

My thinking is that in any case the container, if it has a persistent connection, should close it if content-length is set but the body terminates early (it should also actually do something if the body is not terminated when the content-length is reached). And in that case we may as well cut off the stream. But if you feel that a real stream is the better choice here, I'm fine with that too ;)

@kevinresol (Member Author)

Now I am starting to doubt the usefulness of converting a RealStream/Source to its Ideal counterpart, because the outcome is never really ideal. It involves loss of information (the error), incorrect information (you think the stream ended gracefully but it actually didn't), or both. I am not sure it is meaningful for anyone to consume such a "VirtuallyIdealStream".

I am not sure if this is correct, but I am thinking that idealize:(Error->RealStream)->IdealStream should be deprecated and replaced with recover:(Error->RealStream)->RealStream.
The reason is that I think streams have a different nature from futures. A Promise can be recovered into a Future because the value resolves only once; there is only good or bad, nothing in between. But a stream can be good for a while and fail later. Trying to idealize such a thing is simply not a correct operation, IMO.
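In signature form, the difference between the two could look like this (recover is the proposed, hypothetical operation; the interface name is only for illustration):

```haxe
// Illustrative container for the two signatures discussed above.
interface RealStreamOps {
  // Current API: promises an error-free stream, which it cannot honestly
  // deliver unless the rescue function truly guarantees success.
  function idealize(rescue:Error->RealStream):IdealStream;

  // Proposed replacement: the rescuer may substitute another stream on
  // failure, but the result honestly remains fallible.
  function recover(rescue:Error->RealStream):RealStream;
}
```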

As for the container, I agree that it should check the actual body length against the content-length header. But in the case of chunked encoding, I think having a RealSource might be better: the container knows an error occurred, so it can abruptly close the connection (again, I think that is the only thing it can do after sending the headers) without sending the proper termination bytes, so that the other party at least knows something has gone wrong.

In the case of clients, take JsClient as an example. Since XMLHttpRequest doesn't support streaming, we currently collect all the bytes from the stream and then send them out. With an IdealSource, the client would just collect the incorrect bytes (in case of error) and send them anyway. With a RealSource, the client would see the error and not send the bytes at all.

Anyway, I will try to change body to RealStream in a fork and experiment with it.

@back2dos (Member) commented Oct 5, 2018

Well, there are cases, of course. Take an SQL result stream that you map through function (o):Chunk return Json.stringify(o) + '\n', and you have an ndjson stream where you can recover by appending one last JSON document that actually contains the error information. In any case, error recovery is never very easy ^^

The best solution for HTTP would be to use real streams for the body, always output them with chunked encoding, and in case of an error send HTTP trailers to signal it; unfortunately, support for trailers is pretty miserable. This would mean that the connection doesn't have to close and the error propagates to the client.

kevinresol added a commit that referenced this issue Oct 5, 2018
@gene-pavlovsky (Member)

This is quite a dark forest for me, guys. I haven't really looked at tink_streams and tink_http. Well, I did read the READMEs, but not at how to use this stuff in practice. Any guides/examples for the curious?

@kevinresol (Member Author) commented Oct 6, 2018

@gene-pavlovsky
The code snippets in the tink_io docs are a bit out of sync with the actual code in the repo.

In general, the most frequent use cases for tink_io are file read/write and network send/receive in a streaming (async) manner. Think sys.io.File.getContent() or haxe.Http.request(), but streaming. It is pretty much the same concept as Node.js streams (Writable/Readable).

@gene-pavlovsky (Member)

@kevinresol thanks for the explanation. Can you give me some examples of practical applications where you needed to do this with streaming? I imagine if I were writing a Haxe version of some UNIX tool (e.g. sed or awk) this would be useful, but besides that?

@kevinresol (Member Author) commented Oct 8, 2018

Actually, when you write a Node.js application you will (almost) always need streaming in all relevant aspects (file read/write, network read/write, etc.), because you don't want a single IO operation to block your whole application (Node is single-threaded).

@gene-pavlovsky (Member)

Ok, that is a fair point, but do you need the streaming exposed to the user? E.g. I need to read a file; I understand that I should do it as an async operation, but do I have to handle the "chunks" manually, or can this be hidden inside the library code?

@kevinresol (Member Author)

Surely you can hide it. You just collect all the bytes and present the result to the user as a promise of a single bytes/chunk object.
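That hiding could look roughly like this. Note the assumptions: openReadStream is a hypothetical streaming file opener, and all() is assumed to drain the source into a single promised Chunk:

```haxe
import tink.Chunk;
using tink.CoreApi;

// Sketch only: the library consumes the stream internally,
// and the caller sees nothing but a Promise of the whole content.
function readWholeFile(path:String):Promise<Chunk>
  return openReadStream(path) // hypothetical: yields a streaming source
    .all();                   // assumed: concatenates every chunk into one
```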

@gene-pavlovsky (Member)

So would you say that these are more suited as a foundation for other libraries/frameworks, rather than to be used directly by application code? Or is there still a direct use case for this in applications?

@kevinresol (Member Author)

It really depends on your use case. For example, if you want to report progress of the data transfer, then you need streaming, not just a promise of the whole chunk.
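The progress use case could be sketched like this (a rough sketch only; the forEach signature and the Resume return value approximate tink_streams' consumption model, and total and onProgress are illustrative parameters):

```haxe
// Sketch: count bytes as chunks flow past, to report transfer progress.
// `total` would typically come from a content-length header.
function drainWithProgress(source:RealSource, total:Int, onProgress:Float->Void) {
  var transferred = 0;
  return source.forEach(function (chunk:Chunk) {
    transferred += chunk.length;
    onProgress(transferred / total); // fraction completed so far
    return Resume;                   // keep consuming the stream
  });
}
```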

@gene-pavlovsky (Member)

I see. I was thinking about progress, indeed.
