
Chttpd request processing does not reset pdict predictably #4909

Open · chewbranca opened this issue Dec 11, 2023 · 7 comments

@chewbranca (Contributor)

Description

The Mochiweb HTTP server uses a shared, reused pool of connection processes, with chttpd:handle_request as the entry point for the Mochiweb request-processing handoff.

The bug is that we don't start with a fresh process dictionary between requests: because the chttpd processes are reused, pdict values that are not expressly cleared persist. Even worse, pdict values that are only set by specific code paths persist until the next incoming request that reaches this particular chttpd worker in the pool happens to update them.

An analogy: imagine we logged each request on a paper receipt, but instead of starting with a fresh form every time, we reused the old form and only erased the old values when we needed to modify them, so any other form entries would persist from whichever prior request filled them in.

We also don't set the nonce value until erlang:put(nonce, Nonce), nearly 100 lines into the request handling. This means that any runtime error happening prior to the setting of the nonce value will re-use the previously generated nonce and return that as the "unique" value of the request, and will continue to return the old value until a request is processed that reaches that line of code and generates a new nonce.
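The failure mode is easy to reproduce in isolation. Here is a self-contained sketch (hypothetical module, standing in for a reused keepalive worker; not CouchDB code) showing how a nonce set by one request leaks into the next when the pdict is never cleared:

```erlang
-module(pdict_leak_demo).
-export([start/0, request/2]).

%% A long-lived worker process, like a reused keepalive connection
%% handler. Nothing clears the pdict between "requests", so the nonce
%% from one request is still visible in the next.
start() ->
    spawn(fun loop/0).

loop() ->
    receive
        {req, From, NewNonce, CrashEarly} ->
            case CrashEarly of
                true ->
                    %% Simulates an error before erlang:put(nonce, ...):
                    %% the previous request's nonce gets reported.
                    From ! {nonce_seen, erlang:get(nonce)};
                false ->
                    erlang:put(nonce, NewNonce),
                    From ! {nonce_seen, erlang:get(nonce)}
            end,
            loop()
    end.

%% Synchronous helper: send a "request" and wait for the nonce the
%% worker believes belongs to it.
request(Pid, {Nonce, CrashEarly}) ->
    Pid ! {req, self(), Nonce, CrashEarly},
    receive {nonce_seen, N} -> N end.
```

Calling request(Pid, {<<"a">>, false}) and then request(Pid, {<<"b">>, true}) has the second "failed" request report the stale <<"a">> nonce from the first.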

I propose we clear the pdict between requests. I don't think it's safe to do erlang:erase() as that'll also clear out the Mochiweb pdict values from https://github.com/apache/couchdb-mochiweb/blob/main/src/mochiweb_request.erl#L64-L78 so instead I propose a filtered clearing of the pdict values, like so:

diff --git a/src/chttpd/src/chttpd.erl b/src/chttpd/src/chttpd.erl
index ab8e1e9a3..ca55a8ac8 100644
--- a/src/chttpd/src/chttpd.erl
+++ b/src/chttpd/src/chttpd.erl
@@ -221,6 +221,7 @@ stop() ->
     mochiweb_http:stop(?MODULE).
 
 handle_request(MochiReq0) ->
+    couch_util:clear_pdict(), %% Make sure we start clean, every time
     erlang:put(?REWRITE_COUNT, 0),
     MochiReq = couch_httpd_vhost:dispatch_host(MochiReq0),
     handle_request_int(MochiReq).
diff --git a/src/couch/src/couch_util.erl b/src/couch/src/couch_util.erl
index 739df28e5..0bc9a94ba 100644
--- a/src/couch/src/couch_util.erl
+++ b/src/couch/src/couch_util.erl
@@ -46,6 +46,7 @@
 -export([verify_hash_names/2]).
 -export([get_config_hash_algorithms/0]).
 -export([remove_sensitive_data/1]).
+-export([clear_pdict/0, clear_pdict/1]).
 
 -include_lib("couch/include/couch_db.hrl").
 
@@ -870,3 +871,31 @@ remove_sensitive_data(KVList) ->
     KVList1 = lists:keyreplace(<<"password">>, 1, KVList, {<<"password">>, <<"****">>}),
     % some KVList entries are atoms, so test fo this too
     lists:keyreplace(password, 1, KVList1, {password, <<"****">>}).
+
+-spec clear_pdict() -> ok.
+clear_pdict() ->
+    clear_pdict(erlang:get()).
+
+%% Exclude mochiweb markers, otherwise just use erlang:erase/0
+-spec clear_pdict(list()) -> ok.
+clear_pdict([]) ->
+    ok;
+clear_pdict([{mochiweb_request_body, _V} | Rest]) ->
+    clear_pdict(Rest);
+clear_pdict([{mochiweb_request_body_length, _V} | Rest]) ->
+    clear_pdict(Rest);
+clear_pdict([{mochiweb_request_cookie, _V} | Rest]) ->
+    clear_pdict(Rest);
+clear_pdict([{mochiweb_request_force_close, _V} | Rest]) ->
+    clear_pdict(Rest);
+clear_pdict([{mochiweb_request_path, _V} | Rest]) ->
+    clear_pdict(Rest);
+clear_pdict([{mochiweb_request_post, _V} | Rest]) ->
+    clear_pdict(Rest);
+clear_pdict([{mochiweb_request_qs, _V} | Rest]) ->
+    clear_pdict(Rest);
+clear_pdict([{mochiweb_request_recv, _V} | Rest]) ->
+    clear_pdict(Rest);
+clear_pdict([{Key, _V} | Rest]) ->
+    erlang:erase(Key),
+    clear_pdict(Rest).

Where the skipped values are the pdict names from the linked Mochiweb code above. I'm not entirely sure about the legal nuances of copying in the Mochiweb names, so before I PR'ed that I figured it would be best to hear back from @janl / @rnewson / etc. on the preferred approach here.

Alternatively, we could explicitly delete the values we set, but I really dislike how any value that falls through then persists for an unknown amount of time until a similar request clears it out. So I proposed the "clear everything but Mochiweb entries" approach. Let me know what you think.

@rnewson (Member) commented Dec 11, 2023

reusing an erlang process between requests is surprising to me but I can't find docs that reassure me that this was ever a promise from mochiweb.

however, if true, it does point out our folly in using the process dictionary so freely given the blanket advice not to do so.

I'd rather see us fix the real problem(s) instead of this PR. Or, perhaps, a patch for mochiweb to ensure a new process at the start of each request, even for keepalive connections.

@rnewson (Member) commented Dec 11, 2023

chatted with davisp and he confirms the general problem too. 🤷

he did have a better suggestion for a fix: 1) record the pdict keys at the start of our handle_request, 2) delete any keys not in that set at the end of handle_request (in an after). And, yes, I think we'd have to store that list in the pdict...
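A minimal sketch of that suggestion against chttpd:handle_request (names taken from the diff earlier in the thread; the try/after placement is my assumption, not a tested patch):

```erlang
handle_request(MochiReq0) ->
    %% 1) Snapshot whatever keys are already in the pdict; by definition
    %%    these are Mochiweb's entries (plus anything set before we ran).
    BaseKeys = [K || {K, _V} <- erlang:get()],
    try
        erlang:put(?REWRITE_COUNT, 0),
        MochiReq = couch_httpd_vhost:dispatch_host(MochiReq0),
        handle_request_int(MochiReq)
    after
        %% 2) Erase every key that appeared during request handling,
        %%    leaving the pre-existing (Mochiweb) keys untouched.
        [erlang:erase(K) || {K, _V} <- erlang:get(),
                            not lists:member(K, BaseKeys)]
    end.
```

One nuance: because the snapshot lives in a local variable that is still in scope inside the after clause, it may not strictly need to be stored in the pdict itself.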

@rnewson (Member) commented Dec 11, 2023

noting that neither davisp nor I think a mochiweb erlang process is reused across sockets, though. If that is happening we have a big problem, as that would be state/data passing between different clients.

@chewbranca (Contributor, Author)

Yeah, I was surprised as well about the chttpd process reuse; I stumbled upon this because I thought it was not doing that, lol. @nickva mentioned we probably use the same processes to keep direct access to the same socket, allowing for connection re-use from load balancers, so I didn't think injecting the spawning of a new process would be a clean solution here.

Hmmmm... what do you consider "better" about that suggestion? It avoids the licensing question and the bleed-over of including the names of the Mochiweb pdict values, but I personally prefer an explicit whitelist of the fields we will allow, rather than a dynamic approach that assumes any pdict value found live is allowed.

We already hardcode mochiweb in chttpd.erl, so it's not like the use of Mochiweb is pluggable. An alternative option would be to patch into Mochiweb a function that exposes a list of atoms used in the pdicts, like mochiweb:pdict_values_used() or something and then exclude that list. We could then achieve similar results as my suggested diff with [erlang:erase(K) || K <- erlang:get() -- mochiweb:pdict_values_used()].
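One wrinkle with the one-liner above: erlang:get/0 returns a list of {Key, Value} pairs rather than bare keys, so subtracting a list of key atoms from it won't match anything. A corrected sketch of the same idea (mochiweb:pdict_values_used/0 remains the hypothetical API proposed above):

```erlang
%% Corrected sketch of the proposed filtered clear. erlang:get/0 yields
%% {Key, Value} pairs, so project out the keys before filtering against
%% the (hypothetical) list of Mochiweb-reserved pdict keys.
clear_non_mochiweb_pdict() ->
    Reserved = mochiweb:pdict_values_used(),
    [erlang:erase(K) || {K, _V} <- erlang:get(),
                        not lists:member(K, Reserved)],
    ok.
```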

I'm not really attached to any particular approach, but I don't really like the dynamic nature of the alternative suggestion.

@rnewson (Member) commented Dec 11, 2023

my point is that whatever is in the pdict before handle_request is called is, by definition, mochiweb's entries. removing anything other than those at the end should avoid any "leak" now and in the future. your static approach invites problems down the line if mochiweb adds to its set and we forget to update our hard-coded list.

The true fix is to remove all reliance on the process dict in our chttpd handling.

have you a full list of potential issues caused by this process reuse across requests?

@nickva (Contributor) commented Dec 11, 2023

stashing away the mochiweb pdict entries and clearing everything except those at the end makes sense.

we may also have some stray messages in the mailbox; it could be worth checking whether we return early to the client while there are still workers finishing doc updates or something like that.
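One way to check for (and drain) such stray messages is a zero-timeout flush at a request boundary. A sketch, assuming couch_log for visibility (the function name is hypothetical):

```erlang
%% Drain any messages left over from the previous request. The zero
%% timeout means this never blocks; it only sweeps what is already
%% queued in the mailbox, logging each message before discarding it.
flush_stray_messages() ->
    receive
        Msg ->
            couch_log:warning("stray message in chttpd worker: ~p", [Msg]),
            flush_stray_messages()
    after 0 ->
        ok
    end.
```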

@chewbranca (Contributor, Author) commented Dec 11, 2023

> stashing away the mochiweb pdict entries and clearing everything except those at the end makes sense.

I think this makes sense, but how do you assert that those pdict entries are Mochiweb's without having a list of all Mochiweb pdict entries, or without doing something we should avoid like string_starts_with("mochi", atom_to_list(Key))?

chewbranca added a commit that referenced this issue Dec 18, 2023
We need to potentially extract the usage delta from the incoming RPC
message. Rather than pattern match on all possible message formats that
could potentially include usage deltas, we instead utilize
rexi_util:extract_delta which matches against tuples ending in
`{delta, Delta}`, and separates that out from the underlying message.

The subtlety here is that receiving the message to extract the delta
changes the behavior as this was previously doing a selective receive
keyed off of the Ref, and then ignoring any other messages that arrived.
I don't know if the selective receive was intended, but I don't think
it's appropriate to leave unexpected messages floating around,
especially given things like issue #4909.

Instead of utilizing a selective receive, this switches to extracting
the message and delta like we need to do, and then in the event it finds
unexpected messages they're logged and skipped.

This selective receive was masking the lack of an unlink on the linked
rexi_mon pid, fixed in #4906. I've also noticed some RPC responses
arriving late, but I haven't tracked that down, so let's log when it
does happen.
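For reference, the kind of matching the commit message describes (tuples ending in a {delta, Delta} element) could look like the following; the exact shape of rexi_util:extract_delta is an assumption on my part, not the actual implementation:

```erlang
%% Assumed sketch: peel a trailing {delta, Delta} element off an RPC
%% message tuple, returning the underlying message and the delta
%% separately. Non-tuple or delta-free messages pass through unchanged.
extract_delta(Msg) when is_tuple(Msg), tuple_size(Msg) >= 2 ->
    Last = tuple_size(Msg),
    case element(Last, Msg) of
        {delta, Delta} ->
            {erlang:delete_element(Last, Msg), Delta};
        _ ->
            {Msg, undefined}
    end;
extract_delta(Msg) ->
    {Msg, undefined}.
```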
chewbranca added a commit that referenced this issue Feb 12, 2024