bulk: import jobs often fail with server sent GOAWAY and closed the connection #65926
Labels
A-disaster-recovery
C-bug
T-disaster-recovery
Time and again we have seen our roachtests fail with:
gs://cockroach-fixtures/tpce-csv/customers=2000000/746/NewsItem.txt?AUTH=implicit: http2: server sent GOAWAY and closed the connection; LastStreamID=1, ErrCode=NO_ERROR, debug="server_shutting_down"
While this is an infra flake and the only remedy is to retry the import, we should probably retry internally so that the job does not fail. The retry could live at the job resumer level, or the error could be marked as retriable in our external storage resuming reader implementations. Either way, the focus of this issue should be to find what error type bubbles up in such scenarios so that we can intercept it and treat it as retriable.
Epic: CRDB-2556