Floating point exception when excluding stateless processes 7.1.28 #11148

Open
wdowling opened this issue Jan 26, 2024 · 0 comments

@wdowling
Collaborator

We hit a floating point exception (SIGFPE) in FoundationDB while excluding stateless processes from the cluster.

Cluster details:

  • FoundationDB 7.1.28
  • SSD storage engine, double redundancy
  • 36 machines
  • 713 processes in total

Ten stateless processes per machine were added in error. We began removing them machine by machine, excluding one process at a time on each machine. At node 27, fdbcli crashed with a floating point exception:

fdb> exclude 10.31.2.88:4579

WARNING: Long delay (Ctrl-C to interrupt)

The database is unavailable; type `status' for more information.

SIGNAL: Floating point exception (8)
Trace: addr2line -e fdbcli.debug -p -C -f -i 0x7ff6d4669980 0xaf2136 0xbcc750 0xbccc60 0xbcd22a 0xbcd486 0xbc9408 0xbc9943 0xbc9d3c 0xbcfda8 0x849a10 0xce1910 0xce1cfb 0x6dda80 0xd91086 0x8adc92 0xc4a5cf 0x869071 0x537923 0x7ff6d4287c87
Floating point exception

The database flipped to unavailable and didn't come back until we re-included the stateless processes and added a process with role = data_distributor.
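
The removal procedure described above (one `exclude` per process, machine by machine) can be generated up front and fed to fdbcli one command at a time, for example with `fdbcli --exec`, checking `status` between steps. The sketch below only builds the command lists; the function names, addresses and ports are hypothetical, and the rollback helper covers just the re-include half of the recovery described above (it does not add a data_distributor process).

```python
"""Sketch: plan a one-at-a-time removal of extra stateless processes.

Hypothetical helpers: the machine addresses and port list are inputs you
supply; nothing here talks to the cluster, it only produces fdbcli commands.
"""
from typing import Iterable, List


def exclusion_plan(machines: Iterable[str], ports: Iterable[int]) -> List[str]:
    """Return one `exclude` command per process, grouped machine by machine."""
    port_list = list(ports)
    commands = []
    for ip in machines:
        for port in port_list:
            # One process per command, mirroring "one at a time per machine".
            commands.append(f"exclude {ip}:{port}")
    return commands


def rollback_plan(excluded: Iterable[str]) -> List[str]:
    """Turn already-issued `exclude` commands into matching `include` commands."""
    return ["include " + cmd.split(" ", 1)[1] for cmd in excluded]


if __name__ == "__main__":
    # Hypothetical inputs: two machines, two extra stateless ports each.
    plan = exclusion_plan(["10.0.0.1", "10.0.0.2"], [4578, 4579])
    for cmd in plan:
        print(f'fdbcli --exec "{cmd}"')
    print(rollback_plan(plan[:1]))
```

With 36 machines and 10 extra processes each, the plan has 360 steps, so stopping to confirm the database is still available after each one keeps the blast radius of any single exclusion small.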

This is a snippet from one of the trace files (process 10.31.2.188:4573, role CP):

<Event Severity="10" Time="1706239020.314257" DateTime="2024-01-26T03:17:00Z" Type="DatabaseContextCreated" ID="333b36b83c1c8c95" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x363477c 0x2c5e80c 0x2c61ca2 0x1f0b987 0x1069e59 0x1029541 0x1032fd0 0x103331c 0x1f30d64 0x1f30fee 0xc14ae7 0x3442cd0 0x34430bb 0x1130460 0x35d0cb6 0x9d32b9 0x7f9f4492dc87" ThreadID="16916816022340418482" Machine="10.31.2.188:4573" LogGroup="default" Roles="CP" />
<Event Severity="10" Time="1706239020.316228" DateTime="2024-01-26T03:17:00Z" Type="ProxyMetrics" ID="36fba13a6a3af673" Elapsed="0" TxnCommitIn="0 -1 0" TxnCommitVersionAssigned="0 -1 0" TxnCommitResolving="0 -1 0" TxnCommitResolved="0 -1 0" TxnCommitOut="0 -1 0" TxnCommitOutSuccess="0 -1 0" TxnCommitErrors="0 -1 0" TxnConflicts="0 -1 0" TxnRejectedForQueuedTooLong="0 -1 0" CommitBatchIn="0 -1 0" CommitBatchOut="0 -1 0" MutationBytes="0 -1 0" Mutations="0 -1 0" ConflictRanges="0 -1 0" KeyServerLocationIn="0 -1 0" KeyServerLocationOut="0 -1 0" KeyServerLocationErrors="0 -1 0" ExpensiveClearCostEstCount="0 -1 0" LastAssignedCommitVersion="0" Version="0" CommittedVersion="271534957786793" CommitBatchesMemBytesCount="0" MaxCompute="0" MinCompute="1000000000000" ThreadID="16916816022340418482" Machine="10.31.2.188:4573" LogGroup="default" Roles="CP" TrackLatestType="Original" />
<Event Severity="10" Time="1706239020.336213" DateTime="2024-01-26T03:17:00Z" Type="CoordinationPing" ID="942e5b2262db89bb" CCID="20a96194438cc119" TimeStep="299" ThreadID="16916816022340418482" Machine="10.31.2.188:4573" LogGroup="default" Roles="CP" />
<Event Severity="10" Time="1706239020.341090" DateTime="2024-01-26T03:17:00Z" Type="Role" ID="942e5b2262db89bb" Transition="Refresh" As="Worker" ThreadID="16916816022340418482" Machine="10.31.2.188:4573" LogGroup="default" Roles="CP" />
<Event Severity="10" Time="1706239020.345752" DateTime="2024-01-26T03:17:00Z" Type="PingLatency" ID="0000000000000000" Elapsed="208.071" PeerAddr="10.31.2.234:4572" MinLatency="0.000122547" MaxLatency="0.169599" MeanLatency="0.00197428" MedianLatency="0.000162601" P90Latency="0.000197887" Count="208" BytesReceived="52560" BytesSent="55400" TimeoutCount="0" ConnectOutgoingCount="0" ConnectIncomingCount="1" ConnectFailedCount="0" ConnectMinLatency="0" ConnectMaxLatency="0" ConnectMeanLatency="0" ConnectMedianLatency="0" ConnectP90Latency="0" ThreadID="16916816022340418482" Machine="10.31.2.188:4573" LogGroup="default" Roles="CP" />
<Event Severity="10" Time="1706239020.414727" DateTime="2024-01-26T03:17:00Z" Type="GotServerDBInfoChange" ID="0000000000000000" ChangeID="dc937c9d70177c48" InfoGeneration="310" MasterID="0c97f9ee868db51b" RatekeeperID="fa503afe0a2c8447" DataDistributorID="885d6de7a02b1156" BlobManagerID="0000000000000000" EncryptKeyProxyID="0000000000000000" ThreadID="16916816022340418482" Machine="10.31.2.188:4573" LogGroup="default" Roles="CP" />
<Event Severity="10" Time="1706239020.559015" DateTime="2024-01-26T03:17:00Z" Type="GotServerDBInfoChange" ID="0000000000000000" ChangeID="7e15397b37b1cac7" InfoGeneration="311" MasterID="0c97f9ee868db51b" RatekeeperID="fa503afe0a2c8447" DataDistributorID="885d6de7a02b1156" BlobManagerID="0000000000000000" EncryptKeyProxyID="0000000000000000" ThreadID="16916816022340418482" Machine="10.31.2.188:4573" LogGroup="default" Roles="CP" />
<Event Severity="10" Time="1706239020.559015" DateTime="2024-01-26T03:17:00Z" Type="KVSMemRecoveryStarted" ID="36fba13a6a3af673" SnapshotEndLocation="0.1" ThreadID="16916816022340418482" Machine="10.31.2.188:4573" LogGroup="default" Roles="CP" />
<Event Severity="10" Time="1706239020.559015" DateTime="2024-01-26T03:17:00Z" Type="KVSMemRecoveryComplete" ID="36fba13a6a3af673" Reason="Non-header sized data read" DataSize="0" ZeroFillSize="0" SnapshotEndLocation="0.1" NextReadLoc="0.1" ThreadID="16916816022340418482" Machine="10.31.2.188:4573" LogGroup="default" Roles="CP" />
<Event Severity="10" Time="1706239020.559015" DateTime="2024-01-26T03:17:00Z" Type="KVSMemRecovered" ID="36fba13a6a3af673" SnapshotItems="0" SnapshotEnd="0" Mutations="0" Commits="0" TimeTaken="0" ThreadID="16916816022340418482" Machine="10.31.2.188:4573" LogGroup="default" Roles="CP" />
<Event Severity="10" Time="1706239020.559015" DateTime="2024-01-26T03:17:00Z" Type="KVSMemStartingSnapshot" ID="36fba13a6a3af673" StartKey="" ThreadID="16916816022340418482" Machine="10.31.2.188:4573" LogGroup="default" Roles="CP" />
<Event Severity="10" Time="1706239020.559015" DateTime="2024-01-26T03:17:00Z" Type="CommitBatchesMemoryLimit" ID="0000000000000000" BytesLimit="858993459" ThreadID="16916816022340418482" Machine="10.31.2.188:4573" LogGroup="default" Roles="CP" />
<Event Severity="10" Time="1706239020.581088" DateTime="2024-01-26T03:17:00Z" Type="ConnectingTo" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="10.31.2.160:4574" PeerReferences="0" FailureStatus="OK" ThreadID="16916816022340418482" Machine="10.31.2.188:4573" LogGroup="default" Roles="CP" />
<Event Severity="10" Time="1706239020.581088" DateTime="2024-01-26T03:17:00Z" Type="ConnectionExchangingConnectPacket" ID="fac4678e61aaaa6e" SuppressedEventCount="1" PeerAddr="10.31.2.160:4574" ThreadID="16916816022340418482" Machine="10.31.2.188:4573" LogGroup="default" Roles="CP" />
<Event Severity="10" Time="1706239020.583379" DateTime="2024-01-26T03:17:00Z" Type="ConnectionEstablished" ID="fac4678e61aaaa6e" SuppressedEventCount="0" Peer="10.31.2.160:4574" ConnectionId="1" ThreadID="16916816022340418482" Machine="10.31.2.188:4573" LogGroup="default" Roles="CP" />
<Event Severity="10" Time="1706239020.583379" DateTime="2024-01-26T03:17:00Z" Type="ConnectedOutgoing" ID="0000000000000000" SuppressedEventCount="0" PeerAddr="10.31.2.160:4574" ThreadID="16916816022340418482" Machine="10.31.2.188:4573" LogGroup="default" Roles="CP" />
<Event Severity="10" Time="1706239020.664504" DateTime="2024-01-26T03:17:00Z" Type="GetMagazineSample" ID="0000000000000000" Size="96" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x363477c 0x35947a3 0x35949c5 0x1468363 0x146e2ed 0x1475ec4 0x101cd1a 0x10339c6 0xc07c8f 0x3442cd0 0x34430bb 0x1130460 0x35d0cb6 0x9d32b9 0x7f9f4492dc87" ThreadID="16916816022340418482" Machine="10.31.2.188:4573" LogGroup="default" Roles="CP" />
<Event Severity="10" Time="1706239020.697797" DateTime="2024-01-26T03:17:00Z" Type="GetMagazineSample" ID="0000000000000000" Size="96" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x363477c 0x35947a3 0x35949c5 0x1468363 0x146e2ed 0x1475ec4 0x101cd1a 0x10339c6 0xc07c8f 0x3442cd0 0x34430bb 0x1130460 0x35d0cb6 0x9d32b9 0x7f9f4492dc87" ThreadID="16916816022340418482" Machine="10.31.2.188:4573" LogGroup="default" Roles="CP" />
<Event Severity="10" Time="1706239020.989600" DateTime="2024-01-26T03:17:00Z" Type="GetMagazineSample" ID="0000000000000000" Size="96" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x363477c 0x35947a3 0x35949c5 0x1468363 0x146e2ed 0x1475ec4 0x101cd1a 0x10339c6 0xc07c8f 0x3442cd0 0x34430bb 0x1130460 0x35d0cb6 0x9d32b9 0x7f9f4492dc87" ThreadID="16916816022340418482" Machine="10.31.2.188:4573" LogGroup="default" Roles="CP" />
<Event Severity="10" Time="1706239020.998880" DateTime="2024-01-26T03:17:00Z" Type="GetMagazineSample" ID="0000000000000000" Size="96" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x363477c 0x35947a3 0x35949c5 0x1468363 0x146e2ed 0x1475ec4 0x101cd1a 0x10339c6 0xc07c8f 0x3442cd0 0x34430bb 0x1130460 0x35d0cb6 0x9d32b9 0x7f9f4492dc87" ThreadID="16916816022340418482" Machine="10.31.2.188:4573" LogGroup="default" Roles="CP" />
<Event Severity="10" Time="1706239021.057283" DateTime="2024-01-26T03:17:01Z" Type="GetMagazineSample" ID="0000000000000000" Size="96" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x363477c 0x35947a3 0x35949c5 0x1468363 0x146e2ed 0x1475ec4 0x101cd1a 0x10339c6 0xc07c8f 0x3442cd0 0x34430bb 0x1130460 0x35d0cb6 0x9d32b9 0x7f9f4492dc87" ThreadID="16916816022340418482" Machine="10.31.2.188:4573" LogGroup="default" Roles="CP" />
<Event Severity="10" Time="1706239021.109701" DateTime="2024-01-26T03:17:01Z" Type="TransactionMetrics" ID="289ddd7169a25eb5" Elapsed="5.00041" Cluster="" Internal="1" ReadVersions="0 -1 1" ReadVersionsThrottled="0 -1 0" ReadVersionsCompleted="0 -1 0" ReadVersionBatches="0 -1 1" BatchPriorityReadVersions="0 -1 0" DefaultPriorityReadVersions="0 -1 1" ImmediatePriorityReadVersions="0 -1 0" BatchPriorityReadVersionsCompleted="0 -1 0" DefaultPriorityReadVersionsCompleted="0 -1 0" ImmediatePriorityReadVersionsCompleted="0 -1 0" LogicalUncachedReads="0 -1 1" PhysicalReadRequests="0 -1 0" PhysicalReadRequestsCompleted="0 -1 0" GetKeyRequests="0 -1 0" GetValueRequests="0 -1 1" GetRangeRequests="0 -1 0" GetMappedRangeRequests="0 -1 0" GetRangeStreamRequests="0 -1 0" WatchRequests="0 -1 0" GetAddressesForKeyRequests="0 -1 0" BytesRead="0 -1 0" KeysRead="0 -1 0" MetadataVersionReads="0 -1 0" CommittedMutations="0 -1 0" CommittedMutationBytes="0 -1 0" SetMutations="0 -1 0" ClearMutations="0 -1 0" AtomicMutations="0 -1 0" CommitStarted="0 -1 0" CommitCompleted="0 -1 0" KeyServerLocationRequests="0 -1 0" KeyServerLocationRequestsCompleted="0 -1 0" StatusRequests="0 -1 0" TooOld="0 -1 0" FutureVersions="0 -1 0" NotCommitted="0 -1 0" MaybeCommitted="0 -1 0" ResourceConstrained="0 -1 0" ProcessBehind="0 -1 0" Throttled="0 -1 0" ExpensiveClearCostEstCount="0 -1 0" NumGrvFullBatches="0 -1 0" NumGrvTimedOutBatches="0 -1 1" CommitVersionNotFoundForSS="0 -1 0" LocationCacheEntryCount="1" MeanLatency="0" MedianLatency="0" Latency90="0" Latency98="0" MaxLatency="0" MeanRowReadLatency="0" MedianRowReadLatency="0" MaxRowReadLatency="0" MeanGRVLatency="0" MedianGRVLatency="0" MaxGRVLatency="0" MeanCommitLatency="0" MedianCommitLatency="0" MaxCommitLatency="0" MeanMutationsPerCommit="0" MedianMutationsPerCommit="0" MaxMutationsPerCommit="0" MeanBytesPerCommit="0" MedianBytesPerCommit="0" MaxBytesPerCommit="0" NumLocalityCacheEntries="1" ThreadID="16916816022340418482" 
Machine="10.31.2.188:4573" LogGroup="default" Roles="CP" />
<Event Severity="10" Time="1706239021.521726" DateTime="2024-01-26T03:17:01Z" Type="GetMagazineSample" ID="0000000000000000" Size="96" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x363477c 0x35947a3 0x35949c5 0x1468363 0x146e2ed 0x1475ec4 0x101cd1a 0x10339c6 0xc07c8f 0x3442cd0 0x34430bb 0x1130460 0x35d0cb6 0x9d32b9 0x7f9f4492dc87" ThreadID="16916816022340418482" Machine="10.31.2.188:4573" LogGroup="default" Roles="CP" />
<Event Severity="10" Time="1706239021.579936" DateTime="2024-01-26T03:17:01Z" Type="GetMagazineSample" ID="0000000000000000" Size="256" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x363477c 0x3595146 0x3595375 0x3570735 0x3570d80 0x146dd0a 0x1475ec4 0x101cd1a 0x10339c6 0xc07c8f 0x3442cd0 0x34430bb 0x1130460 0x35d0cb6 0x9d32b9 0x7f9f4492dc87" ThreadID="16916816022340418482" Machine="10.31.2.188:4573" LogGroup="default" Roles="CP" />
<Event Severity="40" ErrorKind="Unset" Time="1706239021.922313" DateTime="2024-01-26T03:17:01Z" Type="OutOfMemory" ID="0000000000000000" Message="Out of memory" ThreadID="16916816022340418482" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x363477c 0x36333b0 0x363378e 0x35ff2cc 0x35ff2fc 0x3615075 0x1130460 0x35d0cb6 0x9d32b9 0x7f9f4492dc87" Machine="10.31.2.188:4573" LogGroup="default" Roles="CP" />
</Trace>
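
To find events like the `OutOfMemory` entry above across many trace files, a small filter over the XML events is enough. This is a sketch, assuming the files contain `<Event .../>` elements as in the snippet; it uses an incremental parser so a file that was cut off before its closing `</Trace>` (as can happen when a process dies) still yields the events written so far. The threshold of 40 is chosen because the `OutOfMemory` event above carries `Severity="40"`.

```python
"""Sketch: pull high-severity events out of a FoundationDB XML trace file.

Assumption: the input holds <Event .../> elements inside a <Trace> root;
the root may be unclosed if the file was truncated.
"""
import xml.etree.ElementTree as ET
from typing import Dict, List


def severe_events(trace_xml: str, min_severity: int = 40) -> List[Dict[str, str]]:
    """Return the attribute dicts of Event elements at or above min_severity."""
    parser = ET.XMLPullParser(events=("end",))
    parser.feed(trace_xml)  # no close(): tolerate a missing </Trace>
    hits = []
    for _, elem in parser.read_events():
        if elem.tag == "Event" and int(elem.get("Severity", "0")) >= min_severity:
            hits.append(dict(elem.attrib))
    return hits


if __name__ == "__main__":
    sample = (
        "<Trace>\n"
        '<Event Severity="10" Time="1706239021.579936" Type="GetMagazineSample" Size="256" />\n'
        '<Event Severity="40" Time="1706239021.922313" Type="OutOfMemory" Message="Out of memory" />\n'
    )
    for ev in severe_events(sample):
        print(ev["Time"], ev["Type"], ev.get("Message", ""))
```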

I pulled down the debug binaries and ran addr2line against them.

fdbcli

$ addr2line -e fdbcli.debug.x86_64 -p -C -f -i 0x7ff6d4669980 0xaf2136 0xbcc750 0xbccc60 0xbcd22a 0xbcd486 0xbc9408 0xbc9943 0xbc9d3c 0xbcfda8 0x849a10 0xce1910 0xce1cfb 0x6dda80 0xd91086 0x8adc92 0xc4a5cf 0x869071 0x537923 0x7ff6d4287c87
?? ??:0
JSONDoc::at(std::string, bool) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbclient/JSONDoc.h:285
 (inlined by) JSONDoc::operator[](std::string) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbclient/JSONDoc.h:288
 (inlined by) a_body1cont8 at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbclient/SpecialKeySpace.actor.cpp:1002
getClientDatabaseStatus(JSONDoc, JSONDoc) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbclient/StatusClient.actor.cpp:502
(anonymous namespace)::StatusFetcherImplActorState<(anonymous namespace)::StatusFetcherImplActor>::a_body1cont7(int) [clone .isra.1454] at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbclient/StatusClient.actor.cpp:606
ActorCallback<(anonymous namespace)::StatusFetcherImplActor, 1, Optional<StatusObject> >::error(Error) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/flow.h:1323
Callback<Void>::remove() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/flow.h:400
 (inlined by) a_exitChoose3 at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/fdbclient/StatusClient.actor.g.cpp:2232
 (inlined by) a_callback_fire at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/fdbclient/StatusClient.actor.g.cpp:2283
 (inlined by) fire at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/flow.h:1321
(anonymous namespace)::TimeoutMonitorLeaderActorState<(anonymous namespace)::TimeoutMonitorLeaderActor>::a_body1loopBody1(int) at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/fdbclient/StatusClient.actor.g.cpp:2484 (discriminator 14)
ISimulator::ProcessInfo::global(int) const at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbrpc/simulator.h:177
 (inlined by) ISimulator::global(int) const at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbrpc/simulator.h:461
 (inlined by) FlowTransport::transport() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbrpc/FlowTransport.h:267
 (inlined by) a_body1cont2 at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbrpc/genericactors.actor.h:397
 (inlined by) a_body1when1 at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/fdbrpc/genericactors.actor.g.h:6268
 (inlined by) a_callback_fire at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/fdbrpc/genericactors.actor.g.h:6289
 (inlined by) fire at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/flow.h:1321
SAV<StatusObject>::finishSendAndDelPromiseRef() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/flow.h:696
 (inlined by) a_body1cont3 at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/fdbclient/StatusClient.actor.g.cpp:992
 (inlined by) a_body1cont7 at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/fdbclient/StatusClient.actor.g.cpp:1014
(anonymous namespace)::WaitValueOrSignalActorState<StatusReply, (anonymous namespace)::WaitValueOrSignalActor<StatusReply> >::a_body1loopBody1Catch1(Error const&, int) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbrpc/genericactors.actor.h:386 (discriminator 4)
SAV<StatusReply>::finishSendAndDelPromiseRef() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/flow.h:696
 (inlined by) void SAV<StatusReply>::sendAndDelPromiseRef<StatusReply&>(StatusReply&) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/flow.h:690
 (inlined by) NetSAV<StatusReply>::receive(ArenaObjectReader&) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbrpc/fdbrpc.h:110
Peer::onIncomingConnection(Reference<Peer>, Reference<IConnection>, Future<Void>) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbrpc/FlowTransport.actor.cpp:902 (discriminator 1)
Reference<Peer>::operator=(Reference<Peer> const&) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/FastRef.h:131
 (inlined by) TransportData::getOrOpenPeer(NetworkAddress const&, bool) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbrpc/FlowTransport.actor.cpp:1425
void SAV<Void>::send<Void>(Void&&) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/flow.h:662
N2::Net2::run() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/Net2.actor.cpp:1450
runNetwork() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbclient/NativeAPI.actor.cpp:2612
internal_thread_helper::DoOnMainThreadVoidActor1State<ThreadSafeTransaction::set(StringRef const&, StringRef const&)::{lambda()#1}, ISingleThreadTransaction, internal_thread_helper::DoOnMainThreadVoidActor1<ThreadSafeTransaction::set(StringRef const&, StringRef const&)::{lambda()#1}, ISingleThreadTransaction> >::a_body1cont1(Void const&, int) [clone .isra.334] at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/flow/ThreadHelper.actor.g.h:305
 (inlined by) ~DoOnMainThreadVoidActor1 at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/ThreadHelper.actor.h:50
 (inlined by) a_body1Catch1 at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/flow/ThreadHelper.actor.g.h:332
 (inlined by) a_body1cont2 at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/flow/ThreadHelper.actor.g.h:452
 (inlined by) a_body1cont5 at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/flow/ThreadHelper.actor.g.h:480
 (inlined by) a_body1cont1 at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/flow/ThreadHelper.actor.g.h:352
__gnu_cxx::__normal_iterator<unsigned long*, std::vector<unsigned long, std::allocator<unsigned long> > >::__normal_iterator(unsigned long* const&) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_iterator.h:784
 (inlined by) std::vector<unsigned long, std::allocator<unsigned long> >::begin() at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_vector.h:699
 (inlined by) MultiVersionApi::runNetwork() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbclient/MultiVersionTransaction.actor.cpp:2292
main at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbcli/fdbcli.actor.cpp:2295
?? ??:0

fdbserver

$ addr2line -e fdbserver.debug.x86_64 -p -C -f -i 0x363477c 0x2c5e80c 0x2c61ca2 0x1f0b987 0x1069e59 0x1029541 0x1032fd0 0x103331c 0x1f30d64 0x1f30fee 0xc14ae7 0x3442cd0 0x34430bb 0x1130460 0x35d0cb6 0x9d32b9 0x7f9f4492dc87
int const& std::min<int>(int const&, int const&) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_algobase.h:200
 (inlined by) operator<(StringRef const&, StringRef const&) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/Arena.h:807
 (inlined by) std::less<Standalone<StringRef> >::operator()(Standalone<StringRef> const&, Standalone<StringRef> const&) const at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_function.h:386
 (inlined by) std::_Rb_tree<Standalone<StringRef>, std::pair<Standalone<StringRef> const, std::unique_ptr<DynamicFieldBase, std::default_delete<DynamicFieldBase> > >, std::_Select1st<std::pair<Standalone<StringRef> const, std::unique_ptr<DynamicFieldBase, std::default_delete<DynamicFieldBase> > > >, std::less<Standalone<StringRef> >, std::allocator<std::pair<Standalone<StringRef> const, std::unique_ptr<DynamicFieldBase, std::default_delete<DynamicFieldBase> > > > >::_M_lower_bound(std::_Rb_tree_node<std::pair<Standalone<StringRef> const, std::unique_ptr<DynamicFieldBase, std::default_delete<DynamicFieldBase> > > >*, std::_Rb_tree_node_base*, Standalone<StringRef> const&) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h:1888
 (inlined by) std::_Rb_tree<Standalone<StringRef>, std::pair<Standalone<StringRef> const, std::unique_ptr<DynamicFieldBase, std::default_delete<DynamicFieldBase> > >, std::_Select1st<std::pair<Standalone<StringRef> const, std::unique_ptr<DynamicFieldBase, std::default_delete<DynamicFieldBase> > > >, std::less<Standalone<StringRef> >, std::allocator<std::pair<Standalone<StringRef> const, std::unique_ptr<DynamicFieldBase, std::default_delete<DynamicFieldBase> > > > >::lower_bound(Standalone<StringRef> const&) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h:1203
 (inlined by) std::map<Standalone<StringRef>, std::unique_ptr<DynamicFieldBase, std::default_delete<DynamicFieldBase> >, std::less<Standalone<StringRef> >, std::allocator<std::pair<Standalone<StringRef> const, std::unique_ptr<DynamicFieldBase, std::default_delete<DynamicFieldBase> > > > >::lower_bound(Standalone<StringRef> const&) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_map.h:1239
 (inlined by) std::map<Standalone<StringRef>, std::unique_ptr<DynamicFieldBase, std::default_delete<DynamicFieldBase> >, std::less<Standalone<StringRef> >, std::allocator<std::pair<Standalone<StringRef> const, std::unique_ptr<DynamicFieldBase, std::default_delete<DynamicFieldBase> > > > >::operator[](Standalone<StringRef> const&) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_map.h:495
 (inlined by) DynamicEventMetric::registerFields(MetricKeyRef const&, std::vector<Standalone<StringRef>, std::allocator<Standalone<StringRef> > >&) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/TDMetric.cpp:218
IFailureMonitor::failureMonitor() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbrpc/FailureMonitor.h:132
 (inlined by) a_body1loopBody1 at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbrpc/LoadBalance.actor.h:623
Callback<GetMappedKeyValuesReply>::remove() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/flow.h:397
 (inlined by) a_exitChoose2 at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/fdbclient/NativeAPI.actor.g.cpp:17941
 (inlined by) a_callback_error at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/fdbclient/NativeAPI.actor.g.cpp:17977
 (inlined by) cancel at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/fdbclient/NativeAPI.actor.g.cpp:18275
Promise<Void>::~Promise() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/flow.h:930
 (inlined by) NotifiedQueue<OpenDatabaseCoordRequest>::~NotifiedQueue() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/flow.h:983
 (inlined by) NetNotifiedQueue<OpenDatabaseCoordRequest>::~NetNotifiedQueue() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbrpc/fdbrpc.h:646
 (inlined by) NetNotifiedQueue<OpenDatabaseCoordRequest>::~NetNotifiedQueue() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbrpc/fdbrpc.h:646
 (inlined by) NetNotifiedQueue<OpenDatabaseCoordRequest>::destroy() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbrpc/fdbrpc.h:654
 (inlined by) NotifiedQueue<OpenDatabaseCoordRequest>::delPromiseRef() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/flow.h:1046
FieldHeader<long>::update(long const&) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/TDMetric.actor.h:381
 (inlined by) FieldLevel<long, FieldHeader<long>, FieldValueBlockEncoding<long> >::log(long, unsigned long, bool&, long&) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/TDMetric.actor.h:496
std::vector<Tag, std::allocator<Tag> >::_M_check_len(unsigned long, char const*) const at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_vector.h:1646
 (inlined by) void std::vector<Tag, std::allocator<Tag> >::_M_range_insert<__gnu_cxx::__normal_iterator<Tag*, std::vector<Tag, std::allocator<Tag> > > >(__gnu_cxx::__normal_iterator<Tag*, std::vector<Tag, std::allocator<Tag> > >, __gnu_cxx::__normal_iterator<Tag*, std::vector<Tag, std::allocator<Tag> > >, __gnu_cxx::__normal_iterator<Tag*, std::vector<Tag, std::allocator<Tag> > >, std::forward_iterator_tag) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/vector.tcc:718
 (inlined by) void std::vector<Tag, std::allocator<Tag> >::_M_insert_dispatch<__gnu_cxx::__normal_iterator<Tag*, std::vector<Tag, std::allocator<Tag> > > >(__gnu_cxx::__normal_iterator<Tag*, std::vector<Tag, std::allocator<Tag> > >, __gnu_cxx::__normal_iterator<Tag*, std::vector<Tag, std::allocator<Tag> > >, __gnu_cxx::__normal_iterator<Tag*, std::vector<Tag, std::allocator<Tag> > >, std::__false_type) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_vector.h:1549
 (inlined by) __gnu_cxx::__normal_iterator<Tag*, std::vector<Tag, std::allocator<Tag> > > std::vector<Tag, std::allocator<Tag> >::insert<__gnu_cxx::__normal_iterator<Tag*, std::vector<Tag, std::allocator<Tag> > >, void>(__gnu_cxx::__normal_iterator<Tag const*, std::vector<Tag, std::allocator<Tag> > >, __gnu_cxx::__normal_iterator<Tag*, std::vector<Tag, std::allocator<Tag> > >, __gnu_cxx::__normal_iterator<Tag*, std::vector<Tag, std::allocator<Tag> > >) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_vector.h:1270
 (inlined by) void LogPushData::addTags<std::vector<Tag, std::allocator<Tag> > >(std::vector<Tag, std::allocator<Tag> >) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbserver/LogSystem.h:753
 (inlined by) a_body1loopBody1cont1 at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbserver/CommitProxyServer.actor.cpp:423
 (inlined by) a_body1loopBody1break1 at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/fdbserver/CommitProxyServer.actor.g.cpp:1511
 (inlined by) a_body1loopBody1loopBody1 at /home/foundationdb_ci/foundationdb_build_output/dbdbdbdbdbdbdbdbdbdbdbdbdbdbdbdb/fdbserver/CommitProxyServer.actor.g.cpp:1470
SAV<ResolveTransactionBatchReply>::get() const at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/flow.h:652
 (inlined by) Future<ResolveTransactionBatchReply>::get() const at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/flow.h:796
 (inlined by) a_body1cont1 at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/genericactors.actor.h:1057
CommitBatch::PostResolutionActorState<CommitBatch::PostResolutionActor>::a_body1cont11cont1(int) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbserver/CommitProxyServer.actor.cpp:1332
(anonymous namespace)::CreateAndLockProcessIdFileActorState<(anonymous namespace)::CreateAndLockProcessIdFileActor>::a_body1loopBody1cont2(int) at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/fdbserver/worker.actor.cpp:2677
SAV<ErrorOr<Reference<IAsyncFile> > >::delFutureRef() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/flow.h:743
 (inlined by) Future<ErrorOr<Reference<IAsyncFile> > >::~Future() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/flow.h:832
 (inlined by) StrictFuture<ErrorOr<Reference<IAsyncFile> > >::~StrictFuture() at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/flow.h:894
 (inlined by) a_body1loopBody1 at /home/foundationdb_ci/src/oOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo/foundationdb/flow/flow.h:894
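
Both commands above were built by copying the hex addresses out of a `Trace:` line or a `Backtrace` attribute and pointing `-e` at the downloaded debug binary. A minimal sketch of that step, assuming the caller supplies the debug file name (such as `fdbserver.debug.x86_64`); the helper name is hypothetical and the addresses are passed through unchanged.

```python
"""Sketch: turn a SIGNAL "Trace:" line or a Backtrace attribute into an
addr2line command against a locally downloaded debug binary.

Assumption: the debug binary path is supplied by the caller.
"""
import re
import shlex
from typing import List

HEX_ADDR = re.compile(r"\b0x[0-9a-fA-F]+\b")


def addr2line_argv(backtrace: str, debug_binary: str) -> List[str]:
    """Build the argv list for addr2line from the hex addresses in backtrace."""
    addresses = HEX_ADDR.findall(backtrace)
    # -p pretty-print, -C demangle, -f function names, -i include inlined frames
    return ["addr2line", "-e", debug_binary, "-p", "-C", "-f", "-i", *addresses]


if __name__ == "__main__":
    trace_line = "Trace: addr2line -e fdbcli.debug -p -C -f -i 0x7ff6d4669980 0xaf2136"
    print(shlex.join(addr2line_argv(trace_line, "fdbcli.debug.x86_64")))
```

The argv list can be handed to `subprocess.run` if binutils is installed. The first and last addresses in each backtrace above are high addresses, presumably in shared libraries rather than the FoundationDB binary, which would explain the `?? ??:0` lines in the output.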

The trace files contain OutOfMemory (OOM) events, which also show up in our monitoring charts. Is this a known or expected issue?

wdowling added the bug label Jan 26, 2024