Witness_node randomly stops syncing #2798

Open
4 of 17 tasks
abitmore opened this issue Dec 13, 2023 · 1 comment

abitmore commented Dec 13, 2023

Bug Description

Start a new witness_node instance and wait. Sometimes it hangs during syncing and never reaches the latest block. Restarting sometimes helps.

See #2798 (comment) for more info.

Unable to reproduce reliably so far.
Nothing notable found in the log files yet.
Probably caused by memory corruption.

Impacts
Describe which portion(s) of BitShares Core may be impacted by this bug. Please tick at least one box.

  • API (the application programming interface)
  • Build (the build process or something prior to compiled code)
  • CLI (the command line wallet)
  • Deployment (the deployment process after building such as Docker, Travis, etc.)
  • DEX (the Decentralized EXchange, market engine, etc.)
  • P2P (the peer-to-peer network for transaction/block propagation)
  • Performance (system or user efficiency, etc.)
  • Protocol (the blockchain logic, consensus, validation, etc.)
  • Security (the security of system or user data, etc.)
  • UX (the User Experience)
  • Other (please add below)

Host Environment
Please provide details about the host environment. Much of this information can be found running: witness_node --version.

  • Host OS: Ubuntu (various versions)
  • Host Physical RAM: Sufficient
  • BitShares Version: 7.0.1 / test-7.0.3
  • OpenSSL Version: -
  • Boost Version: -

CORE TEAM TASK LIST

  • Evaluate / Prioritize Bug Report
  • Refine User Stories / Requirements
  • Define Test Cases
  • Design / Develop Solution
  • Perform QA/Testing
  • Update Documentation
@abitmore abitmore moved this from To do to In progress in Bugfix Release (7.0.2) Dec 13, 2023
@abitmore abitmore added this to the 7.0.2 - Bugfix Release milestone Dec 13, 2023
@abitmore
Member Author

Found the reason.

In PR #2791 for the latest releases (7.0.1 and test-7.0.3), we updated read_write_handler and read_write_handler_with_buffer to throw canceled_exception when a boost::asio::error::operation_aborted error occurs.

The canceled_exception might then be caught in message_oriented_connection_impl::read_loop():

    catch ( const fc::canceled_exception& e )
    {
      wlog( "caught a canceled_exception in read_loop. this should mean we're in the process of deleting this object already, so there's no need to notify the delegate: ${e}", ("e", e.to_detail_string() ) );
      throw;
    }
    catch ( const fc::eof_exception& e )
    {
      wlog( "disconnected ${e}", ("e", e.to_detail_string() ) );
      call_on_connection_closed = true;
    }
    catch ( const fc::exception& e )
    {
      elog( "disconnected ${er}", ("er", e.to_detail_string() ) );
      call_on_connection_closed = true;
      exception_to_rethrow = fc::unhandled_exception(FC_LOG_MESSAGE(warn, "disconnected: ${e}", ("e", e.to_detail_string())));
    }

As a result, when a canceled_exception is caught, the first handler rethrows without setting call_on_connection_closed to true. node::on_connection_closed() is therefore no longer called as it was before, which means that certain cleanup steps are not performed.
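
To make the effect of the handler order easier to see, here is a minimal standalone sketch. It does not use the real fc or graphene types: base_exception, canceled_exception, eof_exception, read_error and simulate_read_loop are hypothetical stand-ins. It also assumes that, before PR #2791, an aborted operation reached read_loop() as some non-canceled exception.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for fc::exception and two of its subclasses.
struct base_exception : std::runtime_error {
    explicit base_exception(const std::string& what) : std::runtime_error(what) {}
};
struct canceled_exception : base_exception {
    explicit canceled_exception(const std::string& what) : base_exception(what) {}
};
struct eof_exception : base_exception {
    explicit eof_exception(const std::string& what) : base_exception(what) {}
};

// How the aborted read surfaces inside the loop (assumption for illustration).
enum class read_error { eof, generic, canceled };

struct loop_result {
    bool call_on_connection_closed = false; // would trigger the cleanup callback
    bool rethrown = false;                  // the exception escaped the handlers
};

// Mirrors the catch order shown above; the outer try only records the rethrow.
loop_result simulate_read_loop(read_error err) {
    loop_result r;
    try {
        try {
            switch (err) {
                case read_error::eof:      throw eof_exception("eof");
                case read_error::generic:  throw base_exception("operation aborted");
                case read_error::canceled: throw canceled_exception("operation aborted");
            }
        } catch (const canceled_exception&) {
            throw; // flag is left untouched on this path
        } catch (const eof_exception&) {
            r.call_on_connection_closed = true;
        } catch (const base_exception&) {
            r.call_on_connection_closed = true;
        }
    } catch (const canceled_exception&) {
        r.rethrown = true;
    }
    return r;
}
```

In this sketch, simulate_read_loop(read_error::generic) leaves the flag set, while simulate_read_loop(read_error::canceled) leaves it unset and lets the exception escape, which matches the missed cleanup described above.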
