Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tproxy: HexError(OddLength) in mining.subscribe #896

Closed
plebhash opened this issue May 4, 2024 · 17 comments · Fixed by #952
Closed

tproxy: HexError(OddLength) in mining.subscribe #896

plebhash opened this issue May 4, 2024 · 17 comments · Fixed by #952
Assignees
Milestone

Comments

@plebhash
Copy link
Collaborator

plebhash commented May 4, 2024

recently we had PR #851 by @nikicat

I noticed we didn't have an issue here on github to track this (potential) bug

@plebhash
Copy link
Collaborator Author

plebhash commented May 4, 2024

also, it took me some time to figure out how to reproduce this bug deterministically.

this issue is a good place to register my findings

here are instructions to reproduce the (potential) bug via netcat:

  1. start tProxy
  2. send a mining.subscribe with an extranonce of even length (this should work fine):
echo '{"id": 1, "method": "mining.subscribe", "params": ["", "00"]}' | nc localhost 34255
  1. send a mining.subscribe with an extranonce of off length (this should trigger the bug on tProxy):
echo '{"id": 1, "method": "mining.subscribe", "params": ["", "0"]}' | nc localhost 34255

@plebhash
Copy link
Collaborator Author

plebhash commented May 4, 2024

I'm also backtracing the bug to find the root cause:

inside pub async fn new_downstream we have a call to handle_incoming_sv1 :

let res = Self::handle_incoming_sv1(self_.clone(), incoming).await;

which calls handle_message:

async fn handle_incoming_sv1(
self_: Arc<Mutex<Self>>,
message_sv1: json_rpc::Message,
) -> Result<(), super::super::error::Error<'static>> {
// `handle_message` in `IsServer` trait + calls `handle_request`
// TODO: Map err from V1Error to Error::V1Error
let response = self_.safe_lock(|s| s.handle_message(message_sv1)).unwrap();

which calls handle_request:

self.handle_request(msg.try_into()?)

the issue seems to be in the try_into()?, which is implemented at:

impl<'a> TryFrom<Message> for Client2Server<'a> {
type Error = MethodError<'a>;
fn try_from(msg: Message) -> Result<Self, Self::Error> {
let method: Method = msg.try_into()?;

the nested try_into()? (line 146 above) is implemented at:

impl<'a> TryFrom<Message> for Method<'a> {
type Error = MethodError<'a>;
fn try_from(msg: Message) -> Result<Self, MethodError<'a>> {
match &msg {
Message::StandardRequest(request) => match &request.method[..] {
"mining.suggest_difficulty" => {
Ok(Method::Client2Server(Client2Server::SuggestDifficulty()))
}
"mining.subscribe" => {
let method = request
.clone()
.try_into()
.map_err(|e: ParsingMethodError| e.as_method_error(msg))?;
Ok(Method::Client2Server(Client2Server::Subscribe(method)))
}

the nested try_into()? (line 226 above) is implemented at:

impl<'a> TryFrom<StandardRequest> for Subscribe<'a> {
type Error = ParsingMethodError;
fn try_from(msg: StandardRequest) -> Result<Self, Self::Error> {
match msg.params.as_array() {
Some(params) => {
let (agent_signature, extranonce1) = match &params[..] {
// bosminer subscribe message
[JString(a), Null, JString(_), Null] => (a.into(), None),
[JString(a), JString(b)] => {
(a.into(), Some(Extranonce::try_from(hex::decode(b)?)?))
}
[JString(a)] => (a.into(), None),
[] => ("".to_string(), None),
_ => return Err(ParsingMethodError::wrong_args_from_value(msg.params)),
};
let id = msg.id;
let res = Subscribe {
id,
agent_signature,
extranonce1,
};
Ok(res)
}
None => Err(ParsingMethodError::not_array_from_value(msg.params)),
}
}
}
#[derive(Debug, Clone)]

@plebhash
Copy link
Collaborator Author

plebhash commented May 4, 2024

after going through the backtrace above, it seems the root cause of the issue is this specific line:

(a.into(), Some(Extranonce::try_from(hex::decode(b)?)?))

it is calling hex::decode on the extranonce1 field that was sent as the second parameter of the JSON-RPC message (JString(b)).

if b is a string with an odd number of characters, hex::decode returns an error, as documented in the Rust Docs for hex crate:

OddLength
A hex string’s length needs to be even, as two digits correspond to one byte.

so the question here is:

on mining.subscribe message extranonce1 is always encoded as hex bytes in the JSON-RPC, so there is no bug here.

an improvement to our tProxy implementation is to make it clear to the user that the provided extranonce1 needs to be a valid hex string. That would require adapting the match arm to check for how many hex characters there are, and call tracing::error!("Make sure the provided extranonce is a valid hex string") in case there's an odd number.

 [JString(a), JString(b)] => { 
+     if b.len() % 2 == 1 {
+         tracing::error!("Make sure the provided extranonce is a valid hex string. Beware that faulty SV1 firmware could send invalid extranonces with odd string length");
+     }
     (a.into(), Some(Extranonce::try_from(hex::decode(b)?)?)) 
 } 

@plebhash
Copy link
Collaborator Author

plebhash commented May 4, 2024

hey @nikicat I just re-read #851 (comment)

it was just a coincidence that a miner with an odd-sized extranonce stopped subscribing.

Did you see this with:

  • real ASIC hardware?
  • CPU mining? which (SRI or github.com/pooler/cpuminer)?

Tying back to the question I asked on my comment above (which is outside the scope of my SV1 knowledge):

on mining.subscribe message, is extranonce1 always encoded as hex bytes in the JSON-RPC?

Or in other words, does this situation (extranonce1 with odd characters) ever really happens in real life, or did you manually generate this mining.subscribe message?

@plebhash
Copy link
Collaborator Author

plebhash commented May 4, 2024

as a potential answer to my question from the above comments:

on mining.subscribe message, is extranonce1 always encoded as hex bytes in the JSON-RPC?

image
ref: https://en.bitcoin.it/wiki/Stratum_mining_protocol#mining.subscribe

if the extranonce1 value is hex encoded in the result response, that is a pretty good indication that it is also hex encoded in the request, which makes me believe that sending extranonce1 with an odd number of characters is an impossible situation

@plebhash
Copy link
Collaborator Author

plebhash commented Jun 1, 2024

@nikicat confirmed to me that this behavior was generated from a real machine. It is probably running crazy firmware that is not compliant with SV1 spec.

@plebhash
Copy link
Collaborator Author

plebhash commented Jun 1, 2024

closed via #851

@plebhash plebhash closed this as completed Jun 1, 2024
@Fi3
Copy link
Collaborator

Fi3 commented Jun 2, 2024

Not sure that we should accept odd len string. 1 byte is always 2 char so how is possible to have an odd len string? You assume a leading 0 that is not in the string, where is the 0?

I would rather close the connection since invalid coinbase means invalid block. (for example when the extranonce end up being 1 byte shorter or longer). If we are 100% that this can not happen, and we want to accept odd len string, we should document why is ok to accept odd len string.

@Fi3 Fi3 reopened this Jun 2, 2024
@plebhash
Copy link
Collaborator Author

plebhash commented Jun 2, 2024

Not sure that we should accept odd len string. 1 byte is always 2 char so how is possible to have an odd len string? You assume a leading 0 that is not in the string, where is the 0?

I would rather close the connection since invalid coinbase means invalid block. (for example when the extranonce end up being 1 byte shorter or longer). If we are 100% that this can not happen, and we want to accept odd len string, we should document why is ok to accept odd len string.

I think one way to be 100% sure is to reproduce this scenario via MG and check if it results on an invalid block.
From that we have two courses of action:

  • Invalid block happens: we revert PR #851 before dev branch is merged into main
  • No invalid block happens: we add a new PR documenting why this is accepted behavior on tProxy.

@Fi3 would you agree with this approach?

@plebhash plebhash added this to the 1.0.2 milestone Jun 2, 2024
@plebhash plebhash self-assigned this Jun 2, 2024
@Fi3
Copy link
Collaborator

Fi3 commented Jun 2, 2024

yep make sense

@plebhash
Copy link
Collaborator Author

plebhash commented Jun 2, 2024

Not sure that we should accept odd len string. 1 byte is always 2 char so how is possible to have an odd len string? You assume a leading 0 that is not in the string, where is the 0?
I would rather close the connection since invalid coinbase means invalid block. (for example when the extranonce end up being 1 byte shorter or longer). If we are 100% that this can not happen, and we want to accept odd len string, we should document why is ok to accept odd len string.

I think one way to be 100% sure is to reproduce this scenario via MG and check if it results on an invalid block. From that we have two courses of action:

  • Invalid block happens: we revert PR #851 before dev branch is merged into main
  • No invalid block happens: we add a new PR documenting why this is accepted behavior on tProxy.

@Fi3 would you agree with this approach?

ok I changed my mind... even though I'm 100% in favor of writing as many MG tests as possible, I don't think it's worth the effort on this issue.

From my research listed above, this is an impossible situation on a protocol level, and @Fi3 seems to be in agreement with this.

Even if this happens on real ASICs (@nikicat mentioned https://www.miningrigrentals.com), this seems to be a fault on firmware side that could potentially end up in invalid blocks, so I agree it should be rejected by tProxy.

I propose we revert #851 via #952 and finally close this issue.

@plebhash
Copy link
Collaborator Author

plebhash commented Jun 2, 2024

@nikicat we appreciate your contribution, I hope this doesn't take your motivation away

if you still want to help SRI, I have a suggestion on this comment: #896 (comment)

feel free to turn it into a new PR

@plebhash plebhash linked a pull request Jun 4, 2024 that will close this issue
@plebhash
Copy link
Collaborator Author

plebhash commented Jun 4, 2024

closed via #952

@plebhash plebhash closed this as completed Jun 4, 2024
@nikicat
Copy link

nikicat commented Jun 5, 2024

Hello, it's a little frustrating that fixing this bug took less time than the discussion about whether the fix should be accepted.
A bit of the story: I'm developing a Bitcoin <-> EVM bridge (one more, yeah) and need liquidity to test it on a testnet. I've struggled to find a reliable faucet for testnet3 so I tried to mine it using rented miners (from https://www.miningrigrentals.com/). To my confusion, the p2pool was abandoned many years ago, but luckily I found this project so I can solve my issue. I've set up a local cluster with pool, jds, jdc, and tproxy, made tproxy port available for external miners, and rented a couple of rigs from https://www.miningrigrentals.com/ hoping that I buy this way enough tBTC for my development. But pretty soon the HexError(OddLength) started to show in logs and ruin the miners' performance (yes, not all miners send truncated hex strings, but a significant part of them). I've looked through the Stratum codebase and pretty quickly found that it uses two different methods for parsing hex strings in received messages: using plain hex::decode and using (hex_decode)[https://github.com/nikicat/stratum/blob/dev/protocols/v1/src/utils.rs#L43-L51] from protocols/v1/src/utils.rs that patches odd-sized hex strings. I've come to the pretty obvious (at least for me) conclusion that it's just a little bug that this function is not used in all the cases, so I patched the code and my pool started to fully utilize rented miners. After that, I opened #851 PR to share my fix with others who will work with similarly behaving miners too. Afterward, I added some tests when @GitGab19 asked me to do so.
That's mostly the story, and I want to conclude:

  • There is no clear standard on the stratum v1 protocol AFAIK
  • I prefer to follow (Robustness principle)[https://en.wikipedia.org/wiki/Robustness_principle] when I develop network services
  • There could be (and there actually are - i.e. I have another similar non-compliant fix) other "violations" of the stratum v1 protocol by hardware miners

So, if you are not willing to accept such changes, what in your opinion is a "true fix" for my or similar cases? As you understand, I can't force miner owners to somehow "fix" their miners (I suppose they can't even if they wanted to). Should it be a (separate from Stratum) translation proxy that fixes "broken" stratum v1 messages? Or should it be just a more permissive fork of Stratum (this is my actual solution)? I just want to know your policy and plans so I can better allocate my time on this project.

@plebhash
Copy link
Collaborator Author

plebhash commented Jun 5, 2024

So, if you are not willing to accept such changes, what in your opinion is a "true fix" for my or similar cases? As you understand, I can't force miner owners to somehow "fix" their miners (I suppose they can't even if they wanted to). Should it be a (separate from Stratum) translation proxy that fixes "broken" stratum v1 messages? Or should it be just a more permissive fork of Stratum (this is my actual solution)? I just want to know your policy and plans so I can better allocate my time on this project.

I wrote a proposal on this comment: #896 (comment)

I understand your frustration, but from an objectively technical standpoint, a Proxy that allows for shares on invalid blocks is not a sane implementation of the protocol

economically speaking, the fact that there is faulty SV1 firmware in the wild is a market inefficiency, and you should re-evaluate on whether miningrigrentals.com is meeting your business needs. My advice is that you give them this feedback and look for alternatives.

hoping that I buy this way enough tBTC

if that is all you need, just drop a ckpool-solo. It's been around for ages and I already mined a bunch of testnet4 tBTC with it (however no rentals, just some ASIC I have locally).

@nikicat
Copy link

nikicat commented Jun 5, 2024

from an objectively technical standpoint, a Proxy that allows for shares on invalid blocks is not a sane implementation of the protocol

As far as I can tell all produced blocks are valid with this patch.

economically speaking, the fact that there is faulty SV1 firmware in the wild is a market inefficiency, and you should re-evaluate on whether miningrigrentals.com is meeting your business needs. My advice is that you give them this feedback and look for alternatives.

From my point of view, they work fine - I've got a bunch of testnet3 (and later testnet4) BTC for a reasonable price of several bucks.

@nikicat
Copy link

nikicat commented Jun 5, 2024

hoping that I buy this way enough tBTC

if that is all you want, just drop a ckpool-solo. It's been around for ages and I already mined a bunch of testnet4 tBTC with it (however no rentals, just some ASIC I have locally).

I'll look through it, thanks. But, taking into account that it's written in C and the time I've spent with the Stratum codebase, I think it's currently easier for me to continue working with Stratum.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done ✅
Development

Successfully merging a pull request may close this issue.

3 participants