Update from Chia on the Dust Storm #9049
10 comments · 54 replies
-
> Generally speaking the chain has handled it well with most nodes keeping things running smoothly

Could you provide some data showing that "most nodes [kept] things running smoothly"? Otherwise, we have to assume you pulled that out of a hat just to get off the hook. Also, not all traffic is transaction traffic. Plenty of farms saw huge numbers of stalled partials. Those are potentially farms whose upload bandwidth got overloaded by "low performance" / starving nodes. It looks like those nodes cannot process the data they are getting from the network fast enough, so they drop that data on the floor and start requesting the same data over and over.

> However, there are a decent number of nodes out there who are either running low performance nodes or are otherwise suboptimal in their configuration.

Could you be more specific about which resource requirements those nodes are not meeting? Blaming everything on the RPis out there is just nonsense. By that logic you could say all nodes need to run on the latest server-grade chips and home desktops won't cut it.

> Q: Why isn't Chia capable of preventing this?

If these are fact-of-life events, then why didn't Chia run such well-known scenarios on the test network? Do you have a QA department?

> Q: If I'm feeling strain on my node, is there anything I can do to alleviate it?

It looks like you don't quite understand the problem. The issue is not the number of connections, but rather the total upload bandwidth used. Also, your suggestion that "you have the option to terminate their connection" is basically asking farmers to play whack-a-mole: there are plenty of those starving nodes, and once you drop one, the next is waiting to connect. So why not prioritize farming traffic over peer-syncing requests? The second issue is how nodes back off from overloading the upload bandwidth. There appears to be no throttling protocol; nodes just happily choke the upload link while some starving nodes potentially request the same data over and over.
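Even a crude per-peer budget on sync responses would give the back-off behavior I am describing, while farming traffic (signage points, partials) bypasses it entirely. A minimal sketch of the idea (hypothetical names, not Chia's actual networking code):

```python
import time

class TokenBucket:
    """Caps the upload bytes/sec granted to a single syncing peer."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def try_consume(self, nbytes: int) -> bool:
        now = time.monotonic()
        # refill proportionally to elapsed time, capped at the burst size
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False  # defer the block response instead of saturating the uplink

# e.g. grant each syncing peer at most 1 MiB/s with a 4 MiB burst
bucket = TokenBucket(1 * 1024**2, 4 * 1024**2)
```

With something like this in place, a starving peer can only slow its own sync down; it cannot choke the upload bandwidth your partials need.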
-
Chia could only be so lucky to have the success that Ethereum garners, with an active developer community of over 200K. Ethereum is, and should be, the measure that new PoW/PoS/PoST blockchains seek to mimic to obtain the same kind of success. Ignoring what has worked and continues to work is to set off on an unknown and risky path, as we are now seeing. It is not too late to right this ship and move towards a better future. All that is needed is a simple public statement from the Chia team advising farmers with a node that "just isn't good enough" to move to pools that have developed Official Protocol solutions allowing them to farm without a node.
-
11/1 update

I wanted to give everyone a “day-after” update, since I’m sure you are all expecting one and we have some things to share as well. As I mentioned in my previous statement on Sunday, we’ve had a lot of the team heads-down this weekend, working out exactly what the pain points were behind the more obvious symptoms you felt, and determining what we could do differently to alleviate them.

All in all, machines that are above spec did fine through the Dust Storm (generally speaking, though more than a few who were above par had a majority of node peers who were underpowered and lagging behind, which hurt them). On the whole, the chain continued to progress, and transactions were processed. However, there were some signs of slowdown here and there: nodes with weak peers struggled, there were some issues with signage points arriving out of order for otherwise healthy nodes, some pools felt pains that trickled down to their farmers, and transactions with no fees attached were delayed a few hours. The takeaway is this: while the chain remained strong and stable, it was not a great user experience for about 20% of you, and had it continued indefinitely, the symptoms, while not catastrophic, were unignorable. We want to fix that.

We have always known there was plenty of room for optimization in how we do certain things, and like all software projects we balance going back to revisit optimizations against new features we need to add to advance the software forward. Over time we’ve done those things when we can, with the expectation that we would phase in more optimizations gradually, but ahead of the curve of network load growth and the need for them. This event, however, pushed that timetable up dramatically, shortening said curve, and so we shall do the same with our optimizations. Thanks to our anonymous tester, we have now zeroed in on several specific areas of potential optimization. Some are pretty clear to us; some require further testing to validate. Over the course of this week we’ll be adding a few of these optimizations to the forthcoming 1.2.11 release that we were already planning to put out within the week. Others will come in subsequent patches, depending on the body of work and validation needed. The exact details of those changes I’m not prepared to go into right now, because some are still up in the air, but they will be covered in the release notes of those updates and in any future post-mortems we may do.

In addition to this, the other fact of reality is that the heady days of constant zero-fee transactions are behind us. When blocks are not full, one can still send a zero-fee transaction and have it processed right away; however, if another Dust Storm kicks up, you will need to add a fee to your transaction to jump ahead. Even the bare minimum of 0.00005 will be enough to jump ahead of a Duster, however. This also means that pools who have not already implemented fee support in their back-end operations need to add it as well, to avoid delays in times of congestion. From the looks of things, many of them did this over the weekend, and those who haven’t yet are working on it. If you are wondering why your pool of choice seemingly never had an issue, odds are they were built from the ground up to always support fees and simply turned them on when the need arose, while also having a node with already-strong peers. We also have some work to do on our side regarding fees.
Just like some pools that did not implement fee support because it wasn’t a requirement and opted to work on it later in the interest of rapid deployment, we too do not yet have fee support for plotNFT commands in the GUI and CLI. The functionality to support it does exist in the RPC code itself, but no user-interface elements currently connect to it. We’ve got someone working on that as I post this, in parallel to the work being done on the optimizations. In the meantime, if you make a plotNFT change: if there is low traffic it will still go through right away; if there is high traffic it might take a few hours to process. Between these things, we expect to make meaningful changes over the coming days, as well as some reprioritizations over the coming weeks, that will reduce this pain. This will probably be the last “big” update I give on this (unless things get spicy again) until we do a post-mortem, though we’ll be around to answer questions where able. So in summary:
- Above-spec nodes weathered the storm; underpowered nodes, and healthy nodes peered with them, felt most of the pain.
- Optimizations are coming: some in the 1.2.11 release planned for this week, others in subsequent patches.
- Zero-fee transactions will be delayed during congestion; even the 0.00005 minimum fee is enough to jump ahead of a Duster.
- Pools need fee support in their back ends, and fee support for plotNFT commands in the GUI/CLI is being worked on now.
-
Ironically, on October 31st, in the middle of the storm, I had my highest daily reward from the pool, despite 3% stale partials and a signage-point mess all over the logs. If it had been a test, I'd have said you passed it.
-
When will you release a new version to fix the Dust Storm?
-
The Dust Storm exposed two problems: 1. delayed transactions, and 2. 10-20% of the network going down. Transaction fees can easily address the first problem; you are right about that. The second problem, however, is that just one guy in a basement, using two wallets and running one script for a couple of hours at a time, brought 20% of the network down. I would also assume that behind that 20% of the network that went down were about 50-70% of nodes (virtually all peers connecting to my node were the starving ones, based on 78 connected peers), as most nodes are on the low end of the spectrum. Imagine what would happen if just the people on this thread pooled together and ran that attack for a week (he/she has made the code available). This is the main problem. Nothing Else Matters / Metallica
-
Hey, for me this is a problem now. I planned to run the full node on a Pi 4 with a 128GB SD card. Now it sounds like that is not possible, because the software is too slow on low-end hardware (maybe because of Python?). But if I want an energy-efficient setup, it makes no sense to add a ~100W CPU at a decent price just for this task, purely because of inefficient software. And my main computer is not running all the time (and should not be; that's why I want to move everything to the Pi).

This inefficiency makes small farming near impossible. My computer eats more energy than the storage. Even a 100W nettop eats more power than my HDDs. And turning the farmer off 18 hours a day also makes no sense. I am absolutely clueless about what to do now. Sell my 45TB of drives and say goodbye to this project? Or wait a year until the software gets stable?

And just a side note: today my blockchain DB got screwed up and needed a resync for no reason. sqlite3 did not show any error on the database; the software just did not connect to the network. Only removing the blockchain DB fixed the problem. Annoying. No errors even in the log file... no connection, and loading spinners all over the GUI. It has taken 12 hours so far to reach block 450000, on a 100Mbit network with an M.2 SSD and a Ryzen 5 3600XT, with high CPU load and RAM usage across 10 start_full_node processes (each at 3-10% CPU and 50 to 400MB RAM)... no wonder a Pi would run out of breath. How long will that take in a year? One month? ^^

Just a little disappointed user who solo farmed since April and is now replotting for pooling. Feels like I was right with my skeptical view of Python as the main language for a blockchain project. Greetings
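For anyone else who hits this, the integrity check I ran was essentially the following (assuming the default 1.2.x database path; adjust if yours differs):

```
sqlite3 ~/.chia/mainnet/db/blockchain_v1_mainnet.sqlite "PRAGMA integrity_check;"
```

It reported "ok", which is exactly why the forced resync was so confusing.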
-
Also, Chia being Green isn't about a single <100TB farmer not wanting to draw another 100W-200W in their system; it's about the aggregate level. Read https://www.chia.net/2021/10/20/mining-vs-farming.en.html for actual data and facts.
-
Hi all, to return to the subject: the Dust Storm was a well-executed test with a well-executed response, even though I was severely affected and I'm not an IT guy. But from security stress-testing of systems I know what bad and good execution and response look like. Don't be afraid of testing the patience of underdog-project fanatics while you still can, because in 2 years you won't be able to afford it and users will expect smooth sailing 100% of the time ;)
-
Hello everyone, is there any way to identify the source? As I see it, this is just a strategy aimed at charging per transaction; so much fuss and so little solution. They can't, or won't, identify the culprit... Chia is getting ridiculous, from 1500 dollars down to 80 dollars... we already know how much greed abounds around the globe...
-
Over the last 24 hours there has been a lot of discussion about the current state of the Chia blockchain, and we wanted to clear up what is and isn't happening and what we are currently working on to address it.
Since mid-afternoon on Saturday the 30th (PST) there have been increasing waves of transaction spam, also commonly known on other crypto networks as a "Dust Storm". This is when an individual user sends exceptionally large numbers of minimum-sized transactions (in this case 1 mojo) to thousands of wallets, in an attempt to strain the network.
All they have really done, however, is take the unused overhead in each block, which until now was simply waiting to be filled with transactions, and filled more of it. Generally speaking the chain has handled it well, with most nodes keeping things running smoothly. Additionally, if you include a fee with your transaction (a previously unneeded step due to market demands), your transaction will leap ahead of the Duster's and deprioritize theirs.
However, there are a decent number of users out there who are either running low-performance nodes or are otherwise suboptimal in their configuration. (We are also currently investigating reports of edge cases where an optimal setup may also struggle at times.) These nodes are struggling to keep up, and as a result users dependent on them, either because it is the node they farm from or because their own node is peered with these nodes, are experiencing pain staying synced and farming. This pain has naturally spread to some pool operators as well (especially those who did not include transaction fee support in their code), which, depending on how their pool is built, may also impact their farmers.
While we trust the majority of the network to run smoothly and the rest of it to self-heal from this (and indeed it has, in the pauses between each wave), we recognize that the pain it brings to a non-trivial number of users is unacceptable from their point of view.
We have always known there was a lot of room for optimization in our code, particularly for full nodes running on low-end hardware like the Raspberry Pi 4, and like all software projects we have to balance carefully between spending resources on optimization and adding critical new functionality. We recognize now that there is a significant need for more optimizations sooner than we anticipated, and we are currently all hands on deck looking for ways to ship short-term optimization tweaks as well as long-term ones, to alleviate this pain for folks experiencing it.
While I don’t have specifics on what those are at this moment, rest assured the dev team is deep into this as we share this update, and we will have more as it becomes available. One thing that is clear now, however, is that the days of the "zero transaction fee" world are behind us. That unknown point on the horizon where TX fees would be a normal thing appears, at this point, to be today.
A quick Q&A:
Q: Why isn’t Chia capable of preventing this?
A: “Dust Storms” are a fact of life for any blockchain. They happen all the time; however, the combination of transaction fees and decentralization minimizes the impact to where you generally never see them. Because Chia is so new, we are still in the early stage of life where most blocks were partially empty and transaction fees were not needed. If anything, this will simply bring about the mainstream use of transaction fees sooner rather than later, which alleviates the majority of it. It did, however, highlight certain opportunities for optimization we had not yet prioritized, which we are looking into currently. (In fact, we early on implemented a "minimum" fee of 0.00005 for a 2-spend coin, by treating anything lower than that, all the way down to 1 mojo, the same as 1 mojo, for the express purpose of making these kinds of Dust Storms cost prohibitive and preventing "1 mojo, 2 mojo, 3 mojo" bidding wars.)
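To make that floor concrete (1 XCH = 10^12 mojo), here is an illustrative snippet of the rule as described above; note this is a sketch of the behavior, not our actual mempool code:

```python
MOJO_PER_XCH = 10**12
FEE_FLOOR_MOJO = 50_000_000  # 0.00005 XCH, the floor for a 2-spend coin

def effective_fee(fee_mojo: int) -> int:
    # Anything below the floor is treated the same as 1 mojo, so bidding
    # 1 vs 100 vs 10,000,000 mojo buys a Duster no extra priority.
    return fee_mojo if fee_mojo >= FEE_FLOOR_MOJO else 1

assert effective_fee(100) == effective_fee(10_000_000) == 1
assert effective_fee(FEE_FLOOR_MOJO) == FEE_FLOOR_MOJO
```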
Q: What can I do to make sure my transactions go through?
A: All transactions still go through, though they might be delayed by a block or so. If you want one to go through ASAP, just include a transaction fee of 0.0001 or higher, and you will stand well above the dust noise. (Note that transaction fees below the minimum are all treated as 0; there is no real difference between a 1 mojo fee and a 100 mojo fee.)
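From the CLI, attaching a fee looks roughly like this (flags per the 1.2.x tooling; run chia wallet send -h to confirm on your version):

```
chia wallet send -t xch1<recipient> -a 0.1 -m 0.0001
```

where -a is the amount and -m is the fee, both denominated in XCH.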
Q: My pool isn’t paying me as fast as they usually do, or calculating my rewards as quickly.
A: This is to be expected, since they rely on transactions to execute operations, and their nodes may be peered with slow, affected nodes. We are working with the pool operator community to help them implement transaction fees (for the ones who did not already have them) to prioritize their transactions. Rest assured your pool likely has your best interests in mind and is working to get your experience back to what you are used to, but also please note that these last few months have been an unusual world of “zero-fee transactions” that was bound to end sooner or later, which would require a shift in end-user expectations at some point.
Q: I’m running a node on a Pi, what can I do to make it better in light of this?
A: We’re still trying to understand which changes will and won’t make a difference for individuals on the lower end of the spectrum, and we will update you with more constructive guidance once we have hard facts. Some obvious steps that are good standards regardless: run your node DB off an SSD, NOT the internal SD card, and run the CLI version of Chia, not the GUI. In the meantime, while it is a suboptimal answer, if you DO have stronger hardware than the Pi available for running a node, we advise moving to that for the time being. You can often just transition your Pi to a remote harvester and farm from a more powerful node.
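Relocating the DB is mostly a matter of moving the db directory and leaving a symlink behind. A hedged sketch, where /mnt/ssd is an example mount point for your SSD:

```
chia stop all -d
mv ~/.chia/mainnet/db /mnt/ssd/chia-db
ln -s /mnt/ssd/chia-db ~/.chia/mainnet/db
chia start node
```

You can instead edit the database_path entry under full_node in config.yaml, but the symlink approach avoids touching config.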
Q: If I’m feeling strain on my node, is there anything I can do to alleviate it?
A: You can lower your default peer count in config.yaml from 80 to something smaller, like 40 or 50, or lower still based on your needs. Additionally, you can monitor your peer connections, and if you see peers that are woefully behind in blocks, showing no signs of catching up, not benefitting from you, and only dragging you down, you have the option to terminate their connection from the CLI. (Please only do this for nodes sandbagging you, however. If you see peers slowly catching up thanks to you, be a good neighbor and help them!) Also, if you are plotting on the same machine that runs your node, you could try splitting the workload between machines or temporarily pausing plotting while your node catches up. Lastly, while we encourage and support the spirit of Chia forks, halting them on your machine and freeing up resources for Chia specifically will obviously help, especially if you are one of those power users farming 10+ forks on one machine!
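Concretely, the peer count lives under the full_node section of config.yaml (key name per the 1.2.x config):

```yaml
full_node:
  target_peer_count: 40   # default is 80
```

and peers can be inspected and, if hopelessly behind, disconnected from the CLI:

```
chia show -c              # list connections with their peak heights
chia show -r <node id>    # drop a single lagging peer
```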
Q: Where can I get more information on what is happening as it unfolds?
A: You are welcome to swing through our Keybase server, where many of the team are interacting in real time with advice and support where we can provide it, in both the #general and #support channels. The most up-to-date announcements will likely hit the #announcements channel there first, before we distill them down into updates elsewhere.
Q: You mention making optimizations to the network because of this. Does this mean a fork is coming?
A: No. Chia was built in such a way that there are a great number of things we can improve and modify without the need for a network fork. Forking the chain has been, and will always be, a “break glass in case of emergency” solution to a critical situation, not a “make-things-easy” tool for tough problems.
Q: I’m a pool operator, what can/should I be doing right now?
A: First off, reach out to TheSargonas on Keybase and get added to our pool operators group so you can stay in touch with us and other pool ops in real time; this should be useful overall, not just for this event. Primarily, however, make sure you are including transaction fees going forward. Pools who implemented them last night after the first wave experienced little to no trouble when the bigger waves hit. Secondly, re-examine your node configuration. Months back at the onset, some pool operators deployed nodes in the cloud using low-spec instances, because at the time that was all they needed. As the weeks and months went by, many simply forgot to revisit that. Make sure your pool nodes are configured with the power they need, and maybe even add some auto-scaling where possible.
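For pools built on our Python wallet RPC client, adding a fee to payouts can be a small change. A minimal sketch (the helper below is illustrative and assumes a 1.2.x-era install; adapt wallet IDs, ports, and error handling to your own deployment):

```python
import asyncio

from chia.rpc.wallet_rpc_client import WalletRpcClient
from chia.util.config import load_config
from chia.util.default_root import DEFAULT_ROOT_PATH
from chia.util.ints import uint16, uint64

async def send_payout(address: str, amount_mojo: int, fee_mojo: int = 100_000_000):
    # 100_000_000 mojo = 0.0001 XCH, comfortably above the dust floor
    config = load_config(DEFAULT_ROOT_PATH, "config.yaml")
    client = await WalletRpcClient.create(
        config["self_hostname"],
        uint16(config["wallet"]["rpc_port"]),
        DEFAULT_ROOT_PATH,
        config,
    )
    try:
        # wallet_id "1" is the standard XCH wallet; the fee is what moves the
        # payout ahead of 1-mojo dust in the mempool
        tx = await client.send_transaction("1", uint64(amount_mojo), address, uint64(fee_mojo))
        return tx.name  # transaction id, handy for payout bookkeeping
    finally:
        client.close()
        await client.await_closed()

# e.g.: asyncio.run(send_payout("xch1...", 250_000_000_000))  # 0.25 XCH payout
```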