HDDS-8101. Add FSO repair tool to ozone CLI in read-only and repair modes. #6608

Open: wants to merge 6 commits into base: master

DaveTeng0
Contributor

What changes were proposed in this pull request?

Bugs like HDDS-7592 can break the FSO tree and cause data to be orphaned in the OM. We have developed a tool to identify and repair this condition in the OM and tested it on affected clusters. This jira is to contribute the tool back to the community under the ozone CLI.
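
To make the failure mode concrete, here is a minimal sketch of how orphaned entries can be detected in a parent-pointer tree: walk down from the root and report anything that was never reached. This is not the FSORepairTool implementation. The `OrphanFinder` and `Entry` classes, the object IDs, and the in-memory list are hypothetical stand-ins for the OM's directory and file tables.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: find entries unreachable from the root of a parent-pointer tree. */
final class OrphanFinder {

  /** Hypothetical stand-in for a directory or file row: its own ID and its parent's ID. */
  static final class Entry {
    final long objectId;
    final long parentId;

    Entry(long objectId, long parentId) {
      this.objectId = objectId;
      this.parentId = parentId;
    }
  }

  /** Returns the IDs of all entries that cannot be reached from rootId, in sorted order. */
  static Set<Long> findOrphans(long rootId, List<Entry> entries) {
    // Index children by parent ID so the walk can go top-down.
    Map<Long, List<Long>> childrenByParent = new HashMap<>();
    for (Entry e : entries) {
      childrenByParent.computeIfAbsent(e.parentId, k -> new ArrayList<>()).add(e.objectId);
    }

    // Breadth-first walk from the root, marking everything reachable.
    Set<Long> reachable = new HashSet<>();
    Deque<Long> queue = new ArrayDeque<>();
    queue.add(rootId);
    while (!queue.isEmpty()) {
      Long current = queue.remove();
      List<Long> children = childrenByParent.getOrDefault(current, Collections.<Long>emptyList());
      for (Long child : children) {
        if (reachable.add(child)) {
          queue.add(child);
        }
      }
    }

    // Anything not marked is orphaned: its parent chain never reaches the root.
    Set<Long> orphans = new TreeSet<>();
    for (Entry e : entries) {
      if (!reachable.contains(e.objectId)) {
        orphans.add(e.objectId);
      }
    }
    return orphans;
  }
}
```

For example, with root ID 1 and entries (2, parent 1), (3, parent 2), (4, parent 99), (5, parent 4), entries 4 and 5 are reported because parent 99 does not exist.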

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8101

How was this patch tested?

Unit test, integration test.

@DaveTeng0
Contributor Author

cc. @errose28

Contributor

@adoroszlai adoroszlai left a comment

Thanks @DaveTeng0 for the patch.

Some comments about POM and CLI. Note: I haven't checked the code of the tool itself (FSORepairTool).

@errose28 changed the title from "Add FSO repair tool to ozone CLI in read-only and repair modes" to "HDDS-8101. Add FSO repair tool to ozone CLI in read-only and repair modes." on Apr 30, 2024
Contributor

@errose28 errose28 left a comment

I think we still need to decide what the CLI for this should look like. We could do ozone {debug,repair} fso-tree or ozone repair fso-tree [--dry-run]. Also, as we add more of these types of commands, I think ones that are specific to a component should be under their own subcommand for organization, like ozone repair om fso-tree.

Attila also brought up the --dry-run mode. I think if the command is under repair only, then dry run would not be the expected default value. If we add the read-only invocation under debug then that becomes the equivalent of dry run and no flag is needed.
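
One way to read the `[--dry-run]` option: the command repairs by default and only reports when the flag is present. Below is a minimal sketch of that behavior in plain Java with no CLI library. The `RepairOptions` class, its hand-rolled parsing, and the example path are hypothetical and are not the Ozone or PicoCLI code.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: a repair command that mutates by default and only reports under --dry-run. */
final class RepairOptions {
  final boolean dryRun;

  private RepairOptions(boolean dryRun) {
    this.dryRun = dryRun;
  }

  /** Parses the arguments; repair (not dry run) is the default when the flag is absent. */
  static RepairOptions parse(String... args) {
    List<String> argList = Arrays.asList(args);
    return new RepairOptions(argList.contains("--dry-run"));
  }

  /** Describes what a run with these options would do to an orphaned entry. */
  String actionFor(String orphanName) {
    return dryRun
        ? "would repair " + orphanName + " (no changes written)"
        : "repaired " + orphanName;
  }

  public static void main(String[] args) {
    // The path is a made-up example, not output from a real cluster.
    System.out.println(parse(args).actionFor("/vol/bucket/dir1"));
  }
}
```

If the read-only invocation lives under debug instead, the same logic applies with the flag fixed to true and never exposed to the user.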

@DaveTeng0
Contributor Author

> I think we still need to decide what the CLI for this should look like. We could do ozone {debug,repair} fso-tree or ozone repair fso-tree [--dry-run]. Also, as we add more of these types of commands, I think ones that are specific to a component should be under their own subcommand for organization, like ozone repair om fso-tree.
>
> Attila also brought up the --dry-run mode. I think if the command is under repair only, then dry run would not be the expected default value. If we add the read-only invocation under debug then that becomes the equivalent of dry run and no flag is needed.

Yeah! I extracted the common code between FSODebugCLI and FSORepairCLI into separate base classes, FSOBaseCLI and FSOBaseTool, so that both commands reuse the same logic.

@DaveTeng0
Contributor Author

Hello team! Please let me know if there are any new comments. Thanks!

Contributor

@errose28 errose28 left a comment

Hi @DaveTeng0 I just did a quick pass on the code outside of the main repair logic.

There seem to be a lot of failures on your branch, although I haven't looked deeply and it could just be a bad CI run. Can you get that to a green state?

For the CLI, if we go the route of different debug and repair commands, I think we want ozone debug om fso-tree and ozone repair om fso-tree to be the two options. Just having ozone repair om fso-tree [--dry-run] isn't a bad approach either. I think I slightly prefer that one, and it prevents the need for an extra om subcommand under each option (for now at least), but I'm ok with two separate commands as well.

@@ -87,6 +89,11 @@ public static ManagedRocksDB open(
);
}

public static ManagedRocksDB open(final String path) throws RocksDBException {
Contributor

I think we actually want to use RocksDatabase here, which wraps ManagedRocksDB. Looks like it should already have all the operations we need implemented except drop column family, which would need to be added to RocksDatabase.

Contributor Author

Yeah, makes sense! Updated to use the RocksDatabase class.

public Void call() throws Exception {

try {
// TODO case insensitive enum options.
Contributor

We can add this to the change, it should be easy to support. I think this was copy/pasted into the other child classes too.
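
Case-insensitive enum options can be handled with a small lookup helper. The sketch below is plain Java, and the `EnumOptions` class and `Mode` enum are hypothetical placeholders for the tool's real option values. PicoCLI also has a parser setting for this (`CommandLine.setCaseInsensitiveEnumValuesAllowed`), which may be the simpler route if it fits how the command is constructed.

```java
import java.util.Arrays;

/** Hypothetical sketch: resolve an enum constant from user input regardless of case. */
final class EnumOptions {

  /** Hypothetical mode enum standing in for the tool's real option values. */
  enum Mode { DEBUG, REPAIR }

  /** Returns the constant whose name matches value ignoring case, or throws listing the valid choices. */
  static <E extends Enum<E>> E parseIgnoreCase(Class<E> type, String value) {
    String trimmed = value.trim();
    for (E constant : type.getEnumConstants()) {
      if (constant.name().equalsIgnoreCase(trimmed)) {
        return constant;
      }
    }
    throw new IllegalArgumentException(
        "Invalid value '" + value + "', expected one of "
            + Arrays.toString(type.getEnumConstants()));
  }
}
```

With this helper, `repair`, `Repair`, and `REPAIR` all resolve to the same constant, and an unknown value fails with a message that lists the accepted names.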

Contributor Author

Updated to use a boolean '--dry-run' flag instead.

Comment on lines 27 to 29
/**
* Parser for scm.db file.
*/
Contributor

This is an old comment that needs to be updated. Child classes have this as well.

Contributor Author

Based on another suggestion, this class has been removed.

"INFO and DEBUG levels."
)
@MetaInfServices(SubcommandWithParent.class)
public class FSOBaseCLI implements Callable<Void>, SubcommandWithParent {
Contributor

I'm not sure there's enough shared code to warrant a base class for debug and repair. The shared CLI flags like --db and --verbose can be used in both commands with PicoCLI mixins.

Contributor Author

Based on another suggestion, this class has been removed.

}

if (verbose) {
System.out.println("FSO inspection finished. See client logs for results.");
Contributor

This needs to be updated as well since I think in a previous comment we decided to output everything to stdout and use --verbose to change the level of output. This is more user friendly than log4j for client output.
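
A minimal sketch of what stdout-based output with a verbosity switch could look like, in plain Java. The `Reporter` class, its method names, and the sample messages are hypothetical and are not taken from the patch.

```java
import java.io.PrintStream;

/** Hypothetical sketch: client output goes to a PrintStream, with --verbose gating the detail lines. */
final class Reporter {
  private final PrintStream out;
  private final boolean verbose;

  Reporter(PrintStream out, boolean verbose) {
    this.out = out;
    this.verbose = verbose;
  }

  /** Always printed: summary-level results the user needs to see. */
  void info(String message) {
    out.println(message);
  }

  /** Printed only when verbose output was requested. */
  void detail(String message) {
    if (verbose) {
      out.println(message);
    }
  }

  public static void main(String[] args) {
    boolean verbose = args.length > 0 && args[0].equals("--verbose");
    Reporter reporter = new Reporter(System.out, verbose);
    reporter.detail("Checking directory /vol/bucket/dir1"); // made-up path
    reporter.info("FSO inspection finished.");
  }
}
```

Passing the stream in through the constructor keeps the tool testable, since a test can substitute a buffer for `System.out`.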

Contributor Author

Makes sense! Updated to use log4j here.

3 participants