Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHOENIX-7108 Provide support for pruning expired rows of views using Phoenix level compactions #1855

Open
wants to merge 25 commits into
base: PHOENIX-6978-feature
Choose a base branch
from

Conversation

jpisaac
Copy link
Contributor

@jpisaac jpisaac commented Mar 12, 2024

No description provided.

@jpisaac
Copy link
Contributor Author

jpisaac commented Mar 12, 2024

Rebased old PR with the feature branch master.
Fixed ViewTTLIT test failures.

@jpisaac
Copy link
Contributor Author

jpisaac commented Mar 12, 2024

TableTTLIT failing with the following error

Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): java.io.IOException: Key a/0:_0/1710201666000/Put/vlen=1/seqid=137 followed by a smaller key a/0:/1710201636000/DeleteFamily/vlen=0/seqid=107 in cf 77875878a1beda46043a6f04753c8bdc/0
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:451)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:132)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:369)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:349)
Caused by: java.lang.AssertionError: Key a/0:_0/1710201666000/Put/vlen=1/seqid=137 followed by a smaller key a/0:/1710201636000/DeleteFamily/vlen=0/seqid=107 in cf 77875878a1beda46043a6f04753c8bdc/0

@kadirozde will need your help to resolve this TF

@@ -63,8 +89,8 @@ public class CompactionScanner implements InternalScanner {
private final Configuration config;
private final RegionCoprocessorEnvironment env;
private long maxLookbackWindowStart;
private long ttlWindowStart;
private long ttl;
//private long ttlWindowStart;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These commented lines should be removed

this.emptyCF = emptyCF;
this.emptyCQ = emptyCQ;
this.emptyCF = SchemaUtil.getEmptyColumnFamily(table);
this.emptyCQ = table.getEncodingScheme() == PTable.QualifierEncodingScheme.NON_ENCODED_QUALIFIERS ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can add a new method called SchemaUtil.getEmptyColumnName(table)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added SchemaUtil.getEncodingScheme(table), I am assuming that's what u meant

Copy link
Contributor

@kadirozde kadirozde Mar 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant SchemaUtil.getEmptyColumnName(table) or maybe better to call it SchemaUtil.getEmptyColumnQualifier(table). Since we have SchemaUtil.getEmptyColumnFamily(table), we should also have SchemaUtil.getEmptyColumnQualifier(table)

phoenixLevelRowCompactor = new PhoenixLevelRowCompactor();
hBaseLevelRowCompactor = new HBaseLevelRowCompactor();
emptyCFStore = familyCount == 1 || columnFamilyName.equals(Bytes.toString(emptyCF)) || localIndex;
// TODO: check if is it appropriate to throw an IOException here
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be okay to generate an exception here. What is the reason for throwing an exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exceptions can be thrown if SYSCAT is not available or calls to SYSCAT fails.

this.isMultiTenant = table.isMultiTenant();

int startingPKPosition = 0;
if (this.isMultiTenant && this.isSalted && this.isIndexTable) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we turn this into more readable if then else statements by checking only one condition per if statement? I think it is better to create a lookup table for this, for example, int[][][] StartingPKPosition = new int[2][2][2];

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline created a more verbose if then else statement for the 8 conditions.

return node;
}

int index = (int) key[depth] % (int) R;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the following be sufficient to convert byte to unsigned int and simplify the code?
int index = key[depth] & 0xFF;

private static final Logger LOGGER = LoggerFactory.getLogger(RowKeyMatcher.class);

public static final int R = 256;
private TrieNode root = new TrieNode();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trie data structure implemented here is not efficient in terms of space and time. We should be using a compressed trie data structure. I suggest using PATRICIA, please see https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/trie/PatriciaTrie.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not have to replace the current implementation of trie with PATRICIA in this PR. We can do it in a separate improvement jira later.

}
Connection
tableConnection =
QueryUtil.getConnectionOnServer(tenantProps, configuration);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The connection is not closed here. I suggest opening it within a try statement. Also, can we use one global connection instead of a separate connection for every view?

@jpisaac
Copy link
Contributor Author

jpisaac commented May 25, 2024

FYI @kadirozde I think the PR is almost ready except to removing debug logging and additional docs and comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants