Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(bigtable): Add a DirectPath fallback integration test #3384

Merged
merged 13 commits into from Jan 13, 2021

Conversation

mohanli-ml
Copy link
Contributor

Add a DirectPath fallback test for bigtable in Golang. The logic of the test is:

  1. First wait until DirectPath traffic is observed;
  2. Blackhole DirectPath net, and wait until the client fallback to use CFE;
  3. Unblackhole DirectPath net, and the client should upgrade back to use DirectPath again.
    The DirectPath and CFE traffic are distinguished by peer IP.

@mohanli-ml mohanli-ml requested review from tritone and a team as code owners December 4, 2020 18:20
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Dec 4, 2020
@product-auto-label product-auto-label bot added the api: bigtable Issues related to the Bigtable API. label Dec 4, 2020
Comment on lines 2147 to 2148
// Blackhole directpath address to test fallback.
func TestIntegration_DirectPathFallback(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Blackhole directpath address to test fallback.
func TestIntegration_DirectPathFallback(t *testing.T) {
// TestIntegration_DirectPathFallback tests the fallback when the directpath address is unavailable.
func TestIntegration_DirectPathFallback(t *testing.T) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. I remember I should let the first word in the comment the same as the function name. Thanks for this!

for i := 0; i < numRPCsToSend; i++ {
_, _ = table.ReadRow(ctx, "jadams")
if _, useDp := isDirectPathRemoteAddress(testEnv); useDp != isBlackhole {
atomic.AddUint64(&numCount, 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm missing something -- is this run in parallel?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bigtable client will open 4 channels by default (https://github.com/googleapis/google-cloud-go/blob/master/bigtable/bigtable.go#L78), so they could be run in parallel.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular code will not run in parallel. The code you linked to is for pooling the underlying network connections.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out. I removed the atomic operation.

Comment on lines 2323 to 2326
blackholeDpv6Cmd := "sudo ip6tables -I INPUT -s 2001:4860:8040::/42 -j DROP && sleep 5 && echo blackholeDpv6"
blackholeDpv4Cmd := "sudo iptables -I INPUT -s 34.126.0.0/18 -j DROP && sleep 5 && echo blackholeDpv4"
allowDpv6Cmd := "sudo ip6tables -I INPUT -s 2001:4860:8040::/42 -j ACCEPT && sleep 5 && echo allowDpv6"
allowDpv4Cmd := "sudo iptables -I INPUT -s 34.126.0.0/18 -j ACCEPT && sleep 5 && echo allowDpv4"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me really nervous. I don't expect a test to need sudo or alter iptables. Is there another way to do this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test itself is a bit tricky because we want to create a bad situation to make the client think it can not receive dp response (i.e., blackholed), so that we can test fallback. Previously in java, we modified the grpc netty library to achieve the blackhole (code here: https://github.com/googleapis/java-bigtable/blob/master/google-cloud-bigtable/src/test/java/com/google/cloud/bigtable/data/v2/it/DirectPathFallbackIT.java#L184), but this is very hacky and sorry I have no idea how to do a similar thing in Go.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that this seems really problematic as-is.

I'm more familiar with the HTTP clients but if I were testing something similar there, I would look into either setting up a small mock server to hit for my test, or using a custom roundtripper at the transport layer to produce the behavior I was looking for. Is it possible to do something similar in grpc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this test, we want to create a low level failure on TCP/IP, and grpc does not have any API to do that. Besides, a mock server is more like what we will use in unit tests, but this is an integration test and we want to make it as real as possible, so we may not want to use a mock server.

When we are running the tests, we have the ownership of the VM that runs the test (create the VM, use sudo to setup the VM, run the test, destroy the VM..), so it will fine to use sudo without worrying about environment consistency.

Also notice that users are not supposed to run this test, since users are not supposed to know if the traffic is DirectPath or CFE. I added a check for AttemptDirectPath flag at the beginning of the test. Does this looks good?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the feedback, tests should avoid mutating the machine state.
Ideally it should follow a pattern similar to the java impl:
https://github.com/googleapis/java-bigtable/blob/master/google-cloud-bigtable/src/test/java/com/google/cloud/bigtable/data/v2/it/DirectPathFallbackIT.java

Where the network failures are injected in process. It appears that grpc-go has some hooks to do something similar via the DialContext. the grpclb_fallback test that @kolea2 linked to, seems to use those hooks to test this exact scenario

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kolea2 @igorbernstein2 The grpclb_fallback test is using shell command to blackhole ip, https://github.com/grpc/grpc-go/blob/ff1fc890e43ac77e922f53a2cef396b3c6a8f2a1/interop/grpclb_fallback/client.go#L202-L207. And from what I know the command is iptables. So it seems we still need to use iptables here?

}

// Precondition: wait for DirectPath to connect.
countEnough := exerciseDirectPath(ctx, testEnv, table /*blackholeDP = */, false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go style does not have inline comments like this -- it's usually a sign the function being called should be split into more than one function. Perhaps one for the true case and one for the false case. That would also hopefully make it easier to understand & debug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I do not understand this comment. I found this test https://github.com/googleapis/google-cloud-go/blob/master/bigtable/integration_test.go#L2013 also have similar inline comments. So could you give me an example about how should I write comments? Thanks for help!

}
}

func exerciseDirectPath(ctx context.Context, testEnv IntegrationEnv, table *Table, isBlackhole bool) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear to me what the return value is. Perhaps a different function name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the function name to examineTraffic. The function checks and returns if enough DirectPath or CFE traffic is seen within 2 minutes.

@mohanli-ml mohanli-ml requested a review from tbpg December 4, 2020 21:19
Copy link
Contributor

@tritone tritone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @kolea2

Comment on lines 2323 to 2326
blackholeDpv6Cmd := "sudo ip6tables -I INPUT -s 2001:4860:8040::/42 -j DROP && sleep 5 && echo blackholeDpv6"
blackholeDpv4Cmd := "sudo iptables -I INPUT -s 34.126.0.0/18 -j DROP && sleep 5 && echo blackholeDpv4"
allowDpv6Cmd := "sudo ip6tables -I INPUT -s 2001:4860:8040::/42 -j ACCEPT && sleep 5 && echo allowDpv6"
allowDpv4Cmd := "sudo iptables -I INPUT -s 34.126.0.0/18 -j ACCEPT && sleep 5 && echo allowDpv4"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that this seems really problematic as-is.

I'm more familiar with the HTTP clients but if I were testing something similar there, I would look into either setting up a small mock server to hit for my test, or using a custom roundtripper at the transport layer to produce the behavior I was looking for. Is it possible to do something similar in grpc?

@mohanli-ml
Copy link
Contributor Author

I have an idea about the iptables. I still think we need to use iptables for blackhole directpath traffic (which is the same as Igor's example https://github.com/grpc/grpc-go/blob/ff1fc890e43ac77e922f53a2cef396b3c6a8f2a1/interop/grpclb_fallback/client.go#L46-L47). However, inspired by this example, I made the shell command flags, so that we can avoid them accidentally mess up machines. Please note that this test is supposed to only run internally on google3, instead of kokoro. What do you think? @tbpg @tritone @kolea2 @igorbernstein2

Copy link
Contributor

@tbpg tbpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the flag setup much more. Thanks!

for i := 0; i < numRPCsToSend; i++ {
_, _ = table.ReadRow(ctx, "jadams")
if _, useDp := isDirectPathRemoteAddress(testEnv); useDp != isBlackhole {
atomic.AddUint64(&numCount, 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular code will not run in parallel. The code you linked to is for pooling the underlying network connections.

Comment on lines +90 to +93
flag.StringVar(&blackholeDpv6Cmd, "it.blackhole-dpv6-cmd", "", "Command to make LB and backend addresses blackholed over dpv6")
flag.StringVar(&blackholeDpv4Cmd, "it.blackhole-dpv4-cmd", "", "Command to make LB and backend addresses blackholed over dpv4")
flag.StringVar(&allowDpv6Cmd, "it.allow-dpv6-cmd", "", "Command to make LB and backend addresses allowed over dpv6")
flag.StringVar(&allowDpv4Cmd, "it.allow-dpv4-cmd", "", "Command to make LB and backend addresses allowed over dpv4")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment with the expected values for these flags?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment added.

}

if len(blackholeDpv6Cmd) == 0 {
t.Fatal("-it.blackhole-dpv6-cmd unset")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this will fail the test if the flags aren't passed in. Perhaps we should skip instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have that 3 lines above, from L2164 to L2166:

if !testEnv.Config().AttemptDirectPath {
  return
}

If the test want to test DirectPath, then we can fail the test if no command is given.

}

// Precondition: wait for DirectPath to connect.
countEnough := examineTraffic(ctx, testEnv, table /*blackholeDP = */, false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this blackholdDP = comment means.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I want to make /*blackholeDP=*/ after the comma, like countEnough := examineTraffic(ctx, testEnv, table, /*blackholeDP = */false), to indicate the meaning of the parameter, i.e., DirectPath is blackholed or not, because it could be confusing with only true and false here. However, gofmt will move the comment before the comma.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those types of comments are not ordinarily done in Go style, which is why gofmt doesn't handle them well -- they're uncommon and not designed for. I gave a few suggestions below of ways we could simplify.

Comment on lines 2357 to 2359
}
} else {
if blackholeDP {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}
} else {
if blackholeDP {
}
return
}
if blackholeDP {

https://github.com/golang/go/wiki/CodeReviewComments#indent-error-flow is related.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks for catching the format!

@mohanli-ml mohanli-ml requested a review from tbpg December 18, 2020 00:31
@mohanli-ml mohanli-ml added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jan 5, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jan 5, 2021
Copy link
Contributor

@tbpg tbpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly a Go style review. I'm not 100% sure I understand the goal of all of the code, so some questions are very open.

minCompleteRPC = 40
)

countEnough := false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we return as soon as we set this to true, thereby avoiding the need for this variable at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I have removed the variable.


// examineTraffic counts RPCs use DirectPath or CFE traffic.
func examineTraffic(ctx context.Context, testEnv IntegrationEnv, table *Table, expectDP bool) bool {
var numCount uint64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider:

Suggested change
var numCount uint64
count := 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

numCount++
}
time.Sleep(100 * time.Millisecond)
countEnough = numCount >= minCompleteRPC
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the suggestion to return early above, the check should be above the time.Sleep call. No need to sleep if we're already done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 2219 to 2223
if _, useDP := isDirectPathRemoteAddress(testEnv); useDP != expectDP {
numCount++
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does useDP ever change in this loop? It appears that it will always be the same value?

Also, why send any RPCs if this will never be true? Could we just return before doing anything?

Perhaps the caller should check isDirectPathRemoteAddress then this function doesn't need the expectDP argument?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

useDP does get changed. useDP is returned by isDirectPathRemoteAddress(), which will return true if the traffic is DirectPath, and false if the traffic is CFE. So when DP is allowed, we expect useDP to be true, but when DP is blackholed, we expect useDP to be false.

Sorry the name of the variable expectDP is misleading, I have rename it as blackholeDP. Thus, when useDP != blackholeDP, we get the traffic as expected. Specifically as described in the PR description, this test has three stages, as follows:

  1. First wait until DirectPath traffic is observed;
    In this stage, useDP should be true, and blackholeDP should be false.

  2. Blackhole DirectPath net, and wait until the client fallback to use CFE;
    In this stage, useDP should be false, and blackholeDP should be true.

  3. Unblackhole DirectPath net, and the client should upgrade back to use DirectPath again.
    The DirectPath and CFE traffic are distinguished by peer IP.
    In this stage, useDP should be true, and blackholeDP should be false.

@@ -2259,3 +2343,34 @@ func isDirectPathRemoteAddress(testEnv IntegrationEnv) (_ string, _ bool) {
// DirectPath ipv6 can use either ipv4 or ipv6 traffic.
return remoteIP, strings.HasPrefix(remoteIP, directPathIPV4Prefix) || strings.HasPrefix(remoteIP, directPathIPV6Prefix)
}

func blackholeOrAllowDirectPath(testEnv IntegrationEnv, t *testing.T, blackholeDP bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we split this into two functions, one corresponding to blackholeDP=true and one for false?

That would get rid of the need for the blackholeDP argument at all, making calls more clear and easier to debug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

// Precondition: wait for DirectPath to connect.
countEnough := examineTraffic(ctx, testEnv, table /*blackholeDP = */, false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those types of comments are not ordinarily done in Go style, which is why gofmt doesn't handle them well -- they're uncommon and not designed for. I gave a few suggestions below of ways we could simplify.

defer cleanup()

if !testEnv.Config().AttemptDirectPath {
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this return or t.Skip? Return means the test "passes." So, skipping seems more correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 2186 to 2187
countEnough := examineTraffic(ctx, testEnv, table, false)
if !countEnough {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this name could be more clear?

Suggested change
countEnough := examineTraffic(ctx, testEnv, table, false)
if !countEnough {
dpEnabled := examineTraffic(ctx, testEnv, table, false)
if !dpEnabled {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 2349 to 2360
if testEnv.Config().DirectPathIPV4Only {
cmdRes := exec.Command("bash", "-c", blackholeDpv4Cmd)
out, _ := cmdRes.CombinedOutput()
t.Logf(string(out))
} else {
cmdRes := exec.Command("bash", "-c", blackholeDpv4Cmd)
out, _ := cmdRes.CombinedOutput()
t.Logf(string(out))
cmdRes = exec.Command("bash", "-c", blackholeDpv6Cmd)
out, _ = cmdRes.CombinedOutput()
t.Logf(string(out))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if testEnv.Config().DirectPathIPV4Only {
cmdRes := exec.Command("bash", "-c", blackholeDpv4Cmd)
out, _ := cmdRes.CombinedOutput()
t.Logf(string(out))
} else {
cmdRes := exec.Command("bash", "-c", blackholeDpv4Cmd)
out, _ := cmdRes.CombinedOutput()
t.Logf(string(out))
cmdRes = exec.Command("bash", "-c", blackholeDpv6Cmd)
out, _ = cmdRes.CombinedOutput()
t.Logf(string(out))
}
cmdRes := exec.Command("bash", "-c", blackholeDpv4Cmd)
out, _ := cmdRes.CombinedOutput()
t.Logf(string(out))
if testEnv.Config().DirectPathIPV4Only {
return
}
cmdRes = exec.Command("bash", "-c", blackholeDpv6Cmd)
out, _ = cmdRes.CombinedOutput()
t.Logf(string(out))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}
}

// examineTraffic counts RPCs use DirectPath or CFE traffic.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// examineTraffic counts RPCs use DirectPath or CFE traffic.
// examineTraffic returns whether RPCs use DirectPath (blackholeDP = false) or CFE (blackholeDP = true).

Is that right? I'm not sure if there is a more clear way to say it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is correct.

@mohanli-ml mohanli-ml requested a review from tbpg January 11, 2021 21:20
@tbpg tbpg added the automerge Merge the pull request once unit tests and other checks pass. label Jan 13, 2021
@tbpg tbpg merged commit e6684c3 into googleapis:master Jan 13, 2021
@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Jan 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigtable Issues related to the Bigtable API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants