Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fix: Flow Directory. (test_flow_dir) #248

Open
wants to merge 33 commits into
base: develop
Choose a base branch
from

Conversation

bdevierno1
Copy link
Contributor

@bdevierno1 bdevierno1 commented Jul 14, 2020

Test flow directory NF bug fix.

Summary:

NF was crashing when attempting to add a new rule to the shared sdn_nf flow table.

Usage:
As described in readme.

This PR includes
Resolves issues
Breaking API changes
Internal API changes
Usability improvements
Bug fixes
New functionality
New NF/onvm_mgr args
Changes to starting NFs
Dependency updates
Web stats updates

Merging notes:

  • Dependencies: None

TODO before merging :

  • PR is ready for review

This resolves issue #247

Test Plan:

Run NF with Pktgen. Test that NF does not crash as before. As I made a change to the fill key functionality. All NFs that add/remove/lookup keys to a flow table need to be tested.

Review:

Anyone.

@onvm
Copy link

onvm commented Jul 14, 2020

In response to PR creation

CI Message

Your results will arrive shortly

@onvm
Copy link

onvm commented Jul 14, 2020

In response to PR creation

CI Message

Error: Failed to parse Pktgen stats

@bdevierno1 bdevierno1 closed this Jul 14, 2020
@bdevierno1 bdevierno1 reopened this Jul 17, 2020
@bdevierno1 bdevierno1 changed the title Bug fix: Not setting action after adding to table. Bug fix: Flow Directory. Jul 17, 2020
@bdevierno1
Copy link
Contributor Author

@onvm Hello HAL, do you read me?

@onvm
Copy link

onvm commented Jul 17, 2020

@onvm Hello HAL, do you read me?

CI Message

Your results will arrive shortly

@onvm
Copy link

onvm commented Jul 17, 2020

@onvm Hello HAL, do you read me?

CI Message

Error: Failed to fetch results from nimbnode23

@bdevierno1
Copy link
Contributor Author

In my last commit I have added a new feature/example populating service chains to the flow table #60 . You can then use this to test the onvm flow dir API by starting up up to four test_flow_dir NFs. Another way is to run this NF to populate the flow table then subsequently use the simple forward NF; but first modifying the packet handler to use the action next instead.

I have made further changes to the flow table api as it was printing the keys I had pre-defined backwards.

I have reverted the flow_entry variable to initialize to NULL. However, I had to allocate memory within the packet handler which will probably be slower. I will update this soon as there is probably a better way to do this.

@bdevierno1
Copy link
Contributor Author

bdevierno1 commented Jul 28, 2020

image
This is ready for review. If users set destination to 255 it will use the next action. Notable changes in last few commits involve using onvm's rss hash during table lookup.

@bdevierno1
Copy link
Contributor Author

@onvm Hello HAL, do you read me?

@onvm
Copy link

onvm commented Jul 28, 2020

@onvm Hello HAL, do you read me

CI Message

Your results will arrive shortly

1 similar comment
@onvm
Copy link

onvm commented Jul 28, 2020

@onvm Hello HAL, do you read me

CI Message

Your results will arrive shortly

Copy link

@onvm onvm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@onvm Hello HAL, do you read me

CI Message

Run successful see results:
Test took: 14 minutes, 59 seconds
✔️ Pktgen performance check passed
❌ PR drops Speed Test performance below minimum requirement

[Results from nimbnode23]

  • Median TX pps for Pktgen: 7735755
    Performance rating - 100.46% (compared to 7700000 average)

  • Median TX pps for Speed Tester: 39947208
    Performance rating - 95.11% (compared to 42000000 average)

@kevindweb
Copy link
Contributor

@onvm is it really slow?

@onvm
Copy link

onvm commented Jul 28, 2020

@onvm is it really slow?

CI Message

Your results will arrive shortly

Copy link

@onvm onvm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@onvm is it really slow?

CI Message

Run successful see results:
Test took: 4 minutes, 1 second
✔️ Pktgen performance check passed
❌ PR drops Speed Test performance below minimum requirement

[Results from nimbnode23]

  • Median TX pps for Pktgen: 7733636
    Performance rating - 100.44% (compared to 7700000 average)

  • Median TX pps for Speed Tester: 39891946
    Performance rating - 94.98% (compared to 42000000 average)

Copy link
Member

@koolzz koolzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment

@@ -102,7 +102,7 @@ onvm_pkt_process_tx_batch(struct queue_mgr *tx_mgr, struct rte_mbuf *pkts[], uin
// and !<return value> is 1.
nf->stats.act_drop++;
nf->stats.tx += !onvm_pkt_drop(pkts[i]);
} else if (meta->action == ONVM_NF_ACTION_NEXT) {
} else if (meta->action == ONVM_NF_ACTION_NEXT || meta->destination == 255) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No magic numbers please, what is this 255? Why do we need it?
Can't the packet just set action ONVM_NF_ACTION_NEXT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a special number is purposeful since we want to use this to indicate the next action even if the NF wasn't designed for that purpose (eg this lets us make the SimpleFwd NF use the "next action" behavior if you set 255 at the command line as the destination ID).

But you are right that it shouldn't be totally magic -- @bdevierno1 you should #define a constant somewhere for this instead of using 255 directly here.

Copy link
Member

@koolzz koolzz Jul 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, feels like a lazy way to change the action(we could just bake in the same if meta->destination ==255 -> meta->action = ONVM_NF_ACTION_NEXT into the NF itself) but if people find it easier lets define a global const.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bdevierno1 Also if we're doing this, after #defining the constant (probably in common.h) do a onvm mgr startup check to confirm that your const number is more than MAX_NFS. Otherwise if someone has 512 NFS configured they will be suprised that forwarding to 256 doesn't work

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line was resulting in the perf drop with speed tester. I couldn't find another way to optimize this besides moving the ONVM_NF_ACTION_TONF if statement above the one for the NEXT action on line 105. But I think this would feel like it would be an over-optimization for speed tester? I'll update to use the global const for now. I could also update all the sample NFs to use a flag instead to indicate to use the next action as well if we decide that's better.

Benjamin De Vierno added 2 commits July 28, 2020 22:14
@bdevierno1
Copy link
Contributor Author

@onvm Test please.

@onvm
Copy link

onvm commented Jul 29, 2020

@onvm Test please.

CI Message

Your results will arrive shortly

Copy link

@onvm onvm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@onvm Test please.

CI Message

Run successful see results:
Test took: 4 minutes, 3 seconds
✔️ Pktgen performance check passed
❌ PR drops Speed Test performance below minimum requirement

[Results from nimbnode23]

  • Median TX pps for Pktgen: 7732411
    Performance rating - 100.42% (compared to 7700000 average)

  • Median TX pps for Speed Tester: 39828650
    Performance rating - 94.83% (compared to 42000000 average)

@twood02 twood02 changed the title Bug fix: Flow Directory. Bug fix: Flow Directory. (test_flow_dir) Jul 29, 2020
@bdevierno1
Copy link
Contributor Author

Ran two tests using c220g1 nodes in Wisconsin site. Noticed about a 2% performance drop.
Screen Shot 2020-08-03 at 3 59 31 PM
Without setting 255 as next action.
Screen Shot 2020-08-03 at 4 01 07 PM

} else if (meta->action == ONVM_NF_ACTION_NEXT) {
/* TODO: Here we drop the packet : there will be a flow table
in the future to know what to do with the packet next */
} else if (meta->action == ONVM_NF_ACTION_NEXT || meta->destination == ACTION_NEXT_DEST_ID) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bdevierno1 setting 255 as the next action would definitely decrease performance, but I think that (from CI) we see that just having to process this or || statement in the condition slows down other NFs. Because we know that speed_tester would not have set the meta->destination to 255. I think that's the cause of the ~4% performance drop from CI

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is correct. Unfortunately, I tried different ways to optimize this but did not see any substantial improvements. By removing the OR I will still have to set another conditional which will have to be placed before this statement so as to maintain the same logic. Perhaps there is another way to do this?

dennisafa
dennisafa previously approved these changes Dec 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants