Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TINKERPOP-2487 add stdev and percentile steps #1375

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

junshiguo
Copy link
Contributor

https://issues.apache.org/jira/browse/TINKERPOP-2487

Added two frequently used analytical steps for calculation of standard deviation and percentile value. The example usage is

gremlin> g.V().values('ages')
==>1
==>2
==>3
gremlin> g.V().values('ages').stdev()
==>0.816
gremlin> g.V().values('ages').fold().stdev(Scope.local)
==>0.816

gremlin> g.V().values('ages').percentile(50)
==>2
// one percentile, return single value
gremlin> g.V().values('ages').percentile(0, 100)
==>[0: 1, 100: 3]
// multiple percentiles, return a map

@spmallette
Copy link
Contributor

@junshiguo sorry, it's taken some time to get to reviewing this. i'll start with some bigger picture feedback first:

  1. Could you please add some Upgrade Documentation to call attention to this new feature.
  2. I see that you supplied changes for .NET/Javascript - will you be able to add python as well?
  3. I see some tests, but don't see any tests for the Gremlin Process Suite (example) and GLV suite (example)

Once those more major items are settled I can look at the code itself more carefully. Thanks!

@krlawrence
Copy link
Contributor

krlawrence commented Jan 6, 2021

I would really like to see a product step added as part of this PR. I think it only adds stddev and percentile from looking at the code. If we are going to add new steps for statisticians, I would prefer we do the complete set rather than do this piecemeal. I had also suggested in the earlier discussions that variance would be appropriate to also add.

@spmallette
Copy link
Contributor

i'm not against accepting this with just the steps present but i see where you are coming from @krlawrence .

does it specifically need to be part of this PR? perhaps we could just make the additional steps a blocker for 3.5.0 and not release without it? how does that sound?

@krlawrence
Copy link
Contributor

That sounds like a sensible compromise @spmallette - for 3.5.0 it would be nice to try to get whatever remaining steps we think are needed by statisticians included as part of the release.

@junshiguo
Copy link
Contributor Author

@spmallette Sorry for the late reply. The PR is updated with documentation, python support and tests as requested.
product and variance steps implementation should be very similar. I will implement them later and submit for review.

@spmallette
Copy link
Contributor

Thanks for updating the PR and offering to do the product and variance work on a future PR. Looks like there are some test failures with gremlin-javascript which is causing travis to not pass the build.

@spmallette
Copy link
Contributor

Hello @junshiguo , I just thought I'd check in on this PR. I noticed there were still some test failures here. Did you need some help looking into them or have you just not had time to come back to this yet?

@spmallette
Copy link
Contributor

Hi @junshiguo - I was just wondering if you planned to return to this pull request? If not, I will plan to pick up where you left off. Thanks for your efforts so far.

@junshiguo
Copy link
Contributor Author

@spmallette Sorry I missed the messages. I am not very familiar with the GLV tests and having trouble fixing the test failures. It seems like the generated gremlin-javascript/src/main/javascript/gremlin-javascript/test/cucumber/gremlin.js is not correct. Would you please help here?

@spmallette
Copy link
Contributor

After looking at this in more detail, I think that there are several things to discuss:

  1. Part of the problem is in the JavaTranslator which isn't recognized the signature of your int varargs because it checks for Object[] and not the primitive forms. That's worked until now. I'm not sure how we best resolve that, but I'm not sure it's best to keep hacking in "side cases" to JavaTranslator.
  2. I wonder if int is the right data type there. Looking at SQL and Cypher their approach is to use double between 0.0 and 1.0.
  3. I wonder about the return type which changes based on the number of percentiles - i.e. you get a number for a single percentile but a Map for multiple. There is some precedence for that with select() though.

I think that items 2 and 3 probably need to be brought to the dev list for further discussion to see if anyone has any thoughts on the matter.

@krlawrence
Copy link
Contributor

This PR looked promising but appears to have stalled. Are you considering completing this work @junshiguo ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants