Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MNMG KNNs updates #5073

Draft
wants to merge 10,000 commits into
base: branch-23.02
Choose a base branch
from
Draft

Conversation

viclafargue
Copy link
Contributor

This PR will :

  • Add metric and metric arg to MNMG NearestNeighbors, KNeighborsClassifier and KNeighborsRegressor
  • Implement and expose sparse API for MNMG NearestNeighbors

Answers #3464

GPUtester and others added 30 commits March 24, 2022 20:32
[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]
Goals:
- Python documentation should be built for PRs 
- Checks should fail upon sphinx warnings
- Existing warnings are fixed in this PR

This PR covers only the python documentation. C++ docs are out of scope.

Authors:
  - Rory Mitchell (https://github.com/RAMitchell)
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4585
[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]
Continuation of rapidsai#2975 to change docstring to a doctest format.
I'm also adding a pytest file to execute doctest, similar to what's being done in [cudf](https://github.com/rapidsai/cudf/blob/branch-22.04/python/cudf/cudf/tests/test_doctests.py)

Authors:
  - Micka (https://github.com/lowener)
  - Yuqiong Li (https://github.com/yuqli)
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Michael Demoret (https://github.com/mdemoret-nv)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4618
[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]
… single GPU (rapidsai#4645)

Fixes single GPU build by dividing decomposition pxd headers so that cumlprims files are not included in single GPU cython files.

Authors:
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: rapidsai#4645
[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]
Replace `cudf.logical_not` with `~`.

Authors:
  - Andy Adinets (https://github.com/canonizer)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4669
- removed default `T = float` from multi-sum
- tests with float64 for multi-sum and `child_index()`
- refactored multi-sum tests to reduce shared memory usage
  - this is needed for the tests with float64 to compile

This is pull request 1 of 3 to integrate rapidsai#4646. This pull request is partly based on the work by @levsnv.

Authors:
  - Andy Adinets (https://github.com/canonizer)

Approvers:
  - Divye Gala (https://github.com/divyegala)
  - William Hicks (https://github.com/wphicks)

URL: rapidsai#4648
This small PR :
* Rectifies/clarifies the wrong parameter description for `max_depth` parameter in `RandomForestRegressor` and `RandomForestClassifier` in RF and dask-RF. 
* Corrects a syntax error in dask-rf docs
* Corrects the `max_depth` check in RF-python layer

resolves rapidsai#4623

Authors:
  - Venkat (https://github.com/venkywonka)

Approvers:
  - Rory Mitchell (https://github.com/RAMitchell)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4666
[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]
We don't have arm64 packages for `rapids-doc-env`, so only run the doc builds for x86_64

Authors:
   - Ray Douglass (https://github.com/raydouglass)

Approvers:
   - AJ Schmidt (https://github.com/ajschmidt8)
[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]
Templatized functions related to FIL inference in preparation of `float64` support.

Instantiations of templates with `float64`, or tests for `float64`, _are not included_; they will be provided in a future pull request.

This is pull request 2 of 3 to integrate rapidsai#4646. This pull request is partly based on the work by @levsnv.

Authors:
  - Andy Adinets (https://github.com/canonizer)
  - Levs Dolgovs (https://github.com/levsnv)
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Divye Gala (https://github.com/divyegala)
  - William Hicks (https://github.com/wphicks)

URL: rapidsai#4655
Fix rapidsai#4638.

Authors:
   - Rory Mitchell (https://github.com/RAMitchell)

Approvers:
   - Dante Gama Dessavre (https://github.com/dantegd)
[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]
Add a `libcuml-tests` package to the `libcuml` recipe:
- This is a prerequisite for removing "Project Flash" from our build/CI scripts.
- The `libcuml-tests` package was added as an additional output to the existing `libcuml` recipe (which was renamed to `libcuml-split`)

Authors:
  - Jordan Jacobelli (https://github.com/Ethyling)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4635
`float64` support in FIL core and tests.

Added `float64`-based instantiations of FIL types and functions, together with `float64`-based FIL tests.

Treelite and Python layers _are not covered_ here, but will be in separate pull requests.

This is pull request 3 of 3 to enable `float64` support in FIL core. This is partly based on the work by @levsnv.

Authors:
  - Andy Adinets (https://github.com/canonizer)
  - Levs Dolgovs (https://github.com/levsnv)

Approvers:
  - Divye Gala (https://github.com/divyegala)

URL: rapidsai#4646
[REVIEW] Unpin `dask` & `distributed` for development
This PR fixes a minor typo in the cuML intro documentation.

Authors:
  - Christopher Akiki (https://github.com/cakiki)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4695
CMake 3.23.1 contains the bug fixes that we need to make use of CMake 3.23, so now we can update the pinnings to just avoid 3.23.0.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Jordan Jacobelli (https://github.com/Ethyling)
  - Robert Maynard (https://github.com/robertmaynard)

URL: rapidsai#4698
`float64` support in treelite->FIL import and Python layer

Authors:
  - Andy Adinets (https://github.com/canonizer)
  - Levs Dolgovs (https://github.com/levsnv)

Approvers:
  - Philip Hyunsu Cho (https://github.com/hcho3)
  - William Hicks (https://github.com/wphicks)

URL: rapidsai#4690
Increased test coverage for TreeExplainer, greatly expanding model types tested. New tests take around 4.8s on my machine.

Fixes rapidsai#4352

New bugs found:
rapidsai#4663
dmlc/treelite#375
rapidsai#4670

Authors:
  - Rory Mitchell (https://github.com/RAMitchell)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4671
This PR:

- [x] Moves all pytest from `test` to `tests`
- [x] Converts `test` module to `tests` directory
- [x] Creates a `testing` module that contains all utility functions for pytests in `tests` need.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Jordan Jacobelli (https://github.com/Ethyling)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4703
In this PR, I'll wrap `dask-glm` models so that it accepts `dask_cudf Dataframes` and behaves like other `cuml.dask` models. `dask-glm` provides three estimators: `LogisticRegression`, `LinearRegression` and `PoissonRegression`. MNMG  `LogisticRegression` is requested by @beckernick . @JohnZed for visibility. Thank you all.

Authors:
  - Jiwei Liu (https://github.com/daxiongshu)
  - Nick Becker (https://github.com/beckernick)
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Ray Douglass (https://github.com/raydouglass)
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#3512
beckernick and others added 4 commits December 18, 2022 22:36
Closes rapidsai#5032
TODO: Add this docstring to doctest

Authors:
  - Micka (https://github.com/lowener)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#5041
The KMeans C++ code is already tested in Raft.
Closes rapidsai#5058

Authors:
  - Micka (https://github.com/lowener)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#5098
Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loving the updates so far! This is going to be awesome when we can swap out the FAISS calls for RAFT. I know this is still draft but I figured i'd leave some early feedback while I was looking through it.

params.rowMajorQuery,
&start_indices_long,
raft::distance::DistanceType::L2SqrtExpanded);
raft::spatial::knn::brute_force_knn<std::int64_t, float, std::size_t>(handle,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may as well start using the raft::neighbors::brute_force::knn here while we're updating this code. It shouldn't conflict w/ the raft::spatial::knn namespace since that's typedef'd to raft::neighbors now.

@@ -25,6 +25,7 @@ from cuml.neighbors.nearest_neighbors_mg import NearestNeighborsMG

from pylibraft.common.handle cimport handle_t
from cuml.common.opg_data_utils_mg cimport *
from cuml.metrics.distance_type cimport DistanceType
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having this defined in two places, why don't we start using pylibraft.distance.DistanceType instead?

@@ -25,6 +25,7 @@ from cuml.neighbors.nearest_neighbors_mg import NearestNeighborsMG

from pylibraft.common.handle cimport handle_t
from cuml.common.opg_data_utils_mg cimport *
from cuml.metrics.distance_type cimport DistanceType
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, pylibraft.

@@ -28,6 +28,7 @@ from cuml.neighbors import NearestNeighbors
from pylibraft.common.handle cimport handle_t
from cuml.common.opg_data_utils_mg cimport *
from cuml.common.opg_data_utils_mg import _build_part_inputs
from cuml.metrics.distance_type cimport DistanceType
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pylibraft provides this (and also provides the enum) so we can cut down on redundancy here.

lowener and others added 23 commits December 20, 2022 16:23
**To Do (non-exhaustive list):**
- [X] Get checks working
- [x] Get cpp builds working
- [x] Get cpp tests working
- [x] Get conda python builds working
- [x] Get pytest tests working
- [x] Fix conda warning in cpp builds (e.g. `WARNING:conda_build.utils:Adding .* to spec 'treelite 3.0.1' to ensure satisfiability.  Please consider putting {{ var_name }}.* or some relational operator (>/</>=/<=) on this spec in meta.yaml`)
- [x] Get notebooks testing working
- [x] Ensure dependency changes are correct and complete (e.g. does `libcuml` need anymore `libcu*` runtime dependencies?)

~~Future PRs~~ Also this PR (thanks Dante!):

- [x] Implement `pytest-xdist` to parallelize Python tests
- [x] Parallelize `cuml` and `cuml-dask` tests

Authors:
  - Carl Simon Adorf (https://github.com/csadorf)
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Micka (https://github.com/lowener)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: rapidsai#5075
…sai#5101)

This PR undoes some erroneous changes I made in rapidsai#5075.

According to @bdice, it's necessary to `cd` into the relevant directories when running `pytest` to avoid issues with `codecov`.

Authors:
  - AJ Schmidt (https://github.com/ajschmidt8)

Approvers:
  - Sevag H (https://github.com/sevagh)
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#5101
This PR enables the `Recently Updated` check, which is described here: https://docs.rapids.ai/resources/recently-updated/

Authors:
   - AJ Schmidt (https://github.com/ajschmidt8)

Approvers:
   - Dante Gama Dessavre (https://github.com/dantegd)
Due to some limitations on the number of reusable workflows that can be used in a given GH Action workflow, I need to switch the nightly workflow files (`build.yaml` and `test.yaml`) to use `workflow_dispatch` instead of `workflow_call`.

Because of this switch, we can also remove the hardcoded `repo` field in `{build,test}.yaml`.
…5108)

This is a small cleanup to remove the outdated `PROJECT_FLASH` environment variable from the libcuml recipe.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#5108
This PR adds a missing dependency for the `pr-builder` job in the `pr.yaml` workflow.

Authors:
   - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
   - AJ Schmidt (https://github.com/ajschmidt8)
Answers rapidsai#5106

Authors:
  - Victor Lafargue (https://github.com/viclafargue)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#5116
rapids-cmake 23.02 is deprecating the magic value of `ALL` since it doesn't cleanly map to the cmake magic value of `all`. Instead we use `RAPIDS` which better represents the architectures we are building for.

Authors:
  - Robert Maynard (https://github.com/robertmaynard)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#5124
…5125)

Small cleanup to remove `MACOSX_DEPLOYMENT_TARGET` references from a build script (macOS is not supported by RAPIDS).

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: rapidsai#5125
It looks like there are still some attributes with `ptr` in their name that are being serialized with the hdbscan object and that's causing some segfaults when serialization is performed on a separate process. 

Closes rapidsai#4986

Authors:
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#5128
This PR adds pip wheel CI to the Conda CI, instead of having them work separately.

Authors:
  - Sevag H (https://github.com/sevagh)
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#5109
We are removing the faiss ANN code in rapidsai/raft#1121, in favour of
using the ivf_flat and ivf_pq implementations included with raft.

After this change, RAFT can be updated to remove the faiss ANN methods  - which is implemented in
rapidsai/raft#1121

Note that this removes the `ivf_sq` option , since there is no corresponding implementation in raft.

Closes rapidsai#5131

Authors:
  - Ben Frederickson (https://github.com/benfred)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#5119
@viclafargue
Copy link
Contributor Author

When a worker has multiple index partitions, the query is run against each and results are merged as expected in the dense case. However, the sparse KNN do not seem to currently offer this feature. In addition, inputs and outputs indices are required to be of the same type in sparse KNN template, but the use of int64_t does not look to be stable.

Because of this, some changes seem to be necessary in RAFT for this implementation to be completed. I listed them there : rapidsai/raft#1187.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 - In Progress Currenty a work in progress CUDA/C++ Cython / Python Cython or Python issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet