-
Notifications
You must be signed in to change notification settings - Fork 508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MNMG KNNs updates #5073
base: branch-23.02
Are you sure you want to change the base?
MNMG KNNs updates #5073
Conversation
[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]
Goals: - Python documentation should be built for PRs - Checks should fail upon sphinx warnings - Existing warnings are fixed in this PR This PR covers only the python documentation. C++ docs are out of scope. Authors: - Rory Mitchell (https://github.com/RAMitchell) - Dante Gama Dessavre (https://github.com/dantegd) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - AJ Schmidt (https://github.com/ajschmidt8) - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4585
[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]
Continuation of rapidsai#2975 to change docstring to a doctest format. I'm also adding a pytest file to execute doctest, similar to what's being done in [cudf](https://github.com/rapidsai/cudf/blob/branch-22.04/python/cudf/cudf/tests/test_doctests.py) Authors: - Micka (https://github.com/lowener) - Yuqiong Li (https://github.com/yuqli) - Dante Gama Dessavre (https://github.com/dantegd) - Michael Demoret (https://github.com/mdemoret-nv) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4618
[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]
… single GPU (rapidsai#4645) Fixes single GPU build by dividing decomposition pxd headers so that cumlprims files are not included in single GPU cython files. Authors: - Dante Gama Dessavre (https://github.com/dantegd) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - AJ Schmidt (https://github.com/ajschmidt8) URL: rapidsai#4645
[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]
Replace `cudf.logical_not` with `~`. Authors: - Andy Adinets (https://github.com/canonizer) Approvers: - Bradley Dice (https://github.com/bdice) - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4669
- removed default `T = float` from multi-sum - tests with float64 for multi-sum and `child_index()` - refactored multi-sum tests to reduce shared memory usage - this is needed for the tests with float64 to compile This is pull request 1 of 3 to integrate rapidsai#4646. This pull request is partly based on the work by @levsnv. Authors: - Andy Adinets (https://github.com/canonizer) Approvers: - Divye Gala (https://github.com/divyegala) - William Hicks (https://github.com/wphicks) URL: rapidsai#4648
This small PR : * Rectifies/clarifies the wrong parameter description for `max_depth` parameter in `RandomForestRegressor` and `RandomForestClassifier` in RF and dask-RF. * Corrects a syntax error in dask-rf docs * Corrects the `max_depth` check in RF-python layer resolves rapidsai#4623 Authors: - Venkat (https://github.com/venkywonka) Approvers: - Rory Mitchell (https://github.com/RAMitchell) - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4666
[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]
We don't have arm64 packages for `rapids-doc-env`, so only run the doc builds for x86_64 Authors: - Ray Douglass (https://github.com/raydouglass) Approvers: - AJ Schmidt (https://github.com/ajschmidt8)
[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]
Templatized functions related to FIL inference in preparation of `float64` support. Instantiations of templates with `float64`, or tests for `float64`, _are not included_; they will be provided in a future pull request. This is pull request 2 of 3 to integrate rapidsai#4646. This pull request is partly based on the work by @levsnv. Authors: - Andy Adinets (https://github.com/canonizer) - Levs Dolgovs (https://github.com/levsnv) - Dante Gama Dessavre (https://github.com/dantegd) Approvers: - Divye Gala (https://github.com/divyegala) - William Hicks (https://github.com/wphicks) URL: rapidsai#4655
Fix rapidsai#4638. Authors: - Rory Mitchell (https://github.com/RAMitchell) Approvers: - Dante Gama Dessavre (https://github.com/dantegd)
[gpuCI] Forward-merge branch-22.04 to branch-22.06 [skip gpuci]
Add a `libcuml-tests` package to the `libcuml` recipe: - This is a prerequisite for removing "Project Flash" from our build/CI scripts. - The `libcuml-tests` package was added as an additional output to the existing `libcuml` recipe (which was renamed to `libcuml-split`) Authors: - Jordan Jacobelli (https://github.com/Ethyling) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4635
`float64` support in FIL core and tests. Added `float64`-based instantiations of FIL types and functions, together with `float64`-based FIL tests. Treelite and Python layers _are not covered_ here, but will be in separate pull requests. This is pull request 3 of 3 to enable `float64` support in FIL core. This is partly based on the work by @levsnv. Authors: - Andy Adinets (https://github.com/canonizer) - Levs Dolgovs (https://github.com/levsnv) Approvers: - Divye Gala (https://github.com/divyegala) URL: rapidsai#4646
Authors: - Jiwei Liu (https://github.com/daxiongshu) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) - AJ Schmidt (https://github.com/ajschmidt8) URL: rapidsai#4692
[REVIEW] Unpin `dask` & `distributed` for development
This PR fixes a minor typo in the cuML intro documentation. Authors: - Christopher Akiki (https://github.com/cakiki) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4695
CMake 3.23.1 contains the bug fixes that we need to make use of CMake 3.23, so now we can update the pinnings to just avoid 3.23.0. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Jordan Jacobelli (https://github.com/Ethyling) - Robert Maynard (https://github.com/robertmaynard) URL: rapidsai#4698
`float64` support in treelite->FIL import and Python layer Authors: - Andy Adinets (https://github.com/canonizer) - Levs Dolgovs (https://github.com/levsnv) Approvers: - Philip Hyunsu Cho (https://github.com/hcho3) - William Hicks (https://github.com/wphicks) URL: rapidsai#4690
Increased test coverage for TreeExplainer, greatly expanding model types tested. New tests take around 4.8s on my machine. Fixes rapidsai#4352 New bugs found: rapidsai#4663 dmlc/treelite#375 rapidsai#4670 Authors: - Rory Mitchell (https://github.com/RAMitchell) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4671
This PR: - [x] Moves all pytest from `test` to `tests` - [x] Converts `test` module to `tests` directory - [x] Creates a `testing` module that contains all utility functions for pytests in `tests` need. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) - Jordan Jacobelli (https://github.com/Ethyling) - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4703
In this PR, I'll wrap `dask-glm` models so that it accepts `dask_cudf Dataframes` and behaves like other `cuml.dask` models. `dask-glm` provides three estimators: `LogisticRegression`, `LinearRegression` and `PoissonRegression`. MNMG `LogisticRegression` is requested by @beckernick . @JohnZed for visibility. Thank you all. Authors: - Jiwei Liu (https://github.com/daxiongshu) - Nick Becker (https://github.com/beckernick) - Dante Gama Dessavre (https://github.com/dantegd) Approvers: - Ray Douglass (https://github.com/raydouglass) - AJ Schmidt (https://github.com/ajschmidt8) - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#3512
Authors: - Victor Lafargue (https://github.com/viclafargue) - Dante Gama Dessavre (https://github.com/dantegd) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4706
xref: rapidsai/integration#471 Authors: - https://github.com/jakirkham Approvers: - Dante Gama Dessavre (https://github.com/dantegd) - William Hicks (https://github.com/wphicks) - Ray Douglass (https://github.com/raydouglass) URL: rapidsai#4720
This will close rapidsai#4931 Authors: - Nick Becker (https://github.com/beckernick) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#5023
Answers rapidsai#5095 Authors: - Victor Lafargue (https://github.com/viclafargue) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#5097
Closes rapidsai#5032 TODO: Add this docstring to doctest Authors: - Micka (https://github.com/lowener) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#5041
The KMeans C++ code is already tested in Raft. Closes rapidsai#5058 Authors: - Micka (https://github.com/lowener) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#5098
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Loving the updates so far! This is going to be awesome when we can swap out the FAISS calls for RAFT. I know this is still draft but I figured i'd leave some early feedback while I was looking through it.
cpp/src/knn/knn_opg_common.cuh
Outdated
params.rowMajorQuery, | ||
&start_indices_long, | ||
raft::distance::DistanceType::L2SqrtExpanded); | ||
raft::spatial::knn::brute_force_knn<std::int64_t, float, std::size_t>(handle, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may as well start using the raft::neighbors::brute_force::knn
here while we're updating this code. It shouldn't conflict w/ the raft::spatial::knn
namespace since that's typedef'd to raft::neighbors
now.
@@ -25,6 +25,7 @@ from cuml.neighbors.nearest_neighbors_mg import NearestNeighborsMG | |||
|
|||
from pylibraft.common.handle cimport handle_t | |||
from cuml.common.opg_data_utils_mg cimport * | |||
from cuml.metrics.distance_type cimport DistanceType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having this defined in two places, why don't we start using pylibraft.distance.DistanceType
instead?
@@ -25,6 +25,7 @@ from cuml.neighbors.nearest_neighbors_mg import NearestNeighborsMG | |||
|
|||
from pylibraft.common.handle cimport handle_t | |||
from cuml.common.opg_data_utils_mg cimport * | |||
from cuml.metrics.distance_type cimport DistanceType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, pylibraft.
@@ -28,6 +28,7 @@ from cuml.neighbors import NearestNeighbors | |||
from pylibraft.common.handle cimport handle_t | |||
from cuml.common.opg_data_utils_mg cimport * | |||
from cuml.common.opg_data_utils_mg import _build_part_inputs | |||
from cuml.metrics.distance_type cimport DistanceType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pylibraft provides this (and also provides the enum) so we can cut down on redundancy here.
Fixes rapidsai#4906 Authors: - Micka (https://github.com/lowener) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4996
**To Do (non-exhaustive list):** - [X] Get checks working - [x] Get cpp builds working - [x] Get cpp tests working - [x] Get conda python builds working - [x] Get pytest tests working - [x] Fix conda warning in cpp builds (e.g. `WARNING:conda_build.utils:Adding .* to spec 'treelite 3.0.1' to ensure satisfiability. Please consider putting {{ var_name }}.* or some relational operator (>/</>=/<=) on this spec in meta.yaml`) - [x] Get notebooks testing working - [x] Ensure dependency changes are correct and complete (e.g. does `libcuml` need anymore `libcu*` runtime dependencies?) ~~Future PRs~~ Also this PR (thanks Dante!): - [x] Implement `pytest-xdist` to parallelize Python tests - [x] Parallelize `cuml` and `cuml-dask` tests Authors: - Carl Simon Adorf (https://github.com/csadorf) - AJ Schmidt (https://github.com/ajschmidt8) - Dante Gama Dessavre (https://github.com/dantegd) - Micka (https://github.com/lowener) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) - AJ Schmidt (https://github.com/ajschmidt8) URL: rapidsai#5075
…sai#5101) This PR undoes some erroneous changes I made in rapidsai#5075. According to @bdice, it's necessary to `cd` into the relevant directories when running `pytest` to avoid issues with `codecov`. Authors: - AJ Schmidt (https://github.com/ajschmidt8) Approvers: - Sevag H (https://github.com/sevagh) - Bradley Dice (https://github.com/bdice) URL: rapidsai#5101
This PR enables the `Recently Updated` check, which is described here: https://docs.rapids.ai/resources/recently-updated/ Authors: - AJ Schmidt (https://github.com/ajschmidt8) Approvers: - Dante Gama Dessavre (https://github.com/dantegd)
Due to some limitations on the number of reusable workflows that can be used in a given GH Action workflow, I need to switch the nightly workflow files (`build.yaml` and `test.yaml`) to use `workflow_dispatch` instead of `workflow_call`. Because of this switch, we can also remove the hardcoded `repo` field in `{build,test}.yaml`.
…5108) This is a small cleanup to remove the outdated `PROJECT_FLASH` environment variable from the libcuml recipe. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#5108
This PR adds a missing dependency for the `pr-builder` job in the `pr.yaml` workflow. Authors: - Dante Gama Dessavre (https://github.com/dantegd) Approvers: - AJ Schmidt (https://github.com/ajschmidt8)
…i#5113) Authors: - Dante Gama Dessavre (https://github.com/dantegd) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - Divye Gala (https://github.com/divyegala) - Ray Douglass (https://github.com/raydouglass) URL: rapidsai#5113
closes rapidsai#4986 Authors: - Divye Gala (https://github.com/divyegala) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#5102
Answers rapidsai#5106 Authors: - Victor Lafargue (https://github.com/viclafargue) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#5116
…#5117) Authors: - William Hicks (https://github.com/wphicks) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#5117
rapids-cmake 23.02 is deprecating the magic value of `ALL` since it doesn't cleanly map to the cmake magic value of `all`. Instead we use `RAPIDS` which better represents the architectures we are building for. Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#5124
…5125) Small cleanup to remove `MACOSX_DEPLOYMENT_TARGET` references from a build script (macOS is not supported by RAPIDS). Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) - AJ Schmidt (https://github.com/ajschmidt8) URL: rapidsai#5125
It looks like there are still some attributes with `ptr` in their name that are being serialized with the hdbscan object and that's causing some segfaults when serialization is performed on a separate process. Closes rapidsai#4986 Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#5128
This PR adds pip wheel CI to the Conda CI, instead of having them work separately. Authors: - Sevag H (https://github.com/sevagh) - AJ Schmidt (https://github.com/ajschmidt8) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#5109
Authors: - Carl Simon Adorf (https://github.com/csadorf) Approvers: - William Hicks (https://github.com/wphicks) URL: rapidsai#5026
Authors: - Divye Gala (https://github.com/divyegala) Approvers: - Victor Lafargue (https://github.com/viclafargue) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#5137
We are removing the faiss ANN code in rapidsai/raft#1121, in favour of using the ivf_flat and ivf_pq implementations included with raft. After this change, RAFT can be updated to remove the faiss ANN methods - which is implemented in rapidsai/raft#1121 Note that this removes the `ivf_sq` option , since there is no corresponding implementation in raft. Closes rapidsai#5131 Authors: - Ben Frederickson (https://github.com/benfred) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#5119
When a worker has multiple index partitions, the query is run against each and results are merged as expected in the dense case. However, the sparse KNN do not seem to currently offer this feature. In addition, inputs and outputs indices are required to be of the same type in sparse KNN template, but the use of Because of this, some changes seem to be necessary in RAFT for this implementation to be completed. I listed them there : rapidsai/raft#1187. |
3bc1de0
to
e7fd6cc
Compare
This PR will :
NearestNeighbors
,KNeighborsClassifier
andKNeighborsRegressor
NearestNeighbors
Answers #3464