Merge pull request #3677 from rcurtin/pca-doc
Document `PCA`
rcurtin committed Apr 9, 2024
2 parents 8f722ef + 787e14b commit 6586ed6
Showing 45 changed files with 1,272 additions and 464 deletions.
2 changes: 2 additions & 0 deletions HISTORY.md
@@ -21,6 +21,8 @@
* Fix floating-point accuracy issue for decision trees that sometimes caused
crashes (#3595).

* Allow PCA to take different matrix types (#3677).

### mlpack 4.3.0
###### 2023-11-27
* Fix include ordering issue for `LinearRegression` (#3541).
1 change: 1 addition & 0 deletions doc/css/gfm-mod.css
@@ -1062,6 +1062,7 @@ div#sidebar {
top: 5px;
min-width: 200px;
font-size: 90%;
max-width: 15.1515%;
}

div#sidebar ul {
2 changes: 1 addition & 1 deletion doc/index.md
@@ -120,7 +120,7 @@ Prepare data for machine learning algorithms.

Transform data from one space to another.

<!-- TODO: add some -->
* [`PCA`](user/methods/pca.md): principal components analysis

### Modeling utilities

17 changes: 14 additions & 3 deletions doc/sidebar.html
@@ -172,9 +172,20 @@

<!-- Transformations -->
<li>
<a href="LINKROOTindex.html#transformations">
Transformations
</a>
<details>
<summary>
<a href="LINKROOTindex.html#transformations">
Transformations
</a>
</summary>
<ul>
<li>
<a href="LINKROOTuser/methods/pca.html">
<code>PCA</code>
</a>
</li>
</ul>
</details>
</li>

<!-- Modeling utilities -->
80 changes: 77 additions & 3 deletions doc/user/core.md
@@ -15,6 +15,9 @@ classes, each of which are documented on this page.
mlpack provides a number of additional mathematical utility classes and
functions on top of Armadillo.

* [Aliases](#aliases): utilities to create and manage aliases (`MakeAlias()`,
`ClearAlias()`, `UnwrapAlias()`).

* [`Range`](#range): simple mathematical range (e.g. `[0, 3]`)

* [`ColumnCovariance()`](#columncovariance): compute covariance of
@@ -31,9 +34,6 @@ functions on top of Armadillo.
* [Logarithmic utilities](#logarithmic-utilities): `LogAdd()`, `AccuLog()`,
`LogSumExp()`, `LogSumExpT()`.

<!-- TODO: do something with MakeAlias(); but it needs to be refactored first
-->

* [`MultiplyCube2Cube()`](#multiplycube2cube): multiply each slice in a cube by each slice in another cube
* [`MultiplyMat2Cube()`](#multiplymat2cube): multiply a matrix by each slice in a cube
* [`MultiplyCube2Mat()`](#multiplycube2mat): multiply each slice in a cube by a matrix
@@ -47,6 +47,80 @@ functions on top of Armadillo.

---

### Aliases

Aliases are matrix, vector, or cube objects that share memory with another
matrix, vector, or cube. mlpack often uses them internally to avoid copies.

***Important caveats about aliases***:

- An alias represents the same memory block as the input. As such, changes to
the alias object will also be reflected in the original object.

- The `MakeAlias()` function is not guaranteed to return an alias; it only
returns an alias *if possible*, and makes a copy otherwise.

- If the aliased object (e.g. `mat` in `MakeAlias(a, mat, ...)`) goes out of
scope or is destructed, then the alias `a` ***becomes invalid***.
_You are responsible for ensuring an invalid alias is not used!_
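
The first and third caveats can be illustrated without mlpack. The following is a minimal C++ sketch, not mlpack's implementation: `BorrowedView` and `WriteThroughView()` are hypothetical names introduced only for this illustration, and the view simply borrows a pointer to memory owned by a `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view (an assumption for illustration; this is NOT
// how mlpack or Armadillo implement aliases internally).
struct BorrowedView
{
  double* mem;      // borrowed pointer; the view never allocates or frees it
  std::size_t n;    // number of elements visible through the view

  double& operator[](std::size_t i)
  {
    assert(i < n);
    return mem[i];
  }
};

// Writes through the view, then returns what the owning container holds.
double WriteThroughView()
{
  std::vector<double> owner = { 1.0, 2.0, 3.0 };
  BorrowedView view{ owner.data(), owner.size() };

  // The view and the owner refer to the same memory block, so this write is
  // visible through `owner` as well.
  view[1] = 42.0;

  // `view` is only valid while `owner` is alive and not reallocated; using it
  // after that point would be undefined behavior.
  return owner[1];
}
```

Calling `WriteThroughView()` returns the value written through the view, which is the write-through behavior the first caveat describes. The same reasoning explains the third caveat: once the owner is gone, the borrowed pointer dangles.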

---

* `MakeAlias(a, mat, rows, cols, strict=true)`
- Make `a` into an alias of `mat` with the given size.
- If `strict` is `true`, the size of `a` cannot be changed.
- `mat` and `a` should have the same matrix type (e.g. `arma::mat`,
`arma::fmat`, `arma::sp_mat`).
- If an alias cannot be created, the matrix will be copied. Sparse types
cannot have aliases and will be copied.

* `MakeAlias(a, cube, rows, cols, slices, strict=true)`
- Make `a` into an alias of `cube` with the given size.
- If `strict` is `true`, the size of `a` cannot be changed.
- `cube` and `a` should have the same cube type (e.g. `arma::cube`,
`arma::fcube`).
- If an alias cannot be created, the cube will be copied.

* `MakeAlias(a, memptr, rows, cols, strict=true)`
- Make `a` into an alias of the memory block starting at `memptr` of size
`rows` by `cols`.
- The memory at `memptr` should be arranged in a [column-major
ordering](matrices.md#representing-data-in-mlpack).
- If `strict` is `true`, the size of `a` cannot be changed.
- `a` should be a dense matrix type (e.g. `arma::mat`, `arma::fmat`), and
`memptr` should be a non-const pointer of the matrix's element type (e.g.
`double*`, `float*`).

* `MakeAlias(a, memptr, rows, cols, slices, strict=true)`
- Make `a` into an alias of the memory block starting at `memptr` of size
`rows` by `cols` by `slices`.
- The memory at `memptr` should be arranged in a [column-major
ordering](matrices.md#representing-data-in-mlpack).
- If `strict` is `true`, the size of `a` cannot be changed.
- `a` should be a cube type (e.g. `arma::cube`, `arma::fcube`), and `memptr`
should be a non-const pointer of the cube's element type (e.g. `double*`,
`float*`).
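
As a rough illustration of the memory layout that the pointer-based overloads expect, the sketch below computes element offsets in a column-major block. `ColMajorIndex()`, `CubeIndex()`, and `MakeColMajorExample()` are hypothetical helpers written for this example, not part of mlpack; the cube layout (slices stored one after another, each slice column-major) is assumed to follow Armadillo's convention.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offset of element (row, col) in a column-major block with `rows` rows:
// each column is stored contiguously, one column after another.
inline std::size_t ColMajorIndex(std::size_t row,
                                 std::size_t col,
                                 std::size_t rows)
{
  return col * rows + row;
}

// Offset of element (row, col, slice) in a rows x cols x slices block,
// assuming slices are stored one after another and each is column-major.
inline std::size_t CubeIndex(std::size_t row,
                             std::size_t col,
                             std::size_t slice,
                             std::size_t rows,
                             std::size_t cols)
{
  return slice * rows * cols + ColMajorIndex(row, col, rows);
}

// Column-major buffer for the 2 x 3 matrix
//   [ 1 3 5 ]
//   [ 2 4 6 ]
inline std::vector<double> MakeColMajorExample()
{
  return { 1.0, 2.0, 3.0, 4.0, 5.0, 6.0 };
}
```

A buffer laid out this way is what `memptr` would need to point to, together with matching `rows` and `cols` (and `slices` for the cube overload).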

---

* `ClearAlias(a)`
- If `a` is an alias, reset `a` to an empty matrix, without modifying the
aliased memory. `a` is no longer an alias after this call.

---

* `UnwrapAlias(a, in)`
- If `in` is a matrix type (e.g. `arma::mat`), make `a` into an alias of
`in`.
- If `in` is not a matrix type, but instead, e.g., an Armadillo expression,
fill `a` with the results of the evaluated expression `in`.
- This can be used in place of, e.g., `a = in`, to avoid a copy when
possible.
- `a` should be a matrix type that matches the type of the expression or
matrix `in`.

---

### `Range`

The `Range` class represents a simple mathematical range (e.g. `[0, 3]`),
4 changes: 2 additions & 2 deletions doc/user/load_save.md
@@ -157,7 +157,7 @@ With a `data::DatasetInfo` object, categorical data can be loaded:
contents of the file:
- `.csv`, `.tsv`, or `.txt` for CSV/TSV (tab-separated)/ASCII
(space-separated)
- `.arff` for [ARFF](https://www.cs.waikato.ac.nz/~ml/weka/arff.html)
- `.arff` for [ARFF](https://ml.cms.waikato.ac.nz/weka/arff.html)
* `matrix` is an `arma::mat&`, `arma::Mat<size_t>&`, or similar (e.g., a
reference to an Armadillo object that data will be loaded into or saved
@@ -723,7 +723,7 @@ The format of mixed categorical data is detected automatically based on the
file extension and inspecting the file contents:
- `.csv`, `.txt`, or `.tsv` indicates CSV/TSV/ASCII format
- `.arff` indicates [ARFF](https://www.cs.waikato.ac.nz/~ml/weka/arff.html)
- `.arff` indicates [ARFF](https://ml.cms.waikato.ac.nz/weka/arff.html)
---
