Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix broken links in Topic Modeling notebook #33

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
8 changes: 4 additions & 4 deletions nbs/2. Topic Modeling with NMF and SVD.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -100,13 +100,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"- [Data source](http://scikit-learn.org/stable/datasets/twenty_newsgroups.html): Newsgroups are discussion groups on Usenet, which was popular in the 80s and 90s before the web really took off. This dataset includes 18,000 newsgroups posts with 20 topics.\n",
"- [Data source](https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html#loading-the-20-newsgroups-dataset): Newsgroups are discussion groups on Usenet, which was popular in the 80s and 90s before the web really took off. This dataset includes 18,000 newsgroups posts with 20 topics.\n",
"- [Chris Manning's book chapter](https://nlp.stanford.edu/IR-book/pdf/18lsi.pdf) on matrix factorization and LSI \n",
"- Scikit learn [truncated SVD LSI details](http://scikit-learn.org/stable/modules/decomposition.html#lsa)\n",
"\n",
"### Other Tutorials\n",
"- [Scikit-Learn: Out-of-core classification of text documents](http://scikit-learn.org/stable/auto_examples/applications/plot_out_of_core_classification.html): uses [Reuters-21578](https://archive.ics.uci.edu/ml/datasets/reuters-21578+text+categorization+collection) dataset (Reuters articles labeled with ~100 categories), HashingVectorizer\n",
"- [Text Analysis with Topic Models for the Humanities and Social Sciences](https://de.dariah.eu/tatom/index.html): uses [British and French Literature dataset](https://de.dariah.eu/tatom/datasets.html) of Jane Austen, Charlotte Bronte, Victor Hugo, and more"
"- [Scikit-Learn: Out-of-core classification of text documents](http://scikit-learn.org/stable/auto_examples/applications/plot_out_of_core_classification.html): uses [Reuters-21578](https://archive.ics.uci.edu/dataset/137/reuters+21578+text+categorization+collection) dataset (Reuters articles labeled with ~100 categories), HashingVectorizer\n",
"- [Text Analysis with Topic Models for the Humanities and Social Sciences](https://github.com/DARIAH-DE/tatom): uses [British and French Literature dataset](https://github.com/DARIAH-DE/tatom/tree/develop/data) of Jane Austen, Charlotte Bronte, Victor Hugo, and more"
]
},
{
Expand Down Expand Up @@ -418,7 +418,7 @@
"The SVD algorithm factorizes a matrix into one matrix with **orthogonal columns** and one with **orthogonal rows** (along with a diagonal matrix, which contains the **relative importance** of each factor).\n",
"\n",
"<img src=\"images/svd_fb.png\" alt=\"\" style=\"width: 80%\"/>\n",
"(source: [Facebook Research: Fast Randomized SVD](https://research.fb.com/fast-randomized-svd/))\n",
"(source: [Facebook Research: Fast Randomized SVD](https://research.facebook.com/blog/2014/9/fast-randomized-svd/))\n",
"\n",
"SVD is an **exact decomposition**, since the matrices it creates are big enough to fully cover the original matrix. SVD is extremely widely used in linear algebra, and specifically in data science, including:\n",
"\n",
Expand Down