
Initial topic ideas #19

Open
HugoGranstrom opened this issue Jun 10, 2021 · 26 comments

Comments

@HugoGranstrom
Member

Let's brainstorm ideas for the articles we would eventually want to see here. Once we have a decent number of ideas, we can start to get a sense of how best to structure the content topic-wise.

Here are some off the top of my head (and a tad bit leaning toward Numericalnim...):

  • Numerical integration (1D, both scalar and cumulative); see the sketch after this list
  • Interpolation (1D, 2D, 3D)
  • ODEs (IVP)
  • Plotting
  • Matrices/Tensors
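
To make the first item concrete: a numerical integration article could start from a hand-rolled rule before introducing numericalnim. A minimal plain-Nim sketch of the composite trapezoidal rule (names and data made up for illustration):

import std/math

proc trapz(f: proc (x: float): float, a, b: float, n = 1000): float =
  ## Composite trapezoidal rule for the integral of `f` over [a, b] using `n` panels.
  let h = (b - a) / n.float
  result = 0.5 * (f(a) + f(b))
  for i in 1 ..< n:
    result += f(a + i.float * h)
  result *= h

proc integrand(x: float): float = sin(x)

echo trapz(integrand, 0.0, PI)   # close to 2.0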

If there is any topic you think needs a specific article (a specific kind of plotting, like bar plots, for example), go ahead and add it to your list as well.

Let the brainstorming begin!


@pietroppeter
Contributor

pietroppeter commented Jun 10, 2021

  • statistical learning/machine learning algorithms (linear regression, logistic regression, k-means clustering, decision trees, random forests, SVM, neural networks...), but also dimensionality reduction, feature engineering and model evaluation (basically the type of stuff that https://scikit-learn.org/stable/ provides for Python); a tiny k-means sketch follows after this list
  • data wrangling for dataframes: filtering, sorting, grouping, ... the kind of features that pandas/dplyr provide to Python/R respectively (see for a short example: https://gist.github.com/conormm/fd8b1980c28dd21cfaf6975c86c74d07)
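
To make one of these concrete (plain Nim, purely illustrative, not tied to any particular library), here is a single k-means iteration: the assignment step followed by the centroid update.

import std/math

type Point = array[2, float]

proc dist2(a, b: Point): float =
  ## Squared Euclidean distance between two points.
  (a[0] - b[0]) ^ 2 + (a[1] - b[1]) ^ 2

proc assign(points, centroids: seq[Point]): seq[int] =
  ## Assign each point to the index of its nearest centroid.
  for p in points:
    var best = 0
    for i in 1 ..< centroids.len:
      if dist2(p, centroids[i]) < dist2(p, centroids[best]):
        best = i
    result.add best

proc update(points: seq[Point], labels: seq[int], k: int): seq[Point] =
  ## Recompute each centroid as the mean of the points assigned to it.
  result = newSeq[Point](k)
  var counts = newSeq[int](k)
  for i, p in points:
    result[labels[i]][0] += p[0]
    result[labels[i]][1] += p[1]
    inc counts[labels[i]]
  for c in 0 ..< k:
    if counts[c] > 0:
      result[c][0] /= counts[c].float
      result[c][1] /= counts[c].float

let points = @[[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]]
var centroids = @[[0.0, 0.0], [5.0, 5.0]]
let labels = assign(points, centroids)
centroids = update(points, labels, 2)
echo labels      # @[0, 0, 1, 1]
echo centroids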

@pietroppeter
Contributor

For other ideas, see also the meta issue "are we scientists yet?": nim-lang/needed-libraries#77


@Vindaar
Member

Vindaar commented Jun 10, 2021

> data wrangling for dataframes: filtering, sorting, grouping, ... the kind of features that pandas/dplyr provide to Python/R respectively (see for a short example: https://gist.github.com/conormm/fd8b1980c28dd21cfaf6975c86c74d07)

I can write a conversion of that using the ggplotnim DataFrame. Its syntax is dplyr-inspired after all, and it seems like a good general overview.

Other topics I would add:

  • curve fitting
  • more general non-linear optimization problems
  • physicsy computations aided by unit checking
  • and more I can't think of right now :)

edit: for the time being the direct conversion exists here:

https://gist.github.com/Vindaar/6908c038707c7d8293049edb3d204f84

I'll write a derived version as its own getting-started page.


@HugoGranstrom
Member Author

Idea from Discord: tutorials specifically aimed at users of libraries in other languages, for example "Datamancer for Pandas developers" and "Arraymancer for Numpy developers". Alternatively "Nim for Pandas/Numpy developers" if we don't want to tie it to specific Nim libraries. They should mention the similarities and differences between the Nim and Python/R/etc. libraries, and it wouldn't hurt to have a section where a simple/intermediate program is ported to Nim with a line-by-line explanation.
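
As a rough sketch of what such a line-by-line section might look like (the pandas lines are comments above my guess at the corresponding Datamancer calls; the exact verbs and formula syntax should be checked against the Datamancer docs):

import datamancer

# pandas: df = pd.DataFrame({"team": [...], "score": [...]})
let df = seqsToDf({ "team": @["a", "a", "b", "b"],
                    "score": @[3.0, 7.5, 9.1, 4.2] })

# pandas: df[df["score"] > 5.0]
echo df.filter(f{ `score` > 5.0 })

# pandas: df.assign(doubled=df["score"] * 2)
echo df.mutate(f{ "doubled" ~ `score` * 2.0 })

# pandas: df.groupby("team")["score"].mean()
echo df.group_by("team").summarize(f{ "meanScore" << mean(`score`) })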


@xioren

xioren commented Jun 15, 2021

Well, I have had impulse downloaded on my computer for months but have yet to really learn it. Subjectively, I think a tutorial on working with impulse, FFT/DCT and images would be useful.
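
Not knowing impulse's API well yet, such a tutorial could still motivate the topic with a from-scratch transform before switching to the library. A hypothetical, unoptimized plain-Nim DFT sketch:

import std/[math, complex]

proc dft(x: seq[float]): seq[Complex64] =
  ## Naive O(n^2) discrete Fourier transform, for illustration only.
  let n = x.len
  result = newSeq[Complex64](n)
  for k in 0 ..< n:
    for j in 0 ..< n:
      let angle = -2.0 * PI * k.float * j.float / n.float
      result[k] += x[j] * complex64(cos(angle), sin(angle))

# A pure 4-cycle sine over 32 samples shows up as peaks at bins 4 and 28.
var signal = newSeq[float](32)
for i in 0 ..< 32:
  signal[i] = sin(2.0 * PI * 4.0 * i.float / 32.0)
for k, c in dft(signal):
  if abs(c) > 1e-6:
    echo k, ": ", abs(c)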

@Araq

Araq commented Jun 15, 2021

Deep learning. In particular, how could you write something like this in SciNim: https://github.com/numenta/numenta-apps? The "sparse networks" ideas are very interesting. See also

https://numenta.com/neuroscience-research/research-publications/papers/sparsity-enables-100x-performance-acceleration-deep-learning-networks

@Clonkk
Member

Clonkk commented Jun 15, 2021

@al6x

al6x commented Jun 17, 2021

I think it would be useful to replicate some of the most popular Python introductory notebooks, with lots of visuals and simple math. The Titanic Tutorial seems quite good and popular, and it exists in both Python and R versions.

It's easier for people to learn when they already know some part of a new thing. So maybe some people from the Python and R communities will be more inclined to try Nim on something they already know.

@HugoGranstrom
Member Author

Inspired by the answers in this forum post, we should have a tutorial on how to easily input Unicode characters in the different OSes and editors.

@bung87

bung87 commented Jun 21, 2021

About deep learning: take some examples from https://d2l.ai

@al6x

al6x commented Jul 7, 2021

After thinking about it, I'm taking back my suggestion about the Titanic dataset. Maybe an analysis of movies would be more interesting, because the classical tutorials about Iris or Titanic are boring; nobody knows much about them or cares.

But a dataset about movies is interesting. Everyone watches movies, and there are tons of data: genres, actors, ratings, popularity, reviews, maybe even lyrics to showcase NLP. That kind of stuff is interesting.

@pietroppeter
Contributor

Today I ran into this free book, "Probability 4 data science", which has code snippets in Matlab, Python, Julia and R. It could be nice to try to reproduce what we can with Nim (I guess we would discover some gaps to be filled). Example code from the first chapter: https://probability4datascience.com/python01.html

@al6x

al6x commented Oct 22, 2021

I recently saw a very nicely done interactive course in Julia, Introduction to Computational Thinking.

It's made with a Jupyter-like notebook tool, with all the examples and code interactive and editable online. Looks really nice.

@kerrycobb

I started writing a tutorial demonstrating how to infer parameters of a linear model using Bayesian inference. Would there be any interest in including it here when it's finished? If so, I would welcome any suggestions.
You can see it here: https://kerrycobb.github.io/nim-bayes/
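
For anyone curious before clicking through, the core of such a sampler is quite small in plain Nim. A rough, standalone sketch of a random-walk Metropolis step targeting a standard normal, just to show the mechanics (the tutorial itself covers the full linear-model case):

import std/[math, random]

proc logTarget(x: float): float =
  ## Unnormalized log density of the target distribution (standard normal here).
  -0.5 * x * x

proc metropolis(nSamples: int, stepSize = 1.0): seq[float] =
  ## Random-walk Metropolis: propose a symmetric step around the current value
  ## and accept it with probability min(1, target(proposal) / target(current)).
  var rng = initRand(42)
  var x = 0.0
  for _ in 0 ..< nSamples:
    let proposal = x + (rng.rand(2.0) - 1.0) * stepSize
    if ln(rng.rand(1.0)) < logTarget(proposal) - logTarget(x):
      x = proposal
    result.add x

let samples = metropolis(10_000)
echo "sample mean: ", samples.sum / samples.len.float   # close to 0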

@pietroppeter
Contributor

I love it, I would say definitely yes :)

As for suggestions, the code seems complete and straight to the point, so I would go in the direction of expanding the explanations: references to learn more about Bayesian linear regression and MCMC, an explanation that the distribution package exists (it is not in the stdlib), an explanation of what we do step by step, a note that there is no MCMC library but that we can code one from scratch, what the different plots tell us in terms of what we expected versus what we see, and so on.

On top of the explanations: I love the choice of parameters for the simplest possible case; would it be worth exploring at least one other case (to see how things change...)? In the future I hope this will be easy to do with nimib's interactive features, like your other JavaScript repo on exploring priors and posteriors.

Gotta say I am still so happy when I see a new document produced with nimib; I still can't shake the surprise and excitement of seeing unknown people actually using it 🤩.

On that topic, since I could easily peek at the code, a very minor detail I noticed is the unnecessary mathjax_support line added to the context (a leftover from experimenting with MathJax?).

Finally, as a general suggestion for this thread, we could definitely use a tutorial for doing simple linear regression (I should actually do that myself!).
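
The core computation is tiny in plain Nim; a minimal sketch of ordinary least squares for a single predictor, y ≈ intercept + slope * x, on made-up data (the interesting part of a tutorial would be the plots and diagnostics around it):

import std/math

proc linearFit(x, y: seq[float]): tuple[intercept, slope: float] =
  ## Closed-form ordinary least squares for a single predictor.
  let n = x.len.float
  let mx = x.sum / n
  let my = y.sum / n
  var sxy, sxx = 0.0
  for i in 0 ..< x.len:
    sxy += (x[i] - mx) * (y[i] - my)
    sxx += (x[i] - mx) ^ 2
  result.slope = sxy / sxx
  result.intercept = my - result.slope * mx

let xs = @[1.0, 2.0, 3.0, 4.0, 5.0]
let ys = @[2.1, 3.9, 6.2, 8.1, 9.8]
let (a, b) = linearFit(xs, ys)
echo "intercept: ", a, "  slope: ", b   # roughly 0.14 and 1.96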

@pietroppeter
Contributor

pietroppeter commented Nov 19, 2021

Just because I recently ran into it (through Gelman's blog) and it is related to our recent discussion, here is a nice explanation of the advantages of Bayesian linear regression in the applied context of media mix modelling: https://getrecast.com/bayesian-methods-for-mmm/

This also serves as a reminder that the usual statistical way of presenting linear regression, calling it OLS and focusing on inference instead of prediction, comes with a bunch of associated statistical metrics, and that is something basic which AFAIK is still missing in our ecosystem.

@Vindaar
Member

Vindaar commented Feb 6, 2022

@kerrycobb Just skimmed over your tutorial again and saw the following:

TODO: Figure out why this isn't plotting correctly

var standardized = seqsToDf(stX, stY)
ggplot(standardized, aes("x", "y")) + geom_point() +
    ggsave("images/st-simulated-data.png")

The reason is simply that seqsToDf without explicit keys generates a DF whose keys are the names of the variables. In the aes call within ggplot you then hand in the strings "x" and "y", which should be "stX" and "stY".
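
In code, either of the following should work (keeping everything else in the snippet unchanged):

# Option 1: refer to the auto-generated keys, i.e. the variable names
ggplot(standardized, aes("stX", "stY")) + geom_point() +
    ggsave("images/st-simulated-data.png")

# Option 2: pass explicit keys when constructing the DataFrame
var standardized = seqsToDf({"x": stX, "y": stY})
ggplot(standardized, aes("x", "y")) + geom_point() +
    ggsave("images/st-simulated-data.png")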

And I'd love for this to be included!
