stefanjwojcik/julia_from_scratch

Julia from Scratch

A learning repository dedicated to the Julia Language. This is a short course on core data science techniques, implemented 'by hand' wherever possible.

The goal is to teach elegant customized programming that scales.

Chapter 1: The Digital Age

  • How the Digital Age differs from prior eras, and why it makes data science skills mandatory.

The Digital Age, also known as the Information Age, is characterized by a shift from the traditional industry of the Industrial Revolution to an economy based on information technology. It is generally considered to have begun in the latter half of the 20th century, and the transition was gradual, marked by a series of technological advances rather than a single event. The invention of the transistor at Bell Labs was a foundational event, and its subsequent miniaturization led to the microprocessors that made computing widely accessible. This burst in computing power might have had a more muted effect had it not been for the development of the Internet in the late 1960s and 1970s, which enabled computers to share data.

This era is marked by the widespread adoption of computers, the Internet, and digital technologies that transform how we create, store, share, and analyze information.

  • Basic occupational definitions: Data Science, Machine Learning Engineer, Data Engineer - what are these jobs?
  • Basic tooling terms: Terminal/bash, API, SQL, Docker, IDE.

Motivation: Understanding the news

  • There is so much information, but it's hard to know what is worth believing. To better understand the world, we collect data. And a lot of it.
  • Birdwatch example (misinformation): it's hard to know which tweets you can trust, but people react to tweets, so perhaps we can use those reactions to judge whether a tweet is trustworthy. Birdwatch relies on raters and notes, which raises the question: how do we know which notes are reliable?
    • Looking at the matrix of notes and raters, and creating a note quality score.
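
As a rough illustration of the notes-by-raters matrix, the sketch below scores each note by the share of raters who marked it helpful. The `ratings` matrix and the `note_quality` function are hypothetical, made up for this example; this simple average is a stand-in for a note quality score and is not the method Birdwatch itself uses.

```julia
using Statistics

# Hypothetical ratings matrix: rows are notes, columns are raters.
# 1 = rated helpful, 0 = rated not helpful, missing = rater gave no rating.
ratings = [1       1       missing 1;
           0       missing 0       1;
           missing 1       1       1]

# Score a note as the share of "helpful" ratings among the raters who rated it.
note_quality(row) = mean(skipmissing(row))

# Apply the score to every row (note) of the matrix.
scores = [note_quality(row) for row in eachrow(ratings)]

println(scores)
```

A plain average treats every rater as equally reliable, which is exactly the assumption the course question ("which notes are reliable?") invites you to relax.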

Chapter 2: The Tools You Need

  • Julia programming (loops, types, broadcasting, regex, CSV, DataFrames, Functions, Packages, compilation and speed).
  • SQL (select, group by, where, join)
  • Bash (cd, ls, mv, ssh, pip)
  • Docker (build, run, push)
  • Which IDE you need (VS Code)

Chapter 3: Seeing data and Data Visualization

  • Loading a CSV and looking at key observations
  • Plotting with Gadfly

Chapter 4: Statistical Significance

  • Basic understanding of statistics, distributions
  • Concepts: distributions, mean, variance, differences in means, t-tests, correlation, causal inference
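
In the 'by hand' spirit of the course, a difference in means can be tested with only the `Statistics` standard library. The sketch below computes Welch's t statistic (one common choice, assumed here; the course may use a different test). The function name `welch_t` and the two sample vectors are hypothetical.

```julia
using Statistics

# Welch's t statistic for the difference in means of two samples.
# It does not assume the two groups share the same variance.
function welch_t(x, y)
    nx, ny = length(x), length(y)
    # var() uses the sample (n - 1) correction by default.
    standard_error = sqrt(var(x) / nx + var(y) / ny)
    return (mean(x) - mean(y)) / standard_error
end

# Hypothetical measurements for two groups.
group_a = [5.1, 4.9, 5.6, 5.8, 5.0]
group_b = [4.2, 4.4, 4.1, 4.8, 4.5]

t = welch_t(group_a, group_b)
println("t statistic: ", t)
```

A larger absolute value of `t` means the gap between the group means is large relative to the noise within the groups; turning it into a p-value additionally requires the t distribution's degrees of freedom.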

Chapter 5: Non-Neural ML Algorithms

  • Machine learning models and their forms

Chapter 6: Neural ML Algorithms

  • Perceptron
  • DNN
  • RNN
  • CNN

Chapter 7: Generative AI

  • Basic differences between ordinary neural networks and what drives generative AI.
  • Calling the OpenAI API to make an app.
  • Querying documents.

Chapter 8: Other Skills

  • Bash
  • Docker
  • SSH
  • GitHub
  • SQL
