Quartz sync: Feb 10, 2024, 10:40 PM
AlphaGit committed Feb 11, 2024
1 parent 39b376e commit 3b4ad7c
Showing 19 changed files with 202 additions and 18 deletions.
16 changes: 6 additions & 10 deletions content/ai/Activation functions.md
@@ -5,7 +5,6 @@ tags:
- activation
- neural networks
---

Activation functions are part of a "neuron" in a neural network. They introduce a non-linearity so that the network can learn more than linear (or polynomial) relationships between the input and output data.

It is called an "activation function" because it decides how much that particular neuron participates in generating the output.
@@ -23,15 +22,12 @@ None of these rules are unbreakable, but good guidelines.

Examples of activation functions:

- ReLU (Rectified Linear Unit): $f(x) = \max(0, x)$
- Binary/Step: $$\begin{split}f(x) = \begin{cases}
0, & \text {if } x < 0, \\
1, & \text{if } x \ge 0
\end{cases}\end{split}$$
- Sigmoid: $f(x) = \frac{1}{1 + e^{-x}}$
- Linear: $f(x) = x$
- Hyperbolic tangent function (tanh): $f(x) = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$
- Softmax: $f(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$
- [[ReLU]]
- [[Binary or Step function]]
- [[Sigmoid]]
- [[Linear function]]
- [[Hyperbolic tangent function (tanh)]]
- [[Softmax]]
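
For quick intuition, a minimal NumPy sketch of a few of these (the helper names are my own, not from any particular library):

```python
import numpy as np

def relu(x):
    # max(0, x), element-wise
    return np.maximum(0, x)

def sigmoid(x):
    # squashes any real number into (0, 1)
    return 1 / (1 + np.exp(-x))

def tanh(x):
    # squashes any real number into (-1, 1)
    return np.tanh(x)

def softmax(x):
    # turns a vector of scores into a probability distribution;
    # subtracting the max keeps the exponentials numerically stable
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([-2.0, 0.0, 3.0])
print(relu(x), sigmoid(x), softmax(x))
```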

## Sources

13 changes: 13 additions & 0 deletions content/ai/Binary or Step function.md
@@ -0,0 +1,13 @@
---
title: Binary or Step function
tags:
- activation
- ai
- neural_networks
---
[[Activation functions|Activation function]], mostly used in [[neural networks]].

$$\begin{split}f(x) = \begin{cases}
0, & \text {if } x < 0, \\
1, & \text{if } x \ge 0
\end{cases}\end{split}$$
8 changes: 4 additions & 4 deletions content/ai/Extreme Learning Machines.md
@@ -6,8 +6,7 @@ tags:
- science
- papers
---

Extreme Learning Machines are single hidden-layer feed-foreward [[neural networks]]. They are one of the [[neural network]] approaches to timeseries forecasting (opposed to [[statistical timeseries forecasting]]).
Extreme Learning Machines are single hidden-layer feed-forward [[neural networks]]. They are one of the [[neural network]] approaches to timeseries forecasting (as opposed to [[statistical timeseries forecasting]]).

Original paper by Huang et al., 2004.

@@ -21,9 +20,10 @@ The training process consists of these steps:

In short, with a single shot we can avoid the multi-step process of iterative training and the backpropagation algorithm that is usually used with feed-forward neural networks.
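
A rough sketch of that single-shot step under the usual formulation: the hidden-layer weights are drawn at random and frozen, and the output weights are solved in closed form with a pseudo-inverse (names and shapes here are illustrative, not from the paper):

```python
import numpy as np

def train_elm(X, y, hidden_size=100, rng=np.random.default_rng(0)):
    # 1. random, fixed weights and biases for the single hidden layer
    W = rng.normal(size=(X.shape[1], hidden_size))
    b = rng.normal(size=hidden_size)
    # 2. hidden activations (any non-linearity works; tanh here)
    H = np.tanh(X @ W + b)
    # 3. output weights solved in one shot via the pseudo-inverse (least squares)
    beta = np.linalg.pinv(H) @ y
    return W, b, beta

def predict_elm(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```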

The tuning of the network will mostly be around its hyperparameters:
The tuning of the network will mostly be around its [[Hyperparameters|hyperparameters]]:

- Hidden layer size
- Selection of activation function
- Selection of [[Activation functions|activation function]]
- Selection of input sources
- Selection of the distribution for random values used in the initialization step

10 changes: 10 additions & 0 deletions content/ai/Hyperbolic tangent function (tanh).md
@@ -0,0 +1,10 @@
---
title: Hyperbolic tangent function (tanh)
tags:
- ai
- activation
- neural_networks
---
[[Activation functions|Activation function]], mostly used in [[neural networks]].

$$f(x) = \tanh(x) = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$$
19 changes: 19 additions & 0 deletions content/ai/Hyperparameters.md
@@ -0,0 +1,19 @@
---
title: Hyperparameters
tags:
- ml
- ai
---
Since ML algorithms learn the parameters that make them operate, the parameters that regulate how that learning takes place are called "hyper"-parameters.

Examples:

- [[Learning rate]]
- [[Epochs]]
- [[Early Stop]]
- [[Generation size]]
- [[Mutation rate]]
- [[Number of clusters]]
- [[Neural network architecture]]
- [[Activation functions|Activation function]]
- [[Weight initialization values]]
4 changes: 3 additions & 1 deletion content/ai/LLM Speed Performance.md
@@ -63,10 +63,12 @@ prefill = torch.compile(
### 4. Improve the attention mechanism

The attention mechanism (which weighs every token against the changing context) is also a quadratic algorithm: all tokens attend to all tokens, leading to $N^2$ scaling.
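
For intuition, the naive computation materializes the full $N \times N$ score matrix (a minimal single-head sketch, not an optimized implementation):

```python
import torch

def naive_attention(q, k, v):
    # q, k, v: (N, d); every token attends to every token
    scores = (q @ k.T) / q.shape[-1] ** 0.5   # (N, N) matrix: quadratic in sequence length
    return torch.softmax(scores, dim=-1) @ v
```
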
#### 3.1. Use vLLM or paged-attention
#### 3.1. Use vLLM or paged attention

Both techniques act as a middle step between the memory available and the memory required. It works much like an additional level of the memory hierarchy: chunks are loaded onto the device while the rest is paged out and kept at a lower level.

[[vLLM]]

#### 3.2. FlashAttention

Instead of storing the full attention matrix in HBM, FlashAttention computes the dot products block-wise, so that the intermediate results stay in fast on-chip memory (SRAM) rather than being written back to HBM.
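
A simplified sketch of the block-wise idea using an online softmax (single head, blocked keys/values only, no query tiling, masking, or kernel fusion — the real FlashAttention is a fused GPU kernel):

```python
import torch

def blockwise_attention(q, k, v, block_size=64):
    """Attention over blocked keys/values; never builds the full (n, n) matrix."""
    n, d = q.shape
    scale = d ** -0.5
    out = torch.zeros_like(q)
    row_max = torch.full((n, 1), float("-inf"))   # running max of scores per query
    row_sum = torch.zeros(n, 1)                   # running softmax denominator
    for start in range(0, n, block_size):
        kb = k[start:start + block_size]          # (b, d) block of keys
        vb = v[start:start + block_size]          # (b, d) block of values
        scores = (q @ kb.T) * scale               # (n, b): one block at a time
        block_max = scores.max(dim=-1, keepdim=True).values
        new_max = torch.maximum(row_max, block_max)
        correction = torch.exp(row_max - new_max) # rescale what was accumulated so far
        p = torch.exp(scores - new_max)
        row_sum = row_sum * correction + p.sum(dim=-1, keepdim=True)
        out = out * correction + p @ vb
        row_max = new_max
    return out / row_sum
```
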
10 changes: 10 additions & 0 deletions content/ai/Linear function.md
@@ -0,0 +1,10 @@
---
title: Linear function
tags:
- ai
- neural_networks
- activation
---
A very simple [[Activation functions|activation function]], mostly used in [[neural networks]].

$$f(x) = x$$
12 changes: 12 additions & 0 deletions content/ai/ReLU.md
@@ -0,0 +1,12 @@
---
title: ReLU
tags:
- activation
- ai
- neural_networks
---
[[Activation functions|Activation function]], mostly used in [[neural networks]].

ReLU stands for "Rectified Linear Unit".

$$f(x) = \max(0, x)$$
10 changes: 10 additions & 0 deletions content/ai/Sigmoid.md
@@ -0,0 +1,10 @@
---
title: Sigmoid
tags:
- activation
- neural_networks
- ai
---
[[Activation functions|Activation function]], mostly used in [[neural networks]].

$$f(x) = \frac{1}{1 + e^{-x}}$$
10 changes: 10 additions & 0 deletions content/ai/Softmax.md
@@ -0,0 +1,10 @@
---
title: Softmax
tags:
- activation
- ai
- neural_networks
---
[[Activation functions|Activation function]], mostly used in [[neural networks]].

$$f(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$$
4 changes: 1 addition & 3 deletions content/ai/benchmarks/MTEB.md
@@ -9,7 +9,7 @@ tags:
- embedding
- papers
---
> MTEB consists of 58 datasets covering 112 languages from 8 embedding tasks: Bitext mining, classification, clustering, pair classification, reranking, retrieval, STS and summarization.
> MTEB consists of 58 datasets covering 112 languages from 8 embedding tasks: Bitext mining, classification, [[clustering]], pair classification, reranking, retrieval, STS and summarization.
## Tasks

@@ -34,6 +34,4 @@ Code available at: https://github.com/embeddings-benchmark/mteb

Leaderboard in HuggingFace: https://huggingface.co/spaces/mteb/leaderboard

Other [[NLP Benchmarks]]

[^MTEB]: [MTEB: Massive Text Embedding Benchmark](https://arxiv.org/pdf/2210.07316.pdf)
23 changes: 23 additions & 0 deletions content/ai/llms/vLLM.md
@@ -0,0 +1,23 @@
---
title: vLLM
tags:
- ai
- llm
---
vLLM utilizes **PagedAttention**, an attention algorithm that effectively manages attention keys and values. vLLM equipped with PagedAttention sets a new state of the art in LLM serving: it delivers up to 24x higher throughput than HuggingFace Transformers, without requiring any model architecture changes.

In the autoregressive decoding process, all the input tokens to the LLM produce their attention key and value tensors, and these tensors are kept in GPU memory to generate the next tokens. These cached key and value tensors are often referred to as the KV cache. The KV cache is:

- _Large:_ Takes up to 1.7GB for a single sequence in LLaMA-13B.
- _Dynamic:_ Its size depends on the sequence length, which is highly variable and unpredictable.

PagedAttention partitions the KV cache of each sequence into blocks, each block containing the keys and values for a fixed number of tokens. During the attention computation, the PagedAttention kernel identifies and fetches these blocks efficiently.
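
A toy sketch of that bookkeeping (not vLLM's actual API): each sequence gets a block table mapping logical token positions to physical cache blocks, and a physical block is allocated only when a new one is needed:

```python
import torch

class PagedKVCache:
    """Toy illustration: fixed-size physical blocks shared by all sequences."""
    def __init__(self, num_blocks, block_size, d):
        self.block_size = block_size
        self.k = torch.zeros(num_blocks, block_size, d)
        self.v = torch.zeros(num_blocks, block_size, d)
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}  # sequence id -> list of physical block ids

    def append(self, seq_id, pos, k_t, v_t):
        table = self.block_tables.setdefault(seq_id, [])
        if pos % self.block_size == 0:
            table.append(self.free_blocks.pop())  # allocate a block only when needed
        block = table[pos // self.block_size]
        offset = pos % self.block_size
        self.k[block, offset] = k_t
        self.v[block, offset] = v_t
```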

PagedAttention has another key advantage: efficient memory sharing. For example, in _parallel sampling_, multiple output sequences are generated from the same prompt.

PagedAttention's memory sharing greatly reduces the memory overhead of complex sampling algorithms, such as parallel sampling and beam search, cutting their memory usage by up to 55%.

## Sources

- [vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention](https://blog.vllm.ai/2023/06/20/vllm.html)

23 changes: 23 additions & 0 deletions content/architecture/API Gateway.md
@@ -0,0 +1,23 @@
---
title: API Gateway
tags:
- architecture
---
An API gateway acts as a single entry point for client requests. The API gateway is responsible for request routing, composition, and protocol translation. It also provides additional features like authentication, authorization, caching, and rate limiting.

The API Gateway:

- parses and validates the attributes in the HTTP request.
- checks allow/deny lists.
- authenticates and authorizes through an identity provider.
- applies rate-limiting rules.
- routes the request to the relevant backend service by path matching.
- transforms the request into the appropriate protocol and forwards it to backend microservices.
- handles any errors that may arise during request processing for graceful degradation of service.
- implements resiliency patterns like [[circuit breakers]] to detect failures and prevent overloading interconnected services, avoiding cascading failures.
- utilizes observability tools for logging, monitoring, tracing, and debugging.
- can optionally cache responses to common requests to improve responsiveness.
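
A minimal sketch of two of those responsibilities — path-based routing plus a naive per-client rate limit (the routes, limits, and backend names are made up for illustration):

```python
import time

ROUTES = {"/users": "http://user-service.internal", "/orders": "http://order-service.internal"}
RATE_LIMIT, WINDOW = 100, 60  # requests per client per minute
_hits: dict[str, list[float]] = {}

def handle(client_id: str, path: str) -> str:
    # rate limiting: drop requests beyond the per-client budget
    now = time.time()
    hits = [t for t in _hits.get(client_id, []) if now - t < WINDOW]
    if len(hits) >= RATE_LIMIT:
        return "429 Too Many Requests"
    _hits[client_id] = hits + [now]
    # routing: forward to the backend whose path prefix matches
    for prefix, backend in ROUTES.items():
        if path.startswith(prefix):
            return f"forward to {backend}{path}"
    return "404 Not Found"
```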

The API gateway is different from a load balancer. While both handle network traffic, the API gateway operates at the application layer, mainly handling HTTP requests; the load balancer mostly operates at the transport layer.[^bbg]

[^bbg]: [ByteByteGo: 6 More Microservices Interview Questions](https://blog.bytebytego.com/p/6-more-microservices-interview-questions)
9 changes: 9 additions & 0 deletions content/architecture/API Key Authentication.md
@@ -0,0 +1,9 @@
---
title: API Key Authentication
tags:
- security
- http
---
Assigns unique keys to users or applications, sent in headers or parameters; while simple, it might lack the security features of token-based or OAuth methods.[^bbg91]

[^bbg91]: [ByteByteGo EP91: REST API Authentication Methods](https://blog.bytebytego.com/p/ep91-rest-api-authentication-methods)
9 changes: 9 additions & 0 deletions content/architecture/Basic Authentication.md
@@ -0,0 +1,9 @@
---
title: Basic Authentication
tags:
- http
- security
---
Involves sending a username and password with each request; without transport encryption (e.g., HTTPS) the credentials are exposed.[^bbg91]

[^bbg91]: [ByteByteGo EP91: REST API Authentication Methods](https://blog.bytebytego.com/p/ep91-rest-api-authentication-methods)
9 changes: 9 additions & 0 deletions content/architecture/OAuth Authentication.md
@@ -0,0 +1,9 @@
---
title: OAuth Authentication
tags:
- security
- http
---
Enables third-party limited access to user resources without revealing credentials by issuing access tokens after user authentication.[^bbg91]

[^bbg91]: [ByteByteGo EP91: REST API Authentication Methods](https://blog.bytebytego.com/p/ep91-rest-api-authentication-methods)
10 changes: 10 additions & 0 deletions content/architecture/REST Authentication methods.md
@@ -0,0 +1,10 @@
---
title: REST Authentication methods
tags:
- http
- security
---
- [[Basic Authentication]]
- [[Token Authentication]]
- [[OAuth Authentication]]
- [[API Key Authentication]]
9 changes: 9 additions & 0 deletions content/architecture/Token Authentication.md
@@ -0,0 +1,9 @@
---
title: Token Authentication
tags:
- security
- http
---
Uses generated tokens, like JSON Web Tokens (JWT), exchanged between client and server, offering enhanced security without sending login credentials with each request.[^bbg91]

[^bbg91]: [ByteByteGo EP91: REST API Authentication Methods](https://blog.bytebytego.com/p/ep91-rest-api-authentication-methods)
12 changes: 12 additions & 0 deletions content/databases/Redis AOF (Append Only File).md
@@ -0,0 +1,12 @@
---
title: Redis AOF (Append Only File)
tags:
- redis
- databases
---
AOF persistence logs every write operation received by the server. These operations can then be replayed at server startup, reconstructing the original dataset. Commands are logged using the same format as the Redis protocol itself.[^redis]

Unlike a write-ahead log, the Redis AOF log is a write-after log. Redis executes commands to modify the data in memory first and then writes it to the log file.[^bbg91]
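
A toy sketch of that write-after ordering (not Redis code, just the idea): each command mutates the in-memory state first and is appended to the log afterwards; replaying the log on startup rebuilds the dataset:

```python
data = {}

def execute(log_path, command, key, value=None):
    # write-after log: mutate memory first, then persist the command
    if command == "SET":
        data[key] = value
    elif command == "DEL":
        data.pop(key, None)
    with open(log_path, "a") as log:
        log.write(f"{command} {key} {value or ''}\n")

def replay(log_path):
    # on startup, re-run every logged command to reconstruct the dataset
    with open(log_path) as log:
        for line in log:
            command, key, value = line.rstrip("\n").split(" ", 2)
            if command == "SET":
                data[key] = value
            elif command == "DEL":
                data.pop(key, None)
```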

[^bbg91]: [ByteByteGo EP91: REST API Authentication Methods](https://blog.bytebytego.com/p/ep91-rest-api-authentication-methods)
[^redis]: [Redis persistence](https://redis.io/docs/management/persistence/)
