Commit

Quartz sync: Mar 2, 2024, 10:02 AM
AlphaGit committed Mar 2, 2024
1 parent e4bfa26 commit 723609f
Showing 4 changed files with 164 additions and 2 deletions.
124 changes: 124 additions & 0 deletions content/Bash Expansions.md
@@ -0,0 +1,124 @@
---
title: Bash Expansions
tags:
- command_line
- bash
---
```bash
variable="value"
echo ${variable} # → value
echo $variable # → value
```

Curly braces delimit the variable name, making explicit where the name ends and the surrounding text begins.[^linuxConfig]

```bash
echo ${variable}_ # → value_
echo $variable_ # → (empty)

set -o nounset
echo $variable_ # exits with error "unbound variable"
```

## Variable indirection

```bash
variable="HOME"
echo ${!variable} # → /usr/alpha (the value of $HOME)
```

## Uppercase

```bash
variable="value"

echo ${variable} # → value
echo ${variable^} # → Value
echo ${variable^^} # → VALUE
```

## Lowercase

```bash
variable="VALUE"
echo ${variable} # → VALUE
echo ${variable,} # → vALUE
echo ${variable,,} # → value
```

## Pattern matching

```bash
variable="VALUE"
echo ${variable,,[VAL]} # → valUE
echo ${variable,,[LUE]} # → VAlue

variable="value"
echo ${variable^^[va]} # → VAlue
```

## Arrays

```bash
variables=(one two three)
echo ${variables[@]} # → one two three
echo ${variables[@]^} # → One Two Three
echo ${variables[@]^^} # → ONE TWO THREE
echo ${variables[@]^^[oe]} # → OnE twO thrEE
echo ${variables[*]^^[oe]} # → OnE twO thrEE

echo ${variables[2]^^} # → THREE
```

## Substring removal

### From beginning of string

```bash
variable="value1 value2"
echo ${variable#va} # → lue1 value2
echo ${variable#val} # → ue1 value2
echo ${variable#*l} # → ue1 value2
echo ${variable##val} # → ue1 value2
echo ${variable##*l} # → ue2
```

### From end of string

```bash
variable="value1 value2"
echo ${variable%lue2} # → value1 va
echo ${variable%al*} # → value1 v
echo ${variable%*l*} # → value1 va
```

## String replacement

```bash
variable="value1 value2"
echo ${variable/value/example} # → example1 value2
echo ${variable//value/example} # → example1 example2
echo ${variable/#value/example} # → example1 value2
echo ${variable/value*/example} # → example
echo ${variable/%value2/example} # → value1 example
```

## Substring

```bash
echo ${variable:4} # → e1 value2
echo ${variable:2:3} # → lue
# The space before the - sign is required; without it, :- is parsed as the "use default value" operator
echo ${variable: -3} # → ue2
echo ${variable:1: -1} # → alue1 value
echo ${variable: -3: -1} # → ue
```

## Length

```bash
echo ${#variable} # → 13
echo ${#variables[@]} # → 3
echo ${#variables[2]} # → 5
```

[^linuxConfig]: [Introduction to Bash Shell Parameter Expansions](https://linuxconfig.org/introduction-to-bash-shell-parameter-expansions)
9 changes: 9 additions & 0 deletions content/Polyphasic Sleep.md
@@ -0,0 +1,9 @@
---
title: Polyphasic Sleep
tags:
- health
---
Polyphasic sleep is the practice of sleeping in several shorter blocks spread across the day, as opposed to the single nightly sleep that is common in many countries.[^PS]

[^PS]: https://www.polyphasic.net/

25 changes: 23 additions & 2 deletions content/ai/LLM Speed Performance.md
@@ -38,7 +38,9 @@ Increasing the batch size will improve the throughput of the model and uses the
> To batch generation, we pass the model multiple sequences at once, generating a completion for each in the same forward pass. This requires the sequences to be padded on either the left or right with filler tokens to equal length. The padding tokens are masked in the attention mask so that they don't influence generation.
This improves throughput and hardware utilization, but may increase the time to first token.
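As a rough sketch of what batched generation with left padding looks like in `transformers` (the model name and prompts here are placeholder assumptions, not from the original):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no dedicated pad token
tokenizer.padding_side = "left"            # pad on the left so generation continues from real tokens
model = AutoModelForCausalLM.from_pretrained(model_name)

prompts = ["The capital of France is", "One plus one equals"]
# Both sequences are padded to equal length; the attention mask hides the padding tokens.
batch = tokenizer(prompts, return_tensors="pt", padding=True)
out = model.generate(**batch, max_new_tokens=8, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```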
### 3. Improve performance of the KV-Cache

#### 3.1. Set up a bigger cache for the pre-fill of the KV-Cache

> KV caching helps with the algorithmic side of LLM slowness—since we're now only passing in a single token on each step, we don't have to redo _everything_ for each new token. However, it doesn't completely banish the problem, since the KV cache still grows in size each step, slowing down the attention calculation. The size of the KV cache can also pose its own, new problem—for example, with a 1,000 token KV cache, even with the smallest GPT-2 there are 18,432,000 values being cached. If each is an fp32, that's almost 74MB of cache, for a single generation, for a comparatively tiny model! (LLMFast)
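Those numbers check out: the smallest GPT-2 has 12 layers and a 768-dimensional hidden state (12 heads × 64 dims per head), and each layer caches one key and one value vector per token. A quick back-of-the-envelope check:

```python
layers, hidden, tokens = 12, 768, 1000
values = layers * 2 * hidden * tokens  # 2 = one key + one value vector per token per layer
print(values)                          # 18432000 cached values
print(values * 4 / 1e6, "MB")          # fp32 = 4 bytes each → ~73.7 MB
```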
@@ -60,6 +62,22 @@ prefill = torch.compile(
)
```

#### 3.2. Alternatively, use a static cache

```python
import torch
from transformers import AutoModelForCausalLM, StaticCache

device = "cuda"

...

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", torch_dtype=torch.bfloat16)
model = model.to(device).eval()

...

# batch_size and max_cache_length are assumed to be defined above (elided here)
model._setup_cache(StaticCache, batch_size, max_cache_len=max_cache_length)
```
[^static_kv_cache]
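A static cache pre-allocates the key/value tensors up to `max_cache_len`, so their shapes stay fixed across generation steps; this avoids reallocating a growing cache on every token and gives `torch.compile` static shapes to specialize against.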
### 4. Improve the attention mechanism

The attention mechanism (which decides how much weight each token gives to every other token in the context) is also a quadratic algorithm: all tokens attend to all tokens, leading to $N^2$ scaling.
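A toy single-head attention in plain PyTorch (illustrative only, not any particular model's implementation) makes the quadratic term visible; the score matrix has one entry per (query, key) pair, i.e. N × N of them:

```python
import torch

N, d = 1000, 64            # sequence length, head dimension
q, k, v = (torch.randn(N, d) for _ in range(3))

scores = q @ k.T / d**0.5  # N × N matrix: every token attends to every token
attn = torch.softmax(scores, dim=-1)
out = attn @ v             # doubling N quadruples the work in `scores`
print(scores.shape)        # torch.Size([1000, 1000])
```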
@@ -180,4 +198,7 @@ There are a few alternatives that claim to be more optimized and faster for inference
- [vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention](https://blog.vllm.ai/2023/06/20/vllm.html), vLLM Team
- Chen, Carol. "Transformer Inference Arithmetic", https://kipp.ly/blog/transformer-inference-arithmetic/, 2022.
- [Extensive LLama.cpp benchmark & more speed on CPU, 7b to 30b, Q2_K, to Q6_K and FP16, X3D, DDR-4000 and DDR-6000](https://www.reddit.com/r/LocalLLaMA/comments/14ilo0t/extensive_llamacpp_benchmark_more_speed_on_cpu_7b/), from u/Chromix\_ in r/LocalLLaMA, 2023-06-25.
- LLMFast: [How to make LLMs go fast](https://vgel.me/posts/faster-inference/), Theia @ vgel.me

[^static_kv_cache]: [ArthurZucker/static_kv_cache.py](https://gist.github.com/ArthurZucker/af34221def212259b43d55a2811d2dbb)
8 changes: 8 additions & 0 deletions content/ai/llms/LazyAxolotl.md
@@ -0,0 +1,8 @@
---
title: LazyAxolotl
tags:
- llm
---
Pre-built notebook with YAML-based configuration to download, run, and fine-tune an LLM.

https://colab.research.google.com/drive/1TsDKNo2riwVmU55gjuBgB1AXVtRRfRHW
