Commit

Quartz sync: Mar 2, 2024, 10:02 AM
AlphaGit committed Mar 2, 2024
1 parent e4bfa26 commit 723609f
Showing 4 changed files with 164 additions and 2 deletions.
124 changes: 124 additions & 0 deletions content/Bash Expansions.md
@@ -0,0 +1,124 @@
---
title: Bash Expansions
tags:
- command_line
- bash
---
```bash
variable="value"
echo ${variable} # → value
echo $variable # → value
```

Curly braces delimit the variable name, making explicit where the name ends and the surrounding text begins.[^linuxConfig]

```bash
echo ${variable}_ # → value_
echo $variable_ # → (empty)

set -o nounset
echo $variable_ # exits with error "unbound variable"
```

## Variable indirection

```bash
variable="HOME"
echo ${!variable} # → /usr/alpha (the value of $HOME)
```

## Uppercase

```bash
variable="value"

echo ${variable} # → value
echo ${variable^} # → Value
echo ${variable^^} # → VALUE
```

## Lowercase

```bash
variable="VALUE"
echo ${variable} # → VALUE
echo ${variable,} # → vALUE
echo ${variable,,} # → value
```

## Pattern matching

```bash
variable="VALUE"
echo ${variable,,[VAL]} # → valUE
echo ${variable,,[LUE]} # → VAlue

variable="value"
echo ${variable^^[va]} # → VAlue
```

## Arrays

```bash
variables=(one two three)
echo ${variables[@]} # → one two three
echo ${variables[@]^} # → One Two Three
echo ${variables[@]^^} # → ONE TWO THREE
echo ${variables[@]^^[oe]} # → OnE twO thrEE
echo ${variables[*]^^[oe]} # → OnE twO thrEE

echo ${variables[2]^^} # → THREE
```

## Substring removal

### From beginning of string

```bash
variable="value1 value2"
echo ${variable#va} # → lue1 value2
echo ${variable#val} # → ue1 value2
echo ${variable#*l} # → ue1 value2
echo ${variable##val} # → ue1 value2
echo ${variable##*l} # → ue2
```

### From end of string

```bash
variable="value1 value2"
echo ${variable%lue2} # → value1 va
echo ${variable%al*} # → value1 v
echo ${variable%*l*} # → value1 va
```

## String replacement

```bash
variable="value1 value2"
echo ${variable/value/example} # → example1 value2
echo ${variable//value/example} # → example1 example2
echo ${variable/#value/example} # → example1 value2
echo ${variable/value*/example} # → example
echo ${variable/%value2/example} # → value1 example
```

## Substring

```bash
echo ${variable:4} # → e1 value2
echo ${variable:2:3} # → lue
# The space before the - sign is required; without it, :- is parsed as the "use default value" operator
echo ${variable: -3} # → ue2
echo ${variable:1: -1} # → alue1 value
echo ${variable: -3: -1} # → ue
```

## Length

```bash
echo ${#variable} # → 13
echo ${#variables[@]} # → 3
echo ${#variables[2]} # → 5
```

[^linuxConfig]: [Introduction to Bash Shell Parameter Expansions](https://linuxconfig.org/introduction-to-bash-shell-parameter-expansions)
9 changes: 9 additions & 0 deletions content/Polyphasic Sleep.md
@@ -0,0 +1,9 @@
---
title: Polyphasic Sleep
tags:
- health
---
Polyphasic sleep is the practice of sleeping in several shorter blocks spread across the day, as opposed to the single nightly sleep that is common in many countries.[^PS]

[^PS]: https://www.polyphasic.net/

25 changes: 23 additions & 2 deletions content/ai/LLM Speed Performance.md
@@ -38,7 +38,9 @@ Increasing the batch size will improve the throughput of the model and uses the
> To batch generation, we pass the model multiple sequences at once, generating a completion for each in the same forward pass. This requires the sequences to be padded on either the left or right with filler tokens to equal length. The padding tokens are masked in the attention mask so that they don't influence generation.
This improves throughput and hardware utilization, but may increase the time to first token.
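As a rough sketch of what batched generation with left padding looks like in `transformers` (the model name and prompts here are placeholder assumptions, not from the original):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no dedicated pad token
tokenizer.padding_side = "left"            # pad on the left so generation continues from real tokens
model = AutoModelForCausalLM.from_pretrained(model_name)

prompts = ["The capital of France is", "One plus one equals"]
# Both sequences are padded to equal length; the attention mask hides the padding tokens.
batch = tokenizer(prompts, return_tensors="pt", padding=True)
out = model.generate(**batch, max_new_tokens=8, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```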
### 3. Improve performance of the KV-Cache

#### 3.1. Set up a bigger cache for the pre-fill of the KV-Cache

> KV caching helps with the algorithmic side of LLM slowness—since we're now only passing in a single token on each step, we don't have to redo _everything_ for each new token. However, it doesn't completely banish the problem, since the KV cache still grows in size each step, slowing down the attention calculation. The size of the KV cache can also pose its own, new problem—for example, with a 1,000 token KV cache, even with the smallest GPT-2 there are 18,432,000 values being cached. If each is an fp32, that's almost 74MB of cache, for a single generation, for a comparatively tiny model! (LLMFast)
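Those numbers check out: the smallest GPT-2 has 12 layers and a 768-dimensional hidden state (12 heads × 64 dims per head), and each layer caches one key and one value vector per token. A quick back-of-the-envelope check:

```python
layers, hidden, tokens = 12, 768, 1000
values = layers * 2 * hidden * tokens  # 2 = one key + one value vector per token per layer
print(values)                          # 18432000 cached values
print(values * 4 / 1e6, "MB")          # fp32 = 4 bytes each → ~73.7 MB
```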
@@ -60,6 +62,22 @@ prefill = torch.compile(
)
```

#### 3.2. Alternatively, use a static cache

```python
import torch
from transformers import AutoModelForCausalLM, StaticCache

device = "cuda"

...

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", torch_dtype=torch.bfloat16)
model = model.to(device).eval()

...

# batch_size and max_cache_length are assumed to be defined above (elided here)
model._setup_cache(StaticCache, batch_size, max_cache_len=max_cache_length)
```
[^static_kv_cache]
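A static cache pre-allocates the key/value tensors up to `max_cache_len`, so their shapes stay fixed across generation steps; this avoids reallocating a growing cache on every token and gives `torch.compile` static shapes to specialize against.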
### 4. Improve the attention mechanism

The attention mechanism (which decides how much weight each token gives to every other token in the context) is also a quadratic algorithm: all tokens attend to all tokens, leading to $N^2$ scaling.
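A toy single-head attention in plain PyTorch (illustrative only, not any particular model's implementation) makes the quadratic term visible; the score matrix has one entry per (query, key) pair, i.e. N × N of them:

```python
import torch

N, d = 1000, 64            # sequence length, head dimension
q, k, v = (torch.randn(N, d) for _ in range(3))

scores = q @ k.T / d**0.5  # N × N matrix: every token attends to every token
attn = torch.softmax(scores, dim=-1)
out = attn @ v             # doubling N quadruples the work in `scores`
print(scores.shape)        # torch.Size([1000, 1000])
```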
@@ -180,4 +198,7 @@ There are a few alternatives that claim to be more optimized and faster for inference
- [vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention](https://blog.vllm.ai/2023/06/20/vllm.html), vLLM Team
- Chen, Carol. "Transformer Inference Arithmetic", https://kipp.ly/blog/transformer-inference-arithmetic/, 2022.
- [Extensive LLama.cpp benchmark & more speed on CPU, 7b to 30b, Q2_K, to Q6_K and FP16, X3D, DDR-4000 and DDR-6000](https://www.reddit.com/r/LocalLLaMA/comments/14ilo0t/extensive_llamacpp_benchmark_more_speed_on_cpu_7b/), from u/Chromix\_ in r/LocalLLaMA, 2023-06-25.
- LLMFast: [How to make LLMs go fast](https://vgel.me/posts/faster-inference/), Theia @ vgel.me

[^static_kv_cache]: [ArthurZucker/static_kv_cache.py](https://gist.github.com/ArthurZucker/af34221def212259b43d55a2811d2dbb)
8 changes: 8 additions & 0 deletions content/ai/llms/LazyAxolotl.md
@@ -0,0 +1,8 @@
---
title: LazyAxolotl
tags:
- llm
---
Pre-built notebook with YAML-based configuration to download, run, and fine-tune an LLM.

https://colab.research.google.com/drive/1TsDKNo2riwVmU55gjuBgB1AXVtRRfRHW
