
Commit a401492

add GPUtilization
1 parent 2c1f69b commit a401492

File tree

8 files changed: +127 −18 lines changed

Efficient_Coding.md

Lines changed: 17 additions & 4 deletions
@@ -4,11 +4,12 @@
- [2. Automatically format your code](#2-automatically-format-your-code)
- [3. Use a pre-commit hook to check your code](#3-use-a-pre-commit-hook-to-check-your-code)
- [4. Learn to use Git](#4-learn-to-use-git)
- [Don't forget the .gitignore](#dont-forget-the-gitignore)
- [Make your commits more standardized](#make-your-commits-more-standardized)
- [Branches](#branches)
- [5. Use Grammarly to check your writing](#5-use-grammarly-to-check-your-writing)
- [6. Search on StackOverflow first](#6-search-on-stackoverflow-first)
- [7. Automatically format your docstring](#7-automatically-format-your-docstring)
## 1. You shouldn't miss VSCode

<div align=center>
<img src='images/img1.JPG' width=360 height=240>
@@ -169,10 +170,10 @@ If you are not familiar with Git commands, just follow [this guide](https://lear
### **Make your commits more standardized**

- A Git commit log summarizes the changes you made. A standardized commit log makes everything clear. Here's a template.
```bash
<feat> # type of commit
add distributed training scripts # details of commit
```
The first line should state the commit type; reference values are listed below. You can also create your own commit types.
- `<feat>`: new features
@@ -187,7 +188,7 @@ If you are not familiar with Git commands, just follow [this guide](https://lear
### **Branches**

- The `main` or `master` branch should contain stable releases. Do not work directly on the `main/master` branch. Create a `dev` branch for development and merge it into `main/master` only when your dev code works correctly.
- (Optional) To keep your experiment data clear, create an `exp` branch and make a commit each time you launch a training run; this way, changes to hyperparameters are recorded. A minimal command sketch is shown below.
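- A rough sketch of this workflow, assuming the branch names above (the commit messages are just placeholders):
```bash
# develop on a dedicated branch, never directly on main/master
git checkout -b dev
git add .
git commit -m "<feat> add new augmentation pipeline"

# merge into the stable branch once dev works correctly
git checkout main
git merge dev

# optional experiment branch: one commit per training run
git checkout -b exp
git commit -am "change lr to 1e-3, batch size to 64"
```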
## 5. Use Grammarly to check your writing
- [Grammarly](https://app.grammarly.com/) is a favorite tool of article writers and bloggers. It automatically checks your spelling and grammar and suggests fixes. The free tier has limited functionality, but it is enough for people like us.
@@ -199,3 +200,15 @@ If you are not familiar with Git commands, just follow [this guide](https://lear
- When you encounter a bug and need a solution, search on [StackOverflow](https://stackoverflow.com/) first. StackOverflow provides the most comprehensive solutions, and almost all of your questions have already been answered there.

<img src='images/img9.jpg' height=300><img src='images/img10.jpg' height=300>

## 7. Automatically format your docstring
- The autoDocstring plugin for VSCode can automatically generate docstrings for your functions.
- Search for and install `autoDocstring` in VSCode's marketplace.
<div align=center>
<img src='images/autoDocstring.JPG' width=200>
</div>

- After installing it, press Enter after opening a docstring with triple quotes (configurable: """ or '''). Here is an example.
<div align=center>
<img src='images/docstring.JPG' width=400>
</div>
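- For illustration, this is roughly what the generated stub looks like, assuming the default Google docstring format; the function itself is a made-up example, and autoDocstring fills in placeholders such as `_summary_` and `_type_` for you to complete:
```python
def resize_image(image, width, height=224):
    """_summary_

    Args:
        image (_type_): _description_
        width (_type_): _description_
        height (int, optional): _description_. Defaults to 224.

    Returns:
        _type_: _description_
    """
    return image.resize((width, height))
```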

Efficient_GPUtilization.md

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
# Efficient GPU Utilization
- [1. CUDA out of memory solutions](#1-cuda-out-of-memory-solutions)
- [1.1. Use a smaller batch size](#11-use-a-smaller-batch-size)
- [1.2. Check if there is any accumulated history across your training loop](#12-check-if-there-is-any-accumulated-history-across-your-training-loop)
- [1.3. Delete intermediate variables you don't need](#13-delete-intermediate-variables-you-dont-need)
- [1.4. Check if your GPU memory is freed properly](#14-check-if-your-gpu-memory-is-freed-properly)
- [1.5. Turn off gradient calculation during validation](#15-turn-off-gradient-calculation-during-validation)
- [1.6. COM in Google Colab](#16-com-in-google-colab)
- [2. Multiple GPUs](#2-multiple-gpus)

## 1. CUDA out of memory solutions

<div align=center>
<img src='images/COM.jpg' width=360 height=240>
</div>

- Anyone engaged in deep learning must have encountered the CUDA out of memory problem. It is really frustrating when you have finished writing the code and spent a week debugging it to make sure everything is correct, and then, just as you start training, the program throws a `CUDA out of memory` error. Here are some practical ways to help you solve this annoying problem.

### 1.1. Use a smaller batch size
- The most frequent cause of this problem is a batch size that is set too large. Try a smaller one.
- In some scenarios a smaller batch size may hurt your network's performance, so a good way to balance this is gradient accumulation. Here is an example.
```python
accumulation_steps = 10                        # Accumulate gradients over 10 batches
for i, (inputs, labels) in enumerate(training_set):
    predictions = model(inputs)                # Forward pass
    loss = loss_function(predictions, labels)  # Compute loss function
    loss = loss / accumulation_steps           # Normalize our loss (if averaged)
    loss.backward()                            # Backward pass
    if (i + 1) % accumulation_steps == 0:      # Wait for several backward steps
        optimizer.step()                       # Now we can do an optimizer step
        model.zero_grad()                      # Reset gradients tensors
    if (i + 1) % evaluation_steps == 0:        # Evaluate the model when we...
        evaluate_model()                       # ...have no gradients accumulated
```
- As you can see from the code, `optimizer.step()` and `model.zero_grad()` are executed only after the step count reaches `accumulation_steps`, i.e. the gradient is accumulated 10 times before the parameters are updated. This gives you an effectively large batch size while reducing the memory footprint.
- This may also introduce minor problems, e.g. BatchNorm statistics are computed on the smaller per-step batches and may be slightly less accurate.

### 1.2. Check if there is any accumulated history across your training loop
- By default, computations involving variables that require gradients will keep history. This means you should avoid using such variables in computations that live beyond your training loop, e.g. when tracking statistics. Instead, detach the variable or access its underlying data.
- Here is a bad example:
```python
total_loss = 0
for i in range(10000):
    optimizer.zero_grad()
    output = model(input)
    loss = criterion(output)
    loss.backward()
    optimizer.step()
    total_loss += loss
```
- `total_loss` is defined outside the loop, and because each `loss` still carries autograd history, the computation graphs keep accumulating across iterations. This leads to unnecessary memory usage, and you can solve it in two ways: use `total_loss += loss.detach()` or `total_loss += loss.item()` instead.
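- For illustration, here is a corrected version of the same (hypothetical) loop that accumulates a plain Python number instead of a graph-carrying tensor:
```python
total_loss = 0
for i in range(10000):
    optimizer.zero_grad()
    output = model(input)
    loss = criterion(output)
    loss.backward()
    optimizer.step()
    total_loss += loss.item()  # a plain float, so no autograd history is retained
```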

### 1.3. Delete intermediate variables you don't need
- If you assign a Tensor or Variable to a local, Python will not deallocate it until the local goes out of scope. You can free this reference with `del x`. Similarly, if you assign a Tensor or Variable to a member variable of an object, it will not be deallocated until the object goes out of scope. You get the best memory usage if you don't hold onto temporaries you don't need.
```python
for i in range(5):
    intermediate = f(input[i])
    result += g(intermediate)
output = h(result)
return output
```
- Here, `intermediate` remains live even while `h` is executing, because its scope extends past the end of the loop. To free it earlier, `del intermediate` when you are done with it.
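- For illustration, the same (hypothetical) snippet with the temporary freed as soon as it is no longer needed (`result` is assumed to be initialized beforehand):
```python
for i in range(5):
    intermediate = f(input[i])
    result += g(intermediate)
    del intermediate  # free the reference before the next iteration and before h(result) runs
output = h(result)
```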

### 1.4. Check if your GPU memory is freed properly
- Sometimes, even after your code stops running, GPU memory may still be occupied by it. The best way to deal with this is to find the process occupying GPU memory and kill it.
- Find the PID of the Python process with:
```bash
nvidia-smi
```
- Copy the PID and kill the process with:
```bash
sudo kill -9 <pid>
```

### 1.5. Turn off gradient calculation during validation
- You don't need to calculate gradients during validation: there is no backward pass, so wrap the forward pass in `torch.no_grad()`.
```python
with torch.no_grad():
    for batch in loader:
        output = model(batch)  # forward pass only, no computation graph is stored
```

### 1.6. COM in Google Colab
- If you are getting this CUDA out of memory error in Google Colab, try the following:
```python
import torch
torch.cuda.empty_cache()
```

## 2. Multiple GPUs

Efficient_Training.md

Lines changed: 4 additions & 4 deletions
@@ -9,10 +9,10 @@
- [2. Faster convergence speed]()
    * [2.1. Use another optimizer AdamW]()
    * [2.2. Learning rate schedule]()

## 1. Faster training speed
### 1.1. Set cudnn.benchmark=True
- If your model architecture remains **fixed and your input size stays constant**, setting `torch.backends.cudnn.benchmark = True` might be beneficial ([docs](https://pytorch.org/docs/stable/backends.html#torch-backends-cudnn)). This enables the cuDNN autotuner, which benchmarks several ways of computing convolutions in cuDNN and then uses the fastest one from then on.
- Add the following line to your training code:
```python
torch.backends.cudnn.benchmark = True
```
@@ -29,7 +29,7 @@ torch.backends.cudnn.benchmark = True
param.grad = None
```
- Setting `param.grad = None` avoids the unnecessary overhead of zeroing the memory for each parameter: it directly replaces the gradient, so only a write operation is done, unlike `model.zero_grad()`.
### 1.3. Turn off debugging
- Once you are done debugging your model, stop using all the debug APIs, because they have a significant overhead.
- Add the following lines after your imports:
```
@@ -89,4 +89,4 @@ torch.autograd.profiler.emit_nvtx(False)
### 2.2. Learning rate schedule
- The learning rate (and its schedule) you choose has a large impact on the speed of convergence as well as the generalization performance of your model. The `Cyclical learning rate` and `1Cycle learning rate` schedules seem to accelerate convergence.
![img](images/img15.JPG)
- PyTorch implements both of these methods, `torch.optim.lr_scheduler.CyclicLR` and `torch.optim.lr_scheduler.OneCycleLR`; see [here]() for more details. A usage sketch follows.
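- A minimal sketch of hooking up `OneCycleLR`; the `model`, `criterion`, `train_loader`, and `num_epochs` names are placeholders for your own training setup:
```python
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=1e-2,                       # peak learning rate reached during the cycle
    epochs=num_epochs,
    steps_per_epoch=len(train_loader),
)

for epoch in range(num_epochs):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
        scheduler.step()               # OneCycleLR is stepped once per batch
```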

README.md

Lines changed: 6 additions & 2 deletions
@@ -19,5 +19,9 @@
- [Efficient Data Processing](Efficient_DataProcessing.md)

## 4. Efficient Training
- Here are some strategies to speed up your training process.
- [Efficient Training](Efficient_Training.md)

## 5. Efficient GPU Utilization
- Here are some tips to improve your GPU utilization, including how to fix the CUDA out of memory problem and tips for multi-GPU training.
- [Efficient GPUtilization](Efficient_GPUtilization.md)

images/COM.jpg

26.3 KB

images/autoDocstring.JPG

14.4 KB

images/docstring.JPG

32.8 KB

tools/img2lmdb.py

Lines changed: 11 additions & 8 deletions
@@ -26,14 +26,16 @@ def writeCache(env, cache):


def createDataset(imagePath, annoPath, outputPath, checkValid=True):
    """Create LMDB dataset for training and evaluation.

    Args:
        imagePath (str): path to images
        annoPath (str): path to annotations
        outputPath (str): LMDB output path
        checkValid (bool, optional): if true, check the validity of
            every image. Defaults to True.

    E.g.
        for a text recognition task, the file structure is as follows:
        data
        |_image
@@ -49,6 +51,7 @@ def createDataset(imagePath, annoPath, outputPath, checkValid=True):
        annoPath='data/label'
        outputPath='lmdbOut'
    """

    os.makedirs(outputPath, exist_ok=True)
    env = lmdb.open(outputPath, map_size=109951162776)
    cache = {}
