# Efficient GPU Utilization
- [1. CUDA out of memory solutions](#1-cuda-out-of-memory-solutions)
  - [1.1. Use a smaller batch size](#11-use-a-smaller-batch-size)
  - [1.2. Check if there is any accumulated history across your training loop](#12-check-if-there-is-any-accumulated-history-across-your-training-loop)
  - [1.3. Delete intermediate variables you don't need](#13-delete-intermediate-variables-you-dont-need)
  - [1.4. Check if your GPU memory is freed properly](#14-check-if-your-gpu-memory-is-freed-properly)
  - [1.5. Turn off gradient calculation during validation](#15-turn-off-gradient-calculation-during-validation)
  - [1.6. COM in Google Colab](#16-com-in-google-colab)
- [2. Multiple GPUs](#2-multiple-gpus)

## 1. CUDA out of memory solutions
<div align=center>
  <img src='images/COM.JPG' width=360 height=240>
</div>

- Anyone engaged in deep learning has probably run into the problem of CUDA running out of memory. It is really frustrating when you have spent a week writing and debugging your code to make sure everything is correct, and then, just as training starts, the program throws a `CUDA out of memory` error. Here are some practical ways to help you solve this annoying problem.
### 1.1. Use a smaller batch size
- The most frequent cause of this problem is a batch size that is set too large. Try a smaller one.
- In some scenarios a smaller batch size can hurt your network's performance, so a good way to balance the two is gradient accumulation. Here is an example:
  ```python
  accumulation_steps = 10                              # Number of mini-batches to accumulate over
  model.zero_grad()                                    # Reset gradients tensors
  for i, (inputs, labels) in enumerate(training_set):
      predictions = model(inputs)                      # Forward pass
      loss = loss_function(predictions, labels)        # Compute loss function
      loss = loss / accumulation_steps                 # Normalize our loss (if averaged)
      loss.backward()                                  # Backward pass
      if (i + 1) % accumulation_steps == 0:            # Wait for several backward steps
          optimizer.step()                             # Now we can do an optimizer step
          model.zero_grad()                            # Reset gradients tensors
          if (i + 1) % evaluation_steps == 0:          # Evaluate the model when we...
              evaluate_model()                         # ...have no gradients accumulated
  ```
- As you can see from the code, `model.zero_grad()` is executed only after the forward count reaches `accumulation_steps`, i.e. the gradients are accumulated over 10 mini-batches before the parameters are updated. This lets you train with an effectively large batch size while reducing the memory footprint.
- This can also cause some minor problems, e.g. the BatchNorm layers still see only the small per-step batches, so their statistics may be slightly less accurate.

### 1.2. Check if there is any accumulated history across your training loop
- By default, computations involving variables that require gradients will keep history. This means that you should avoid using such variables in computations which will live beyond your training loops, e.g., when tracking statistics. Instead, you should detach the variable or access its underlying data.
- Here is a bad example:
  ```python
  total_loss = 0
  for i in range(10000):
      optimizer.zero_grad()
      output = model(input)
      loss = criterion(output, target)
      loss.backward()
      optimizer.step()
      total_loss += loss                 # keeps the whole computation graph alive, not just the value
  ```
- `total_loss` is defined outside the loop and keeps accumulating `loss` in each iteration. Since `loss` still carries its computation graph, the graphs of all iterations are kept alive, causing unnecessary memory usage. You can solve it in two ways: use `total_loss += loss.detach()` or `total_loss += loss.item()` instead, as in the corrected sketch below.
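- A minimal corrected version of the loop above (same placeholder names as the bad example, only the accumulation line changes):
  ```python
  total_loss = 0.0
  for i in range(10000):
      optimizer.zero_grad()
      output = model(input)
      loss = criterion(output, target)
      loss.backward()
      optimizer.step()
      total_loss += loss.item()          # a plain Python float, so no graph is retained
  ```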
### 1.3. Delete intermediate variables you don't need
- If you assign a Tensor or Variable to a local, Python will not deallocate it until the local goes out of scope. You can free this reference with `del x`. Similarly, if you assign a Tensor or Variable to a member variable of an object, it will not be deallocated until the object goes out of scope. You will get the best memory usage if you don't hold onto temporaries you don't need.
```python
for i in range(5):
    intermediate = f(input[i])
    result += g(intermediate)
output = h(result)
return output
```
- Here, `intermediate` remains live even while `h` is executing, because its scope extends past the end of the loop. To free it earlier, you should `del intermediate` when you are done with it, as in the sketch below.
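- A minimal sketch of the same fragment with the temporary released early (`f`, `g`, `h` and `input` are the placeholders from the snippet above):
  ```python
  result = 0
  for i in range(5):
      intermediate = f(input[i])
      result += g(intermediate)
      del intermediate          # drop the reference so the tensor can be freed before h runs
  output = h(result)
  ```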
### 1.4. Check if your GPU memory is freed properly
- Sometimes, even after your code has stopped running, the GPU memory may still be occupied by it. The best way to deal with this is to find the process holding the memory and kill it.
- Find the PID of the Python process with:
  ```bash
  nvidia-smi
  ```
- Copy the PID and kill the process with:
  ```bash
  sudo kill -9 pid
  ```
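- If you only want to list the processes that are using the GPU, `nvidia-smi` can also print a compact listing of them (flag support may vary with your driver version):
  ```bash
  nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
  ```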
### 1.5. Turn off gradient calculation during validation
- You don't need gradients during validation, since there is no backward pass, so run the forward pass under `torch.no_grad()`. A minimal sketch, assuming `model` is a regular `nn.Module` and `loader` yields validation batches:
  ```python
  model.eval()                  # put layers such as dropout and batchnorm into eval mode
  with torch.no_grad():         # disable gradient tracking for everything in this block
      for batch in loader:
          outputs = model(batch)
  ```
### 1.6. COM in Google Colab
- If you are getting this error in Google Colab, try releasing the cached memory first:
  ```python
  import torch
  torch.cuda.empty_cache()      # release cached, unoccupied blocks back to the driver
  ```
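- To see whether this actually helped, you can compare how much memory PyTorch has allocated versus reserved before and after the call (these are standard `torch.cuda` helpers in recent PyTorch versions):
  ```python
  import torch

  print(torch.cuda.memory_allocated())   # bytes currently occupied by live tensors
  print(torch.cuda.memory_reserved())    # bytes held by the caching allocator
  torch.cuda.empty_cache()               # return unused cached blocks to the driver
  print(torch.cuda.memory_reserved())    # usually drops after emptying the cache
  ```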
## 2. Multiple GPUs