

# Ramulator 2.0 Summary

*Intelligent System Laboratory*

# DRAM Operations & States

## □ DRAM Operations & States



- Main DRAM states
  - Activate
  - Read/Write
  - Precharge

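```
// DRAM.h
template <typename T>
class DRAM {
    DRAM<T>* parent;             // node one level up in the hierarchy
    vector<DRAM<T>*> children;   // nodes one level down
    typename T::Level level;     // hierarchy level of this node (defined by the standard T)
    int index;                   // index of this node among its siblings
    // more code...
};
```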

```
// DDR3.h/cpp
class DDR3 {
    // Hierarchy levels, from outermost to innermost
    enum class Level {
        Channel, Rank, Bank,
        Row, Column, MAX
    };
    // more code...
};
```



# src files <=> DRAM Operation

## □ Simulation Configuration



Fig. 1: High-level software architecture of Ramulator 2.0 using an example DDR5 system configuration



1. **Requests are sent:** The frontend (trace file) sends memory requests.
2. **Memory addresses are mapped:** The address mapper translates the request address to match the DRAM organization.
3. **Enqueue:** The request is placed in the DRAM controller's buffer.
4. **DRAM controller - ticking the refresh manager:** The controller calls the refresh manager, which adds high-priority maintenance requests (e.g., refresh).
5. **DRAM controller - request scheduling:** The controller asks the request scheduler to select the best request.
6. **DRAM device checks the request:** The scheduler consults the DRAM device model to decode the appropriate command.
7. **Issue command:** The DRAM controller issues the DRAM command.
8. **Update behavior and timing information:** State and timing are updated whenever a DRAM command is issued.
9. **Notify the frontend:** When a memory request completes, the frontend is notified through a callback.

# main function

## □ main.cpp

```
int main(int argc, char* argv[]) {
    // Parse command line arguments
    argparse::ArgumentParser program("Ramulator", "2.0");
    program.add_argument("-c", "--config").metavar("\"dumped YAML configuration\"")
        .help("String dump of the yaml configuration.");
    program.add_argument("-f", "--config_file").metavar("path-to-configuration-file")
        .help("Path to a YAML configuration file.");
    program.add_argument("-p", "--param").metavar("KEY=VALUE")
        .append()
        .help("specify parameter to override in the configuration file. Repeat this option to change multiple parameters.");

    // ... (code elided in the original)

    // Connect the frontend and the memory system together,
    // this recursively calls the "setup" function in all instantiated components
    // so that they can get each other's parameters (if needed) after their initialization
    frontend->connect_memory_system(memory_system);
    memory_system->connect_frontend(frontend);

    // Get the relative clock ratio between the frontend and memory system
    int frontend_tick = frontend->get_clock_ratio();
    int mem_tick = memory_system->get_clock_ratio();

    int tick_mult = frontend_tick * mem_tick;

    // Loop until the frontend reports that it has finished
    for (uint64_t i = 0;; i++) {
        if (((i % tick_mult) % mem_tick) == 0) {
            frontend->tick();
        }

        if (frontend->is_finished()) {
            break;
        }

        if ((i % tick_mult) % frontend_tick == 0) {
            memory_system->tick();
        }
    }

    // Finalize the simulation. Recursively print all statistics from all components
    frontend->finalize();
    memory_system->finalize();

    return 0;
}
```

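### 1. Argument parsing

- Options

1. `-c`: YAML configuration passed as a string dump on the command line
2. `-f`: path to a YAML configuration file
3. `-p`: overrides parameters in the YAML configuration (`KEY=VALUE`, repeatable)

### 2. tick()-based simulation in a long for loop

1. Once the frontend (core) has processed the expected number of instructions, `is_finished()` returns true and the loop exits.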
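The modulo conditions in the loop decide how often each side ticks. The sketch below is a minimal, standalone reproduction of just those two conditions; `count_ticks` is a hypothetical helper written for illustration (it is not part of Ramulator), and it ignores the early exit on `is_finished()`.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Counts how often each side would tick over `iterations` loop iterations,
// using the same modulo conditions as the loop in main.cpp.
// Returns {frontend_ticks, memory_ticks}.
std::pair<uint64_t, uint64_t> count_ticks(int frontend_ratio, int mem_ratio,
                                          uint64_t iterations) {
  assert(frontend_ratio > 0 && mem_ratio > 0);
  const uint64_t tick_mult = static_cast<uint64_t>(frontend_ratio) * mem_ratio;
  uint64_t frontend_ticks = 0;
  uint64_t memory_ticks = 0;
  for (uint64_t i = 0; i < iterations; i++) {
    // The frontend ticks once every mem_ratio iterations.
    if ((i % tick_mult) % mem_ratio == 0) frontend_ticks++;
    // The memory system ticks once every frontend_ratio iterations.
    if ((i % tick_mult) % frontend_ratio == 0) memory_ticks++;
  }
  return {frontend_ticks, memory_ticks};
}
```

With `frontend_ratio = 8` and `mem_ratio = 3`, one period of `tick_mult = 24` iterations gives 8 frontend ticks and 3 memory-system ticks, so the frontend runs 8/3 times as fast as the memory system.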

# yaml file

## □ example\_config.yaml

```
Frontend:
  impl: SimpleO3
  clock_ratio: 8
  num_expected_insts: 500000
  traces:
    - example_inst.trace

  Translation:
    impl: RandomTranslation
    max_addr: 2147483648

MemorySystem:
  impl: GenericDRAM
  clock_ratio: 3

  DRAM:
    impl: DDR4
    org:
      preset: DDR4_8Gb_x8
      channel: 1
      rank: 2
    timing:
      preset: DDR4_2400R

  Controller:
    impl: Generic
    Scheduler:
      impl: FRFCFS
    RefreshManager:
      impl: AllBank
    RowPolicy:
      impl: ClosedRowPolicy
      cap: 4
    plugins:

  AddrMapper:
    impl: RoBaRaCoCh
```

### 1. Frontend Interface (IFrontEnd)

- Reads instructions from the **trace file** and generates memory requests
- `impl: SimpleO3`  
⇒ Simple out-of-order (O3) CPU
- `clock_ratio: 8`  
⇒ Frontend clock speed relative to the global clock
- `num_expected_insts: 500000`  
⇒ The simulation ends once this number of instructions is reached
- `traces`  
⇒ Instruction trace file (including memory access instructions)
- `impl: RandomTranslation`  
⇒ Translates between virtual and physical memory addresses  
⇒ A simple model of the system's page table, etc.
- `max_addr: 2147483648`  
⇒ Upper bound (2^31 bytes = 2 GiB) that prevents address overflow during translation

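### 2. MemorySystem Interface

- Receives requests from the frontend and processes them through the DRAM controller
- Handles latency, enqueue/dequeue, and timing constraints
- `impl: GenericDRAM`
  - ⇒ Basic DRAM-based system that combines the controller and the DRAM
- `clock_ratio: 3`
  - ⇒ MemorySystem clock speed relative to the global clock
  - ⇒ In this configuration the DRAM is clocked slower than the CPU (memory : frontend = 3 : 8)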

```
// Excerpt: every row continues to the right beyond what is shown here.
inline static const std::map<std::string, std::vector<int>> timing_presets = {
  // name         rate  nBL nCL nRCD nRP nRAS nRC nWR nRTP nCWL nCCDS nCCDL nRRDS nRRDL nWTRS nW
  {"DDR4_1600",  {1600, 4, 10, 10, 10, 28, 38, 12, 6, 9, 4, 5, -1, -1, -1, 2,
  {"DDR4_1600K", {1600, 4, 11, 11, 11, 28, 39, 12, 6, 9, 4, 5, -1, -1, -1, 2,
  {"DDR4_1600L", {1600, 4, 12, 12, 12, 28, 40, 12, 6, 9, 4, 5, -1, -1, -1, 2,
  {"DDR4_1866L", {1866, 4, 12, 12, 12, 32, 44, 14, 7, 10, 4, 5, -1, -1, -1, 3,
  {"DDR4_1866M", {1866, 4, 13, 13, 13, 32, 45, 14, 7, 10, 4, 5, -1, -1, -1, 3,
  {"DDR4_1866N", {1866, 4, 14, 14, 14, 32, 46, 14, 7, 10, 4, 5, -1, -1, -1, 3,
  {"DDR4_2133N", {2133, 4, 14, 14, 14, 36, 50, 16, 8, 11, 4, 6, -1, -1, -1, 3,
  {"DDR4_2133P", {2133, 4, 15, 15, 15, 36, 51, 16, 8, 11, 4, 6, -1, -1, -1, 3,
  {"DDR4_2133R", {2133, 4, 16, 16, 16, 36, 52, 16, 8, 11, 4, 6, -1, -1, -1, 3,
  {"DDR4_2400P", {2400, 4, 15, 15, 15, 39, 54, 18, 9, 12, 4, 6, -1, -1, -1, 3,
  {"DDR4_2400R", {2400, 4, 16, 16, 16, 39, 55, 18, 9, 12, 4, 6, -1, -1, -1, 3,
  {"DDR4_2400U", {2400, 4, 17, 17, 17, 39, 56, 18, 9, 12, 4, 6, -1, -1, -1, 3,
  {"DDR4_2400T", {2400, 4, 18, 18, 18, 39, 57, 18, 9, 12, 4, 6, -1, -1, -1, 3,
```

### 2. MemorySystem Interface: DRAM section

- `impl: DDR4`  
⇒ Performs timing checks and command issue on `tick()`
- `org: DDR4_8Gb_x8`  
⇒ The current **DRAM preset**: 8 Gb density, x8 (8-bit) data bus  
⇒ **Setting channel/rank** explicitly overrides the values from the preset
- `timing: DDR4_2400R`  
⇒ **Timing preset** that defines timing constraints such as nRCD  
⇒ These values are used to compute latency on `tick()`

```
class DDR4 : public IDRAM, public Implementation {
  RAMULATOR_REGISTER_IMPLEMENTATION(IDRAM, DDR4, "DDR4", "DDR4 Device Model")
public:
  inline static const std::map<std::string, Organization> org_presets = {
    // name density DQ Ch Ra Bg Ba Ro Co
    {"DDR4_2Gb_x4", {2<<10, 4, {1, 1, 4, 4, 1<<15, 1<<10}}},
    {"DDR4_2Gb_x8", {2<<10, 8, {1, 1, 4, 4, 1<<14, 1<<10}}},
    {"DDR4_2Gb_x16", {2<<10, 16, {1, 1, 2, 4, 1<<14, 1<<10}}},
    {"DDR4_4Gb_x4", {4<<10, 4, {1, 1, 4, 4, 1<<16, 1<<10}}},
    {"DDR4_4Gb_x8", {4<<10, 8, {1, 1, 4, 4, 1<<15, 1<<10}}},
    {"DDR4_4Gb_x16", {4<<10, 16, {1, 1, 2, 4, 1<<15, 1<<10}}},
    {"DDR4_8Gb_x4", {8<<10, 4, {1, 1, 4, 4, 1<<17, 1<<10}}},
    {"DDR4_8Gb_x8", {8<<10, 8, {1, 1, 4, 4, 1<<16, 1<<10}}},
    {"DDR4_8Gb_x16", {8<<10, 16, {1, 1, 2, 4, 1<<16, 1<<10}}},
    {"DDR4_16Gb_x4", {16<<10, 4, {1, 1, 4, 4, 1<<18, 1<<10}}},
    {"DDR4_16Gb_x8", {16<<10, 8, {1, 1, 4, 4, 1<<17, 1<<10}}},
    {"DDR4_16Gb_x16", {16<<10, 16, {1, 1, 2, 4, 1<<17, 1<<10}}},
  };
  // more code...
};
```
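Each preset row can be sanity-checked: the density column should equal the product of the per-chip dimensions and the DQ width. The following is a minimal sketch of that arithmetic; `OrgPreset` and `implied_density_mbit` are hypothetical names chosen for illustration and are not Ramulator's own types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of one org_presets row:
// {density, DQ, {Ch, Ra, Bg, Ba, Ro, Co}}.
struct OrgPreset {
  uint64_t density_mbit;  // chip density in Mb, e.g. 8<<10 for 8 Gb
  uint64_t dq;            // data pins per chip
  uint64_t channel, rank, bankgroup, bank, row, column;
};

// Density implied by the per-chip dimensions, in Mb (bits >> 20).
// Channel and rank counts do not contribute to the density of a single chip.
uint64_t implied_density_mbit(const OrgPreset& o) {
  assert(o.dq > 0);
  return (o.bankgroup * o.bank * o.row * o.column * o.dq) >> 20;
}

// True if the density column matches the dimensions.
bool is_consistent(const OrgPreset& o) {
  return implied_density_mbit(o) == o.density_mbit;
}
```

For `DDR4_8Gb_x8`, 4 bank groups × 4 banks × 2^16 rows × 2^10 columns × 8 DQ = 2^33 bits, which is 8192 Mb and matches `8<<10`.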

```
class DDR4 : public IDRAM, public Implementation {
public:
  void tick() override {
    // ...
  };

  // Builds the device model from the configured organization and timing
  void init() override {
    RAMULATOR_DECLARE_SPECS();
    set_organization();
    set_timing_vals();
    set_actions();
    set_preqs();
    set_rowhits();
    set_rowopens();
    set_powers();
    create_nodes();
  };
};
```

### 2. MemorySystem Interface: Controller section

- *impl: Generic*
  - ⇒ Generic: the basic controller
  - ⇒ Manages the request queues, scheduling, etc.
- *Scheduler - impl: FRFCFS*
  - ⇒ FRFCFS: First-Ready, First-Come-First-Serve
  - ⇒ Requests that are ready to issue are taken from the queue first; among those, the oldest is served first
- *RefreshManager - impl: AllBank*
  - ⇒ AllBank: refreshes all banks simultaneously
- *RowPolicy - impl: ClosedRowPolicy*
  - ⇒ ClosedRowPolicy: closes (precharges) a row after use instead of leaving it open
  - ⇒ `cap: 4`: caps the number of accesses served from an open row before it is closed
- *plugins*
  - ⇒ Ramulator currently provides RowHammer mitigation techniques as plugins
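The FR-FCFS rule can be illustrated with a small standalone sketch. This is not Ramulator's scheduler code; `PendingRequest` and `pick_frfcfs` are hypothetical names, and "ready" is reduced to a single boolean that a real controller would derive from the DRAM timing state.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct PendingRequest {
  uint64_t arrive_cycle;  // cycle at which the request entered the queue
  bool ready;             // true if its next command can issue this cycle
};

// Returns the index of the request to serve, or -1 if the queue is empty.
int pick_frfcfs(const std::vector<PendingRequest>& queue) {
  int best = -1;
  for (int i = 0; i < static_cast<int>(queue.size()); i++) {
    if (best < 0) {
      best = i;
      continue;
    }
    const PendingRequest& cand = queue[i];
    const PendingRequest& cur = queue[best];
    // First-ready: a ready request beats one that is not ready.
    const bool more_ready = cand.ready && !cur.ready;
    // FCFS: with equal readiness, the earlier arrival wins.
    const bool same_ready_but_older =
        cand.ready == cur.ready && cand.arrive_cycle < cur.arrive_cycle;
    if (more_ready || same_ready_but_older) best = i;
  }
  return best;
}
```

A ready request that arrived at cycle 20 is therefore chosen over a non-ready request that arrived at cycle 10; if nothing is ready, the oldest request is chosen.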

### 2. MemorySystem Interface: AddrMapper section

- `impl: RoBaRaCoCh`  
⇒ Row-Bank-Rank-Column-Channel mapping scheme  
⇒ Translates the requested address (physical address → DRAM address vector)

[Physical → DRAM vector example]

- Physical address: 0x12345678
- DRAM vector: [Channel:0, Rank:1, Bank:2, Row:128, Column:512]
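A mapping of this kind can be sketched as slicing bit fields off the address, starting from the least significant end, so that the channel takes the lowest bits and the row the highest. The code below is an illustrative sketch only: the struct names and the field widths passed in are hypothetical, and it is not meant to reproduce the example vector above.

```cpp
#include <cassert>
#include <cstdint>

struct DramAddr {
  uint64_t channel, rank, bank, row, column;
};

// Hypothetical field widths in bits (not taken from any preset).
struct Widths {
  int tx_offset, channel, column, rank, bank, row;
};

// Removes the lowest `bits` bits from `addr` and returns them.
uint64_t slice_lower_bits(uint64_t& addr, int bits) {
  assert(bits >= 0 && bits < 64);
  const uint64_t field = addr & ((uint64_t{1} << bits) - 1);
  addr >>= bits;
  return field;
}

// Row | Bank | Rank | Column | Channel, read from most to least significant.
DramAddr map_ro_ba_ra_co_ch(uint64_t phys, const Widths& w) {
  uint64_t addr = phys >> w.tx_offset;  // drop the byte offset within a transaction
  DramAddr out{};
  out.channel = slice_lower_bits(addr, w.channel);
  out.column = slice_lower_bits(addr, w.column);
  out.rank = slice_lower_bits(addr, w.rank);
  out.bank = slice_lower_bits(addr, w.bank);
  out.row = slice_lower_bits(addr, w.row);
  return out;
}
```

Because the channel and column bits sit lowest, consecutive addresses first spread across channels and then walk through the columns of one row before moving to another rank, bank, or row.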

# trace file

## □ example\_inst.trace & trace.cpp & core.cpp

- Based on the SimpleO3 CPU model
- Ramulator is a "memory" simulator
- Because it only deals with memory instructions, instructions are divided into just three kinds:
  1. Non-memory operation
  2. Load
  3. Store
- Accordingly, in a SimpleO3 trace file:
  - 1st column: number of non-memory instructions (bubbles) that precede the load
  - 2nd column: load address
  - 3rd column: store address (optional)

example_inst.trace:

| Line | Bubble count | Load address | Store address |
|------|--------------|--------------|---------------|
| 1    | 3            | 20734016     |               |
| 2    | 1            | 20846400     |               |
| 3    | 6            | 20734208     |               |
| 4    | 1            | 20846400     |               |
| 5    | 8            | 20841280     | 20841280      |
| 6    | 0            | 20734144     |               |
| 7    | 2            | 20918976     | 20734016      |
| 8    | 1            | 20846400     |               |

- Each line represents a load (and, optionally, a store).
- Line 1: 3 bubbles → load
- Line 5: 8 bubbles → load → store

The diagram illustrates the flow of data from a trace file to the CPU model. A red arrow points from the highlighted code in `trace.cpp` to the `SimpleO3Core::tick()` function in `core.cpp`. A green arrow points from the `SimpleO3Core::tick()` function back to the trace file, indicating the generation of requests based on the trace data.

**trace.cpp (Left)**

```
// src/frontend/impl/processor/simpleO3/trace.cpp
std::string line;
while (std::getline(trace_file, line)) {
    std::vector<std::string> tokens;
    tokenize(tokens, line, " ");
    int num_tokens = tokens.size();
    // A line holds either "bubble_count load_addr" or "bubble_count load_addr store_addr"
    if (num_tokens != 2 && num_tokens != 3) {
        throw ConfigurationError("Trace {} format invalid!", file_path_str);
    }
    int bubble_count = std::stoi(tokens[0]);
    Addr_t load_addr = std::stoll(tokens[1]);
    bool has_store = num_tokens == 2 ? false : true;
    if (has_store) {
        Addr_t store_addr = std::stoll(tokens[2]);
        m_trace.push_back({bubble_count, load_addr, store_addr});
    } else {
        // -1 marks "no store on this line"
        m_trace.push_back({bubble_count, load_addr, -1});
    }
}
trace_file.close();
m_trace_length = m_trace.size();
```

Each line is fed into the CPU model as if it were a single instruction.

**example_inst.trace**

```
3 20734016
1 20846400
6 20734208
1 20846400
8 20841280 20841280
0 20734144
2 20918976 20734016
1 20846400
```

**core.cpp (Right)**

```
// src/frontend/impl/processor/simpleO3/core.cpp
void SimpleO3Core::tick() {
    m_clk++;

    s_insts_retired += m_window.retire();
    if (!reached_expected_num_insts) {
        if (s_insts_retired >= m_num_expected_insts) {
            reached_expected_num_insts = true;
            s_cycles_recorded = m_clk;
        }
    }

    // First, issue the non-memory instructions
    int num_inserted_insts = 0;
    while (m_num_bubbles > 0) {
        if (num_inserted_insts == m_window.m_ipc) {
            return;
        }
        if (m_window.is_full()) {
            return;
        }
        m_window.insert(true, -1);
        num_inserted_insts++;
        m_num_bubbles--;
    }

    // Second, try to send the load to the LLC
    if (m_load_addr != -1) {
        if (num_inserted_insts == m_window.m_ipc) {
            return;
        }
        if (m_window.is_full()) {
            return;
        }

        Request load_request(m_load_addr, Request::Type::Read, m_id, m_callback);
        if (!m_translation->translate(load_request)) {
            return;
        }

        if (m_llc->send(load_request)) {
            m_window.insert(false, load_request.addr);
            m_load_addr = -1;
            if (m_writeback_addr != -1) {
                // If there is still writeback, return without getting the next trace line
                // The write back will be issued in the next cycle
                // TODO: Should we allow both load and writeback to issue at the same cycle?
                return;
            }
        } else {
            return;
        }
    }

    // Third, try to send the writeback to the LLC
    if (m_writeback_addr != -1) {
        Request writeback_request(m_writeback_addr, Request::Type::Write, m_id, m_callback);
        if (!m_llc->send(writeback_request)) {
            return;
        }
    }

    // Finally, fetch the next trace line
    auto inst = m_trace.get_next_inst();
    m_num_bubbles = inst.bubble_count;
    m_load_addr = inst.load_addr;
    m_writeback_addr = inst.store_addr;
}
```

- The trace file is passed as an argument and processed by the frontend (e.g., simpleO3.cpp), which generates the memory access requests.
- For the SimpleO3 CPU model, the trace tokens are: `tokens[0]`: bubble count / `tokens[1]`: load address / `tokens[2]`: store address

# trace file result

## □ ./ramulator2 -f ./example\_config.yaml

```
root@947e591ed45c:/workspace# ./ramulator2 -f ./example_config.yaml
Frontend:
  impl: SimpleO3
  memory_access_cycles_recorded_core_0: 61
  cycles_recorded_core_0: 216815
  llc_mshr_unavailable: 0
  llc_read_misses: 37
  llc_read_access: 133336
  llc_write_misses: 8
  llc_write_access: 33334
  llc_eviction: 0
  num_expected_insts: 500000
Translation:
  impl: RandomTranslation

MemorySystem:
  impl: GenericDRAM
  total_num_other_requests: 0
  total_num_write_requests: 0
  total_num_read_requests: 6
  memory_system_cycles: 81306
DRAM:
  impl: DDR4
AddrMapper:
  impl: RoBaRaCoCh

Controller:
  impl: Generic
  id: Channel 0
  avg_read_latency_0: 46.5
  read_queue_len_avg_0: 0.00232455181
  write_queue_len_0: 0
  queue_len_0: 245
  num_other_reqs_0: 0
  num_write_reqs_0: 0
  read_latency_0: 279
  priority_queue_len_avg_0: 0.000688756059
  row_hits_0: 2
  priority_queue_len_0: 56
  row_misses_0: 4
  row_conflicts_0: 0
  read_row_misses_0: 4
  queue_len_avg_0: 0.00301330769
  read_row_conflicts_core_0: 0
  read_row_hits_0: 2
  write_queue_len_avg_0: 0
  read_row_conflicts_0: 0
  write_row_misses_0: 0
  write_row_conflicts_0: 0
  read_queue_len_0: 189
  write_row_hits_0: 0
  read_row_hits_core_0: 2
  read_row_misses_core_0: 4
  num_read_reqs_0: 6
Scheduler:
  impl: FRFCFS
RefreshManager:
  impl: AllBank

RowPolicy:
  impl: ClosedRowPolicy
  num_close_reqs: 0
```

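- Run with the default YAML settings; the output reports the result of executing the instructions in the trace file.

1. Frontend: analysis of CPU ↔ memory accesses
2. Analysis from the MemorySystem (DRAM) perspective
   - Number of requests
   - Number of cycles
   - Latency
   - Queue length
   - Row hits / misses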
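Some of the reported numbers can be cross-checked against each other. The sketch below assumes that `read_latency_0` is the summed latency of all reads and that the frontend and memory-system cycle counts are related by the two `clock_ratio` values; both are interpretations of the output, not statements from Ramulator's documentation, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Average latency per request, assuming total_latency is a sum over all requests.
double average_latency(uint64_t total_latency, uint64_t num_requests) {
  assert(num_requests > 0);
  return static_cast<double>(total_latency) / static_cast<double>(num_requests);
}

// Memory-system cycles expected for a given number of frontend cycles,
// assuming the two sides tick in the ratio frontend_ratio : mem_ratio.
double expected_memory_cycles(uint64_t frontend_cycles, int frontend_ratio,
                              int mem_ratio) {
  assert(frontend_ratio > 0);
  return static_cast<double>(frontend_cycles) * mem_ratio / frontend_ratio;
}
```

With the values above, 279 / 6 = 46.5 matches `avg_read_latency_0`, the 2 row hits and 4 row misses add up to the 6 read requests, and 216815 × 3 / 8 = 81305.625 is within one cycle of `memory_system_cycles: 81306`.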

# trace file result analysis

## □ ./ramulator2 -f ./example\_config.yaml

```
// Returns the next trace line and wraps around at the end of the trace
const SimpleO3Core::Trace::Inst& SimpleO3Core::Trace::get_next_inst() {
  const Inst& inst = m_trace[m_curr_trace_idx];
  m_curr_trace_idx = (m_curr_trace_idx + 1) % m_trace_length;
  return inst;
}
```

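### 1. Frontend

- Last Level Cache (LLC) analysis
  - Memory access cycles of core 0 (including cache accesses)
  - LLC read-request cache misses
  - Number of LLC read accesses
  - Miss Status Handling Register (MSHR) unavailable count: the number of times a request was rejected because the MSHRs were full

The trace is short, but it is replayed repeatedly, so a large number of accesses occur.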
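The replay can be illustrated with a small standalone sketch of the wrap-around, plus a rough count of how many passes over the trace are needed. The `Trace` class here is a simplified stand-in rather than Ramulator's class, and the counting assumes that each trace line contributes `bubble_count` non-memory instructions plus one load, which is an interpretation rather than something the output states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct Inst {
  int bubble_count;
  int64_t load_addr;
  int64_t store_addr;  // -1 means the line has no store
};

class Trace {
 public:
  explicit Trace(std::vector<Inst> insts) : m_trace(std::move(insts)) {
    assert(!m_trace.empty());
  }
  // Returns the next line and wraps back to the first line at the end.
  const Inst& get_next_inst() {
    const Inst& inst = m_trace[m_curr_trace_idx];
    m_curr_trace_idx = (m_curr_trace_idx + 1) % m_trace.size();
    return inst;
  }

 private:
  std::vector<Inst> m_trace;
  std::size_t m_curr_trace_idx = 0;
};

// Instructions in one full pass (assumption: bubbles plus one load per line).
uint64_t insts_per_pass(const std::vector<Inst>& lines) {
  uint64_t total = 0;
  for (const Inst& inst : lines) total += static_cast<uint64_t>(inst.bubble_count) + 1;
  return total;
}

// Number of passes needed to reach `target` instructions, rounded up.
uint64_t passes_to_reach(uint64_t target, uint64_t per_pass) {
  assert(per_pass > 0);
  return (target + per_pass - 1) / per_pass;
}
```

Under that assumption the 8-line example trace holds 30 instructions per pass, so 500000 instructions need 16667 passes. With 8 loads and 2 stores per pass, this is consistent with the reported `llc_read_access: 133336` (16667 × 8) and `llc_write_access: 33334` (16667 × 2).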

# trace file result

## □ How power is measured

- Add the `drampower_enable` option shown below to the YAML file.
- This enables the power functions in dram.h -> DDR4.cpp.

```
DRAM:
  impl: DDR4
  org:
    preset: DDR4_8Gb_x8
    channel: 1
    rank: 2
  timing:
    preset: DDR4_2400R
    drampower_enable: true
    # power_debug: true  # option: debug log
  voltage:
    preset: Default  # option: voltage preset
  current:
    preset: Default  # option: current preset
```

```
DRAM:
  impl: DDR4
  active_cycles_rank1: 9361
  pre_background_energy_rank1: 3850.965247500003
  total_background_energy_rank0: 4424.017184999997
  idle_cycles_rank1: 68489
  total_cmd_energy: 3203.034273599993
  total_cmd_energy_rank0: 1601.876992799997
  total_energy_rank0: 6025.894177799989
  act_background_energy_rank0: 572.1522975000003
  pre_background_energy_rank0: 3851.864887499999
  active_cycles_rank0: 9345
  act_background_energy_rank1: 573.1319055000002
  total_energy: 12051.1486116
  idle_cycles_rank0: 68505
  total_energy_rank1: 6025.254433800007
  total_background_energy_rank1: 4424.097153000006
  total_background_energy: 8848.114337999994
  total_cmd_energy_rank1: 1601.157280799996
AddrMapper:
  impl: RoBaRaCoCh
```
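The energy statistics appear to add up hierarchically: background energy is the sum of the active and precharged background components, and the total is background plus command energy. The sketch below only checks that arithmetic on the reported numbers; the struct and function names are hypothetical and do not come from Ramulator's power model.

```cpp
#include <cassert>
#include <cmath>

// Per-rank energy components, named after the statistics in the output.
struct RankEnergy {
  double act_background;
  double pre_background;
  double cmd;
};

double background_energy(const RankEnergy& r) {
  return r.act_background + r.pre_background;
}

double total_energy(const RankEnergy& r) {
  return background_energy(r) + r.cmd;
}

// True if a and b agree to within a small absolute tolerance.
bool nearly_equal(double a, double b) {
  return std::fabs(a - b) < 1e-6;
}
```

For rank 0, 572.1522975 + 3851.8648875 = 4424.017185 and adding 1601.8769928 gives 6025.8941778, matching `total_energy_rank0`; the two rank totals sum to the reported `total_energy` of 12051.1486116.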

- Enabling the `power_debug` option (commented out in the DRAM configuration above) prints a power log like the following:

```
17  ✓  DRAM:  
18      impl: DDR4  
19  ✓  org:  
20          preset: DDR4_8Gb_x8  
21          channel: 1  
22          rank: 2  
23  ✓  timing:  
24          preset: DDR4_2400R  
25          drampower_enable: true  
26          # power_debug: true # option: debug log  
27  ✓  voltage:  
28          preset: Default # option: voltage preset  
29  ✓  current:  
30          preset: Default # option: current preset  
31
```

```
root@947e591ed45c:/workspace# ./ramulator2 -f ./example_config_power_en.yaml  
[Power] Rank0 -----ACT----- @ 19  
[Power] Rank0 Rank is idle. idle_cycles: 19 active_start_cycle: 19 @ 19  
[Power] Rank0 Incrementing ACT counter. @ 19  
[Power] Rank1 -----ACT----- @ 20  
[Power] Rank1 Rank is idle. idle_cycles: 20 active_start_cycle: 20 @ 20  
[Power] Rank1 Bank2 Incrementing ACT counter. @ 20  
[Power] Rank0 -----ACT----- @ 23  
[Power] Rank0 Bank3 Incrementing ACT counter. @ 23  
[Power] Rank1 -----ACT----- @ 24  
[Power] Rank1 Bank3 Incrementing ACT counter. @ 24  
[Power] Rank0 Bank0 Incrementing RD counter. @ 35  
[Power] Rank0 Bank3 Incrementing RD counter. @ 39  
[Power] Rank0 Bank0 Incrementing RD counter. @ 43  
[Power] Rank1 Bank2 Incrementing RD counter. @ 49  
[Power] Rank1 Bank3 Incrementing RD counter. @ 53  
[Power] Rank0 Bank0 Incrementing RD counter. @ 59  
[Power] Rank0 -----PREA----- @ 9364  
[Power] Rank0 Incrementing PRE counter. @ 9364  
[Power] Rank0 Rank is not idle. active_cycles: 9345 idle_start_cycle: 9364 @ 9364  
[Power] Rank0 -----REFab----- @ 9380  
[Power] Rank0 Refresh starts. idle_cycles: 35 @ 9380  
[Power] Rank1 -----PREA----- @ 9381  
[Power] Rank1 Incrementing PRE counter. @ 9381  
[Power] Rank1 Rank is not idle. active_cycles: 9361 idle_start_cycle: 9381 @ 9381  
[Power] Rank1 -----REFab----- @ 9397  
[Power] Rank1 Refresh starts. idle_cycles: 36 @ 9397  
[Power] Rank0 -----REFab_end----- @ 9812  
[Power] Rank0 Refresh ends. idle_start_cycle: 9812 @ 9812  
[Power] Rank1 -----REFab_end----- @ 9829  
[Power] Rank1 Refresh ends. idle_start_cycle: 9829 @ 9829  
[Power] Rank0 -----REFab----- @ 18728  
[Power] Rank0 Refresh starts. idle_cycles: 8951 @ 18728  
[Power] Rank1 -----REFab----- @ 18729  
[Power] Rank1 Refresh starts. idle_cycles: 8936 @ 18729  
[Power] Rank0 -----REFab_end----- @ 19160  
[Power] Rank0 Refresh ends. idle_start_cycle: 19160 @ 19160  
[Power] Rank1 -----REFab_end----- @ 19161  
[Power] Rank1 Refresh ends. idle_start_cycle: 19161 @ 19161  
[Power] Rank0 -----REFab----- @ 28092
```

# **src/frontend folder**

## **□ frontend**

- abcd

# **src/memory\_system folder**

## **❑ memory\_system**

- abcd

# **src/addr\_mapper folder**

## **□ addr\_mapper**

- abcd

# **src/translation folder**

## **□ translation**

- abcd

# **src/dram folder**

## **□ dram**

- abcd

# **src/dram\_controller folder**

## **❑ dram\_controller**

- abcd

# **src/base folder**

## **□ base**

- abcd