



- Semaphores can be used to handle, e.g., shared resources in concurrent programming, whereas the cache coherence problem occurs in multicore systems with caches. (A mutex is a lock.)
- Hardware threads are scheduled by the hardware and not by the operating system.
  - Hyper-threading is Intel's name for simultaneous multithreading (SMT), whereas VLIW is a form of static multiple issue.
  - SIMD is used when there is data-level parallelism. Branch prediction has been and still is a very common way to handle control hazards. SISD represents all single-core processors with one instruction stream that operates on one stream of data.
  - In data-level parallelism, each instruction operates on several data items.
  - AVX is a standard set of SIMD instructions (an x86 extension) and has nothing to do with hardware multithreading.
  - False sharing means that two threads on different cores access different data that happen to lie in the same cache line, so each write invalidates the other core's copy of that line. It has nothing to do with virtual memory.
  - MapReduce is a programming model, typically used in distributed computing.
  - VLIW is used as a way to achieve instruction-level parallelism (ILP).
  - A mutex is useful for preventing several threads in a multithreaded program from accessing the same resource at the same time.
  - Superlinear speedup is very uncommon, and such results often mean that something is wrong in the measurement. The "super" in superscalar processors is not related to the "super" in superlinear speedup.
  - Typically the compiler performs scheduling for VLIW processors statically, before the program is executed.
  - MISD is not used very much, because the meaning of multiple instructions operating on a single data item is unclear in the setting of parallelism. It has nothing to do with instruction-level parallelism.
  - Register renaming is used for solving hazards that arise from out-of-order execution (WAR and WAW dependences). Pipelines with many more stages are very common today, e.g. 14 pipeline stages for an Intel Core i5.
  - AVX is used for data-level parallelism.
  - Fine-grained multithreading is not related to the cache coherence problem. The cache coherence problem appears in a multicore processor with separate caches.
  - Weak scaling means that the data size is also increased when the number of processors/cores is increased.
  - Hyper-threading is Intel's name for, and implementation of, simultaneous multithreading (SMT).
  - ILP is very common in modern processors. It gives a good performance gain compared to not using ILP.
  - A superscalar processor performs the scheduling in hardware.
  - Hardware multithreading means that the processor switches between different hardware threads. The technique does not fetch multiple instructions. Instead, it can improve performance by hiding latencies.
  - Moore's law still holds (for now). However, it does not concern the power problem; it states that the number of transistors doubles every 18-24 months.
  - SIMD instructions can also greatly affect performance.
  - A multicore processor utilizes task-level parallelism. Each core has a different instruction stream (not the same instruction stream, as in data-level parallelism).
  - "Semaphores are important in programs with ILP." False: semaphores are important in programs with task-level parallelism.
  - "To achieve the best performance, a processor designer must set the pipeline length so that both the clock frequency and the branch misprediction penalty become acceptable." True.
  - "A cluster is a set of computers that are connected to a shared memory." False: the computers in a cluster are connected to each other over a local area network (LAN).
  - "The snooping protocol is a common part of the MapReduce programming model." False: the snooping protocol is a common solution to the cache coherence problem.
  - "Branch prediction is important for handling data hazards in a pipelined processor." False: branch prediction is important for handling control hazards in a pipelined processor.
  - "A processor with out-of-order execution must solve hazards due to write-after-read (WAR) dependences." True.
  - "Strong scaling means that when more processor cores are available, the problem size is increased so that execution still takes approximately the same amount of time." False: the description actually matches weak scaling.
  - "Data-level parallelism means that multiple threads operate on the same data simultaneously." False: DLP means that each instruction operates on multiple data elements.

1. Check where the machine code values are, not the address.
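
The mutex note in the list above can be illustrated with a small sketch. It assumes Python's standard `threading` module; the function name `count_with_lock` and the thread counts are made up for illustration and are not part of the notes.

```python
import threading


def count_with_lock(num_threads: int, increments: int) -> int:
    """Increment one shared counter from several threads, guarded by a mutex."""
    counter = 0
    lock = threading.Lock()  # the mutex that protects `counter`

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            with lock:        # only one thread at a time may hold the lock
                counter += 1  # read-modify-write on the shared resource

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()              # wait until every worker has finished
    return counter
```

Because every increment happens while the lock is held, the final count should equal `num_threads * increments`; without the lock, updates from different threads could be lost.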

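Period of 35 ns $\Rightarrow$ frequency = 1/(35 ns) $\approx$ 28.6 MHz ($\approx$ 29 MHz)  
 The following applies below:  
 ms $\rightarrow$ Hz: (period in ms) $\cdot$ (frequency in Hz) = 1000, i.e. frequency = 1000 / (period in ms)  
 W comes after the memory...  
 16-bit timer: $2^{16}$ = 65 536 counts, so the largest value the period register can hold is 65 535  
 32-bit timer: $2^{32}$ = 4 294 967 296 counts  
 100 MHz = 100 000 000 Hz; e.g. a 1:4 prescaler means prescale = 4  
 (clock frequency in Hz) / (prescale $\cdot$ period register) = timer frequency in Hz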
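
The timer arithmetic above can be checked with a short sketch. The function names are hypothetical, and the code follows the formula in these notes literally; some timers count (period register + 1) ticks per period, which this sketch does not model.

```python
def frequency_from_period(period_seconds: float) -> float:
    """Frequency in Hz for a given period in seconds (f = 1 / T)."""
    return 1.0 / period_seconds


def timer_frequency(clock_hz: float, prescale: int, period_register: int) -> float:
    """Timer frequency in Hz: clock / (prescale * period register)."""
    return clock_hz / (prescale * period_register)


def period_register_for(clock_hz: float, prescale: int, target_hz: float,
                        timer_bits: int = 16) -> int:
    """Period register value that gives target_hz, checked against the timer width."""
    value = round(clock_hz / (prescale * target_hz))
    max_value = 2 ** timer_bits - 1  # largest value a timer_bits-wide register holds
    if not 1 <= value <= max_value:
        raise ValueError(f"period {value} does not fit in a {timer_bits}-bit timer")
    return value
```

For example, with a 100 MHz clock and a 1:4 prescaler, a period register value of 25 000 gives 100 000 000 / (4 · 25 000) = 1000 Hz.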

| Instruction            | RegWrite | RegDst | ALUSrc | Branch | MemWrite | MemtoReg | ALUOp |
|------------------------|----------|--------|--------|--------|----------|----------|-------|
| R-type (opcode 000000) | 1        | 1      | 0      | 0      | 0        | 0        | 10    |
| lw                     | 1        | 0      | 1      | 0      | 0        | 1        | 00    |
| sw                     | 0        | 0      | 1      | 0      | 1        | ?        | 00    |
| beq                    | 0        | 0      | 0      | 1      | 0        | 0        | 01    |
| addi                   | 1        | 0      | 1      | 0      | 0        | 0        | 00    |
| sl                     | 0        | 0      | 0      | 1      | 0        | ?        | ?     |
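
As a sketch of how a main decoder maps an opcode to these control signals, the lookup below assumes the standard single-cycle MIPS control values and opcodes (R-type 000000, `lw` 100011, `sw` 101011, `beq` 000100, `addi` 001000). The names `CONTROL` and `decode` are hypothetical, and don't-care signals are stored as `None`.

```python
# Control signals per opcode (assumed standard single-cycle MIPS values).
# None marks a don't-care signal; a real controller may drive it to 0.
CONTROL = {
    0b000000: ("R-type", dict(RegWrite=1, RegDst=1, ALUSrc=0, Branch=0,
                              MemWrite=0, MemtoReg=0, ALUOp=0b10)),
    0b100011: ("lw",     dict(RegWrite=1, RegDst=0, ALUSrc=1, Branch=0,
                              MemWrite=0, MemtoReg=1, ALUOp=0b00)),
    0b101011: ("sw",     dict(RegWrite=0, RegDst=None, ALUSrc=1, Branch=0,
                              MemWrite=1, MemtoReg=None, ALUOp=0b00)),
    0b000100: ("beq",    dict(RegWrite=0, RegDst=None, ALUSrc=0, Branch=1,
                              MemWrite=0, MemtoReg=None, ALUOp=0b01)),
    0b001000: ("addi",   dict(RegWrite=1, RegDst=0, ALUSrc=1, Branch=0,
                              MemWrite=0, MemtoReg=0, ALUOp=0b00)),
}


def decode(instruction: int):
    """Return (name, control signals) for a 32-bit instruction word."""
    opcode = (instruction >> 26) & 0x3F  # opcode is instruction bits 31:26
    if opcode not in CONTROL:
        raise ValueError(f"opcode {opcode:06b} is not in this sketch")
    return CONTROL[opcode]
```

Calling `decode(0x8E680020)`, the encoding of `lw $t0, 32($s3)`, selects the `lw` row, where MemtoReg = 1 routes the loaded memory data to the register file.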

- For an `lw` instruction, the ALU output excludes the `sw` instruction.
- For an `lw` instruction, the value comes out after MemtoReg = 1; this excludes register manipulation after `sw`, e.g. `addi`.
- E.g. A1 $\rightarrow$ RD1: the output signal is the value of the register selected by instruction bits 25:21.
- Zero extend = pad with 0s on the left $\rightarrow$ 32-bit output.
- Sign extend = add more most-significant bits, each a copy of the sign bit (0s if it is 0, 1s if it is 1).
- F2:0: funct + address; at clk, +4 once per instruction.
- More than 2 lines apart = no data hazard.
- Control hazards are only solved by stalling; stalling can always solve a data hazard.
- Data hazards = read after write. $\Rightarrow$ W $\rightarrow$ E
- E.g. the result of an `addi` can be forwarded to a dependent instruction.
- Memory instruction = can be forwarded after a stall.
- A register that is not ready yet = data hazard.
- A branch is taken = control hazard.
- All instructions with a label = control hazard.
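
The zero-extend and sign-extend notes above can be made concrete with a small sketch for 16-bit immediates widened to 32 bits. The function names are made up for illustration.

```python
def zero_extend(imm16: int) -> int:
    """Widen a 16-bit immediate to 32 bits by filling the upper 16 bits with 0s."""
    return imm16 & 0xFFFF


def sign_extend(imm16: int) -> int:
    """Widen a 16-bit immediate to 32 bits by copying its sign bit (bit 15) upward."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:              # sign bit set: the value is negative
        return imm16 | 0xFFFF0000   # fill the upper 16 bits with 1s
    return imm16                    # sign bit clear: the upper 16 bits stay 0
```

For the immediate `0xFFFC` (−4 in 16-bit two's complement), sign extension yields `0xFFFFFFFC`, which is still −4 as a 32-bit value, whereas zero extension yields `0x0000FFFC`.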