

2010-06-10 2010年 前学期 TOKYO TECH

## 計算機アーキテクチャ 第一 (E)

### 8. メモリ3: 半導体メモリシステム

吉瀬 謙二 計算工学専攻  
kise\_at\_cs.titech.ac.jp  
W641講義室 木曜日13:20 ~ 14:50

## 講義用の計算機環境

- 講義用の計算機
  - 131.112.16.56
  - ssh [arche@131.112.16.56](ssh://arche@131.112.16.56)
    - ユーザ名: arche
    - パスワードは講義時に連絡
  - mkdir myname (例: mkdir 06B77777)
  - cd myname (例: cd 06B77777)
- 注意点
  - 計算機演習室からは外部にsshで接続できないかもしれません。
  - WindowsからはTera Term Proなどを利用してください。

Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005

### Sample program

```
#include <stdio.h>
int main(){
    int i;
    int sum = 0;

    for(i=1; i<=100; i++) sum += i;

    return sum;
}

mipsel-linux-gcc -O0 -S main.c -o main_opt0.s
/home/share/cad/mipsel/usr/bin/mipsel-linux-gcc
```

3

Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005

### Sample program

```
#include <stdio.h>
int main(){
    int i;
    int sum = 0;

    for(i=1; i<=100; i++)
        sum += i;

    return sum;
}

mipsel-linux-gcc -O0 -S main.c -o main_opt0.s
/home/share/cad/mipsel/usr/bin/mipsel-linux-gcc
```

4

Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005

### Sample program

```
#include <stdio.h>
int main(){
    int i;
    int sum = 0;

    for(i=1; i<=100; i++)
        sum += i;

    return sum;
}
```

mipsel-linux-objdump -d ./a.out

```
00000000 <main>:
  270dffe8: addiu   $sp,$sp,-24
  4005c1:  a7fe0f10  srl    $s1,16($sp)
  4005c3:  03a9ff021  move   $s5,$sp
  4005c5:  00000000  addiu  $sp,$sp,0($sp)
  4005d1:  24020001  li     $v0,1
  4005d4:  a7c2000c  srl    $v0,12($sp)
  4005d8:  1000000a  addiu  $v0,$v0,100
  4005e1:  00000000  nop
  4005e4:  8fc30008  lw     $v1,8($sp)
  4005e6:  8fc20008  lw     $v0,12($sp)
  4005e8:  00000000  nop
  4005e9:  00000000  addiu  $v0,$v1,0
  4005f0:  a7c20008  srl    $v0,12($sp)
  4005f4:  8fc2000c  lw     $v0,12($sp)
  4005f8:  00000000  nop
  4005f9:  24020001  addiu  $v0,$v0,1
  400600:  a7c2000c  srl    $v0,12($sp)
  400604:  8fc2000c  lw     $v0,12($sp)
  400608:  00000000  nop
  400609:  28000000  sll    $v0,$v0,101
  400610:  1449ff73  bnez   $v0,4005e0  .L2+0x20>
  400614:  00000000  nop
  400618:  8fc20008  lw     $v0,8($sp)
  40061c:  03a9ff021  move   $sp,$sp
  400620:  03a9ff020  move   $sp,$sp
  400624:  27b60018  addiu  $r0,$sp,24
  400628:  03e00008  jr     $ra
  40062c:  00000000  nop
```

5

Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005

### Sample program

```
main:
    .frame  $sp,0,$31
    .mask   0x00000000,0
    .fmask  0x00000000,0
    .set    noreorder
    .set    nomacro

    j      $31
    li    $2,5050

# Makefile
all:
    mipsel-linux-gcc -O0 -S main.c -o main_opt0.s
    mipsel-linux-gcc -O1 -S main.c -o main_opt1.s
    mipsel-linux-gcc -O2 -S main.c -o main_opt2.s
    mipsel-linux-gcc -O3 -S main.c -o main_opt3.s
```

6

Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005

## Exercise 2

```

void swap (int v[], int k) {
    int temp;
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
}

void max (int v[], int n) {
    int i;
    for (i = 1; i < n; i +=1) {
        if (v[i-1] > v[i]) swap(v, i-1);
    }
}

```

Adapted from *Computer Organization and Design*, Patterson & Hennessy, © 2005

7

## レポート 問題

1. `int add (int a, int b) { return a + b; }`  
をクロスコンパイラにてMIPS命令セットにコンパイルし、コンパイルオプションによってどのように変化するかをまとめよ。
2. `swap (int v[], int k)`  
をクロスコンパイラにてMIPS命令セットにコンパイルし、コンパイルオプションによってどのように変化するかをまとめよ。
3. `void max (int v[], int n)`  
をクロスコンパイラにてMIPS命令セットにコンパイルし、コンパイルオプションによってどのように変化するかをまとめよ。
4. 同様に、サンプルアプリケーションを作成し、それをクロスコンパイラにてMIPS命令セットにコンパイルし、コンパイルオプションによってどのように変化するかをまとめよ。
5. この課題の感想をまとめること。
6. レポートはA4用紙2枚以内にまとめる事。(必ずPDFとすること)  
(2段組、コードは小さい文字でもかまわない。)

## レポート 提出方法

- 6月18日(午後7時)までに電子メールで提出
  - 人よりも先に提出している(先願性)と高得点
  - `report10a_at_arch.cs.titech.ac.jp`
- 電子メールのタイトル
  - Arch Report [学籍番号]
  - 例 : Arch Report [33\_77777]
- 電子メールの内容
  - 氏名, 学籍番号
  - 回答
    - PDFファイルを添付(必ずPDFとすること)
    - PDFファイルにも氏名, 学籍番号を記入すること。
    - A4用紙で2枚以内にまとめる事。

## Acknowledgement

- *Lecture slides* for Computer Organization and Design, Third Edition, courtesy of **Professor Mary Jane Irwin**, Penn State University
- *Lecture slides* for Computer Organization and Design, third edition, Chapters 1-9, courtesy of **Professor Tod Amon**, Southern Utah University.

Adapted from *Computer Organization and Design*, Patterson & Hennessy, © 2005

## Synchronous DRAM (SDRAM) Operation



## Other DRAM Architectures

- Double Data Rate SDRAMs – **DDR-SDRAMs** (and DDR-SRAMs)
  - Double data rate because they transfer data on both the rising and falling edge of the clock
  - Are the most widely used form of SDRAMs
- **DDR2-SDRAMs**
  - 4 data prefetch
- **DDR3-SDRAMs**
  - 8 data prefetch



### One Word Wide Memory Organization, con't

- What if the block size is **four words** and if a **page mode DRAM** is used?
  - 1 cycle to send 1st address
  - $25 + (3 * 8) = 49$  cycles to read DRAM
  - 1 cycle to return last data word
  - 51 total clock cycles** miss penalty
- Number of bytes transferred per clock cycle (**bandwidth**) for a single miss
  - $(4 \times 4) / 51 = 0.314$  bytes per clock

### Interleaved(インターリーブ) Memory Organization

- For a block size of **four words** with **interleaved memory (4 banks)**
  - 1 cycle to send 1st address
  - $25 + 3 = 28$  cycles to read DRAM
  - 1 cycle to return last data word
  - 30 total clock cycles** miss penalty
- Number of bytes transferred per clock cycle (**bandwidth**) for a single miss
  - $(4 \times 4) / 30 = 0.533$  bytes per clock

2010-06-10 2010年 前学期 TOKYO TECH

## 計算機アーキテクチャ 第一 (E)

### キャッシュシステム

吉瀬 謙二 計算工学専攻  
kise\_at\_cs.titech.ac.jp  
W641講義室 木曜日13:20 – 14:50

### The Memory Hierarchy: Why Does it Work?

- Temporal Locality** (時間的局所性, Locality in Time):
  - Keep **most recently accessed** data items closer to the processor
- Spatial Locality** (空間的局所性, Locality in Space):
  - Move blocks consisting of **contiguous words** to the upper levels

16

### Cache

- Two questions to answer (in hardware):
  - Q1: **How do we know if a data item is in the cache?**
  - Q2: **If it is, how do we find it?**
- Direct mapped**
  - For each item of data at the lower level, there is exactly one location in the cache where it might be - so lots of items at the lower level must **share** locations in the upper level
  - Address mapping: **(block address) modulo (# of blocks in the cache)**
  - First, consider block sizes of **one word**

17

### Caching: A Simple First Example

Main Memory

|        |
|--------|
| 0000xx |
| 0001xx |
| 0010xx |
| 0011xx |
| 0100xx |
| 0101xx |
| 0110xx |
| 0111xx |
| 1000xx |
| 1001xx |
| 1010xx |
| 1011xx |
| 1100xx |
| 1101xx |
| 1110xx |
| 1111xx |

Cache

| Index | Valid | Tag | Data |
|-------|-------|-----|------|
| 00    |       |     |      |
| 01    |       |     |      |
| 10    |       |     |      |
| 11    |       |     |      |

Two low order bits define the byte in the word (32-b words)

Q1: Is it there?

Compare the cache tag to the **high order 2 memory address bits** to tell if the memory block is in the cache

Q2: How do we find it?

Use **next 2 low order memory address bits** – the **index** – to determine which cache block

(block address) modulo (# of blocks in the cache)

18

### MIPS Direct Mapped Cache Example

- One word/block, cache size = 1K words

What kind of locality are we taking advantage of?

### Direct Mapped Cache

- Consider the main memory word reference string

Start with an empty cache - all blocks initially marked as not valid

|           |           |           |           |
|-----------|-----------|-----------|-----------|
| 0 miss    | 1 miss    | 2 miss    | 3 miss    |
| 00 Mem(0) | 00 Mem(1) | 00 Mem(0) | 00 Mem(0) |
| 00 Mem(1) | 00 Mem(2) | 00 Mem(1) | 00 Mem(1) |
| 00 Mem(2) | 00 Mem(3) | 00 Mem(2) | 00 Mem(2) |
| 00 Mem(3) |           | 00 Mem(3) | 00 Mem(3) |

4 miss      3 hit      4 hit      15 miss

8 requests, 6 misses

### Exercise

- Consider the main memory word reference string
  - 3, 2, 18, 3, 16, 2, 3, 18, 3

|     |        |
|-----|--------|
| Tag | 3 miss |
|     |        |
|     |        |
| 000 | Mem(3) |

9 requests, ? misses

9 requests, ? misses

### Another Reference String Mapping

- Consider the main memory word reference string

|           |           |           |           |
|-----------|-----------|-----------|-----------|
| 0 miss    | 4 miss    | 0 miss    | 0 miss    |
| 00 Mem(0) | 00 Mem(4) | 01 Mem(0) | 01 Mem(4) |
| 00 Mem(1) | 00 Mem(5) | 01 Mem(1) | 01 Mem(5) |
| 00 Mem(2) | 00 Mem(6) | 01 Mem(2) | 01 Mem(6) |
| 00 Mem(3) | 00 Mem(7) | 01 Mem(3) | 01 Mem(7) |

0 miss      4 miss      0 miss      0 miss

8 requests, 8 misses

- Ping pong effect due to **conflict** misses - two memory locations that map into the same cache block

### MIPS Direct Mapped Cache Example

- One word/block, cache size = 1K words

What kind of locality are we taking advantage of?

### Multiword Block Direct Mapped Cache

- Four words/block, cache size = 1K words

What kind of locality are we taking advantage of?



26

25



27



28



29



30



### Cache so far

- The Principle of Locality:
  - Program likely to access a relatively small portion of the address space at any instant of time
    - Temporal Locality:** Locality in Time
    - Spatial Locality:** Locality in Space
- Three major categories of cache misses:
  - Compulsory misses:** sad facts of life. Example: cold start misses
  - Conflict misses:** increase cache size and/or associativity  
Nightmare Scenario: ping pong effect!
  - Capacity misses:** increase cache size
- Cache design space
  - total size, block size, **associativity** (replacement policy)
  - write-hit policy (write-through, write-back)
  - write-miss policy (write allocate, write buffers)

33