

## Computer Architecture I (E)

### 6. Memory 2: Semiconductor Memory Systems, File Memory Systems

Kenji Kise (吉瀬 謙二), Department of Computer Science  
kise\_at\_cs.titech.ac.jp  
Lecture Room W641, Thursdays 13:20 – 14:50

## Acknowledgement

- Lecture slides for Computer Organization and Design, Third Edition, courtesy of **Professor Mary Jane Irwin**, Penn State University
- Lecture slides for Computer Organization and Design, Third Edition, Chapters 1-9, courtesy of **Professor Tod Amon**, Southern Utah University

Adapted from *Computer Organization and Design*, Patterson & Hennessy, © 2005

## The Memory Hierarchy Goal

- **Fact:**  
Large memories are slow and fast memories are small
- How do we create a memory that gives the illusion of being large, cheap, and fast?
  - With **hierarchy**
  - With **parallelism**

## A Typical Memory Hierarchy



## DRAM (dynamic random access memory)



## SRAM (static random access memory)







### DRAM Memory Latency & Bandwidth Milestones

|                            | DRAM    | Page Mode DRAM | Fast Page Mode DRAM | Fast Page Mode DRAM | Synchronous DRAM | DDR SDRAM |
|----------------------------|---------|----------------|---------------------|---------------------|------------------|-----------|
| Module width               | 16b     | 16b            | 32b                 | 64b                 | 64b              | 64b       |
| Year                       | 1980    | 1983           | 1986                | 1993                | 1997             | 2000      |
| Mb/chip                    | 0.06    | 0.25           | 1                   | 16                  | 64               | 256       |
| Die size (mm<sup>2</sup>)  | 35      | 45             | 70                  | 130                 | 170              | 204       |
| Pins/chip                  | 16      | 16             | 18                  | 20                  | 54               | 66        |
| **Bandwidth (MB/s)**       | **13**  | **40**         | **160**             | **267**             | **640**          | **1600**  |
| **Latency (ns)**           | **225** | **170**        | **125**             | **75**              | **62**           | **52**    |

Patterson, CACM Vol 47, #10, 2004

- In the time it takes memory-to-processor bandwidth to double, memory latency improves by a factor of only 1.2 to 1.4
- To deliver such high bandwidth, the internal DRAM has to be organized as interleaved memory banks
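
As a rough check of the first point, the sketch below takes the 1980 DRAM and 2000 DDR SDRAM columns of the table and estimates how much latency improved per doubling of bandwidth. The variable names are illustrative, and using only the two endpoint columns is a simplification made here, not something the slides prescribe.

```python
import math

# Figures copied from the first and last columns of the table
# (DRAM, 1980 vs. DDR SDRAM, 2000).
bw_first, bw_last = 13, 1600      # bandwidth in MB/s
lat_first, lat_last = 225, 52     # latency in ns

bw_gain = bw_last / bw_first      # total bandwidth improvement
lat_gain = lat_first / lat_last   # total latency improvement

# Number of times bandwidth doubled over the period, and the
# geometric-mean latency improvement per doubling.
doublings = math.log2(bw_gain)
lat_gain_per_doubling = lat_gain ** (1 / doublings)

print(f"bandwidth gain: {bw_gain:.0f}x, latency gain: {lat_gain:.1f}x")
print(f"latency gain per bandwidth doubling: {lat_gain_per_doubling:.2f}x")
```

With these two columns the per-doubling latency gain falls inside the 1.2 to 1.4 range quoted above.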



### One Word Wide Memory Organization, cont'd

- What if the block size is **four words**?
  - 1 cycle to send the first address
  - $4 \times 25 = 100$ cycles to read DRAM (one full 25-cycle access per word)
  - 1 cycle to return the last data word
  - **102 total clock cycles miss penalty**
- Number of bytes transferred per clock cycle (**bandwidth**) for a single miss
  - $(4 \times 4) / 102 = 0.157$ bytes per clock
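
The arithmetic above can be written as a small Python sketch. The constants mirror the cycle counts on this slide (1 cycle for the address, 25 cycles per DRAM access, 1 cycle for the last word, 4-byte words); the function names are hypothetical and chosen only for illustration.

```python
WORD_BYTES = 4     # 32-bit words, matching the (4 x 4) byte count
ADDR_CYCLES = 1    # send the address
DRAM_CYCLES = 25   # one full DRAM access
XFER_CYCLES = 1    # return the last data word


def miss_penalty_one_word_wide(block_words: int) -> int:
    """Every word needs its own full DRAM access, performed back to back."""
    return ADDR_CYCLES + block_words * DRAM_CYCLES + XFER_CYCLES


def bandwidth(block_words: int, penalty_cycles: int) -> float:
    """Bytes delivered per clock cycle for a single miss."""
    return block_words * WORD_BYTES / penalty_cycles


penalty = miss_penalty_one_word_wide(4)
print(penalty, round(bandwidth(4, penalty), 3))
```

Because the DRAM term grows linearly with the block size, a larger block raises the miss penalty almost proportionally and leaves the bandwidth nearly unchanged.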

### One Word Wide Memory Organization, cont'd

- What if the block size is **four words** and a **fast page mode DRAM** is used?
  - 1 cycle to send the first address
  - $25 + (3 \times 8) = 49$ cycles to read DRAM (25 cycles for the first access, 8 cycles for each of the remaining three)
  - 1 cycle to return the last data word
  - **51 total clock cycles miss penalty**
- Number of bytes transferred per clock cycle (**bandwidth**) for a single miss
  - $(4 \times 4) / 51 = 0.314$ bytes per clock
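
A minimal sketch of the same calculation for fast page mode, assuming (as the formula above implies) that the first access costs 25 cycles and each later access to the same open row costs 8 cycles. The function and constant names are hypothetical.

```python
WORD_BYTES = 4
ADDR_CYCLES = 1
FIRST_ACCESS_CYCLES = 25   # first access: full DRAM access
PAGE_ACCESS_CYCLES = 8     # each later access within the same row
XFER_CYCLES = 1


def miss_penalty_fast_page(block_words: int) -> int:
    """Only the first word pays the full access time; the rest are page-mode accesses."""
    dram = FIRST_ACCESS_CYCLES + (block_words - 1) * PAGE_ACCESS_CYCLES
    return ADDR_CYCLES + dram + XFER_CYCLES


# Show how the penalty and bandwidth change with the block size.
for words in (1, 2, 4, 8):
    penalty = miss_penalty_fast_page(words)
    print(words, penalty, round(words * WORD_BYTES / penalty, 3))
```

For a four-word block this reproduces the 51-cycle penalty, and the loop shows bandwidth rising with block size because the 25-cycle first access is amortized over more words.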

### Interleaved Memory Organization

- For a block size of **four words with interleaved memory (4 banks)**
  - 1 cycle to send the first address
  - $25 + 3 = 28$ cycles to read DRAM
  - 1 cycle to return the last data word
  - **30 total clock cycles miss penalty**
- Number of bytes transferred per clock cycle (**bandwidth**) for a single miss
  - $(4 \times 4) / 30 = 0.533$ bytes per clock
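
One way to see where the 30 cycles come from is to model the banks and the shared bus explicitly. The sketch below is a simplified timing model, not the slides' own derivation: it assumes all banks receive the address in the same cycle, each access takes 25 cycles, and the bus returns one word per cycle. The function name and parameters are hypothetical.

```python
ADDR_CYCLES = 1    # address is sent to all banks at once
DRAM_CYCLES = 25   # access time of one bank
BUS_CYCLES = 1     # the shared bus returns one word per cycle
WORD_BYTES = 4


def interleaved_miss_penalty(block_words: int = 4, banks: int = 4) -> int:
    """Return the cycle at which the last word of the block has been returned."""
    bank_free = [ADDR_CYCLES] * banks   # earliest cycle each bank can start an access
    bus_free = 0                        # earliest cycle the bus can start a transfer
    for word in range(block_words):
        bank = word % banks                     # consecutive words map to consecutive banks
        ready = bank_free[bank] + DRAM_CYCLES   # word is available at the bank output
        bank_free[bank] = ready                 # bank may begin its next access
        start = max(ready, bus_free)            # wait for the bus if it is busy
        bus_free = start + BUS_CYCLES
    return bus_free


penalty = interleaved_miss_penalty(block_words=4, banks=4)
print(penalty, round(4 * WORD_BYTES / penalty, 3))
print(interleaved_miss_penalty(block_words=4, banks=1))
```

With four banks the accesses overlap and only the bus transfers are serialized, giving 30 cycles. Setting `banks=1` in the same model gives 102 cycles, the one-word-wide result with a 25-cycle access per word.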

### Announcements

- Lecture slides and schedule
  - [www.arch.cs.titech.ac.jp](http://www.arch.cs.titech.ac.jp)
  - The lecture schedule may change, so check the site frequently.