

























## ° If a miss, we might have to choose a block ot be replaced

- Random:
  - Any block is randomly selected for replacement providing uniform allocation.
  - Simple to build in hardware.
  - The most widely used cache replacement strategy.
- Least-recently used (LRU):
  - Accesses to blocks are recorded and the block replaced is the one that was not used for the longest period of time.
  - Full LRU is *expensive* to implement, as the number of blocks to be tracked increases, and is usually approximated by block usage bits that are cleared at regular time intervals.

Fall 2009

| 0     |                                       | 4                                                                |                                                                                              | 0                                                                                                                                                                                                                                                         |                                                                                                                                                                                                                                                                                 |
|-------|---------------------------------------|------------------------------------------------------------------|----------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 2-V   | Vay                                   | 4-0                                                              | /ay<br>Dendem                                                                                | 8-W                                                                                                                                                                                                                                                       | Bondom                                                                                                                                                                                                                                                                          |
|       | Random                                |                                                                  |                                                                                              |                                                                                                                                                                                                                                                           | Kandon<br>A ocov                                                                                                                                                                                                                                                                |
| 5.18% | 5.69%                                 | 4.67%                                                            | 5.29%                                                                                        | 4.39%                                                                                                                                                                                                                                                     | 4.96%                                                                                                                                                                                                                                                                           |
| 1.88% | 2.01%                                 | 1.54%                                                            | 1.66%                                                                                        | 1.39%                                                                                                                                                                                                                                                     | 1.53%                                                                                                                                                                                                                                                                           |
| 1.15% | 1.17%                                 | 1.13%                                                            | 1.13%                                                                                        | 1.12%                                                                                                                                                                                                                                                     | 1.12%                                                                                                                                                                                                                                                                           |
|       |                                       |                                                                  |                                                                                              |                                                                                                                                                                                                                                                           |                                                                                                                                                                                                                                                                                 |
|       |                                       |                                                                  |                                                                                              |                                                                                                                                                                                                                                                           |                                                                                                                                                                                                                                                                                 |
|       |                                       |                                                                  |                                                                                              |                                                                                                                                                                                                                                                           |                                                                                                                                                                                                                                                                                 |
|       | CSE                                   | -4201                                                            |                                                                                              |                                                                                                                                                                                                                                                           |                                                                                                                                                                                                                                                                                 |
|       | 2-v<br>LRU<br>5.18%<br>1.88%<br>1.15% | 2-way<br>LRU Random<br>5.18% 5.69%<br>1.88% 2.01%<br>1.15% 1.17% | 2-way 4-w<br>LRU Random LRU 1<br>5.18% 5.69% 4.67%<br>1.88% 2.01% 1.54%<br>1.15% 1.17% 1.13% | 2-way         4-way           LRU         Random         LRU         Random           5.18%         5.69%         4.67%         5.29%           1.88%         2.01%         1.54%         1.66%           1.15%         1.17%         1.13%         1.13% | 2-way       4-way       8-w         LRU       Random       LRU       Random       LRU         5.18%       5.69%       4.67%       5.29%       4.39%         1.88%       2.01%       1.54%       1.66%       1.39%         1.15%       1.17%       1.13%       1.13%       1.12% |











|                |                   | 2              |                |
|----------------|-------------------|----------------|----------------|
| Size           | Instruction cache | Data cache     | Unified cache  |
| 1 KB           | 3.06%             | 24.61%         | 13.34%         |
| 2 KB           | 2.26%             | 20.57%         | 9.78%          |
| 4 KB           | 1.78%             | 15.94%         | 7.24%          |
| 8 KB           | 1.10%             | 10.19%         | 4.57%          |
| 16 KB          | 0.64%             | 6.47%          | 2.87%          |
| 32 KB          | 0.39%             | 4.82%          | 1.99%          |
| 64 KB          | 0.15%             | 3.77%          | 1.35%          |
| 128 KB         | 0.02%             | 2.88%          | 0.95%          |
| 54 KB<br>28 KB | 0.15% 0.02%       | 3.77%<br>2.88% | 1.35%<br>0.95% |





| ite Policy             |                                                              |
|------------------------|--------------------------------------------------------------|
| Write Allo             | ocate:                                                       |
| The cach followed      | e block is loaded on a write miss<br>I by write hit actions. |
| No-Write               | Allocate:                                                    |
| The block<br>(lower ca | t is modified in the lower level<br>ache level, or main      |
| memory)                | and not loaded into cache.                                   |

| ° Which has<br>instruction<br>(0.64%, 6.4 <sup>°</sup> | a lower miss rate 16KB cache for both<br>or data, or a combined 32KB cache?<br>7%, 1.99%). |
|--------------------------------------------------------|--------------------------------------------------------------------------------------------|
| ° Assume hit<br>memory re                              | t=1cycle and miss =50 cycles. 75% of ferences are instruction fetch.                       |
| ° Miss rate of                                         | split cache=0.75*0.64%+0.25*6.47%=2.1%                                                     |
| ° Slightly wors<br>what about                          | se than 1.99% for combined cache. But, average memory access time?                         |
| ° Split cache:<br>2.05 cycles.                         | 75%(1+0.64%*50)+25%(1+6.47%*50) =                                                          |
| ° Combined c                                           | Extra cycle for load/store                                                                 |



| ° A CPU w<br>with <u>writ</u><br>probabil | ith CPI <sub>exe</sub><br><u>e back</u> , wi<br>ty a cache | <sub>cution</sub> = 1.1 u<br>ith <u>write allo</u><br>e block is di | ses a unified<br><u>cate</u> , and the<br><u>rty = 10%</u> | d L1 with<br>e |
|-------------------------------------------|------------------------------------------------------------|---------------------------------------------------------------------|------------------------------------------------------------|----------------|
| ° Instructionstore, 20                    | on mix: 5<br>% control                                     | 50% arith/log                                                       | jic, 15% loa                                               | d, 15%         |
| ° Assume<br>penalty o<br><u>}</u>         | a cache n<br>of 50 cycle                                   | niss rate of 1<br>es.<br>which in s                                 | I.5% and a n                                               | niss<br>504    |
| 1.5                                       | 1.3*                                                       | 50 • 0.                                                             | 9 +1 ·3 ×                                                  | 0.\ 100        |

























| t the "way,"            |
|-------------------------|
|                         |
| y 1 tag<br>with reading |
| clock cycle             |
|                         |
|                         |
| t                       |















## **Merging Arrays**



| ° Ass     | ume row-major matrix allocation                                                              |
|-----------|----------------------------------------------------------------------------------------------|
| /* Be     | efore */                                                                                     |
| for       | (j = 0; j < 100; j = j+1)                                                                    |
|           | for $(i = 0; i < 5000; i = i+1)$                                                             |
|           | x[i][j] = 2 * x[i][j];                                                                       |
| /* A1     | Eter */                                                                                      |
| for       | (i = 0; i < 5000; i = i+1)                                                                   |
|           | <pre>for (j = 0; j &lt; 100; j = j+1)</pre>                                                  |
|           | x[i][j] = 2 * x[i][j];                                                                       |
| Se<br>men | equential accesses instead of striding through<br>hory every 100 words in this case improves |

## **Loop Fusion**

```
Blocking
     for (\underline{i} = 0; i < 12; i = i+1)
      for (j = 0; j < 12; j = j+1)
           {r = 0;}
             for (k = 0; k < 12; k = k+1)
                  r = r + y[i][k]*z[k][j];;
            x[i][j] = r;
            };
     /* After Blocking */
     for (jj = 0; jj < N; jj = jj+B)
     for (kk = 0; kk < N; kk = kk+B)
     for (i = 0; i < N; i = i+1)
       for (j = jj; j < min(jj+B,N); j = j+1)</pre>
            {r = 0;}
             for (k = kk; k < min(kk+B,N); k = k+1) {
                   r = r + y[i][k]*z[k][j];};
             x[i][j] = x[i][j] + r;
           };
   Fall 2009
                               CSE4201
```