| Computer Architecture<br>A Quantitative Approach, Sixth Edition                            |
|--------------------------------------------------------------------------------------------|
| Chapter 3<br>Instruction-Level Parallelism<br>and Its Exploitation – Dynamic<br>Scheduling |
| Copyright © 2019, Elsevier Inc. All Rights Reserved                                        |





| Dynamic Scheduling |
|--------------------|
| • Example 2:<br>fdiv.d f0,f2,f4<br>fmul.d f6,f0,f8<br>fadd.d f0,f10,f14<br>• fadd.d has no data dependence on the earlier instructions, but its antidependence on f0 (which fmul.d reads) makes it impossible to issue it earlier without register renaming |
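The effect of register renaming on Example 2 can be sketched in a few lines of Python. This is a minimal illustration, not a model of any particular processor: the physical register names `p0`, `p1`, ... and the `rename` helper are assumptions made up for this sketch. Every destination receives a fresh physical register, so the write by fadd.d no longer targets the register that fmul.d reads.

```python
from itertools import count

def rename(instructions):
    """Give every destination a fresh physical register (hypothetical p0, p1, ...)."""
    fresh = count()   # assumed unlimited pool of physical registers
    mapping = {}      # architectural register -> current physical register

    def lookup(reg):
        # A source that has not been written in this window gets a register on first use.
        if reg not in mapping:
            mapping[reg] = f"p{next(fresh)}"
        return mapping[reg]

    renamed = []
    for op, dest, src1, src2 in instructions:
        s1, s2 = lookup(src1), lookup(src2)   # read sources under the old mapping
        mapping[dest] = f"p{next(fresh)}"     # then allocate a new destination
        renamed.append((op, mapping[dest], s1, s2))
    return renamed

example2 = [
    ("fdiv.d", "f0", "f2", "f4"),
    ("fmul.d", "f6", "f0", "f8"),
    ("fadd.d", "f0", "f10", "f14"),
]
renamed = rename(example2)
for op, dest, s1, s2 in renamed:
    print(f"{op} {dest},{s1},{s2}")
```

After renaming, fmul.d still reads the register written by fdiv.d (the true dependence is preserved), while fadd.d writes a different physical register, so the antidependence and the output dependence on f0 disappear.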











































Reorder buffer

| Entry | Busy | Instruction     | State        | Destination | Value              |
|-------|------|-----------------|--------------|-------------|--------------------|
| 1     | No   | fld f6,32(x2)   | Commit       | f6          | Mem[32 + Regs[x2]] |
| 2     | No   | fld f2,44(x3)   | Commit       | f2          | Mem[44 + Regs[x3]] |
| 3     | Yes  | fmul.d f0,f2,f4 | Write result | f0          | #2 × Regs[f4]      |
| 4     | Yes  | fsub.d f8,f2,f6 | Write result | f8          | #2 − #1            |
| 5     | Yes  | fdiv.d f0,f0,f6 | Execute      | f0          |                    |
| 6     | Yes  | fadd.d f6,f8,f2 | Write result | f6          | #4 + #2            |

Reservation stations

| Name  | Busy | Op     | Vj                 | Vk                 | Qj | Qk | Dest |
|-------|------|--------|--------------------|--------------------|----|----|------|
| Load1 | No   |        |                    |                    |    |    |      |
| Load2 | No   |        |                    |                    |    |    |      |
| Add1  | No   |        |                    |                    |    |    |      |
| Add2  | No   |        |                    |                    |    |    |      |
| Add3  | No   |        |                    |                    |    |    |      |
| Mult1 | No   | fmul.d | Mem[44 + Regs[x3]] | Regs[f4]           |    |    | #3   |
| Mult2 | Yes  | fdiv.d |                    | Mem[32 + Regs[x2]] | #3 |    | #5   |

FP register status

| Field     | f0  | f1 | f2 | f3 | f4 | f5 | f6  | f7  | f8  | f10 |
|-----------|-----|----|----|----|----|----|-----|-----|-----|-----|
| Reorder # | 3   |    |    |    |    |    | 6   |     | 4   | 5   |
| Busy      | Yes | No | No | No | No | No | Yes | ... | Yes | Yes |
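The reorder buffer snapshot above can be used to illustrate in-order commit. The following is a minimal sketch, assuming the busy entries read as entries 3 to 6 with the states shown; the `commit_ready` helper is hypothetical and ignores any limit on how many instructions may commit per cycle.

```python
from collections import deque

# Busy reorder-buffer entries, oldest first: (entry, instruction, state).
# Entries 1 and 2 have already committed and are omitted.
rob = deque([
    (3, "fmul.d f0,f2,f4", "Write result"),
    (4, "fsub.d f8,f2,f6", "Write result"),
    (5, "fdiv.d f0,f0,f6", "Execute"),
    (6, "fadd.d f6,f8,f2", "Write result"),
])

def commit_ready(rob):
    """Retire entries from the head for as long as the head has written its result."""
    committed = []
    while rob and rob[0][2] == "Write result":
        committed.append(rob.popleft())
    return committed

done = commit_ready(rob)
print("committed:    ", [entry for entry, _, _ in done])
print("still waiting:", [entry for entry, _, _ in rob])
```

Entries 3 and 4 retire, but entry 6 must wait behind entry 5 even though fadd.d has already produced its value: the fdiv.d at the head is still executing, and commit is strictly in program order.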







| Common<br>name               | Issue<br>structure  | Hazard                | Scheduling               | Distinguishing<br>characteristic                                          | Examples                                                                   |
|------------------------------|---------------------|-----------------------|--------------------------|---------------------------------------------------------------------------|----------------------------------------------------------------------------|
| Superscalar<br>(static)      | Dynamic             | Hardware              | Static                   | In-order execution                                                        | Mostly in the embedded<br>space: MIPS and ARM,<br>including the Cortex-A53 |
| Superscalar<br>(dynamic)     | Dynamic             | Hardware              | Dynamic                  | Some out-of-order<br>execution, but no<br>speculation                     | None at the present                                                        |
| Superscalar<br>(speculative) | Dynamic             | Hardware              | Dynamic with speculation | Out-of-order execution<br>with speculation                                | Intel Core i3, i5, i7; AMD<br>Phenom; IBM Power 7                          |
| VLIW/LIW                     | Static              | Primarily<br>software | Static                   | All hazards determined<br>and indicated by compiler<br>(often implicitly) | Most examples are in signal<br>processing, such as the TI<br>C6x           |
| EPIC                         | Primarily<br>static | Primarily<br>software | Mostly static            | All hazards determined<br>and indicated explicitly<br>by the compiler     | Itanium                                                                    |



















| Example |                                                                              |                                                                                                           |
|---------|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|
| Loop:   | ld x2,0(x1)<br>addi x2,x2,1<br>sd x2,0(x1)<br>addi x1,x1,8<br>bne x2,x3,Loop | //x2=array element<br>//increment x2<br>//store result<br>//increment pointer<br>//branch if not last     |
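For readers less familiar with RISC-V assembly, the loop can be restated as a short Python function. This is only a functional sketch of what the five instructions compute: the name `run_loop` is hypothetical, and a list index stands in for the byte pointer held in x1 (assuming 8-byte elements, so `addi x1,x1,8` advances by one element).

```python
def run_loop(array, x3):
    """Increment successive elements until an incremented element equals x3."""
    i = 0                     # stands in for the pointer x1
    while True:
        x2 = array[i]         # ld   x2,0(x1)    load array element
        x2 = x2 + 1           # addi x2,x2,1     increment it
        array[i] = x2         # sd   x2,0(x1)    store the result
        i += 1                # addi x1,x1,8     advance to the next element
        if x2 == x3:          # bne  x2,x3,Loop  falls through when x2 == x3
            break
    return i                  # number of iterations executed

data = [4, 7, 9, 2]
iterations = run_loop(data, x3=10)
print(iterations, data)
```

Each instruction in an iteration depends on the one before it through x2, and the branch depends on the incremented value, which is why the timing of the loop is dominated by the ld, addi, bne chain.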



| Iteration<br>number | Instructions   | Issues at<br>clock cycle<br>number | Executes at<br>clock cycle<br>number | Memory access at<br>clock cycle<br>number | Write CDB at<br>clock cycle<br>number | Comment          |
|---------------------|----------------|------------------------------------|--------------------------------------|-------------------------------------------|---------------------------------------|------------------|
| 1                   | ld x2,0(x1)    | 1                                  | 2                                    | 3                                         | 4                                     | First issue      |
| 1                   | addi x2,x2,1   | 1                                  | 5                                    |                                           | 6                                     | Wait for ld      |
| 1                   | sd x2,0(x1)    | 2                                  | 3                                    | 7                                         |                                       | Wait for addi    |
| 1                   | addi x1,x1,8   | 2                                  | 3                                    |                                           | 4                                     | Execute directly |
| 1                   | bne x2,x3,Loop | 3                                  | 7                                    |                                           |                                       | Wait for addi    |
| 2                   | ld x2,0(x1)    | 4                                  | 8                                    | 9                                         | 10                                    | Wait for bne     |
| 2                   | addi x2,x2,1   | 4                                  | 11                                   |                                           | 12                                    | Wait for ld      |
| 2                   | sd x2,0(x1)    | 5                                  | 9                                    | 13                                        |                                       | Wait for addi    |
| 2                   | addi x1,x1,8   | 5                                  | 8                                    |                                           | 9                                     | Wait for bne     |
| 2                   | bne x2,x3,Loop | 6                                  | 13                                   |                                           |                                       | Wait for addi    |
| 3                   | ld x2,0(x1)    | 7                                  | 14                                   | 15                                        | 16                                    | Wait for bne     |
| 3                   | addi x2,x2,1   | 7                                  | 17                                   |                                           | 18                                    | Wait for ld      |
| 3                   | sd x2,0(x1)    | 8                                  | 15                                   | 19                                        |                                       | Wait for addi    |
| 3                   | addi x1,x1,8   | 8                                  | 14                                   |                                           | 15                                    | Wait for bne     |
| 3                   | bne x2,x3,Loop | 9                                  | 19                                   |                                           |                                       | Wait for addi    |


| Iteration<br>number | Instructions   | Issues<br>at clock<br>number | Executes<br>at clock<br>number | Read<br>access<br>at clock<br>number | Write<br>CDB at<br>clock<br>number | Commits<br>at clock<br>number | Comment           |
|---------------------|----------------|------------------------------|--------------------------------|--------------------------------------|------------------------------------|-------------------------------|-------------------|
| 1                   | ld x2,0(x1)    | 1                            | 2                              | 3                                    | 4                                  | 5                             | First issue       |
| 1                   | addi x2,x2,1   | 1                            | 5                              |                                      | 6                                  | 7                             | Wait for ld       |
| 1                   | sd x2,0(x1)    | 2                            | 3                              |                                      |                                    | 7                             | Wait for addi     |
| 1                   | addi x1,x1,8   | 2                            | 3                              |                                      | 4                                  | 8                             | Commit in order   |
| 1                   | bne x2,x3,Loop | 3                            | 7                              |                                      |                                    | 8                             | Wait for addi     |
| 2                   | ld x2,0(x1)    | 4                            | 5                              | 6                                    | 7                                  | 9                             | No execute delay  |
| 2                   | addi x2,x2,1   | 4                            | 8                              |                                      | 9                                  | 10                            | Wait for ld       |
| 2                   | sd x2,0(x1)    | 5                            | 6                              |                                      |                                    | 10                            | Wait for addi     |
| 2                   | addi x1,x1,8   | 5                            | 6                              |                                      | 7                                  | 11                            | Commit in order   |
| 2                   | bne x2,x3,Loop | 6                            | 10                             |                                      |                                    | 11                            | Wait for addi     |
| 3                   | ld x2,0(x1)    | 7                            | 8                              | 9                                    | 10                                 | 12                            | Earliest possible |
| 3                   | addi x2,x2,1   | 7                            | 11                             |                                      | 12                                 | 13                            | Wait for ld       |
| 3                   | sd x2,0(x1)    | 8                            | 9                              |                                      |                                    | 13                            | Wait for addi     |
| 3                   | addi x1,x1,8   | 8                            | 9                              |                                      | 10                                 | 14                            | Executes earlier  |
| 3                   | bne x2,x3,Loop | 9                            | 13                             |                                      |                                    | 14                            | Wait for addi     |
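The two timing tables can be compared with a little arithmetic. The sketch below takes the clock cycle at which bne executes in each iteration, read from the first table (where each load waits for the preceding bne, which suggests no speculation) and from the second table (which has a Commits column and lets loads execute before the branch resolves). The list names and the helper `cycles_per_iteration` are assumptions introduced for this illustration.

```python
# Clock cycle at which bne executes in iterations 1..3, read from the two tables.
bne_exec_no_speculation = [7, 13, 19]
bne_exec_speculation = [7, 10, 13]

def cycles_per_iteration(cycles):
    """Cycle distance between successive iterations of the same instruction."""
    return [later - earlier for earlier, later in zip(cycles, cycles[1:])]

no_spec = cycles_per_iteration(bne_exec_no_speculation)
spec = cycles_per_iteration(bne_exec_speculation)
print("without speculation:", no_spec)
print("with speculation:   ", spec)
print("ratio:", no_spec[0] / spec[0])
```

On these numbers the branch resolves every 6 cycles in the first table and every 3 cycles in the second, so over this short trace the speculative schedule sustains twice the iteration rate. The commit column of the second table (8, 11, 14 for bne) shows the same 3-cycle spacing.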













| Processor           | Implementation<br>technology | Clock    | Power                                   | SPECint2006<br>base | SPECfp2006<br>base |
|---------------------|------------------------------|----------|-----------------------------------------|---------------------|--------------------|
| Intel Pentium 4 670 | 90 nm                        | 3.8 GHz  | 115 W                                   | 11.5                | 12.2               |
| Intel Itanium 2     | 90 nm                        | 1.66 GHz | 104 W<br>approx. 70 W one<br>core       | 14.5                | 17.3               |
| Intel i7 920        | 45 nm                        | 3.3 GHz  | 130 W total<br>approx. 80 W one<br>core | 35.5                | 38.4               |
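One way to read the table is as performance per watt. The sketch below divides each SPEC score by the total power figure in the table; using total chip power rather than the approximate one-core figure is an assumption of this example, and the dictionary layout is made up for illustration.

```python
# (total power in W, SPECint2006 base, SPECfp2006 base), copied from the table.
processors = {
    "Intel Pentium 4 670": (115, 11.5, 12.2),
    "Intel Itanium 2": (104, 14.5, 17.3),
    "Intel i7 920": (130, 35.5, 38.4),
}

# Score per watt of total chip power for each processor.
efficiency = {
    name: (spec_int / watts, spec_fp / watts)
    for name, (watts, spec_int, spec_fp) in processors.items()
}

for name, (int_per_watt, fp_per_watt) in efficiency.items():
    print(f"{name}: {int_per_watt:.3f} SPECint/W, {fp_per_watt:.3f} SPECfp/W")
```

Under that assumption the i7 920 delivers roughly 0.27 SPECint per watt against about 0.10 for the Pentium 4 670 and 0.14 for the Itanium 2. Part of the gap comes from the move from 90 nm to 45 nm, so the comparison is not purely architectural.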







