

















Examples

| Code | Notes |
|------|-------|
| Example 1:<br>DADDU R1,R2,R3<br>BEQZ R4,L<br>DSUBU R1,R1,R6<br>L: OR R7,R1,R8 | <ul> <li>The OR instruction is data-dependent on both DADDU and DSUBU (through R1)</li> <li>Preserving the order alone is not sufficient: OR must also get the correct value of R1, which comes from DADDU if the branch is taken and from DSUBU if it is not</li> </ul> |
| Example 2:<br>DADDU R1,R2,R3<br>BEQZ R12,skip<br>DSUBU R4,R5,R6<br>DADDU R5,R4,R9<br>skip: OR R7,R8,R9 | <ul> <li>Assume R4 isn't used after skip</li> <li>It is then possible to move DSUBU before the branch: if the branch is taken, the value DSUBU writes to R4 is never read</li> </ul> |

Copyright © 2012, Elsevier Inc. All rights reserved. 10
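The two examples can be checked mechanically. The sketch below is a hypothetical mini-interpreter (`run`, `same_except`, and the register values are invented for illustration) that executes each fragment twice, once as written and once with DSUBU hoisted above the branch, for both branch outcomes. It models only the integer operations on the slide, with 64-bit wraparound.

```python
# Hypothetical mini-interpreter for the two slide examples (integer ops only).
MASK = (1 << 64) - 1  # DADDU/DSUBU wrap around modulo 2**64


def run(program, init):
    """Execute a straight-line program whose only branch is a forward BEQZ."""
    regs = {f"R{i}": 0 for i in range(32)}
    regs.update(init)
    labels = {ins[1]: pc for pc, ins in enumerate(program) if ins[0] == "label"}
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "DADDU":
            d, a, b = args
            regs[d] = (regs[a] + regs[b]) & MASK
        elif op == "DSUBU":
            d, a, b = args
            regs[d] = (regs[a] - regs[b]) & MASK
        elif op == "OR":
            d, a, b = args
            regs[d] = regs[a] | regs[b]
        elif op == "BEQZ" and regs[args[0]] == 0:
            pc = labels[args[1]]  # branch taken: jump to the label entry
            continue
        pc += 1  # fall through; "label" entries are no-ops
    return regs


# Example 1: is it legal to hoist DSUBU above the BEQZ?
ex1 = [("DADDU", "R1", "R2", "R3"), ("BEQZ", "R4", "L"),
       ("DSUBU", "R1", "R1", "R6"), ("label", "L"), ("OR", "R7", "R1", "R8")]
ex1_hoisted = [ex1[0], ex1[2], ex1[1], ex1[3], ex1[4]]

# Example 2: same transformation, but the DSUBU result (R4) is dead at skip
ex2 = [("DADDU", "R1", "R2", "R3"), ("BEQZ", "R12", "skip"),
       ("DSUBU", "R4", "R5", "R6"), ("DADDU", "R5", "R4", "R9"),
       ("label", "skip"), ("OR", "R7", "R8", "R9")]
ex2_hoisted = [ex2[0], ex2[2], ex2[1], ex2[3], ex2[4], ex2[5]]


def same_except(a, b, ignore):
    """True if two register files agree on every register not in `ignore`."""
    return all(a[r] == b[r] for r in a if r not in ignore)


init1 = {"R2": 1, "R3": 2, "R6": 1, "R8": 0}
init2 = {"R2": 1, "R3": 2, "R5": 10, "R6": 3, "R8": 1, "R9": 4}
for taken in (True, False):
    cond = 0 if taken else 1  # BEQZ is taken when its register is zero
    r7 = run(ex1, {**init1, "R4": cond})["R7"]
    r7_hoisted = run(ex1_hoisted, {**init1, "R4": cond})["R7"]
    ok2 = same_except(run(ex2, {**init2, "R12": cond}),
                      run(ex2_hoisted, {**init2, "R12": cond}), {"R4"})
    print(f"taken={taken}: ex1 R7 {r7} vs {r7_hoisted}; ex2 equal except R4: {ok2}")
```

With these inputs, hoisting DSUBU in Example 1 changes R7 from 3 to 2 when the branch is taken, so the move is illegal. In Example 2 the hoisted version differs from the original only in R4, which by assumption is dead after skip.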



Pipeline Stalls

| Instruction | Operands | Clock cycle issued |
|-------------|----------|--------------------|
| Loop: L.D | F0,0(R1) | 1 |
| stall | | 2 |
| ADD.D | F4,F0,F2 | 3 |
| stall | | 4 |
| stall | | 5 |
| S.D | F4,0(R1) | 6 |
| DADDUI | R1,R1,#-8 | 7 |
| stall (assume integer load latency is 1) | | 8 |
| BNE | R1,R2,Loop | 9 |

| Instruction producing result | Instruction using result | Latency in clock cycles |
|------------------------------|--------------------------|-------------------------|
| FP ALU op | Another FP ALU op | 3 |
| FP ALU op | Store double | 2 |
| Load double | FP ALU op | 1 |
| Load double | Store double | 0 |

The stalls follow from these latencies: ADD.D waits one cycle for the L.D result (load double to FP ALU op), and S.D waits two cycles for the ADD.D result (FP ALU op to store double), so each iteration takes 9 clock cycles.
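To see where each stall comes from, the issue cycles can be recomputed from the latency table. This is a minimal sketch assuming a single-issue, in-order pipeline in which a consumer may issue only after `latency` intervening cycles following its producer, and considering only dependences within one iteration. The class names (`load`, `fp_alu`, `store`, `int_alu`, `branch`), the function `issue_cycles`, and the one-cycle DADDUI-to-BNE latency (used to model the cycle-8 stall) are assumptions for illustration; pairs missing from the table default to 0.

```python
# Hypothetical latency model: (producer class, consumer class) -> intervening
# cycles needed to avoid a stall. Pairs not listed are assumed to be 0.
LATENCY = {
    ("fp_alu", "fp_alu"): 3,
    ("fp_alu", "store"): 2,
    ("load", "fp_alu"): 1,
    ("load", "store"): 0,
    ("int_alu", "branch"): 1,  # assumption: models the cycle-8 stall before BNE
}


def issue_cycles(program):
    """Issue cycle of each instruction on an in-order, single-issue pipeline."""
    producer = {}  # register -> (class, issue cycle) of its most recent writer
    cycles = []
    for _text, cls, dest, srcs in program:
        ready = cycles[-1] + 1 if cycles else 1
        for reg in srcs:
            if reg in producer:
                p_cls, p_cycle = producer[reg]
                ready = max(ready, p_cycle + LATENCY.get((p_cls, cls), 0) + 1)
        cycles.append(ready)
        if dest is not None:
            producer[dest] = (cls, ready)
    return cycles


# (text, class, destination register, source registers)
loop = [
    ("L.D F0,0(R1)", "load", "F0", ["R1"]),
    ("ADD.D F4,F0,F2", "fp_alu", "F4", ["F0", "F2"]),
    ("S.D F4,0(R1)", "store", None, ["F4", "R1"]),
    ("DADDUI R1,R1,#-8", "int_alu", "R1", ["R1"]),
    ("BNE R1,R2,Loop", "branch", None, ["R1", "R2"]),
]
# Same instructions in the scheduled order, with the store offset adjusted
scheduled = [loop[0], loop[3], loop[1],
             ("S.D F4,8(R1)", "store", None, ["F4", "R1"]), loop[4]]

print(issue_cycles(loop))       # [1, 3, 6, 7, 9]
print(issue_cycles(scheduled))  # [1, 2, 3, 6, 7]
```

The unscheduled order reproduces the 9-cycle iteration above; the reordered list, which matches the scheduled code that follows, finishes in 7 cycles.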

Scheduled code:

| Instruction | Clock cycle issued |
|-------------|--------------------|
| Loop: L.D F0,0(R1) | 1 |
| DADDUI R1,R1,#-8 | 2 |
| ADD.D F4,F0,F2 | 3 |
| stall | 4 |
| stall | 5 |
| S.D F4,8(R1) | 6 |
| BNE R1,R2,Loop | 7 |

Moving DADDUI up fills the load stall and removes the stall before BNE, cutting each iteration from 9 to 7 clock cycles. Because R1 is now decremented before the store, S.D uses offset 8(R1) instead of 0(R1) to reach the same address.
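The offset change in the scheduled code can be checked by running both loop bodies on a small array. In this sketch, `run_loop` is a hypothetical interpreter for just these four instructions plus the closing BNE, memory is a Python dict keyed by byte address, and R1 is assumed to be the only base register. The `body_wrong` variant moves DADDUI up but keeps 0(R1), to show what the adjustment prevents.

```python
def run_loop(body, mem, r1, r2, f2):
    """Run `body` as `Loop: <body> BNE R1,R2,Loop`; all addresses are R1-relative."""
    mem = dict(mem)  # byte address -> double; copy so the caller's array is untouched
    regs = {"R1": r1, "R2": r2, "F2": f2}
    while True:
        for op, *args in body:
            if op == "L.D":
                fd, off = args
                regs[fd] = mem[regs["R1"] + off]
            elif op == "ADD.D":
                fd, fs, ft = args
                regs[fd] = regs[fs] + regs[ft]
            elif op == "S.D":
                fs, off = args
                mem[regs["R1"] + off] = regs[fs]
            elif op == "DADDUI":
                (imm,) = args
                regs["R1"] += imm
        if regs["R1"] == regs["R2"]:  # BNE not taken: leave the loop
            return mem


body_original = [("L.D", "F0", 0), ("ADD.D", "F4", "F0", "F2"),
                 ("S.D", "F4", 0), ("DADDUI", -8)]
body_scheduled = [("L.D", "F0", 0), ("DADDUI", -8),
                  ("ADD.D", "F4", "F0", "F2"), ("S.D", "F4", 8)]
body_wrong = [("L.D", "F0", 0), ("DADDUI", -8),
              ("ADD.D", "F4", "F0", "F2"), ("S.D", "F4", 0)]  # offset not adjusted

array = {8: 1.0, 16: 2.0, 24: 3.0, 32: 4.0}  # four doubles; R1 starts at the top
expected = run_loop(body_original, array, 32, 0, 10.0)
print(expected)  # {8: 11.0, 16: 12.0, 24: 13.0, 32: 14.0}
print(run_loop(body_scheduled, array, 32, 0, 10.0) == expected)  # True
print(run_loop(body_wrong, array, 32, 0, 10.0) == expected)      # False
```

Without the adjustment each store lands one element too low, and later iterations then reload the already-updated value, so the error compounds.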































Software Pipelining

Original loop:

| Instruction | Operands |
|-------------|----------|
| Loop: L.D | F0,0(R1) |
| ADD.D | F4,F0,F2 |
| S.D | F4,0(R1) |
| DADDUI | R1,R1,#-8 |
| BNE | R1,R2,Loop |

| Before: unrolled 3 times | After: software pipelined |
|--------------------------|---------------------------|
| 1 L.D F0,0(R1)<br>2 ADD.D F4,F0,F2<br>3 S.D F4,0(R1)<br>4 L.D F0,-8(R1)<br>5 ADD.D F4,F0,F2<br>6 S.D F4,-8(R1)<br>7 L.D F0,-16(R1)<br>8 ADD.D F4,F0,F2<br>9 S.D F4,-16(R1)<br>10 DADDUI R1,R1,#-24<br>11 BNE R1,R2,LOOP | L.D F0,0(R1)<br>ADD.D F4,F0,F2<br>L.D F0,-8(R1)<br>1 S.D F4,0(R1) ;Stores M[i]<br>2 ADD.D F4,F0,F2 ;Adds to M[i-1]<br>3 L.D F0,-16(R1) ;Loads M[i-2]<br>4 DADDUI R1,R1,#-8<br>5 BNE R1,R2,LOOP<br>S.D F4,0(R1)<br>ADD.D F4,F0,F2<br>S.D F4,-8(R1) |

In the software-pipelined version, the first three instructions are start-up code and the last three are finish-up code; the five numbered instructions form the loop body, where each instruction works on a different element.
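The software-pipelined structure can be mirrored in Python to check that it computes the same result as the original loop. In this sketch the registers become local variables, memory is a dict keyed by byte address, and the loop body is run `n - 2` times as a counted loop; how the real code adjusts R2 for the shorter loop is not shown on the slide, so that count is an assumption. The function names are hypothetical.

```python
def loop_original(mem, r1, n, f2):
    """Original loop: add f2 to n doubles, walking down from address r1."""
    mem = dict(mem)
    for _ in range(n):
        f0 = mem[r1]       # L.D    F0,0(R1)
        f4 = f0 + f2       # ADD.D  F4,F0,F2
        mem[r1] = f4       # S.D    F4,0(R1)
        r1 -= 8            # DADDUI R1,R1,#-8
    return mem


def loop_software_pipelined(mem, r1, n, f2):
    """Same computation using the start-up / body / finish-up code from the slide."""
    assert n >= 2  # start-up and finish-up code each cover two elements
    mem = dict(mem)
    # Start-up code
    f0 = mem[r1]           # L.D    F0,0(R1)
    f4 = f0 + f2           # ADD.D  F4,F0,F2
    f0 = mem[r1 - 8]       # L.D    F0,-8(R1)
    # Loop body: each instruction works on a different element
    for _ in range(n - 2):
        mem[r1] = f4       # S.D    F4,0(R1)   stores M[i]
        f4 = f0 + f2       # ADD.D  F4,F0,F2   adds to M[i-1]
        f0 = mem[r1 - 16]  # L.D    F0,-16(R1) loads M[i-2]
        r1 -= 8            # DADDUI R1,R1,#-8
    # Finish-up code
    mem[r1] = f4           # S.D    F4,0(R1)
    f4 = f0 + f2           # ADD.D  F4,F0,F2
    mem[r1 - 8] = f4       # S.D    F4,-8(R1)
    return mem


array = {8 * k: float(k) for k in range(1, 6)}  # five doubles at addresses 8..40
print(loop_software_pipelined(array, 40, 5, 10.0) == loop_original(array, 40, 5, 10.0))  # True
```

Inside the loop body, ADD.D consumes an F0 loaded one iteration earlier and S.D stores an F4 computed one iteration earlier, so each of these producer/consumer pairs is separated by three other instructions, which covers the 1- and 2-cycle latencies from the earlier table.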

