## LINKÖPINGS TEKNISKA HÖGSKOLA Institutionen för datavetenskap Petru Eles

## Tentamen i kursen

## **Datorarkitektur - TDDI03**

2017-01-09, kl. 08-12

Hjälpmedel:

Engelsk ordbok.

Supporting material:

English dictionary.

Poänggränser:

Maximal poäng är 40. För godkänt krävs sammanlagt 21 poäng. **Points:** 

Maximum points: 40. In order to pass the exam you need a total of minimum 21 points.

Jourhavande lärare:

Petru Eles, tel. 0703681396

Good luck !!!

# Tentamen i kursen Datorarkitektur - TDDI03, 2017-01-09, kl. 8-12 Du kan skriva på svenska eller engelska!

| a)<br>b).      | Why do we need special write strategies for cache memories?  We have discussed three write strategies: write-through, write through with buffered write, copy back. How do they work? Which are their advantages and disadvantages?                                                                                                                                                                                                 | and (3p) |
|----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|
| 2.             | A set-associative cache consists of $512 (=2^9)$ lines, divided into two-line sets. The memory contains $128K (=2^{17})$ blocks of $128 (=2^7)$ bytes each. Show the format of the memory address (give the total number of address bits, what different groups of address indicate, and how long each of these groups is).                                                                                                         | nain     |
| 3. a) b) c)    | Consider a pipelined processor with $k$ pipeline stages. What is the theoretical acceleration (ignoring overheads) for a sequence of $n$ instruction compared to a similar but non-pipelined processor? Show how you obtain the formula! What is the acceleration of a sequence of 90 instructions if the number of pipeline stages is What is the acceleration for an infinitely long sequence if the number of pipeline stages is | 16?      |
| 4.             | Define the three types of pipeline hazards. Give an example for each.                                                                                                                                                                                                                                                                                                                                                               | (3p)     |
| 5.             | Enumerate five of the main characteristics of RISC architectures.                                                                                                                                                                                                                                                                                                                                                                   | (2p)     |
| 6.             | Branch history table: what does it contain and how is it used?                                                                                                                                                                                                                                                                                                                                                                      | (2p)     |
| 7.<br>a)<br>b) | What is a superscalar architecture?  Draw a block-diagram of a superscalar unit.                                                                                                                                                                                                                                                                                                                                                    | (3p)     |

#### Tentamen i kursen Datorarkitektur - TDDI03, 2017-01-09, kl. 8-12 Du kan skriva på svenska eller engelska!

8.

Consider the following sequence of machine instructions:

```
1:
                R0 + R2
      R1
          \leftarrow
2:
      R7
                R1 + R12
          \leftarrow
3:
      R5
                R7 - 1
          \leftarrow
                R2 * R4
4:
      R1 ←
5:
                R9 + 10
      R7 ←
6:
      R12 ←
                R4 - 25
7:
                R5 * 2
      R3 ←
8:
                R1 + R3
      R4 ←
9:
                R8 - 2
      R10 ←
10:
      R1 ←
                R1 * 3
```

- a) Indicate the data dependencies among instructions.
- b) Consider a superscalar computer on which the execution of each instruction takes one cycle; the computer has two units that execute addition&subtraction and one unit for multiplication. Produce a table, according to the model below, showing how instructions are executed in consecutive cycles, if we assume in order execution.
- c) Rename the registers in the above sequence to prevent, where possible, dependency problems. Produce a second table, according to the model below, showing how instructions (after register renaming) are executed in consecutive cycles, if we assume out of order execution.

In the table cells indicate the sequence number (1, 2, .., 10) of the instruction executed in the corresponding cycle on the respective unit.

|         | ADD/SUB | ADD/SUB | MUL |
|---------|---------|---------|-----|
| Cycle 1 |         |         |     |
| Cycle 2 |         |         |     |
| Cycle 3 |         |         |     |
| •••     |         |         |     |

(4p)

- 9. Compare VLIW architectures with superscalar architectures:
- a) Show similarities and differences.
- b) Show the advantages and disadvantages of the two approaches.
- c) Why is a superscalar consuming more power, compared to a VLIW computer?

(4p)

đ

## Tentamen i kursen Datorarkitektur - TDDI03, 2017-01-09, kl. 8-12 Du kan skriva på svenska eller engelska!

| 10.                   | What is trace scheduling? How does it work (remember the three steps)? Why is it impo with VLIW architectures?                                                                                                                           | rtant         |
|-----------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|
|                       |                                                                                                                                                                                                                                          | (3p)          |
| 11.<br>a)<br>b)       | What is branch predication (like in the Itanium architecture)? Compare with ordinary branch prediction.                                                                                                                                  | (3p)          |
| 12.                   | Formulate Amdahl's law and comment.                                                                                                                                                                                                      |               |
|                       |                                                                                                                                                                                                                                          | (2p)          |
| 13.<br>a)<br>b)<br>c) | What is hardware multithreading? Why do multithreaded processors provide higher performance? We have described three approaches to multithreading: interleaved, blocked, and sim neous; what is the main characteristic of each of them? | ulta-<br>(3p) |
| 14.                   | What is a vector processor? Draw a block diagram. What is the basic difference between array processors and vector processors?                                                                                                           | (2p)          |
|                       |                                                                                                                                                                                                                                          |               |