### LINKÖPINGS TEKNISKA HÖGSKOLA Institutionen för datavetenskap Petru Eles ### Tentamen i kursen # Datorarkitektur - TDDI03 2017-08-18, kl. 14-18 Hjälpmedel: Engelsk ordbok. Supporting material: English dictionary. Poänggränser: Maximal poäng är 40. För godkänt krävs sammanlagt 21 poäng. Points: Maximum points: 40. In order to pass the exam you need a total of minimum 21 points. #### Jourhavande lärare: Rouhollah Mahfouzi, tel. 013 281631 Good luck !!! ## Tentamen i kursen Datorarkitektur - TDDI03, 2017-08-18, kl. 14-18 Du kan skriva på svenska eller engelska! | 1. | | | | | |-----------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|--|--| | a)<br>b). | Why do we need special write strategies for cache memories? We have discussed three write strategies: write-through, write through with buffered write, a copy back. How do they work? Which are their advantages and disadvantages? | ınc | | | | | | 3p) | | | | | | | | | | 2. | A set-associative cache consists of $512 (=2^9)$ lines, divided into two-line sets. The material memory contains $128K (=2^{17})$ blocks of $128 (=2^7)$ bytes each. Show the format of the material memory address (give the total number of address bits, what different groups of address bindicate, and how long each of these groups is). | air | | | | | | 3p) | | | | 3. | | | | | | a)<br>b) | What is the role of the page table in a virtual memory system? What data does it store? The page table is very large, usually too large to be stored in main memory. Such a large size at the same time, makes accept to the same time. | ze, | | | | | at the same time, makes access to the page table very slow. How is this solved in current microprocessor architectures. | ent | | | | | | 3p) | | | | | | | | | | 4.<br>a) | Consider a pipelined processor with $k$ pipeline stages.<br>What is the theoretical acceleration (ignoring overheads) for a sequence of $n$ instructions, | | | | | b) | compared to a similar but non-pipelined processor? Show how you obtain the formula! What is the acceleration of a sequence of 80 instructions if the number of pipeline stages is 1. | 19 | | | | c) | What is the acceleration for an infinitely long sequence if the number of pipeline stages is 14 | | | | | | (3 | p) | | | | | | | | | | 5. | Define the three types of pipeline hazards. Give an example for each. | | | | | | (3 | p) | | | | _ | Demonis has also at 11 discussion at 2 | | | | | 5. | Dynamic branch prediction with a two-bit scheme. How does it work? Illustrate with the case of a loop like the one below. Compare with one-bit prediction. | | | | | | | | | | | | LOOP | | | | | | | | | | | | BNZ LOOP | | | | | | (3) | n) | | | | | | ۲) | | | | 7. | Delayed load. Why do we need it with RISC architectures? How does it work? | | | | | | Give an example. | | | | | | (3) | p) | | | ### Tentamen i kursen Datorarkitektur - TDDI03, 2017-08-18, kl. 14-18 Du kan skriva på svenska eller engelska! 8. - a) What is a superscalar architecture? - b) Draw a block-diagram of a superscalar unit. (3p) 9. Consider the following sequence of machine instructions: 1: R1 $$\leftarrow$$ R0 + R2 2: R7 $\leftarrow$ R1 + R12 3: R5 $\leftarrow$ R7 - 1 4: R1 $\leftarrow$ R2 \* R4 5: R7 $\leftarrow$ R9 + 10 6: R12 $\leftarrow$ R4 - 25 7: R3 $\leftarrow$ R5 \* 2 8: R4 $\leftarrow$ R1 + R3 9: R10 $\leftarrow$ R8 - 2 10: R1 $\leftarrow$ R1 \* 3 - a) Indicate the data dependencies among instructions. - b) Consider a superscalar computer on which the execution of each instruction takes one cycle; the computer has two units that execute addition&subtraction and one unit for multiplication. Produce a table, according to the model below, showing how instructions are executed in consecutive cycles, if we assume in order execution. - c) Rename the registers in the above sequence to prevent, where possible, dependency problems. Produce a second table, according to the model below, showing how instructions (after register renaming) are executed in consecutive cycles, if we assume out of order execution. In the table cells indicate the sequence number (1, 2, .., 10) of the instruction executed in the corresponding cycle on the respective unit. | | ADD/SUB | ADD/SUB | MUL | |---------|---------|---------|-----| | Cycle 1 | | | | | Cycle 2 | | | | | Cycle 3 | | | | | • • • | | | | ## Tentamen i kursen Datorarkitektur - TDDI03, 2017-08-18, kl. 14-18 Du kan skriva på svenska eller engelska! | 10. | What is loop unrolling? How does it work? Why is it important with VLIW architectur | res? (2p) | |-----------------|--------------------------------------------------------------------------------------------------------------------------------|-----------| | 11.<br>a)<br>b) | What is branch predication (like in the Itanium architecture)? Compare with ordinary branch prediction. | (3p) | | 12. | Formulate Amdahl's law and comment. | (2p) | | 13. | Flynn's Classification of Computer Architectures: Explain and illustrate by figures. | (3p) | | 14. | What is a vector processor? Draw a block diagram. What is the basic difference between array processors and vector processors? | (2p) |