SFSU Engr 851 - Spring 2013 - Homework 1
Prof. Seapahn Megerian
- Name three models that can be used to describe the functionality of an embedded system. For each model, give a simple example and draw the model.
- List five examples of properties of embedded systems that can be captured and represented by a model.
- List three pros and cons for VLIW and three pros and cons for Superscalar architectures and briefly explain your reasoning.
- List three use cases where VLIW would be more suitable than Superscalar.
- List three use cases where Superscalar would be more suitable than VLIW.
- Consider the n-tap FIR example discussed in class:
y_n = c_n * x_n + c_(n-1) * x_(n-1) + c_(n-2) * x_(n-2) + ... + c_1 * x_1
given constants c_n ... c_1 and input values x_i at time step i. Unless otherwise specified, assume n=4, the number of CPU registers you can use is 128, and that you have a RISC type processor with the following single-cycle instructions:
LOAD Ri, M(address)
STORE Ri, M(address)
ADD Rs1, Rs2, Rd
MUL Rs1, Rs2, Rd
MOV Rs, Rd
JGE Ri, NAME
JLE Ri, NAME
JE Ri, NAME
The conditional jump instructions compare with 0 (i.e. >=0, <=0, ==0).
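As a cross-check for a hand-written assembly version, the FIR equation above can be sketched as a small Python reference model. This is only an illustrative sketch, not part of the assignment: the names make_fir and step are hypothetical, and it assumes the newest sample pairs with c_n, the oldest with c_1, and all past samples start at 0.

```python
from collections import deque

def make_fir(coeffs):
    """coeffs = [c_1, c_2, ..., c_n]. Returns a function that consumes one new sample."""
    n = len(coeffs)
    # Sliding window of the last n samples, oldest first.
    # Starting with zeros mirrors the "all registers contain 0" assumption.
    window = deque([0] * n, maxlen=n)

    def step(x_new):
        window.append(x_new)  # the oldest sample falls out automatically
        # Oldest sample is multiplied by c_1, newest by c_n.
        return sum(c * x for c, x in zip(coeffs, window))

    return step

# Example with n = 4 and made-up coefficients and inputs.
fir = make_fir([1, 2, 3, 4])
outputs = [fir(x) for x in [10, 20, 30, 40, 50]]
print(outputs)  # [40, 110, 200, 300, 400]
```

Feeding the same coefficients and inputs to an assembly implementation should give the same sequence of y values, one per iteration.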
- Suppose the current value of x, i.e. x_n, is available through memory-mapped IO at M(1000). In other words, you can access the current value of x by loading from M(1000). The constants are stored in memory locations M(1001) ... M(1000+n). Write the set of instructions that execute one complete iteration of the FIR filter. You don't have to worry about the timing of when x_n is read.
Hint: you have to decide which registers correspond to your x's. Every time the FIR executes, you load just ONE value for x. So you have to somehow remember the old ones. You can assume initially all registers contain 0 (so no need to initialize x's).
- How many clock cycles does the first iteration of your FIR implementation require? This should include the steps needed to load all necessary data items.
- Assume each load/store operation takes 10 clock cycles to execute. How many clock cycles does your FIR require?
- If the processor has a 5-stage pipeline, then how many clock cycles does your FIR require? Remember that now, each pipeline stage executes in 1 clock cycle. Repeat your calculation for the case where memory load/store operations take 10 clock cycles.
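For the pipelined case, a common starting point is the ideal count with no stalls or hazards: the first instruction needs one cycle per stage, and each later instruction completes one cycle after the previous one. The sketch below assumes exactly that idealized model. The function name is hypothetical, and it does not model multi-cycle loads/stores or branch penalties, which you still need to reason about yourself.

```python
def ideal_pipeline_cycles(num_instructions, num_stages):
    """Cycles to finish num_instructions on an ideal pipeline (no stalls, no hazards)."""
    if num_instructions == 0:
        return 0
    # Fill the pipeline once, then retire one instruction per cycle.
    return num_stages + (num_instructions - 1)

# Example: 3 instructions on a 5-stage pipeline.
example_cycles = ideal_pipeline_cycles(3, 5)
print(example_cycles)  # 7
```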
- Suppose you now can also use a multiply-accumulate (MAC) instruction:
MAC Rd, Rx, Rc
where Rd = Rd + Rc*Rx. Rewrite your FIR implementation to use this new instruction.
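To make the MAC semantics concrete, the sketch below models the register file as a Python dictionary and compares one MUL followed by one ADD against a single MAC. The helper functions and register contents are illustrative assumptions. Only the operand order (MUL/ADD Rs1, Rs2, Rd and MAC Rd, Rx, Rc) is taken from the instruction definitions above.

```python
def mul(regs, rs1, rs2, rd):
    # MUL Rs1, Rs2, Rd  ->  Rd = Rs1 * Rs2
    regs[rd] = regs[rs1] * regs[rs2]

def add(regs, rs1, rs2, rd):
    # ADD Rs1, Rs2, Rd  ->  Rd = Rs1 + Rs2
    regs[rd] = regs[rs1] + regs[rs2]

def mac(regs, rd, rx, rc):
    # MAC Rd, Rx, Rc  ->  Rd = Rd + Rc * Rx
    regs[rd] = regs[rd] + regs[rc] * regs[rx]

# R1 is the accumulator, R2 holds an x value, R3 holds a constant.
a = {"R1": 5, "R2": 3, "R3": 7, "R4": 0}
mul(a, "R2", "R3", "R4")   # needs a temporary register R4
add(a, "R1", "R4", "R1")

b = {"R1": 5, "R2": 3, "R3": 7, "R4": 0}
mac(b, "R1", "R2", "R3")   # same result, no temporary register

print(a["R1"], b["R1"])  # 26 26
```

Both sequences leave the same value in the accumulator. The MAC version uses one instruction instead of two and never touches R4.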
- How many clock cycles does your new FIR require in the non-pipelined and 5-stage pipelined versions? Assume single-cycle load/stores.
- What is the minimum number of registers required to make your standard and MAC-based implementations work?
- Suppose registers are very expensive. What is the absolute minimum number of registers required to implement a working FIR? Justify your answer by writing an implementation in assembly using the given instructions. You can assume you have as much memory as you need.
- Suppose now your processor can perform two MAC operations in a single clock cycle (no other modifications to the register file or memory access). Write a new implementation in assembly to take advantage of this and calculate how many clock cycles it requires to execute.