
SFSU Engr 851, Spring 2013, Homework 1
Prof. Seapahn Megerian
 Name three models that can be used to describe the functionality of an embedded system. For each model, give a simple example and draw the model.
 List five examples of properties of embedded systems that can be captured and represented by a model.
 List three pros and three cons of VLIW architectures and three pros and three cons of Superscalar architectures, and briefly explain your reasoning.
 List three use cases where VLIW would be more suitable than Superscalar.
 List three use cases where Superscalar would be more suitable than VLIW.
 Consider the n-tap FIR example discussed in class:
y_{n} = c_{n} * x_{n} + c_{n-1} * x_{n-1} + c_{n-2} * x_{n-2} + ... + c_{1} * x_{1}
given constants c_{n}, ..., c_{1} and input values x_{i} at time step i. Unless otherwise specified, assume n = 4, that you can use 128 CPU registers, and that you have a RISC-type processor with the following single-cycle instructions:
    LOAD R_{i}, M(address)
    STORE R_{i}, M(address)
    ADD R_{s1}, R_{s2}, R_{d}
    MUL R_{s1}, R_{s2}, R_{d}
    MOV R_{s}, R_{d}
    JMP NAME
    JGE R_{i}, NAME
    JLE R_{i}, NAME
    JE R_{i}, NAME
The conditional jump instructions compare R_{i} with 0 (i.e. >= 0, <= 0, == 0).
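Before writing any assembly, it can help to have a reference model to check results against. The Python sketch below simply evaluates the FIR formula above; the newest-first list ordering and the sample values are assumptions chosen for illustration, not part of the assignment.

```python
def fir_output(c, x):
    """Direct-form FIR output y_n = c_n*x_n + c_{n-1}*x_{n-1} + ... + c_1*x_1.

    Both lists are ordered newest-first to match the formula as written:
    c = [c_n, c_{n-1}, ..., c_1] and x = [x_n, x_{n-1}, ..., x_1].
    """
    if len(c) != len(x):
        raise ValueError("need one coefficient per sample")
    return sum(ck * xk for ck, xk in zip(c, x))


# n = 4 taps, made-up values for illustration only
print(fir_output([1, 2, 3, 4], [5, 6, 7, 8]))  # 1*5 + 2*6 + 3*7 + 4*8 = 70
```

Running an assembly implementation by hand on the same inputs should give the same y_{n}.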
 Suppose the current value of x, i.e. x_{n}, is available through memory-mapped I/O at M(1000). In other words, you can read the current value of x by loading from M(1000). The constants are stored in memory locations M(1001) ... M(1000+n). Write the set of instructions that executes one full iteration of the FIR filter loop. You don't have to worry about the timing of when x_{n} is read.
Hint: you have to decide which registers hold your x's. Each time the FIR executes, you load just ONE new value of x, so you have to somehow remember the old ones. You can assume all registers initially contain 0 (so there is no need to initialize the x's).
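The hint describes a delay line: one new sample enters per iteration and the older samples shift down by one slot. The Python sketch below models that with a list standing in for the registers; the slot ordering (newest in slot 0) and the coefficient values are assumptions for illustration. Note that the shift runs from the oldest end first, so no sample is overwritten before it has been copied; the same ordering matters for a sequence of MOV instructions.

```python
N_TAPS = 4  # n = 4, as in the assignment


def fir_iteration(c, delay, new_x):
    """Run one FIR iteration using a shift-register delay line.

    delay[0] holds x_n (newest sample) ... delay[-1] holds x_1 (oldest).
    c is ordered the same way: c[0] = c_n ... c[-1] = c_1.
    Only ONE new sample enters per call, mirroring the single load from M(1000).
    """
    # Shift older samples one slot toward the end, starting from the oldest
    # end so nothing is overwritten before it is copied; the oldest falls off.
    for k in range(len(delay) - 1, 0, -1):
        delay[k] = delay[k - 1]
    delay[0] = new_x
    return sum(ck * xk for ck, xk in zip(c, delay))


# Delay line starts at zero, like the registers in the hint.
delay = [0] * N_TAPS
coeffs = [1, 1, 1, 1]  # made-up coefficients: a 4-point moving sum
outputs = [fir_iteration(coeffs, delay, x) for x in [1, 2, 3, 4, 5]]
print(outputs)  # [1, 3, 6, 10, 14]
```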
 How many clock cycles does the first iteration of your FIR implementation require? Include the steps needed to load all necessary data items.
 Assume each load/store operation takes 10 clock cycles to execute. How many clock cycles does your FIR require?
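On a non-pipelined processor, the cycle count is the sum of each executed instruction's latency. The Python sketch below does that bookkeeping; the instruction mix shown is hypothetical and only illustrates the arithmetic, so substitute the counts from your own code.

```python
def nonpipelined_cycles(counts, mem_latency=1):
    """Cycles on a non-pipelined CPU, where each instruction finishes before the next starts.

    counts maps an opcode name to how many times it executes.
    LOAD and STORE take mem_latency cycles; every other instruction takes 1.
    """
    total = 0
    for op, n in counts.items():
        latency = mem_latency if op in ("LOAD", "STORE") else 1
        total += n * latency
    return total


# Hypothetical instruction mix, for illustration only; count your own code.
mix = {"LOAD": 5, "MUL": 4, "ADD": 3, "MOV": 3, "STORE": 1}
print(nonpipelined_cycles(mix))                  # 16
print(nonpipelined_cycles(mix, mem_latency=10))  # 70
```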
 If the processor has a 5-stage pipeline, how many clock cycles does your FIR require? Remember that each pipeline stage now executes in 1 clock cycle. Repeat your calculation for the case where memory load/store operations take 10 clock cycles.
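For an idealized in-order pipeline with k stages and N instructions, a common first estimate is k + (N - 1) cycles: k cycles for the first instruction to complete, then one completion per cycle. The Python sketch below uses that estimate. How a 10-cycle load/store interacts with the pipeline is a modeling choice; here it is assumed, as a simplification, that each memory operation stalls the whole pipeline for 9 extra cycles. The 16-instruction, 6-memory-op example is hypothetical.

```python
def mem_stall_cycles(num_mem_ops, mem_latency):
    """Extra cycles if each LOAD/STORE holds the pipeline for mem_latency cycles
    instead of 1 (assumed model: the whole pipeline stalls behind it)."""
    return num_mem_ops * (mem_latency - 1)


def pipelined_cycles(num_instructions, stages=5, stall_cycles=0):
    """Cycles for an in-order pipeline: the first instruction needs `stages`
    cycles to complete, then one more instruction completes each cycle.
    Any hazards or multi-cycle operations are added as a lump of stall cycles."""
    if num_instructions == 0:
        return 0
    return stages + (num_instructions - 1) + stall_cycles


# Hypothetical 16-instruction sequence with 6 LOAD/STORE ops, illustration only.
print(pipelined_cycles(16))                                        # 20
print(pipelined_cycles(16, stall_cycles=mem_stall_cycles(6, 10)))  # 74
```

This ignores data hazards between dependent instructions (for example, an ADD that needs a MUL result immediately), which a real 5-stage pipeline may also have to stall or forward around.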
 Suppose you can now also use a multiply-accumulate (MAC) instruction:
MAC R_{d}, R_{x}, R_{c}
where R_{d} = R_{d} + R_{c}*R_{x}. Rewrite your FIR implementation to use this new instruction.
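The MAC semantics above let the whole sum be built in one accumulator register. The Python sketch below mirrors that; the list ordering and sample values are illustrative assumptions. Because the accumulator must start at 0 for each new output and the given instruction set has no explicit clear instruction, one option is to compute the first term with MUL (which overwrites R_{d}) and the remaining terms with MAC.

```python
def mac(rd, rx, rc):
    """Semantics of MAC R_d, R_x, R_c: the new R_d is R_d + R_c * R_x."""
    return rd + rc * rx


def fir_with_mac(c, x):
    """FIR output as a chain of MACs into one accumulator.

    c and x are newest-first: c = [c_n, ..., c_1], x = [x_n, ..., x_1].
    """
    acc = 0  # accumulator must start at 0 for every new output sample
    for ck, xk in zip(c, x):
        acc = mac(acc, xk, ck)
    return acc


print(fir_with_mac([1, 2, 3, 4], [5, 6, 7, 8]))  # 70
```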
 How many clock cycles does your new FIR require in the non-pipelined and 5-stage pipelined versions? Assume single-cycle load/stores.
 What is the minimum number of registers required to make your standard and MAC-based implementations work?
 Suppose registers are very expensive. What is the absolute minimum number of registers required to implement a working FIR? Justify your answer by writing an implementation in assembly using the given instructions. You can assume you have as much memory as you need.
 Suppose your processor can now perform two MAC operations in a single clock cycle (with no other changes to the register file or memory access). Write a new implementation in assembly to take advantage of this and calculate how many clock cycles it requires to execute.
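One way to use two MACs per cycle is to split the taps between two independent accumulators and add them once at the end. The Python sketch below models that; it assumes (not stated in the assignment) that the two MACs issued in the same cycle must write different destination registers. The returned cycle figure counts only the MAC cycles, not the loads or the final ADD.

```python
def fir_dual_mac(c, x):
    """FIR output when two MACs can issue per cycle.

    Assumption: the two MACs in one cycle must write different registers,
    so even-indexed and odd-indexed taps go to separate accumulators that
    are summed with one ADD at the end. Returns (y, mac_cycles).
    """
    acc0 = 0
    acc1 = 0
    mac_cycles = 0
    for k in range(0, len(c) - 1, 2):
        acc0 = acc0 + c[k] * x[k]          # MAC #1 of this cycle
        acc1 = acc1 + c[k + 1] * x[k + 1]  # MAC #2 of this cycle
        mac_cycles += 1
    if len(c) % 2:  # odd tap count: one MAC left over, issued alone
        acc0 = acc0 + c[-1] * x[-1]
        mac_cycles += 1
    return acc0 + acc1, mac_cycles


print(fir_dual_mac([1, 2, 3, 4], [5, 6, 7, 8]))  # (70, 2)
```

For n = 4 this halves the MAC cycles from 4 to 2, at the cost of one extra accumulator register and one ADD to combine the partial sums.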
