Please analyze the hazards in the assembly code that may cause the pipeline to stall. Assume that all instructions and data
are in the instruction and data caches and cause no stalls in the IF and MEM stages.
Essay question content
(i) [5 points] Suppose we want to apply different convolution kernels to one input sequence. Can we take advantage of a
multiprocessor to increase performance? Please describe how such multiprocessing can be done, or why it cannot be done.
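
To make the setting of question (i) concrete, here is a minimal Python sketch of one possible approach, not a model answer. It assumes the kernels are independent of one another, so each kernel can be handled as a separate task that reads the same input sequence and writes its own output. The names `convolve` and `apply_kernels` and the sample data are hypothetical, chosen only for illustration.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def convolve(signal, kernel):
    """Full 1-D discrete convolution of signal with kernel (pure Python)."""
    out = [0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def apply_kernels(signal, kernels, executor_cls=ProcessPoolExecutor):
    """Run one independent convolution task per kernel (illustrative only)."""
    with executor_cls() as pool:
        # Every task reads the same input and produces its own output list,
        # so no task has to wait for another task's result.
        return list(pool.map(partial(convolve, signal), kernels))


if __name__ == "__main__":
    signal = [1, 2, 3, 4]                 # hypothetical input sequence
    kernels = [[1, 0, -1], [1, 1], [2]]   # hypothetical kernels
    for kernel, result in zip(kernels, apply_kernels(signal, kernels)):
        print(kernel, result)
```

Because the tasks share only read-only input, they need no synchronization beyond collecting the results. Whether this pays off in practice depends on factors the question does not specify, such as the sequence length, the number of kernels, and the cost of distributing the input to each processor.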