3. An old program needs to be parallelized so that it can run faster on modern multicore
processors. To execute a program with both parallel and serial portions more efficiently, a
custom heterogeneous processor needs to be designed.
The processor has one large core, which executes code more quickly but takes more die area, and
multiple small cores, which execute code more slowly but consume less area. All cores share
one processor die.
When the program is in its parallel portion, all of its threads execute only on the small cores.
When the program is in its serial portion, the one active thread executes on the large core.
Performance (execution speed) of a core is proportional to the square root of its area.
Assume 16 units of die area are available. A small core takes 1 unit of die area. The large core can
take any number of units of die area n^2, where n is a positive number. Area not used by the large
core will be filled with small cores.
The serial portion is only 10% of total work, and the parallel portion is the remaining 90%.
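
The setup above can be explored numerically. The following is a minimal sketch, not a definitive solution, and it rests on assumptions the problem statement does not spell out: speedup is measured relative to running the whole program on a single small core (performance 1), the parallel portion scales perfectly across all small cores, and n is restricted to the integers 1 to 3 (n = 4 would leave no area for small cores). The names TOTAL_AREA, SERIAL_FRACTION, core_performance, and speedup are illustrative only.

```python
import math

TOTAL_AREA = 16        # units of die area available
SERIAL_FRACTION = 0.1  # serial portion of the total work

def core_performance(area):
    """Performance of a core is proportional to the square root of its area."""
    return math.sqrt(area)

def speedup(n):
    """Speedup over one small core (assumed baseline) for a large core of area n^2."""
    large_area = n * n
    small_cores = TOTAL_AREA - large_area  # each small core takes 1 unit of area
    if small_cores <= 0:
        raise ValueError("no area left for small cores")
    # Serial portion runs on the large core only.
    serial_time = SERIAL_FRACTION / core_performance(large_area)
    # Parallel portion runs on the small cores only (assumed perfect scaling).
    parallel_time = (1 - SERIAL_FRACTION) / (small_cores * core_performance(1))
    return 1.0 / (serial_time + parallel_time)

# Assumption: only integer n is considered; n = 4 leaves zero small cores.
candidates = range(1, 4)
for n in candidates:
    print(f"n = {n}: large core area = {n * n}, "
          f"small cores = {TOTAL_AREA - n * n}, speedup = {speedup(n):.3f}")

best_n = max(candidates, key=speedup)
print(f"best n under these assumptions: {best_n}")
```

Under these assumptions the sketch reports the highest speedup at n = 2 (a large core of 4 area units plus 12 small cores), where the execution time is 0.1/2 + 0.9/12 = 0.125 of the baseline, a speedup of 8.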