110 (2021) National Sun Yat-sen University Master's Program Entrance Examination, Department of Computer Science and Engineering (Groups A & B): Computer Architecture #104310
Subject: NSYSU ◆ CS ◆ Computer Architecture
Year: 110 (2021)
Multiple-choice questions: 5
Essay questions: 7
Multiple-Choice Questions (5)
1.1 (2%) A computer has 128 MB of memory. Each word in the computer is eight bytes. At least how many bits are needed to address any single word in memory? (A) 8 bits (B) 16 bits (C) 24 bits (D) 32 bits
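A quick sanity check of the arithmetic (not part of the original exam): 128 MB is 2^27 bytes, so with 8-byte words there are 2^24 words, and 24 address bits suffice, i.e. option (C).

```python
# Sanity check for 1.1: count the addressable words, then the bits
# needed to index them.
import math

memory_bytes = 128 * 2**20   # 128 MB
word_bytes = 8               # eight-byte words
num_words = memory_bytes // word_bytes          # 2**24 words
address_bits = math.ceil(math.log2(num_words))  # bits to index them
print(address_bits)  # 24 -> option (C)
```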
1.2 (2%) In modern computer architectures, a TLB is usually employed to improve the efficiency of the memory hierarchy. If a TLB miss occurs, which of the following descriptions is true? (A) The data requested by the CPU must not be in the cache. (B) The data requested by the CPU must not be in the main memory. (C) The CPU fails to get the physical address of the requested data. (D) The CPU must send a request to access the main memory immediately.
1.3 (2%) To solve the cache coherence problem, the Snoopy protocol is the simplest approach. Regarding the Snoopy coherence protocol, which of the following descriptions is true? (A) Any CPU should invalidate its cache block if it receives from the bus a write-miss signal for the data in that local cache block. (B) Any CPU should invalidate its cache block if it receives from the bus a read-miss signal for the data in that local cache block. (C) The Snoopy coherence protocol is an atomic operation. (D) The Snoopy coherence protocol is suitable for a single-core system.
1.4 (2%) Regarding cache misses, which of the following descriptions is true? (A) A compulsory miss indicates that the cache cannot contain all the blocks required for the execution. (B) Coherence misses only happen in multi-core systems. (C) The miss rate will go down as the block size is made very large. (D) Increasing the associativity decreases the miss rate due to fewer compulsory misses.
1.5 (2%) Regarding cache design, which of the following descriptions is true? (A) For the same number of entries, we need more tag bits in a multi-word cache than in a set-associative cache. (B) By interchanging the loops in a program, the cache miss rate cannot be reduced. (C) With the way-prediction technique, the miss penalty during cache access can be reduced. (D) If we pipeline the cache access, the cache bandwidth can be improved.
Essay Questions (7)
2.1 (10%) What is the CPI if this single-core processor only has one level of cache?
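The processor's base CPI, miss rate, and miss penalty are given in the original exam but are not reproduced in this copy; the sketch below only illustrates the standard formula CPI = base CPI + memory references per instruction × miss rate × miss penalty, with placeholder values.

```python
# Illustration only: the exam's actual parameters are not shown above,
# so every value here is a hypothetical placeholder.
base_cpi = 1.0            # hypothetical CPI with a perfect cache
mem_refs_per_instr = 1.3  # hypothetical memory references per instruction
miss_rate = 0.02          # hypothetical misses per memory access
miss_penalty = 100        # hypothetical miss penalty in cycles

cpi = base_cpi + mem_refs_per_instr * miss_rate * miss_penalty
print(cpi)  # 3.6 with these placeholder numbers
```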
2.2 (20%) To speed up the process, we increase the CPU from a 1-core to a 2-core design and retain the other design settings. We assume 60% of instructions must be executed sequentially. Please estimate the speedup ratio of the new architecture.
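A minimal sketch of the usual approach to 2.2, applying Amdahl's law with the sequential fraction of 60% and 2 cores given in the question:

```python
# Amdahl's-law estimate for 2.2: 60% of instructions run sequentially,
# the remaining 40% split evenly across the 2 cores.
sequential_fraction = 0.60
cores = 2

speedup = 1 / (sequential_fraction + (1 - sequential_fraction) / cores)
print(speedup)  # 1.25
```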
3. (20%) Assume a GPU architecture that contains 10 SIMD processors. Each SIMD instruction has a width of 32, and each SIMD processor contains 8 lanes for single-precision arithmetic and load/store instructions, meaning that each non-diverged SIMD instruction can produce 32 results every 4 cycles. Assume a kernel that has divergent branches that cause on average 80% of threads to be active. Assume that 70% of all SIMD instructions executed are single-precision arithmetic and 20% are load/store. Since not all memory latencies are covered, assume an average SIMD instruction issue rate of 0.85. Assume that the GPU has a clock speed of 1.5 GHz. Please compute the throughput, in GFLOP/sec, for this kernel on this GPU.
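One plausible way to combine the given factors (a hedged sketch mirroring the usual textbook treatment of this exercise): 10 SIMD processors × 8 lanes at 1.5 GHz gives a peak of 120 G results/sec, which is then scaled by the 0.85 issue rate, the 70% single-precision fraction, and the 80% active threads.

```python
# Hedged worked computation for question 3.
simd_processors = 10
lanes_per_processor = 8   # 32 results every 4 cycles = 8 per cycle
clock_ghz = 1.5
issue_rate = 0.85         # average SIMD instruction issue rate
fp_fraction = 0.70        # fraction of single-precision arithmetic
active_threads = 0.80     # branch divergence leaves 80% of threads active

gflops = (clock_ghz * simd_processors * lanes_per_processor
          * issue_rate * fp_fraction * active_threads)
print(gflops)  # ~57.12 GFLOP/sec
```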
4.1 (10%) Please determine the number of bits required in the page table, the TLB in the L1 cache, and the TLB in the L2 cache.
a. (5%) The L1 cache is implemented using a 2-way set-associative mapping strategy.
b. (5%) The L2 cache is implemented using a 2-way set-associative mapping strategy.
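The stem of question 4.2, with the address width, page size, and cache sizes, is missing from this copy, so the sketch below only shows the generic tag-bit bookkeeping for a 2-way set-associative cache; every size is hypothetical.

```python
# Generic tag-bit bookkeeping for a 2-way set-associative cache.
# All sizes are hypothetical; the exam's actual parameters are not
# shown above.
import math

address_bits = 32        # hypothetical physical address width
cache_bytes = 32 * 1024  # hypothetical 32 KB cache
block_bytes = 64         # hypothetical 64-byte blocks
ways = 2                 # 2-way set-associative, as in 4.2

sets = cache_bytes // (block_bytes * ways)          # 256 sets
offset_bits = int(math.log2(block_bytes))           # 6 offset bits
index_bits = int(math.log2(sets))                   # 8 index bits
tag_bits = address_bits - index_bits - offset_bits  # remaining bits
print(tag_bits)  # 18 tag bits per block with these numbers
```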
4.3 (20%) Excluding data transfer time, we assume the access time of the L2 cache is 5 ns, including all the miss handling; the access time of the main memory is 100 ns, including all the miss handling; and the access time of the hard disk is 1 µs, including all the miss handling. Regarding the data transfer time between memory levels, we ignore the transfer time between the L1 and L2 caches; the transfer times between the lowest-level cache and the main memory, and between the main memory and the disk, are both 50 ns, including all the miss handling. In this system, the TLB is located in the lowest-level cache. When a data request comes, the TLB must be accessed first; if a TLB miss happens, we need to spend 10 ns to handle the TLB-miss exception. When we adopt the direct-mapped strategy, the miss rates of the L1 cache and its embedded TLB are both 2%, and the miss rates of the L2 cache and its embedded TLB are both 0.5%. If the 2-way set-associative strategy is applied, the miss rates of the L1 cache and its embedded TLB are both 1%, and the miss rates of the L2 cache and its embedded TLB are both 0.1%. Finally, the miss rate of the main memory is 0.1%. During manufacturing, handling one bit in each kind of memory costs 0.01 USD. Please give your customer a design suggestion, including how many cache levels you suggest and what mapping strategy you suggest for each cache level, considering the system performance and the manufacturing cost simultaneously.
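A hedged sketch of how the two mapping strategies might be compared on the performance side, using the question's access times and miss rates in a simplified average-memory-access-time model. The L1 access time is not given in the question, so 1 ns is a hypothetical placeholder, and the TLB-miss handling and transfer times are folded in only roughly; the cost side (0.01 USD per bit) would weigh against the extra bits a 2-way design needs.

```python
# Simplified AMAT comparison for 4.3. The 1 ns L1 access time is a
# hypothetical placeholder; everything else comes from the question.
def amat(l1_ns, l1_miss, l2_ns, l2_miss, mem_ns, mem_miss,
         disk_ns, transfer_ns, tlb_miss, tlb_penalty_ns):
    # Miss path: L1 miss -> L2 (5 ns), L2 miss -> memory (+50 ns
    # transfer), memory miss -> disk (+50 ns transfer).
    mem_time = mem_ns + transfer_ns + mem_miss * (disk_ns + transfer_ns)
    l2_time = l2_ns + l2_miss * mem_time
    tlb_time = tlb_miss * tlb_penalty_ns  # TLB sits in the lowest-level cache
    return l1_ns + tlb_time + l1_miss * l2_time

direct = amat(1, 0.02, 5, 0.005, 100, 0.001, 1000, 50, 0.005, 10)
two_way = amat(1, 0.01, 5, 0.001, 100, 0.001, 1000, 50, 0.001, 10)
print(direct, two_way)  # ~1.165 ns vs ~1.062 ns: 2-way is faster,
                        # at a higher manufacturing cost per bit stored
```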