Chapter 5 Solutions
5.1
5.1.1 4
5.1.2 I, J
5.1.3 A[I][J]
5.1.4 3596 8 800/4 288/4 8000/4
5.1.5 I, J
5.1.6 A(J, I)
5.2
5.2.1
Word Address   Binary Address   Tag   Index   Hit/Miss
3              0000 0011        0     3       M
180            1011 0100        11    4       M
43             0010 1011        2     11      M
2              0000 0010        0     2       M
191            1011 1111        11    15      M
88             0101 1000        5     8       M
190            1011 1110        11    14      M
14             0000 1110        0     14      M
181            1011 0101        11    5       M
44             0010 1100        2     12      M
186            1011 1010        11    10      M
253            1111 1101        15    13      M
5.2.2
Word Address   Binary Address   Tag   Index   Hit/Miss
3              0000 0011        0     1       M
180            1011 0100        11    2       M
43             0010 1011        2     5       M
2              0000 0010        0     1       H
191            1011 1111        11    7       M
88             0101 1000        5     4       M
190            1011 1110        11    7       H
14             0000 1110        0     7       M
181            1011 0101        11    2       H
44             0010 1100        2     6       M
186            1011 1010        11    5       M
253            1111 1101        15    6       M
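The tag/index decompositions and hit/miss outcomes in the two tables above can be reproduced with a short direct-mapped cache simulation. A minimal sketch (the 16 one-word-block and 8 two-word-block geometries come from the exercise; the function name is ours):

```python
def simulate(addresses, num_sets, words_per_block):
    """Direct-mapped cache: returns a list of (tag, index, 'H'/'M')."""
    cache = {}  # index -> tag currently stored in that line
    results = []
    for addr in addresses:
        block = addr // words_per_block             # block address
        index = block % num_sets                    # which cache line
        tag = addr // (words_per_block * num_sets)  # remaining upper bits
        hit = cache.get(index) == tag
        cache[index] = tag
        results.append((tag, index, 'H' if hit else 'M'))
    return results

addrs = [3, 180, 43, 2, 191, 88, 190, 14, 181, 44, 186, 253]
# 5.2.1: 16 one-word blocks -> every access misses
print(simulate(addrs, num_sets=16, words_per_block=1))
# 5.2.2: 8 two-word blocks -> hits on 2, 190, and 181
print(simulate(addrs, num_sets=8, words_per_block=2))
```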
5.2.3
Word       Binary                 Cache 1           Cache 2           Cache 3
Address    Address          Tag   Index  Hit/Miss   Index  Hit/Miss   Index  Hit/Miss
3          0000 0011        0     3      M          1      M          0      M
180        1011 0100        22    4      M          2      M          1      M
43         0010 1011        5     3      M          1      M          0      M
2          0000 0010        0     2      M          1      M          0      M
191        1011 1111        23    7      M          3      M          1      M
88         0101 1000        11    0      M          0      M          0      M
190        1011 1110        23    6      M          3      H          1      H
14         0000 1110        1     6      M          3      M          1      M
181        1011 0101        22    5      M          2      H          1      M
44         0010 1100        5     4      M          2      M          1      M
186        1011 1010        23    2      M          1      M          0      M
253        1111 1101        31    5      M          2      M          1      M
Cache 1 miss rate = 12/12 = 100%
Cache 1 total cycles = 12 × 25 + 12 × 2 = 324
Cache 2 miss rate = 10/12 ≈ 83%
Cache 2 total cycles = 10 × 25 + 12 × 3 = 286
Cache 3 miss rate = 11/12 ≈ 92%
Cache 3 total cycles = 11 × 25 + 12 × 5 = 335
Cache 2 provides the best performance.
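The total-cycle comparison can be checked directly, assuming the 25-cycle miss penalty and per-cache hit times of 2, 3, and 5 cycles implied by the totals:

```python
def total_cycles(misses, accesses, hit_time, miss_penalty=25):
    # every access pays the hit time; each miss additionally pays the penalty
    return misses * miss_penalty + accesses * hit_time

print(total_cycles(12, 12, 2))  # Cache 1: 324
print(total_cycles(10, 12, 3))  # Cache 2: 286
print(total_cycles(11, 12, 5))  # Cache 3: 335
```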
5.2.4 First we must compute the number of cache blocks in the initial cache
configuration. For this, we divide 32 KiB by 4 (for the number of bytes per word)
and again by 2 (for the number of words per block). This gives us 4096 blocks and
a resulting index field width of 12 bits. We also have a word offset size of 1 bit and a
byte offset size of 2 bits. This gives us a tag field size of 32 − 15 = 17 bits. These tag
bits, along with one valid bit per block, will require 18 × 4096 = 73,728 bits, or 9216
bytes. The total cache size is thus 9216 + 32,768 = 41,984 bytes.

The total cache size can be generalized to

totalsize = datasize + (validbitsize + tagsize) × blocks / 8
totalsize = 41,984 bytes
datasize = blocks × blocksize × wordsize = 32,768 bytes
wordsize = 4 bytes
tagsize = 32 − log2(blocks) − log2(blocksize) − log2(wordsize) = 17 bits
validbitsize = 1 bit

(the division by 8 converts the per-block overhead bits to bytes)
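The generalized size formula can be evaluated directly. A minimal sketch (the function name is ours; sizes follow the definitions above):

```python
from math import log2

def total_cache_bytes(blocks, blocksize_words, wordsize_bytes=4, addr_bits=32):
    """Data bytes plus (tag + valid) overhead, converted from bits to bytes."""
    data_bytes = blocks * blocksize_words * wordsize_bytes
    tag_bits = (addr_bits - int(log2(blocks)) - int(log2(blocksize_words))
                - int(log2(wordsize_bytes)))
    overhead_bytes = (tag_bits + 1) * blocks / 8   # +1 valid bit per block
    return data_bytes + overhead_bytes

print(total_cache_bytes(4096, 2))  # 41984.0 for the 32 KiB, 2-word-block cache
```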
Increasing from 2-word blocks to 16-word blocks will reduce the tag size from
17 bits to 14 bits.

In order to determine the number of blocks, we solve the inequality:

41,984 ≥ 64 × blocks + 15 × blocks

Solving this inequality gives us 531 blocks, and rounding to the next power of
two gives us a 1024-block cache.

The larger block size may require an increased hit time and an increased miss
penalty compared with the original cache. The smaller number of blocks may cause
a higher conflict miss rate than the original cache.
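The block-count calculation can be checked numerically; the 64 + 15 coefficient (64 data bytes plus 15 tag/valid bits per block) is taken directly from the solution's inequality:

```python
# Solve 41984 >= (64 + 15) x blocks, then round to the next power of two
blocks = 41984 // (64 + 15)            # largest integer satisfying the inequality
rounded = 1 << blocks.bit_length()     # next power of two above 531
print(blocks, rounded)                 # 531 1024
```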
5.2.5 Associative caches are designed to reduce the rate of conflict misses. As
such, a sequence of read requests with the same 12-bit index field but a different
tag field will generate many misses. For the cache described above, the sequence
0, 32768, 0, 32768, 0, 32768, …, would miss on every access, while a 2-way set
associative cache with LRU replacement, even one with a significantly smaller overall
capacity, would hit on every access after the first two.
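The pathological sequence can be demonstrated with a small set-associative simulator. A sketch (geometries follow the cache above: 4096 sets of 8-byte blocks for the direct-mapped case; the tiny 2-way cache is ours for contrast):

```python
from collections import OrderedDict

def misses(addresses, num_sets, ways, block_bytes=8):
    """Set-associative cache with LRU replacement; returns the miss count."""
    sets = [OrderedDict() for _ in range(num_sets)]
    count = 0
    for addr in addresses:
        block = addr // block_bytes
        s = sets[block % num_sets]
        tag = block // num_sets
        if tag in s:
            s.move_to_end(tag)         # refresh LRU position on a hit
        else:
            count += 1
            if len(s) >= ways:
                s.popitem(last=False)  # evict the least recently used way
            s[tag] = True
    return count

seq = [0, 32768] * 4
print(misses(seq, num_sets=4096, ways=1))  # direct-mapped: misses every time (8)
print(misses(seq, num_sets=2, ways=2))     # tiny 2-way LRU: only the first two miss
```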
5.2.6 Yes, it is possible to use this function to index the cache. However,
information about the five bits is lost because the bits are XOR'd, so you must
include more tag bits to identify the address in the cache.
5.3
5.3.1 8
5.3.2 32
5.3.3 1 + 22/(8 × 32) = 1.086
5.3.4 3
5.3.5 0.25
5.3.6 Index, tag, data:
000001₂, 0001₂, mem[1024]
000001₂, 0011₂, mem[16]
001011₂, 0000₂, mem[176]
001000₂, 0010₂, mem[2176]
001110₂, 0000₂, mem[224]
001010₂, 0000₂, mem[160]
5.4
5.4.1 The L1 cache has a low write miss penalty while the L2 cache has a high
write miss penalty. A write buffer between the L1 and L2 cache would hide the
write miss latency of the L2 cache. The L2 cache would benefit from write buffers
when replacing a dirty block, since the new block would be read in before the dirty
block is physically written to memory.
5.4.2 On an L1 write miss, the word is written directly to L2 without bringing
its block into the L1 cache. If this results in an L2 miss, its block must be brought
into the L2 cache, possibly replacing a dirty block which must first be written to
memory.

5.4.3 After an L1 write miss, the block will reside in L2 but not in L1. A subsequent
read miss on the same block will require that the block in L2 be written back to
memory, transferred to L1, and invalidated in L2.
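The write-through, no-write-allocate L1 over a write-back L2 described in 5.4.1–5.4.2 can be sketched with single-block "caches" (class and field names are ours; this omits the L2 invalidation detail from 5.4.3):

```python
class TwoLevel:
    """Sketch: write-through/no-allocate L1 over a write-back/allocate L2."""
    def __init__(self):
        self.l1 = None            # block address held in L1 (never dirty)
        self.l2 = None            # block address held in L2
        self.l2_dirty = False
        self.writebacks = 0       # dirty victims pushed down to memory

    def _l2_fill(self, block):
        if self.l2 is not None and self.l2_dirty:
            self.writebacks += 1  # dirty victim must go to memory first
        self.l2, self.l2_dirty = block, False

    def write(self, block):
        # write-through: the word always goes to L2; no L1 allocation on a miss
        if self.l2 != block:
            self._l2_fill(block)  # an L2 write miss allocates in L2
        self.l2_dirty = True

    def read(self, block):
        if self.l1 == block:
            return
        if self.l2 != block:
            self._l2_fill(block)
        self.l1 = block           # a read miss fills L1 from L2

c = TwoLevel()
c.write(1)       # L1 miss, L2 miss: block 1 allocated in L2, marked dirty
c.read(1)        # block 1 comes from L2 into L1
c.write(2)       # block 2 displaces dirty block 1 -> one write-back
print(c.writebacks)  # 1
```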
5.4.4 One in four instructions is a data read, one in ten instructions is a data
write. For a CPI of 2, there are 0.5 instruction accesses per cycle, 12.5% of cycles
will require a data read, and 5% of cycles will require a data write.

The instruction bandwidth is thus (0.0030 × 64) × 0.5 = 0.096 bytes/cycle. The
data read bandwidth is thus 0.02 × (0.13 + 0.050) × 64 = 0.23 bytes/cycle. The
total read bandwidth requirement is 0.33 bytes/cycle. The data write bandwidth
requirement is 0.05 × 4 = 0.2 bytes/cycle.
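The bandwidth arithmetic for 5.4.4 and 5.4.5 can be checked in a few lines (the 0.02 miss rate, 64-byte blocks, 4-byte words, 30% dirty fraction, and per-cycle access frequencies are the exercise's figures; variable names are ours):

```python
# Per-cycle memory bandwidths for the CPI-2 processor
miss_rate, block_bytes = 0.02, 64
instr_bw = 0.0030 * block_bytes * 0.5                 # instruction-fetch misses
read_bw = miss_rate * (0.13 + 0.050) * block_bytes    # block fills on data misses
wt_write_bw = 0.05 * 4                                # write-through: one word per store
wb_write_bw = miss_rate * 0.30 * (0.13 + 0.050) * block_bytes  # 30% dirty victims
print(round(instr_bw, 3), round(read_bw, 2), wt_write_bw, round(wb_write_bw, 3))
```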
5.4.5 The instruction and data read bandwidth requirement is the same as in
5.4.4. The data write bandwidth requirement becomes 0.02 × 0.30 × (0.13 + 0.050)
× 64 = 0.069 bytes/cycle.
5.4.6 For CPI = 1.5 the instruction throughput becomes 1/1.5 = 0.67 instructions
per cycle. The data read frequency becomes 0.25/1.5 = 0.17 and the write frequency
becomes 0.10/1.5 = 0.067.

The instruction bandwidth is (0.0030 × 64) × 0.67 = 0.13 bytes/cycle.

For the write-through cache, the data read bandwidth is 0.02 × (0.17 + 0.067)
× 64 = 0.22 bytes/cycle. The total read bandwidth is 0.35 bytes/cycle. The data write
bandwidth is 0.067 × 4 = 0.27 bytes/cycle.

For the write-back cache, the data write bandwidth becomes 0.02 × 0.30 ×
(0.17 + 0.067) × 64 = 0.091 bytes/cycle.
Address    0   4   16   132   232   160   1024   30   140   3100   180   2180
Line ID    0   0    1     8    14    10      0    1     8      1    11      8
Hit/Miss   M   H    M     M     M     M      M    H     H      M     M      M
Replace    N   N    N     N     N     N      Y    N     N      Y     N      Y
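The table above can be regenerated with a direct-mapped simulation. A sketch, assuming the 32-line, 16-byte-block geometry implied by the line IDs (the function name is ours):

```python
def run(addresses, lines=32, block_bytes=16):
    """Direct-mapped cache; returns (line, hit/miss, replaced?) per access."""
    cache = {}  # line -> tag currently stored
    rows = []
    for a in addresses:
        line = (a // block_bytes) % lines
        tag = a // (block_bytes * lines)
        if cache.get(line) == tag:
            rows.append((line, 'H', 'N'))
        else:
            replaced = 'Y' if line in cache else 'N'  # a valid block is displaced
            cache[line] = tag
            rows.append((line, 'M', replaced))
    return rows

addrs = [0, 4, 16, 132, 232, 160, 1024, 30, 140, 3100, 180, 2180]
for a, (line, hm, rep) in zip(addrs, run(addrs)):
    print(a, line, hm, rep)
```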