CST 131 Previous Homework Assignments -- Spring 2025
- Set #6 (Due Wed May 21) 7.29 (7.29 4th ed, 7.25 3rd
ed), 7.33 (7.33 4th ed,
7.29 3rd ed), C.5, C.6, C.7, C.8
7.29)
Only do
parts a and b. (7.25 in the 3rd ed) In this problem,
there is a typo. The track to track seek time is given, but the
average seek time is not. Assume that the table entry
"Track-to-track seek time of 8ms" was supposed to be "Average seek
time of 8ms".
Make
sure you show all your work in each part. Present your answer in
part a in MB using the definition that
MB = 106 Bytes
(instead of 2
20 Bytes).
Disk manufactures use this definition in order to
attempt to avoid confusing consumers (though I'm not sure how much
it makes things better or worse).
29) Suppose a disk drive has the following
characteristics:
• 5 surfaces
• 1024 tracks per surface
• 256 sectors per track
• 512 bytes/sector
• Track-to-track seek time of 8 milliseconds
• Rotational speed of 7500 RPM.
a) What is the capacity of the drive?
b) What is the access time?
c) Is this disk faster than the one described in
question 28?
7.33) (7.29 in the
3rd ed)
Ignore the hint given in this question in the text!
Note that the question is
not about large versus small
sectors, but about the relative advantages and disadvantages of
having more or less of a particular size (i.e. 512 byte) of
sectors in a cluster.
That is, what are the relative
advantages and disadvantages of dividing up a disk into fewer
big clusters (clusters with more sectors in them) or more
smaller clusters (clusters with fewer sectors in them). Don't
just list the advantages and disadvantages of having fewer sectors
per cluster,
also describe how these advantages and
disadvantages result from fewer sectors per cluster.
33) What are the advantages and disadvantages of
having a small number of sectors per disk cluster?
(Hint: You may want to think about retrieval time and the
required lifetime of the archives.)
C.5) Headers and Trailers in hard disk sectors
a) A unique pattern appears in the
header at the beginning of each sector in order to mark the
beginning of the sector. One use of this pattern is to allow the
read electronics to get in sync with the bit spacing on the disk
(since there will be some variation in the speed of the disk
over time). Other than its use for synchronization, why is a
mark at the beginning of each sector needed?
b) Why do disks need
track and sector ID information in the header of each
sector? How is this information used?
c) Error correcting codes (ECC) are stored in the trailer of
each sector. These codes use up space we could use for storing
more data. Instead of using ECCs, we could test a drive when we
formatted it by writing known data to each sector and comparing
the data we read back with the known value. If it didn't match,
we would mark the cluster containing that sector bad in the file
allocation table. How does the use of an ECC actually make
more data available than the proposed approach?
C.6) Optical Storage
a) Do CDs use a constant or
variable bit density along the spiral track.
b) What advantage does this type of bit density give CDs?
c) Why was this type of bit density a disadvantage for early CD
reader/writers?
d) Why is Mode 2 (as described in the text) not appropriate for
computer data storage on a CD?
e) Blu-Ray drives use a 405nm wavelength laser in order to allow
more bits on each disk layer. Why was a shorter wavelength laser
(shorter than for DVD and CD) needed in order to achieve this?
C.7) RAID
(assuming the disk array has no failed disks)
a) In RAID Levels 4 and 5, the
strip may be large enough that during a write, only one strip in
the stripe needs to be changed. When writing one strip in either
level 4 or 5, the parity for the strip also needs to be updated.
It is not necessary to read all the other disks in order to
determine what the new parity should be. To see how this can be,
consider changing the msb on Disk 3 from 0 to 1.
Disk 0
|
Disk 1
|
Dsk 2
|
Disk 3
|
Disk 4
|
Parity disk
|
????????
|
????????
|
?????????
|
0.......
|
????????
|
1.......
|
i) What will the new MSb on the parity disk be? _____
(Assume even parity).
ii) Why don't we need to know what the other (msb) bits are?
b) Based on the above, describe the sequence of steps needed to
write one strip of a stripe in RAID level 4 or 5
(assuming no disks are failed).
c) What makes RAID level 5 better for writes to individual
strips than RAID level 4? Describe
why this is so.
d) Sometimes the strips are small enough, or the data being
changed is large enough so that it makes more sense to write the
whole stripe. How does the sequence of steps you described in b)
change if we write the whole stripe?
e) If we are writing the whole stripe, does RAID level 5 have an
advantage over RAID level 4? Why or why not?
f) Does RAID level 5 have an advantage over RAID level 4 for
reads if there are no failed disks? Why or why not?
C.8) RAID (assuming that there is one failed disk in the disk
array that has not yet been replaced)
a) Assume (hypothetically) that a
read of byte-wide strips in a stripe on a RAID level 3 system
provides the following data (only the MSb of each byte is
shown):
Disk 0
|
Disk 1
|
Disk 2
|
Disk 3
|
Parity disk
|
1........
|
0........
|
0........
|
1........
|
1........
|
i) If we assume even parity, is at least one MSb of the data
bad? How do you know?
ii) Can parity tell us which disk provided the bad data?
iii) So, When a disk in a RAID system fails (provides
incorrect data) we need some other mechanism to know which
disk failed. How can we know which disk failed?
b) Suppose that disk 2 in a 5 disk RAID level 3 system fails.
The other disks provide the following data on the read of the
bit-wide strips:
Disk 0
|
Disk 1
|
Disk 2
|
Disk 3
|
Parity disk
|
1........
|
0........
|
bad
|
1........
|
1........
|
Assuming that even parity is used, what should the msb of the
byte from Disk 2 actually be? ___ .
c) Based on part b, describe what must be done in a RAID level 5
system in order to be able to "read" the data from a strip on a
failed disk. Assume we only need to read the missing strip.
(Note that the system can continue to provide the data that
would have come from the disk that failed so long as only one
disk fails. This data will eventually be used to reconstruct the
data on a replacement disk, but for the purposes of this
question, assume we haven't replaced the disk yet.)
d) There are two factors involved in "reading" data from a strip
on a failed disk (as described in part c) that will slow down
the system compared to one that has no failed disks. Describe
them both. (Assume we haven't replaced the failed disk yet. Also
don't assume that we have a "hot" spare.) Note that one of the
factors is not "using parity to compute the missing data". While
this does have to be done, the time needed to do so is
negligible compared to the time needed to do a disk seek.
- Set#5 (Due Wed, May 14) 7.2 (7.2 4th and 3rd ed.), 7.8 (7.8 4th ed., 7.6 3rd
ed), C.1, C.2, C.3
7.2) Calculate the overall speedup
of a system that spends 40% of its time in calculations with a
processor upgrade that provides for 100% greater throughput.
Note: In this problem, saying that the new processor is100% faster
is the same as saying that the new processor is 2 times faster
than the old one.
7.8) Show your work. First determine the speedup and performance
increase resulting from both options. Note that the performance
increase is the
amount
over 100%. Then
in part a) give the cost for a 1% performance
increase for
each option and state which is best when judged on this basis.
This is also the criteria to be used in part c).
Keep intermediate results in your
calculator rather than using the rounded off intermediate
results determined in one part in the next part of the problem. It
will make a
difference in this problem.
7.8) Suppose the daytime
processing load consists of 60% CPU activity and 40% disk
activity. Your customers are complaining that the system is
slow. After doing some research, you have learned that you can
upgrade your disks for $8,000 to make them 2.5 times as fast as
they are currently. You have also learned that you can upgrade
your CPU to make it 1.4 as fast for $5,000.
a) Which option would you choose to yield the best performance
improvement for the least amount of money?
b) Which option would you choose if you don't care about the
money, but want a faster system?
c) What is the break-even point for the upgrades? That is, what
price would we need to charge for the CPU (or the disk – change
only one) so the result was the same cost per
1% increase for both?
C.1) Complete a table like the following one naming the 4
different I/O methods, listing the hardware (beyond the CPU and
the I/O device and its device interface
(i.e. its "ports")) needed to
implement each architecture, and ranking each architecture by how
much software overhead is involved in that architecture (compared
to the other methods) and describing where the software overhead
comes from in that method
and
why it is more or less than the previous method. Note that
software overhead is
instructions the CPU executes in
support of I/O. Also note, that (as discussed in class) Memory
mapped I/O and Instruction-based I/O are
not I/O methods.
I/O Method
|
Additional Hardware
|
Software Overhead Rank
(most, least, less than ... etc.)
|
Source of
Software Overhead
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
C.2)
Response time is the
time from when the I/O device requests service and when the device
actually gets service. For
each
of the I/O control methods, describe the response time by
describing the sequence of events
that occurs between when a device requests service (by
setting a bit in a status register, asserting an interrupt line,
or asserting a DMA request line)
and when the device is serviced (data is read from or
written to the device)
.
C.3) Protocol, Handshake
a) Define the term protocol as it relates to
I/O. (Use the definition the author gives early in section 7.4
of the text, but add
"timing (or sequence) of signals" to the definition.)
b) Define the term handshake
as it relates to I/O. (Again, use the definition the author
gives in section 7.4 of the text.)
c) Describe why handshaking is necessary in I/O.
d) What is a timing diagram and
what is its purpose for I/O?
- Set #4 (Due Wed, May 7) B.8, B.9, B.10, B.11, B.12, 6.25 (6.25
4th ed, 6.20 3rd ed), B.14
B.8) Cache Write Policies
a) At first it seems that "write
through" would really slow down the operation of a cache memory
system. Describe why it does not slow it down as much as you
might at first think for most programs.
b) In what situations can a "write back" policy be more
efficient (result in fewer writes to main memory) than a "write
through" policy? Consider that in write back, we have to write
back the whole cache block even if only one word of it was
changed since there is only a single dirty bit for the whole
block.
B.9) A particular memory system consists of a cache and a main
memory. (This system does not support virtual memory). The cache
has a hit ratio of 95% and an access time of 12 ns. The main
memory has an access time of 25ns. What is the effective access
time (EAT) of this memory system? Please show your work.
B.10) The equation for EAT given on page 378 (page 367 4th
ed, page 334 3rd ed) describes how to compute the effective access
time for a
two level
memory system consisting of cache and main memory. The equation
for EAT on page 392 (page 382 4th ed, page 349 3rd ed) describes
how to compute the effective access time for a two-level system
consisting of virtual memory (on disk) and the main memory (but
note that it assumes either no TLB or that the TLB missed -- in
class we adjusted this equation in order to represent what happens
when a TLB hits : EAT = 0.99 × (5ns + 200ns) + .01 × 10ms :
where the page fault rate was .01 (or 1%) and the TLB hit time was
5ns, the main memory access time was 200ns and the disk access
time is 10ms). A full
memory
system including virtual memory usually has
three levels: Cache, Main
memory, and Disk. Let the Cache have a 12ns (nanoseconds) access
time and a hit rate of 95%. Let the Main memory have an
access time of 50ns and a page fault rate of 0.0001% (10
-6
or .000001). Let the disk have an access time of 10ms
(milliseconds). Compute the effective access time of this memory
system.
a) First, compute the effective
access time of the two level system containing the Main memory
and Disk. Assume that a TLB is used, that it always hits, and
that a TLB hit takes 5ns.
b) Next, compute the overall effective access time of the whole
memory system by assuming it is a two-level system containing
the Cache and a main memory that has an access time equal to the
effective access time you found in part a.
Make sure you show your work.
B.11) Virtual Memory.
a) What does the term "
page fault" describe?
b) What (in general) must a computer system do in response to a
page fault?
c) The page fault rate can be determined by dividing the number
of page faults by _______ .
For each of the following
options, indicate if it could (by itself) be used to correctly
complete the previous statement (put a Y or N in each slot).
- the number of page frames ___
- the number of virtual pages in the program ___
- 100 ___
- the number of times we check the page table ___
- the number of times we check the TLB ___
- the number of page hits ___
- the total number of times the CPU attempts a data access
___
- the total number of times the CPU attempts an instruction
access ___
- the total number of times the CPU attempts an instruction
or data access ___
d) What does the term 'TLB" stand for?
e) What
is a
TLB?
f) Running a virtual memory system without a TLB would make it
very slow because every CPU request to reference main memory for
an instruction or data item would require
(fill in the bank with a number) ____
references to main memory (assuming there is no page fault).
Also, describe what each
reference is for.
B.12) For each of the following cases, first say if the sequence
of events is possible or not. Then, in those cases where it is not
possible, explain why it is not possible.
a) TLB hit, Cache hit
b) TLB hit, Cache miss
c) TLB miss, Page table entry valid, Cache hit
d) TLB miss, Page table entry valid, Cache miss
e) TLB miss, Page table entry invalid (Page Fault), Cache hit
(after the page fault is handled, the TLB is updated, and the
access is restarted and the TLB now hits).
6.25) This problem from the text assumes that bytes are addressed.
Note that the problem does not show all the page table entries,
only the valid ones are given. Please show your work.
25.) A system implements a paged
virtual address space for each process using a one-level page
table. The maximum size of virtual address space is
16MB. The page table for the running process includes the
following valid entries (the → notation indicates that a virtual
page maps to the given page frame, that is, it is located in
that frame):
Virtual page 0 → page frame 1
Virtual page 3 → page frame 16
Virtual page 1 → page frame 2 Virtual page 4 →
page frame 9
Virtual page 2 → page frame 4
The page size is 1024 bytes and the maximum physical memory size
of the machine is 2MB.
a) How many bits are required for each virtual
address?
b) How many bits are required for each physical address?
c) What is the maximum number of entries in a page table?
d) To which physical address will the virtual address 0x5F4
translate?
e) Which virtual address will translate to physical address
0x400?
B.14) Suppose that a page table for a particular process contains
the entries shown below.
Frame
|
Valid
|
Dirty
|
1
|
1
|
0
|
-
|
0
|
0
|
0
|
1
|
0
|
3
|
1
|
1
|
-
|
0
|
0
|
-
|
0
|
0
|
2
|
1
|
1
|
-
|
0
|
0
|
a) The valid bit indicates
whether or not the page table entry itself is valid. What does a
valid or invalid page table entry also imply?
b) What is the purpose of the dirty bit?
c) What must be done for pages that have their dirty bit set
that does not have to be done for pages that don't have their
dirty bit set?
d) In the page table itself, the entries do not contain virtual
page numbers, so how do we know where to go in the table in
order to find out what memory frame a particular virtual page
maps to?
e) Based on the above table, what virtual page maps to memory
frame 2?
f) Why do we have to add virtual page numbers to the page table
entries in the TLB? (Consider that, as noted in part d., the
page table entries don't have virtual page numbers in them. So,
why should TLB
entries have them?)
g) When main memory is full, and we need to bring in another
page, how do we decide which page to replace?
h) Assuming that we want to bring virtual page #5 into main
memory and that we will put it in memory frame 3, show how the
page table changes. Make your changes directly on a copy of the
above table.
- Set #3 (Due Wed, Apr 23) B.1, B.2, B.3, B.4, B.5, 6.9 (5th, 4th
and 3rd ed), 6.11 (5th, 4th and 3rd ed), B.6, B.7
B.1)
a) What is meant by the term "volatile" as it refers to a
memory characteristic?
b) Why is it important that at least some of the memory in the
physical address space of the processor in a computer
system is non-volatile?
c) What, specifically, is meant by the term "dynamic" as it refers to
a memory characteristic? Note that "dynamic" is a general term
referring to more than just memory that is constructed using
capacitors as storage elements, so don't reference capacitors or
capacitor behavior in your answer.
B.2) In a cache, the
"mapping function" determines where
in the cache certain blocks of memory may be placed. Most mapping
functions restrict where a block from memory can be placed in the
cache and some are more restrictive than others.
a) Fill in the following table,
under the column "How Restrictive" with the phrases
"none",
"
some", or "
completely restrictive" as
appropriate. Under "Description", describe what restriction (if
any) the mapping function imposes on memory block placement.
Mapping Function |
How Restrictive
|
Description |
Fully Associative |
|
|
Direct Mapped |
|
|
Set Associative |
|
|
b) One would suspect that the least restrictive mapping
function would result in the highest hit rate. Why then is
this mapping function not commonly used?
c) Which mapping function is most commonly used and why?
(Note that no mapping function is "faster" or "slower" than
any other, so speed is not a reason.)
B.3) A particular
Direct Mapped cache has a total of 1024
blocks with 16 32-bit words (64 bytes) in each block. The cache is
designed to work with a CPU that uses 32-bit
byte addresses.
a) How many tags are stored in
the cache?
b) How many bits are needed to select the byte in a word?
c) How many bits are needed to select a word in a block?
d) How many bits are needed to select the cache block?
e) How many bits are in each tag?
f) In what cache block could one expect to find the word whose
main memory address is 71F32AC2h? Give the answer in Hex or in
decimal. Show your work.
g) How many comparators does this cache have?
h) How many bits wide is each comparator?
B.4) A particular
8-way Set-Associative cache has a total
of 1024 blocks with 16 32-bit words in each block. The cache is
designed to work with a CPU that uses 32-bit byte addresses.
a) How many
blocks are assigned to each set?
b) How many tags are stored in the cache?
c) How many sets are in
the cache?
d) How many bits are needed to select a set?
e) In what cache set could one expect to find the word whose
main memory address is 71F32AC2h? Give the answer in Hex or in
decimal. Show your work.
f) How many comparators does this cache have?
g) How many bits wide is each comparator?
B.5) A particular
fully associative cache has a total of
1024 blocks with 16 32-bit words in each block. The cache is
designed to work with a CPU that uses 32-bit byte addresses.
a) How many tags are stored in
the cache?
b) How many comparators does this cache have?
c) How many bits wide is each comparator?
d) In what cache block could one expect to find the word whose
main memory address is 71F32AC2h?
6.9) (Text exercise 9 at the end of chapter 6) (In the 4th and 3rd
edition this problem states that it is based on word addresses,
not byte addresses, but you would get the same answers).
Suppose a byte-addressable
computer using set associative cache has 2
16 bytes of
main memory and a cache of 32 blocks, and each cache block
contains 8 bytes.
a) If this
cache is 2-way set associative, what is the format of a memory
address as seen by the cache, that is, what are the sizes of
the tag, set, and offset fields?
b) If this cache is 4-way set associative, what is the format
of a memory address as seen by the cache?
6.11) (Based on text exercise 11 at the end of chapter 6) Note
that the word size is 1 byte and that each block has 4 bytes. Note
that the exercise in the text gives the address format. From this
work out the cache block each address maps to and what other
addresses are brought in to the cache block on a miss.
a) Complete the following table
showing the cache block that each reference will use and note if
the reference is a
Hit
or a
Miss
.
Reference (Word Address):
|
6E
|
B9
|
17
|
E0
|
4E
|
4F
|
50
|
91
|
A8
|
A9
|
AB
|
AD
|
93
|
94
|
Cache Block Used:
|
3
|
2
|
1
|
0
|
|
|
|
|
|
|
|
|
|
|
Hit or Miss
|
M
|
M
|
M
|
M
|
|
|
|
|
|
|
|
|
|
|
Note that the cache is initially empty before reference 6E.
b) Give the overall hit rate for this sequence. _____
c) Fill in the following table showing what will be in cache
block 2 (
addresses
of data bytes and actual tag) after the last reference. (The
text question asks you to show what is in all 4 cache blocks,
but just showing the
contents
of block 2 will be sufficient.)
Block 2
Tag Contents
|
Cache Contents
(by address)
|
|
|
|
|
|
|
|
B.6) In addition to a mapping function, a cache may have to have a
replacement policy.
a) What is a replacement policy?
(Note that since you are defining this term, you shouldn't use
the words "replacement" or "policy" in your answer.)
b) What types of cache's need replacement policies?
c) When does a replacement policy need to be applied? (i.e. for
what event in the course of the operation of the cache do you
need to know what the replacement policy is?)
d) How does one measure the effectiveness of a replacement
policy?
B.7) Fill in the following table, as we did in class, showing the
contents of the tags in one set of a 4 way set associative cache
that uses an
LRU replacement policy. Assume
that each column represents the tag fields of the four blocks (B0,
B1, B2 and B3) of the set at each point in time. Assume that the
"CPU tag field" row indicates the tag fields of the sequence of
addresses that, over time, come to this set from the CPU.
Also, mark the hits and
determine the
hit rate.
time |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
CPU tag field |
2 |
4 |
1 |
2 |
5 |
3 |
1 |
5 |
4 |
1 |
2 |
B0 tag
|
|
|
|
|
|
|
|
|
|
|
|
B1 tag
|
|
|
|
|
|
|
|
|
|
|
|
B2 tag |
|
|
|
|
|
|
|
|
|
|
|
B3 tag |
|
|
|
|
|
|
|
|
|
|
|
Hit rate: ______
- Set #2 (Due Wed Apr 16) 5.22 (3rd ed 5.18), A.6,
A.7, A.8, A.9, 5.24 (3rd ed 5.20), 5.29 (3rd ed
5.25), A.10
5.22) (Problem 22 in chapter 5 in
the 4th and 5th eds, 5.18 on the 3rd ed.) Given that memory
contains the values shown below, and that
R1 contains 0x200,
Memory Address
|
Contents
|
0x100
|
0x600
|
...
|
|
0x400
|
0x300
|
...
|
|
0x500
|
0x100
|
...
|
|
0x600
|
0x500
|
...
|
|
0x700
|
0x800
|
Show what value will be loaded into the AC for each of the
indicated instructions. (Also note in this problem, the memory
addresses and contents are given in hex.)
Instruction
|
Mode
|
Value loaded into AC
|
Load #0x500
|
Immediate
|
|
Load 0x500
|
Direct
|
|
Load (0x500)
|
Indirect
|
|
Load 0x500(R1)
|
Indexed
|
|
A.6) External memory accesses tend to take a large portion of the
execution time of an instruction. Complete the following table by
giving the total number of
external memory accesses that would be
required in the execution of a MOV D, S instruction (where D is
destination and S is source) assuming D and S are addressed as
given.
Include the initial
fetch of the MOV
instruction as one of the memory accesses.
|
Addressing for S |
Addressing for D |
# of memory accesses |
a) |
Immediate |
Direct |
|
b) |
Register |
Indirect |
|
c) |
Direct |
Indirect |
|
d) |
Register |
Direct |
|
e) |
Immediate |
Register |
|
f) |
Register |
Register Indirect |
|
A.7) Many modern computers do
not provide "memory indirect" addressing
(just listed as indirect addressing in the text and the handout),
but do provide all the other addressing modes we talked about.
Give a sequence of two Load instructions that would do the same
thing as the following instruction (which uses memory indirect
addressing for its source):
Load R1, (X)
(where X is a memory
address)
without using memory indirect addressing.
Also, give a comment
next to each instruction in your answer that states what
addressing mode it uses for the source of the instruction.
A.8) Consider the following program:
500 |
|
MOV R1, X |
501 |
|
MOV R2, Y |
502 |
|
MOV R3, Z |
503 |
|
PUSH R1 |
504 |
|
CALL 550 |
505 |
|
POP R3 |
506 |
|
CALL 560 |
507 |
|
HALT |
·
|
|
·
|
550 |
|
PUSH R2 |
551 |
|
CALL 560 |
552 |
|
POP R2 |
553 |
|
RET |
·
|
|
·
|
560 |
|
PUSH R3 |
561 |
|
PUSH R1 |
562 |
|
POP R2 |
563 |
|
POP R1 |
564 |
|
RET |
|
|
a) Give a diagram like those handed out in class that
shows the values on the stack after the
execution of each of the underlined instructions. Label each
stack image with the appropriate underlined instruction
address -- note that the first two are already labeled.
Also note that sometimes an underlined instruction will be
executed twice -- give a separate diagram for each time
the instruction is executed. Assume that call and
return instructions use the stack. Let the
initial value of memory locations X, Y, and Z be 150, 250,
and 350 respectively. Assume that the program begins
execution with the instruction at address 500.
b) What is the contents of each of R1, R2 and R3 after
the execution of the program (i.e., when the program
reaches the instruction at address 507? (be careful
here!)
|
A.9) In some instruction sets, call and return instructions make
use of a RR register to hold the return address rather than doing
so on the stack. Show how the code in the above problem should be
changed if call and return instructions used a RR register for
return addresses rather than the stack. Don't change the program's
function or its call graph (i.e. don't change the routines called
or the order in which they are called and don't change what each
routine does).
5.24) (4th ed. 5.24, 3rd ed. 5.20)
a) A nonpipelined system takes
100ns to process an instruction. The same instructions can be
processed in a 5-segment pipeline with a clock cycle of 20ns.
Determine the speedup ratio of the pipeline for 100
instructions. Assume that the pipeline is initially empty when
the 100 instructions begin execution, and assume that there are
no branches and no data-dependencies or resource dependencies.
b) What is the maximum theoretical speedup that could be
achieved with this pipeline system over a nonpipelined system?
Also remember that nano = 10-9 so that 1 sec =
1,000,000,000 ns!
5.29) (4th ed. 5.29, 3rd ed. 5.25)
Suppose an instruction takes four
cycles to execute in a nonpipelined CPU: one cycle to fetch the
instruction, one cycle to decode the instruction, one cycle to
perform the ALU operation, and one cycle to store the result. In
a CPU with a 4-stage pipeline, that instruction still takes four
cycles to execute, so how can we say the pipeline speeds up the
execution of the program?
Make sure your answer describes
how the pipeline actually speeds up execution
while addressing the concerns raised in the question.
A.10) Pipelines
a) List the factors that can keep
a k-stage pipeline from providing a true k times speedup.
b) Why would you expect that a pipeline with more stages will
usually have a larger branch penalty (the number of cycles that
no useful work is getting done)?
c) What is a data-dependency? Describe both what it is and state how it
can slow down a pipeline.
- Set #1 (Due Wed Apr 9) A.1, A.2, A.3, 5.17 a) (5.15
in the 3rd ed.), A.4, A.5, 5.2
-Note, the reference "5.17 a)" above refers to part a of
exercise number 17 of Chapter 5 (A Closer Look at Instruction
Set Architectures) in the 5th edition text. Where the 3rd
edition or 4th edition text problem number is different
than the 5th edition number, the differences will be given in
parenthesis.
A.1) Write the code needed to
execute the following expression: X = (A + B × C) / (D - E × F) on
--
a) a
3 address machine.
Use instructions like those
given in the 3 address part of example 5.7 (Example 7 in
chapter 5) on page 303. Assume that you can have at most 2
memory addresses, and no more than three register addresses in
each instruction.
b) a
3 address load-store machine.
In a load-store machine, the
only instructions that can access memory are load and store
and these instructions take only one memory address and one
register (i.e. Load R2,
X). All the instructions that carry out arithmetic
and logical operations are register-to-register and specify at
most three register addresses (and no memory addresses). For
example, the instruction: Sub R1, R2, R3 subtracts the
contents of register R3 from the contents of register R2 and
puts the result in register R1.
c) a
2 address machine.
Use instructions like those
given in the 2 address part of example 5.7. Assume that you
can have at most 1 memory address, and no more than two
register references in each instruction.
d) a
1 address machine that has only an AC (accumulator)
register.
Use instructions like those
given in the 1 address part of example 5.7. Hint: This part can be done without
using any temporary memory locations other than X.
When doing these problems...
- Note that only part b has the load-store restriction. That
is, in the other parts, instructions other than load and store
are allowed to reference memory.
- Make sure you remember
operation precedence -- i.e. multiplies and divides
are computed before adds and subtracts unless parentheses
dictate otherwise -- so, in the above, B × C takes place
before adding A etc.
- Note that DIV and SUB are not commutative, so operand order
in the instruction is important. In these problems, use a left
to right order, that is dest
<- src1 OP src2. So, SUB R3,X,Y would imply that R3 <- X - Y and DIV R3,X,Y would
imply that R3 <- X/Y.
- Try to use the least number of instructions possible.
- Assume that A, B, C, D, E, F and X represent memory
addresses.
- It is good programming practice to not modify the sources,
so only read from A, B, C, D, E, and F.
- You may need to use additional locations in memory
to temporarily hold intermediate results. Let U, T and Y
represent these temporary locations.
- Let R1, R2, ..., etc. represent registers.
- In each case try to use the fewest registers and the fewest
temporary locations possible, and remember that registers are
better than memory for temporary storage.
A.2) RPN and Stacks
a) Write the expression given in
problem A.1 in postfix
notation.
b) Write the code needed to execute this expression on a stack
based machine.
- Use instructions like those given in the 0 address part of
example 5.7.
- On such a machine, most instructions are zero address
instructions.
- Binary operations (operations that take two operands)
operate on the top two elements of the stack, replacing
those top two elements with the result of the operation.
- Assume that we have the operations ADD (add top two
elements of the stack), SUB (second (from the top) element
of the stack - top element of the stack), MUL (multiply top
two elements of the stack) and DIV (second element of the
stack / top element of the stack).
- The machine also has the instructions PUSH X (pushes data
from memory at address X on the top of the stack) and POP X
(removes the first element on the top of the stack and
places it in memory at address X).
Use as few operations as possible.
A.3) Assuming that the instruction set is created as given in
Example 5.8 (on page 306 in the text), is the 16-bit instruction
represented by the hex value 0xFE64 a zero-address, one-address,
two address, or three-address instruction? _______ Assuming the 4
bit address fields in the instruction are used to select
registers, what registers (if any) does this instruction select?
______
5.17 a) (Based on text Exercise 17 at the end of chapter 5, part
a) (5.15 in the 3rd ed.)
Show how to encode the opcodes for an set of instructions that are
all
11 bits long, that have
4-bit address fields
and provides:
5 2-address instructions,
45 1-address instructions,
32 0-address instructions.
Do this by
showing the range of opcodes needed for the two, one and zero
address instructions as was done in Example
5.8.
Also,
list the opcodes that were not needed in your
approach in order to meet the problem requirements, if any.
A.4) Consider the following program.
XOR R1, R2
XOR R2, R1
XOR R1, R2
Where R1 and R2 are registers and the XOR instruction performs a
bit-wise exclusive-or function.
a) Assume that the contents of
registers R1 and R2 are initially
11001011b and
11100101b
respectively. Give the values that will be in each of R1
and R2 after the execution of each instruction. Put your answer
in the form of the following table:
|
R1
|
R2
|
original values:
|
1100 1011b
|
1110 0101b
|
XOR R1, R2
|
|
|
XOR R2, R1
|
|
|
XOR R1, R2
|
|
|
b) Describe the overall function of this program. (Hint: compare
what finally ends up in R1 and R2 with their original values.)
c) Give a program using only MOV instructions (i.e. MOV R1, R2)
that accomplishes the same overall function as the above
program. MOV R1, R2 simply moves the contents of R2 into R1.
A.5) A 32 bit quantity is stored in
byte-addressable
memory.
a) How many bytes of memory does
the 32-bit quantity use?
b) If the address of the 32-bit quantity is given as 0x5C4A1234, give the range
the addresses of the bytes that store it. That is, what is
the lowest address and highest address that refers to any of the
bytes in the 32-bit quantity.
5.2) (Based on text Exercise 2 at the end of chapter 5) Fill in
the following table (or one like it) to answer this question
showing what bytes would be at each memory address in each case.
B.E = big endian and L.E = little endian. Note that Figure 5.1
(Figure 1 in chapter 5) in the text only shows the two LSbs (least
significant bits) of the memory address.
|
Address
|
0x10
|
0x11
|
0x12
|
0x13
|
a)
0x456789A1
|
B.E.
|
|
|
|
|
L.E.
|
|
|
|
|
b)
0x0000058A
|
B.E
|
|
|
|
|
L.E.
|
|
|
|
|
c)
0x14148888 |
B.E.
|
|
|
|
|
L.E.
|
|
|
|
|