Integrated Circuit Engineering 2

Design Specification 2003


4-bit Multiplier


Introduction

This year's exercise is to design a multiplier. The multiplier should accept two 4-bit operands and produce and an 8-bit result. The operands and results are all unsigned integers.

Possible solutions include single cycle multipliers and multi cycle multipliers with either pipelined or un-pipelined architectures.

The performance of the designs will be assessed in terms of latency (time to produce the first result) and throughput (rate at which subsequent results can be produced).

Multiplication & Multipliers

The simplest algorithm for multiplication involves repeated shift and add operations. In the figure below, the register A is initially loaded with the multiplicand and the register B is initially loaded with the multiplier while P which will contain the partial product is initially zero.

In the first cycle A is added to P if the least significant bit of B is '1'. A is shifted left while B is shifted right.

This process is repeated in subsequent cycles and completes when B=0. For a 4-bit multiplication the algorithm will complete in no more than 4 cycles.

The technique is simply one of long multiplication. Below you can see the long multiplication of two 4-bit values to produce an 8-bit result.

 Multiplicand         1011              11
 Multiplier           1101 x            13 x
                      ----             ---
                      1011              33
                     0000              11
                    1011               ---
                   1011                143
                  --------
 Product          10001111

The architecture shown above uses a 7-bit register for A and a 7-bit adder to produce an 8-bit result. This is an inefficient use of resources since we are never adding a value greater than four bits in length.

Alternative Multi-Cycle Implementation

A more efficient implementation uses a 4-bit adder and four 4-bit registers:

Note that PHI contains the four most significant bits of the partial product while PLO contains the four least significant bits.

The only disadvantage of this technique is that it takes 4 cycles regardless of the value of B (i.e. there is no opportunity to terminate the algorithm early where B is a small number).

Single Cycle Multiplier

By replicating the hardware used in each cycle we can perform the multiplication within a single cycle.

More adders are needed and the clock cycle time must be extended but the overall time to produce each result will be slightly reduced (since the overhead involved in storing data between cycles is lost).

Pipelining

By adding pipeline registers we can increase the clock speed and hence the throughput while having a minimal impact on the latency.

In the system shown a single set of pipeline registers are added to provide a two stage pipeline. If the work done in each pipeline stage is well balanced we may expect to nearly double the throughput compared to the single cycle design.

Your Design Choices

The four designs presented so far show a range of different approaches to the problem. You could base your design on one of these or you could base it on another from the available literature or from your imagination. Should you base your design on one of these four you should be aware that they are not optimal (for instance the standard multi-cycle implemention saves on registers by using the same space for B and for PLO, while no single cycle or pipelined implementation should actually include an adder with one of its inputs held at zero).

Your final design choice should balance the requirements for performance against the limited resources you have available in terms of chip area and manpower. A simple design which fits and is submitted on time will always receive more marks than a potentially high performing design that is either too big to fabricate or submitted after the fabrication deadline.

Input and Output Specification

The design must support the following inputs and outputs:

Inputs

Outputs

Note that the signal names given here are case sensitive and are fixed. If you name your READY signal Ready or ready then the design will fail to simulate and will be rejected for fabrication.

Datapath and Controller

As with any reasonably complex digital system you will need to divide and conquer when approaching the design. The first division that you should make is that between the datapath and the controller. The datapath includes the registers and functional units for manipulating the data, while the control unit will deal with STROBE, ACK and READY signals and provide any necessary control signals to the datapath (e.g. The alternative multi-cycle design would need a loadA signal to control the A register which is loaded at the start of the multiplication and thereafter must retain its value).

Bitslicing

You are strongly encouraged to bitslice your datapath. A properly bitsliced datapath will result in a smaller and neater design. More importantly it will reduce the complexity of the place and route task by an order of magnitude thus increasing the chance of your design being submitted on time. You should end up with four identical bitslices each containing one quarter of the datapath. Because of the various shift functions inherent in the multiplication algorithm this division into bitslices is not trivial but is worth the effort.

Note that the first of the example designs uses a 7-bit adder and would therefore be better divided into seven bitslices. All of the remaining example designs use 4-bit adders and are suited to the four bitslice approach.


Iain McNally

14-2-2003