• Simple multiplication algorithm.

– two n-bit numbers will produce a 2n-bit result.

#### • Implementation



- rather slow for single cycle operation

• Parallel Implementation



- should be faster<sup>1</sup>
- faster implementations exist using more complex building blocks

<sup>&</sup>lt;sup>1</sup>note that the overall delay is less than three adder delays due to the skew of rippling carries

• Multi-cycle implementation



- partial product is initially set to zero
- calculation completes after n cycles (or when B=0)

#### • Alternative implementation



• Alternative multi-cycle implementation



- A is not shifted giving a fixed width value on one input of the adder
- the top n-1 bits of a dedicated adder could be half adders

• Software implementation



- double word result
  - - use multi word shift left and add
- single word result
  - - check for overflow on each addition and each left shift of P

• Reduce adder widths



- each adder produces an (n+1)-bit result
  - - the least significant bit is fed straight to P
  - - the most significant n bits are fed to the next adder

• Multi-cycle implementation



- after each addition the least significant bit of the result is placed in P0 which is the least significant word of the partial product

• Algorithm Comparison



- justify each term before addition
- don't justify terms
  - - beginning with most significant term add each term to least significant slot and shift left
  - - beginning with least significant term add each term to most significant slot and shift right

• Standard implementation



- P0 uses the same register as B
- standard *n*-bit adder with carry out

• Multiplication of 2's complement numbers.

- allowing for a negative multiplicand requires only minor modifications (sign extension is used to retain sign information).
- coping with a negative multiplier is somewhat harder and will not be covered here.

• Possible implementation?
signed multiplicand (A) - unsigned multiplier (B)
Sign bit (arithmetic shift)



- we see that we need to perform an arithmetic shift on the result of the addition rather than using the carry out.
- unfortunately this doesn't account for a possible overflow (which will give the wrong sign)

• Standard implementation signed multiplicand (A) - unsigned multiplier (B)



- use overflow information to recover the correct sign information (all other bits will be correct when overflow occurs)

#### **ALU Functionality**

Where a single cycle multiply is not appropriate we may implement multiply as:

- A multi-cycle instruction
  - The 6809 MUL instruction performs a complete multiplication of two unsigned 8-bit numbers yielding a 16-bit result.

$$ACCA' : ACCB' \leftarrow ACCA \times ACCB$$

or

- A single-cycle multiply step instruction
  - The SPARC MULSCC P1, A, P1 performs a single multiply step instruction. The instruction uses a dedicated Y register which contains B/P0.

• SPARC implementation



- takes a step longer due to arrangement of registers and shifters.
- since neither signed nor unsigned numbers are dealt with properly in this system a few additional instructions are used to post-process (fiddle) the answer for signed or unsigned multiplication.