# Speed Enhanced Multiprecision Multiplier Using Compressing Techniques

# Muhsina.J, Anju Iqubal

Abstract - In DSP systems, the performance of the multiplier plays a crucial role in determining the processor's performance. In this paper, a high-speed multiprecision multiplier with minimum area and power consumption is proposed. The multiplier also enables parallel processing, which makes higher-precision multiplications possible. The main focus of this paper is to increase the performance of the multiplier. The speed of a multiplier depends on how its partial products are generated and reduced, so compressor techniques are used here to speed up the multiplier. In addition, supply-voltage scaling and frequency management are applied. This flexible multiplier, which combines variable-precision processing with voltage and frequency management, can be used to reduce both power consumption and delay. The design is simulated in ModelSim 6.3f, and area and power are obtained from synthesis in Xilinx ISE Design Suite 8.1.

#### Index Terms - DSP; multi precision; parallel processing

## I. INTRODUCTION

Multipliers are key components in digital signal processors, microprocessors, FIR filters and similar systems. Because the multiplier is often the slowest element in such a system, the performance of the whole system depends heavily on it. High-precision multipliers also occupy a large share of the area in DSP hardware. It is therefore important to optimize the speed, area and power consumption of a multiplier. Multiplication consists of three steps:

1. Generation of the partial products.
2. Reduction of the partial products to one row of sums and one row of carries.
3. Addition of the sums and carries to produce the final result.
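The three steps can be illustrated with a small bit-accurate model. The sketch below is a hypothetical Python model, not the authors' VHDL: Python integers stand in for bit vectors, the function names are illustrative, and step 2 uses a simple linear carry-save (3:2) reduction rather than any particular tree structure.

```python
def partial_products(a: int, b: int, n: int) -> list[int]:
    """Step 1: one row per multiplier bit b_i, equal to (a AND b_i) shifted left by i."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(n)]


def carry_save_reduce(rows: list[int]) -> tuple[int, int]:
    """Step 2: apply full adders bitwise (3 rows -> 2 rows) until a sum row and a carry row remain."""
    rows = list(rows)
    while len(rows) > 2:
        x, y, z = rows.pop(), rows.pop(), rows.pop()
        sums = x ^ y ^ z                              # sum bits keep their weight
        carries = ((x & y) | (x & z) | (y & z)) << 1  # carry bits move one column left
        rows += [sums, carries]
    rows += [0] * (2 - len(rows))                     # pad when fewer than two rows exist
    return rows[0], rows[1]


def multiply(a: int, b: int, n: int = 8) -> int:
    """Step 3: add the sum and carry rows with a carry-propagate adder (here, integer +)."""
    sums, carries = carry_save_reduce(partial_products(a, b, n))
    return sums + carries


print(multiply(200, 123))  # 24600
```

Each full-adder step preserves the weighted sum of the rows, so the final addition of the two remaining rows yields the exact product.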

Multipliers are typically designed for a fixed maximum word length to handle the worst case. When the operands are narrower than this word length, power is wasted and the efficiency of the multiplier drops. Numerous works have addressed this word-length optimization. Early approaches routed the incoming operands to the smallest multiplier that could compute the result, but this method was expensive. Later, a method of reusing the functional units was introduced as a solution to this problem. A dramatic reduction in power consumption can be achieved by using error-tolerant dynamic voltage scaling (DVS) [1]. Error-tolerant DVS based on Razor flip-flops overcomes the limitations of conventional DVS. Combining a multiprecision (MP) multiplier with DVS can reduce power consumption dramatically, because the supply voltage is adjusted to the circuit's run-time workload rather than fixed for the worst case. The main focus of this paper is a novel method to improve the multiplier's efficiency: using 4:2 compressors to reduce the delay introduced by the multiplier.
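The benefit of DVS follows from the standard first-order model of CMOS dynamic power, $P_{dyn} = \alpha C V_{DD}^2 f$. The hypothetical helper functions below (not taken from the paper or from [1]) evaluate that model for assumed scaling factors, holding the switching activity α and switched capacitance C constant and ignoring leakage. They show why lowering the voltage matters more than lowering the frequency alone: frequency scaling by itself cuts power but not the energy spent per multiplication.

```python
def relative_dynamic_power(v_scale: float, f_scale: float) -> float:
    """P_new / P_nominal for P = alpha * C * V^2 * f, with alpha and C unchanged."""
    return v_scale ** 2 * f_scale


def relative_energy_per_op(v_scale: float) -> float:
    """E_new / E_nominal for E = alpha * C * V^2; independent of the clock frequency."""
    return v_scale ** 2


# Assumed operating points, chosen only for illustration.
print(relative_dynamic_power(1.0, 0.5))  # 0.5 (energy per operation stays at 1.0)
print(relative_dynamic_power(0.7, 0.5))  # voltage also lowered to 70% of nominal
print(relative_energy_per_op(0.7))
```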

## II. RELATED WORKS

### A. 4-subblock multiplier

A fixed-width multiplier (FWM) generates an output with the same width as its inputs. However, performing a lower-precision multiplication on such a high-precision multiplier is inefficient. Therefore, a multiprecision multiplier was designed.

Let U and V be the 2n-bit multiplicand and multiplier, respectively. $U_H$ and $V_H$ are their n most significant bits, and $U_L$ and $V_L$ are their n least significant bits. The product can then be expressed as:

$$P = (U_H V_H)\,2^{2n} + (U_H V_L + U_L V_H)\,2^{n} + U_L V_L \tag{1}$$

Equation (1) shows that the multiplication requires four $n \times n$ multipliers.
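Equation (1) can be checked with a short bit-accurate model. In the hypothetical Python sketch below, each `*` between n-bit halves stands for one $n \times n$ sub-multiplier and the shifts stand for wiring; the names `split` and `four_subblock_multiply` are illustrative only.

```python
def split(x: int, n: int) -> tuple[int, int]:
    """Return (high half, low half) of a 2n-bit operand, i.e. (X_H, X_L)."""
    return x >> n, x & ((1 << n) - 1)


def four_subblock_multiply(u: int, v: int, n: int) -> int:
    """Equation (1): four n x n sub-products combined by shifting and adding."""
    u_h, u_l = split(u, n)
    v_h, v_l = split(v, n)
    hh = u_h * v_h  # sub-multiplier 1
    hl = u_h * v_l  # sub-multiplier 2
    lh = u_l * v_h  # sub-multiplier 3
    ll = u_l * v_l  # sub-multiplier 4
    return (hh << (2 * n)) + ((hl + lh) << n) + ll


# 32-bit operands built from 16-bit sub-multipliers (n = 16).
u, v = 0xDEADBEEF, 0x12345678
assert four_subblock_multiply(u, v, 16) == u * v
```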

Compared with a conventional FWM, this 4-subblock multiplier incurs overheads of 13% in power and 18% in silicon area. This led to the 3-subblock multiplier.



Fig.1. Block diagram of 4 subblock multiplier

**MUHSINA.J**, Department of ECE, Kerala University, Younus College of Engineering and Technology, Kollam, Kerala, India.

**ANJU IQUBAL**, Department of ECE, Kerala University, Younus College of Engineering and Technology, Kollam, Kerala, India.

### B. 3-subblock multiplier



Fig.2. Block diagram of 3-subblock multiplier

The 3-subblock multiplier defines the operand sums

$$U1 = U_H + U_L$$
$$V1 = V_H + V_L$$

so that the product can be rewritten as

$$P = (U_H V_H)\,2^{2n} + (U1\,V1 - U_H V_H - U_L V_L)\,2^{n} + U_L V_L \tag{2}$$

Equation (2) shows that one $n \times n$ multiplier and one 2n-bit adder are replaced by two n-bit adders (forming U1 and V1) and two (2n+2)-bit subtractors. The subtractor width follows from the operand sums: U1 and V1 are (n+1)-bit values, so their product U1 V1 is up to 2n+2 bits wide. Thus, to perform a 32-bit multiplication with 16-bit sub-multipliers (n = 16), only two 34-bit subtractors are needed. This reduces the silicon-area and power overhead of the 4-subblock multiplier.
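A matching sketch of equation (2), again a hypothetical Python model rather than the authors' design, shows that three sub-products suffice and confirms the operand widths: the sums U1 and V1 need n+1 bits and their product needs up to 2n+2 bits, so the middle sub-multiplier has to accept (n+1)-bit operands. The rearrangement is the same identity used in Karatsuba multiplication.

```python
def three_subblock_multiply(u: int, v: int, n: int) -> int:
    """Equation (2): three sub-products, two n-bit additions and two subtractions."""
    mask = (1 << n) - 1
    u_h, u_l = u >> n, u & mask
    v_h, v_l = v >> n, v & mask
    u1 = u_h + u_l              # n-bit adder with carry-out: up to n+1 bits
    v1 = v_h + v_l
    hh = u_h * v_h              # sub-multiplier 1
    ll = u_l * v_l              # sub-multiplier 2
    middle = u1 * v1 - hh - ll  # sub-multiplier 3, then two (2n+2)-bit subtractions
    return (hh << (2 * n)) + (middle << n) + ll


n = 16
worst = ((1 << n) - 1) * 2  # largest possible U1 or V1
print(worst.bit_length(), (worst * worst).bit_length())  # 17 34
```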

## III. PROPOSED ARCHITECTURE

The choice of multiplication algorithm depends on the application the multiplier must serve. Array-based algorithms are the most commonly used because of their regular structure. An array multiplier is based on the add-and-shift algorithm, in which the shifted partial products are accumulated with ordinary carry-propagate adders. The Wallace tree algorithm was proposed to further improve the speed of the parallel multiplier.
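The add-and-shift principle of the array multiplier can be sketched as follows. This is a hypothetical bit-level Python model: the ripple-carry adder is written out so that the carry visibly propagates through every bit position, which is the serial dependency the Wallace tree is meant to shorten.

```python
def ripple_carry_add(a: int, b: int, width: int) -> int:
    """Carry-propagate adder: bit j must wait for the carry from bit j-1."""
    result, carry = 0, 0
    for j in range(width):
        x, y = (a >> j) & 1, (b >> j) & 1
        result |= (x ^ y ^ carry) << j               # sum bit uses the incoming carry
        carry = (x & y) | (x & carry) | (y & carry)  # carry out to bit j+1
    return result | (carry << width)


def array_multiply(a: int, b: int, n: int) -> int:
    """Add-and-shift: accumulate each shifted partial-product row with a CPA."""
    acc = 0
    for i in range(n):
        row = (a << i) if (b >> i) & 1 else 0
        acc = ripple_carry_add(acc, row, 2 * n)
    return acc


print(array_multiply(13, 11, 4))  # 143
```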

In the Wallace tree architecture, the partial products are arranged in a tree-like fashion so that the critical path and the number of adder cells are reduced. All the bits in each column of the partial-product matrix are added simultaneously: a set of counters in each column generates a new, shorter matrix of partial products, and the process continues until a matrix of only two rows is generated. The first row holds the sum bits and the second holds the carry bits. The most common counter is the 3:2 counter, which is a full adder.

In the conventional Wallace tree multiplier, the first step forms the partial-product array of $N^2$ bits. In the second step, the rows are collected into groups of three adjacent rows, and each group is reduced using full adders and half adders: a full adder is used in each column that holds three bits, a half adder in each column that holds two bits, and any single bit in a column is passed to the next stage in the same column without processing. This reduction is repeated in successive stages until only two rows remain. In the final step, the remaining two rows are added with a carry-propagate adder. In a conventional Wallace multiplier, the number of rows in each subsequent stage can be calculated as:

$$r_{i+1} = 2\left\lfloor r_i/3 \right\rfloor + (r_i \bmod 3)$$

where $r_i$ is the number of rows in stage $i$ and $r_i \bmod 3$ denotes the smallest non-negative remainder of $r_i/3$.

The tree multiplier realizes substantial hardware savings for larger multipliers, and its propagation delay is reduced as well: each stage reduces the number of rows by a factor of about 3/2, so the number of stages, and hence the propagation delay through the tree, is O(log N).
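The reduction procedure and the row recurrence can be checked together. The Python sketch below is a hypothetical model of the procedure exactly as described above, not the authors' implementation: each row is a mapping from column index to bit, every complete group of three rows is reduced column by column (full adder for three bits, half adder for two, pass-through for one), leftover rows go unchanged to the next stage, and the final carry-propagate addition is modelled with integer addition.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 counter: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def half_adder(a: int, b: int) -> tuple[int, int]:
    return a ^ b, a & b


def partial_product_rows(x: int, y: int, n: int) -> list[dict[int, int]]:
    """N rows of N bits: row i holds x_j AND y_i in column i + j."""
    return [{i + j: (x >> j) & (y >> i) & 1 for j in range(n)} for i in range(n)]


def reduce_group(group: list[dict[int, int]]) -> list[dict[int, int]]:
    """Reduce three rows to one sum row and one carry row, column by column."""
    sums, carries = {}, {}
    for col in sorted(set().union(*group)):
        bits = [row[col] for row in group if col in row]
        if len(bits) == 3:
            sums[col], carries[col + 1] = full_adder(*bits)
        elif len(bits) == 2:
            sums[col], carries[col + 1] = half_adder(*bits)
        else:
            sums[col] = bits[0]  # a single bit passes through unprocessed
    return [sums, carries]


def wallace_stage(rows: list[dict[int, int]]) -> list[dict[int, int]]:
    """One stage: every complete group of three rows becomes two rows."""
    grouped = len(rows) - len(rows) % 3
    out = []
    for k in range(0, grouped, 3):
        out += reduce_group(rows[k:k + 3])
    return out + rows[grouped:]  # one or two leftover rows move on unchanged


def wallace_multiply(x: int, y: int, n: int) -> tuple[int, int]:
    """Return (product, number of reduction stages)."""
    rows, stages = partial_product_rows(x, y, n), 0
    while len(rows) > 2:
        rows, stages = wallace_stage(rows), stages + 1
    # Final step: carry-propagate addition of the two remaining rows.
    return sum(bit << col for row in rows for col, bit in row.items()), stages


def wallace_row_counts(r: int) -> list[int]:
    """Row counts per stage from r_{i+1} = 2*floor(r_i/3) + (r_i mod 3)."""
    counts = [r]
    while counts[-1] > 2:
        counts.append(2 * (counts[-1] // 3) + counts[-1] % 3)
    return counts


for n in (4, 8, 16):
    top = (1 << n) - 1
    product, stages = wallace_multiply(top, top, n)
    assert product == top * top
    assert stages == len(wallace_row_counts(n)) - 1  # simulation matches the recurrence
print([len(wallace_row_counts(n)) - 1 for n in (8, 16, 32, 64)])  # [4, 6, 8, 10]
```

The stage counts of 4, 6, 8 and 10 for N = 8, 16, 32 and 64 illustrate the logarithmic growth: doubling N adds only two stages.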

The Wallace tree algorithm is the one most commonly used in digital multipliers. The delay of the Wallace tree circuit can be reduced further by using compressors, which accumulate the partial products during multiplication. With compressors, all columns of partial products are added in parallel, without waiting for the carry signal from the previous column. A conventional full adder can be considered a 3:2 compressor. However, the path delay of a 3:2 compressor is irregular, because its sum output passes through two XOR gates in series.

### A. 4:2 compressor

To address the delay of the 3:2 compressor, the 4:2 compressor was proposed. It is built by connecting two 3:2 compressors in series.



The compressor is connected so that four inputs come from the same bit position j, while one input (Cin) is fed from position j-1. The outputs of the 4:2 compressor consist of one bit (Sum) at position j and two bits (Carry and Cout) at position j+1. Because the output Cout is independent of the input Cin, it accelerates the carry-save summation of the partial products.
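The 4:2 compressor can be sketched as two cascaded full adders. In this hypothetical Python model, the inputs x1 to x4 and Cin have weight 2^j, Sum has weight 2^j, and Carry and Cout have weight 2^(j+1); the exhaustive check confirms both the arithmetic identity and the property that Cout does not depend on Cin.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """Two 3:2 compressors in series; returns (sum, carry, cout)."""
    s1, cout = full_adder(x1, x2, x3)   # first stage never sees cin
    s, carry = full_adder(s1, x4, cin)  # second stage absorbs x4 and cin
    return s, carry, cout


for x1, x2, x3, x4, cin in product((0, 1), repeat=5):
    s, carry, cout = compressor_4_2(x1, x2, x3, x4, cin)
    # Weighted outputs equal the weighted inputs.
    assert x1 + x2 + x3 + x4 + cin == s + 2 * (carry + cout)
    # Cout is the same whatever the value of cin.
    assert cout == compressor_4_2(x1, x2, x3, x4, 1 - cin)[2]
print("4:2 compressor identity holds for all 32 input combinations")
```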

In a conventional Wallace tree multiplier, five partial-product bits in a column are compressed into four, and so on. A Wallace tree modified with 4:2 compressors instead compresses the five bits (four partial-product bits and Cin) into three (Sum, Carry and Cout). This reduces the irregularity in the delay of the Wallace tree multiplier.
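At the level of whole rows, one level of 4:2 compressors turns four partial-product rows into two. Because Cout does not depend on Cin, the Cout of column j can be fed to column j+1 as its Cin without any carry rippling across the row. The hypothetical Python sketch below applies this bitwise across all columns at once (each integer is a row of bits) and counts the compressor levels needed; padding a group of three rows with a zero row is an assumption of this sketch.

```python
def majority(a: int, b: int, c: int) -> int:
    return (a & b) | (a & c) | (b & c)


def compress_rows_4_2(x1: int, x2: int, x3: int, x4: int) -> tuple[int, int]:
    """One level of 4:2 compressors across every column of four rows."""
    s1 = x1 ^ x2 ^ x3
    cout = majority(x1, x2, x3) << 1  # Cout of column j, already at weight j+1
    cin = cout                        # it is exactly the Cin of column j+1
    sum_row = s1 ^ x4 ^ cin
    carry_row = majority(s1, x4, cin) << 1
    return sum_row, carry_row


def reduce_with_4_2(rows: list[int]) -> tuple[list[int], int]:
    """Apply 4:2 levels until two rows remain; returns (rows, levels)."""
    levels = 0
    while len(rows) > 2:
        nxt: list[int] = []
        for k in range(0, len(rows), 4):
            group = rows[k:k + 4]
            if len(group) >= 3:
                padded = group + [0] * (4 - len(group))  # a 3-row group gets a zero row
                nxt += compress_rows_4_2(*padded)
            else:
                nxt += group  # one or two leftover rows pass through
        rows, levels = nxt, levels + 1
    return rows, levels


def partial_products(a: int, b: int, n: int) -> list[int]:
    return [(a << i) if (b >> i) & 1 else 0 for i in range(n)]


rows, levels = reduce_with_4_2(partial_products(200, 123, 8))
print(sum(rows), levels)  # 24600 2
```

Eight rows need two 4:2 levels here, against four 3:2 stages from the Wallace recurrence; since each 4:2 compressor contains two full adders in series, level counts alone are not a delay measurement.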

## IV. RESULTS AND DISCUSSION

The design is coded in VHDL, a concurrent hardware description language.

### A. Simulation result

ModelSim 6.3f is used to simulate the design. In the simulation, x and y are the input operands and out1 is the output.



Fig.5. Simulation result of the 3-subblock multiplier modified with a 4:2 compressor

### B. Area and power analysis

The area and power of the FWM, 4-subblock and 3-subblock multipliers running at 50 MHz were analysed using Xilinx ISE Design Suite 8.1. Table I lists the results for the 32-bit multiprecision designs.

TABLE I

AREA AND POWER ANALYSIS

| Scheme                                           | Power (mW) | No. of gates used |
|--------------------------------------------------|------------|-------------------|
| 32-bit 4-subblock MP multiplier                  | 134        | 33,191            |
| 32-bit 3-subblock MP multiplier                  | 104        | 25,773            |
| 32-bit 3-subblock multiplier with 4:2 compressor | 86         | 25,221            |
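The relative savings implied by Table I can be computed directly from its entries. The snippet below only does arithmetic on the reported numbers, using the gate count as the area measure as the table does; the dictionary keys are illustrative labels.

```python
# Entries copied from Table I: (power in mW, number of gates).
TABLE_I = {
    "4-subblock": (134, 33191),
    "3-subblock": (104, 25773),
    "3-subblock + 4:2": (86, 25221),
}


def reduction_percent(before: float, after: float) -> float:
    return 100.0 * (before - after) / before


comparisons = [
    ("4-subblock", "3-subblock"),
    ("3-subblock", "3-subblock + 4:2"),
    ("4-subblock", "3-subblock + 4:2"),
]
for a, b in comparisons:
    (p0, g0), (p1, g1) = TABLE_I[a], TABLE_I[b]
    print(f"{a} -> {b}: power -{reduction_percent(p0, p1):.1f}%, "
          f"gates -{reduction_percent(g0, g1):.1f}%")
```

By these numbers, adding the 4:2 compressor lowers power by about 17.3% relative to the plain 3-subblock design while using about 2.1% fewer gates.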

## V. CONCLUSION

Multipliers are key components in digital circuits. Studies show that FWMs are inefficient in DSP processors, so multiprecision multipliers, which reduce area and power consumption, are preferred. The speed of a multiprecision multiplier can be improved further by a modified Wallace algorithm that uses 4:2 compressors. In this way, a multiprecision multiplier with reduced area, power and delay can be developed.

## ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their constructive comments and valuable suggestions, which helped improve this paper.

## REFERENCES

- [1] Xiaoxiao Zhang, Farid Boussaid and Amine Bermak, "32 Bit × 32 Bit Multiprecision Razor-Based Dynamic Voltage Scaling Multiplier With Operands Scheduler," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, no. 4, April 2014.
- [2] Shaik Kalisha Baba, D. Rajaramesh, "Design and Implementation of Advanced Modified Booth Encoding Multiplier," International Journal of Engineering Science Invention, vol. 51, no. 3, pp. 1701-1717, August 2013.
- [3] Laxmi Kumre, Ajay Somkuwar and Ganga Agnihotri, "Power Efficient Carry Propagate Adder," International Journal of VLSI Design & Communication Systems (VLSICS), vol. 4, no. 3, June 2013.
- [4] Neeta Sharma, Ravi Sindal, "Modified Booth Multiplier using Wallace Structure and Efficient Carry Select Adder," International Journal of Computer Applications (0975-8887), vol. 68, no. 13, April 2013.
- [5] Xin Huang and Liangpei Zhang, "Variable Supply-Voltage Scheme for Low-Power High-Speed CMOS Digital Design," IEEE Journal of Solid-State Circuits, vol. 33, no. 3, pp. 161-172, March 1998.
- [6] A. Bermak, D. Martinez, J.-L. Noullet, "High Density 16/8/4 Configurable Multiplier," IEE Proc.-Circuits Devices Syst., vol. 144, no. 5, October 1997.

MUHSINA.J received the B.Tech. degree in ECE from Travancore Engineering College under Kerala University in 2012. She is currently pursuing the M.Tech. in Applied Electronics and Instrumentation at Younus College of Engineering and Technology under Kerala University, Kollam, Kerala. Her research interests are VLSI and signal processing.



ANJU IQUBAL received the B.Tech. degree in ECE from College of Engineering Kidangoor under CUSAT and the M.Tech. degree in Instrumentation and Control Systems from TKM Engineering College, Kollam, under Kerala University. She is working as an Associate Professor in the Department of ECE, Younus College of Engineering and Technology, Kollam, Kerala. Her areas of interest are VLSI systems, signal processing, and control and embedded systems.