# A Comparative Study of Various VLSI Architecture for Discrete Wavelet Transform

## Ms. Rashmi Patil, Dr.M.T.Kolte

Abstract— The wavelet transform has established itself as a useful tool in the field of 1-dimensional and 2-dimensional signal compression systems. Due to the growing importance of this technique, many working groups increasingly need a development environment that is flexible and in which the performance of a specific architecture can be measured under realistic conditions rather than only in theory. Our work concerns a new, simple, and efficient VLSI architecture for computing the Discrete Wavelet Transform (DWT). Various methodologies and their architectures are discussed for the computation of the DWT. Compared with the other architectures, the systolic array architecture is efficient because of its hardware utilization, and it works with data streams of arbitrary size.

*Index Terms*— FIR filter, hardware efficiency, low area, low power, systolic array

## I. INTRODUCTION

In recent years there has been an increasingly important requirement to address bandwidth limitations over communication networks. The advent of broadband networks (ISDN, ATM, etc.) [1, 2], as well as compression standards such as JPEG and MPEG, is an attempt to overcome these limitations. With the growing use of digital still and moving images, a huge amount of disk space is required for storage and manipulation. Image compression is therefore very important for reducing storage needs. Applications of compression include high-definition television, video conferencing, and multimedia communication [3, 4].

Redundancies in a video sequence can be removed by using the Discrete Cosine Transform (DCT) [5]. However, the DCT suffers from blockiness and mosquito noise, resulting in poor subjective quality of reconstructed images at high compression.

Wavelet techniques can represent real-life non-stationary signals and are a powerful means of achieving compression [6, 7]. Wavelet-based techniques offer efficient parallel VLSI implementation, low computational complexity, and flexibility in representing non-stationary image signals. To meet the real-time requirements of many applications, the design and implementation of the DWT is required.

Wavelet based techniques have the following features:

- Basis functions match the human visual profile, resulting in high quality of reconstructed images.
- Flexibility in representing non-stationary image signals.
- Multiresolution representation, which allows direct scalable access to the data or any subset of the data [8].
- Low computational complexity.
- Efficient parallel VLSI implementation.

Manuscript received December 15, 2014.

## II. DISCRETE WAVELET TRANSFORM

A wavelet is a small wave whose energy is concentrated in time. The properties of wavelets allow both time and frequency analysis of signals.

The DWT, which is based on sub-band coding, yields a fast computation of the wavelet transform. It is easy to implement and reduces the computation time and resources required. In the DWT, a time-scale representation of the digital signal is obtained using digital filtering techniques. The signal to be analyzed is passed through filters with different cutoff frequencies at different scales.

Wavelets can be realized by iteration of filters with scaling. The resolution of the signal, which is a measure of the amount of detail information in the signal, is determined by the filtering operations, and the scale is determined by up sampling and down sampling (subsampling) operations.

A schematic of the three-stage DWT decomposition is shown in Fig. 1.

Figure 1. Three-stage DWT decomposition using the pyramid algorithm

In Fig. 1, the signal is denoted by the sequence a[n], where n is an integer. The low-pass filter is denoted by L1 and the high-pass filter by H1. At each level, the high-pass filter produces the detail information, b[n], while the low-pass filter, associated with the scaling function, produces the coarse approximation, c[n].
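The decomposition in Fig. 1 can be sketched in a few lines of Python. This is a minimal illustration, not a model of any architecture discussed in this paper: the two-tap Haar filters and the function names `analysis_stage` and `pyramid_dwt` are assumptions made for the example, and samples before the start of the signal are taken as zero.

```python
import math


def analysis_stage(signal, lowpass, highpass):
    """One pyramid stage: filter with both FIR filters and keep every second output."""
    def fir(taps, n):
        # y(n) = sum_k taps[k] * x(n - k); samples before the start count as zero
        return sum(t * signal[n - k] for k, t in enumerate(taps) if n - k >= 0)

    detail = [fir(highpass, n) for n in range(0, len(signal), 2)]
    approx = [fir(lowpass, n) for n in range(0, len(signal), 2)]
    return detail, approx


def pyramid_dwt(signal, lowpass, highpass, levels=3):
    """Repeat the stage on the low-pass branch, as in the pyramid algorithm."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        detail, approx = analysis_stage(approx, lowpass, highpass)
        details.append(detail)
    return details, approx


# Hypothetical filters for illustration only (two-tap Haar pair)
s = 1 / math.sqrt(2)
L1 = [s, s]    # low-pass
H1 = [s, -s]   # high-pass

a = [float(v) for v in range(16)]          # N = 16 input samples
(b, d, f), g = pyramid_dwt(a, L1, H1, levels=3)
print(len(b), len(d), len(f), len(g))      # 8 4 2 2, i.e. N/2, N/4, N/8, N/8
```

Only the low-pass branch is fed to the next stage, which is why each level halves the number of samples.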

Here the input signal a[n] has N samples. At the first decomposition level, the signal is passed through the high-pass and low-pass filters, followed by subsampling by 2. The output of the high-pass filter, b[n], has N/2 samples; these N/2 samples constitute the first level of DWT coefficients. The output of the low-pass filter, c[n], also has N/2 samples. This signal is then passed through low-pass and high-pass filters for further decomposition. The output of the second low-pass filter followed by subsampling, e[n], has N/4 samples. The output of the second high-pass filter followed by subsampling, d[n], has N/4 samples and constitutes the second level of DWT coefficients. The low-pass filter output is then filtered once again for further decomposition and produces g[n] and f[n] with N/8 samples each. The filtering and decimation process is continued until the desired level is reached. The maximum number of levels depends on the length of the signal.

## III. DATA DEPENDENCIES WITHIN DWT

The wavelet decomposition of a 1-D input signal for three stages is shown in Fig. 1. The transfer functions of the six-tap high-pass (g(n)) and low-pass (h(n)) filters can be expressed as follows:

$$High(z) = g_0 + g_1 z^{-1} + g_2 z^{-2} + g_3 z^{-3} + g_4 z^{-4} + g_5 z^{-5} \tag{1a}$$

$$Low(z) = h_0 + h_1 z^{-1} + h_2 z^{-2} + h_3 z^{-3} + h_4 z^{-4} + h_5 z^{-5} \tag{1b}$$

For clarity, the intermediate and final DWT coefficients in Fig. 1 are denoted by a, b, c, d, e, f, and g. The DWT computation is complex because of the data dependencies at different octaves. Eqs. (2a)-(2n) show the relationships among a, b, c, d, e, f, and g.

1<sup>st</sup> octave:

$$b(0) = g(0)a(0) + g(1)a(-1) + g(2)a(-2) + g(3)a(-3) + g(4)a(-4) + g(5)a(-5) \tag{2a}$$

$$b(2) = g(0)a(2) + g(1)a(1) + g(2)a(0) + g(3)a(-1) + g(4)a(-2) + g(5)a(-3) \tag{2b}$$

$$b(4) = g(0)a(4) + g(1)a(3) + g(2)a(2) + g(3)a(1) + g(4)a(0) + g(5)a(-1) \tag{2c}$$

$$b(6) = g(0)a(6) + g(1)a(5) + g(2)a(4) + g(3)a(3) + g(4)a(2) + g(5)a(1) \tag{2d}$$

$$c(0) = h(0)a(0) + h(1)a(-1) + h(2)a(-2) + h(3)a(-3) + h(4)a(-4) + h(5)a(-5) \tag{2e}$$

$$c(2) = h(0)a(2) + h(1)a(1) + h(2)a(0) + h(3)a(-1) + h(4)a(-2) + h(5)a(-3) \tag{2f}$$

$$c(4) = h(0)a(4) + h(1)a(3) + h(2)a(2) + h(3)a(1) + h(4)a(0) + h(5)a(-1) \tag{2g}$$

$$c(6) = h(0)a(6) + h(1)a(5) + h(2)a(4) + h(3)a(3) + h(4)a(2) + h(5)a(1) \tag{2h}$$

2<sup>nd</sup> octave:

$$d(0) = g(0)c(0) + g(1)c(-2) + g(2)c(-4) + g(3)c(-6) + g(4)c(-8) + g(5)c(-10) \tag{2i}$$

$$d(4) = g(0)c(4) + g(1)c(2) + g(2)c(0) + g(3)c(-2) + g(4)c(-4) + g(5)c(-6) \tag{2j}$$

$$e(0) = h(0)c(0) + h(1)c(-2) + h(2)c(-4) + h(3)c(-6) + h(4)c(-8) + h(5)c(-10) \tag{2k}$$

$$e(4) = h(0)c(4) + h(1)c(2) + h(2)c(0) + h(3)c(-2) + h(4)c(-4) + h(5)c(-6) \tag{2l}$$

3<sup>rd</sup> octave:

$$f(0) = g(0)e(0) + g(1)e(-4) + g(2)e(-8) + g(3)e(-12) + g(4)e(-16) + g(5)e(-20) \tag{2m}$$

$$g(0) = h(0)e(0) + h(1)e(-4) + h(2)e(-8) + h(3)e(-12) + h(4)e(-16) + h(5)e(-20) \tag{2n}$$

On the left-hand side of (2n), g(0) denotes the third-octave low-pass output coefficient g[n] of Fig. 1, not the high-pass filter coefficient g(0).
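The dependency pattern of the octave equations in this section can be traced programmatically. The sketch below is illustrative only: it assumes six-tap filters as in (1a) and (1b), and that an output of octave j reads the previous octave's samples at a spacing of 2<sup>j-1</sup> (1, 2, and 4 for the three octaves). The helper names `taps_needed` and `input_span` are hypothetical.

```python
def taps_needed(n, stride, num_taps=6):
    """Indices of the previous-octave samples read by the output at index n."""
    return [n - k * stride for k in range(num_taps)]


def input_span(n, octave, num_taps=6):
    """Indices of the original input a(.) that an octave output at index n depends on."""
    indices = {n}
    # Walk back from the requested octave to the input signal.
    # Octave j reads its source samples with a spacing of 2**(j - 1).
    for j in range(octave, 0, -1):
        stride = 2 ** (j - 1)
        indices = {m for i in indices for m in taps_needed(i, stride, num_taps)}
    return indices


print(taps_needed(4, 2))   # samples of c used by d(4) and e(4): [4, 2, 0, -2, -4, -6]

for octave, names in [(1, "b(0), c(0)"), (2, "d(0), e(0)"), (3, "f(0), g(0)")]:
    span = input_span(0, octave)
    print(f"{names}: a({max(span)}) down to a({min(span)})")
```

Under these assumptions, a first-octave output reaches back 5 input samples, a second-octave output 15, and a third-octave output 35, which is one way to see why scheduling the three octaves on shared hardware is not trivial.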

## IV. LITERATURE REVIEW

Because the DWT requires intensive computation, several architectural solutions using special-purpose parallel processors have been proposed [9]-[10] to meet the real-time requirements of many applications. The solutions include the parallel filter architecture, the SIMD linear array architecture, the SIMD multigrid architecture [11], [12], the 2-D block-based architecture, and Aware's wavelet transform processor (WTP) [13]. The first three are special-purpose parallel processors that implement the high-level abstractions of the pyramid algorithm. The 2-D block-based architecture is a VLSI implementation that uses four multiply-and-accumulate (MAC) units to execute the forward and inverse transforms. It requires a small on-chip memory and implements the 2-D wavelet transform directly, without data transposition. However, this feature can be a drawback in certain applications. In addition, the block-based architecture may introduce block boundary effects that degrade the visual quality.

Aware's WTP is capable of computing forward and inverse wavelet transforms for 1-D input data using a maximum of six filter coefficients. It can be cascaded to execute transforms using higher-order filters. The WTP has been clocked at a speed of 30 MHz and offers 16-bit precision on input and output data. The WTP computations are executed in a synchronous pipelined fashion and are under complete user control. However, the WTP is a complex design requiring extensive user control; programming such a device is therefore tedious, difficult, and time-consuming.

The 1-D DWT and inverse DWT (IDWT) architectures are classified into three categories: convolution-based, lifting-based, and B-spline-based [14]. They are compared in terms of hardware complexity, critical path, and registers. The 2-D DWT architectures are categorized and analyzed by their external memory scan methods. The implementation issues of the internal buffer and some real-life experiments show that the area and power of the internal buffer depend strongly on the memory technology and working frequency, not only on the required memory size. For the 2-D DWT, the large amount of frame memory access and the die area occupied by the embedded internal buffer become the most critical issues [14].
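To make the lifting-based category concrete, the following sketch shows one well-known lifting factorization: the reversible integer 5/3 filter pair used in JPEG 2000, written as a predict step followed by an update step. It is offered only as an illustration of what lifting means. It is not the architecture analyzed in [14], and the function names and the mirror extension at the signal boundaries are assumptions of this example.

```python
def lifting_53_forward(x):
    """Integer 5/3 lifting on an even-length list of ints: returns (approximation, detail)."""
    n = len(x)
    half = n // 2

    def ext(i):
        # mirror the sample just past the end onto x[n - 2]
        return x[i] if i < n else x[n - 2]

    # Predict step: each odd sample minus the floor-average of its even neighbours
    d = [x[2 * i + 1] - ((x[2 * i] + ext(2 * i + 2)) >> 1) for i in range(half)]
    # Update step: each even sample plus a rounded quarter of the neighbouring details
    s = [x[2 * i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    return s, d


def lifting_53_inverse(s, d):
    """Undo the update step, then the predict step, and interleave the samples."""
    half = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    x = []
    for i in range(half):
        nxt = even[i + 1] if i + 1 < half else even[half - 1]
        x += [even[i], d[i] + ((even[i] + nxt) >> 1)]
    return x


samples = [3, 7, 1, 8, 2, 9, 4, 6]
approx, detail = lifting_53_forward(samples)
print(lifting_53_inverse(approx, detail) == samples)   # True: the steps invert exactly
```

Each step only adds to or subtracts from samples in place, so the inverse is obtained by running the same steps in reverse order with opposite signs. A convolution-based design would instead evaluate the two FIR filters directly.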

A flipping structure for the DWT was proposed by Huang et al. It aims at shortening the critical path of the lifting-based 1-D architecture and at reducing the number of pipeline registers used in the 1-D architecture as well as the size of the temporal buffer (TB) required in the line-based 2-D architecture [15]. An improved method of mapping the registers used in the 1-D DWT architecture to the TB required in the line-based 2-D DWT architecture (LBA2DDWT) reduces the memory size required in the LBA2DDWT more efficiently than Huang's method. A new parallel-based lifting scheme (PLS) has many advantages over the conventional lifting scheme and the flipping structure for the DWT and its VLSI architecture. The PLS not only reduces the critical path efficiently but also contributes to the quality of the forward DWT and IDWT implementations. Compared with Huang's method, the PLS is more efficient in reducing the critical path and the number of registers of the VLSI implementation of the 1-D DWT, and it can be concluded that the flipping structure is a special case of the PLS-based implementation [15].

The on-chip line buffer dominates the total area and power of the line-based 2-D DWT. The memory-efficient VLSI implementation scheme of [16] addresses this and consists of two parts: a word-length analysis methodology and a multiple-lifting scheme. The required word length of the on-chip memory is first determined using the word-length analysis methodology. This methodology guarantees that coefficient overflow is avoided, and the average difference between the predicted and experimental quality levels is only 0.1 dB in terms of PSNR. The multiple-lifting scheme reduces the on-chip memory bandwidth by at least 50% and the line buffer area by about 50% in the 2-D DWT module [16].

An efficient multi-input/multi-output VLSI architecture (MIMOA) for the 2-D lifting-based DWT provides a variety of hardware implementations to meet different processing speed requirements, with a controlled increase in hardware cost and simple control signals. The MIMOA designed for the 1-level 2-D DWT can be easily extended to construct an architecture for the multi-level 2-D DWT in future work. To evaluate the performance of the MIMOA, different 2-D DWT architectures have been compared. The results demonstrate that the MIMOA performs well in terms of reducing computing time and hardware cost, making it an efficient alternative for future high-speed applications [17].

A processing core architecture for the implementation of the DWT has also been proposed, optimized for throughput, scalability, and programmability. It is built on a RISC architecture with an instruction set specifically designed to facilitate the implementation of wavelet-based applications, together with a memory controller optimized for the memory access pattern of DWT processing [18].

However, the MIMOA requires more power than other lifting-based techniques, and the RISC architecture requires more hardware, which makes it bulky and costly.

A systematic high-speed VLSI implementation of the DWT based on a hardware-efficient parallel FIR filter structure can be easily achieved for an N×N image with a controlled increase in hardware cost. Compared with recently published 2-D DWT architectures with computation times of N<sup>2</sup>/3 and 2N<sup>2</sup>/3, these designs can also save a large number of multipliers and/or storage elements. The throughput rate can be improved by a factor of 4 by the proposed approach, while the hardware cost increases by a factor of about 3 [19].

By exploiting the inherent symmetry of the DWT algorithm and storing only the non-repetitive combinations of filter coefficients, the size of the required memory can be significantly reduced. The resulting memory-efficient architecture for DWT/IDWT occupies 6.5 mm<sup>2</sup> of silicon area and consumes 46.8 $\mu$W at 1 MHz and 1.2 V in 0.13 $\mu$m standard cell technology [20].

There is a clear need for designing and implementing a DWT chipset that explores the potential of the DWT, particularly in the area of decomposition algorithms and hardware implementation, and that operates in a turnkey fashion. Here, the user is required to input only the data stream and the high-pass and low-pass filter coefficients [8].

An efficient systolic array architecture for computing the DWT is presented in [21]. This VLSI architecture computes both the high-pass and low-pass frequency coefficients in the same clock cycle and thus achieves efficient hardware utilization. The design is simple, modular, and cascadable for the computation of 1-D or 2-D data streams of fairly arbitrary size. It requires only a small on-chip interface circuit for interconnection to a standard communication bus [22], [23].
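The idea of producing both filter outputs in the same clock cycle can be illustrated with a cycle-level behavioural model. The sketch below is a simplification written for this discussion and does not reproduce the cell structure of the architecture in [21]: it assumes one shared chain of registers that shifts once per clock, two sets of multipliers that read the same register contents, and hypothetical filter coefficients.

```python
from collections import deque


def shared_delay_line(samples, lowpass, highpass):
    """Shift samples through one register chain; both filters read it in the same cycle."""
    regs = deque([0.0] * len(lowpass), maxlen=len(lowpass))   # registers cleared at reset
    outputs = []
    for cycle, sample in enumerate(samples):
        regs.appendleft(sample)      # one shift per clock; the oldest sample drops out
        if cycle % 2 == 0:           # subsampling by 2: keep even-indexed outputs only
            high = sum(g * r for g, r in zip(highpass, regs))
            low = sum(h * r for h, r in zip(lowpass, regs))
            outputs.append((cycle, high, low))
    return outputs


# Hypothetical six-tap coefficient sets, chosen only to keep the arithmetic easy to follow
g = [1, -1, 0, 0, 0, 0]        # high-pass
h = [0.5, 0.5, 0, 0, 0, 0]     # low-pass
print(shared_delay_line([2, 4, 6, 8], h, g))   # [(0, 2.0, 1.0), (2, 2.0, 5.0)]
```

Because the intermediate samples live only in the register chain, no separate memory module is read or written between clock cycles in this model.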

## V. CONCLUSION

This paper serves as a resource for implementation and discussion throughout the research process and provides an explanation of the DWT. Various methodologies and their architectures have been discussed for the computation of the DWT. Compared with the other architectures, the systolic array architecture is efficient because of its hardware utilization, and it works with data streams of arbitrary size. The design is cascadable for the computation of one, two, and three decomposition levels. The main aim of our work is to decompose the data up to the third level, for which our proposed architecture is the best option.

The DWT systolic array (DWT-SA) architecture does not use any external or internal memory modules to store the intermediate results and therefore avoids the delays caused by access, read, write, and refresh timing. In addition, there is no need for complex control circuitry to move the intermediate products in and out of memory, because a set of registers controlled by a global clock is employed. This results in a simple and efficient systolic implementation for the computation of the 1-D DWT that is suitable for VLSI implementation.

## REFERENCES

[1] N. Jayant, "High Quality Networking of Audio-Visual Information", IEEE Communications Magazine, pp. 84-95, September 1993.

[2] E. A. Fox, "Advances in Interactive Digital Multimedia Systems", IEEE Computer Magazine, pp. 9-21, October 1991.

[3] P. H. Ang, P. A. Ruetz, and D. Auld, "Video Compression Makes Big Gains", IEEE Spectrum Magazine, pp. 16-19, October 1991.

[4] A. K. Jain, "Image Data Compression: A Review", Proceedings of the IEEE, vol. 69, no. 3, pp. 349-389, March 1981.

[5] K. R. Rao and P. Yip, Discrete Cosine Transform: Algorithms, Advantages, Applications. Academic Press: San Diego, 1990.

[6] H. G. Musmann, P. Pirsch, and H. J. Grallert, "Advances in Picture Coding", Proceedings of the IEEE, vol. 73, no. 4, pp. 523-548, April 1985.

[7] A. N. Netravali and B. G. Haskell, Digital Pictures. Plenum: New York, 1989.

[8] FBI Systems Technology Unit, "Gray Scale Fingerprint Image Compression Status Report", Nov. 4, 1991.

[9] K. K. Parhi and T. Nishitani, "VLSI Architectures for Discrete Wavelet Transforms", IEEE Transactions on VLSI Systems, pp. 191-202, June 1993.

[10] Y. Kang, "Low-Power Design of Wavelet Processors", Proc. of SPIE, vol. 2308, pp. 1800-1806, 1993.

[11] M. Vishwanath, "Discrete Wavelet Transform in VLSI", Proc. IEEE Int. Conf. Appl. Specific Array Processors, pp. 218-229, 1992.

[12] C. Chakrabarti, M. Vishwanath, and R. M. Owens, "Architectures for Wavelet Transforms", Proc. IEEE VLSI Signal Processing Workshop, pp. 507-515, 1993.

[13] Aware Wavelet Transform Processor (WTP) Preliminary, Aware Inc., Cambridge, MA.

[14] Chao-Tsung Huang, Po-Chih Tseng, and Liang-Gee Chen, "Analysis and VLSI Architecture for 1-D and 2-D Discrete Wavelet Transform", IEEE Transactions on Signal Processing, vol. 53, no. 4, April 2005.

[15] Cheng-Yi Xiong, Jin-Wen Tian, and Jian Liu, "A Note on 'Flipping Structure: An Efficient VLSI Architecture for Lifting-Based Discrete Wavelet Transform'", IEEE Transactions on Signal Processing, vol. 54, no. 5, May 2006.

[16] Chih-Chi Cheng, Chao-Tsung Huang, Ching-Yeh Chen, Chung-Jr Lian, and Liang-Gee Chen, "On-Chip Memory Optimization Scheme for VLSI Implementation of Line-Based Two-Dimensional Discrete Wavelet Transform", IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 7, July 2007.

[17] Xin Tian, Lin Wu, Yi-Hua Tan, and Jin-Wen Tian, "Efficient Multi-Input/Multi-Output VLSI Architecture for Two-Dimensional Lifting-Based Discrete Wavelet Transform", IEEE Transactions on Computers, vol. 60, no. 8, August 2011.

[18] Sze-Wei Lee and Soon-Chieh Lim, "VLSI Design of a Wavelet Processing Core", IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 11, November 2006.

[19] Chao Cheng and Keshab K. Parhi, "High-Speed VLSI Implementation of 2-D Discrete Wavelet Transform", IEEE Transactions on Signal Processing, vol. 56, no. 1, January 2008.

[20] Amit Acharyya, Koushik Maharatna, Bashir M. Al-Hashimi, and Steve R. Gunn, "Memory Reduction Methodology for Distributed-Arithmetic-Based DWT/IDWT Exploiting Data Symmetry", IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 56, no. 4, April 2009.

[21] A. Grzeszczak, VLSI Architecture for Discrete Wavelet Transform, M.A.Sc. thesis, Department of Electrical Engineering, University of Ottawa, Canada, 1995.

[22] M. Nireesh Kumar, J. Hemanth, and K. Durga Prasad, "VLSI Implementation of DWT Using Systolic Array Architecture", International Journal of Recent Technology and Engineering (IJRTE), ISSN: 2277-3878, Volume 1, Issue 4, October 2012.

[23] Ms. Yamini S. Bute and Prof. R. W. Jasutkar, "Implementation of Discrete Wavelet Transform Processor For Image Compression", International Journal of Computer Science and Network (IJCSN), Volume 1, Issue 3, June 2012, www.ijcsn.org, ISSN 2277-5420.

**Ms. Rashmi Patil** received her B. Eng. degree in Electronics & Communication from S.S.G.M.C.E. Shegaon, India in 2010 and her M.Tech degree in Electronics from R.T.M.N.U. Nagpur, India. Currently she is a research scholar at B.D.C.O.E., Sevagram, India. Her areas of interest are applied electronics, VLSI, VHDL, and low-power optimization.

**Dr. M. T. Kolte** has completed his B.Tech, M.E., and Ph.D in Electronics and Telecommunication. He is working as Head of Department at M.I.T.C.O.E., Pune, India. He has presented and published many papers in national and international conferences.