Design of Hierarchical Architecture of Multilevel Discrete Wavelet Transform Using VHDL Language

The widespread use of devices that rely on image processing, such as cellular phones and digital cameras, increases the need for specialized processors for these functions as a replacement for software programs that consume more time and resources. This paper presents a hardware description of a discrete wavelet transform (DWT) module in the VHDL language. The design involves the forward DWT (fDWT) and its inverse (iDWT), characterized by a variable number of transformation levels ranging from one to seven. Each of these two modules is designed as a hierarchical scheme that uses a one-dimensional processing module twice to represent two-dimensional processing. The module can be used repeatedly on the same image for multilevel processing. Three versions of the design are presented (v64, v128 and v256), each adapted to a different image size. The synthesis process showed that the design frequency is about 56 MHz. The simulation process showed that the maximum possible rounding error is about 0.012%. This resolution, together with the variable number of processing levels, allows the design to fit many applications. Finally, a comparison of the proposed design with other related work is presented, considering performance and specifications.


Introduction
The growing demand for image and video data transmission increases the pressure to develop new image compression technology. Among the several compression standards available, the JPEG image compression standard is in widespread use today [1]. JPEG uses the Discrete Cosine Transform (DCT) as the transform, applied to 8-by-8 blocks of image data. The newer standard, JPEG2000, is based on the Discrete Wavelet Transform (DWT) [2]. The wavelet transform offers multiresolution image analysis, which appears to be well matched to the low-level characteristics of human vision. The DCT is essentially unique, but the wavelet transform has many possible realizations [3].
The most basic wavelet transform is the Haar discrete transform, as it encapsulates the basic concepts of the wavelet transforms used today. The Haar transform works well (provides a relatively sparse wavelet representation) for signals that are approximately piecewise constant [4].
The Haar wavelet transform describes the image in terms of a coarse overall shape, plus some details that range from broad to narrow. The Haar wavelet transform is applied iteratively on an image to generate a multilevel decomposition. At decomposition level l, 3l + 1 subbands are produced.
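The 3l + 1 count follows because each new level replaces the current LL subband with four smaller subbands, a net gain of three. A minimal Python sketch of this bookkeeping is given below; the function names `subband_count` and `ll_size` are hypothetical and chosen only for illustration.

```python
def subband_count(levels: int) -> int:
    """Number of subbands after `levels` decompositions (3 * levels + 1)."""
    if levels < 1:
        raise ValueError("levels must be at least 1")
    return 3 * levels + 1


def ll_size(image_size: int, levels: int) -> int:
    """Side length of the LL subband after `levels` decompositions.

    Each level halves the side length of the region still being processed.
    """
    return image_size >> levels


# Print level, subband count and LL side length for a 256-pixel-wide image.
for level in (1, 4, 7):
    print(level, subband_count(level), ll_size(256, level))
```

For four levels the count is 13, which agrees with the four-level decomposition described in the caption of Figure (1).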
A one-level Haar wavelet transform decomposes the original image (see Fig. 1.a) into four subbands: LL, LH, HL and HH (see Fig. 1.b). The Haar wavelet coefficients are computed by Eq(1) and Eq(2), where x(i) and x(i + 1) are the current and next pixel values of an image, respectively.
The original pixel values can be retrieved in the iDWT process from the High and Low components according to Eq(3) and Eq(4) below:
... (3)
... (4)
In this paper, the hardware description of a DWT chip in the VHDL language is addressed. The DWT module specifications (iDWT, fDWT, and their algorithms) are reviewed in the next section. Section three discusses simulation results, and final conclusions are presented in section four.
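For reference, the Haar pair relations cited as Eq(1) through Eq(4) can be sketched in software. The sketch below assumes the standard orthonormal Haar form, Low = (x(i) + x(i+1))/√2 and High = (x(i) − x(i+1))/√2, with the inverse obtained by the same sum and difference divided by √2. The sign convention and the names `haar_forward` and `haar_inverse` are assumptions for illustration, not taken from the paper.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(x0: float, x1: float) -> tuple[float, float]:
    """Assumed form of Eq(1)/Eq(2): Low and High components of a pixel pair."""
    low = (x0 + x1) / SQRT2
    high = (x0 - x1) / SQRT2
    return low, high


def haar_inverse(low: float, high: float) -> tuple[float, float]:
    """Assumed form of Eq(3)/Eq(4): recover the pixel pair from Low and High."""
    x0 = (low + high) / SQRT2
    x1 = (low - high) / SQRT2
    return x0, x1


# Round trip on one pixel pair.
low, high = haar_forward(200.0, 100.0)
x0, x1 = haar_inverse(low, high)
print(low, high, x0, x1)
```

In floating point the round trip recovers the pair up to rounding, but each pair costs two divisions by √2, which is awkward in hardware.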

Design layout and algorithms
The overall design is composed of a two-dimensional DWT module (DWT-2D), designed as synthesisable VHDL code, in addition to a memory unit designed for simulation only. The DWT-2D module is built around a one-dimensional DWT module (DWT-1D), which represents the main part of the design (see Fig. 2).
The discussion and figures below focus on the fDWT architecture; however, the designs of the fDWT and iDWT are almost identical. This information is presented to the DWT-2D module through the Control Bus. The image size is configurable only before programming the target device (FPGA, CPLD, etc.) and must be assigned one of three values: 256, 128 or 64. As a result, the final design comes in three versions, as shown later in the simulation results.

The DWT-2D module then applies the internal reset signal int_Reset to the DWT-1D module and provides it with the information needed (through the int_Control Bus) to perform a horizontal or vertical pass, as follows:
1. Start address of the memory space occupied by the current line of pixels.
2. Start address of the memory space for temporary data storage for the current line of pixels.
4. Width of the current line (equal to the image size at level one, then halved at every new level).
5. Number of lines in the current image section (equal to the image size at level one, then halved at every new level).
The piece of code employed to implement the reset phase and the address-vector receiving phase of the DWT-1D module is shown below:
[...] i+1) for the iDWT) are calculated. The calculations are performed according to Eq(5) through Eq(8) instead of Eq(1) through Eq(4) to avoid the extra calculation needed to divide by the value [...]. The piece of code employed to implement Eq(5) and Eq(6) is shown below (note that the division by two is implemented as a right shift):
The design employs similar code in the iDWT for Eq(7) and Eq(8), without the right-shift operation.
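As a software illustration of the shift-based arithmetic, the Python sketch below assumes that Eq(5) and Eq(6) are the pair sum and pair difference, each halved by a one-bit right shift, and that Eq(7) and Eq(8) are the plain sum and difference of the Low and High values. These forms and the function names are assumptions; this is not the paper's VHDL code.

```python
def forward_pair(x0: int, x1: int) -> tuple[int, int]:
    """Assumed Eq(5)/Eq(6): halve the sum and difference with a right shift."""
    low = (x0 + x1) >> 1   # division by two as an arithmetic shift
    high = (x0 - x1) >> 1  # the shift rounds toward minus infinity
    return low, high


def inverse_pair(low: int, high: int) -> tuple[int, int]:
    """Assumed Eq(7)/Eq(8): sum and difference, with no shift needed."""
    return low + high, low - high


# An even sum reconstructs exactly.
exact = inverse_pair(*forward_pair(200, 100))
# An odd sum loses one least significant bit on the first sample.
rounded = inverse_pair(*forward_pair(5, 2))
print(exact, rounded)
```

When x(i) + x(i+1) is odd, both shifts drop a half, so under these assumptions the first reconstructed sample comes out one unit low while the second is exact. This illustrates how a shift-based design can introduce rounding error.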
After processing one line of the input image, the DWT-1D module asserts the int_Ready signal to the DWT-2D module, which in turn provides the DWT-1D module with the information for the next line, if any. When all lines in the horizontal pass are finished, the vertical pass is initiated.
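The horizontal and vertical passes described above can be sketched in software as two loops that reuse one line-processing routine, mirroring the way the DWT-2D module reuses the DWT-1D module. The sketch is a behavioural model only: the names `transform_line` and `transform_one_level`, the low-half/high-half output layout and the shift-based arithmetic are all assumptions for illustration.

```python
def transform_line(line: list[int]) -> list[int]:
    """Process one line: halved sums in the first half, halved differences in the second."""
    half = len(line) // 2
    low = [(line[2 * i] + line[2 * i + 1]) >> 1 for i in range(half)]
    high = [(line[2 * i] - line[2 * i + 1]) >> 1 for i in range(half)]
    return low + high


def transform_one_level(image: list[list[int]], size: int) -> list[list[int]]:
    """Run a horizontal pass, then a vertical pass, over the top-left size x size region."""
    # Horizontal pass: every row of the region is one line.
    for r in range(size):
        image[r][:size] = transform_line(image[r][:size])
    # Vertical pass: every column of the region is one line.
    for c in range(size):
        column = transform_line([image[r][c] for r in range(size)])
        for r in range(size):
            image[r][c] = column[r]
    return image


image = [
    [10, 10, 20, 20],
    [10, 10, 20, 20],
    [30, 30, 40, 40],
    [30, 30, 40, 40],
]
result = transform_one_level(image, 4)
for row in result:
    print(row)
```

For this piecewise-constant test image all detail coefficients are zero and the top-left quadrant holds the 2×2 block averages.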

Level updating
If more than one transformation level is required, the overall process is repeated. The DWT-2D module reinitializes the DWT-1D module with the necessary information for the new image (which represents the LL section of the image from the previous level) to start new horizontal and vertical passes. When no more levels remain, the DWT-2D module generates the Ready signal to the outside, indicating the end of the process. The complete process is demonstrated in the flowchart shown in Fig. 3.
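The level-updating loop can be modelled in software as repeating the two passes on a region whose side is halved after every level, so that only the LL section is processed again. The following Python sketch is a behavioural model under assumed conventions (shift-based arithmetic, LL kept in the top-left corner); all function names are hypothetical.

```python
def transform_line(line: list[int]) -> list[int]:
    """Halved pair sums in the first half, halved pair differences in the second."""
    half = len(line) // 2
    low = [(line[2 * i] + line[2 * i + 1]) >> 1 for i in range(half)]
    high = [(line[2 * i] - line[2 * i + 1]) >> 1 for i in range(half)]
    return low + high


def transform_level(image: list[list[int]], size: int) -> None:
    """Horizontal pass followed by vertical pass on the top-left size x size region."""
    for r in range(size):
        image[r][:size] = transform_line(image[r][:size])
    for c in range(size):
        column = transform_line([image[r][c] for r in range(size)])
        for r in range(size):
            image[r][c] = column[r]


def transform_multilevel(image: list[list[int]], levels: int) -> list[list[int]]:
    """Repeat the one-level transform on the LL section, halving its side each level."""
    size = len(image)
    for _ in range(levels):
        if size < 2 or size % 2 != 0:
            raise ValueError("region cannot be halved further")
        transform_level(image, size)
        size //= 2  # the next level works on the LL section only
    return image


image = [
    [10, 10, 20, 20],
    [10, 10, 20, 20],
    [30, 30, 40, 40],
    [30, 30, 40, 40],
]
result = transform_multilevel(image, 2)
for row in result:
    print(row)
```

After two levels on this 4×4 example, the single remaining LL value is the overall average of the image (25), and the loop stops without touching the detail subbands produced at level one.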

Memory Unit Design
The memory unit needs to be twice the image size. For simulation purposes, the memory was modelled using non-synthesisable VHDL code. During simulation, the recorded process time (see Table 3) [...]

Simulation Results
Simulation was performed for various image sizes and for several transformation levels at each size. Fig. 4 shows three copies of the "cameraman" image (256×256, 128×128 and 64×64, respectively) with two different transformation levels for each size.
To estimate the rounding error produced by the arithmetic operations, a comparison was made between the original image and the resultant image after the fDWT and iDWT processes. The comparison includes the cases shown in Fig. 4.
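One plausible way to express such a comparison in software is sketched below. It assumes the error is reported as the largest absolute pixel difference relative to the full-scale value of an 8-bit image (255). The paper does not state its exact error definition, so this metric and the name `max_error_percent` are assumptions for illustration only.

```python
def max_error_percent(original, reconstructed, full_scale=255):
    """Largest absolute pixel difference, as a percentage of full scale (assumed metric)."""
    worst = max(
        abs(o - r)
        for row_o, row_r in zip(original, reconstructed)
        for o, r in zip(row_o, row_r)
    )
    return 100.0 * worst / full_scale


original = [[100, 200], [50, 25]]
reconstructed = [[100, 199], [50, 25]]
print(max_error_percent(original, reconstructed))
```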

Conclusions
In this paper, a hardware description of a discrete wavelet transform (DWT) module in the VHDL language is presented. The module performs a two-dimensional wavelet transform on images of size 64×64, 128×128 and 256×256 pixels. The synthesis process showed that the three versions are similar in maximum frequency (about 56 MHz) and in the number of slices used in the target device (about 600 slices), while they differ in the number of clock cycles required for coding. In addition to image size, the number of clock cycles also depends on the number of levels required, and it ranges from 62,729 to 1,320,827 clock cycles. The simulation process shows that a maximum rounding error of 0.012% was recorded at the 7th transformation level with image size 256×256. According to the comparison results, the proposed design takes longer to perform processing but utilizes fewer slices in the target device. The competitive size of this design helps the module fit in many embedded and mobile applications, while its fine resolution and the [...]
Figure (1) Wavelet multiresolution property of an image: (a) the original image; (b) a one-level decomposition produces 4 sub-bands, namely LL, LH, HL and HH; (c) a four-level decomposition produces 13 sub-bands.

Eng. & Tech. Journal, Vol. 28, No. 7, 2010

Table (2) Device utilization summary and timing summary for the three versions of the design (as stated by the place and route reports)
Table (3) Number of clock cycles required for coding completion (for all possible combinations of image size and number of transformation levels)

Table (4) Comparison of the proposed design with other related works

Figure (2) Layout of the proposed architecture for the fDWT