# A HIGH-SPEED, PROGRAMMABLE, CSD COEFFICIENT FIR FILTER Zhangwen Tang, Jie Zhang and Hao Min ASIC & System State-Key Laboratory, Fudan University, Shanghai 200433, P.R. China Abstract—A new high-speed, programmable FIR filter is presented, which is a multiplierless filter with CSD encoding coefficients. In this paper, we propose a new programmable CSD encoding structure to make CSD coefficients programmable. Compared with the conventional FIR structure with Booth multipliers, this coding structure improves the speed of filter and decreases the area. In the end of this paper, we design a 10-bits, 18-taps video luminance filter with the presented filter structure. The completed filter core occupies 6.8-6.8 mm² of silicon area in 0.6-µm 2P2M CMOS technology, and its maximum work frequency is 100MHz. Index terms— Finite Impulse Response filter, Application Specific Integrated Circuit, Canonic Signed-digit Encode, Booth Multiplier, Wallace Adder Tree. ### I. INTRODUCTION Finite Impulse response (FIR) filters have been used in consumer electronics products more and more widely, such as video and communication circuits. Hence, higher performance in speed and area is demanded. The traditional FIR filter structure [1], as shown in Fig. 1, has not yet met the high-speed demand of high performance systems because of the limit of multiplier and adder circuits in speed and area. The transform function of FIR filter is described by $$y(n) = \sum_{k=0}^{M-1} h(k)x(n-k)$$ (1) The critical path of the FIR filter in Fig. 1 is $\Gamma_M+MT_A$ , $T_M$ is the delay of one multiplication, $T_A$ is the delay of one adder, and M is the tap number. It is evident that the critical path is rapidly increasing with the tap number of FIR filter. High-speed digital filtering applications (such as, sample rates in excess of 20MHz) generally require the use of custom application specific integrated circuits (ASICs), because programmable signal processors (such as DSPs) Fig. 1. Structure of traditional FIR filter cannot accommodate such high sample rates without an excessive amount of parallel processing. And for dedicated applications, the flexibility of a filter with high-speed multipliers [2] is not necessary. In this paper, we present a new high-speed, CSD coefficient FIR filter structure. Through studying CSD coefficient filters, Booth multipliers and high-speed adders, we propose a new programmable CSD encoding structure to make CSD coefficients programmable. With this structure, we can implement any order of high-speed FIR filters, and the critical path is almost not proportional to the tap number. In the following, we will respectively address CSD encoding in FIR filters in the second part, programmable CSD encoding structure in the third part and the structure of partial-product adder tree in the fourth part. In the end, we adopt the presented structure to implement a 10-bits, 18-taps video luminance filter. #### II. CSD ENCODING IN FIR FILTER In many applications, coefficients are fixed in FIR filter. Thus we can simply shift the data bus to the left or right by an appropriate number of bits and employ a small number of adders/subtracters instead of multipliers. The resulting hardware complexity is a small fraction of the complexity of a general filter with multipliers and thus a significantly larger number of taps can be integrated into a single chip. As we all know, any fraction can be described by [3]. $$x = \sum_{k=1}^{L} s_k 2^{-p_k} \tag{2}$$ Contributed Paper This work is supported by AM (Applied Material) Funds (NO. 0108) of shanghai, in P.R.China. Zhangwen Tang, Jie Zhang and Hao Min are with ASIC & System State-key Laboratory, Fudan University, 220 Handan Road, Shanghai 200433, P.R.China. Fig. 2. Frequency response and CSD coefficients for 10bits, 18-taps luminance filter Fig. 3. Structure of one tap CSD encoding where $s_k \in \{-1,0,1\}$ and $p_k \in \{0,1,...,M\}$ . The representation given by (2) has M+1 total (ternary) digits and L nonzero digits. Canonic signed-digit (CSD) representation is defined as the minimal representation for which has no two nonzero digits sk being adjacent. Thus the number of adders/subtracters required to realize a CSD coefficient is one less than the number of nonzero digits in the fraction. For any coefficient in FIR filters can been translated into CSD coefficient [4], we develop a MATLAB program to generate and optimize the CSD coefficients of general FIR filter. The CSD coefficients and frequency response diagram for a 10-bits, 18-taps luminance filter are shown as Fig. 2. ## III. PROGRAMMABLE CSD ENCODING STRUCTURE The complexity of FIR filter can decrease rapidly with CSD coefficient multipliers instead of fixed coefficient multipliers, but the compatibility decreases too. In this paper we employ a new programmable CSD encoding structure to decrease the complexity and increase the compatibility. Fig. 3 is one-tap CSD encoding structure. Its CSD coefficients are generated by MATLAB program. There are no more than three nonzero digits in one CSD coefficient. The shift operation according to the position of nonzero digit is shown as Table 1. | Table 1. CSD Encoding Operation | | | | |---------------------------------|------------|--------------------|--| | CSD | Definition | Operation | | | Encoding | | | | | 00000 | 2° | Shift 0 bit | | | 00001-01110 | 2-1~2-14 | Shift 1 to 14 bits | | | 01111 | 0 | Zero | | | 10000 | -2° | Shift 0 bit and | | | | | Negate | | | 10001-11110 | 2-1~2-14 | Shift 1 to 14 bits | | and Negate Zero 11111 Table 1 CSD Encoding Operation The input of CSD encoding structure is 5-bits' signed binary number. MSB is a signed bit, which represents negating operation. Four low bits represent the number of shifting right. If four low bits are all one "1111", the output is all zero. In the end of this structure, three outputs (partial products) of one tap are added together. O Employing the above programmable CSD encoding structure, the partial-product number and the internal data length both decrease. As we all know, Nbits×Nbits Booth multiplier has [N/2] partial products [5] [6] and the internal data length is 2N bits. However, CSD encoding structure has only three partial products and the internal data length is a little larger than N (usually, N+2 or N+4) to guarantee the truncation error less than quantization noise. Thus the programmable CSD encoding structure is more advantageous in the complexity and compatibility than filter structure with Booth multipliers. ### IV. PARTIAL PRODUCT ADDER TREE ## A. Wallace Adder Tree in Booth Multiplier Multiplier is a fundamental unit in digital signal processing circuits, and most of multipliers (Fig. 4) in [5] [6] employ modified Booth algorithm and parallel Wallace adder tree. It consists of three modules: Booth encoder, partialproduct adder array and final adder. By employing modified Booth algorithm, the number of partial products decreases to half. If the bit number of multiplier is N, then there are [N/2] partial products. The partial product adder array adds the whole [N/2] partial products and generates 2N-bits Carry and Sum. In the interest of improving the parallelity, it adopts 4:2 compression adder instead of 3:2 full adder in partial product adder array. In the end, the final adder adds the 2N-bits C (Carry) and S (Sum) and generates the multiplier product. ## B. Partial Product Adder Tree in FIR Filters In the FIR filter design, we propose a special adder tree for FIR filters through studying the above Wallace adder tree used in Booth multiplier. The partial-product adder tree used in FIR filters is shown as Fig. 5. There are three Fig. 4. Wallace adder tree in Booth multiplier Fig. 5. Adder array in FIR filter partial products in one-tap CSD encoding structure, thus M-taps FIR filter has 3M partial products. In order to add all these partial products, we need [log<sub>2</sub>3M] level partial-product adders (4:2 compression adder) to generate two partial products C (Carry) and S (Sum). ## C. Final Adder The critical path of partial product adder array in FIR filter is only the delay of [2×log<sub>2</sub>3M] serial full adders. However, final adder is a (N+2 or N+4) bits full adder (2 or 4 is the guarantee bits to decrease the truncation error), its delay is greater than the one in adder array. Thus we need to adopt carry-selected adder (CSA) with carry look-ahead adder (CLA). Carry look-ahead adder is shown as Fig. 6, which employs two 4-bits CLA to buildup one 8-bits CSA. It can be concluded that the critical path is one 8-bits CLA and some multiplexers, thus the adder of this structure can work at high speed. Fig. 6. Carry selected adder with carry look-ahead adder ## V. IMPLEMENTATION OF PROGRAMMABLE CSD FIR FILTER A programmable CSD coefficient filter (Fig. 7) is implemented with the above modules. The top is programmable CSD encoding structure, the middle is a 4:2 compression adder tree, and the bottom is a carry-selected adder with carry look-ahead adders. To eliminate the DC gain, the filtered signal is output to a DC gain cancelled module. The presented filter is more advantageous in the speed and area than the conventional filter with Booth multipliers. Performance comparison between two structures is shown in Table 2. $T_{ca}$ is the delay of one 4:2 compression adder. $T_{fa}$ is the delay of one final adder. From Table 2, it can be concluded that the critical path of the proposed structure is almost not proportional to the tap number M. Fig. 7. Programmable, CSD coefficient FIR filter Table 2 Performance comparison | Table 2. Performance comparison | | | | | |-----------------------------------|--------------------------------------------------------------------------|--|--|--| | Proposed | Conventional | | | | | structure | structure | | | | | 3 | [N/2] | | | | | | | | | | | N+2(4) | 2×N | | | | | | | | | | | [2×log <sub>2</sub> 3M] | [2×log <sub>2</sub> [N/2]] | | | | | $\times T_{ca}$ | ×T <sub>ca</sub> | | | | | [2×log <sub>2</sub> 3M] | $[2 \times \log_2[N/2]]$ | | | | | ×T <sub>ca</sub> +T <sub>fa</sub> | ×T <sub>ca</sub> +M*T <sub>fa</sub> | | | | | | Proposed structure $3$ N+2(4) [2×log <sub>2</sub> 3M] ×T <sub>ca</sub> | | | | In digital video encoder (DVE) system, we design a 10-bits, 18-taps high-speed luminance filter using the proposed FIR filter structure. Its frequency response diagram is shown as Fig. 2. To download the coefficients into FIR filters, we design a slave-mode I<sup>2</sup>C bus controller. The whole chip is implemented in 0.6-µm 2P2M CMOS technology. The filter core area is 6.8mm×6.8mm, the maximum work frequency is 100MHz. ### VI. CONCLUSION In this paper we present a new high-speed, programmable FIR filter, which is a multiplierless filter with CSD encoding coefficients. A new programmable CSD encoding structure is proposed to make CSD coefficients programmable. Compared with the conventional FIR structure with Booth multiplier, this coding structure improves the speed of filter and decreases the area. And we design a 10-bits, 18-taps video luminance filter with the presented filter structure. The completed filter core occupies 6.8×6.8 mm<sup>2</sup> of silicon area in 0.6-µm 2P2M CMOS technology, and its maximum work frequency is 100MHz. ## VII. ACKNOWLEDGMENT The authors wish to thank Wenhong Li for providing the help in using EDA tools and Yawei Guo, Cheng Chen for DRC and LVS of the layout of the whole chip. #### VIII. REFERENCE - [1] Zhou Yaohua et al., *Digital Signal Processing* (Fudan University Press, Shanghai, 1996), pp.65. - [2] Jun Rim Choi, Lak Hyun Jang, et al., "Structured Design of a 288-TAP FIR Filter by Optimized Partial Product Tree Compression", IEEE J. Solid-State Circuit vol 32, pp. 468-476, March, 1997. - [3] A.Avizienis, "Signed Digit Number Representation for Fast Parallel Arithmetic," *IRE Trans. Electron. Computers*, vol. EC-10, pp.389-400,1961. - [4] H. Samueli, "An improved search algorithm for the design of multiplierless FIR filters with powers-of-two coefficients", IEEE Trans. Circuits Syst., vol.36, pp. 1044-1047, July 1989. - [5] M. Nagamatsu et al., "A 15ns 32×32-bit CMOS Multiplier with an Improved Parallel structure," *IEEE J. Solid-State Circuit* vol 25, pp. 494-497, April, 1990. - [6] Junji Mori, et al, "A 10ns 54×54-b parallel structured Full Array Multiplier with 0.5-μm COMS Technology," *IEEE J. Solid-State Ciruit* vol 26. pp. 600-606, April, 1991. Zhangwen Tang received the B.S. and M.S. degree in Electrical Engineering from Fudan University, Shanghai, P.R.China in 1999 and 2001 respectively, and now is pursuing the Ph.D. from Fudan University. From 1999 he was with the ASIC and System State-Key Laboratory, where he worked on the design of DVE (Digital Video Encoder). His research interests include digital video signal processing, mixed signal VLSI deign and CMOS TV tuner. **Hao Min** received the B.S. and M.S. degree in electrical engineering from Fudan University, Shanghai, P.R.China in 1985 and 1988 respectively. In 1991 he received the Ph.D. degree in material science from Fudan. From 1996 to 1998 he was a visiting scholar in Stanford University, CA. He is now the director of ASIC and System State-key Laboratory. He has published more than ten technical papers in journals and conferences. His research interests include mixed signal VLSI design and digital signal processing.