State Space Parallelization Method for a 16-Bit Turbo Encoder

Turbo codes are widely used in digital communication systems. Their ability to approach the Shannon channel capacity has made them the choice for most communication systems. Because of the large volume of transmitted data, the processing speed of the encoders needs to increase. Researchers have used the state space technique to raise the throughput of the turbo encoder from one bit per cycle to 8 bits per cycle, applying the method to a three flip-flop, eight-state Recursive Systematic Convolutional Code circuit. In this paper, we explore the state space technique and apply it to a four flip-flop Recursive Systematic Convolutional Code circuit to achieve a throughput of 16 bits per cycle. The circuit was designed and tested using MATLAB then implemented using FPGA to verify its


Introduction
Channel coding increases the reliability of a communication system by reducing the errors in the received data. It does so by adding redundant bits to the transmitted data so that the receiver can detect and correct errors [1]. One of the essential error detection/correction circuits is the turbo encoder. Turbo codes provide reliable transmission with a performance that approaches the Shannon limit [2,3]. Recent developments in wireless communication, used in devices such as smartphones, notebooks, and Personal Digital Assistants (PDAs), have increased the strain on network capacity [2,4]. To overcome this problem, the processing speed of the transmitted data should be increased by parallelizing the serial processes in the communication system [4]. One of the serial processing stages of channel encoding in a wireless communication system is the turbo encoder [3]. Several researchers have worked on increasing the throughput of the turbo decoder, but few have addressed the throughput of the turbo encoder. The throughput of the turbo decoder can be increased by using the radix-4 algorithm [4,5]. Radix-4 performs two processing steps per cycle, so that the processing speed is increased and the clock latency is decreased [4,5]. The turbo encoder throughput, on the other hand, can be increased using the state-space method [6,8].
The state-space method is also used to parallelize other types of encoder circuits. Researchers such as [9][10][11] used the method to increase the throughput of the Cyclic Redundancy Check circuit. The use of the state-space method to parallelize the turbo encoder circuit was introduced by Purwita, Setio, and Adiono in 2011 [6]. They implemented the method on a three flip-flop, 8-state Recursive Systematic Code (RSC) circuit such that the new circuit can generate 8 output bits per cycle. Their work was based on the Long-Term Evolution (LTE) standard, in which the turbo encoder uses 8-state RSC codes. In this work, we extend the method used by [6] so that it can generate a parallel representation for a 16-state, four flip-flop RSC circuit. The new circuit could be used in satellite communication systems [1,12]. The paper is organized as follows: Section 2 describes the turbo encoder structure. Section 3 describes the state-space representation of the new architecture of the turbo encoder. The results of the proposed algorithm are shown in Section 4, and Section 5 concludes the paper.

The Structure of Turbo Encoder
The turbo encoder encodes the input data stream and generates binary parity data that is concatenated with the transmitted data. This redundancy helps protect the data message against channel noise [1]. The general structure of a turbo encoder is shown in Figure 1: it consists of two RSC encoders concatenated in parallel, with one interleaver between them [13]. The purpose of the interleaver is to rearrange the input data stream sequence so that the data stream is protected against error bursts, which improves the error-correcting performance [14]. The interleaver makes the outputs of the two encoders completely different. Thus, even if the first encoder generates an output that suffers from a high error rate, the other encoder is likely to generate an output with a low error rate; this decreases the error and improves the performance of the communication system [15]. There are many interleaver types, but the most commonly used is the random interleaver (pseudo-random interleaver), which randomizes the binary input sequence according to a given permutation pattern [16,17]. The RSC codes used in this design have a rate of 1/3 and a constraint length of 5. The encoder has 16 states, since its 4 D flip-flop registers give 2^4 = 16 states. This type of encoder is used in satellite communications [1,12].
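The parallel concatenation described above can be illustrated with a minimal serial sketch in Python. The tap positions below (feedback 1 + D^3 + D^4, feedforward 1 + D + D^2 + D^4) are hypothetical choices for a four flip-flop RSC and are not taken from this paper; the function names and the seeded shuffle used as a pseudo-random interleaver are likewise assumptions made for illustration only.

```python
import random

# ASSUMPTION: illustrative tap positions for a 4 flip-flop (16-state) RSC.
# They are NOT the polynomials used in the paper.
FEEDBACK_TAPS = (3, 4)        # delays XORed back into the input adder
FORWARD_TAPS = (0, 1, 2, 4)   # delays XORed to form the parity bit (0 = adder output)


def rsc_encode(bits, state=(0, 0, 0, 0)):
    """Serially encode `bits`; return (parity_bits, final_state)."""
    s = list(state)           # s[d-1] holds the adder output delayed by d cycles
    parity = []
    for u in bits:
        a = u
        for d in FEEDBACK_TAPS:
            a ^= s[d - 1]     # recursive (feedback) part
        taps = [a] + s        # taps[d] = adder output delayed by d cycles
        y = 0
        for d in FORWARD_TAPS:
            y ^= taps[d]      # feedforward part
        parity.append(y)
        s = [a] + s[:-1]      # shift the register by one position
    return parity, tuple(s)


def make_interleaver(length, seed=0):
    """Hypothetical pseudo-random interleaver: a seeded permutation of indices."""
    perm = list(range(length))
    random.Random(seed).shuffle(perm)
    return perm


def turbo_encode(bits, perm):
    """Rate-1/3 output: systematic bits, parity 1, parity 2 (interleaved input)."""
    parity1, _ = rsc_encode(bits)
    parity2, _ = rsc_encode([bits[i] for i in perm])
    return list(bits), parity1, parity2
```

Each input bit yields one systematic bit and two parity bits, which is where the rate of 1/3 comes from. The sketch processes one bit per loop iteration, mirroring the one-bit-per-cycle behaviour of the serial circuit.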

State-Space Parallel Turbo Codes
The RSC is based on the Linear Feedback Shift Register (LFSR), in which the binary input ($U_k$) is propagated through the shift register flip-flops and XORed to produce the required output. The performance and throughput of the LFSR can be increased by using the state-space algorithm [7,18]. The state-space representation is a mathematical description that is used to implement the circuit in parallel and to simplify the parallelization [9]. Figure 2 shows the 16-bit RSC circuit, in which the parity bit ($Y_k$) is the output of the LFSR and the systematic bit ($X_k$) is taken directly from the input ($U_k$). The circuit state ($X_t$) changes as the time ($t$) advances to ($t+1$).
Here, 1 represents the systematic output, which is the same as the input, $g_1$ is the feedforward output, and $g_2$ is the feedback from the output to the input of the RSC encoder [15]. $g_1(D)$ and $g_2(D)$ are given by Eq. (2) and Eq. (3). The turbo encoder inputs are represented as $U_0, U_1, \ldots, U_{k-1}$. The outputs from the first RSC are $Y_0, Y_1, \ldots, Y_{k-1}$ and the outputs from the second, interleaved RSC are $Y'_0, Y'_1, \ldots, Y'_{k-1}$, where $k$ is the number of input bits in the code word. The state space equations derived from Eq. (2) and Eq. (3) are given in Eq. (4), where $X_{t+1}$ is the next state, $X_t$ is the current state, $Y_t$ is the output of the system, and $(A, B, C, D)$ are the state transition matrices. Eq. (4) can be rewritten in another form to represent the relation between the previous state and the next state as

$$X_{t+n} = A\,X_{t+(n-1)} + B\,U_{t+(n-1)} \qquad (7)$$

where $n$ is a positive integer. To find the equation that represents the relation between the previous state, the current state, and the next state of $X_t$, equation (7) can be modified as in Eq. (8). The output, denoted by $Y$, is

$$Y_{t+n} = C\,X_{t+(n-1)} + D\,U_{t+(n-1)} \qquad (9)$$

By using the multiplication results of the matrices (A, B, C, and D) and also by using Eq.
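The idea of unrolling the state-space recursion over a block of inputs can be sketched as follows. This is an illustration under stated assumptions, not the paper's derivation: the matrices A, B, C, D below come from a hypothetical four flip-flop RSC (feedback 1 + D^3 + D^4, feedforward 1 + D + D^2 + D^4), the output convention is the textbook form y_t = C x_t + D u_t, and all function names are invented for this example. All arithmetic is over GF(2), so addition is XOR.

```python
import random

N = 16  # bits processed per parallel step

# ASSUMPTION: hypothetical matrices, not the paper's.
# State x = [s1, s2, s3, s4], s1 being the most recent register.
A = [[0, 0, 1, 1], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
B = [[1], [0], [0], [0]]
C = [[1, 1, 1, 0]]
D = [[1]]


def mat_mul(p, q):
    """Matrix product over GF(2)."""
    cols = list(zip(*q))
    return [[sum(a & b for a, b in zip(row, col)) & 1 for col in cols] for row in p]


def mat_vec(m, v):
    """Matrix-vector product over GF(2)."""
    return [sum(a & b for a, b in zip(row, v)) & 1 for row in m]


def xor(u, v):
    return [a ^ b for a, b in zip(u, v)]


def lift(n):
    """Build block matrices so that n serial steps collapse into one step."""
    powers = [[[int(i == j) for j in range(4)] for i in range(4)]]  # powers[i] = A^i
    for _ in range(n):
        powers.append(mat_mul(A, powers[-1]))
    a_n = powers[n]
    # Column i of b_n is A^(n-1-i) B: the effect of input i on the final state.
    b_cols = [mat_mul(powers[n - 1 - i], B) for i in range(n)]
    b_n = [[b_cols[i][r][0] for i in range(n)] for r in range(4)]
    # Row i of c_n is C A^i: the effect of the initial state on output i.
    c_n = [mat_mul(C, powers[i])[0] for i in range(n)]
    # d_n is lower triangular: output i depends on inputs 0..i only.
    d_n = [[0] * n for _ in range(n)]
    for i in range(n):
        d_n[i][i] = D[0][0]
        for j in range(i):
            d_n[i][j] = mat_mul(mat_mul(C, powers[i - 1 - j]), B)[0][0]
    return a_n, b_n, c_n, d_n


def parallel_step(lifted, x, u_block):
    """Consume len(u_block) inputs at once; return (next_state, outputs)."""
    a_n, b_n, c_n, d_n = lifted
    y = xor(mat_vec(c_n, x), mat_vec(d_n, u_block))
    return xor(mat_vec(a_n, x), mat_vec(b_n, u_block)), y


def serial_block(x, u_block):
    """Reference: apply the one-bit recursion once per input bit."""
    ys = []
    for u in u_block:
        ys.append(mat_vec(C, x)[0] ^ (D[0][0] & u))
        x = xor(mat_vec(A, x), [row[0] & u for row in B])
    return x, ys
```

Calling `parallel_step(lift(N), x, block)` on a 16-bit block should return the same state and outputs as `serial_block(x, block)`, which is the property a parallel encoder relies on. In hardware, each row of the lifted matrices corresponds to one XOR tree evaluated within a single clock cycle.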

Implementation and Results
The 16-bit RSC model given by equations (10) and (11) was implemented in two phases. The first used MATLAB to validate the correctness of the method. The second used Quartus II to measure the design parameters. The circuit in Figure 2 is designed and implemented in VHDL under the Quartus II 11.1 software. Figure 3 shows the general RSC, which processes one input (In1) and generates one output (out1) in each cycle. The original turbo encoder circuit shown in Figure 1 is built in VHDL, and its circuit is shown in Figure 4. In this circuit, one input is fed to the first RSC to produce the systematic output and the first parity output, and the same input, interleaved, is fed to the second RSC to produce the second parity output. The input is sampled and the output is produced only when the Cleanable signal is high (logic 1) and on the positive edge of the CLK signal. The reset signal is active low (logic 0) and resets the whole circuit. These signals are useful in the encoder circuit, as in any digital circuit, to control it and to prevent the output from changing state when the input changes suddenly.

Figure 4: The original Turbo encoder circuit
Equations (10) and (11) are used to implement the parallel RSC circuit, which accepts 16 input bits and generates 16 output bits in one clock cycle. The parallel design of the 16-bit RSC circuit is shown in Figure 5. The proposed circuit used 20 registers and 41 logic elements. To check that the circuit works correctly, it was tested using ModelSim. The same data was fed to both the proposed parallel circuit and the original circuit. Both produced the same results, but in a different number of clock cycles: the parallel circuit needed only one clock cycle, whereas the serial circuit needed 16 clock cycles to generate the 16 outputs. The clock period was chosen as 100 ps. MATLAB was also used to compare the proposed parallel circuit with the serial circuit, and the simulation results were similar. Figures 6 and 7 show the simulation results of the original serial circuit and the parallel circuit. With the same 64 input bits applied to both circuits, the serial circuit takes 6400 ps to complete the processing while the parallel circuit takes only 500 ps.
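The cycle counts behind these timings can be checked with a short calculation. The sketch below assumes only what is stated above (a 100 ps clock period, 1 bit per cycle for the serial circuit, 16 bits per cycle for the parallel circuit); the function name is hypothetical, and the count covers only the cycles in which input blocks are consumed.

```python
import math

CLOCK_PERIOD_PS = 100  # clock period used in the simulation


def processing_time_ps(num_bits, bits_per_cycle, period_ps=CLOCK_PERIOD_PS):
    """Time to consume `num_bits`, counting one cycle per input block."""
    cycles = math.ceil(num_bits / bits_per_cycle)
    return cycles * period_ps


serial_ps = processing_time_ps(64, 1)     # 64 cycles
parallel_ps = processing_time_ps(64, 16)  # 4 cycles
print(serial_ps, parallel_ps)
```

The serial figure matches the 6400 ps reported. The parallel count of four cycles gives 400 ps, one clock period less than the reported 500 ps; the text does not say where the additional period arises, so this sketch does not attempt to model it.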

Conclusion
This work describes the turbo encoder and a new architecture that provides the high throughput needed for a wide range of communication systems. The new architecture is based on the state-space representation, which increases the speed of the circuit by using multiple parallel inputs. The proposed algorithm reduces the number of clock cycles of the encoder circuit by processing 16 inputs in one cycle instead of one input per cycle. The algorithm is based on the 8-bit parallelization algorithm given by [6]. When implemented on the Terasic DE2 FPGA board, the design used 20 registers and 41 logic elements, which is less than 1% of the Altera Cyclone II chip. The maximum frequency reported by the TimeQuest timing analysis was 582.07 MHz, so the maximum throughput is approximately 9.31 Gbit/s (16 bits per cycle at 582.07 MHz). It can be concluded that this turbo encoder architecture has a very high throughput, making it suitable for new communication technologies.
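The throughput figure follows from multiplying the maximum clock frequency by the number of bits produced per cycle. A minimal sketch of that arithmetic, with a hypothetical function name and using only the two numbers reported above:

```python
def throughput_mbit_per_s(f_max_mhz, bits_per_cycle):
    """Peak throughput in Mbit/s, assuming one block is processed every cycle."""
    return f_max_mhz * bits_per_cycle


F_MAX_MHZ = 582.07    # maximum frequency from the timing analysis
BITS_PER_CYCLE = 16   # parallel width of the proposed encoder

peak = throughput_mbit_per_s(F_MAX_MHZ, BITS_PER_CYCLE)
print(f"{peak:.2f} Mbit/s")
```

The product is about 9313 Mbit/s, in line with the figure quoted in the conclusion. It is a peak value: it assumes a new 16-bit block is available on every clock cycle and ignores any input or output interface limits.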