# Simulation Design of A General Purpose Microprocessor Based on FPGA

Sabah Abdul-Hassan\*

Received on: 8/11/2009 Accepted on: 3/6/2010

#### Abstract

A general-purpose microprocessor is one that can execute the usual set of instructions, such as arithmetic, logic, branching, and other control instructions. It consists of a set of registers to store data, a program counter, a stack pointer, etc. In this work, a processor system is proposed, designed, simulated using MATLAB SIMULINK, and implemented using FPGA implementation tools. The simulation is done using MATLAB because of its simple building and testing facilities. The system is then implemented using Field Programmable Gate Arrays (FPGAs). Implementation on FPGAs has the advantage that the design can be debugged, updated, and expanded simply by reprogramming the FPGA chip, without replacing the hardware. The system is implemented on two chips for comparison purposes. The first is the **VX1000** (G680), which contains 12288 slices, and the second is the **XCV800** (FG676), which contains 9408 slices. Implementation results show a higher percentage use of the smaller chip's resources but faster operation, due to the shorter internal paths used.

# Simulation and Design of a General-Purpose Processor Using Programmable Logic Gate Array Technology

#### Abstract (translated from the Arabic)

A general-purpose microprocessor is one that can execute the conventional set of instructions, such as arithmetic, logic, branching, and other control instructions. It consists of a set of registers for storing data, a program counter, a stack pointer, and so on. In this work, the system and internal structure of a microprocessor are proposed, and its operation is designed, simulated, and implemented. The simulation was carried out using MATLAB because of its strong facilities for building and testing systems. The system was then implemented using programmable logic gate array technology, since implementation with this technology allows updating and expansion simply by reprogramming the array chip, without replacing the physical component. The system was implemented using two types of chips: the first implementation used the VX1000 (G680) chip, which contains 12288 slices, and the second used the XCV800 (FG676) chip, which contains 9408 slices. The implementation results showed a higher utilization of the smaller chip's resources but faster operation, because shorter internal paths are used.

#### \*Electrical and Electric Engineering Department, University of Technology /Baghdad

https://doi.org/10.30684/etj.28.15.7

2412-0758/University of Technology-Iraq, Baghdad, Iraq This is an open access article under the CC BY 4.0 license <u>http://creativecommons.org/licenses/by/4.0</u>

# Introduction

The use of FPGAs in modern electronic systems is gradually replacing ASIC (application-specific integrated circuit) techniques. FPGAs (field-programmable gate arrays) are integrated chips that can be programmed to implement any hardware configuration. With a suitable programming tool, that configuration may be changed later without any rewiring. A design implemented on an FPGA can be debugged, updated, and expanded simply by reprogramming the FPGA chip, without replacing the hardware [1].

In this work, FPGAs are used to implement a microprocessor system with the basic features of a traditional microprocessor, in order to study the ability of FPGAs to build complex processing systems.

# Current Research on Using FPGAs in Processor Design

The use of FPGAs in processor systems is increasing because of their powerful features.

Keith Bergevin of the Defense Microelectronics Activity redesigned a microprocessor-based system using FPGAs to solve the problem that the microprocessors used in the Evolved Sea Sparrow Missile (ESSM) system were no longer in production and no alternate sources were available. He selected the Xilinx Virtex XCV300 FPGA to replace the obsolete microprocessor and implemented a replacement that matched the original processor [2].

In other cases, only ALUs (arithmetic and logic units) are designed using an FPGA.

Shamsiah Suhaili and Othman Sidek of the School of Electrical & Electronic Engineering, Engineering Campus, Universiti Sains Malaysia, designed an ALU that combines a 32-bit single-precision floating-point adder and an integer ALU into a single unit on an FPGA. The design was implemented in the Verilog hardware description language (HDL), and a Spartan-IIE XC2S300E FPGA from Xilinx was used [3].

# **Microprocessors Overview**

The main features that describe how a microprocessor works are the instruction set and the widths of the data bus and address bus, in addition to the physical parameters related to technology, such as the operating speed and the operating voltages and currents [4, 5, 6]. The physical parameters are not considered in this work; the focus is only on the execution capabilities of the processor.

An internal block diagram of a general-purpose microprocessor is shown in Figure (1). The main units of the processor are the instruction decoding unit; the register set, including the accumulator, program counter, stack pointer, and the program status word, which contains the usual flags such as the zero and carry flags; and the interconnection buses between the registers and the decoding and execution units.

The sizes of the registers vary and may reach 128 bits in modern processors, while the width of the address bus is related to the size of the memory usually used with the processor [5, 7, 8].

Modern processors include other units such as cache memory, dedicated floating-point units, and extra units for executing multimedia instructions [7, 8].

# **FPGAs Overview**

The acronym FPGA stands for field-programmable gate array. FPGAs are ICs that consist of raw AND-OR combinations, flip-flops, look-up tables, and other basic logic units. An FPGA chip usually has internal metal routes that can connect the various logic input and output nodes inside the chip [9].

The main advantage of using FPGAs to implement systems is *reconfigurability*. This means that when the design is to be changed (whether the change is an error correction or a development in the structure of the design), the change can be made easily in software. The same chip will then perform the new, modified, or corrected job. This important feature reflects the power of FPGAs [1, 10].

This feature comes at the expense of size: an FPGA that works as an FFT computation unit, for example, is larger than an IC designed only to work as an FFT computation unit. This is a modest price to pay for the generality and reprogrammability of the FPGA [9].

FPGAs come in different configurations, designs, and technologies. The main building blocks of an FPGA are the CLBs (configurable logic blocks), each of which contains a number of gates, look-up tables (LUTs), and flip-flops. The number of input and output pins on an FPGA chip, used to feed in a signal and read out the processed one, varies from tens to more than 500 in some large chips.

The best-known FPGA manufacturers include Xilinx and Altera. The Xilinx chips include the Spartan, XC, Virtex, and other series. A block diagram of the main structure of an FPGA chip is shown in Figure (2).

# The Proposed Design of the Microprocessor

In this work, a general-purpose microprocessor design is proposed, simulated using MATLAB SIMULINK, and implemented using the tools of Xilinx Foundation 2.1i. The proposed processor is shown in Figure (3). The basic components of the processor are:

- The instruction decoding unit: this unit is responsible for determining the type of the instruction read by the processor and initiating the set of control actions that execute that instruction. It consists of a register used to store the instruction, a decoder, and a control unit.
- The register set: four registers, AX (accumulator), BX, CX, and DX, are used in addition to the flags register, the program counter, and the stack pointer. The registers are all 16 bits wide, and each is divided into an upper and a lower 8-bit byte. The flags register consists of four flag bits: carry, zero, negative (sign), and the auxiliary carry flag, which signals overflow in operations on the lower 8-bit bytes of the registers.
- The interconnections: these include a set of controlled buses and multiplexers that are responsible for delivering the required data to the registers during the execution of instructions.

#### **Internal Details of the Processor**

Internally, every register in the processor is simply an array of 16 flip-flops connected together, triggered by the main system clock and enabled by a signal coming from the write unit, which is considered part of the control unit. This applies to the accumulator and the BX, CX, and DX registers. Figure (4) shows the accumulator unit inside the processor.
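The register behaviour described above can be sketched in software. The following is a minimal behavioural model in Python, not the SIMULINK or FPGA design itself; the class name `Register16` and its methods are hypothetical names chosen for illustration.

```python
class Register16:
    """Behavioural model of a 16-bit register built from enabled flip-flops.

    Hypothetical illustration: the paper's register is a gate-level design.
    """

    MASK = 0xFFFF  # 16 flip-flops, one per bit

    def __init__(self, value=0):
        self.q = value & self.MASK  # current flip-flop outputs

    def clock(self, d, enable):
        """Model one active clock edge: load d only when enable is asserted."""
        if enable:
            self.q = d & self.MASK
        return self.q

    @property
    def high(self):
        """Upper 8-bit byte of the register."""
        return (self.q >> 8) & 0xFF

    @property
    def low(self):
        """Lower 8-bit byte of the register."""
        return self.q & 0xFF


acc = Register16()
acc.clock(755, enable=True)    # write unit asserts enable: value is stored
acc.clock(1234, enable=False)  # enable is low: the register holds its value
print(acc.q, acc.high, acc.low)
```

The enable input is what lets a single shared clock drive every register while only the selected register changes on a given cycle.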

The write unit shown in Figure (5) is responsible for enabling the writing process for all registers in the processor. It consists of sub-units, each enabling a certain register. For example, the accumulator write unit shown in Figure (6) is designed from the instructions in Table (1) that alter the accumulator value.
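As a rough illustration of how such an enable signal could be derived, the sketch below lists the Table (1) codes that change the accumulator and tests an instruction word against them. It is a reading of Table (1), not the gate-level circuit of Figure (6); the function and constant names are hypothetical.

```python
# Instruction types from the upper two bits B5B4, in Table (1) column order.
ALU, TRANSFER, JUMP, STACK = 0b00, 0b01, 0b10, 0b11

# Transfer codes (B3B2B1B0) whose destination is the accumulator in Table (1):
# B->A, C->A, D->A, PC->A, SP->A, M->A.
TRANSFERS_INTO_A = {0b0101, 0b0110, 0b0111, 0b1000, 0b1001, 0b1011}

POP_A = 0b0001  # stack-column code for POP A


def accumulator_write_enable(instruction):
    """Return True when the 6-bit instruction alters the accumulator."""
    kind = (instruction >> 4) & 0b11
    detail = instruction & 0b1111
    if kind == ALU:
        return True  # every arithmetic/logic entry in Table (1) writes A
    if kind == TRANSFER:
        return detail in TRANSFERS_INTO_A
    if kind == STACK:
        return detail == POP_A
    return False  # jump instructions never write the accumulator


print(accumulator_write_enable(0b000001))  # A=A+B
print(accumulator_write_enable(0b010000))  # A->B
```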

The arithmetic and logic instructions are executed inside the ALU (arithmetic and logic unit) shown in Figure (7), which consists of two main parts: the arithmetic unit (AU) and the logic unit (LU). Each unit executes four types of instructions, selected by B1B0, while B2 controls the multiplexer that selects the output of either the arithmetic unit or the logic unit.
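The selection scheme can be made concrete with a small behavioural sketch in Python. The operations follow the arithmetic and logic column of Table (1); the flag handling (which operations update carry, zero, and sign) is an assumption, since the text does not spell it out, and the name `alu` is hypothetical.

```python
MASK16 = 0xFFFF  # registers are 16 bits wide


def alu(a, b, select):
    """Return (result, flags) for the 3-bit select code B2B1B0.

    Flag behaviour is an assumption made for illustration only.
    """
    a &= MASK16
    b &= MASK16
    not_b = ~b & MASK16
    if select & 0b100 == 0:            # B2 = 0: arithmetic unit output
        operand, carry_in = {
            0b00: (0, 1),              # A = A + 1
            0b01: (b, 0),              # A = A + B
            0b10: (not_b, 0),          # A = A + B'
            0b11: (not_b, 1),          # A = A + B' + 1 (two's-complement A - B)
        }[select & 0b11]
        total = a + operand + carry_in
        carry = total > MASK16
    else:                              # B2 = 1: logic unit output
        total = {
            0b00: ~a,                  # A = A'
            0b01: a & b,               # A = A AND B
            0b10: a | b,               # A = A OR B
            0b11: a ^ b,               # A = A XOR B
        }[select & 0b11]
        carry = False
    result = total & MASK16
    flags = {"carry": carry, "zero": result == 0, "sign": bool(result & 0x8000)}
    return result, flags


result, flags = alu(755, 6, 0b001)  # A = A + B
print(result, flags)
```

In Table (1) the rows with B3 = 1 repeat the rows with B3 = 0 in this column, which is why only three select bits are modelled here.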

The transfer unit is simply a multiplexer that combines the contents of all the registers and routes the selected value to its destination. This multiplexer is controlled by the transfer controller unit, which is shown in Figure (8). The transfer controller design is based on column two of the instruction set in Table (1). For example, if the instruction A→C is to be executed, the transfer controller (depending on the values of B3B2B1B0) directs the multiplexer to select the input from the accumulator.
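A table-driven sketch of this behaviour is given below. The source and destination pairs are read from the data-transfer column of Table (1); representing the registers as a Python dictionary, and treating the memory operand M as just another entry, are simplifying assumptions, and the names are hypothetical.

```python
# (source, destination) for each B3B2B1B0 code in the data-transfer column
# of Table (1). "M" stands for the memory operand.
TRANSFER_TABLE = {
    0b0000: ("A", "B"),  0b0001: ("A", "C"),  0b0010: ("A", "D"),
    0b0011: ("A", "PC"), 0b0100: ("A", "SP"), 0b0101: ("B", "A"),
    0b0110: ("C", "A"),  0b0111: ("D", "A"),  0b1000: ("PC", "A"),
    0b1001: ("SP", "A"), 0b1010: ("A", "M"),  0b1011: ("M", "A"),
    0b1100: ("B", "C"),  0b1101: ("B", "D"),  0b1110: ("C", "B"),
    0b1111: ("D", "B"),
}


def transfer(registers, detail):
    """Return a new register state after the transfer selected by detail."""
    source, destination = TRANSFER_TABLE[detail & 0b1111]
    updated = dict(registers)          # leave the caller's state untouched
    updated[destination] = registers[source]
    return updated


state = {"A": 755, "B": 6, "C": 0, "D": 0, "PC": 0, "SP": 0, "M": 0}
after = transfer(state, 0b0001)        # A->C: the multiplexer selects A
print(after["C"])
```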

#### The Proposed Processor Instruction Set

The processor instruction is a 6-bit word divided into two main sections. The upper two bits, B5 and B4, select one of four instruction types: arithmetic and logic, data transfer, jump, and stack pointer operations. The lower four bits, B3, B2, B1, and B0, determine the details of the instruction, as shown in Table (1).
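The field split can be illustrated with a few lines of Python. This is a minimal sketch of the decoding step only; the function name `decode` and the type labels are hypothetical, while the bit positions and the column order follow Table (1).

```python
# B5B4 value -> instruction type, in the column order of Table (1).
INSTRUCTION_TYPES = {
    0b00: "arithmetic and logic",
    0b01: "data transfer",
    0b10: "jump",
    0b11: "stack",
}


def decode(instruction):
    """Split a 6-bit instruction word into (type, B3B2B1B0 detail code)."""
    if not 0 <= instruction < 64:
        raise ValueError("instruction must fit in 6 bits")
    kind = (instruction >> 4) & 0b11   # upper two bits B5B4
    detail = instruction & 0b1111      # lower four bits B3B2B1B0
    return INSTRUCTION_TYPES[kind], detail


print(decode(0b000001))  # A=A+B in Table (1)
print(decode(0b110001))  # POP A in Table (1)
```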

# MATLAB SIMULINK Simulation

To verify that the proposed system works correctly, MATLAB SIMULINK is used to simulate the microprocessor. The simulation was done with a simple program that involves arithmetic and logic manipulations in addition to branching and control instructions, both conditional and in subroutines. MATLAB SIMULINK was chosen for this task because the system is easy to build in it and its operation is simple to check, thanks to handy input and output tools. The simulation of the program gives results that are consistent with the proposed actions of the processor. As an example, the timing diagram of the execution of the instruction A=A+B is shown in Figure (9). The initial values of A and B are 755 and 6 respectively, and the execution of the instruction is completed 3 clock cycles after the instruction reading operation.

# **Implementation Using FPGAs**

Implementation using the Xilinx Foundation tools results in a variety of implementation reports that describe how the proposed system will be placed and fitted on the FPGA chip. No simulation is done using the Foundation tools, since MATLAB SIMULINK is sufficient to check the system functionality. The important parameters that the implementation produces are:

- The maximum combinational path delay: this parameter represents the time required for a signal to travel through the longest path in the chip. It is a measure of how fast the chip may operate. For example, if that delay is 100 ns, the processor may operate at a maximum frequency of 10 MHz.
- The number of slices used: the FPGA chip contains a number of slices that may be used in the design as the programmer wishes. The number of slices used in the design, divided by the total number of slices in the chip, gives the percentage of the chip used by the system. This percentage should not be close to 100%, because that increases the delays caused by long and complicated metallic interconnections in the implementation.
- The number of input and output pins used: this is a measure of how complicated the system will be on the board [11].
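The first two parameters reduce to simple arithmetic, sketched below in Python. The function names are hypothetical; the relations themselves (maximum frequency as the reciprocal of the delay, utilization as used slices over total slices) are the ones stated in the list above.

```python
def max_frequency_mhz(delay_ns):
    """Maximum clock frequency in MHz for a critical delay given in ns."""
    return 1000.0 / delay_ns  # 1 / (delay in ns) expressed in MHz


def slice_utilization_percent(used, total):
    """Percentage of the chip's slices occupied by the design."""
    return 100.0 * used / total


print(max_frequency_mhz(100))                 # the 100 ns example in the text
print(round(max_frequency_mhz(26.449), 1))    # minimum period from Table (2)
print(round(max_frequency_mhz(24.515), 1))    # minimum period from Table (2)
print(slice_utilization_percent(3420, 12288))
print(slice_utilization_percent(3420, 9408))
```

Applied to the minimum periods and slice counts reported in Table (2), these relations give values in line with the frequencies and percentages listed there.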

#### Implementation Results and Discussion

- The results of implementing the system are given in Table (2). The system is implemented twice: on the VX1000 (G680) chip with 12288 slices and on the XCV800 (FG676) chip with 9408 slices. The results show that the XCV800 (FG676) can operate faster, although the system consumes the same number of slices (a higher percentage, because the chip is smaller) and flip-flops. This is because smaller ICs usually use shorter interconnections between the internal system components.
- The maximum operating frequency is about 40 MHz, which is good enough compared with traditional microprocessors of the same or similar capabilities.

#### Conclusions

1. The speed of operation is inversely related to the chip size. In smaller chips the internal metallic interconnections are shorter than in large chips, and the shorter the paths, the faster the operation. This is clear in the results.
2. The processor's speed of operation (around 40 MHz) is comparable to, or even better than, that of traditional microprocessors with similar capabilities (microprocessors that execute instructions of the same level of difficulty), so an FPGA is a good choice for implementing simple processors.

#### References

[1] Bob Zeidman, "Introduction to CPLD and FPGA Design". www.chalknet.com

[2] Keith Bergevin, "FPGA Redesign of a Microprocessor-Based Subsystem with No Impact to the Mission Software". http://klabs.org/Richcontent/Mapldcon03/Abstracts/Bergevin\_A.Pdf

[3] Shamsiah Suhaili and Othman Sidek, "Design and Implementation of Reconfigurable ALU on FPGA". http://www.buet.ac.bd/Eee/Icece2004/P014.Pdf

[4] "8086 Language". http://www.aust.edu/Csa/Moinul/8086.Lectures.Pdf

[5] "Addressing Mode". http://www.cpu-world.com/Arch/8086.html

[6] "Instruction Set of 8086". http://www.emu8086.com/Assembly-Language-Tutorial-Assembler\_Reference/8086\_Instruction-Set.html

[7] Barry B. Brey, "The Intel Microprocessors", Prentice Hall, Seventh Edition, 2006.

[8] Renu Singh and B. P. Singh, "Microprocessors Interfacing and Applications", New Age International Limited Publishers, First Edition, 2003.

[9] "FPGA and CPLD Architectures", IEEE Design & Test of Computers.

[10] Zainalabedin Navabi, "Analysis and Modeling of Digital Systems", McGraw-Hill, Inc., 1993.

[11] Xilinx Foundation 2.1i Help Documentation.

| B3B2B1B0 (lower bits) | B5B4 = 00 | B5B4 = 01 | B5B4 = 10          | B5B4 = 11 |
|-----------------------|-----------|-----------|--------------------|-----------|
| 0000                  | A=A+1     | A→B       | Jump if carry      | PUSH A    |
| 0001                  | A=A+B     | A→C       | Jump if not carry  | POP A     |
| 0010                  | A=A+B'    | A→D       | Jump if zero       | PUSH B    |
| 0011                  | A=A+B'+1  | A→PC      | Jump if not zero   | POP B     |
| 0100                  | A=A'      | A→SP      | Jump if sign       | PUSH C    |
| 0101                  | A=A AND B | B→A       | Jump if not sign   | POP C     |
| 0110                  | A=A OR B  | C→A       | Jump if aux.       | PUSH D    |
| 0111                  | A=A XOR B | D→A       | Jump if not aux.   | POP D     |
| 1000                  | A=A+1     | PC→A      | Jump (uncond.)     | PUSH PC   |
| 1001                  | A=A+B     | SP→A      | Jump (uncond.)     | POP PC    |
| 1010                  | A=A+B'    | A→M       | Jump (uncond.)     | PUSH PC   |
| 1011                  | A=A+B'+1  | M→A       | Jump (uncond.)     | POP PC    |
| 1100                  | A=A'      | B→C       | Jump (uncond.)     | PUSH PC   |
| 1101                  | A=A AND B | B→D       | Jump (uncond.)     | POP PC    |
| 1110                  | A=A OR B  | C→B       | Jump (uncond.)     | PUSH PC   |
| 1111                  | A=A XOR B | D→B       | Jump (uncond.)     | POP PC    |

#### Table (1) the instruction code details

Table (2) the Main Parameters of the System Implementation

| Parameter                   | VX1000 (G680)            | XCV800 (FG676)          |
|-----------------------------|--------------------------|-------------------------|
| Number of Slices            | 3420 out of 12,288 (27%) | 3420 out of 9,408 (36%) |
| Slice Flip Flops            | 112                      | 112                     |
| 4 input LUTs                | 177                      | 177                     |
| Average Connection Delay    | 11.441 ns                | 10.910 ns               |
| Maximum Pin Delay           | 44.122 ns                | 38.276 ns               |
| Minimum period              | 26.449 ns                | 24.515 ns               |
| Maximum Operation Frequency | 37.8 MHz                 | 40.8 MHz                |



Figure (1) The Main Structure of a General Purpose Microprocessor of the Intel Series [5]



Figure (2) The Main Structure of An FPGA Chip [9]



Figure (3) The Proposed Processor



Figure (4) the Accumulator Register in the Processor




Figure (5) the write unit and its connections inside the processor



Figure (6) the internal structure of the accumulator write unit

#### Eng. & Tech. Journal, Vol.28, No.15, 2010








Figure (8) the internal structure of the transfer controller unit

Figure (9) Timing Diagram illustrating the operation A=A+B (horizontal axis: time in clock cycles, which depends on the system frequency)