Arabic Text Encryption Using Artificial Neural Networks

This research aims to build a cipher system that uses an artificial neural network trained with the back-propagation algorithm to encrypt any Arabic text and to protect the data against attack during transmission. The encryption process has four stages: 1) A neural network is trained with the back-propagation algorithm to encrypt the input Arabic text; its final weights are taken as the public key. 2) A second neural network is trained with the back-propagation algorithm to decrypt the encrypted text of the first stage; its final weights are taken as the private key. 3) Any Arabic text is encrypted using the weights obtained in the first stage. 4) The Arabic text from the third stage is decrypted using the weights obtained in the second stage. All four stages were carried out successfully for both encryption and decryption. The work was implemented in Matlab version 7, with Notepad++ used for writing the text because it supports Arabic numbers, under Windows 7 as the operating system.


INTRODUCTION
Encryption is a way of talking to someone while other people are listening, but without their understanding what you are saying. It can also be used to protect data in storage as well as to detect active attacks, such as message or file modification [1]. Encrypting plaintext results in unreadable gibberish called ciphertext; the process of reverting ciphertext to its original plaintext is called decryption [2]. To recover the contents of an encrypted text easily, the correct decryption key is required. The more complex the encryption algorithm is, the more difficult it is to eavesdrop on the communications without access to the key [3]. Figures (1 and 2

Theory
Cryptography is a technique used to hide the meaning of a message; the name is derived from the Greek word kryptos (hidden). Cryptography should ensure that a message cannot be read by unintended parties. Typically the sender and receiver agree beforehand on a message-scrambling protocol and on methods for encrypting and decrypting messages [5].

Basics of neural network
Artificial Neural Networks are relatively crude electronic models based on the neural structure of the brain. The brain basically learns from experience. It is natural proof that some problems beyond the scope of current computers are indeed solvable by small, energy-efficient packages. This brain modeling also promises a less technical way to develop machine solutions. This new approach to computing also provides a more graceful degradation during system overload than its more traditional counterparts. These biologically inspired methods of computing are thought to be the next major advancement in the computing industry [6]. Figure (3) presents the model of an artificial neuron [7].
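The neuron output signal is given by the following relationship: $o = f\left(\sum_i w_i x_i\right)$, where $f$ is the activation function, $w_i$ is the synaptic weight and $x_i$ is the input signal.
For the sigmoid activation function, $f(v) = \frac{1}{1 + e^{-a v}}$, where $a$ is the slope parameter of the sigmoid function [8].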
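The neuron relationship can be illustrated with a minimal Python sketch. It assumes a logistic sigmoid with slope parameter a and no bias or threshold term; the function names and the example weights are hypothetical and are not taken from the paper.

```python
import math

def sigmoid(v, a=1.0):
    """Logistic sigmoid with slope parameter a (assumed form)."""
    return 1.0 / (1.0 + math.exp(-a * v))

def neuron_output(weights, inputs, a=1.0):
    """Weighted sum of the inputs passed through the sigmoid activation."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(net, a)

# Hypothetical weights and inputs, chosen only for illustration.
# The weighted sum here is zero, so the sigmoid returns its midpoint.
out = neuron_output([0.5, -0.5], [1.0, 1.0])
```

A larger slope parameter makes the sigmoid steeper around zero, so for a positive weighted sum the output moves closer to one.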

Architecture of neural networks
To characterize a given ANN, it is necessary to specify the number of neurons, how they are interconnected and the processing that takes place throughout the network. The manner in which the neurons of a neural network are structured is intimately linked with the learning algorithm used to train the network. In general, three classes of architecture can be identified: single-layer feed-forward networks, multilayer feed-forward networks, and recurrent networks [9].

Feed-forward networks
Feed-forward ANNs allow signals to travel one way only, from input to output. There is no feedback (loops), i.e. the output of any layer does not affect that same layer. Feed-forward ANNs tend to be straightforward networks that associate inputs with outputs. They are extensively used in pattern recognition [10].

Feedback networks
Feedback networks can have signals travelling in both directions by introducing loops in the network. Feedback networks are very powerful and can get extremely complicated. Feedback networks are dynamic; their 'state' changes continuously until they reach an equilibrium point. They remain at the equilibrium point until the input changes and a new equilibrium needs to be found [10].

The Back-Propagation Algorithm
The original algorithm used for training an MLP is the BP algorithm, an iterative gradient algorithm designed to minimize the mean-squared error between the desired output and the actual output for a particular input to the network [11]. The learning rate ($\eta$) determines the portion of the weight that needs to be adjusted; its optimum value depends on the problem. The momentum ($\alpha$) determines the fraction of the previous weight adjustment that is added to the current weight adjustment, which accelerates the network convergence process. During the training process, the learning rate and the momentum are adjusted to bring the network out of local minima and to accelerate the convergence of the network. The algorithm of error back-propagation training is given below:
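The roles of the learning rate and the momentum can be sketched in a few lines of Python. This is only an illustration on a toy one-dimensional error surface; the names eta and alpha, their values and the error function are assumptions made here and do not come from the paper.

```python
def momentum_step(w, grad, prev_delta, eta=0.1, alpha=0.5):
    """One weight update: a gradient step scaled by the learning rate eta,
    plus the fraction alpha of the previous weight adjustment."""
    delta = -eta * grad + alpha * prev_delta
    return w + delta, delta

# Toy error surface E(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, delta = 0.0, 0.0
for _ in range(200):
    w, delta = momentum_step(w, 2.0 * (w - 3.0), delta)
```

With these values the weight settles at the minimum of the toy surface; a larger alpha carries more of the previous step forward, which can speed up progress along shallow directions but can also cause oscillation.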

BP Algorithm
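This algorithm performs the following steps:
1- Initialize the values of the network weights.
2- Sum the weighted inputs and apply the activation function to compute the output $h_j$ of the hidden layer: $h_j = f\left(\sum_i x_i w_{ij}\right)$.
3- Sum the weighted outputs of the hidden layer and apply the activation function to compute the output $y_k$ of the output layer: $y_k = f\left(\sum_j h_j w_{jk}\right)$.
4- Compute the back-propagation error: $\delta_k = (d_k - y_k)\, f'\left(\sum_j h_j w_{jk}\right)$, where $d_k$ is the desired output of output neuron k and $f'$ is the derivative of the activation function.
5- Calculate the weight correction term.
6- Sum the delta inputs for each hidden unit and calculate the error term: $\delta_j = f'\left(\sum_i x_i w_{ij}\right) \sum_k \delta_k w_{jk}$.
Where $F$ is the number of patterns in the training set.
10- End [12].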
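The training loop can be sketched in plain Python as follows. This is a standard textbook form of back-propagation with momentum, not the authors' Matlab program: it assumes a logistic sigmoid, no bias terms, per-pattern updates, and an error measure averaged over the F training patterns. The function name, parameter values and toy patterns are hypothetical.

```python
import math
import random

def f(v):
    """Logistic sigmoid activation (assumed)."""
    return 1.0 / (1.0 + math.exp(-v))

def train_bp(patterns, n_hidden, eta=0.5, alpha=0.5, epochs=2000, goal=1e-5, seed=0):
    rng = random.Random(seed)
    n_in, n_out = len(patterns[0][0]), len(patterns[0][1])
    # Initialise the weights with small random values.
    w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    w_ho = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_hidden)]
    d_ih = [[0.0] * n_hidden for _ in range(n_in)]   # previous input-hidden corrections
    d_ho = [[0.0] * n_out for _ in range(n_hidden)]  # previous hidden-output corrections
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, d in patterns:
            # Forward pass: hidden outputs h_j, then network outputs y_k.
            h = [f(sum(x[i] * w_ih[i][j] for i in range(n_in))) for j in range(n_hidden)]
            y = [f(sum(h[j] * w_ho[j][k] for j in range(n_hidden))) for k in range(n_out)]
            # Output error terms; for the logistic sigmoid f'(net) = y * (1 - y).
            dk = [(d[k] - y[k]) * y[k] * (1.0 - y[k]) for k in range(n_out)]
            # Hidden error terms: summed delta inputs times the derivative.
            dj = [h[j] * (1.0 - h[j]) * sum(dk[k] * w_ho[j][k] for k in range(n_out))
                  for j in range(n_hidden)]
            # Weight corrections with momentum, applied immediately.
            for j in range(n_hidden):
                for k in range(n_out):
                    d_ho[j][k] = eta * dk[k] * h[j] + alpha * d_ho[j][k]
                    w_ho[j][k] += d_ho[j][k]
            for i in range(n_in):
                for j in range(n_hidden):
                    d_ih[i][j] = eta * dj[j] * x[i] + alpha * d_ih[i][j]
                    w_ih[i][j] += d_ih[i][j]
            total += 0.5 * sum((d[k] - y[k]) ** 2 for k in range(n_out))
        history.append(total / len(patterns))  # error averaged over the F patterns
        if history[-1] < goal:
            break
    return w_ih, w_ho, history

# Toy data: three one-hot patterns mapped to themselves (illustration only).
toy = [([1, 0, 0], [1, 0, 0]), ([0, 1, 0], [0, 1, 0]), ([0, 0, 1], [0, 0, 1])]
w_ih, w_ho, history = train_bp(toy, n_hidden=4)
```

The returned history holds one averaged error value per epoch, so it can be plotted against the epoch count in the same way as the training curves the paper refers to.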

Training a Network
The network weights will all start at random values and the training process starts as shown in figure (5) [13].

Practical Work
A cryptosystem is a way of encoding and decoding messages so that only certain people are able to read them. Table (1) presents the Matlab Unicode conversion, which covers the numbers 0 to 255 and is obtained with the instruction unicode2native. Notepad++ is used for writing the Arabic text because it supports Arabic numbers with the Windows-1256 script, the same script used by the Matlab program.

Training Procedure for Encryption
In this step, the most important computer keyboard symbols used in writing letters are taken; there are 51 of them. This text is entered into the proposed program, whose steps are as follows. The text is converted to Arabic Unicode and then to 8-bit binary. Each symbol in the text is represented by a code containing 51 digits: for the first symbol all digits are zero except the first, which is one; for the second symbol all digits are zero except the second, which is one; and so on for all symbols. This creates a diagonal matrix with dimensions (51*51), as shown in figure (6).
In this procedure a neural network with three layers (input layer, hidden layer, output layer) is used with the back-propagation learning algorithm. The input layer has 8 neurons, equal to the number of binary bits per symbol; the hidden layer has 20 neurons (chosen by trial and error); and the output layer has 51 neurons, equal to the size of the desired output, with an error rate of 10^-5. When the program is run, the neural network is trained, computing the weights between the input and hidden layers and between the hidden and output layers, and training stops when the error reaches 10^-5.
100000000000000000000000000000000000000000000000000
010000000000000000000000000000000000000000000000000
001000000000000000000000000000000000000000000000000
000100000000000000000000000000000000000000000000000
000010000000000000000000000000000000000000000000000
000001000000000000000000000000000000000000000000000
000000100000000000000000000000000000000000000000000
000000010000000000000000000000000000000000000000000
...
000000000000000000000000000000000000000000000000010
000000000000000000000000000000000000000000000000001
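The preparation of the training data can be sketched in Python. The sketch assumes that each symbol maps to a single byte in the Windows-1256 code page (Python codec name cp1256) and that bits are ordered most significant first. The four-symbol alphabet is a hypothetical stand-in for the 51 keyboard symbols, which are not listed in the text.

```python
def symbol_to_bits(ch, encoding="cp1256"):
    """Encode one character as a single byte and return its 8 bits (MSB first)."""
    (byte,) = ch.encode(encoding)  # fails if the symbol is not a single byte
    return [(byte >> shift) & 1 for shift in range(7, -1, -1)]

def one_hot_targets(n):
    """n-by-n matrix with ones on the diagonal: row r is the target for symbol r."""
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

alphabet = "ابتث"  # hypothetical 4-symbol alphabet; the paper uses 51 symbols
inputs = [symbol_to_bits(ch) for ch in alphabet]   # one 8-bit input per symbol
targets = one_hot_targets(len(alphabet))           # one diagonal-matrix row per symbol
```

Each input and target pair then forms one training pattern for the encryption network, with 8 input values and as many output values as there are symbols.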

Training Procedure for Decryption
After executing the encryption training program, the output matrix with dimensions (51*51) is entered into the decryption training program. In this phase a back-propagation neural network with three layers is used: the input layer contains 51 neurons, the hidden layer contains 60 neurons and the output layer contains 8 neurons. The desired output must be a matrix with dimensions (8*51) equal to the original input text of computer keyboard symbols entered in the first phase. The error rate in this phase is also equal to 10^-5. At the end, the final weights between the layers are saved. Figure (7) presents the algorithm of the training procedure for the encryption and decryption phases.

Encryption Testing Phase
Any text written in Arabic is entered into the special program for the testing phase, which consists of a three-layer neural network of the same size as the network used in the encryption training phase and uses the weights extracted from that phase. After this program is run, the resulting matrix, which represents the cipher text, is sent together with the size of the original text (number of rows and columns), which is also encrypted by changing it.

Decryption Testing Phase
After receipt of the encrypted message and the rows and columns file, these files are entered into the decryption program, which consists of a three-layer neural network of the same size as the network used in the decryption training phase. After the program is run, the original text is extracted. Figure (8) presents the algorithm for the testing encryption and decryption phases.
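The role of the transmitted row and column counts can be sketched in Python. The sketch assumes the text is held as a rectangular character matrix (short lines padded with spaces) and flattened row by row; the paper does not state the padding or the ordering, and the function names are hypothetical.

```python
def flatten_text(lines):
    """Pad lines to equal width; return (rows, cols) and the symbols as one sequence."""
    cols = max(len(line) for line in lines)
    padded = [line.ljust(cols) for line in lines]
    return (len(lines), cols), [ch for line in padded for ch in line]

def restore_text(shape, symbols):
    """Rebuild the lines from the flat symbol sequence and the (rows, cols) shape."""
    rows, cols = shape
    return ["".join(symbols[r * cols:(r + 1) * cols]) for r in range(rows)]

shape, symbols = flatten_text(["abc", "de"])
restored = restore_text(shape, symbols)
```

Without the shape, the receiver would recover the symbols but not the line structure, which is why the size information travels with the cipher text.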

Results
The results of the programs, as they appear in the Matlab command window, are explained below.

The Result for Training Phase Encryption
After the text is entered into the program, the neural network is trained on all symbols in the text, which represent the most important keyboard symbols used to write any letter. The output of this program is shown in Appendix (B). The real output of the program is a matrix with size (8*51); only the first column was printed, which represents the decryption of the symbol (?) before converting it to a binary number (ASCII code). Figure (12) shows the number of epochs until the output reaches the error rate 10^-5 (the error between the real output and the desired output).

The Result of Encryption Testing Phase
The testing phase uses a neural network with three layers, the same size as the network used in the training phase for encryption, comprising 8 neurons in the input layer, 60 neurons in the hidden layer and 51 neurons in the output layer, together with the weights extracted from the encryption training phase. After this program is run, the resulting matrix, which represents the cipher text, is sent, along with the number of rows and columns of the original text after changing it. The input text is as shown below:

‫عليكم‬ ‫السالم‬ -----------‫للغاية‬ ‫سرية‬ ‫الرسالة‬ ‫هذه‬ ‫جزيال‬ ‫شكرا‬
The output of this program is shown in Appendix (C). The real output of this step is a matrix with size (51*138), which represents the encryption code of the input text with 138 symbols. Only the second column was printed, which represents the encryption code of the Arabic symbol (ل). In this step the output matrix represents the code of each symbol in the input text as one column (the letters and symbols of the input text are arranged in one column).

The Result of Decryption Testing Phase
After the encrypted message and the rows and columns files are received and the decryption program is executed, the output of the neural network is a set of 8-bit binary numbers, each representing one symbol in the text. Each number is converted to decimal to obtain the Unicode value, which is then converted to the original text as shown below (this text is taken from the workspace of the Matlab program).

‫عليكم‬ ‫السالم‬ -----------‫للغاية‬ ‫سرية‬ ‫الرسالة‬ ‫هذه‬ ‫جزيال‬ ‫شكرا‬
The binary numbers and Unicode matrices are shown in Appendix (D). The actual output of this step is a matrix with size (8*138), which represents the decryption (ASCII code) of the encrypted text. It is arranged as one column (each column represents a symbol in the original text), which shows the importance of knowing the original size of the input text, which is also encrypted (the number of columns and rows of the original text before encryption). Only the second column was printed, which represents the decryption (ASCII code) of the encrypted symbol (ل), as shown in Appendix (D).
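The final conversion from network output to text can be sketched in Python. The sketch assumes each output column holds 8 values close to 0 or 1 (most significant bit first), that values are rounded at a threshold of 0.5, and that the resulting bytes are Windows-1256 codes; the threshold and the function name are assumptions made here.

```python
def bits_to_text(columns, encoding="cp1256", threshold=0.5):
    """Round each 8-value output column to bits, pack them into a byte, and decode."""
    data = bytearray()
    for col in columns:
        byte = 0
        for value in col:
            byte = (byte << 1) | (1 if value >= threshold else 0)
        data.append(byte)
    return data.decode(encoding)

# Round trip on a short Arabic word, using exact bits in place of network outputs.
word = "سلام"
columns = [[(b >> s) & 1 for s in range(7, -1, -1)] for b in word.encode("cp1256")]
decoded = bits_to_text(columns)
```

Rounding makes the conversion tolerant of small output errors, which matches training to a small error goal rather than to exact zeros and ones.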

Discussion
The most important conclusions that have been reached are:
1- The proposed neural network algorithm gave 100% correct results for encryption and decryption of the text.
2- Despite many successful demonstrations of its power, BP suffers from several drawbacks, among which are the slow rate of convergence (long training times) and getting stuck in local minima.
3- BP requires the specification of a number of parameters, such as the learning rate, the momentum rate, the number of neurons in the hidden layer and the weights. The success and speed of training depend on these parameters, which are chosen by intuition and time-consuming trial and error.
4- Using the Matlab programming language reduced the size of the program and made it easier to correct.

Figure (3): Model of an Artificial Neuron.

Figure (2): Public Key Cryptography (PKC)
Notation used in the BP algorithm:
Step 2: $h_j$: the actual output of hidden neuron j for input signals x. $x_i$: input signal of input neuron i. $w_{ij}$: synaptic weight between input neuron i and hidden neuron j. $f$: the activation function.
Step 3 (sum the weighted output of the hidden layer and apply the activation function to compute the output of the output layer): $y_k$: the actual output of output neuron k. $w_{jk}$: synaptic weight between hidden neuron j and output neuron k.
Step 4 (compute the back-propagation error): $d_k$: the desired output of output neuron k.
Step 5: calculate the weight correction term.

Fig. (7): Algorithm for Training Encryption and Decryption

Figure (8): Algorithm for Testing Encryption and Decryption
... is shown in figure (9). The output of this program is shown in Appendix (A). The real output of this step is a matrix with size (51*51); because it is large, only the first column was printed. This column represents the code for the first symbol in the text (?). Figure (10) shows the number of epochs until the output reaches the error rate 10^-5 (the error between the real output and the desired output).