Using B+ Tree to Represent Secret Messages for Steganography Purpose

In this research we suggest a new approach that uses a B+ tree to store the secret messages (those to be sent) in a manner that prevents redundancy of these messages, or even of sub-messages, in order to provide efficient memory usage. The proposed system includes two stages: storing the secret message and getting its unique code by using the B+ tree, and retrieving the secret message from its code.


Introduction
Steganography is the technique of hiding information within some format in a way that makes it difficult to detect by one who doesn't know it's there.
In the computer age, steganography has become quite advanced and allows information to be hidden in all types of data files, such as images, audio, video and documents (text) [1].
In the past, people used hidden tattoos or invisible ink to convey steganographic content.
Today, computer and network technologies provide easy-to-use communication channels for steganography. Essentially, the information-hiding process in a steganographic system starts by identifying a cover medium's redundant bits (those that can be modified without destroying that medium's integrity). The embedding process creates a stego medium by replacing these redundant bits with data from the hidden message [2].
One important method of information hiding constructs a CFG for the secret message and gives it a unique code, which is then embedded in a cover medium; however, this method is not suitable when the grammar is ambiguous [3]. Thus, our proposed method uses a B+ tree instead of a CFG in order to avoid this problem and to reduce the time needed to get the code or to retrieve the secret message. In other words, we use a B+ tree to represent a special dictionary of secret messages in which each message has a unique code. This code is embedded in the cover text, and it can also be used to retrieve the corresponding secret message from the dictionary. The aim of this research is therefore to find a tool for representing secret messages for steganography purposes that:
• Prevents the ambiguity problem, so that only one secret message corresponds to each code.

Steganography In Documents
Some steganography methods use a special tool to convert the secret message to a unique code, and this code is then embedded into a digital object, also referred to as the cover, in such a manner that the information (code) becomes part of the object. Steganography is categorized by the type of data the cover belongs to, such as text or documents, images, or sound [4]; our proposed method uses text as the cover.
Steganography in documents works by simply adding white space and tabs to the ends of the lines of a document. This type of steganography is very effective, because white space and tabs are not visible to the human eye, at least in most text/document editors. White space and tabs occur naturally in documents, so using this method is unlikely to arouse suspicion [5].

Using CFG for Steganography Purpose [3,6]
A context-free grammar (CFG) can be built as a tool for a steganography system, such that each secret message is converted to a set of CFG production rules, each of which is given a code (this code is embedded in the cover text). For example, if the secret message is "meeting: nine o'clock at my home", then the corresponding CFG will be:
{ S → meeting: nine#, 000
A → o'clock @, 0010
B → at my home, 0011 }
where '#' represents 'A', '@' represents 'B', and {000, 0010, 0011} represents the code for this secret message.
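To make the example concrete, the following minimal sketch expands the three production rules back into the plain message. The names `RULES`, `PLACEHOLDERS`, `expand` and `message` are hypothetical, and the whitespace normalization is an assumption added because the rules as printed do not fix the spacing around '#' and '@'.

```python
# Production rules from the example: non-terminal -> (body, code).
RULES = {
    "S": ("meeting: nine#", "000"),
    "A": ("o'clock @", "0010"),
    "B": ("at my home", "0011"),
}
# Marker characters that stand for non-terminals ('#' is A, '@' is B).
PLACEHOLDERS = {"#": "A", "@": "B"}


def expand(symbol="S"):
    """Recursively replace each marker with the body of its non-terminal."""
    body, _code = RULES[symbol]
    parts = []
    for ch in body:
        if ch in PLACEHOLDERS:
            parts.append(" " + expand(PLACEHOLDERS[ch]) + " ")
        else:
            parts.append(ch)
    return "".join(parts)


def message():
    """Expand from the start symbol and collapse repeated spaces (assumption)."""
    return " ".join(expand("S").split())


def codes():
    """Codes of the rules in the order S, A, B."""
    return [RULES[name][1] for name in ("S", "A", "B")]
```

Here `message()` rebuilds the sentence from the rules, and `codes()` lists the values that would be embedded in the cover text.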
If the grammar is unambiguous, then the receiver can extract the information by applying standard parsing techniques; otherwise, he cannot extract the intended information properly.

B+ Tree [7,8,9]
A B+ tree is a structure of nodes linked by pointers that:
• is anchored by a special node called the root, and bounded by leaves;
• has a unique path to each leaf, and all paths are of equal length;
• stores keys only at leaves, and stores reference values in the other, internal, nodes;
• guides key search, via the reference values, from the root to the leaves.
The B+ tree serves as an index to a database: each record is stored in the database, while the reference number (and the key) of that record is stored in the B+ tree. So when we want to reach a certain record, we need to know its key to get its reference number from the B+ tree. Once we have the reference number of that record, we can retrieve the required record directly.
The B+ tree is an ordered and balanced tree (see figure 1), which is why it is so fast at retrieving the required data.
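The properties listed above can be illustrated with a small in-memory sketch. This is not the implementation used in the paper: the class names, the `max_keys` parameter and the helper `leaf_depths` are hypothetical, deletion is omitted, and record references are plain Python values.

```python
import bisect


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.vals = []    # leaf: record references; internal node: children
        self.next = None  # leaf only: link to the next leaf in key order


class BPlusTree:
    def __init__(self, max_keys=3):
        self.root = Node(leaf=True)
        self.max_keys = max_keys

    def search(self, key):
        """Follow separator keys from the root down to a leaf."""
        node = self.root
        while not node.leaf:
            node = node.vals[bisect.bisect_right(node.keys, key)]
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.vals[i]
        return None

    def insert(self, key, ref):
        split = self._insert(self.root, key, ref)
        if split is not None:  # the root overflowed, so the tree grows by one level
            sep, right = split
            new_root = Node(leaf=False)
            new_root.keys = [sep]
            new_root.vals = [self.root, right]
            self.root = new_root

    def _insert(self, node, key, ref):
        if node.leaf:
            i = bisect.bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                node.vals[i] = ref  # existing key: overwrite its reference
                return None
            node.keys.insert(i, key)
            node.vals.insert(i, ref)
        else:
            i = bisect.bisect_right(node.keys, key)
            split = self._insert(node.vals[i], key, ref)
            if split is None:
                return None
            sep, right = split
            node.keys.insert(i, sep)
            node.vals.insert(i + 1, right)
        return self._split(node) if len(node.keys) > self.max_keys else None

    def _split(self, node):
        mid = len(node.keys) // 2
        right = Node(node.leaf)
        if node.leaf:
            # Leaf split: the separator is copied up and also stays in the right leaf.
            right.keys, right.vals = node.keys[mid:], node.vals[mid:]
            node.keys, node.vals = node.keys[:mid], node.vals[:mid]
            right.next, node.next = node.next, right
            return right.keys[0], right
        # Internal split: the middle key moves up and is kept in neither half.
        sep = node.keys[mid]
        right.keys, right.vals = node.keys[mid + 1:], node.vals[mid + 1:]
        node.keys, node.vals = node.keys[:mid], node.vals[:mid + 1]
        return sep, right


def leaf_depths(node, depth=0):
    """Return the set of depths at which leaves occur (one value if balanced)."""
    if node.leaf:
        return {depth}
    found = set()
    for child in node.vals:
        found |= leaf_depths(child, depth + 1)
    return found
```

Because splits propagate upward and only a root split adds a level, every leaf stays at the same depth, which is what keeps lookups fast as the number of keys grows.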

Description of the Proposed Method
In this research we suggest a new approach that uses a B+ tree as a special dictionary for storing the secret messages (with their codes) in a manner that prevents redundancy of these messages, or even of sub-messages, in the dictionary (in order to provide efficient memory usage). The proposed method includes two stages:
• Store the secret message in this dictionary (if it is not found) and get its unique code (in the send process).
• Retrieve the unique secret message from this dictionary when we have its code (in the receive process).
The proposed method uses one database (dbase) that represents the dictionary of the secret messages and their corresponding codes, and two index trees (Bt1, Bt2) that refer to the same database (dbase). Each new secret message (sentence) is converted to a list of words (discarding any unneeded characters), and these words are stored in dbase in a manner that prevents redundancy of these sentences or even of sub-sentences.
Bt1 is used for storing, to check whether the sentence or even a sub-sentence is already found in dbase, so Bt1 uses the first word of the sentence as a key. Bt2 is used for retrieving, so it uses the code of the sentence (secret message) as a key. In general, if the secret message is composed of [word 1, word 2, …, word n], then it is represented as shown in figure 2. The arrows in the figure represent the database references: we follow the down arrow (↓) in the send process of steganography to get the code for the secret message, and we follow the up arrow (↑) in the receive process of steganography to retrieve the secret message that corresponds to a certain code.
To get the code of a secret message (sentence) from the dictionary (dbase), we must first check whether the first word (key) of the sentence is found in Bt1; we then have one of the following cases:
1. All the words of the sentence are found in dbase, in which case:
• The sentence already has a code, and we only need to retrieve the stored code.
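The lookup-or-store behaviour described above can be sketched as follows. This is a simplified illustration, not the paper's algorithm: plain dicts stand in for the B+ trees Bt1 and Bt2, the record layout and the names `Dictionary`, `get_code` and `tokenize` are hypothetical, only shared prefixes are reused (reuse of middle words is not covered), and starting the codes at 101 is an assumption.

```python
def tokenize(sentence):
    """Split into words, dropping surrounding punctuation (assumed rule)."""
    words = [w.strip(":,.") for w in sentence.split()]
    return [w for w in words if w]


class Dictionary:
    def __init__(self, first_code=101):
        self.dbase = {}  # reference -> record
        self.bt1 = {}    # first word -> reference (stand-in for Bt1)
        self.bt2 = {}    # code -> reference of the last word (stand-in for Bt2)
        self.next_code = first_code

    def _new_record(self, word, prev):
        ref = len(self.dbase) + 1
        self.dbase[ref] = {"word": word, "prev": prev, "next": {}, "code": None}
        return ref

    def get_code(self, sentence):
        """Return the code of a non-empty sentence, storing only missing words."""
        words = tokenize(sentence)
        ref = self.bt1.get(words[0])
        if ref is None:  # first word not indexed yet
            ref = self._new_record(words[0], None)
            self.bt1[words[0]] = ref
        for word in words[1:]:
            nxt = self.dbase[ref]["next"].get(word)
            if nxt is None:  # the sentence diverges here: add a new record
                nxt = self._new_record(word, ref)
                self.dbase[ref]["next"][word] = nxt
            ref = nxt
        record = self.dbase[ref]
        if record["code"] is None:  # all words stored, but no code assigned yet
            record["code"] = self.next_code
            self.bt2[record["code"]] = ref
            self.next_code += 1
        return record["code"]
```

Requesting the code of the same sentence twice returns the stored code the second time, and a sentence that extends a stored one adds records only for the new words.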

Algorithms of the Proposed Method
The proposed method includes two stages:
• Send process.
• Receive process.
Since our paper focuses on using the B+ tree with steganography, we present only the algorithms of the functions that deal with the B+ tree.

Send process
In this stage we get an integer value that represents the code of the secret message from the B+ tree (see algorithm 1), and then convert this integer value to binary code. This binary code is then embedded in a cover medium (text) using the open space method, such that a single space encodes zero and two spaces encode one.
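As an illustration of the open space method, the sketch below writes the bits of the code into the gaps between the words of a cover text. The function name `embed` and the fixed `width` of 8 bits are assumptions; the paper does not state how the receiver learns the length of the code.

```python
def embed(code, cover, width=8):
    """Hide `code` in the word gaps of `cover`: one space is 0, two spaces are 1."""
    if code < 0:
        raise ValueError("code must be non-negative")
    bits = format(code, "0{}b".format(width))  # fixed-width binary string
    words = cover.split()
    if len(bits) > width:
        raise ValueError("code does not fit in the chosen width")
    if len(words) - 1 < width:
        raise ValueError("cover text has too few word gaps")
    out = [words[0]]
    for i, word in enumerate(words[1:]):
        # Gaps after the last code bit stay as ordinary single spaces.
        gap = "  " if i < width and bits[i] == "1" else " "
        out.append(gap + word)
    return "".join(out)
```

For example, the code 101 is 01100101 in eight bits, so the second, third, sixth and eighth gaps of the cover text become double spaces.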

Received process
In this stage, the code of the secret message needs to be extracted from the cover text; since we use the open space method, a single space is decoded as 0 and two spaces are decoded as 1. This binary code must then be converted to its integer value, and the retrieve-message function is used to retrieve the unique secret message that corresponds to this code from the B+ tree (see algorithm 5).
Algorithm 5: "retrieve-message"
Input: the code (C).
Output: secret message (S).
End.
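The extraction step can be sketched in the same spirit. The function name `extract` and the fixed `width` of 8 bits are assumptions, and the receiver is assumed to know that only the first `width` gaps carry data.

```python
import re


def extract(stego, width=8):
    """Read the first `width` word gaps: one space is 0, two spaces are 1."""
    gaps = re.findall(r" +", stego.strip())
    if len(gaps) < width:
        raise ValueError("stego text has too few word gaps")
    bits = "".join("1" if len(gap) == 2 else "0" for gap in gaps[:width])
    return int(bits, 2)  # binary string back to the integer code
```

The returned integer is the key that would then be looked up in Bt2 to retrieve the message.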

Implementation For The Proposed Method
We will take some examples in order to describe our proposed method.

Example1:
Suppose we want to get the code for a secret message (sentence) from the dictionary (dbase), and it is a new sentence, such as "meeting: nine o'clock today". We put the first word of the sentence (meeting) as a key in the B+ tree (Bt1), compute the length of the sentence (4), and give the sentence a new unique code (101), which is used as a key for the B+ tree (Bt2). The sentence is stored in dbase as shown in figure (3).

Example2:
If we want to get the code for a sentence whose words are all already found in dbase but which has no code, such as "meeting: nine o'clock", then we give it a new code (102) and store it in Bt2 as a key, as shown in figure (4).

Example3:
If we want to get the code for a sentence whose words are all already found in dbase except the last word, such as "meeting: nine o'clock tomorrow", then we store the last word in dbase together with the reference of the previous word, store the new reference at the previous word, and give the sentence a new code (103), as shown in figure (5).

Example4:
If we want to get the code for a sentence for which only some of the words are already found in dbase, such as "meeting: nine o'clock tomorrow at my home", then we store the remaining words in dbase together with the references of their next and previous words, and give the sentence a new code (104), as shown in figure (6).

Example5:
If we want to get the code for a sentence that is already found in dbase but has no code, such as "meeting", then we give it a new code (105) and store it in Bt2 as a key, as shown in figure (7).

Example6:
If we want to get the code for a sentence for which some of the middle words are found in dbase, such as "today meeting: nine o'clock", then only the words that are not found are stored in dbase; we store the first word in Bt1 and the new code (106) in Bt2, as shown in figure (8).

Example7:
If we want to get the code for a sentence that is already found in dbase and has a code, such as "meeting: nine o'clock", then we only retrieve its code (102).
In the receive process, the receiver extracts the code from the cover text and uses the same dictionary to retrieve the secret message that corresponds to that code.
If we take the code (101) as an example, we first check whether the code is found in Bt2, and then retrieve the term of the last word of the message that the code refers to: word([p("today",~0,0011,4,101),p("tomorrow",0101,0011,4,103)]). We then search the list in this term for the code (101) and get p("today",~0,0011,4,101), which means that the sentence has length 4 and has "today" as its last word, and that the reference of the previous word is 0011. After that, we follow the reference of the previous word, take its word and concatenate it with the next word, then follow the reference of its previous word, and so on, until we get the sentence "meeting: nine o'clock today", which is composed of four words.
Since only one term corresponds to a reference, we can retrieve one and only one secret message, which solves the ambiguity problem that some steganography methods suffer from.
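The backward walk along the previous-word references can be sketched as follows. The record layout is a simplification of the terms shown above, plain dicts stand in for dbase and Bt2, and all reference values except 0011 are made up for the illustration. Punctuation is assumed to have been discarded when the words were stored.

```python
# Simplified dbase: reference -> (word, reference of the previous word).
DBASE = {
    "0001": ("meeting", None),
    "0010": ("nine", "0001"),
    "0011": ("o'clock", "0010"),
    "0100": ("today", "0011"),
    "0110": ("tomorrow", "0011"),
}
# Stand-in for Bt2: code -> (reference of the last word, sentence length).
BT2 = {
    101: ("0100", 4),
    102: ("0011", 3),
    103: ("0110", 4),
}


def retrieve_message(code):
    """Rebuild the sentence for `code`, or return None if the code is unknown."""
    if code not in BT2:
        return None
    ref, length = BT2[code]
    words = []
    # Collect words from last to first by following the previous references.
    while ref is not None and len(words) < length:
        word, ref = DBASE[ref]
        words.append(word)
    return " ".join(reversed(words))
```

Each code maps to exactly one last-word reference, and each record has exactly one previous reference, so the walk can produce only one sentence per code.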

Discussion
Some methods build a dictionary of secret sentences (with their codes) as production rules. Table (1) compares this approach with our proposed method.

Conclusions
In this paper the following points can be concluded:
• The flexibility of the B+ tree gives good results for building many meaningful messages and for building a special dictionary.
• There is no ambiguity in retrieving a secret message from its code (one and only one secret message corresponds to each code) when we use the B+ tree.
• Using the B+ tree for storing and retrieving the secret sentences increases the efficiency of the steganography system.
• It prevents the redundancy of messages in the dictionary, or even of sub-messages.

Table (1): Using B+ tree compared with using CFG.
Using B+ tree | Using CFG
Increasing the number of secret messages has no effect, because the B+ tree deals with huge data. | Increasing the number of secret messages means an increase in program size.
Ambiguity problem solved. | Deals with unambiguous grammar only.
Provides an efficient search time for getting the code of the secret message. | Search time for getting the code of the secret message increases as the number of secret messages increases.
Provides an efficient search time for retrieving the secret message from its code. | Search time for retrieving the secret message from its code increases as the number of secret messages increases.

Eng. & Tech. Journal, Vol. 28, No. 15, 2010