Speech recognition based on visual information, such as the shape and movement of the lips, is referred to as lip reading. The visual features are derived frame by frame, at the frame rate of the video sequence. The work proposed in this paper uses the lower part of the human face to extract speech-relevant features accurately and robustly from the inner edge of the lips, and uses biometrics to verify a person's identity by plotting curves of the relevant physiological or behavioral characteristics. Lips carry a large number of distinctive features. The results are promising, and the system responds well even when the number of tested frames is reduced. The recognition rate is 86% to 100% with audio only, 73% to 100% with visual information only, and 92% to 100% with combined audio-visual information.