Document Type: Research Paper


1 Computer Science Engineering Dept., University of Technology-Iraq, Alsina’a Street, 10066 Baghdad, Iraq.

2 University of Baghdad, College of Education for Human Science-Ibn Rushd, Baghdad, Iraq.


Scene classification is an essential perception task that robots use to understand their environment. Outdoor scenes, such as street scenes, are captured as images with depth and show greater variety than iconic object images. Semantic image segmentation is an important task for autonomous driving and mobile robotics applications because it provides the rich information needed for safe navigation and complex reasoning. This paper presents a model for semantic segmentation of outdoor scenes that classifies each object in the scene. The proposed model is a hybrid that combines the U-Net and Xception networks and operates on the 2.5-dimensional (RGB-D) Cityscapes dataset, which is used for 3D applications. The process has two stages. The first is pre-processing of the RGB-D dataset (data augmentation and k-means clustering). The second is the design of the hybrid model, which achieves a pixel accuracy of 0.7874. The output model was generated on a computer with an NVIDIA GeForce RTX 2060 GPU (6 GB of memory), programmed in Python 3.7.
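Pixel accuracy, the metric reported in the abstract, is commonly defined as the fraction of pixels whose predicted class equals the ground-truth class. The following is a minimal sketch under that common definition; the abstract does not state the exact formula or whether any labels are ignored, and the function name `pixel_accuracy` and the toy label maps are hypothetical.

```python
import numpy as np

def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label equals the ground-truth label.

    Assumes both inputs are integer label maps of identical shape (H, W).
    """
    pred = np.asarray(pred)
    target = np.asarray(target)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    return float((pred == target).mean())

# Toy 2x3 label maps (illustrative values, not data from the paper).
pred = np.array([[0, 1, 1],
                 [2, 2, 0]])
target = np.array([[0, 1, 2],
                   [2, 2, 0]])
acc = pixel_accuracy(pred, target)  # 5 of the 6 pixels agree
```

Under this definition, a value of 0.7874 would mean that roughly 79% of pixels received the correct class label.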



  • Scene classification is an essential perception task that robots use to understand their environment.
  • Deep learning has proven highly effective in challenging scene-understanding applications.
  • Data augmentation is used to increase the dataset size.
  • K-means clustering is used as a pre-processing step for the input dataset.
  • The proposed hybrid model is built by combining two deep neural networks, the Xception and U-Net models.
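The two pre-processing steps listed above can be sketched in plain NumPy. This is an illustrative assumption rather than the authors' pipeline: the highlights do not say which augmentations were used, how many clusters were chosen, or which features were clustered. Here a horizontal flip stands in for augmentation, and k-means (Lloyd's algorithm) is run on per-pixel RGB-D values. The names `flip_pair` and `kmeans_quantize`, the value `k=4`, and the random test image are all hypothetical.

```python
import numpy as np

def flip_pair(image, mask):
    """Horizontally flip an image and its label mask together."""
    return image[:, ::-1].copy(), mask[:, ::-1].copy()

def kmeans_quantize(image, k=4, iters=10, seed=0):
    """Cluster per-pixel feature vectors with Lloyd's k-means.

    image: array of shape (H, W, C), e.g. C=4 for RGB-D.
    Returns the image with each pixel replaced by its cluster centre,
    and the (H, W) map of cluster indices.
    """
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(np.float64)
    rng = np.random.default_rng(seed)
    # Initialise centres from k distinct pixel positions.
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every pixel to every centre.
        dists = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = pixels[labels == j]
            if len(members) > 0:  # leave an empty cluster's centre unchanged
                centers[j] = members.mean(axis=0)
    # Final assignment against the updated centres.
    dists = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    return centers[labels].reshape(h, w, c), labels.reshape(h, w)

# Synthetic 8x8 RGB-D image and label mask (random values, not real data).
rng = np.random.default_rng(42)
image = rng.random((8, 8, 4))
mask = rng.integers(0, 3, size=(8, 8))

flipped_image, flipped_mask = flip_pair(image, mask)
quantized, cluster_map = kmeans_quantize(image, k=4)
```

Flipping the image and mask together keeps each pixel aligned with its label, which matters for segmentation targets. In practice a library implementation such as scikit-learn's `KMeans` would likely be preferable to this hand-written loop.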

