The Global and local attention for automatic Arabic text diacritization
Keywords:
Natural language processing, Encoder-decoder, Attention mechanism, Luong attention, Arabic diacritization
Abstract
Automatic Arabic diacritization is the task of restoring the diacritics (vowel marks) of non-vocalized Arabic text. The task is important in natural language processing (NLP), and it helps people with specific learning difficulties access Arabic web content. To tackle the problem, we propose a letter-based encoder-decoder that relies on the attention mechanisms of Luong et al. (global and local attention). The training loss of the models was unstable. As expected, among the proposed models, the one using local predictive attention achieved the best word and letter error rates. The best diacritic error rate achieved on the test data is about 26.80%. Nevertheless, the models still need improvement, which is left for future work.
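The abstract contrasts Luong's global attention with local predictive (local-p) attention. The sketch below illustrates both alignment schemes on toy vectors, assuming Luong's "dot" score and plain Python lists instead of a tensor library; it is not the authors' implementation. The names `w_p`, `v_p`, and `d` (window half-width) stand in for Luong's W_p, v_p, and D and carry made-up values, and restricting the local softmax to the window is one common implementation choice rather than something the abstract specifies. The hypothetical `diacritic_error_rate` helper assumes the simplest definition (share of letters with a wrong diacritic); evaluations differ on which characters they count.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product; used as Luong's 'dot' score and for matrix-vector rows."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(weights: Vector, enc: List[Vector]) -> Vector:
    """c_t = sum_s a_t(s) * h_s, the attention-weighted sum of encoder states."""
    dim = len(enc[0])
    return [sum(w * h[i] for w, h in zip(weights, enc)) for i in range(dim)]


def global_attention(h_t: Vector, enc: List[Vector]) -> Vector:
    """Global attention: score h_t against every encoder state, then softmax."""
    return softmax([dot(h_t, h_s) for h_s in enc])


def local_p_attention(
    h_t: Vector, enc: List[Vector], w_p: List[Vector], v_p: Vector, d: int
) -> Tuple[Vector, float]:
    """Local-p attention: predict an aligned position p_t, softmax only inside
    the window [p_t - d, p_t + d], then scale by a Gaussian centred on p_t."""
    assert d >= 1, "window half-width must be at least 1"
    s_len = len(enc)
    # p_t = S * sigmoid(v_p^T tanh(W_p h_t)); W_p is given as a list of rows.
    hidden = [math.tanh(dot(row, h_t)) for row in w_p]
    p_t = s_len / (1.0 + math.exp(-dot(v_p, hidden)))
    # Integer source positions (0-based) that fall inside the window.
    lo = max(0, math.ceil(p_t - d))
    hi = min(s_len - 1, math.floor(p_t + d))
    window = list(range(lo, hi + 1))
    align = softmax([dot(h_t, enc[s]) for s in window])
    sigma = d / 2.0
    weights = [0.0] * s_len  # positions outside the window get zero weight
    for a, s in zip(align, window):
        weights[s] = a * math.exp(-((s - p_t) ** 2) / (2.0 * sigma**2))
    return weights, p_t


def diacritic_error_rate(pred: List[str], gold: List[str]) -> float:
    """Fraction of letters whose predicted diacritic differs from the reference."""
    assert len(pred) == len(gold) and gold, "expects aligned, non-empty sequences"
    return sum(p != g for p, g in zip(pred, gold)) / len(gold)


if __name__ == "__main__":
    enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]  # toy encoder states
    h_t = [0.3, 0.7]  # toy decoder state at step t
    g = global_attention(h_t, enc)
    print("global weights:", [round(w, 3) for w in g])
    print("global context:", context_vector(g, enc))
    w_p = [[0.5, -0.2], [0.1, 0.4]]  # hypothetical W_p
    v_p = [1.0, 1.0]  # hypothetical v_p
    loc, p_t = local_p_attention(h_t, enc, w_p, v_p, d=1)
    print("p_t =", round(p_t, 3), "local weights:", [round(w, 3) for w in loc])
    # Toy diacritic labels, one per letter (placeholders for fatha/kasra/damma).
    print("DER:", diacritic_error_rate(list("aiua"), list("aiuu")))
```

Global weights always sum to 1 over the whole source sequence, whereas local-p weights are zero outside the window around p_t and, because of the Gaussian factor, may sum to less than 1. Under the simple definition assumed above, a diacritic error rate of 26.80% would mean that roughly 27 of every 100 letters receive a wrong diacritic.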
License
Copyright (c) 2023 Ali Mijlad, Yacine EL YOUNOUSSI
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright on any article in the International Journal of Engineering and Applied Physics is retained by the author(s) under the Creative Commons license, which permits unrestricted use, distribution, and reproduction provided the original work is properly cited.
License agreement
Authors grant IJEAP a license to publish the article and identify IJEAP as the original publisher.
Authors also grant any third party the right to use, distribute and reproduce the article in any medium, provided the original work is properly cited.