Arabic Information Extraction Methods: A Survey

Mazen El Sayed,

Peer Review: Double Blind
Accepted: 6 November 2023
Online published: 29 April 2019

− Abstract

Information retrieval (IR) systems developed for Western languages, such as English, perform well on those languages but do not achieve the same performance on Eastern languages such as Arabic. This is because Arabic has a different and complex structure and morphology: polysemy, irregular and inflected derived forms, variant spellings of certain words, variant writings of certain character combinations, and short (diacritic) and long vowels. In addition, an Arabic word is derived from a root by attaching affixes according to a regular set of word patterns. To address these problems, several methods have been proposed. The aim of this paper is to survey these methods. Although we do not claim that this is an exhaustive study, this work covers nearly 20 different methods. The main approaches applied in these methods are morphological or statistical analyses. To extract information from an Arabic document, the methods based on either approach must answer the following question: “How can we find the root of the word we are searching for?” To look up a word in an Arabic dictionary, we must first extract its root and then find that root in the dictionary, because the vocabulary of Arabic is essentially built by derivation from roots. Roots are words composed of three to five consonants. This work will contribute to improving the performance of Arabic information retrieval systems, since Arabic information extraction methods are the kernel of such systems.
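To make the root-finding question concrete, the following is a minimal sketch of one morphological strategy: strip a known prefix and suffix, then align the remaining stem against word patterns whose slots (ف, ع, ل) mark the root consonants. The affix and pattern lists are tiny illustrative assumptions, and the function names (strip_affixes, match_pattern, extract_root) are hypothetical; none of this is taken from a specific method covered by the survey.

```python
# Illustrative affix lists (assumed for this sketch; real stemmers use larger, tuned lists).
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ات", "ون", "ين", "ان", "ها", "ية", "ة", "ه", "ي"]

# Word patterns: the letters ف, ع, ل are slots for the 1st, 2nd and 3rd root consonant;
# every other letter in a pattern must match the stem literally.
PATTERNS = ["فاعل", "مفعول", "فعال", "فعيل", "مفعل", "فعول", "فعل"]
SLOTS = "فعل"


def strip_affixes(word, min_len=3):
    """Remove at most one prefix and one suffix, keeping at least min_len letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


def match_pattern(stem, pattern):
    """Return the triliteral root if the stem fits the pattern, else None."""
    if len(stem) != len(pattern):
        return None
    root = {}
    for letter, slot in zip(stem, pattern):
        if slot in SLOTS:
            root[slot] = letter      # slot position: record the root consonant
        elif letter != slot:
            return None              # literal position: must match exactly
    return root["ف"] + root["ع"] + root["ل"]


def extract_root(word):
    """Strip affixes, then try each pattern in order; None if nothing fits."""
    stem = strip_affixes(word)
    for pattern in PATTERNS:
        root = match_pattern(stem, pattern)
        if root is not None:
            return root
    return None


for w in ["الكاتب", "مكتوب", "كتاب", "والمكتبة"]:
    print(w, "->", extract_root(w))
```

All four sample words reduce to the root كتب, which is the key a root-based dictionary lookup would use. A sketch like this ignores diacritics, weak letters, and roots of four or five consonants, and it can overstem words whose root letters happen to coincide with an affix.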

− Conflict of Interest

The authors declare no conflict of interest.

− Ethical Approval

Not applicable

− Data Availability

The datasets used in this study are openly available at [repository link] and the source code is available on GitHub at [GitHub link].

− Funding

This work did not receive any external funding.

Classification

FOR Code: 091599
Version of record: v1.0
Issue date: 29 April 2019
Language: English

Open Access

Research Article

CC-BY-NC 4.0

LJER Volume 19, Issue 2, Pg. 11-28
