Abstract
The translation of speech in a source language to speech in a target language using generative artificial intelligence is an active area of research. It aims to overcome global language barriers and thereby enable seamless communication between speakers of different languages. Such systems are well developed for high-resourced languages like English, Spanish, French, and Chinese. Currently, objective metrics such as the Bilingual Evaluation Understudy (BLEU) score, and subjective metrics such as Mean Opinion Score for Naturalness (MOSN) and Mean Opinion Score for Similarity (MOSS), are used to evaluate the output of speech-to-speech models. However, low-resourced languages, especially African indigenous languages, remain underdeveloped in speech processing applications. The output speech in the target language needs to be evaluated to determine its closeness to the ground truth, as well as how natural and intelligible it is to the intended listeners. This paper presents a review of trends from the current metrics to emerging ones such as Recall-Oriented Understudy for Gisting Evaluation (ROUGE-L) and BLASER. The application of speech-model metrics on various leaderboards and modern AI platforms is also discussed. The outcome shows that while the BLEU score and MOSN remain the prevalent metrics for speech models, there is a need to explore metrics such as ROUGE-L and BERTScore, which originate from text-based machine translation, because of their benefits.
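As an illustration of one of the emerging metrics discussed above, the following is a minimal sketch (not taken from the paper, and not a production implementation) of how ROUGE-L is typically computed: the longest common subsequence (LCS) between a reference transcript and a candidate transcript is found, and precision, recall, and their F1 score are derived from its length. Function names and the whitespace tokenization are illustrative assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b,
    computed by standard dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            # Extend the LCS on a match; otherwise carry the best so far.
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 between two sentences, using naive whitespace tokenization
    (real toolkits apply more careful tokenization and may use a weighted F-beta)."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, comparing the reference "the cat sat on the mat" with the candidate "the cat is on the mat" gives an LCS of five tokens and an F1 of about 0.83, reflecting a close but imperfect match.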
Conflict of Interest
The authors declare no conflict of interest.
Ethical Approval
Not applicable.
Data Availability
The datasets used in this study are openly available at [repository link] and the source code is available on GitHub at [GitHub link].
Funding
This work did not receive any external funding.