Multilingual Code-Mixed Sentiment Analysis in Hate Speech

Dr. Anish Singh

− Abstract

This paper presents a simple and effective method for identifying the sentiment (positive, negative, or neutral) of hate speech in which more than one language appears in the same sentence, a pattern common on Indian social media. Such text, called code-mixed (for example, Hindi and English together), is difficult for traditional sentiment analysis systems, which are built for a single language. Because very few labeled datasets exist for such text, we first collected tweets containing both hate and non-hate speech from Twitter. We then used a pretrained transformer model, trained on multilingual social media data, to assign sentiment labels to these tweets automatically. Using this labeled data, we trained six machine learning models, including an ensemble model. Our tests show that the ensemble model performs best, achieving higher accuracy, precision, recall, and F1-score than the other models. We found that hate speech usually carries negative sentiment, while non-hate speech is often neutral or positive. This work offers a scalable framework for sentiment classification in low-resource, code-mixed settings and lays a foundation for broader applications such as toxic comment moderation and social media monitoring.
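To make the ensemble and evaluation steps concrete, the following is a minimal sketch in plain Python. The abstract does not say how the ensemble combines its base models, so hard majority voting is an assumption here, and the function names (`majority_vote`, `macro_scores`) and the toy predictions are hypothetical illustrations, not the study's code or results.

```python
from collections import Counter

LABELS = ("negative", "neutral", "positive")


def majority_vote(predictions_per_model):
    """Hard-voting ensemble (assumed scheme): one label list per base model."""
    ensemble = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        top = max(counts.values())
        # On a tie, keep the label of the earliest model among the tied labels.
        ensemble.append(next(v for v in votes if counts[v] == top))
    return ensemble


def macro_scores(y_true, y_pred, labels=LABELS):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy data, invented for illustration only (not taken from the paper).
y_true = ["negative", "negative", "neutral", "positive", "neutral", "negative"]
model_a = ["negative", "neutral", "neutral", "positive", "neutral", "negative"]
model_b = ["negative", "negative", "positive", "positive", "neutral", "neutral"]
model_c = ["neutral", "negative", "neutral", "negative", "neutral", "negative"]

ensemble_pred = majority_vote([model_a, model_b, model_c])
ensemble_scores = macro_scores(y_true, ensemble_pred)
```

In this toy case each base model errs on different tweets, so the vote corrects every individual mistake. That is the usual motivation for an ensemble, though real gains depend on how correlated the base models' errors are.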

− Conflict of Interest

The authors declare no conflict of interest.

− Ethical Approval

Not applicable.

− Data Availability

The datasets used in this study are openly available at [repository link] and the source code is available on GitHub at [GitHub link].

− Funding

This work did not receive any external funding.

Version of record: v1.0
Issue date: NA
Language: English

Open Access
Research Article
CC-BY-NC 4.0
