Hate speech against women and immigrants: A comparative analysis of machine learning and text embedding techniques
Abstract
Hate speech on social media, especially against women and immigrants, is a major issue. Twitter, which promotes public discourse and diverse viewpoints, explicitly rejects violence, discrimination, and assaults based on race, nationality, ethnicity, social status, sexual orientation, age, disability, or severe illness. Hate speech harms individuals and communities, but the volume of internet content makes routine detection impractical. This challenge highlights the need for effective systems that detect and categorize hate speech directed at women and immigrants. This research describes the deployment of two machine learning classifiers, Random Forest and Support Vector Machine (SVM), combined with text pre-processing, post-processing, and the text embedding techniques TF-IDF, CBOW, and GloVe. The main goal is to classify tweets in a Twitter dataset as hate speech or not, and to subclassify them along the aggressive and targeted dimensions. Model efficacy is assessed for each combination of text embedding and classification task. When combined with TF-IDF embeddings, the Random Forest classifier excels at hate speech categorization. Meanwhile, combining GloVe embeddings with the SVM algorithm achieves outstanding precision in discriminating between aggressive, non-aggressive, targeted, and non-targeted categories. CBOW embeddings also work well for broader hate speech classification. This work thus improves hate speech identification on social media by providing theoretical and practical insights.
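To make the TF-IDF representation mentioned in the abstract concrete, the following is a minimal sketch of how tweets could be turned into TF-IDF vectors before being passed to a classifier. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the L2 normalization, the function names `tokenize` and `tfidf_vectors`, and the placeholder tweets are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical pre-processing: lowercase and split on whitespace.
    # The abstract does not describe the actual pre-processing steps.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vocabulary, list of L2-normalized TF-IDF vectors)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter(tok for doc in tokenized for tok in set(doc))

    # Smoothed IDF, one common variant (an assumption, not taken from the paper).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}

    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vec = [counts[t] * idf[t] for t in vocab]
        # L2-normalize so that tweet length does not dominate the features.
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


# Neutral placeholder texts standing in for tweets.
tweets = [
    "example tweet one",
    "example tweet two",
    "another short message",
]
vocab, vectors = tfidf_vectors(tweets)
```

Each resulting vector has one entry per vocabulary term, and terms that occur in fewer tweets receive a larger weight. In a pipeline like the one the abstract describes, such vectors would serve as the input features for a Random Forest or SVM classifier.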
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.