ToxiGuard: An Ensemble TF-IDF Framework for Multi-label Toxic Comment Filtering

Zefeng Liu*, Fei Pan
Lingnan University, Hong Kong 999077, China
*Corresponding email: zefengliu@ln.hk
https://doi.org/10.71052/srb2024/WTWQ7807

Background: Online discussion platforms face persistent risks from toxic and abusive comments, which can degrade community health and require costly manual moderation. In practice, toxicity detection is challenging due to noisy user-generated text, severe class imbalance, and the multi-label nature of harmful behaviors (e.g., insults, threats, and identity attacks may co-occur). This study attempts to build a practical and reproducible toxicity filtering pipeline that remains effective under limited computational budgets.

Methods: We propose ToxiGuard, a lightweight multi-label classification framework that combines text normalization, unigram/bigram TF-IDF feature extraction, and heterogeneous classical learners (e.g., Logistic Regression, LinearSVC, Multinomial Naïve Bayes, and gradient-boosted trees). Model selection is performed via 10-fold cross-validation using ROC-AUC as the primary criterion, and an ensemble voting strategy is adopted to improve robustness across labels. For fair evaluation, instances with missing ground-truth labels are excluded from scoring when applicable.

Results: Experiments on a large-scale Wikipedia discussion dataset demonstrate that the proposed ensemble consistently outperforms individual base learners across toxicity categories, yielding more stable AUC performance under label imbalance and text noise.

Conclusion: ToxiGuard provides an efficient, interpretable, and deployment-friendly baseline for multi-label toxic comment filtering and can serve as a strong foundation for subsequent enhancements such as threshold calibration, cost-sensitive learning, and neural text encoders.
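
The voting and scoring steps named in the Methods can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. The abstract does not say whether voting is hard or soft, so unweighted soft voting (averaging per-label scores) is assumed here. The sentinel `MISSING = -1` for an absent ground-truth label, the function names, and the use of a simple mean of per-label AUCs are likewise illustrative assumptions.

```python
from typing import List, Optional, Sequence

MISSING = -1  # assumed sentinel for a missing ground-truth label


def soft_vote(model_scores: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Average per-label scores across models (unweighted soft voting).

    model_scores[m][i][k] is model m's score for comment i and label k.
    """
    n_models = len(model_scores)
    n_rows = len(model_scores[0])
    n_labels = len(model_scores[0][0])
    return [
        [sum(m[i][k] for m in model_scores) / n_models for k in range(n_labels)]
        for i in range(n_rows)
    ]


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> Optional[float]:
    """Pairwise AUC: fraction of (positive, negative) pairs in which the
    positive has the higher score; ties count as half a win.
    Returns None when only one class is present."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_masked_auc(y_true: Sequence[Sequence[int]],
                    y_score: Sequence[Sequence[float]]) -> float:
    """Mean per-label AUC, skipping entries whose ground truth is MISSING."""
    aucs = []
    for k in range(len(y_true[0])):
        rows = [i for i in range(len(y_true)) if y_true[i][k] != MISSING]
        auc = roc_auc([y_true[i][k] for i in rows], [y_score[i][k] for i in rows])
        if auc is not None:  # labels with a single class cannot be scored
            aucs.append(auc)
    if not aucs:
        raise ValueError("no label has both positive and negative examples")
    return sum(aucs) / len(aucs)


# Toy example: two hypothetical base models, four comments, two labels.
model_a = [[0.9, 0.2], [0.1, 0.8], [0.6, 0.4], [0.3, 0.1]]
model_b = [[0.7, 0.4], [0.3, 0.6], [0.2, 0.6], [0.5, 0.3]]
labels = [[1, 0], [0, 1], [1, 1], [MISSING, MISSING]]  # last row is unlabeled

ensemble = soft_vote([model_a, model_b])
score = mean_masked_auc(labels, ensemble)
```

The pairwise AUC is quadratic in the number of examples and is only meant to make the definition explicit; a rank-based implementation would be preferable at dataset scale.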

Share and Cite
Liu, Z., Pan, F. (2025) ToxiGuard: An Ensemble TF-IDF Framework for Multi-label Toxic Comment Filtering. Scientific Research Bulletin, 2(4), 1-15. https://doi.org/10.71052/srb2024/WTWQ7807

Published

19/12/2025