Web Attack Detection for SQLi and XSS Using Ensemble Learning Based on Character-Level N-Gram Features

Yaya Suharya; Mohammad Bayu Anggara

doi:10.35870/ijsecs.v6i1.7193

Published: 2026-04-30

Affiliation Details

Yaya Suharya: Universitas Bale Bandung
Mohammad Bayu Anggara: Universitas Bale Bandung

Abstract

SQL Injection (SQLi) and Cross-Site Scripting (XSS) remain severe threats to web application security, particularly as attackers use increasingly sophisticated obfuscation techniques to bypass conventional detection systems. This research builds a machine learning framework that combines two ensemble learners, Random Forest and XGBoost, with character-level n-gram feature extraction. Data curation reduced a large-scale dataset of 156,636 raw samples to 151,783 unique entries to ensure high-quality training data. The model uses 10,000 character-level n-gram features to capture the structural patterns of complex and obfuscated payloads. The proposed ensemble model achieved an overall accuracy of 99.67%. Stability was confirmed through 5-fold cross-validation, which yielded a mean accuracy of 99.64% with a standard deviation of 0.0003. ROC AUC scores of 1.0000 for XSS and 0.9999 for SQLi indicate near-perfect discriminative capability. These results indicate that combining character-level representation with ensemble learning yields a precise and resilient approach to protecting web applications against evolving attacks.
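To illustrate the curation and feature-extraction steps the abstract describes, here is a minimal sketch in plain Python. It is not the authors' implementation. The function names, the 1-to-3 n-gram range, the frequency-based vocabulary cut-off, and the sample payloads are assumptions made for illustration. Only the deduplication step and the 10,000-feature cap come from the abstract. The Random Forest and XGBoost classifiers are not shown.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def char_ngrams(text: str, n_min: int = 1, n_max: int = 3) -> list[str]:
    """Return every contiguous character n-gram of length n_min..n_max.

    The 1..3 range is an assumption; the abstract does not state it.
    """
    return [
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    ]


def deduplicate(samples: Iterable[str]) -> list[str]:
    """Drop exact duplicate payloads while keeping first-seen order."""
    return list(dict.fromkeys(samples))


def build_vocabulary(samples: Iterable[str], max_features: int = 10_000) -> dict[str, int]:
    """Map the most frequent n-grams to column indices.

    max_features=10_000 mirrors the feature count in the abstract;
    ranking by raw frequency is an assumed selection rule.
    """
    counts: Counter[str] = Counter()
    for sample in samples:
        counts.update(char_ngrams(sample))
    top = counts.most_common(max_features)
    return {gram: index for index, (gram, _) in enumerate(top)}


def vectorize(text: str, vocab: dict[str, int]) -> list[int]:
    """Count occurrences of each vocabulary n-gram in one payload."""
    vector = [0] * len(vocab)
    for gram in char_ngrams(text):
        index = vocab.get(gram)
        if index is not None:
            vector[index] += 1
    return vector


if __name__ == "__main__":
    # Hypothetical payloads, not taken from the paper's dataset.
    raw = [
        "' OR 1=1 --",
        "<script>alert(1)</script>",
        "' OR 1=1 --",  # exact duplicate, removed during curation
        "SELECT name FROM users WHERE id = 7",
    ]
    unique = deduplicate(raw)
    vocab = build_vocabulary(unique)
    features = [vectorize(sample, vocab) for sample in unique]
    print(len(raw), "raw ->", len(unique), "unique;", len(vocab), "features")
```

Working at the character level means that fragments such as `<sc` or `1=1` become features in their own right, so a payload does not need to tokenize cleanly into words to be represented. The resulting count vectors could then be passed to any classifier.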

Keywords

Character-level N-gram; Ensemble Learning; SQL Injection; Web Security; XSS

Peer Review Process

This article has undergone a double-blind peer review process to ensure quality and impartiality.

How to Cite

Suharya, Y., & Anggara, M. B. (2026). Web Attack Detection for SQLi and XSS Using Ensemble Learning Based on Character-Level N-Gram Features. International Journal Software Engineering and Computer Science (IJSECS), 6(1), 391-403. https://doi.org/10.35870/ijsecs.v6i1.7193

Article Information

This article has been peer-reviewed and published in the International Journal Software Engineering and Computer Science (IJSECS). The content is available under the terms of the Creative Commons Attribution 4.0 International License.

Issue: Vol. 6 No. 1 (2026)
Section: Articles
Published: 2026-04-30

License: CC BY 4.0
Copyright: © 2026 Authors
DOI: 10.35870/ijsecs.v6i1.7193

Author Biographies

Yaya Suharya, Universitas Bale Bandung

Department of Informatics Engineering, Universitas Bale Bandung, Bandung Regency, West Java Province, Indonesia

Mohammad Bayu Anggara, Universitas Bale Bandung

Department of Informatics Engineering, Universitas Bale Bandung, Bandung Regency, West Java Province, Indonesia
