Published: 2026-04-30
Web Attack Detection for SQLi and XSS Using Ensemble Learning Based on Character-Level N-Gram Features
DOI: 10.35870/ijsecs.v6i1.7193
Yaya Suharya, Mohammad Bayu Anggara
Abstract
SQL Injection (SQLi) and Cross-Site Scripting (XSS) remain severe threats to web application security, particularly as attackers use increasingly sophisticated obfuscation techniques to evade conventional detection systems. This research builds a machine learning framework around an ensemble that combines Random Forest and XGBoost, fed by character-level n-gram features. A large-scale dataset was curated by refining 156,636 raw samples into 151,783 unique entries to ensure high-quality training data. The 10,000 extracted character-level n-gram features capture the fine-grained structural patterns of complex and obfuscated payloads. In experiments, the proposed ensemble model achieved an overall accuracy of 99.67%. Five-fold cross-validation confirmed its stability, yielding a mean accuracy of 99.64% with a standard deviation of 0.0003. ROC AUC scores of 1.0000 for XSS and 0.9999 for SQLi further indicate near-perfect discriminative capability. Together, character-level representation and ensemble learning yield a precise and resilient detector for protecting modern web applications against dynamic and evolving attacks.
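The abstract does not include the preprocessing code. The following is a minimal, hypothetical standard-library Python sketch of two of the steps it names:

- exact-duplicate removal;
- character-level n-gram counting with a vocabulary capped at the most frequent n-grams. This is analogous to the 10,000 features reported.

The following choices are assumptions and not the authors' published settings:

- the n-gram range of 1 to 3;
- lowercasing;
- raw counts instead of TF-IDF weights;
- deduplicating on the payload string alone;
- every function name (`deduplicate`, `char_ngrams`, `build_vocabulary`, `vectorize`).

```python
from collections import Counter


def deduplicate(samples):
    """Keep the first occurrence of each exact payload string.

    `samples` is an iterable of (payload, label) pairs. Deduplicating on the
    raw payload alone is an assumption; it also discards later repeats that
    carry a conflicting label.
    """
    seen = set()
    unique = []
    for payload, label in samples:
        if payload not in seen:
            seen.add(payload)
            unique.append((payload, label))
    return unique


def char_ngrams(text, n_min=1, n_max=3, lowercase=True):
    """Return every contiguous character n-gram with n_min <= n <= n_max."""
    if lowercase:
        # Case-folding (an assumed choice) maps variants such as "SeLeCt" to "select".
        text = text.lower()
    return [
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    ]


def build_vocabulary(payloads, max_features=10_000, n_min=1, n_max=3):
    """Map the max_features most frequent n-grams in the corpus to column indices."""
    counts = Counter()
    for payload in payloads:
        counts.update(char_ngrams(payload, n_min, n_max))
    # Rank by descending frequency; break ties alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {gram: idx for idx, (gram, _) in enumerate(ranked[:max_features])}


def vectorize(payload, vocabulary, n_min=1, n_max=3):
    """Return a dense count vector; n-grams absent from the vocabulary are ignored."""
    row = [0] * len(vocabulary)
    for gram in char_ngrams(payload, n_min, n_max):
        idx = vocabulary.get(gram)
        if idx is not None:
            row[idx] += 1
    return row


if __name__ == "__main__":
    # Toy payloads for illustration only; not drawn from the paper's dataset.
    raw = [
        ("' OR 1=1 --", "sqli"),
        ("<script>alert(1)</script>", "xss"),
        ("' OR 1=1 --", "sqli"),  # exact duplicate, removed by deduplicate()
        ("hello world", "benign"),
    ]
    data = deduplicate(raw)
    vocab = build_vocabulary([payload for payload, _ in data], max_features=50)
    matrix = [vectorize(payload, vocab) for payload, _ in data]
    print(len(data), "samples x", len(vocab), "features")
```

Under cross-validation, fit the vocabulary on the training folds only. This keeps n-grams from the held-out fold out of feature selection.

In a full pipeline, the resulting matrix would go to the two classifiers. The abstract does not say how the Random Forest and XGBoost outputs are combined: soft voting, hard voting and stacking are all possible. One plausible, unconfirmed arrangement uses scikit-learn and the xgboost package:

- `TfidfVectorizer(analyzer="char", ngram_range=(1, 3), max_features=10000)` in place of the hand-written featurizer;
- `VotingClassifier(estimators=[("rf", RandomForestClassifier()), ("xgb", XGBClassifier())], voting="soft")`;
- evaluation with `cross_val_score(..., cv=5)`.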
Keywords
Character-level N-gram; Ensemble Learning; SQL Injection; Web Security; XSS
Peer Review Process
This article has undergone a double-blind peer review process to ensure quality and impartiality.
Article Information
This article has been peer-reviewed and published in the International Journal Software Engineering and Computer Science (IJSECS). The content is available under the terms of the Creative Commons Attribution 4.0 International License.
- Issue: Vol. 6 No. 1 (2026)
- Section: Articles
- License: CC BY 4.0
- Copyright: © 2026 Authors
Yaya Suharya, Universitas Bale Bandung
Department of Informatics Engineering, Universitas Bale Bandung, Bandung Regency, West Java Province, Indonesia
