Enhancing Interpretable Anomaly Detection: Depth-based Extended Isolation Forest Feature Importance (DEIFFI)
DOI:
https://doi.org/10.26438/ijcse/v12i5.5967
Keywords:
Anomaly Detection, Explainable Artificial Intelligence, Extended Isolation Forest, Feature Selection, Interpretability, Outlier Detection
Abstract
This research introduces a novel approach, Depth-based Extended Isolation Forest Feature Importance (DEIFFI), to enhance the interpretability of the Extended Isolation Forest (EIF) algorithm in anomaly detection (AD). Anomaly detection is critical for identifying rare and significant deviations from the norm in data. However, understanding why an instance is classified as an anomaly remains a challenge. DEIFFI addresses this challenge by providing insights that allow users of EIF-based AD to conduct thorough root cause analysis. A noteworthy feature of DEIFFI is that it improves interpretability without imposing a heavy computational burden, which is crucial for real-world applications that require efficient AD, particularly those demanding real-time decision-making. On real and synthetic datasets, respectively, DEIFFI achieves an accuracy of 0.914 and 0.942, a precision of 0.607 and 0.64, a recall of 0.773 and 0.96, and an F1 score of 0.68 and 0.768. DEIFFI thus provides interpretable insights alongside competitive performance metrics at low computational cost, which supports its suitability for real-time decision support. Beyond enhancing interpretability, DEIFFI also assists in unsupervised feature selection. This dual capability highlights the practical utility of DEIFFI, improving EIF’s capabilities and extending its applicability across diverse AD scenarios.
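As a quick consistency check on the figures reported above, the F1 score can be recomputed from precision and recall as their harmonic mean, F1 = 2PR / (P + R). The sketch below is illustrative only: the helper name f1_score and the dictionary layout are assumptions made for this example, not part of the DEIFFI method or its code. Only the numeric values come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision, recall, and F1 exactly as reported in the abstract.
# The dictionary structure itself is a hypothetical layout for illustration.
reported = {
    "real": {"precision": 0.607, "recall": 0.773, "f1": 0.68},
    "synthetic": {"precision": 0.64, "recall": 0.96, "f1": 0.768},
}

for name, metrics in reported.items():
    computed = f1_score(metrics["precision"], metrics["recall"])
    print(f"{name}: computed F1 = {computed:.3f}, reported F1 = {metrics['f1']}")
```

Under this definition, both recomputed values agree with the reported F1 scores after rounding to three decimals, so the reported precision, recall, and F1 figures are mutually consistent.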
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.
