Bilevel Feature Extraction-Based Text Mining for Fault Diagnosis of Railway Systems

Authors

  • Keerthanaa D Department of Computer Science, Idhaya College for Women, Kumbakonam, Tamilnadu, India
  • Rosy CP M.Sc Computer Science, Idhaya College for Women, Kumbakonam, Tamilnadu, India

Keywords

Bilevel, Feature Selection, Feature Extraction, Railway, Text Mining

Abstract

A vast amount of text data is recorded in the form of repair verbatim in railway maintenance sectors. Efficient text mining of such maintenance data plays an important role in detecting anomalies and improving fault diagnosis efficiency. However, unstructured verbatim, high-dimensional data, and an imbalanced fault class distribution pose challenges for feature selection and fault diagnosis. We propose a bilevel feature extraction-based text mining method that integrates features extracted at both the syntax and semantic levels, with the aim of improving fault classification performance. We first perform an improved χ2 statistics-based feature selection at the syntax level to overcome the learning difficulty caused by an imbalanced data set. Then, we perform a prior latent Dirichlet allocation (LDA)-based feature selection at the semantic level to reduce the data set into a low-dimensional topic space. Finally, we fuse the fault features derived from both the syntax and semantic levels via serial fusion. The proposed method uses fault features at different levels and enhances the precision of fault diagnosis for all fault classes, particularly minority ones. Its performance has been validated on a railway maintenance data set collected from 2008 to 2014 by a railway corporation, on which it outperforms traditional approaches.
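The three-stage pipeline described in the abstract can be sketched with off-the-shelf components. This is a minimal illustration, not the authors' implementation: it substitutes plain χ2 term selection for the paper's improved χ2 statistic and a standard LDA for the prior-LDA variant, and the function name, toy verbatim texts, and labels are all invented for the example.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.decomposition import LatentDirichletAllocation

def bilevel_features(docs, labels, k_terms=4, n_topics=2):
    """Sketch of bilevel feature extraction: chi2 term selection at the
    syntax level, LDA topic proportions at the semantic level, and
    serial fusion (horizontal concatenation) of the two feature sets."""
    counts = CountVectorizer().fit_transform(docs)            # term-frequency matrix
    # Syntax level: keep the k terms most correlated with the fault labels.
    syntax = SelectKBest(chi2, k=k_terms).fit_transform(counts, labels)
    # Semantic level: project documents into a low-dimensional topic space.
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    semantic = lda.fit_transform(counts)                      # doc-topic proportions
    # Serial fusion: concatenate syntax and semantic features per document.
    return np.hstack([syntax.toarray(), semantic])

# Toy repair-verbatim data (illustrative only).
docs = [
    "signal relay failed contact worn",
    "relay contact replaced after failure",
    "track circuit shorted by ballast debris",
    "ballast cleaned track circuit restored",
]
labels = [0, 0, 1, 1]
X = bilevel_features(docs, labels, k_terms=4, n_topics=2)
print(X.shape)  # (4, 6): 4 chi2-selected terms + 2 topic proportions
```

The fused matrix `X` would then feed a downstream classifier; the serial-fusion step follows the strategy compared against parallel fusion in reference [12].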

References

[1] D. G. Rajpathak, “An ontology based text mining system for knowledge discovery from the diagnosis data in the automotive domain,” Comput. Ind., vol. 64, no. 5, pp. 565–580, Jun. 2013.

[2] W. Wang, H. Xu, and X. Huang, “Implicit feature detection via a constrained topic model and SVM,” in Proc. Conf. Empirical Methods Natural Lang. Process., Seattle, WA, USA, 2013, pp. 903–907.

[3] L. Yin, Y. Ge, K. Xiao, X. Wang, and X. Quan, “Feature selection for high-dimensional imbalanced data,” Neurocomputing, vol. 105, pp. 3–11, Apr. 2013.

[4] Z. Zhai, B. Liu, H. Xu, and P. Jia, “Constrained LDA for grouping product features in opinion mining,” in Proc. 15th Pacific-Asia Conf. Adv. Knowl. Discov. Data Mining, Shenzhen, China, 2011, vol. 1, pp. 448–459.

[5] X. Ding, Q. He, and N. Luo, “A fusion feature and its improvement based on locality preserving projections for rolling element bearing fault classification,” J. Sound Vibration, vol. 335, pp. 367–383, Jan. 2015.

[6] L. Huang and Y. L. Murphey, “Text mining with application to engineering diagnostics,” in Proc. 19th Int. Conf. IEA/AIE, Annecy, France, 2006, pp. 1309–1317.

[7] J. Silmon and C. Roberts, “Improving switch reliability with innovative condition monitoring techniques,” Proc. IMechE, Part F: J. Rail Rapid Transit, vol. 224, no. 4, pp. 293–302, 2010.

[8] D. Blei, A. Ng, and M. Jordan, “Latent Dirichlet allocation,” J. Mach. Learn. Res., vol. 3, pp. 993–1022, Jan. 2003.

[9] J. Chang, J. Boyd-Graber, C. Wang, S. Gerrish, and D. Blei, “Reading tea leaves: How humans interpret topic models,” Neural Inf. Process. Syst., vol. 22, pp. 288–296, 2009.

[10] D. A. Cieslak and N. V. Chawla, “Learning decision trees for unbalanced data,” in Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases, Part I. Berlin, Germany: Springer-Verlag, 2008, pp. 241–256.

[11] T. Kailath, “The divergence and Bhattacharyya distance measures in signal selection,” IEEE Trans. Commun. Technol., vol. 15, no. 1, pp. 52–60, Feb. 1967.

[12] J. Yang, J. Yang, D. Zhang, and J. Lu, “Feature fusion: Parallel strategy vs. serial strategy,” Pattern Recognit., vol. 36, no. 6, pp. 1369–1381, Jun. 2003.

[13] C. Drummond and R. C. Holte, “C4.5, class imbalance, and cost sensitivity: Why under-sampling beats over-sampling,” in Proc. Workshop Learn. Imbalanced Datasets II, ICML, Washington, DC, USA, 2003, pp. 1–8.

[14] X. Liu, J. Wu, and Z. Zhou, “Exploratory undersampling for class-imbalance learning,” IEEE Trans. Syst., Man, Cybern. B, vol. 39, no. 2, pp. 539–550, Apr. 2009.

[15] D. Margineantu and T. G. Dietterich, “Learning decision trees for loss minimization in multi-class problems,” Dept. Comput. Sci., Oregon State Univ., Corvallis, OR, USA, Tech. Rep., 1999.

[16] M. V. Joshi, R. Agarwal, and V. Kumar, “Predicting rare classes: Can boosting make any weak learner strong?” in Proc. 8th ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, Edmonton, AB, Canada, 2002, pp. 297–306.

[17] Y. Tang, Y. Zhang, and N. V. Chawla, “SVMs modeling for highly imbalanced classification,” IEEE Trans. Syst., Man, Cybern. B, vol. 39, no. 1, pp. 281–288, Feb. 2009.

[18] G. Weiss, “Mining with rarity: A unifying framework,” ACM SIGKDD Explorations Newslett.—Spec. Issue Learn. Imbalanced Datasets, vol. 6, no. 1, pp. 7–19, Jun. 2004.

[19] D. Mladenic and M. Grobelnik, “Feature selection for unbalanced class distribution and naive Bayes,” in Proc. 16th Int. Conf. Mach. Learn., Bled, Slovenia, 1999, pp. 258–267.

[20] Z. Zheng, X. Wu, and R. Srihari, “Feature selection for text categorization on imbalanced data,” ACM SIGKDD Explorations Newslett.—Spec. Issue Learn. Imbalanced Datasets, vol. 6, no. 1, pp. 80–89, Jun. 2004.

Published

2025-11-24

How to Cite

[1]
D. Keerthanaa and C. P. Rosy, “Bilevel Feature Extraction-Based Text Mining for Fault Diagnosis of Railway Systems”, Int. J. Comp. Sci. Eng., vol. 7, no. 4, pp. 137–139, Nov. 2025.