Comparative Security Analysis of Password Recovery in PDF and Compressed File Formats
DOI: https://doi.org/10.26438/ijcse/v13i9.815
Keywords: Password Recovery, Encrypted File Formats (PDF, ZIP, RAR), Rule-Based and Brute-Force Methods, AI/ML-Assisted Password Cracking, Probabilistic Models (PCFG, Markov Chains), Cryptographic Security and Vulnerabilities
Abstract
Password protection in widely used file formats such as PDF, ZIP, and RAR is a key mechanism for securing sensitive digital data. Although these formats implement strong encryption algorithms such as AES, together with key derivation functions such as PBKDF2, their practical security often hinges on the strength of user-chosen passwords, which are frequently weak or predictable. This study presents a comparative analysis of password recovery techniques, covering conventional approaches (brute-force, dictionary-based, and rule-based attacks) and AI/ML-assisted strategies, including Markov chains, probabilistic context-free grammars (PCFGs), and recurrent neural networks (RNNs). Experiments were performed on CPU-based systems using John the Ripper and Hashcat, evaluating performance across varying password lengths, character sets, and encryption schemes. The results show that weakly encrypted ZIP files are recovered almost instantly, whereas RAR archives using PBKDF2-HMAC-SHA256 show substantially higher resistance. PDF files remain vulnerable when protected by short passwords, despite AES-256 encryption. Rule-based strategies consistently reduce recovery time compared with brute-force methods, while AI-assisted approaches generate realistic password candidates that closely mimic human password choices, further improving efficiency. The findings underscore that practical security depends more on password quality than on cryptographic strength. This analysis offers actionable insights for security auditing, password-policy enforcement, and the design of more resilient authentication mechanisms. Future work will explore GPU-accelerated recovery using CUDA and investigate the implications of quantum computing for large-scale password cracking, providing guidance for addressing emerging digital security challenges.
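The observation that short passwords fail despite AES-256, while PBKDF2-HMAC-SHA256 makes RAR archives more resistant, can be made concrete with simple arithmetic: an attacker searches the password space, not the AES key space, and a key derivation function with c iterations divides the effective guess rate by roughly c. The Python sketch below assumes a hypothetical raw rate of 10^7 HMAC evaluations per second and an assumed iteration count of 32,768; neither figure comes from this study's experiments, and the helper names are illustrative.

```python
# Rough exhaustive-search cost model (illustrative only).
# RAW_RATE and PBKDF2_ITERATIONS are hypothetical placeholders,
# not measurements from this study.


def keyspace(charset_size: int, max_len: int, min_len: int = 1) -> int:
    """Count all candidates with lengths min_len..max_len over the charset."""
    return sum(charset_size ** n for n in range(min_len, max_len + 1))


def worst_case_seconds(space: int, raw_rate: float, kdf_iterations: int = 1) -> float:
    """Time to try every candidate when each guess costs kdf_iterations HMAC calls.

    raw_rate is the attacker's throughput in HMAC evaluations per second, so a
    KDF such as PBKDF2 divides the effective guesses per second by its
    iteration count.
    """
    guesses_per_second = raw_rate / kdf_iterations
    return space / guesses_per_second


if __name__ == "__main__":
    RAW_RATE = 1e7               # hypothetical HMAC evaluations per second on one CPU
    PBKDF2_ITERATIONS = 32_768   # assumed iteration count for illustration
    for label, charset, length in [("digits", 10, 6), ("lowercase", 26, 6),
                                   ("alphanumeric", 62, 8)]:
        space = keyspace(charset, length)
        plain = worst_case_seconds(space, RAW_RATE)
        kdf = worst_case_seconds(space, RAW_RATE, PBKDF2_ITERATIONS)
        print(f"{label:>12}, length <= {length}: {space:.2e} candidates, "
              f"{plain:.2e} s without a KDF, {kdf:.2e} s with PBKDF2")
```

Under these assumptions, the numeric space of length up to six (1,111,110 candidates) is exhausted in about 0.1 s when each guess costs one HMAC, but takes roughly an hour at 32,768 iterations per guess. The cipher strength never enters the calculation, which is the point the findings make about password quality.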
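Rule-based attacks save time because they apply a small set of human-like transformations to dictionary words instead of enumerating the whole keyspace. The Python sketch below illustrates the idea; the rules (identity, capitalization, a simple leetspeak substitution), the suffix list, and the helper names `mangle` and `recover` are hypothetical and do not use John the Ripper or Hashcat rule syntax. A plaintext comparison stands in for the hash or key-verification check a real tool would perform.

```python
from itertools import product
from typing import Callable, Iterable, Iterator, Optional

Rule = Callable[[str], str]

# Hypothetical leetspeak substitution table.
LEET = str.maketrans({"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"})

# Illustrative rules in plain Python, not John the Ripper or Hashcat syntax.
RULES: list[Rule] = [
    lambda w: w,                  # word as-is:  password
    str.capitalize,               # capitalized: Password
    lambda w: w.translate(LEET),  # leetspeak:   p@$$w0rd
]

SUFFIXES = ["", "1", "123", "!", "2024"]  # hypothetical common suffixes


def mangle(words: Iterable[str]) -> Iterator[str]:
    """Yield each word x rule x suffix combination once, in a fixed order."""
    seen: set[str] = set()
    for word, rule, suffix in product(words, RULES, SUFFIXES):
        candidate = rule(word) + suffix
        if candidate not in seen:
            seen.add(candidate)
            yield candidate


def recover(target: str, words: Iterable[str]) -> tuple[Optional[str], int]:
    """Return (match or None, number of guesses tried).

    Plaintext equality stands in for the hash/key check a real cracker performs.
    """
    tried = 0
    for tried, candidate in enumerate(mangle(words), start=1):
        if candidate == target:
            return candidate, tried
    return None, tried


if __name__ == "__main__":
    print(recover("Summer2024", ["password", "dragon", "summer"]))
```

With this three-word list, `Summer2024` is found on the 40th guess, whereas exhaustive search over alphanumeric strings of length up to 10 spans about 8.5 × 10^17 candidates. The trade-off is coverage: a password that no rule produces from any listed word is never found.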
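Markov-chain models rank password candidates by how closely their character transitions match those observed in real passwords. A minimal first-order (bigram) character model is sketched below in Python; the training list is a tiny hypothetical sample, the smoothing constant `alpha` is an arbitrary choice, and `BigramModel` is an illustrative name rather than the model used in this study.

```python
import math
from collections import Counter, defaultdict

START, END = "\x02", "\x03"  # boundary markers unlikely to appear in passwords


class BigramModel:
    """First-order (bigram) character Markov model over passwords."""

    def __init__(self, passwords: list[str], alpha: float = 0.01) -> None:
        self.alpha = alpha  # additive smoothing constant (arbitrary choice)
        self.counts: dict[str, Counter] = defaultdict(Counter)
        self.vocab: set[str] = {END}
        for pw in passwords:
            self.vocab.update(pw)
            chars = [START, *pw, END]
            for prev, nxt in zip(chars, chars[1:]):
                self.counts[prev][nxt] += 1

    def log_prob(self, password: str) -> float:
        """Sum of log P(next | previous) over all transitions, end marker included.

        Unseen transitions receive a small nonzero probability via smoothing.
        """
        v = len(self.vocab)
        chars = [START, *password, END]
        total = 0.0
        for prev, nxt in zip(chars, chars[1:]):
            row = self.counts.get(prev, Counter())
            p = (row[nxt] + self.alpha) / (sum(row.values()) + self.alpha * v)
            total += math.log(p)
        return total

    def rank(self, candidates: list[str]) -> list[str]:
        """Return candidates ordered from most to least likely."""
        return sorted(candidates, key=self.log_prob, reverse=True)


if __name__ == "__main__":
    training = ["password", "password1", "passw0rd", "dragon", "monkey"]  # hypothetical
    model = BigramModel(training)
    print(model.rank(["xq7#vz", "drowssap", "password"]))
```

A cracking tool would enumerate guesses in decreasing probability order rather than score a fixed list; ranking a list here simply exposes the ordering the model induces.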
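In a PCFG model, each password is abstracted to a base structure of letter (L), digit (D), and symbol (S) runs, such as L4 D3 for `pass123`, and a guess's probability is the product of its structure probability and the probabilities of the strings filling each run. The Python sketch below learns all of these by counting a tiny hypothetical training list. In the original formulation by Weir et al., letter runs are filled from a dictionary, so learning them from training data here is a simplification; `SimplePCFG` and its helpers are illustrative names.

```python
from collections import Counter
from itertools import groupby


def char_class(c: str) -> str:
    """Map a character to L (letter), D (digit) or S (symbol)."""
    if c.isalpha():
        return "L"
    if c.isdigit():
        return "D"
    return "S"


def segments(password: str) -> list[tuple[str, str]]:
    """Split a password into (tag, text) runs, e.g. 'pass123!' -> L4, D3, S1."""
    runs = []
    for cls, group in groupby(password, key=char_class):
        text = "".join(group)
        runs.append((f"{cls}{len(text)}", text))
    return runs


class SimplePCFG:
    """Structure and terminal probabilities estimated by counting a training list."""

    def __init__(self, passwords: list[str]) -> None:
        self.structures: Counter = Counter()
        self.terminals: dict[str, Counter] = {}
        for pw in passwords:
            runs = segments(pw)
            self.structures[" ".join(tag for tag, _ in runs)] += 1
            for tag, text in runs:
                self.terminals.setdefault(tag, Counter())[text] += 1

    def prob(self, password: str) -> float:
        """P(structure) times P(text | tag) for each run; 0.0 if anything is unseen."""
        if not self.structures:
            return 0.0
        runs = segments(password)
        structure = " ".join(tag for tag, _ in runs)
        p = self.structures[structure] / sum(self.structures.values())
        for tag, text in runs:
            table = self.terminals.get(tag, Counter())
            total = sum(table.values())
            p *= (table[text] / total) if total else 0.0
        return p


if __name__ == "__main__":
    model = SimplePCFG(["pass123", "love123", "pass1", "abc!"])  # hypothetical training list
    print(segments("pass123!"))
    print(model.prob("pass123"), model.prob("love1"), model.prob("xyz"))
```

As with the Markov sketch, a guessing tool would generate candidates in decreasing probability order; `prob` only scores a given candidate.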
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, on the condition that the authors are given credit and that, in the event of reuse or distribution, the terms of this license are made clear.
