TY - GEN
T1 - Detecting Ransomware Encryption with File Signatures and Machine Learning Models
AU - Duignan, Michael
AU - Schukat, Michael
AU - Barrett, Enda
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - This study presents an analysis of the use of machine learning models in the identification and classification of ransomware encrypted files, differentiating them from standard encrypted or compressed files, and non-encrypted files (referred to as goodware). The study utilized a robust dataset of approximately 159,897 files, categorized into goodware, Chaos, Conti, and Xorist strains, and applied five machine learning models: Logistic Regression, Linear Discriminant Analysis, K-Nearest Neighbor, Naive Bayes, and Classification and Regression Trees to this dataset. The models were trained using an array of data points, including file headers and footers, entropy, Chi Squared, and file extensions. The analysis revealed high accuracy rates of between 97% and 100% in distinguishing ransomware encrypted files from other file types, demonstrating the importance of file extensions as a key determinant in this process. The study also draws attention to the increasing prevalence and complexity of ransomware strains, specifically those which do not alter file extensions, thereby posing additional challenges to identification and classification efforts. The research suggests further investigation and study into a wider array of ransomware strains and a more extensive range of file types. Special emphasis is recommended on strains that do not modify file extensions, as understanding these could significantly enhance the efficiency and effectiveness of machine learning models in ransomware detection.
KW - Chaos
KW - Conti
KW - Magic Numbers
KW - Ransomware
KW - Shannon entropy
KW - Xorist
KW - encryption
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=85165975907&partnerID=8YFLogxK
U2 - 10.1109/ISSC59246.2023.10162047
DO - 10.1109/ISSC59246.2023.10162047
M3 - Conference contribution
AN - SCOPUS:85165975907
T3 - 2023 34th Irish Signals and Systems Conference, ISSC 2023
BT - 2023 34th Irish Signals and Systems Conference, ISSC 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 34th Irish Signals and Systems Conference, ISSC 2023
Y2 - 13 June 2023 through 14 June 2023
ER -