Detecting Ransomware Encryption with File Signatures and Machine Learning Models

Michael Duignan, Michael Schukat, Enda Barrett

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This study analyzes the use of machine learning models to identify and classify ransomware-encrypted files, differentiating them from standard encrypted or compressed files and from non-encrypted files (referred to as goodware). It used a dataset of approximately 159,897 files, categorized into goodware and the Chaos, Conti, and Xorist ransomware strains, and applied five machine learning models to it: Logistic Regression, Linear Discriminant Analysis, K-Nearest Neighbor, Naive Bayes, and Classification and Regression Trees. The models were trained on features including file headers and footers, entropy, chi-squared values, and file extensions. They achieved accuracy of between 97% and 100% in distinguishing ransomware-encrypted files from other file types, with file extensions emerging as a key determinant. The study also draws attention to the increasing prevalence and complexity of ransomware strains, specifically those that do not alter file extensions and so pose additional challenges for identification and classification. The authors recommend further study of a wider array of ransomware strains and a more extensive range of file types, with special emphasis on strains that do not modify file extensions, as understanding these could significantly enhance the efficiency and effectiveness of machine learning models in ransomware detection.
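
The features named in the abstract (header and footer bytes, entropy, chi-squared, and file extension) could be computed along the following lines. This is a minimal sketch, not the authors' implementation: the function names, the 4-byte header/footer window, and the uniform-distribution baseline for the chi-squared statistic are all assumptions made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, ranging from 0.0 to 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def chi_squared(data: bytes) -> float:
    """Pearson chi-squared statistic against a uniform byte distribution.

    Assumption: all 256 byte values are expected equally often.
    """
    if not data:
        return 0.0
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))


def file_features(data: bytes, name: str, n: int = 4) -> dict:
    """Build one feature record for a file (hypothetical layout).

    n is the number of header/footer bytes kept; 4 is an assumed value.
    """
    extension = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    return {
        "header": data[:n].hex(),   # leading bytes (magic number)
        "footer": data[-n:].hex(),  # trailing bytes
        "entropy": shannon_entropy(data),
        "chi_squared": chi_squared(data),
        "extension": extension,
    }


if __name__ == "__main__":
    sample = b"%PDF-1.7\nexample body\n%%EOF"
    print(file_features(sample, "report.pdf"))
```

Encrypted or compressed data tends toward high entropy and a low chi-squared value against the uniform baseline, which is why such statistics alone may not separate ransomware output from ordinary encrypted files, and why the header, footer, and extension fields are included alongside them.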

Original language: English
Title of host publication: 2023 34th Irish Signals and Systems Conference, ISSC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350340570
Publication status: Published - 2023
Event: 34th Irish Signals and Systems Conference, ISSC 2023 - Dublin, Ireland
Duration: 13 Jun 2023 to 14 Jun 2023

Publication series

Name: 2023 34th Irish Signals and Systems Conference, ISSC 2023

Conference

Conference: 34th Irish Signals and Systems Conference, ISSC 2023
Country/Territory: Ireland
City: Dublin
Period: 13/06/23 to 14/06/23

Keywords

  • Chaos
  • Conti
  • Magic Numbers
  • Ransomware
  • Shannon entropy
  • Xorist
  • encryption
  • machine learning
