Class imbalance in gradient boosting classification algorithms: Application to experimental stroke data

Olga Lyashevska, Fiona Malone, Eugene MacCarthy, Jens Fiehler, Jan Hendrik Buhk, Liam Morris

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)

Abstract

Imbalance between positive and negative outcomes, so-called class imbalance, is a common problem in medical data. Imbalanced data hinder the performance of conventional classification methods, which aim to maximize the overall accuracy of the model without accounting for the uneven distribution of the classes. To rectify this, the data can be resampled by oversampling the positive (minority) class until the classes are approximately equally represented. A prediction model such as a gradient boosting algorithm can then be fitted with greater confidence. This classification method allows for non-linear relationships and deep interaction effects while focusing on difficult areas by iteratively shifting towards problematic observations. In this study, we demonstrate the application of these methods to medical data and develop a practical framework for evaluating the features that contribute to the probability of stroke.
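
The resampling step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which oversampling technique the authors used, so this example assumes plain random oversampling with replacement on a binary outcome; the function name `oversample_minority`, the toy data, and the label coding (1 = positive/minority) are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, minority_label=1, seed=0):
    """Randomly duplicate minority-class rows until both classes are equal in size.

    Assumes a binary outcome; this is simple random oversampling, not
    necessarily the method used in the paper.
    """
    rng = random.Random(seed)
    minority = [r for r, y in zip(rows, labels) if y == minority_label]
    if not minority:
        raise ValueError("no minority-class rows to resample")
    majority_count = sum(1 for y in labels if y != minority_label)
    # Number of extra minority rows needed to match the majority class.
    deficit = max(0, majority_count - len(minority))
    extra = [rng.choice(minority) for _ in range(deficit)]
    return list(rows) + extra, list(labels) + [minority_label] * deficit


# Toy data (hypothetical): 8 negative outcomes and 2 positive outcomes.
rows = [[float(i), float(i % 3)] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = oversample_minority(rows, labels)
class_counts = Counter(balanced_labels)
```

The balanced rows and labels could then be passed to a gradient boosting classifier. In practice, resampling is usually applied to the training split only, so that duplicated observations do not leak into the evaluation data.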

Original language: English
Pages (from-to): 916-925
Number of pages: 10
Journal: Statistical Methods in Medical Research
Volume: 30
Issue number: 3
Publication status: Published - Mar 2021
Externally published: Yes

Keywords

  • Imbalanced data
  • classification algorithm
  • gradient boosting
  • oversampling
  • stroke
  • trees
