18. Predicting brain relapse in lymphoma patients using machine learning

Techna Symposium

Drs. Sleiman Bassim, Robert Kridel

Princess Margaret Cancer Centre – University Health Network

With an incidence of about 4,000 new cases a year in Canada, diffuse large B-cell lymphoma (DLBCL) is the most common subtype of non-Hodgkin lymphoma. Relapse occurs in 40% of DLBCL patients and is often fatal. We are particularly interested in relapse in the brain, which affects 5% of patients. At present, no accurate model exists for predicting brain relapse. Our goal was to improve its prediction with a new polygenic risk model.

Tumor biopsies from 240 patients were profiled on gene expression arrays with 70,524 probes. Patients were selected to fall into one of three groups: those who presented with brain relapse, those who presented with relapse outside the brain, and those who were cured. We allocated 188 samples to the training cohort and 52 to the validation cohort. Stochastic mini-batch sampling ensured class balance during training. We then compared more than 20 machine learning models and deep networks to build a translational prediction test. Because expression alone did not yield optimal model sensitivity and confidence, we added further metrics to the classification: modularity of gene clusters, gene-gene interactions, gene ranking by L1-regularization, and degrees of feature importance.
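The abstract does not describe how the balanced mini-batches were drawn. A minimal sketch, assuming per-class sampling in which every mini-batch holds the same number of samples from each outcome group, with any group smaller than the requested count oversampled (drawn with replacement). The function name `balanced_minibatches`, its parameters, and the toy label counts are hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict


def balanced_minibatches(labels, per_class, n_batches, seed=None):
    """Yield mini-batches of sample indices with equal counts from every class.

    labels: sequence of class labels, one per training sample.
    per_class: number of samples drawn from each class per mini-batch.
    Classes with at least `per_class` members are subsampled without
    replacement; smaller classes are sampled with replacement (oversampled).
    """
    if per_class <= 0 or n_batches < 0:
        raise ValueError("per_class must be positive and n_batches non-negative")
    rng = random.Random(seed)
    # Group sample indices by their class label.
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)
    for _ in range(n_batches):
        batch = []
        for label in sorted(indices_by_class):
            pool = indices_by_class[label]
            if len(pool) >= per_class:
                batch.extend(rng.sample(pool, per_class))
            else:
                batch.extend(rng.choices(pool, k=per_class))
        rng.shuffle(batch)  # avoid presenting classes in a fixed order
        yield batch


if __name__ == "__main__":
    # Hypothetical, deliberately imbalanced toy labels (not the study's cohort sizes).
    labels = ["brain"] * 10 + ["systemic"] * 25 + ["cured"] * 40
    for batch in balanced_minibatches(labels, per_class=4, n_batches=3, seed=0):
        print(sorted(labels[i] for i in batch))
```

Subsampling the larger groups and oversampling the smallest one keeps each training step from being dominated by the most common outcome; weighting the loss by inverse class frequency is a common alternative.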

Support vector machines, gradient boosting, and neural networks ranked as the top classifiers. The best-performing model contained 64 genes and achieved 87.5% accuracy at a 95% confidence level, with an F1-statistic of 5.57 and p ≤ 0.001. These genes fell into discrete pathways, including transcriptional deregulation (MYC, MMP9, TRAF1, LMO2) and cytokine receptors (CXCR4, CXCR6, IFNAR2, TNFRSF17).
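The abstract reports accuracy at a 95% confidence level but not the interval bounds. One standard way to attach a 95% confidence interval to a classification accuracy is the Wilson score interval for a binomial proportion. The sketch below assumes that method (the authors may have used another), and the helper names `accuracy_counts` and `wilson_interval` and the toy predictions are hypothetical.

```python
import math


def accuracy_counts(y_true, y_pred):
    """Return (number correct, total) for two equal-length label sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct, len(y_true)


def wilson_interval(correct, total, z=1.959963984540054):
    """Wilson score interval for a proportion; the default z gives 95% coverage."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("require total > 0 and 0 <= correct <= total")
    p = correct / total
    z2 = z * z
    denom = 1 + z2 / total
    centre = (p + z2 / (2 * total)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / total + z2 / (4 * total**2))
    # Clamp to [0, 1] to absorb floating-point round-off at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Hypothetical toy labels (0 = cured, 1 = systemic relapse, 2 = brain relapse).
    y_true = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0]
    y_pred = [0, 0, 1, 2, 2, 2, 0, 1, 2, 1]
    correct, total = accuracy_counts(y_true, y_pred)
    low, high = wilson_interval(correct, total)
    print(f"accuracy {correct / total:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

The Wilson interval is often preferred over the simple normal approximation for small validation sets because its bounds stay within [0, 1] and it behaves better near 0% or 100% accuracy.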

Our approach identified individuals at high risk of brain relapse better than any previously reported prognostic model. Our study also opens the possibility of functionally exploring the biological underpinnings of brain relapse.