Researchers used a low-cost, machine learning-based approach to diagnose lymphoma from paraffin-embedded biopsy specimens, a method with potential application to diagnostics in low- to middle-income countries (LMICs). They reported their findings in a study published in the journal Blood Advances.

The researchers pointed out in their report that in LMICs, lymphoma diagnosis may be hindered by cost or by difficulty in finding a pathologist. However, lymphoma subtypes can potentially be identified by transcriptional profiling techniques, they explained.

The researchers analyzed lymphoma biopsy specimens from the Instituto de Cancerología y Hospital Dr. Bernardo Del Valle in Guatemala City, Guatemala, a hospital that reportedly sees approximately 100 patients per year with findings suspicious for lymphoma. The biopsy specimens used in this study were obtained between 2006 and 2018. Each specimen was assigned a diagnosis of nonmalignant or of 1 of 8 types of lymphoma, based on World Health Organization guidelines for pathology assessments.

The researchers then conducted targeted expression profiling of 37 genes on remaining sample material from the biopsy specimens, using a probe-based polymerase chain reaction assay read out by capillary electrophoresis. This approach cost approximately $10 per sample, including reagents and consumables, according to the researchers.

Using the gene expression data, the researchers then applied machine learning techniques, including an Extreme Gradient Boosting algorithm, to develop a predictive model for diagnosing the lymphoma specimens. Available sample data were divided into a training cohort (70% of specimens, representing 397 samples) and a validation cohort (30% of specimens, representing 163 samples).
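The cohort split described above can be sketched in a few lines. This is only an illustration: the article does not say how specimens were assigned to each cohort, so the simple seeded shuffle, the function name `split_cohort`, and the placeholder sample IDs are all assumptions. Note also that the reported counts (397 and 163) are close to, but not exactly, a 70/30 split of 560 specimens, so the actual procedure may have differed.

```python
import random


def split_cohort(sample_ids, train_fraction=0.70, seed=0):
    """Shuffle sample IDs and split them into training and validation lists.

    Hypothetical helper: a plain random split, not the study's documented method.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Placeholder IDs standing in for the 560 specimens (397 + 163) in the article.
all_ids = [f"S{i:03d}" for i in range(560)]
train_ids, valid_ids = split_cohort(all_ids)
print(len(train_ids), len(valid_ids))
```

The training list would then be used to fit the classifier, with the validation list held back until the model is final.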

In the validation cohort, the model showed 86% accuracy (95% CI, 80%-91%), with the highest accuracy for the diffuse large B-cell lymphoma, Hodgkin lymphoma, mantle cell lymphoma, and natural killer/T-cell lymphoma subtypes. Calls were considered indeterminate when the predicted probability fell below a 60% confidence threshold; these accounted for 17% of the validation cohort. When samples with indeterminate calls were removed from the analysis, accuracy rose to 94% (95% CI, 89%-97%).
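The indeterminate-call rule can be illustrated with a short sketch. It assumes the model outputs one probability per diagnostic class and that a call is indeterminate when the highest class probability is below 0.60, which is one reading of the threshold described above. The function names, class abbreviations, and toy probabilities are made up for illustration and are not data from the study.

```python
INDETERMINATE_THRESHOLD = 0.60  # confidence cutoff described in the article


def call_sample(class_probs, threshold=INDETERMINATE_THRESHOLD):
    """Return (top_label, is_indeterminate) for one sample's class probabilities."""
    top_label = max(class_probs, key=class_probs.get)
    return top_label, class_probs[top_label] < threshold


def summarize(predictions, truths, threshold=INDETERMINATE_THRESHOLD):
    """Accuracy over all samples, indeterminate rate, and accuracy on confident calls."""
    calls = [call_sample(p, threshold) for p in predictions]
    n = len(calls)
    correct_all = sum(label == t for (label, _), t in zip(calls, truths))
    confident = [(label, t) for (label, ind), t in zip(calls, truths) if not ind]
    correct_conf = sum(label == t for label, t in confident)
    return {
        "accuracy_all": correct_all / n,
        "indeterminate_rate": (n - len(confident)) / n,
        "accuracy_confident": correct_conf / len(confident) if confident else None,
    }


# Toy, made-up probabilities (DLBCL, HL, MCL are abbreviations for three subtypes).
toy_probs = [
    {"DLBCL": 0.90, "HL": 0.05, "MCL": 0.05},
    {"DLBCL": 0.20, "HL": 0.75, "MCL": 0.05},
    {"DLBCL": 0.45, "HL": 0.40, "MCL": 0.15},  # top probability < 0.60: indeterminate
    {"DLBCL": 0.10, "HL": 0.10, "MCL": 0.80},
]
toy_truth = ["DLBCL", "HL", "HL", "MCL"]

stats = summarize(toy_probs, toy_truth)
print(stats)
```

In this toy set, the one misclassified sample is also the one flagged as indeterminate, so removing indeterminate calls raises accuracy, mirroring the pattern the researchers reported.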

For a set of 37 samples that were assayed in both the US and Guatemala, the concordance level was 97% with indeterminate calls removed. In another analysis, a cohort of 39 specimens with relapsed disease was evaluated, with an accuracy of 79%, which rose to 88% when indeterminate cases were removed.

While some specimens failed quality control, the researchers reported that classification accuracy did not depend significantly on specimen age or on tissue type (nodal/secondary lymphoid tissue vs extranodal tissue).

In a cost analysis, the researchers estimated that if assays were performed in Guatemala, the cost per sample would be $6.76 when running 95 samples per week and a maximum of $15 when running 16 samples per week.

“Our findings indicate that, even in complex diagnostic settings like lymphoma, gene expression–based testing can be both more accurate and less expensive than currently available strategies in LMICs,” the researchers wrote in their report. The researchers have also set up an open-access app that enables anyone with appropriate specimen data to use the model developed in this study.

Disclosures: Some authors have declared affiliations with or received grant support from the pharmaceutical industry. Please refer to the original study for a full list of disclosures.


Valvert F, Silva O, Solórzano-Ortiz E, et al. Low-cost transcriptional diagnostic to accurately categorize lymphomas in low- and middle-income countries. Blood Adv. 2021;5(10):2447-2455. doi:10.1182/bloodadvances.2021004347