Analyze Multidimensional Data with Machine Learning Algorithms
February 19, 2018
Published in the latest supplement of Current Protocols in Cytometry, this publication describes a high-quality, rigorously tested protocol that explains how to analyze multidimensional data (such as that acquired through multiplexing techniques) utilizing machine learning algorithms.
Generating Quantitative Cell Identity Labels with Marker Enrichment Modeling (MEM)
Kirsten E. Diggins, Jocelyn S. Gandelman, Caroline E. Roe, Jonathan M. Irish
Identifying distinct clusters or groups from high-dimensional data has become increasingly automated with clustering and dimensionality reduction tools. However, classifying and labeling those groups is still left to expert interpretation, which can be both time-consuming and subjective. Marker enrichment modeling (MEM) automates this process by producing a quantitative label for a cluster indicating which features are enriched on that cluster relative to others in the sample or dataset. MEM quantifies both positive and negative feature enrichment, i.e., which features are specifically expressed and which are specifically lacking on a population, on a scale from -10 to +10. These labels can be interpreted by experts or read by machines to facilitate machine learning and automated classification approaches to cell subset identification and characterization.
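A minimal sketch of how a signed enrichment score can be placed on a -10 to +10 scale follows. It is written in Python rather than taken from the MEM R package. It assumes that each feature's score combines the shift in median (magnitude, MAG) with the ratio of interquartile ranges (IQR) as MEM = |MAG_pop - MAG_ref| + IQR_ref/IQR_pop - 1, that this score is negated when the population's median falls below the reference's, and that all scores are then rescaled so the largest absolute value is 10. The function names `mem_scores` and `scale_to_ten`, the IQR floor of 0.5, and the toy data are hypothetical. The package itself may transform data, choose thresholds, or scale differently.

```python
import numpy as np


def mem_scores(pop, ref, iqr_floor=0.5):
    """Signed enrichment of each feature (column) in `pop` relative to `ref`.

    pop, ref: arrays of shape (cells, features), assumed to be already
    transformed (for example with arcsinh), as is common for cytometry data.
    """
    pop, ref = np.asarray(pop, float), np.asarray(ref, float)
    mag_pop = np.median(pop, axis=0)  # magnitude: per-feature median
    mag_ref = np.median(ref, axis=0)
    # Interquartile ranges, floored (hypothetical value) so a very tight
    # distribution cannot produce an extreme ratio
    q75p, q25p = np.percentile(pop, [75, 25], axis=0)
    q75r, q25r = np.percentile(ref, [75, 25], axis=0)
    iqr_pop = np.maximum(q75p - q25p, iqr_floor)
    iqr_ref = np.maximum(q75r - q25r, iqr_floor)
    diff = mag_pop - mag_ref
    raw = np.abs(diff) + iqr_ref / iqr_pop - 1
    # Negate the score when the population's median is below the reference's
    return np.where(diff < 0, -raw, raw)


def scale_to_ten(raw):
    """Rescale scores (any shape) so the largest absolute value becomes 10."""
    raw = np.asarray(raw, float)
    peak = np.max(np.abs(raw))
    return raw if peak == 0 else 10.0 * raw / peak


# Toy data: feature 0 is high in the population, feature 1 is low
rng = np.random.default_rng(0)
pop = rng.normal(loc=[5.0, 0.0], scale=0.5, size=(200, 2))
ref = rng.normal(loc=[1.0, 3.0], scale=0.5, size=(800, 2))
scores = scale_to_ten(mem_scores(pop, ref))
print(np.round(scores, 1))
```

In this toy case, the reference is simply a separate set of simulated cells. In practice, it would be the other cells in the sample or dataset. The feature that is high in the population gets a positive score and the feature it lacks gets a negative one. When several populations are scored together, passing the stacked scores to `scale_to_ten` keeps them on one shared scale.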
Multiplexed single-cell experimental techniques like mass cytometry measure 40 or more features and enable deep characterization of well-known and novel cell populations. However, traditional data analysis techniques rely extensively on human experts or prior knowledge, and novel machine learning algorithms may generate unexpected population groupings. Marker enrichment modeling (MEM) creates quantitative identity labels based on features enriched in a population relative to a reference.
While developed for cell type analysis, MEM labels can be generated for a wide range of multidimensional data types, and MEM works effectively with output from expert analysis and diverse machine learning algorithms. MEM is implemented as an R package and includes three steps: (1) calculation of MEM values that quantify each feature's relative enrichment in the population, (2) reporting of MEM labels as a heatmap or as a text label, and (3) quantification of MEM label similarity between populations.
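Steps (2) and (3) can be sketched starting from MEM values that have already been computed. This is again a hypothetical Python illustration and not the MEM package's own code. The text label is assumed to list features from most enriched to most lacking, each written with its rounded signed score (for example `CD4+10`), and to omit features whose absolute score falls below a cutoff. Similarity is assumed to be the root-mean-square deviation (RMSD) between two populations' MEM vectors, converted to a percentage by dividing by 20, which is the largest possible per-feature gap on a -10 to +10 scale. The function names, the cutoff of 2, the normalization, and the marker values are all assumptions for illustration.

```python
import numpy as np


def mem_text_label(markers, values, min_abs=2):
    """List features from most enriched to most lacking, e.g. 'CD4+10 CD8-8'.

    Features whose rounded |score| is below `min_abs` (hypothetical cutoff)
    are left out of the label.
    """
    vals = np.asarray(values, float)
    parts = []
    for i in np.argsort(-vals, kind="stable"):  # highest score first
        v = int(round(float(vals[i])))
        if abs(v) >= min_abs:
            parts.append(f"{markers[i]}{v:+d}")
    return " ".join(parts)


def mem_similarity(a, b, scale_max=10.0):
    """Percent similarity of two MEM vectors from their RMSD (assumed normalization)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    rmsd = np.sqrt(np.mean((a - b) ** 2))
    max_rmsd = 2.0 * scale_max  # widest possible per-feature gap on a -10..+10 scale
    return 100.0 * (1.0 - rmsd / max_rmsd)


# Hypothetical MEM values for two populations over the same four markers
markers = ["CD3", "CD4", "CD8", "CD19"]
t_helper = [6.2, 9.8, -7.9, -3.1]
b_cell = [-5.0, -2.4, -4.1, 9.6]
label = mem_text_label(markers, t_helper)
print(label)
print(round(mem_similarity(t_helper, b_cell), 1))
```

Under this normalization, identical MEM vectors score 100%. Two populations with opposite maximal scores on every feature score 0%.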
The protocols here show MEM analysis using datasets from immunology and oncology. These MEM implementations provide a way to characterize population identity and novelty in the context of computational and expert analyses. © 2018 by John Wiley & Sons, Inc.
Diggins, K. E., Gandelman, J. S., Roe, C. E., & Irish, J. M. (2018). Generating quantitative cell identity labels with marker enrichment modeling (MEM). Current Protocols in Cytometry, 83, 10.21.1–10.21.28. doi: 10.1002/cpcy.34