SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network

Jian Hu1, Xiangjie Li2, Kyle Coleman3, Amelia Schroeder3, Nan Ma4, David J Irwin5, Edward B Lee6, Russell T Shinohara3, Mingyao Li7

  1. Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA. jianhu@pennmedicine.upenn.edu.
  2. School of Statistics and Data Science, Nankai University, Tianjin, China.
  3. Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
  4. Weitzman School of Design, University of Pennsylvania, Philadelphia, PA, USA.
  5. Department of Neurology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
  6. Translational Neuropathology Research Laboratory, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
  7. Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA. mingyao@pennmedicine.upenn.edu.

Abstract

Recent advances in spatially resolved transcriptomics (SRT) technologies have enabled comprehensive characterization of gene expression patterns in the context of the tissue microenvironment. To elucidate spatial gene expression variation, we present SpaGCN, a graph convolutional network approach that integrates gene expression, spatial location and histology in SRT data analysis. Through graph convolution, SpaGCN aggregates the gene expression of each spot from its neighboring spots, which enables the identification of spatial domains with coherent expression and histology. The subsequent domain-guided differential expression (DE) analysis then detects genes with enriched expression patterns in the identified domains. Analyzing seven SRT datasets using SpaGCN, we show that it can detect genes with much more enriched spatial expression patterns than competing methods. Furthermore, genes detected by SpaGCN are transferable and can be used to study spatial variation of gene expression in other datasets. SpaGCN is computationally fast and platform independent, making it a desirable tool for diverse SRT studies.
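
The neighbor aggregation described in the abstract can be illustrated with a minimal sketch. This is not the SpaGCN implementation: the function name `aggregate_expression`, the Gaussian distance kernel, the `length_scale` parameter and the row normalization are all assumptions chosen for illustration, and the sketch omits histology and the learned weights of a real graph convolutional layer.

```python
import numpy as np

def aggregate_expression(coords, expr, length_scale=1.0):
    """Replace each spot's expression with a distance-weighted average over all spots.

    coords: (n_spots, 2) spatial coordinates.
    expr:   (n_spots, n_genes) expression matrix.
    length_scale: hypothetical kernel width controlling how fast weights decay.
    """
    # Pairwise Euclidean distances between spots, shape (n_spots, n_spots).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Assumed Gaussian kernel: nearby spots get weights near 1, distant spots near 0.
    weights = np.exp(-dist ** 2 / (2.0 * length_scale ** 2))
    # Row-normalize so that each spot's weights sum to 1.
    weights /= weights.sum(axis=1, keepdims=True)
    # Aggregate: each row becomes a weighted average of neighboring expression.
    return weights @ expr

# Toy data: spots 0 and 1 are adjacent, spot 2 is far away.
coords = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
expr = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 5.0]])
smoothed = aggregate_expression(coords, expr)
```

In this toy example the two adjacent spots pull each other's expression toward a common value, while the isolated spot is left essentially unchanged. That is the sense in which aggregation can encourage spatially coherent domains.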

Presented by Jian Hu