Oral Presentation Australasian Extracellular Vesicles Conference 2020

Analysis of unannotated long non-coding RNAs from exosome subtypes using next-generation RNA sequencing (#17)

Wittaya WS Suwakulsiri 1, Maoshan MC Chen 2, David DG Greening 3, Rong RX Xu 1, Richard RS Simpson 1
  1. La Trobe Institute for Molecular Science, Melbourne, VIC, Australia
  2. Australian Centre for Blood Diseases, Alfred Hospital, Monash University, Melbourne, Victoria, Australia
  3. Molecular Proteomics, Baker IDI Heart and Diabetes Institute, Melbourne, Victoria, Australia

Long non-coding RNAs (lncRNAs) are transcripts longer than 200 nucleotides that act as regulatory molecules in transcription and translation under both normal and pathological conditions. Exosomal lncRNAs have gained attention as important mediators of intercellular communication and as potential markers for disease diagnosis and prognosis. In recent years, many exosomal lncRNAs have been discovered and annotated. However, many more are expected to remain unidentified, because characterizing unannotated, non-protein-coding exosomal RNAs from massive RNA sequencing datasets is technically challenging. Here we describe a method for the discovery of unannotated lncRNAs in two exosome subtypes (A33+ and EpCAM+ exosomes) sequentially isolated from the human colon cell line LIM1863 using ultracentrifugation and immunoaffinity capture-based purification. The method takes exosomal RNA sequencing reads as input and performs transcript assembly to identify unannotated exosomal lncRNAs. Cutoffs (transcript length, number of exons, human protein-coding probability and the genomic region where transcription occurs) are used to identify potentially novel exosomal lncRNAs. Raw read count calculation and differential expression analysis are also included for downstream analysis and candidate selection. We identified 130,396 exosomal lncRNAs across the two exosome subtypes. Of these, 40,543 transcripts were classified as unannotated exosomal lncRNAs (number of exons < 2, protein-coding probability (CPAT) < 0.364, and derived from one of three transcript classes: fully contained within a reference intron, exonic overlap on the opposite strand, or unknown intergenic). The method permits the discovery of novel unannotated exosomal lncRNAs using RNA sequencing coupled with stringent bioinformatic approaches. This protocol facilitates the discovery of novel exosomal lncRNA candidates for future cell biology studies and as potential disease biomarkers.
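
The cutoff step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Transcript` record, its field names and the function names are hypothetical, and the single-letter class codes (`i`, `x`, `u`) are an assumption that the three transcript classes named in the abstract correspond to the codes reported by transcript-comparison tools such as gffcompare. Only the threshold values (length > 200 nucleotides, number of exons < 2, CPAT probability < 0.364) are taken from the abstract.

```python
from dataclasses import dataclass

# Thresholds quoted in the abstract.
MIN_LENGTH_NT = 200          # lncRNAs contain >200 nucleotides
EXON_COUNT_UPPER = 2         # number of exons < 2
CPAT_CUTOFF = 0.364          # protein-coding probability (CPAT) < 0.364

# Assumption: the three classes in the abstract map onto these class codes
# (i = fully contained within a reference intron,
#  x = exonic overlap on the opposite strand,
#  u = unknown, intergenic).
UNANNOTATED_CLASS_CODES = frozenset({"i", "x", "u"})


@dataclass(frozen=True)
class Transcript:
    """Hypothetical record for one assembled transcript."""
    transcript_id: str
    length_nt: int
    exon_count: int
    coding_probability: float
    class_code: str


def is_candidate(t: Transcript) -> bool:
    """Return True if the transcript passes all four cutoffs."""
    return (
        t.length_nt > MIN_LENGTH_NT
        and t.exon_count < EXON_COUNT_UPPER
        and t.coding_probability < CPAT_CUTOFF
        and t.class_code in UNANNOTATED_CLASS_CODES
    )


def filter_candidates(transcripts):
    """Keep only transcripts that pass every cutoff, preserving input order."""
    return [t for t in transcripts if is_candidate(t)]


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    examples = [
        Transcript("tx1", 850, 1, 0.05, "u"),   # passes every cutoff
        Transcript("tx2", 150, 1, 0.05, "u"),   # too short
        Transcript("tx3", 900, 3, 0.05, "i"),   # too many exons
        Transcript("tx4", 900, 1, 0.80, "x"),   # likely protein-coding
        Transcript("tx5", 900, 1, 0.05, "="),   # matches an annotated transcript
    ]
    for t in filter_candidates(examples):
        print(t.transcript_id)
```

In practice the per-transcript values would come from the assembly, comparison and coding-potential outputs rather than from hand-written records; the sketch only shows how the four cutoffs combine into a single pass/fail decision.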