vidhya2205/Transcriptomic-Profiling-of-Old-Age-Sarcoma-Patients-using-TCGA-RNA-seq-data

Transcriptomic-Profiling-of-Old-Age-Sarcoma-Patients-using-TCGA-RNA-seq-data

This project aims to identify significant differentially expressed genes, transcription factors, prognostic biomarkers and hub genes by transcriptomic profiling of older age (>=65 years) Sarcoma patients. 

Description

Sarcoma is a rare type of cancer that is more frequently found among children (< 18 years) and older adults (OA; age at diagnosis ≥ 65 years). However, these populations are less frequently included in clinical studies, and their survival rates are poorer than those of younger adults (YA; age at diagnosis 18-65 years). The tumor microenvironment in OA cancer patients also differs from that in YA patients. Hence, in this study we use the TCGA-SARC RNA-seq data to identify differentially regulated genes, transcription factors, prognostic biomarkers and hub genes. We further perform functional enrichment analysis, along with a literature survey of the identified genes, to understand the dysregulated pathways in OA.

We use the R programming language along with tools such as Cytoscape, STRING and ShinyGO, and the databases TCGA, DoRothEA and TRRUST, for the analysis.

Installation

Steps to set up the project locally:

  1. Clone the repository:
git clone https://github.com/vidhya2205/Transcriptomic-Profiling-of-Old-Age-Sarcoma-Patients-using-TCGA-RNA-seq-data.git
  2. Navigate to the Code directory:
cd Transcriptomic-Profiling-of-Old-Age-Sarcoma-Patients-using-TCGA-RNA-seq-data/Code
  3. Code Directory: This is the folder within the project where the code resides. Run all subsequent commands from within this directory to avoid issues with file paths or configurations.
  4. Install R and the R package dependencies (R version 4.4.0 (2024-04-24 ucrt) was used):
    1. CRAN packages:
install.packages(c("dplyr", "tidyr", "ggplot2", "gplots", "tidyverse", "reshape2", "svglite", "survminer", "survival", "forestplot"))
    2. Bioconductor packages:
if (!requireNamespace("BiocManager", quietly = TRUE))
{
   install.packages("BiocManager")
}
BiocManager::install(c("TCGAbiolinks", "SummarizedExperiment", "EnhancedVolcano", "org.Hs.eg.db", "dorothea", "enrichR", "DESeq2"))
    3. Additional tools used:
      1. Cytoscape - Download Cytoscape (version 3.10.2 was used)
      2. ShinyGO - ShinyGO 0.81 web tool
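
To confirm that the installation step worked, a quick check like the one below can be run from an R session. It is only a sketch: the package names are copied from the install commands above, while the variable names (`required`, `missing_pkgs`) are illustrative and not part of the project code.

```r
# Packages listed in the installation step (CRAN first, then Bioconductor).
cran_pkgs <- c("dplyr", "tidyr", "ggplot2", "gplots", "tidyverse",
               "reshape2", "svglite", "survminer", "survival", "forestplot")
bioc_pkgs <- c("TCGAbiolinks", "SummarizedExperiment", "EnhancedVolcano",
               "org.Hs.eg.db", "dorothea", "enrichR", "DESeq2")

required <- c(cran_pkgs, bioc_pkgs)

# Compare against the packages visible in the current library paths.
missing_pkgs <- setdiff(required, rownames(installed.packages()))

if (length(missing_pkgs) == 0) {
  message("All required packages are installed.")
} else {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```

Any package reported as missing can be installed again with the matching `install.packages()` or `BiocManager::install()` call.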

Usage

This project is built around an R Notebook (`code.Rmd`) that contains multiple sections to perform different tasks. Follow the steps below to use it effectively: 

  1. Open the `code.Rmd` file -
    Use RStudio or any R-compatible IDE to open `code.Rmd`.
  2. Notebook Structure -
     The notebook is organized into the following sections:
    1. Load the libraries needed

    2. Section 1: Preliminary Survival analysis, Data extraction and preprocessing -
      Description:
      This section obtains the clinical and RNA-seq data for SARC (sarcoma patients) from the TCGA database. The samples are stratified by age at diagnosis into OA (≥ 65 years) and YA (18-65 years). Survival analysis is then done using Cox regression and the log-rank test. A bubble plot representing the subtypes included in the study is plotted. The RNA-seq data is preprocessed, and lowly expressed genes are filtered out using a quantile cutoff of 0.25.
      Outputs:
      This section produces 2 images

      1. Patient Demographics
      2. KM_Plot for OAvsYA
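
The age stratification and OA vs YA survival comparison described in Section 1 can be sketched as follows. This is a minimal illustration on made-up data, not the notebook's code: the data frame `clin`, its column names and all values are hypothetical, and patients aged exactly 65 are assigned to OA here as an assumption, since the two age ranges above share that boundary.

```r
library(survival)

# Hypothetical toy cohort; the notebook itself works on TCGA-SARC clinical data.
clin <- data.frame(
  age_at_diagnosis = c(45, 70, 33, 81, 66, 58, 72, 29, 67, 50, 77, 61),
  time   = c(1200, 300, 2100, 150, 400, 1800, 1500, 2500, 350, 600, 200, 900),
  status = c(0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0)  # 1 = death, 0 = censored
)

# Keep adults only, then stratify by age at diagnosis (assumption: 65 counts as OA).
clin <- subset(clin, age_at_diagnosis >= 18)
clin$age_group <- factor(
  ifelse(clin$age_at_diagnosis >= 65, "OA", "YA"),
  levels = c("YA", "OA")  # YA is the reference group
)

# Cox regression: hazard ratio of OA relative to YA.
cox_fit <- coxph(Surv(time, status) ~ age_group, data = clin)
hr_oa <- exp(coef(cox_fit))[["age_groupOA"]]

# Log-rank test comparing the two survival curves.
logrank <- survdiff(Surv(time, status) ~ age_group, data = clin)
p_logrank <- pchisq(logrank$chisq, df = 1, lower.tail = FALSE)

print(table(clin$age_group))
print(summary(cox_fit)$conf.int)
print(p_logrank)
```

A hazard ratio above 1 for `age_groupOA` means a higher hazard in the OA group than in the YA group.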
    3. Section 2: Differential Gene Expression analysis (DGEA) and Functional Enrichment analysis (FEA) -
      Description:
      DGEA comparing the OA with the YA samples is done using the edgeR methodology. Significant differentially expressed genes (sig-DEGs) are selected based on |logFC| > 1.5 and p value < 0.005. A volcano plot and a heatmap representing the up- and down-regulated genes are made. Further, FEA of the sig-DEGs is done to obtain the top 5 significant GO terms associated with them.
      Outputs:
      This section produces 4 images and 2 CSV files

      1. Enhanced Volcano
      2. Heatmap
      3. Up-regulated genes FEA
      4. Down regulated genes_FEA
      5. DGEA results(all genes).csv
      6. DGEA results(significant genes).csv
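
The sig-DEG selection rule from Section 2 (|logFC| > 1.5 and p value < 0.005) can be sketched in base R as below. The table `dea`, its column names (`logFC`, `PValue`) and the gene labels are hypothetical placeholders, not the notebook's actual results.

```r
# Hypothetical DGEA result table; values are made up for illustration.
dea <- data.frame(
  gene   = c("GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"),
  logFC  = c(2.3, -1.8, 0.4, 1.6, -2.5),
  PValue = c(0.0001, 0.002, 0.0003, 0.02, 0.004)
)

lfc_cutoff <- 1.5
p_cutoff   <- 0.005

# Keep genes passing both thresholds, in either direction of change.
sig_deg <- dea[abs(dea$logFC) > lfc_cutoff & dea$PValue < p_cutoff, ]

# Label the direction of regulation in OA relative to YA.
sig_deg$direction <- ifelse(sig_deg$logFC > 0, "up", "down")

print(sig_deg)
```

In this toy table, `GENE_C` is dropped because its fold change is too small and `GENE_D` because its p value is above the cutoff.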
    4. Section 3:  Transcription Factor Enrichment Analysis -
      Description:
      In this section, significant transcription factors (sig-TFs) are identified using the DoRothEA and TRRUST transcription factor-target interaction databases, as illustrated in the Flowchart_TFEA. The Cytoscape app and the STRING network database are then used to visualize the interactions of the sig-TFs. FEA of the sig-TFs is done using the ShinyGO web-based tool, and the top 5 GO terms are visualized in R.
      Inputs:

      1. Human DoRothEA TF-TG database_paper
      2. TRRUST TF-TG database_Github_repository
      3. ShinyGO Results_csv

      Outputs:
      This section produces 4 images and 4 CSV files

      1. DGEA of sig-TF's
      2. FEA of sig-TFs
      3. Network for TF's identified by DoRothEA database cytoscape
      4. Network for TF's identified by TRRUST database_ cytoscape
      5. TF's from DoRothEA.csv
      6. TF's from TRRUST.csv
      7. sig-TF's identified from DoRothEA based analysis
      8. sig-TF's identified from TRRUST based analysis
    5. Section 4:  Gene Specific Survival analysis (Prognostic markers) -
      Description:
      This section performs a gene-specific survival analysis exclusively on the OA sarcoma patients. Samples with high (expression > median) and low (expression < median) values of each gene are compared to identify genes that are significantly associated with lower survival, as illustrated in the Flowchart_GSSA. Results from Cox regression and the Kaplan-Meier (KM) log-rank test are used to select the significant survival-associated genes (sig-Surv). Functional enrichment analysis of these genes is done using the enrichR package. Further, a forest plot representing the sig-Surv genes, their expression strata (high/low) and their hazard ratios (HRs) is plotted.
      Outputs:
      This section produces 4 images and 2 CSV files

      1. DGEA of significant survival associated genes
      2. FEA of significant survival associated genes
      3. KM Plot for the significant survival associated genes
      4. Forest Plot (Cox) for the survival associated genes
      5. Gene Specific Survival Analysis all genes.csv
      6. Significant differentially expressed genes associated with survival(sig-Survival)
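
The median split used in Section 4 can be sketched for a single gene as follows. The data frame `oa`, its columns and all values are hypothetical. Samples whose expression equals the median are set to `NA` and dropped, which is an assumption, since the description above only defines the high and low groups.

```r
library(survival)

# Hypothetical expression and follow-up data for one gene in OA patients.
oa <- data.frame(
  expr   = c(5.1, 8.4, 2.2, 9.7, 7.3, 3.8, 6.9, 1.5, 8.8, 4.4),
  time   = c(900, 250, 1500, 120, 1100, 1300, 300, 2000, 180, 700),
  status = c(1, 1, 0, 1, 0, 1, 1, 0, 1, 0)  # 1 = death, 0 = censored
)

# Split samples into high and low expression strata around the median.
med <- median(oa$expr)
oa$strata <- factor(
  ifelse(oa$expr > med, "high", ifelse(oa$expr < med, "low", NA)),
  levels = c("low", "high")  # low expression is the reference group
)

# Cox regression: hazard ratio of high vs low expression, with 95% CI.
fit <- coxph(Surv(time, status) ~ strata, data = oa)
hr  <- exp(coef(fit))[["stratahigh"]]
ci  <- exp(confint(fit))

# KM log-rank test between the two strata.
lr   <- survdiff(Surv(time, status) ~ strata, data = oa)
p_lr <- pchisq(lr$chisq, df = 1, lower.tail = FALSE)

print(c(HR = hr, lower = ci[1, 1], upper = ci[1, 2], logrank_p = p_lr))
```

Repeating this per gene and keeping the genes that pass the chosen significance cutoffs would give a table of the kind listed in the outputs above. The HR and confidence interval are the values a forest plot would display.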

Acknowledgments

The authors would like to express their gratitude to Adewale Ogunleye and Richard Agyekum from the Hackbio team for their mentorship in completing this research project. 
