There is nothing special about PCA on RNAseq counts. It is the same as PCA on microarray data, except that instead of expression values you have counts. Let us work on a small data set, which you can download from here (the file is zipped because the Google AI monkeys automatically convert .txt files to Word). The data was obtained from the web and has 3 normal samples and 3 tumor samples.
The data looks like this: sample names are in columns and gene symbols are in rows. The condition categories are Normal and Tumor.
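If you do not have the file at hand, the layout described above can be sketched with a toy table. This is only an illustration: the gene names, sample names and counts below are made up (simulated with rpois) and are not the values in final_counts.txt.

```r
## Hypothetical toy counts with the same layout as the real file:
## gene symbols as row names, one column per sample
set.seed(1)
toy_counts = matrix(rpois(5 * 6, lambda = 100), nrow = 5,
                    dimnames = list(paste0("Gene", 1:5),
                                    c(paste0("Normal", 1:3), paste0("Tumor", 1:3))))
toy_counts = as.data.frame(toy_counts)
## Genes are in rows, samples are in columns
head(toy_counts)
dim(toy_counts)
```

Any table shaped like this (genes in rows, samples in columns) can be passed through the same steps as the real data.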
Now that we have data, let us do some PCA in R and plot by sample, condition and both together:
Code:
=====================================
## Load data into R
## Gene names are used as row names and fields are separated by a space
data=read.csv("final_counts.txt", sep=" ", header = TRUE, stringsAsFactors = FALSE, row.names = 1)
## Create a prcomp object. For this we need to transpose the data frame so that sample
## names are in rows and gene names are in columns
pca_data=prcomp(t(data))
## Let us calculate the percentage of variance explained by each component.
pca_data_perc=round(100*pca_data$sdev^2/sum(pca_data$sdev^2),1)
## Create a data frame with principal component 1 (PC1), PC2, condition and sample names.
## Note: rep() assumes the first 3 columns are the Normal samples and the last 3 are the Tumor samples
df_pca_data = data.frame(PC1 = pca_data$x[,1], PC2 = pca_data$x[,2], sample = colnames(data), condition = rep(c("Normal","Tumor"),each=3))
## Let us have a look at the new data frame
head(df_pca_data)
## Let us plot now. There is more than one way of plotting this:
## first, highlight each sample; second, highlight each group instead of samples;
## third, color by sample and shape by condition
library(ggplot2)
ggplot(df_pca_data, aes(PC1,PC2, color = sample))+
  geom_point(size=8)+
  labs(x=paste0("PC1 (",pca_data_perc[1],"%)"), y=paste0("PC2 (",pca_data_perc[2],"%)"))
color by Sample
ggplot(df_pca_data, aes(PC1,PC2, color = condition))+
  geom_point(size=8)+
  labs(x=paste0("PC1 (",pca_data_perc[1],"%)"), y=paste0("PC2 (",pca_data_perc[2],"%)"))
color by group
ggplot(df_pca_data, aes(PC1,PC2, color = sample, shape=condition))+
  geom_point(size=8)+
  labs(x=paste0("PC1 (",pca_data_perc[1],"%)"), y=paste0("PC2 (",pca_data_perc[2],"%)"))
color by sample and shape by condition
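The percentages used in the axis labels come from the squared standard deviations of the components. As a quick sanity check, the same numbers can be compared with what summary() reports for a prcomp object. The sketch below uses simulated counts (hypothetical data, not the tumor/normal file) so that it runs on its own; the names sim, pca_sim, perc_manual and perc_summary are introduced only for this illustration.

```r
## Simulated counts (hypothetical): 50 genes x 6 samples
set.seed(42)
sim = matrix(rpois(50 * 6, lambda = 50), nrow = 50,
             dimnames = list(paste0("Gene", 1:50),
                             c(paste0("Normal", 1:3), paste0("Tumor", 1:3))))
## Transpose so that samples are in rows, as in the code above
pca_sim = prcomp(t(sim))
## Percentage of variance per component, computed as in the code above
perc_manual = round(100 * pca_sim$sdev^2 / sum(pca_sim$sdev^2), 1)
## The same quantity taken from the summary() table
perc_summary = round(100 * summary(pca_sim)$importance["Proportion of Variance", ], 1)
perc_manual
perc_summary
```

The two vectors should agree up to rounding, and the percentages should add up to roughly 100.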