One of the routine tasks R handles is importing data (in several formats) from either a single file or multiple files. This note details how to import multiple tab-separated text files (without headers) and store them as a list of data frames (one data frame per file). Note that these files all share an identical first column (gene names). Users can later build whatever data frames they need from this list. Examples are given below:
In this note, we work with examples that can be downloaded from here. This is a gzipped tar archive (~23 KB) containing 6 files.
Each file contains two columns. The goal is to import these files into R and store each one as a data frame.
The logic is to first store the file names and then run an import function (here, read.delim2) once per file, storing the results in a list. This list will have one data frame for each file. Names for the data frames can be assigned afterwards, and the user can then recombine the data frames as required.
First, let us store the file names.
For the sake of example, I stored all the files in a folder named "htresults" on the Desktop, and my current working directory in R is the Desktop.
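If you don't have the archive at hand, a minimal sketch like the following can create comparable toy files to practice on. It assumes nothing about the real data beyond what is described here (two tab-separated columns, no header, identical gene names in the first column); the folder, file names, gene names and counts are all made up, and the files go to a temporary folder instead of the Desktop. Integer counts are used so the files read the same way with read.delim2 or read.delim.

```r
# Hypothetical toy data: three headerless, tab-separated files sharing an
# identical first column of gene names. Names and counts are made up.
out.dir <- file.path(tempdir(), "htresults")
dir.create(out.dir, showWarnings = FALSE)

genes <- c("geneA", "geneB", "geneC")
for (i in 1:3) {
  toy <- data.frame(gene = genes, count = c(10L, 25L, 3L) * i)
  write.table(toy, file.path(out.dir, paste0("sample", i, ".txt")),
              sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)  # no header, like the real files
}

# Store the full paths of the toy files
files <- list.files(path = out.dir, pattern = "\\.txt$", full.names = TRUE)
files
```

To run the rest of the note on these toy files, use path=out.dir in place of "./htresults/".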
## Store the file names (pattern is a regular expression; "\\.txt$" matches names ending in .txt)
$ files=list.files(path="./htresults/",pattern="\\.txt$",full.names=TRUE)
## List the stored file names:
$ files
This should list the 6 files mentioned above.
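The pattern argument of list.files is a regular expression, not a file extension, so a pattern of "txt" matches any name that contains those three letters anywhere. The sketch below uses a hypothetical folder with made-up file names to show the difference; anchoring the pattern as "\\.txt$" keeps only names that end in .txt.

```r
# Hypothetical folder with a mix of file names (made up for illustration)
demo.dir <- file.path(tempdir(), "pattern_demo")
dir.create(demo.dir, showWarnings = FALSE)
invisible(file.create(file.path(demo.dir, c("sample1.txt", "sample2.txt",
                                            "sample1.txt.bak", "txt_notes.csv"))))

# "txt" is matched anywhere in the name, so all four files are returned
list.files(demo.dir, pattern = "txt")

# "\\.txt$" only matches names ending in ".txt": sample1.txt and sample2.txt
list.files(demo.dir, pattern = "\\.txt$")
```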
## Import all 6 files and store them as 6 data frames in a list named df.import.all
## (read.delim2 expects ',' as the decimal separator; use read.delim if the files use '.')
$ df.import.all=lapply(files, read.delim2, header=FALSE)
## Name the data frames after their files (without the .txt extension)
$ names(df.import.all)=tools::file_path_sans_ext(basename(files))
## check the structure of the newly created list
$ str(df.import.all)
## To combine all the data frames side by side (column-wise)
$ do.call(cbind,df.import.all)
## To combine all the data frames one after another (row-wise)
$ do.call(rbind,df.import.all)
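do.call(cbind, ...) pairs rows purely by position and repeats the gene column once per file, so it is only safe when every file lists the genes in the same order. A minimal sketch, using a small hypothetical list in place of df.import.all, first checks that assumption and then shows an alternative swapped in for cbind: merging the data frames on the gene column with Reduce and merge, which keeps a single gene column and matches rows by gene name. The column name "gene" and the per-file value column names are choices made for this illustration.

```r
# Hypothetical toy list standing in for df.import.all: three data frames with
# the same gene names in column V1 (as read.delim2(header = FALSE) names them).
genes <- c("geneA", "geneB", "geneC")
df.import.all <- list(
  sample1 = data.frame(V1 = genes, V2 = c(10L, 25L, 3L)),
  sample2 = data.frame(V1 = genes, V2 = c(20L, 50L, 6L)),
  sample3 = data.frame(V1 = genes, V2 = c(30L, 75L, 9L))
)

# 1. Check that every data frame has exactly the same first column as the first one
same.genes <- all(vapply(df.import.all,
                         function(d) identical(d$V1, df.import.all[[1]]$V1),
                         logical(1)))
same.genes

# 2. Rename columns: "gene" for the key, the file name for the value column
renamed <- Map(function(d, nm) setNames(d, c("gene", nm)),
               df.import.all, names(df.import.all))

# 3. Join all data frames on the gene column, keeping a single gene column
combined <- Reduce(function(a, b) merge(a, b, by = "gene"), renamed)
combined
```

By default merge keeps only genes present in every file (an inner join); passing all = TRUE inside the merge call keeps all genes and fills the gaps with NA.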
####################################################
Total code for importing files into a single list with multiple data frames:
####################################################
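library(plyr)
files=list.files(path="./htresults/",pattern="\\.txt$",full.names=TRUE)
## llply returns a list (one data frame per file); ldply would instead return one row-bound data frame
df.import.all=llply(files, read.delim2, header = FALSE)
names(df.import.all)=tools::file_path_sans_ext(basename(files))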
#####
Storing the files one after another, i.e. one data frame below another
#####
$ do.call(rbind,df.import.all)
######
Storing the files side by side, i.e. one data frame next to another
######
$ do.call(cbind,df.import.all)
#######################################################
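Another way of doing the same, this time with an extra column recording the source file of each row. For serial (row-wise) concatenation into one data frame, plyr's ldply can be used in place of lapply, since it binds the returned data frames row by row; for side-by-side concatenation, use the lapply function as below: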
############################################################
Total code for storing the files side by side, with the file names in a separate column:
############################################################
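library(plyr)  ## only needed for the serial variant: ldply(files, df.import)
files=list.files(path="./htresults/",pattern="\\.txt$",full.names=TRUE)
files
## Read one file and add a column recording which file it came from
df.import <- function(x) {
  df=read.delim2(x, header=FALSE)
  df$FileName=basename(x)
  return(df)
}
## Side by side: as.data.frame() binds the per-file data frames column-wise
df.import.all=as.data.frame(lapply(files,df.import))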
############################################################
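For the serial case, a minimal base R sketch (no plyr required) stacks the per-file data frames with do.call(rbind, ...), so the FileName column identifies the source file of every row. The demo folder, file names and values are hypothetical, and this version of the df.import helper stores only the base name of each file.

```r
# Hypothetical toy files with the same layout as the real ones
# (two tab-separated columns, no header); names and values are made up.
demo.dir <- file.path(tempdir(), "serial_demo")
dir.create(demo.dir, showWarnings = FALSE)
for (i in 1:2) {
  writeLines(paste0(c("geneA", "geneB"), "\t", c(10, 25) * i),
             file.path(demo.dir, paste0("sample", i, ".txt")))
}

files <- list.files(path = demo.dir, pattern = "\\.txt$", full.names = TRUE)

# Read one file and record which file its rows came from
df.import <- function(x) {
  df <- read.delim2(x, header = FALSE, stringsAsFactors = FALSE)
  df$FileName <- basename(x)
  df
}

# Serial (row-wise) concatenation: stack the data frames one after another
df.serial <- do.call(rbind, lapply(files, df.import))
df.serial
```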