Lists and Data Frames#
Lists#
Vectors and matrices have the limitation that all of their data must be of the same mode, e.g. all numbers or all characters. Lists circumvent this limitation, acting as containers that can hold items of different types.
# Declare an empty list
l <- list()
# Declare a list with items
l <- list("a", 1, "b", 2:4)
l
# Declare a list with named items
l <- list(names=c("Anna", "Ben", "Chris"), scores=c(23, 31, 34))
l
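Once a list exists, you will usually want to get items back out of it. The following is a minimal sketch, reusing the named list from above, of the common ways to access list items; it is an illustration rather than an exhaustive reference.

```r
# Same named list as in the example above
l <- list(names = c("Anna", "Ben", "Chris"), scores = c(23, 31, 34))

# Double square brackets or $ extract the item itself
l[["names"]]
l$scores

# Single square brackets return a smaller list, not the item
sub <- l["scores"]
class(sub)        # "list"
class(l$scores)   # "numeric"

# str() prints a compact summary of each item and its type
str(l)
```

The difference between single and double brackets matters as soon as you pass the result to another function, since many functions expect a vector rather than a list.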
Data Frames#
In that last example, it would be ideal if we could link the names with the scores, and maybe further data. We can store tabular data in R in a data frame, which is really a special kind of list.
# Declare a data.frame
df <- data.frame(names=c("Anna", "Ben", "Chris"), scores=c(23, 31, 34))
df
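Because a data frame is a special kind of list, the list access syntax also works on its columns, and the table layout adds row and column indexing on top. A short sketch, assuming the same df as above:

```r
df <- data.frame(names = c("Anna", "Ben", "Chris"), scores = c(23, 31, 34))

is.list(df)     # TRUE: a data frame is a list of columns
length(df)      # 2: one list item per column
nrow(df)        # 3

# Columns can be extracted as with any named list
df$scores
df[["names"]]

# Rows and columns can also be selected with [row, column]
df[2, ]                        # the second row
df[df$scores > 30, "names"]    # names of everyone scoring above 30
```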
Looking at df, you can see that the data is neatly arranged in named columns. You can also convert a variable between list and data frame quite easily.
# Change between list and data.frame
l <- list(names=c("Anna", "Ben", "Chris"), scores=c(23, 31, 34))
df_from_l <- as.data.frame(l)
df <- data.frame(names=c("Anna", "Ben", "Chris"), scores=c(23, 31, 34))
l_from_df <- as.list(df)
If you then look at l_from_df, the list may be displayed with the line ‘Levels: Anna Ben Chris’. Levels are the possible values of a factor, which is the R class for storing categorical data. In R versions before 4.0.0, data.frame() converts text into factors by default, which causes that data to behave differently from a character variable; from R 4.0.0 onwards, text is kept as character unless you ask otherwise. To avoid factors regardless of the R version:
# No factors please
df <- data.frame(names=c("Anna", "Ben", "Chris"), scores=c(23, 31, 34), stringsAsFactors=FALSE)
as.list(df)
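To see how a factor differs from a character variable, it can help to build both side by side. This sketch sets stringsAsFactors explicitly in each call so that the result does not depend on your R version; the variable names df_factor and df_char are made up for the illustration.

```r
# Request each behaviour explicitly
df_factor <- data.frame(names = c("Anna", "Ben", "Chris"), stringsAsFactors = TRUE)
df_char   <- data.frame(names = c("Anna", "Ben", "Chris"), stringsAsFactors = FALSE)

class(df_factor$names)        # "factor"
class(df_char$names)          # "character"

# A factor stores integer codes that point into its levels
levels(df_factor$names)       # "Anna" "Ben" "Chris"
as.integer(df_factor$names)   # 1 2 3

# Character functions work directly on a character column
nchar(df_char$names)          # 4 3 5

# A factor can be converted back to text when needed
as.character(df_factor$names)
```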
Exercises#
Create a simple list containing some numbers - not vectors of numbers
What happens if you try to do arithmetic with the list?
Now create a data frame with three columns, a name and two numeric values per name, such as coordinates.
What happens if you try to do arithmetic with the data frame?
Importing Data#
R has a host of functions for importing data of different types. I generally recommend that if you have a data table from Excel, for instance, you save the file as tab-delimited text for import into R.
First we need a data table to import: Ecoli Genes. We can then use the read.table function.
# Import a data table
genes <- read.table("ecoli_genes.txt")
Note that I am assuming the file is in your working directory, which you can find with the command getwd() or set in the Session menu. Alternatively, you can give a relative or absolute path, just as on the Unix command line.
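If the import fails with a "cannot open file" error, it is worth checking where R is looking. A small sketch, in which the "data" subfolder is purely hypothetical:

```r
# Show the current working directory
getwd()

# Build a relative path in a platform-independent way
# ("data" is a made-up folder name for this illustration)
path <- file.path("data", "ecoli_genes.txt")
path

# Check whether the file exists before trying to read it
file.exists(path)
```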
We can now see what the table looks like using the Environment tab in the top-right - but something went wrong and the column headings are in the first row. We can fix this pretty easily.
# Import the table again
genes <- read.table("ecoli_genes.txt",header=TRUE)
There are a few other useful arguments to help import tables of various formats:
sep - determines the field separator (between columns), e.g. sep=","
quote - determines the quote character (text inside quote marks is treated as a single field), e.g. quote="\""
row.names - determines which column contains the row names, if there are any
comment.char - determines which character starts a comment; everything from that character to the end of the line is ignored, e.g. comment.char="#"
stringsAsFactors - determines whether text columns are turned into factors, which you may want to turn off (it is on by default before R 4.0.0), e.g. stringsAsFactors=FALSE
You’ll learn more about functions and arguments next time.
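The arguments listed above can be combined in a single call. The sketch below first writes a small made-up table to a temporary file so that it runs on its own; the gene names and the variable genes_demo are invented for the illustration and have nothing to do with the real ecoli_genes.txt file.

```r
# Build a small, made-up comma-separated file
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "# hypothetical gene table",
  "gene,product,length",
  "geneA,\"kinase, putative\",1200",
  "geneB,transporter,950"
), tmp)

genes_demo <- read.table(tmp,
                         header = TRUE,             # first data line holds column headings
                         sep = ",",                 # fields are separated by commas
                         quote = "\"",              # commas inside double quotes stay in one field
                         row.names = 1,             # use the first column as row names
                         comment.char = "#",        # skip the comment line at the top
                         stringsAsFactors = FALSE)  # keep text as character

genes_demo
str(genes_demo)
```

Without the quote argument, the comma inside "kinase, putative" would be read as a field separator and that row would end up with too many fields.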
Exporting Data#
Conversely, R has functions for exporting data into different formats. You will most likely want to create a file to open in R later, or a .csv file to open in Excel.
# Write a data.frame to an R-friendly format
write.table(df,"Rfriendly_df.txt")
# Write a data.frame to a .csv file
write.csv(df,"df.csv")
Many of the arguments for the read functions also apply to the write functions, so you can decide whether to write row names or column headings, how the fields are separated, etc.
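As a rough illustration of those shared arguments, the sketch below writes a small data frame as tab-delimited text without row names or quote marks, then reads it back. The temporary file path is only there to keep the example self-contained.

```r
df <- data.frame(names = c("Anna", "Ben", "Chris"), scores = c(23, 31, 34))

# Temporary output path, used only for this sketch
out <- tempfile(fileext = ".txt")

# Tab-delimited, no quote marks around text, no row names
write.table(df, out, sep = "\t", quote = FALSE, row.names = FALSE)

# Inspect the raw lines that were written
readLines(out)

# Read the file back with matching arguments
df_back <- read.table(out, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
df_back
```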
Exercises#
Download and import the ecoli_genes.txt table for yourself, making sure to get the column headings correct
Write the table out to a new file name using write.table
Now import the table again without any additional arguments to read.table - do you still need to correct the column headings?