Lists and Data Frames

Lists and Data Frames#

Lists#

Vectors and matrices have the limitation that they must contain data all in the same mode, i.e.: all numbers or all characters. Lists circumvent this limitation, acting as containers for absolutely any type of data.

# Declare an empty list
l <- list()

# Declare a list with items
l <- list("a", 1, "b", 2:4)
l

# Declare a list with named items
l <- list(names=c("Anna", "Ben", "Chris"), scores=c(23, 31, 34))
l

Data Frames#

In that last example, it would be ideal if we could link the names with the scores, and maybe further data. We can store tabular data in R in a data frame, which is really a special kind of list.

# Declare a data.frame
df <- data.frame(names=c("Anna", "Ben", "Chris"), scores=c(23, 31, 34))
df

Looking at the df, you can see that the data is neatly arranged in named columns. You can also change the format of a variable between list and data frame quite easily.

# Change between list and data.frame
l <- list(names=c("Anna", "Ben", "Chris"), scores=c(23, 31, 34))
df_from_l <- as.data.frame(l)

df <- data.frame(names=c("Anna", "Ben", "Chris"), scores=c(23, 31, 34))
l_from_df <- as.list(df)

If you then look at l_from_df, the way the list is shown includes the line ‘Levels: Anna Ben Chris’. Levels are the possible choices for a categorical factor, which is a variable mode in R for storing that sort of data. Data frames will almost always convert text into a factor, which will cause that data to behave differently than a character variable. This can be avoided:

# No factors please
df <- data.frame(names=c("Anna", "Ben", "Chris"), scores=c(23, 31, 34), stringsAsFactors=F)
as.list(df)

Exercise 0.3

  • Create a simple list containing some numbers - not vectors of numbers

# A list of only numbers
numbers <- list(1, 3, 6, 10)
  • What happens if you try to do arithmetic with the list?

numbers + 1
# We get an error - lists cannot be used like vectors!
  • Now create a data frame with three columns, a name and two numeric values per name, such as coordinates.

# A data frame of mixed types
coords <- data.frame(Place=c("London", "Paris", "Zurich"), Latitude=c(51.5074, 48.8566, 47.3769), Longitude=c(-0.1278, 2.3522, 8.5417))
  • What happens if you try to do arithmetic with the data frame?

coords + 1
# We get a result, and a warning - the data frame cannot do arithmetic with factors, but can with the numbers.

Importing Data#

R has a host of functions for importing data of different types which I am not going to explain further. But in general, I recommend to save data tables files (for example from Excel) in a tab-delimited text manner. This makes the import into R simpler.

Firstly we need a data table to import: /nfs/teaching/551-1299-00L-2023/R_refresher/ecoli_genes.txt. We can then use the read.table function.

# Import a data table
genes <- read.table("/nfs/teaching/551-1299-00L-2023/R_refresher/ecoli_genes.txt")

We can now see what the table looks like using the Environment tab in the top-right - but something went wrong and the column headings are in the first row. We can fix this pretty easily.

# Import the table again
genes <- read.table("/nfs/teaching/551-1299-00L-2023/R_refresher/ecoli_genes.txt",header=TRUE)

There are a few other useful arguments to help import tables of various formats:

  • sep - determines the field separator (between columns), i.e.: sep=”,”

  • quote - determines the quote mark (items in quote marks are considered to be the same field), i.e.: quote=”"”

  • row.names - determines which column contains the row names, if there are any

  • comment.char - determines which character, if at the start of a line, indicates the line should be ignored, i.e.: comment.char=”#”

  • stringsAsFactors - determines whether the table should turn text into factors, which you may want to turn off, i.e.: stringsAsFactors=F

Note

If you want to upload something from your local machine, you can use the Upload button in the Files section (window at the bottom right).

Exporting Data#

Conversely, R has functions for exporting data into different formats. You will most likely want to create a file to open in R later, or a .csv file to open in Excel.

# Write a data.frame to an R-friendly format
write.table(df,"Rfriendly_df.txt")

# Write a data.frame to a .csv file
write.csv(df,"df.csv")

Many of the arguments for the read functions also apply to the write functions, so you can decide whether you want to see row or column headings, how the text fields are separated, etc.

Note

If you want to download the file to your local machine, right-click on the file you want to download and choose the Save as option. Select a folder on your local machine.