At some point during your R programming experience, you are going to have to come to grips with the data frame. Of course, people familiar with data frames will tell you it's difficult to imagine ever having to do without them.
When I was first learning R, data frames seemed like such a mysterious concept. However, once I figured them out, I too joined the ranks of those who never want to be without them!
In a nutshell, a data frame is like a database table held in memory. You can also think of it as a type of spreadsheet: it contains rows and columns. The key feature that separates the data frame from constructs such as vectors and matrices is that each column can store a different data type.
Another aspect that confuses people is how you construct one from scratch. One way to do this is to create a series of vectors and then put them together into a data frame.
This confuses newcomers to R because vectors look row-like when you write them out. You create a vector and it seems like it should be a row. But when it is placed in a data frame, it becomes a column. I'll illustrate with an example in a minute.
While I stated that data frames can contain columns of varying types, each data item in a column must be of the same type as all other values in that column. This will make more sense when you look at data frames in action.
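To make the one-type-per-column rule concrete, here is a minimal sketch. The names mixed and df and all of the values are made up for illustration, and stringsAsFactors = FALSE is set so the result does not depend on your R version:

```r
# A vector holds a single type, so mixing numbers and text
# silently coerces every element to character
mixed <- c(80, "83", 78)
class(mixed)  # "character"

# A data frame tracks a separate type for each column
df <- data.frame(
  name  = c("a", "b"),
  value = c(1.5, 2.5),
  flag  = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)
sapply(df, class)  # one class per column
```

Each column of df is itself a vector, which is why the values within a column must share a type.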
Suppose you wanted to store test scores for students inside a data frame. This requires two columns:
student <- c("Joe", "Beth", "Mary", "Mark")
scores <- c(80, 83, 78, 94)
Once the data is ready, you simply create your data frame:
scoresDF <- data.frame(student, scores)
data.frame() lines up the vectors by position to form the rows. The row with "Joe" will correspond to 80, "Beth" will correspond to 83, and so on.
As you can see, the vectors student and scores both look like rows but are actually columns. Although not initially intuitive, this will make your life easier when creating data frames. If you had to construct each row manually, you would need every column's value for that row ahead of time. This would require looking up each of those values for every row.
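Putting the pieces together, the sketch below rebuilds the example and inspects the result. The calls after data.frame() are additions for illustration, not part of the original walkthrough:

```r
# Two vectors of equal length, one per column
student <- c("Joe", "Beth", "Mary", "Mark")
scores <- c(80, 83, 78, 94)

# Combine them; the vector names become the column names
scoresDF <- data.frame(student, scores)

print(scoresDF)  # show the whole table
dim(scoresDF)    # number of rows, then number of columns
scoresDF[1, ]    # the first row: Joe and his score

# Look up Beth's score by filtering on the student column
scoresDF$scores[scoresDF$student == "Beth"]
```

Note that each vector went in as a single argument and came out as a single column.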
In our example, you would need to look up "Joe" then 80. Then, you would look up "Beth" and 83. While this may not seem so bad with two vectors of four values each, imagine doing this exercise with 50 vectors.
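For contrast, here is roughly what row-by-row construction could look like next to the column-wise version. This is only a sketch: the names row1, row2, byRow, and byColumn are hypothetical, and rbind() is just one of several ways to stack rows.

```r
# Row-wise: every row needs a value for every column up front
row1 <- data.frame(student = "Joe", scores = 80)
row2 <- data.frame(student = "Beth", scores = 83)
byRow <- rbind(row1, row2)

# Column-wise: one vector per column, regardless of row count
byColumn <- data.frame(
  student = c("Joe", "Beth"),
  scores  = c(80, 83)
)

# Both approaches hold the same values
all(byRow$scores == byColumn$scores)
```

The row-wise version needs one data.frame() call per row, which is the bookkeeping the column-wise approach avoids.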
There are better ways still to handle larger data sets. You would not want to type 50 vectors by hand; even column by column, manual entry becomes more cumbersome as the number of data points grows.
Just like anything in life, moderation is called for with data frames. Okay, maybe you will use them more than just moderately, and maybe you should. However, there will be times when it's not appropriate to use them. For instance, if you want to do matrix transformation and manipulation, it's best to use the matrix construct. That's what it's there for.
You will also find plenty of functions that require constructs other than the data frame. In these cases, you will need to convert to the other type of construct or your functions will fail.
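As a sketch of one such conversion, the example below turns an all-numeric data frame into a matrix with as.matrix() before doing matrix work. The test1 and test2 columns and the equal weights are invented for illustration:

```r
# Hypothetical data frame with two numeric columns
testsDF <- data.frame(
  test1 = c(80, 83, 78, 94),
  test2 = c(85, 79, 88, 91)
)

# Convert to a matrix so matrix operations apply
m <- as.matrix(testsDF)
is.matrix(m)  # TRUE

# Transpose: rows become columns
t(m)

# Matrix multiplication: weight each test equally
averages <- m %*% c(0.5, 0.5)
averages
```

Be aware that as.matrix() on a data frame containing a character column produces a character matrix, since a matrix can hold only one type.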
Data frames also pay off when you read in files. In most cases, what you read arrives as a data frame already. This depends on the functions and packages you are using, of course. However, base R functions such as read.csv() and read.table() return data frames, so you can work with the data as soon as you read it in.
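Here is a small sketch of that behavior using read.csv(). To keep it self-contained, the CSV content is supplied as a string through the text argument rather than read from a file on disk; the csvText name and its contents are made up:

```r
# Inline CSV content standing in for a file
csvText <- "student,scores
Joe,80
Beth,83"

# read.csv() parses the text and returns a data frame
scoresDF <- read.csv(text = csvText)

str(scoresDF)   # column names and types
nrow(scoresDF)  # 2
```

With a real file you would pass its path as the first argument instead of using text.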
To determine if the structure you are working with is a data frame, use the following (assume df is your structure):
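One base R function that fits here is is.data.frame(), which returns TRUE or FALSE. A minimal sketch, where the contents of df and the vector v are made up for illustration:

```r
# Hypothetical structure to test
df <- data.frame(student = c("Joe", "Beth"), scores = c(80, 83))

is.data.frame(df)  # TRUE

# A plain vector is not a data frame
v <- c(80, 83)
is.data.frame(v)   # FALSE
```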
Alternatively, you could use:
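Two base R options that could serve as the alternative are class() and inherits(). Which one you choose is a matter of preference; the sketch below assumes a small made-up df:

```r
# Hypothetical structure to test
df <- data.frame(student = c("Joe", "Beth"), scores = c(80, 83))

# class() reports the class attribute
class(df)  # "data.frame"

# inherits() also returns TRUE for objects that extend data.frame
inherits(df, "data.frame")  # TRUE
```

inherits() is the safer check when an object may carry more than one class, because class() then returns several values.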
To convert a construct into a data frame use:
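The usual base R function for this is as.data.frame(). A minimal sketch, assuming the construct being converted is a small matrix (the name m and its values are made up):

```r
# Hypothetical 2 x 3 matrix to convert
m <- matrix(1:6, nrow = 2)

# Each matrix column becomes a data frame column
df <- as.data.frame(m)

is.data.frame(df)  # TRUE
names(df)          # default names, since m had no column names
```

If the default column names are not useful, you can assign your own with names(df) <- c(...).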
James is a data science writer who has several years' experience in writing and technology. He helps others who are trying to break into the technology field like data science. If this is something you've been trying to do, you've come to the right place. You'll find resources to help you accomplish this.