R 2.1 – Loading Data and Working With Data Frames

October 12, 2019

Welcome to section two. R is a powerful tool
for data analysis, and the first step to working
with data in R is to get the data into R. Suppose I have a CSV file on
my computer and I want to load it into R. Since I’m
running R on OS X, I could use the Command-D trick to change
my working directory to the folder with the file. But it’s also helpful to know
how to navigate using the getwd and setwd functions. We’ll also make use of the
list.files function, which lists the files and folders in
the current working directory.

I want to get into my Google
Drive folder so I’m going to specify that in the
setwd function. Next, I’m going to print out
the folders in this Google Drive folder and navigate
to Projects. Next, I’m going to navigate
to the Top_Secret folder. And since I know there’s a data
folder there, I’m just going to add this folder
into my setwd command. All right. I’ve arrived, and I can now see
the data set of interest in my current working
directory, the state.csv data set. Since I’ll be saving this code,
I might as well save my current working directory
in a setwd command at the top of my script. This way, I won’t need to set
the directory in the future unless I change the working
directory of my project files.

Now that I’m ready to go, I’ll
load in the state.csv file using the read.csv function,
specifying the name of the file in quotation marks. When you read a CSV file
into R, it’s stored as a data matrix, which is more
formally called a data frame in R. Just like with a regular matrix,
I can use the dim function to see how many
rows and columns are in the data set. There are 51 rows representing
all US states and Washington, DC and 12 different columns
representing the 12 different variables recorded
for each state. I’ll use the head function to
print out the first two rows of this state data set
just like I could do with a regular matrix. However, if I apply the length
function, R will just return the number of variables in
the data set, which are represented by the columns.

Data frames are one of the
most common objects for holding data inside of R, and
I can subset them in ways similar to how I might
subset a matrix. This is fine, but there
are better ways to work with data frames. A new function for data frames
worth remembering is the names function, which is used to access
the variable names. Once you know the names of the
variables, it’s easy to extract an entire
variable using the data set name
followed by a dollar sign and then the variable name.

Let’s take a look at
the smoke variable. The smoke variable is
a numerical variable representing the percentage
of people who smoke in each state. If I wanted, I could apply some
standard functions like the mean or standard deviation
function to get some summary information about
this variable.

In addition to subsetting with
brackets, I can make use of the subset function. Here I’m going to examine only
states with smoking rates higher than 25%. If I wanted, I could also
specify that I only want to select a small number
of columns.

Next, I’ll take a look at the
party that won each state in the 2012 presidential
election. This is in the pres12
variable. Note that the pres12 variable
isn’t numerical, yet it is stored alongside the smoke variable,
which is numerical, in the same state data frame. Data frames can hold a different
type of variable in each column while a matrix can
only hold a single data type for the entire matrix. Notice also that the output
doesn’t look like a regular string output, which generally
has quotation marks when it is printed out. Additionally, there’s a listing
at the bottom that indicates there are
two levels. Output like this indicates that
this is a factor variable or a factor object. A factor object is a special
kind of object that’s a blend of character
and numerical variables: each category is stored as an integer code
with a text label called a level. If you ever have substantial
trouble working with factors in R, you can just convert the
factor to a string with the as.character function. R will generally convert a
character variable back into a factor variable when it is
appropriate to do so. However, when R does do this
conversion, it may notify you with a warning. Just read your warnings
carefully and make sure that’s all that’s happening.

In the next video, we’ll talk
about ways to take a quick look at a data object inside
of R, and we’ll also take a look at date objects.
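
The navigation and loading steps described above can be sketched roughly as follows. The real state.csv is not available here, so this sketch writes a tiny stand-in file to a temporary folder first; the three rows and the name column are made up for illustration, and only the smoke and pres12 column names come from the video. One assumption worth flagging: in R 4.0.0 and later, read.csv no longer turns text columns into factors by default, so stringsAsFactors = TRUE is passed to match the behavior the video describes.

```r
# Hypothetical stand-in for state.csv: three made-up rows, NOT the real data.
demo_dir <- file.path(tempdir(), "data")
dir.create(demo_dir, showWarnings = FALSE)
toy <- data.frame(
  name   = c("A", "B", "C"),      # made-up state labels
  smoke  = c(18.5, 26.1, 27.3),   # made-up smoking percentages
  pres12 = c("D", "R", "R")       # made-up 2012 winners
)
write.csv(toy, file.path(demo_dir, "state.csv"), row.names = FALSE)

old_wd <- getwd()   # remember the starting directory
setwd(demo_dir)     # move into the folder that holds the file
getwd()             # confirm where we are now
list.files()        # the listing should include state.csv

# Read the file into a data frame. stringsAsFactors = TRUE is an assumption
# made so that text columns become factors, as in the video.
state <- read.csv("state.csv", stringsAsFactors = TRUE)

dim(state)       # number of rows, then number of columns
head(state, 2)   # first two rows
length(state)    # number of columns (variables), not rows
names(state)     # the variable names

setwd(old_wd)    # go back to where we started
```

With the real file, only the setwd path and the read.csv call would be needed; the rest of the setup exists just to make the sketch runnable on its own.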

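The dollar-sign extraction, subsetting, and factor handling from the second half of the video might look something like this. Again, the data frame is a made-up stand-in (four hypothetical rows with a hypothetical name column), built directly in code so the sketch does not depend on any file.

```r
# Hypothetical stand-in for the state data frame: made-up values only.
state <- data.frame(
  name   = c("A", "B", "C", "D"),
  smoke  = c(18.5, 26.1, 27.3, 21.0),
  pres12 = c("D", "R", "R", "D"),
  stringsAsFactors = TRUE   # assumption: store text columns as factors
)

# Extract a whole variable with data_set$variable, then summarize it.
state$smoke
mean(state$smoke)
sd(state$smoke)

# Bracket subsetting works as it would for a matrix...
state[state$smoke > 25, ]

# ...but subset() reads more naturally for data frames.
heavy <- subset(state, smoke > 25)
heavy_small <- subset(state, smoke > 25, select = c(name, smoke))

# Each column keeps its own type; a matrix would force a single type.
sapply(state, class)
is.character(as.matrix(state))   # TRUE: everything is coerced to text

# Factors print without quotation marks and list their levels.
state$pres12
levels(state$pres12)

# Convert to plain strings if the factor gets in the way.
pres_chr <- as.character(state$pres12)
class(pres_chr)
```

The select argument of subset takes bare column names, which is why name and smoke appear without quotation marks there.
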
  1. This is how fucked up the audio is on this:

    I'm sitting in a quiet room. Fans/AC off. Windows closed down. TV not on.


    This is so fucking annoying!

  2. First, thank you for these videos. Could you please share with us the doc you are using for this exercise? It would help a lot! Thank you!

  3. Great tutorial. Is there a way I can download the whole collection?
    Will be expecting a prompt reply from you
    Thank you

  4. I'm sorry that people post such rude and abrasive comments. Thank you for posting these videos. They are incredibly helpful and I appreciate the time and energy you spent putting them together.

  5. Excellent and simple video. Volume is not an issue. "Exercise data" does not need to be included. The focus here is on syntax and how R operates. Not providing pointless busy work.

    Keep up the good work, Google.

  6. Hello, I tried to run the read.csv function and store the result in a variable called X, but each time I execute it I get an error saying "more columns than column names". Can you help me with that, please?

  7. Are .csv files usually loaded into R as a dataframe by default, or is it usually a table structure? (Hopefully dataframe!)

  8. Constructive criticism: this would've been sooooo much more helpful and easier to understand if you had shown the actual table you're working with at the beginning. That way I could've understood much quicker what is what when you wrote it into the console.

  9. If you are looking for a .csv file to do this tutorial, there's a sample one here:
