Which state college systems offer the best “bang for the buck” when you weigh a student’s chance of graduating against the amount of debt they will graduate with? With more and more students taking more than four years to finish, it’s an important question, and one that a scatterplot can shed light on.
A scatterplot shows individual entities plotted along two dimensions, one on each axis. Here’s an example showing the relationship between a country’s Gross Domestic Product (GDP) per capita and the life expectancy of its citizens. As you can see, countries with lower GDP per capita tend to have lower life expectancies.
Now let’s try it with a dataset of our own. We’ll start at College Insight, which offers downloadable data on higher education across the U.S.
Select “Explore All Data” on the top row of tabs.
Once you are in “Explore All Data,” click Advanced Search. You will see an interface like this one:
We want totals for every state, covering all of its institutions. Under “Level of Aggregation,” select “State Total.” On States, select “all.” Then click “Show List.” You’ll get a list of states. Click “select all” and then click the button that says “add to list.”
We’re not done yet, though. Now that we have our list of states, we want to see the student debt and freshman retention rates (the share of first-year students who return for a second year) for each of them.
Under “select variables,” choose “Student Debt.”
Now, using the pulldown menu again, choose “Student Success” and then “Four Year Freshman Retention Rate.”
Let’s select a year. 2013-14 is the most recent year of data available at the time of writing, so let’s choose that.
Select “Download As Table.” You’ll be downloading a comma-separated values (.csv) file. Remember where you save it!
You’ll need to do a little data cleanup first. Open the CSV in a spreadsheet or text editor and do a find-and-replace to remove the “- Total” after every state name (replace it with nothing). The state names will become labels on your chart, and space is at a premium: you don’t want to waste it on repetitive text that won’t mean much to the viewers of your chart.
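If you’d rather script the cleanup than do it by hand, here is a minimal Python sketch of the same find-and-replace. It assumes each state is labeled like “Alabama - Total”; the function name `strip_total_suffix` is hypothetical, and the pattern tolerates extra or missing spaces around the hyphen.

```python
import csv
import io
import re

# Matches "- Total" at the very end of a cell, plus any whitespace around it.
# Assumption: the export labels each state like "Alabama - Total".
TOTAL_SUFFIX = re.compile(r"\s*-\s*Total\s*$")


def strip_total_suffix(csv_text: str) -> str:
    """Return csv_text with a trailing '- Total' removed from every cell.

    Like a spreadsheet find-and-replace, this visits every cell; only
    cells that end in '- Total' (the state names) actually change.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        writer.writerow([TOTAL_SUFFIX.sub("", cell) for cell in row])
    return out.getvalue()
```

For example, `strip_total_suffix("Alabama - Total,2013-14\n")` returns `"Alabama,2013-14\n"`. Working cell by cell with the `csv` module lets the pattern anchor to the end of each cell, so it strips only the suffix and never text in the middle of a field.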
[Need to catch up? No problem. Here’s a ready-to-go file: scatterplot-example.csv]
Now select the data we’ll use to make our visualization: the header row (name, year, average debt, and full-time freshman retention rate) and the rows for all 50 states and the District of Columbia. Don’t select the extra text at the top and bottom of the file (the “Generated on” line at the top and the citations at the bottom).
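The selection step can also be scripted. This minimal sketch keeps the first run of rows that fill exactly four cells (the header plus the state rows) and drops everything around it. It assumes the “Generated on” line and the citations fill fewer than four cells, which may not hold for every export; the function name `extract_data_block` is hypothetical, and the check for 51 data rows (50 states plus the District of Columbia) catches a block that ended early.

```python
import csv
import io


def extract_data_block(csv_text: str, n_cols: int = 4, expected_rows: int = 51) -> list[list[str]]:
    """Return the header row plus data rows, dropping preamble and citations.

    Assumption: the header and every data row fill exactly `n_cols` cells,
    while the "Generated on" line and the citation lines fill fewer.
    """
    block: list[list[str]] = []
    for row in csv.reader(io.StringIO(csv_text)):
        full = (
            len(row) >= n_cols
            and all(cell.strip() for cell in row[:n_cols])
            # Trailing empty cells (common in spreadsheet exports) are allowed.
            and not any(cell.strip() for cell in row[n_cols:])
        )
        if full:
            block.append(row[:n_cols])
        elif block:
            break  # the data block has ended; what follows is citations
    if len(block) - 1 != expected_rows:
        raise ValueError(f"expected {expected_rows} data rows, found {len(block) - 1}")
    return block
```

If a state is missing a value, its row is not “full,” the block stops early, and the row-count check raises an error instead of silently dropping the remaining states.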
Next, copy the rows you selected and paste them into the large text box at the top of the visualization tool. If your data is in good shape, you’ll see a green box with a message below it, like this one:
If your data doesn’t parse, make sure you’re pasting only the four columns of data: name, year, average debt, and full-time freshman retention rate.
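One way to pre-check your data before pasting is a quick Python sketch like the one below, which reports rows that don’t have exactly four columns or whose debt and retention values aren’t numbers. It assumes average debt and the retention rate are the third and fourth columns and may be formatted with `$`, `,` or `%`; the function name is hypothetical, and it approximates, rather than reproduces, whatever checks the visualization tool runs.

```python
import csv
import io


def find_parse_problems(csv_text: str, n_cols: int = 4, numeric_cols: tuple[int, ...] = (2, 3)) -> list[str]:
    """List human-readable problems that would likely stop the data parsing.

    Assumption: columns at `numeric_cols` (zero-based) hold numbers that may
    carry '$', ',' or '%' formatting.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["no rows found"]
    header = rows[0]
    if len(header) != n_cols:
        return [f"header has {len(header)} columns, expected {n_cols}"]
    problems = []
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != n_cols:
            problems.append(f"line {line_no}: {len(row)} columns, expected {n_cols}")
            continue
        for col in numeric_cols:
            cleaned = row[col].replace("$", "").replace(",", "").replace("%", "").strip()
            try:
                float(cleaned)
            except ValueError:
                problems.append(f"line {line_no}: {header[col]!r} is not a number: {row[col]!r}")
    return problems
```

An empty list means every row has four columns and numeric values where they’re expected; a stray “Generated on” line, for instance, shows up as a row with the wrong number of columns.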
Scroll down to the different types of visualizations offered and select “scatterplot.”
Scroll down further, and we’ll start matching our data to different features of the visualization. This is a drag-and-drop interface, so drag the data columns on the left onto the white boxes on the right until they look like this:
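Conceptually, the mapping step pairs each column with a visual channel. The sketch below assumes one plausible mapping (average debt on the x-axis, retention rate on the y-axis, and the state name as the label); the column names in `MAPPING` and the helper names are hypothetical, so change them to match your own header row.

```python
from dataclasses import dataclass


@dataclass
class Point:
    label: str  # text drawn next to the dot (the state name)
    x: float    # value placed on the horizontal axis
    y: float    # value placed on the vertical axis


# Hypothetical column names; edit them to match your own header row.
MAPPING = {"label": "Name", "x": "Average Debt", "y": "Retention Rate"}


def to_number(text: str) -> float:
    """Parse values such as '$28,000' or '85%' into plain floats."""
    return float(text.replace("$", "").replace(",", "").replace("%", "").strip())


def build_points(rows: list[list[str]], mapping: dict[str, str] = MAPPING) -> list[Point]:
    """Turn a header row plus data rows into labeled (x, y) points."""
    header, *data = rows
    # Look up which column index feeds each visual channel.
    col = {channel: header.index(name) for channel, name in mapping.items()}
    return [
        Point(label=row[col["label"]], x=to_number(row[col["x"]]), y=to_number(row[col["y"]]))
        for row in data
    ]
```

Columns left out of the mapping (such as year) are simply ignored, which is also what happens to any column you don’t drag onto a box.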
Scroll down and adjust the size of your chart. Until you do, it’s likely to be skinny and unreadable. Here are some suggested dimensions:
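Why is a skinny chart unreadable? A scatterplot stretches the range of each variable across the available pixels, so a narrow chart squeezes every point into a narrow band. This sketch of a generic linear scale (not the tool’s own code) shows the idea; `width`, `height` and `margin` are hypothetical parameters, and the y-range is flipped because screen coordinates grow downward.

```python
def linear_scale(domain: tuple[float, float], pixels: tuple[float, float]):
    """Return a function mapping values in `domain` linearly onto `pixels`."""
    d0, d1 = domain
    p0, p1 = pixels
    if d0 == d1:
        raise ValueError("domain must span more than one value")

    def scale(value: float) -> float:
        return p0 + (value - d0) * (p1 - p0) / (d1 - d0)

    return scale


def layout(points, width: float, height: float, margin: float = 40):
    """Place (label, x, y) points inside a width-by-height chart area.

    Returns (label, px, py) tuples in pixel coordinates.
    """
    xs = [x for _, x, _ in points]
    ys = [y for _, _, y in points]
    sx = linear_scale((min(xs), max(xs)), (margin, width - margin))
    # Flip y so that larger values sit higher on the screen.
    sy = linear_scale((min(ys), max(ys)), (height - margin, margin))
    return [(label, sx(x), sy(y)) for label, x, y in points]
```

With `width=100` and `margin=20`, all 51 points would have to fit in a 60-pixel band horizontally, which is why the state labels pile on top of one another until you widen the chart.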
Now you’ll be able to see your visualization!