JOUR 307 Homework

Spring 2016 — Matt Waite, instructor

This repository holds different assignments for JOUR 307: Data Journalism.

From the syllabus:

Every day, more of our lives is being stored in a database somewhere. With that explosion of data, journalists now more than ever need the skills to analyze and understand data to then produce the stories hidden in the information. In this class, we’ll use brainpower and software to look at raw data -- not summarized and already reported information -- to do investigative reporting. We’re going to get our hands dirty with spreadsheets, databases, maps and some basic stats. And we're going to do journalism. So buckle up and hold on.

Below are the assignments, each with its instructions and my work.

Percent change
Grouping data
First chart
Data joining
Data normalization
Mapping assignment
Second mapping assignment
Data request
Health inspections investigation
Downtown apartments investigation
Data cleaning

Percent change

The assignment:

Download this dataset of population estimates from the US Census Bureau.

Calculate the percent change in population for every county in the US from 2010 to 2014.
Round that change to a single decimal place.
Sort it fastest growing to fastest shrinking. Print it to the screen but limit it to 50.
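
The steps above can be sketched in plain Python. This is a minimal illustration, not the notebook's actual code: the county names and population figures are made up, and `percent_change` is a hypothetical helper name.

```python
def percent_change(old, new):
    """Percent change from old to new, rounded to one decimal place."""
    return round((new - old) * 100 / old, 1)


# Hypothetical rows: (county, 2010 estimate, 2014 estimate).
# These are invented numbers, not real Census Bureau figures.
counties = [
    ("County A", 1000, 1100),
    ("County B", 2000, 1900),
    ("County C", 500, 600),
]

# Calculate the change for every county.
changes = [(name, percent_change(p2010, p2014)) for name, p2010, p2014 in counties]

# Sort fastest growing to fastest shrinking, then limit to 50 rows.
changes.sort(key=lambda row: row[1], reverse=True)
top = changes[:50]

for name, change in top:
    print(name, change)
```

A county with a 2010 population of zero would raise a `ZeroDivisionError` here, so real data may need a filter before the calculation.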

Jupyter Notebook

Grouping data

The assignment:

We've calculated the median and mean salary for all UNL employees, but that doesn't tell the whole story. The mean includes football and basketball coaches. The medians don't show the differences between jobs at the university. So using what you've learned in this walkthrough, group the salaries by job title.

Aggregate a count, a median and an average for each job title. Hint: You can do this all in one aggregate table. One gotcha on that multiple aggregates in a single table thing: Watch out for commas. You need one at the end of every line EXCEPT the last aggregate.

Sort the table by the count, putting the most common job title at the top.

Print the table out. Limit it to the 50 most common jobs.
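
The same group-and-aggregate idea can be sketched with only the Python standard library. This is an assumption-laden illustration rather than the agate code the assignment calls for: the job titles and salaries are invented, and the names `by_title` and `summary` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (job title, salary) rows; not real UNL salary data.
salaries = [
    ("Professor", 90000),
    ("Professor", 110000),
    ("Professor", 130000),
    ("Lecturer", 50000),
    ("Lecturer", 60000),
    ("Head Coach", 1000000),
]

# Group the salaries by job title.
by_title = defaultdict(list)
for title, salary in salaries:
    by_title[title].append(salary)

# Aggregate a count, a median and a mean for each title in one pass.
summary = [
    {"title": title, "count": len(values), "median": median(values), "mean": mean(values)}
    for title, values in by_title.items()
]

# Most common job title first, limited to the 50 most common.
summary.sort(key=lambda row: row["count"], reverse=True)
for row in summary[:50]:
    print(row)
```

Grouping first shows why the overall mean misleads: the single coach salary sits in its own group instead of pulling every other title's average upward.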

Jupyter Notebook

First chart

The assignment:

Using the Mountain Lion data from earlier this semester, make a bar chart of the top 10 counties for sightings. Create three different versions, with three different set_styles and three different colors. In your Jupyter Notebook, tell me which one you like the most and why. And what is the main weakness of your chart?

Jupyter Notebook

Data joining

The assignment:

Often, in data, we have one set of information stored in a table over here, and another set of information stored in a table over there. At the university, your student records are scattered in tables all over. Somewhere, there is a master student record that has your name, birthdate, ID number, home address and other basic info. Then, over in the registrar's office, we have the classes you took and the grades you received. Over here, we have the bursar's office, which shows how much you owe in tuition and how much you've paid. If we wanted to get a single table together that showed how much you paid for each grade you got, we'd have to JOIN them together somehow.

There are three data files in the Data folder in the class repository: frl13, frl14, and frl15. They are the Free and Reduced Lunch participation totals for every school in Nebraska. I want you to join them together into a single table, calculate the percent change from 2013 to 2015 and sort the result from largest to smallest. Which school in Nebraska saw the largest increase in participation in free and reduced school lunches, a proxy for poverty?
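
A join of this kind can be sketched with dictionaries keyed on a shared identifier. Everything below is assumed for illustration: the school IDs and totals are invented, the real files may use a different key column, and this is an inner join, so a school missing from any year is dropped.

```python
# Hypothetical participation totals keyed by a school ID; not real Nebraska data.
frl13 = {"001": 100, "002": 200, "003": 50}
frl14 = {"001": 120, "002": 190, "003": 55}
frl15 = {"001": 150, "002": 180}  # school 003 has no 2015 row

# Inner join: keep only schools that appear in all three tables.
joined = {
    school: (frl13[school], frl14[school], frl15[school])
    for school in frl13
    if school in frl14 and school in frl15
}

# Percent change from 2013 to 2015, skipping schools with a zero 2013 total
# because the change is undefined when dividing by zero.
changes = [
    (school, round((y15 - y13) * 100 / y13, 1))
    for school, (y13, _y14, y15) in joined.items()
    if y13 > 0
]

# Largest increase first.
changes.sort(key=lambda row: row[1], reverse=True)
print(changes)
```

Schools that opened or closed between 2013 and 2015 fall out of an inner join, which is worth checking before reporting a statewide ranking.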

Jupyter Notebook

Data normalization

The assignment:

In this assignment, you must take a file from the Nebraska Department of Environmental Quality and make it useful. I want to know how many leaking underground storage tanks there are in each city in Nebraska.

To do this, you will need to:

Get the file from the DEQ. The file you want is called spillfac.csv, but keep this page handy because it has some filter conditions you're going to need.

The file that comes from the state is not UTF-8. Follow the walkthrough. Use Excel and csvkit to zap the non-UTF-8 characters.
Normalize the data using OpenRefine. Specifically, the fields you need to normalize are the owner company -- OWNCO -- and the city the tank is in -- SPCITY.
Export your newly cleaned data into a new csv file.
Import your newly cleaned up data into Agate.
Filter out any leaking underground storage tanks that aren't leaking. (See the documentation on the page where you downloaded the file.)
Group it by the OWNCO and count them.
Sort it.
Print the top 20 to the screen.
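
The filter, group, count and sort steps can be sketched with the standard library once the data is clean. OWNCO and SPCITY come from the assignment; the `LEAKING` column and its `Y`/`N` values are invented stand-ins for the real filter condition described in the DEQ documentation, and the sample rows are made up.

```python
import csv
import io
from collections import Counter

# Hypothetical cleaned data. The LEAKING column is an assumption, not the
# actual field the DEQ file uses to mark leaking tanks.
sample = io.StringIO(
    "OWNCO,SPCITY,LEAKING\n"
    "ACME OIL,LINCOLN,Y\n"
    "ACME OIL,OMAHA,Y\n"
    "ACME OIL,LINCOLN,N\n"
    "PRAIRIE FUEL,LINCOLN,Y\n"
)

# Keep only the rows that pass the (assumed) leaking filter.
rows = [row for row in csv.DictReader(sample) if row["LEAKING"] == "Y"]

# Group and count by owner, as the steps say, and by city, as the goal says.
by_owner = Counter(row["OWNCO"] for row in rows)
by_city = Counter(row["SPCITY"] for row in rows)

# most_common sorts by count, largest first, and limits the result.
top_owners = by_owner.most_common(20)
for owner, count in top_owners:
    print(owner, count)
```

Counting only works after normalization: two spellings of the same company or city would otherwise be tallied as separate groups.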

Jupyter Notebook

Mapping assignment

The assignment:

Create a map of population change in Nebraska using this data. Thematically shade it in a manner you think is appropriate. Write a paragraph discussing what the map shows, as if this paragraph were going to be published in a story about population change in Nebraska after a Census data release.

Second mapping assignment

The assignment:

An editor has just approached you about a story idea they have. They heard this podcast of this Washington Post reporter who wrote a story about The Worst Place In America To Live, which is in Minnesota. In the podcast, and in the story, the ranking was based on the USDA's Natural Amenities Index. The map that the Washington Post published also made Nebraska out to be a pretty rough place to live.

Your editor wants to know:

What are the 50 worst places in America to live based on natural beauty?

How many of the Top 50 Worst places are in Nebraska?

Where are they?

And can we have a map like the Washington Post map?
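
The first three questions amount to a sort, a limit and a filter. The sketch below assumes a lower score means fewer natural amenities; the counties and scores are invented, not real USDA figures, and `N` is set to 3 only so the tiny sample produces a readable result.

```python
# Hypothetical (county, state, amenity score) rows; not real USDA data.
counties = [
    ("County A", "NE", -3.2),
    ("County B", "MN", -5.1),
    ("County C", "CA", 8.4),
    ("County D", "NE", -4.0),
    ("County E", "IA", -1.5),
]

N = 3  # the assignment asks for 50

# Lowest scores first (assumed to mean the least natural beauty), limited to N.
worst = sorted(counties, key=lambda row: row[2])[:N]

# Which of those are in Nebraska, and how many?
nebraska = [name for name, state, _score in worst if state == "NE"]

print(worst)
print(len(nebraska), nebraska)
```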

Data request

The assignment:

During the semester, you will identify a database held by a government agency that you need for a story and go get it. You are negotiating for public data as a journalist, so you may not promise not to use the records. Downloading data from the Internet does not fulfill the requirements of this exercise.

Health inspections investigation

The final article

Downtown apartments investigation

As the University of Nebraska-Lincoln continues efforts to grow enrollment to 30,000 students, private housing companies are preparing for the surge by developing housing complexes near campus.

The final article

Data cleaning

The assignment:

Take a PDF from the Office of Institutional Research, Analytics and Decision Support and turn it into a usable dataset.

GitHub repository of converted documents
