TDM 10100: Project 6 — Fall 2023
Dataset(s)
The following questions will use the following dataset(s):
- /anvil/projects/tdm/data/olympics/athlete_events.csv
- /anvil/projects/tdm/data/death_records/DeathRecords.csv
Questions
Question 1 (1.5 pts)
(We do not need the tapply function for Question 1)
For this question, please read the dataset /anvil/projects/tdm/data/olympics/athlete_events.csv into a data frame called myDF as follows:
myDF <- read.csv("/anvil/projects/tdm/data/olympics/athlete_events.csv", stringsAsFactors=TRUE)
- Use the table function to list all Games with occurrences in this data frame.
- Use the table function to list all countries participating in the Olympics during the year 1980. (The output should exclude all countries that did not have any athletes in 1980.)
- Use the subset function to create a new data frame containing data related to athletes that attended the Olympics more than one time.
(Use the original data frame myDF as a starting point for each of these three questions. Problems 1a, 1b, and 1c are independent of each other. For instance, when you solve question 1c, do not restrict yourself to the year 1980.)
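As a rough illustration of how table behaves on a factor column after filtering, here is a minimal sketch on a made-up data frame. The name toyDF, its columns Team and Year, and every value in it are assumptions for demonstration only; they are not taken from athlete_events.csv.

```r
# Toy data standing in for myDF; names and values are invented.
toyDF <- data.frame(
  Team = c("Canada", "Canada", "France", "Japan", "Japan", "Japan"),
  Year = c(1980, 1984, 1984, 1980, 1980, 1984),
  stringsAsFactors = TRUE
)

# Count how often each team appears in the whole data frame.
allCounts <- table(toyDF$Team)

# Restrict to one year. The factor still remembers every level,
# so teams with no rows in 1980 appear with a count of 0.
counts1980 <- table(toyDF$Team[toyDF$Year == 1980])

# Keep only the entries that actually occur.
nonzero1980 <- counts1980[counts1980 > 0]
print(nonzero1980)
```

Another option, under the same assumptions, is to wrap the filtered column in droplevels() before calling table, which removes the unused factor levels up front.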
For question 1c, use
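One possible way to filter for repeated values with subset, sketched on a made-up data frame (an illustration only, and not necessarily the approach the hint intends). The name toyDF, its columns ID and Games, and the values are all assumptions.

```r
# Toy data: one row per appearance; names and values are invented.
toyDF <- data.frame(
  ID    = c(1, 2, 2, 3, 4, 4, 4),
  Games = c("1980 Summer", "1980 Summer", "1984 Summer",
            "1984 Summer", "1980 Summer", "1984 Summer", "1988 Summer")
)

# Number of rows recorded for each ID.
rowsPerID <- table(toyDF$ID)

# IDs that occur more than once. The names of a table are character
# strings, so convert them back to numbers before comparing.
repeatIDs <- as.numeric(names(rowsPerID)[rowsPerID > 1])

# Keep every row that belongs to one of those IDs.
repeatDF <- subset(toyDF, ID %in% repeatIDs)
print(repeatDF)
```

Note that this counts rows per ID. If one athlete can have several rows for the same Games (for example, one row per event), counting rows would overstate the number of Games attended, so the counting step would need to work on distinct ID and Games pairs instead.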
Question 2 (1.5 pts)
Use the tapply command to solve each of these questions:
- What is the average age of the participants from each country?
- What is the maximum height for each sport? For your output on this question, please sort the maximum heights in decreasing order, and display the first 5 values.
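The general pattern for tapply is tapply(values, groups, function). The sketch below shows that pattern, plus sorting and head, on a made-up data frame; toyDF, its column names, and its values are assumptions, and na.rm = TRUE is included on the assumption that some measurements may be missing.

```r
# Toy data with a few missing values; names and values are invented.
toyDF <- data.frame(
  Sport  = c("Rowing", "Rowing", "Judo", "Judo", "Diving", "Diving"),
  Height = c(190, 198, 175, NA, 168, 172),
  Age    = c(24, 30, 22, 27, NA, 19)
)

# Mean of Age within each Sport; na.rm = TRUE skips missing ages.
meanAge <- tapply(toyDF$Age, toyDF$Sport, mean, na.rm = TRUE)

# Maximum Height within each Sport, sorted from largest to smallest.
maxHeight <- sort(tapply(toyDF$Height, toyDF$Sport, max, na.rm = TRUE),
                  decreasing = TRUE)

# Show the first two entries of the sorted result
# (n = 5 would give the first five).
print(head(maxHeight, n = 2))
```

Without na.rm = TRUE, any group containing an NA would return NA for both mean and max.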
Question 3 (1 pt)
For this question, save the data from the data set /anvil/projects/tdm/data/death_records/DeathRecords.csv into a new data frame called myDF as follows:
myDF <- read.csv("/anvil/projects/tdm/data/death_records/DeathRecords.csv", stringsAsFactors = TRUE)
It might be helpful to get an overview of the structure of the data frame by using the str() function:
str(myDF)
- How many observations (i.e., rows) are given in this data frame?
- Change the column MonthOfDeath from numbers to month names.
- How many people died (altogether) during each month? For instance, group together all of the deaths in January, all of the deaths in February, etc., so that you can display the totals from January to December in a total of 12 output values.
You may factorize the month names with a specified level order:
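To illustrate what a specified level order does, here is a minimal sketch on made-up data. It assumes the month column holds integers from 1 to 12; toyDF and its values are invented. month.name is a constant built into R.

```r
# Toy data: month numbers, assumed to run from 1 to 12.
toyDF <- data.frame(MonthOfDeath = c(1, 2, 2, 12, 7, 1, 2))

# month.name is built in: "January", "February", ..., "December".
# Index it by month number, then fix the level order so results
# come out chronologically rather than alphabetically.
toyDF$MonthOfDeath <- factor(month.name[toyDF$MonthOfDeath],
                             levels = month.name)

# One count per level, including months with no rows (count 0).
monthCounts <- table(toyDF$MonthOfDeath)
print(monthCounts)
```

Without the levels argument, factor would order the levels alphabetically (April, August, December, ...), and any month absent from the data would be missing from the output instead of showing a zero.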
Question 4 (2 pts)
- For each race, what is the average age at the time of death? Use the race column, which has integer values, and sort your outputs into descending order.
- Now considering only data for females: for each race, what is the average age at the time of death? Now considering only data for males, we can ask the same question: for each race, what is the average age at the time of death?
If you want to see the list of race values from the CDC for this data, you can look at page 15 of this pdf file:
If you want to (this is optional!), you can use the method from question 3b to convert the integer values into the strings that describe each race.
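Group averages restricted to one subgroup can be computed in more than one way. The sketch below shows two of them on a made-up data frame; toyDF, the column names Race, Sex, and Age, and all values are assumptions and may not match the columns in DeathRecords.csv.

```r
# Toy data; names and values are invented.
toyDF <- data.frame(
  Race = c(1, 1, 2, 2, 3, 3),
  Sex  = c("F", "M", "F", "M", "F", "M"),
  Age  = c(80, 70, 75, 65, 90, 70)
)

# Mean age per group, largest first.
meanByRace <- sort(tapply(toyDF$Age, toyDF$Race, mean), decreasing = TRUE)

# Option 1: filter to one subgroup first, then summarize.
femaleDF   <- subset(toyDF, Sex == "F")
meanFemale <- tapply(femaleDF$Age, femaleDF$Race, mean)

# Option 2: group by two columns at once; the result is a matrix
# with one row per Race value and one column per Sex value.
meanBoth <- tapply(toyDF$Age, list(toyDF$Race, toyDF$Sex), mean)
print(meanBoth)
```

The first option gives one named vector per subgroup, which is easy to sort. The second shows both subgroups side by side, and a single column such as meanBoth[, "F"] can still be sorted on its own.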
Question 5 (2 pts)
- Using the data set about the Olympic athletes, create a graph or plot that you find interesting. Write 1-2 sentences about something you found interesting about the data set; explain what you noticed in the dataset.
- Using the data set about the death records, create a graph or plot that you find interesting. Write 1-2 sentences about something you found interesting about the data set; explain what you noticed in the dataset.
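As one example of turning a summary into a plot, the sketch below feeds the output of table into the base R barplot function. The data frame toyDF, its Season column, and the values are made up; any table or tapply result could be plotted the same way.

```r
# Toy data; names and values are invented.
toyDF <- data.frame(
  Season = c("Summer", "Summer", "Winter", "Summer", "Winter")
)

# Summarize first, then plot the summary.
seasonCounts <- table(toyDF$Season)

# barplot draws one bar per entry and returns the bar midpoints.
barMids <- barplot(seasonCounts,
                   main = "Rows per season (toy data)",
                   xlab = "Season",
                   ylab = "Number of rows")
```

A title and axis labels make it easier to write the accompanying 1-2 sentences, since the plot then states what is being counted.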
Project 06 Assignment Checklist
- Jupyter Lab notebook with your code and comments for the assignment
  - firstname-lastname-project06.ipynb
- R code and comments for the assignment
  - firstname-lastname-project06.R
- Submit files through Gradescope
Please make sure to double-check that your submission is complete and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it, to make sure that what you think you submitted is what you actually submitted. In addition, please review our submission guidelines before submitting your project.