Lab 2B - Oh the Summaries ...
Directions: Follow along with the slides, completing the questions in blue on your computer, and answering the questions in red in your journal.
Just the beginning
- Means, medians, and MAD are just a few examples of numerical summaries.
- In this lab, we will learn how to calculate and interpret additional summaries of distributions such as: minimums, maximums, ranges, quartiles and IQRs.
– We'll also learn how to write our first custom function!
- Start by loading your Personality Color data again and name it colors.
Extreme values
- Besides looking at typical values, sometimes we want to see extreme values, like the smallest and largest values.
– To find these values, we can use the min, max or range functions. These functions use a similar syntax as the mean function.
- (1) Find and write down the min value and max value for your predominant color.
- (2) Apply the range function to your predominant color and describe the output.
– The range of a variable is the difference between a variable's smallest and largest value.
– Notice, however, that our range function calculates the maximum and minimum values for a variable, but not the difference between them.
– Later in this lab you will create a custom Range function that will calculate the difference.
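To see why range gives back two numbers rather than one, here is a minimal sketch in base R. The scores vector is made up for illustration (it is not the Personality Color data), and these base-R calls take a plain vector rather than the ~variable, data = colors syntax used in this lab.

```r
# Hypothetical scores, invented for illustration only
scores <- c(2, 4, 4, 5, 6, 7, 8, 9, 12)

smallest <- min(scores)    # the smallest value
largest  <- max(scores)    # the largest value
both     <- range(scores)  # a vector holding the min and then the max

print(smallest)
print(largest)
print(both)
```

Here both holds two values (2 and 12), not their difference.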
Quartiles (Q1 & Q3)
- The median of our data is the value that splits our data in half.
– Half of our data is smaller than the median, half is larger.
- Q1 and Q3 are similar.
– 25% of our data are smaller than Q1, 75% are larger.
– 75% of our data are smaller than Q3, 25% are larger.
- (3) Fill in the blanks to compute the value of Q1 for your predominant color.
quantile(~____, data = ____, p = 0.25)
- (4) Use a similar line of code to calculate and write down Q3, which is the value that's larger than 75% of our data.
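As a rough illustration of what Q1 and Q3 look like on a small data set, the sketch below uses base R's quantile on a made-up scores vector. It assumes base R's default quartile method, so values computed by hand or by other software may differ slightly. In base R the argument is spelled probs.

```r
# Hypothetical scores, invented for illustration only
scores <- c(2, 4, 4, 5, 6, 7, 8, 9, 12)

# Base R takes a plain vector and spells the argument 'probs'
q1 <- quantile(scores, probs = 0.25)
q3 <- quantile(scores, probs = 0.75)

print(q1)  # first quartile: 4
print(q3)  # third quartile: 8

# Share of the scores at or below each quartile
print(mean(scores <= q1))
print(mean(scores <= q3))
```

With only nine values and ties, the shares at or below Q1 and Q3 are only roughly 25% and 75%.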
 
The Inter-Quartile-Range (IQR)
- (5) Write and run code making a dotPlot of your predominant color's scores. Make sure to include the nint option.
- Visually (don't worry about being super-precise):
– Cut the distribution into quarters so the number of data points is equal for each piece. (Each piece should contain 25% of the data.)
  - Hint: You might consider using add_line(vline = ) to add vertical lines at the quarter marks.
– (6) Write down the numbers that split the data up into these 4 pieces.
– (7) How long is the interval of the middle two pieces?
– This length is the IQR.
 
Calculating the IQR
- The IQR is another way to describe spread.
– It describes how wide or narrow the middle 50% of our data are.
- Just like we used the min and max to compute the range, we can also use the 1st and 3rd quartiles to compute the IQR.
- (8) Use the values of Q1 and Q3 you calculated previously and find the IQR by hand.
– (9) Then write and run code using the iqr() function to calculate it for you.
- (10) Which personality color score has the widest spread according to the IQR? Which is narrowest?
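The by-hand calculation and the built-in calculation should agree. The sketch below checks this on a made-up scores vector using base R, where the built-in function is spelled IQR (uppercase), unlike the lowercase iqr() used in this lab.

```r
# Hypothetical scores, invented for illustration only
scores <- c(2, 4, 4, 5, 6, 7, 8, 9, 12)

q1 <- quantile(scores, probs = 0.25)
q3 <- quantile(scores, probs = 0.75)

# "By hand": the width of the middle 50% is Q3 minus Q1
iqr_by_hand <- unname(q3 - q1)

# Base R's built-in function does the same subtraction for us
iqr_builtin <- IQR(scores)

print(iqr_by_hand)
print(iqr_builtin)
```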
 
Boxplots
- By using the median, quartiles, and min/max, we can construct a new single-variable plot called the box and whisker plot, often shortened to just boxplot.
- (11) By showing someone a dotPlot, how would you teach them to make a boxplot? Write out your explanation in a series of steps for the person to use.
– (12) Use the steps you wrote to create a sketch of a boxplot for your predominant color's scores in your journal.
– (13) Then use the bwplot function to create a boxplot using R.
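For a sense of the numbers a boxplot is built from, the sketch below asks base R's boxplot.stats for them on a made-up scores vector. It is an illustration only: bwplot may compute the box edges slightly differently, and the whiskers only reach the min and max when no points are flagged as outliers. The labels attached to the five values are added here for readability.

```r
# Hypothetical scores, invented for illustration only
scores <- c(2, 4, 4, 5, 6, 7, 8, 9, 12)

# The five values base R would use to draw the box and whiskers
five <- boxplot.stats(scores)$stats
names(five) <- c("lower whisker", "box start", "median",
                 "box end", "upper whisker")
print(five)

# boxplot(scores, horizontal = TRUE) would draw a plot from these values
```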
Our favorite summaries
- In the past two labs, we've learned how to calculate numerous numerical summaries.
– Computing lots of different summaries can be tedious.
- (14) Fill in the blanks below to compute some of our favorite summaries for your predominant color all at once.
favstats(~____, data = colors)
Calculating a range value
- We saw earlier in this lab that the range function calculates the maximum and minimum values for a variable, but not the difference between them.
- We could calculate this difference in two steps:
– Step 1: Use the range function to assign the max and min values of a variable the name values. This will store the output from the range function in the Environment pane.
values <- range(~____, data = colors)
– Step 2: Use the diff function to calculate the difference of values. For this purpose, the input for the diff function needs to be a vector containing two numeric values.
diff(values)
- (15) Use these two steps to calculate the range of your predominant color.
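The two steps can be traced on a small example. The sketch below uses base R and a made-up scores vector rather than the colors data, so range takes the vector directly instead of a formula.

```r
# Hypothetical scores, invented for illustration only
scores <- c(2, 4, 4, 5, 6, 7, 8, 9, 12)

# Step 1: store the min and max under the name 'values'
values <- range(scores)

# Step 2: diff subtracts the first element from the second (max - min)
spread <- diff(values)

print(values)
print(spread)
```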
 
Introducing custom functions
- Calculating the range of many variables can be tedious if we have to keep performing the same two steps over and over.
– We can combine these two steps into one by writing our own custom function.
- Custom functions can take a task that would normally require many steps and simplify it into one.
- The next slide shows an example of how we can create a custom function called mm_diff to calculate the absolute difference between the mean and median value of a variable in our data.
Example function
mm_diff <- function(variable, data) {
  mean_val <- mean(variable, data = data)
  med_val <- median(variable, data = data)
  abs(mean_val - med_val)
}
- The function takes two generic arguments: variable and data.
- It then follows the steps between the curly braces { }.
– Each of the generic arguments is used inside the mean and median functions.
- Copy and paste the code above into an R Script and run it.
- The mm_diff function will appear in your Environment pane.
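The same anatomy (a name, arguments in parentheses, steps inside curly braces, and the last line as the result) can be sketched with a variant that takes a plain numeric vector. The name mm_diff_vec and its single argument x are hypothetical, chosen so the example runs in base R without the formula syntax. The scores are made up.

```r
# Hypothetical vector-based variant of mm_diff (not part of the lab)
mm_diff_vec <- function(x) {
  mean_val <- mean(x)      # step 1: the mean of the input
  med_val <- median(x)     # step 2: the median of the input
  abs(mean_val - med_val)  # the last expression is what the function returns
}

# Hypothetical scores, invented for illustration only
scores <- c(2, 4, 4, 5, 6, 7, 8, 9, 12)

result <- mm_diff_vec(scores)
print(result)
```

Once defined, the function is called like any built-in summary, and whatever is passed in takes the place of x inside the braces.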
Using mm_diff()
- After running the code used to create the function, we can use it just like we would any other numerical summary.
– (16) In the console, fill in the blanks below to calculate the absolute difference between the mean and median values of your predominant color:
____(~____, data = ____)
- (17) Which of the four colors has the largest absolute difference between the mean and median values?
– (18) By examining a dotPlot for this personality color, make an argument why either the mean or median would be the better description of the center of the data.
Our first function
- (19) Using the previous example as a guide, create a function called Range (note the capital 'R') that calculates the range of a variable by filling in the blanks below:
____ <- function(____, ____) {
  values <- range(____, data = ____)
  diff(____)
}
- (20) Use the Range function to find the personality color with the largest difference between the max and min values.
On your own
- (21) Create a function called myIQR that uses the quantile function to compute the middle 30% of the data.