Lecture 1
2020-07-06
Serena DeStefani – Lecture 1 – 7/6/2020
Course Information
Textbook: Stats: Data and Models, 4th Edition
Let’s introduce ourselves.
Let’s look at the Syllabus.
It’s not what you think it is.
Statistics is…
Statisticians…
Fjallsjokull glacier in Iceland
Spiral strand of DNA
Jarvis Hayes, Darius Songaila, Mark Madsen and Rashad McCants. Image source: Keith Allison
President Barack Obama campaign rally in Urbandale, Iowa, 2012. Image source: White House
Yezidi children in an internally displaced person camp in Sharya, Iraq. Image source: Human Rights Data Analysis Group
Data center
Budweiser plant. Image source: Ryan Glenn
The opportunity is endless.
The practice of collecting and collating numerical facts


Inferential statistics:

Statistics is about variation.
Statistics helps us make sense of the data and how the data vary.
Statistics is a collection of conceptual and mathematical tools that allow us to study such variation.
The use of Statistics qualifies Psychology as a science…
Using statistics we can determine whether a psychological hypothesis is true for a wider population, or whether a treatment works or not.
Statistical methods provide a unifying force within Psychology.
Starting from the 17th century, Statistics (the process of reasoning about the data) originated from different fields:
| Field | Contribution |
|---|---|
| Demography | Statistical summaries |
| Astronomy | Theory of errors, normal distribution |
| Gambling | Probability theory |
| Agriculture | Experimental design |
Let’s take a step back…
In 1687 Newton published Principia Mathematica
But what about other complex phenomena?
At some point it was found that many events follow what we call a normal distribution.
How did they find out?
John Graunt, a London haberdasher, born in 1620, tried to predict and explain social phenomena from tables he compiled from the “Bills of Mortality”.


For science to progress, scientists propose hypotheses to test.
In order to do that, they need to collect data…
In order to collect data, scientists must make experimental measurements.
But how to measure the position of the stars?!
Initially, by the naked eye.
Tycho Brahe, a Dane, worked from about 1570 to 1601 and built the most accurate naked-eye observatory ever.
Scientists used to make only one observation, which raised the problem of errors.
1720, Roger Cotes: reporting the arithmetic average of a group of observations decreased the error of the measurement process.
1755 Thomas Simpson proposed that the mean of a series of observations was a better estimate of the true quantity of the object to be measured than any single observation, however meticulously obtained.
1755 Bayes, in a comment on Simpson, noted that the mean only made sense as a superior estimator if the deviations from the mean were symmetric about it.
Simpson took note and revised his recommendation in 1757: report both the mean (as the “best” estimate) and the scatter of the deviations from the mean.
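Simpson's revised recommendation can be sketched in a few lines of Python. This is only an illustration: the measurement values are hypothetical, and the sample standard deviation is used as one modern measure of scatter, which is an assumption and not necessarily the measure Simpson had in mind.

```python
import statistics

# Hypothetical repeated measurements of the same quantity (illustrative values only).
observations = [10.2, 9.8, 10.1, 9.9, 10.0]

mean = statistics.mean(observations)            # the "best" single estimate
deviations = [x - mean for x in observations]   # errors around the mean
scatter = statistics.stdev(observations)        # sample standard deviation (assumed measure of scatter)

print(f"mean = {mean:.2f}")
print("deviations =", [round(d, 2) for d in deviations])
print(f"scatter (sample SD) = {scatter:.3f}")
```

The deviations sum to roughly zero by construction, which is why a separate measure of scatter is needed to describe how far the observations spread around the mean.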
So scientists are now talking about more than one observation and reporting the mean and the deviations from the mean (errors).
But do these errors have a regular distribution???
Antoine Gombaud, the Chevalier de Méré, a writer and gambler, consulted his friend Blaise Pascal (1623-1662) on how to calculate the expected (probable) frequency of gains and losses, and how to divide the stakes fairly if the game was interrupted.



Pascal did not know the solution and wrote to his friend Fermat.
Correspondence between Gombaud, Pascal and Fermat: the birth of probability theory



Pascal connected the study of probability with the arithmetic triangle:
The triangle was already known in India and China.


The arithmetic triangle is linked to the binomial expansion
\[\small\begin{align} (a+b)^0 &= 1 \\ (a+b)^1 &= a + b \\ (a+b)^2 &= a^2 + 2ab + b^2 \\ (a+b)^3 &= a^3 + 3a^2b + 3ab^2 + b^3 \\ (a+b)^4 &= a^4 + 4a^3b + 6a^2b^2 + 4ab^3 + b^4 \\ (a+b)^5 &= a^5 + 5a^4b + 10a^3b^2 + 10a^2b^3 + 5ab^4 + b^5 \\ (a+b)^6 &= a^6 + 6a^5b + 15a^4b^2 + 20a^3b^3 + 15a^2b^4 + 6ab^5 + b^6 \end{align}\]
Watch video at: https://bit.ly/3dwO969
\[\small\begin{align} (x+y)^0 &= \mathbf{1} \\ (x+y)^1 &= \mathbf{1}x + \mathbf{1}y \\ (x+y)^2 &= \mathbf{1}x^2 + \mathbf{2}xy + \mathbf{1}y^2 \\ (x+y)^3 &= \mathbf{1}x^3 + \mathbf{3}x^2y + \mathbf{3}xy^2 + \mathbf{1}y^3 \\ (x+y)^4 &= \mathbf{1}x^4 + \mathbf{4}x^3y + \mathbf{6}x^2y^2 + \mathbf{4}xy^3 + \mathbf{1}y^4 \\ (x+y)^5 &= \mathbf{1}x^5 + \mathbf{5}x^4y + \mathbf{10}x^3y^2 + \mathbf{10}x^2y^3 + \mathbf{5}xy^4 + \mathbf{1}y^5 \end{align}\]
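The link between the triangle and the expansion can be checked with a short Python sketch. The function name `pascal_rows` is hypothetical; the code builds each row by adding adjacent entries of the row above and then compares it with the binomial coefficients from the standard library's `math.comb`.

```python
from math import comb

def pascal_rows(n_rows):
    """Return the first n_rows rows of the arithmetic (Pascal's) triangle."""
    if n_rows <= 0:
        return []
    rows = [[1]]
    for _ in range(n_rows - 1):
        prev = rows[-1]
        # Each interior entry is the sum of the two entries directly above it.
        interior = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        rows.append([1] + interior + [1])
    return rows

rows = pascal_rows(7)
for n, row in enumerate(rows):
    # Row n should hold the coefficients of (a + b)^n.
    assert row == [comb(n, k) for k in range(n + 1)]
    print(n, row)
```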
We can use the binomial expansion (and the triangle) to find the probability of some simple events, like tossing a coin
| Tosses | Number of ways to get 0, 1, 2, ... heads |
|---|---|
| 1 | 1 1 |
| 2 | 1 2 1 |
| 3 | 1 3 3 1 |
| 4 | 1 4 6 4 1 |
\[\small\left(\frac{1}{2} + \frac{1}{2}\right)^4 = \frac{1}{16} + \frac{4}{16} + \frac{6}{16} + \frac{4}{16} + \frac{1}{16}\]
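As a minimal sketch, assuming a fair coin, the same probabilities can be computed for any number of tosses. The function name `heads_distribution` is hypothetical. Python's `Fraction` reduces fractions automatically, so 6/16 is displayed as 3/8.

```python
from fractions import Fraction
from math import comb

def heads_distribution(n_tosses):
    """Probability of k heads in n_tosses tosses of a fair coin, for k = 0..n_tosses."""
    total_outcomes = 2 ** n_tosses
    return [Fraction(comb(n_tosses, k), total_outcomes) for k in range(n_tosses + 1)]

dist = heads_distribution(4)
for k, p in enumerate(dist):
    print(f"{k} heads: {p}")
```

The numerators are the row of the triangle for 4 tosses, and the probabilities add up to 1.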
We can plot the frequency of getting heads on a histogram
The more coin tosses I make, the more this histogram will resemble a curve:
See simulation at: https://shiny.rit.albany.edu/stat/binomial/
Abraham De Moivre (1667-1754) published in 1738:
The Doctrine of Chances or a Method of Calculating the Probabilities of Events in Play
In the third edition (1756) he showed a way to approximate the sum of the binomial terms when n is very large.
Now you can calculate probabilities for any very large number of tosses! And if you graph them, you get this curve:
Figure: bell-shaped curve with its two ends labeled “all tails” and “all heads”.
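The idea behind the approximation can be illustrated numerically. The sketch below compares the exact binomial probabilities for 100 fair-coin tosses with the bell-curve formula that the lecture later attributes to Gauss and Laplace. This is not De Moivre's own derivation, and the function names and the choice of n = 100 are assumptions for illustration.

```python
from math import comb, exp, pi, sqrt

def binomial_pmf(n, k, p=0.5):
    """Exact probability of k heads in n tosses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x, mu, sigma):
    """Height of the bell curve with mean mu and standard deviation sigma at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

n = 100
mu = n * 0.5                  # mean number of heads
sigma = sqrt(n * 0.5 * 0.5)   # standard deviation of the number of heads

# Largest difference between the exact probability and the curve over all k.
max_gap = max(abs(binomial_pmf(n, k) - normal_pdf(k, mu, sigma)) for k in range(n + 1))
print(f"exact P(50 heads) = {binomial_pmf(n, 50):.5f}")
print(f"curve at 50       = {normal_pdf(50, mu, sigma):.5f}")
print(f"largest gap       = {max_gap:.5f}")
```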
Carl Friedrich Gauss (1777-1855) was the first one to derive a function for this curve (1809)


So scientists are now talking about more than one observation and reporting the mean and the deviations from the mean (errors).
But do these errors have a regular distribution???
Pierre Laplace (1749-1827) independently derived the formula of the normal distribution in 1812 and understood that this function was the one describing the distribution of errors!
It is called the Gauss-Laplace distribution, the Gaussian distribution, or the normal distribution.
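In modern notation the curve is usually written as below. The symbols are assumptions, since the formula itself does not appear in these notes: \(\mu\) is the mean and \(\sigma\) is the standard deviation.

\[f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\]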
One of the first applications of the distribution outside of gaming was in the assessment of errors in astronomy.


Summary:
The ready acceptance of the normal distribution as a law of nature encouraged its wide application and produced consternation when exceptions were observed.
Are there distributions that are asymmetrical?
It is safe to say that no other theoretical mathematical abstraction has had such an important influence on psychology and the social sciences.
Using this distribution we can calculate the probabilities of a wide range of events and conduct hypothesis testing.


Data: Any collection of numbers, characters, images, or other items that provide information about something.
Data vary: Surveys and experiments produce a variety of outcomes.
Statistical inference is making a decision or a conclusion based on the data.
Is texting while driving safe?
Difficult to decipher the data above. Presentation can make all the difference.


Rows correspond to individual cases, that may go by different names:
Amazon knows your age and will use it to present an age-appropriate image customized for you.
Is Age categorical or quantitative?
Identifier Variable: A variable that is used to uniquely identify the individual. It does not describe the individual.
Ordinal Variable: A variable that reports order without natural units
Can be treated as quantitative by using the rank number:
1 = Strongly Disagree, 2 = Disagree, 3 = Agree, 4 = Strongly Agree
A frequency table is a table whose first column displays each distinct outcome and second column displays that outcome’s frequency.
If there are many distinct outcomes, then combining them into a few categories is recommended.

A relative frequency table is a table whose first column displays each distinct outcome and second column displays that outcome’s relative frequency.
The relative frequency table is similar to the frequency table, but it displays relative frequencies rather than frequencies.
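Both tables can be built from raw categorical data with a few lines of Python. The list of outcomes below is hypothetical and serves only to show the mechanics.

```python
from collections import Counter

# Hypothetical categorical outcomes (illustrative values only).
outcomes = ["First", "Crew", "Third", "Crew", "Second", "Third", "Crew", "Third"]

freq = Counter(outcomes)                      # frequency of each distinct outcome
total = sum(freq.values())
rel_freq = {category: count / total for category, count in freq.items()}

print(f"{'Outcome':<8}{'Freq':>6}{'Rel. freq':>12}")
for category, count in freq.most_common():
    print(f"{category:<8}{count:>6}{rel_freq[category]:>12.1%}")
```

The relative frequencies always add up to 1 (100%), which makes tables from groups of different sizes comparable.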

A bar chart displays the frequency or relative frequency of each category.


Were most of the people on the Titanic crew members?
There were about three times as many crew members as second-class passengers.
The eyes are tricked by the area being nine times as large for the crew.
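A short worked calculation shows where the factor of nine comes from, assuming the graphic scales both the height and the width of the crew picture by the ratio of the counts:

\[\text{area ratio} = \underbrace{3}_{\text{height}} \times \underbrace{3}_{\text{width}} = 9\]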

The Area Principle: The area occupied by a part of the graph should correspond to the magnitude of the value it represents.

A pie chart presents each category as a slice of a circle, so that the size of each slice is proportional to the fraction of the whole in that category.
Pie charts help to display the fraction of the whole that each category represents.
Better not to use a pie chart in science.

A contingency table is a table that displays two categorical variables and their relationships.

A table of percents can be misleading.
Looking at “Alive”, was it better to have a second- or third-class ticket?
A conditional distribution shows the percentages of one variable for only those cases that satisfy a condition on another variable.
25.2% of all third-class ticket holders survived.
Was it better to have a second- or third-class ticket?

The “Condition” can either be based on rows or columns.
This table shows that the highest percent of survivors were crew members.
The highest percent of the dead were also crew members.
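The two ways of conditioning can be sketched in Python. The counts below are the commonly cited Titanic totals and are an assumption here, since the lecture's own table is not reproduced in these notes. They do agree with the 25.2% figure quoted above. The function names are hypothetical.

```python
# Counts by ticket class and survival status.
# Assumption: commonly cited Titanic totals, not copied from the lecture's table.
counts = {
    "First":  {"Alive": 203, "Dead": 122},
    "Second": {"Alive": 118, "Dead": 167},
    "Third":  {"Alive": 178, "Dead": 528},
    "Crew":   {"Alive": 212, "Dead": 673},
}

def survival_given_class(table):
    """Condition on rows: percent Alive and Dead within each class."""
    result = {}
    for cls, row in table.items():
        row_total = sum(row.values())
        result[cls] = {status: 100 * n / row_total for status, n in row.items()}
    return result

def class_given_survival(table):
    """Condition on columns: percent of each class among the Alive and among the Dead."""
    result = {}
    for status in ("Alive", "Dead"):
        col_total = sum(row[status] for row in table.values())
        result[status] = {cls: 100 * row[status] / col_total for cls, row in table.items()}
    return result

by_class = survival_given_class(counts)
by_status = class_given_survival(counts)

for cls, pct in by_class.items():
    print(f"{cls:<7} survived: {pct['Alive']:.1f}%")
for status, pct in by_status.items():
    largest = max(pct, key=pct.get)
    print(f"Largest group among the {status}: {largest} ({pct[largest]:.1f}%)")
```

Conditioning on class answers "what were my chances with this ticket?", while conditioning on survival status answers "who made up the survivors?". The two questions can give very different impressions of the same table.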

Pie charts can give a visual representation of the conditional distributions.
Compare how the first-class ticket holders were represented amongst the survivors vs. the dead.

Bar charts can also effectively tell the story for conditional distributions.
Which is best: Table, Pie chart, or Bar Graph?

Independence: The distribution of one variable is the same for all categories of another. There is no association between the two.
For dependent variables, there is an association between the two variables.
Is there an association between gender and interest in Super Bowl TV Coverage?
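A rough way to look for an association is to compare the conditional distributions across groups. The sketch below does this with hypothetical counts that are not from the lecture or any real survey. The function names and the 5-percentage-point tolerance are arbitrary assumptions, used as a crude stand-in for "the same distribution".

```python
def conditional_rows(table):
    """Turn each row of counts into proportions that sum to 1."""
    return {
        group: {k: n / sum(row.values()) for k, n in row.items()}
        for group, row in table.items()
    }

def looks_independent(table, tol=0.05):
    """True if every row's distribution is within tol of the first row's."""
    dists = list(conditional_rows(table).values())
    first = dists[0]
    return all(abs(d[k] - first[k]) <= tol for d in dists for k in first)

# Hypothetical survey counts (illustrative only).
survey = {
    "Group 1": {"Interested": 60, "Not interested": 40},
    "Group 2": {"Interested": 30, "Not interested": 70},
}

print(conditional_rows(survey))
print("Looks independent?", looks_independent(survey))
```

In this made-up table the two groups have clearly different distributions, so the variables look associated. If both rows showed the same proportions, the variables would look independent.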


Definition: An association that holds for all of several groups can reverse direction when the data are combined to form a single group. This reversal is called Simpson’s paradox.
History: Simpson’s paradox is named after Edward Simpson: he described this paradox in the 1951 paper “The Interpretation of Interaction in Contingency Tables.”
Pearson and Yule each observed a similar paradox half a century earlier than Simpson, so Simpson’s paradox is sometimes also referred to as the Simpson-Yule effect.
We want to test two drugs.
We give each drug to a group of people and then count the number of successes (improvements) and failures (no change) for each group.
Let’s look at the same result split by gender.
Male:
Female:
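The reversal can be reproduced with a small Python sketch. The counts are hypothetical, chosen only to show the effect, and are not the numbers from the lecture's tables.

```python
# Hypothetical (successes, patients) counts chosen to show the reversal.
trial = {
    "Male":   {"Drug A": (8, 10),   "Drug B": (70, 100)},
    "Female": {"Drug A": (40, 100), "Drug B": (3, 10)},
}

def success_rate(successes, patients):
    return successes / patients

def combined(data, drug):
    """Add up successes and patients for one drug across all groups."""
    successes = sum(group[drug][0] for group in data.values())
    patients = sum(group[drug][1] for group in data.values())
    return successes, patients

# Within each gender, Drug A has the higher success rate.
for gender, drugs in trial.items():
    for drug, (s, n) in drugs.items():
        print(f"{gender:<7}{drug}: {s}/{n} = {success_rate(s, n):.0%}")

# With the genders combined, Drug B has the higher success rate.
for drug in ("Drug A", "Drug B"):
    s, n = combined(trial, drug)
    print(f"Combined {drug}: {s}/{n} = {success_rate(s, n):.1%}")
```

The reversal happens because the group sizes are very unbalanced: most Drug A patients are in the group with the lower success rates, and most Drug B patients are in the group with the higher ones.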
Can data answer every possible question?