Statistics - Null Hypothesis Significance Testing (NHST)

About

NHST is a procedure for testing the Null Hypothesis.

It's a binary decision: either reject the null hypothesis or retain it.

Before starting any experimentation (i.e. test), two hypotheses are set up: the null hypothesis (<math>H_0</math>) and the alternative hypothesis (<math>H_1</math>).

The game of NHST is to start with the assumption that the Null hypothesis is true.

During the experiment, we compare what we observed with what we would expect if only chance were at work.

The “difference” is measured by a test statistic.

The p-value gives the probability of observing such a difference (or a more extreme one) when the null hypothesis is true, i.e. when the difference is due to chance alone.

Examples

Direction

A test can be directional (one-tailed) or non-directional (two-tailed).

Example: set-up of a non-directional test for a regression coefficient.
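A minimal formulation (using <math>\beta</math> for the regression slope, a notation assumed here rather than taken from this page): the null hypothesis states that the slope is zero, the alternative that it differs from zero in either direction.

<MATH>H_0: \beta = 0 \qquad H_1: \beta \neq 0</MATH>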

Lifecycle

Steps:

1. Assume that the null hypothesis <math>H_0</math> is true.
2. Collect the data <math>D</math> during the experiment.
3. Compute the probability of obtaining the observed data (or data more extreme) under the null hypothesis, i.e. the p-value:

<MATH>p = P(D | H_0)</MATH>

4. Reject <math>H_0</math> if the p-value falls below the chosen significance level <math>\alpha</math> (conventionally .05); otherwise retain it.
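As an illustration (made-up numbers): with <math>\alpha = .05</math>, a p-value of .03 leads to rejecting <math>H_0</math>, while a p-value of .20 leads to retaining it.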

It's sort of odd and backwards: we very rarely make an assumption that predicts no relationship between two variables, and we rarely set out to look for no relationship.

Example: in multiple regression, the null hypothesis predicts exactly that, i.e. no relationship (a regression coefficient of zero).

Four possible outcomes

Four boxes, four outcomes: the null hypothesis is either true or false, and the experimenter either retains it or rejects it.

                     Experimenter Decision
                     Retain H0                  Reject H0
H0 true              Correct Decision           Type I error (False alarm)
H0 false             Type II error (Miss)       Correct Decision

False Alarm (Type I error): claiming that something works when in fact it doesn't. Sometimes that happens: you might get an initial result that looks good and says it works, but as you do more research, with more representative samples and better assessments, you may find out that it doesn't work.

Miss (Type II error): there really is an effect out there and you just missed it, for any number of reasons: a poor assessment, not enough subjects, or no random, representative sample.

Never draw a conclusion from a single study, because single studies are prone to these errors.

                           Do not reject <math>H_0</math>                  Reject <math>H_0</math>
<math>H_0</math> is true       Correct Decision: <math>1 - \alpha</math>       Type I error: <math>\alpha</math>
<math>H_0</math> is false      Type II error: <math>\beta</math>               Correct Decision: <math>1 - \beta</math> (the power of the test)

Statistics / Equation

Most NHST statistics are essentially ratios: they compare what you observed to what you would expect just due to chance.

The denominator is the standard error: how much sampling error we would get just due to chance.
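For example, the one-sample t statistic (a standard textbook form, with the usual symbols rather than notation from this page) divides the observed difference between the sample mean and the null value by the standard error of the mean:

<MATH>t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}</MATH>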

Problems

Biased by sample size

We'll get a significant result almost all the time if we just obtain a really large sample (big sample size, big N): in the t-value formula above, the standard error shrinks as N grows, so the ratio grows even when the effect stays tiny. A small sketch of this bias follows below.
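A minimal sketch of the sample-size bias (assuming numpy and scipy are available; the effect size and sample sizes are made-up illustrative values): the same tiny underlying effect moves from non-significant to highly significant purely by increasing N.

<code python>
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_effect = 0.05   # tiny deviation from the null value of 0

# Same underlying effect, increasingly large samples
for n in (30, 1000, 100000):
    sample = rng.normal(loc=true_effect, scale=1.0, size=n)
    t, p = stats.ttest_1samp(sample, popmean=0.0)
    print(f"N={n:>6}  t={t:6.2f}  p={p:.4f}")
</code>

With the largest sample, even this negligible effect yields a tiny p-value, which is why the effect size should be reported alongside the test result.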

If NHST results are reported, it is important to also report the effect size.
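One common effect-size measure (the standard definition of Cohen's d, not something specific to this page) expresses the mean difference between two groups in standard-deviation units:

<MATH>d = \frac{\bar{x}_1 - \bar{x}_2}{s_{pooled}}</MATH>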

Arbitrary decision rule

Yokel local test

Error prone

Shady logic

NHST borrows the form of modus tollens, a valid deductive argument:

If p then q
Not q
Therefore, not p

Applied to the null hypothesis, this deterministic version is still valid:

If the null hypothesis is correct, then these data cannot occur
The data have occurred
Therefore, the null hypothesis is false

But a significance test only makes the data unlikely, not impossible, and the probabilistic version is no longer valid:

If the null hypothesis is correct, then these data are highly unlikely
These data have occurred
Therefore, the null hypothesis is highly unlikely

The same reasoning pattern leads to an absurd conclusion:

If a person plays football, then he or she is probably not a professional player
This person is a professional player
Therefore, he or she probably does not play football

Alternative

Alternatives to a null hypothesis significance test include reporting effect sizes (with confidence intervals) and model comparison.