Parametric Tests — the t-test

Hypothesis Testing


In my previous article we went through the whats, hows and whys of hypothesis testing, with a brief introduction to statistical tests and the role they play in helping us determine statistical significance. In this article and the coming few, we'll take a deeper look at statistical tests — the categories they fall into, the tests themselves and which test suits which situation.

As mentioned before, statistical tests are statistical methods that help us reject or not reject our null hypothesis. They’re based on probability distributions and can be one-tailed or two-tailed, depending on the hypotheses that we’ve chosen.

There are other ways in which statistical tests can differ and one of them is based on their assumptions of the probability distribution that the data in question follows.

  • Parametric tests are those statistical tests that assume the data approximately follows a normal distribution, amongst other assumptions (examples include z-test, t-test, ANOVA). Important note — the assumption is that the data of the whole population follows a normal distribution, not the sample data that you’re working with.
  • Nonparametric tests are those statistical tests that don't assume anything about the distribution followed by the data, and hence are also known as distribution free tests (examples include Chi-square, Mann-Whitney U). Many nonparametric tests are based on the ranks of the data points rather than their raw values.

Most common parametric tests have a nonparametric equivalent (for example, the Mann-Whitney U test is the nonparametric counterpart of the two-sample unpaired t-test), which means that for most types of problems there'll be a test in both categories to help you out.


The selection of which set of tests is apt for the problem at hand is not this black and white, though. If your data doesn't follow a normal distribution, nonparametric tests are not necessarily the right pick. The decision depends on other factors such as sample size, the type of data you have, and which measure of central tendency best represents the data. Certain parametric tests can perform well on non-normal data if the sample size is large enough — for example, if your sample size is greater than 20 and your data is not normal, a one-sample t-test will still serve you well. But if the median better represents your data, then you're better off with a nonparametric test.

In this article, we will be looking at parametric tests — particularly the t-test.

Parametric tests are those that assume that the sample data comes from a population that follows a probability distribution — the normal distribution — with a fixed set of parameters.

Common parametric tests are focused on analyzing and comparing the mean or variance of data.

The mean is the most commonly used measure of central tendency to describe data; however, it is also heavily influenced by outliers. Thus it is important to analyze your data and determine whether the mean is the best way to represent it. If yes, then parametric tests are the way to go! If not, and the median better represents your data, then nonparametric tests might be the better option.
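To see how strongly a single outlier can pull the mean while leaving the median almost untouched, here's a quick sketch (the numbers are made up purely for illustration):

import numpy as np

x = np.array([1, 2, 3, 4, 5])
np.mean(x), np.median(x)          # both are 3.0
x_out = np.append(x, 100)         # add one extreme outlier
np.mean(x_out), np.median(x_out)  # the mean jumps to ~19.17, the median only moves to 3.5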

As mentioned above, parametric tests make a few assumptions that the data needs to meet:

  1. Normality — the sample data come from a population that approximately follows a normal distribution
  2. Homogeneity of variance — the samples come from populations with the same variance
  3. Independence — the observations are independent of one another and sampled randomly
  4. No extreme outliers — the sample data don't contain any extreme outliers

Before we get into the different statistical tests, there is one important concept that should be discussed — degrees of freedom.

The degrees of freedom are essentially the number of independent values that can vary in a set of data while estimating statistical parameters.

Let’s say you like to go out every Saturday and you’ve just bought four new outfits. You want to wear a new outfit every weekend of the month. On the first Saturday, all four outfits are unworn, so you can pick any. The next Saturday you can pick from three and the third Saturday you can pick from two. On the last Saturday of the month though, you’re left with only one outfit and you have to wear it whether you want to or not, whereas on the other Saturdays you had a choice.

So basically, you had 4 − 1 = 3 Saturdays of freedom to choose an outfit — your outfit could vary.

That’s the idea behind degrees of freedom.

With respect to numerical values and the mean, the sum of the values must equal the sample size times the mean, i.e. sum = n * mean, where n is the sample size. So if you have a sample size of 20 and a mean of 40, the sum of all the observations in the sample must be 800. The first 19 values can be anything, but the 20th value has to ensure that the total adds up to 800, so it has no freedom to vary. Hence the degrees of freedom are 19.

The general formula is: degrees of freedom = sample size − number of parameters being estimated.
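Here's the same idea in a quick python sketch (arbitrary numbers, purely illustrative): once the mean of a sample of 20 is fixed at 40, only 19 values are free.

import numpy as np

rng = np.random.default_rng(0)
free_values = rng.normal(40, 5, size=19)  # the first 19 values can be anything
forced = 20 * 40 - free_values.sum()      # the 20th must bring the sum to n * mean
sample = np.append(free_values, forced)
sample.mean()                             # 40.0, since only 19 values were free to vary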

If you want to compare the means of two groups then the right tests to choose between are the z-test and the t-test.

One-sample (one-sample z-test or one-sample t-test): one group is a sample and the second group is the population. You're comparing a sample with a standard value from the population, trying to see whether the sample behaves differently from the population or not.

An example of this is the one we discussed in the previous article — the mean age of patients known to visit a dentist is 18, but we hypothesize it could be greater than this. The sample must be randomly selected from the population and the observations must be independent of one another.

Two-sample (two-sample z-test and a two-sample t-test): both groups will be separate samples. As in the case of one-sample tests, both samples must be randomly selected from the population and the observations must be independent of one another.

Two-sample tests are used when there are two groups to compare. For example, comparing the mean money spent on a shopping site between the two sexes: one sample will be female customers and the second sample will be male customers. Since means are being compared, the variable being measured has to be numerical (here, the money spent on the shopping site), while the variable defining the groups is categorical (here, sex).

Important note: don’t confuse one-sample and two-sample with one-tailed and two-tailed! The former is related to the number of samples being compared and the latter with whether your alternate hypothesis is directional. You can have a one-sample two-tailed test.

How do we choose between a z-test and a t-test though? By looking at the sample size and population variance.

  • If the population variance is known and the sample size is large (greater than or equal to 30) — we choose a z-test
  • If the population variance is known and the sample size is small (less than 30) — we can perform either a z-test or a t-test
  • If the population variance is not known and the sample size is small — we choose a t-test
  • If the population variance is not known and the sample size is large — we choose a t-test
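These rules can be condensed into a tiny helper — just a sketch of the decision logic above, not any standard library function:

def choose_test(variance_known, n):
    """Rule-of-thumb choice between a z-test and a t-test."""
    if variance_known:
        return "z-test" if n >= 30 else "z-test or t-test"
    return "t-test"  # population variance unknown: t-test regardless of sample size

choose_test(variance_known=False, n=25)
>> 't-test'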

As mentioned above, the t-test is very similar to the z-test, except that it works well with smaller samples and doesn't require the population variance to be known.

The t-test is based on the t-distribution, which is a bell-shaped curve like the normal distribution, but has heavier tails.

As the sample size increases, the degrees of freedom also increase, and the t-distribution becomes more and more similar to the normal distribution: its tails become lighter and more of its mass concentrates around the mean. Why? We'll find out in a bit.

There are three types of t-tests. Introductions for two have already been given above — one-sample and two-sample. Both of these come under the ‘unpaired t-test’ umbrella, and so the third type of t-test is the ‘paired t-test’.

The concept of paired vs unpaired has to do with the samples. Is the sample the same, or are there two different samples? Are we monitoring a variable in two different groups, or in the same group twice? If the sample is the same, then the t-test should be paired, else unpaired.

For example, let’s say you want to test whether a certain medication increases the level of progesterone in women.

If the data you have is the progesterone levels of a group of women before the medication was consumed and the progesterone levels of the same group of women after the medication was consumed, then you would conduct a paired t-test since the sample is the same.

If the data you have is the progesterone level of two groups of women of different age groups after the medication was consumed, then you would conduct a two-sample unpaired t-test since there are two different samples.

Every statistical test has a test statistic, which is used to calculate the p-value, which in turn determines whether to reject or not reject the null hypothesis. In the case of the t-test, the test statistic is known as the t-statistic. The formula to calculate the t-statistic differs depending on which t-test you're performing, so let's take a closer look at them all.

The code and data used in all the below examples can be found here.

One-sample t-test.

The average height of women in India was recorded to be 158.5cm. Is the average height of women in India today greater than 158.5cm?

To test this hypothesis I asked 25 women their height.

My hypotheses are —

  • H₀: μ = 158.5cm (the average height of women in India is 158.5cm)
  • H₁: μ > 158.5cm (the average height of women in India is greater than 158.5cm)
  • The significance level is 0.05.
  • The sample mean is 162cm and sample standard deviation is 2.4cm.
  • Since the sample size is 25, the degrees of freedom will be 24 (25 − 1).
  • Since I’m comparing a sample mean with a population mean (standard value), this will be a one-sample test.
  • Since my hypothesis has a direction — the average sample height is greater than the average population height — this will be a one-tailed test.

The formula to calculate the t-statistic is:

t = (x̄ − μ) / (s / √n)

where x̄ is the sample mean, μ is the population mean (the standard value being compared against), s is the sample standard deviation and n is the sample size.

So the t-statistic in our case will be

t = (162 − 158.5) / (2.4 / √25) = 3.5 / 0.48 ≈ 7.29

Next we look up the critical value of the t-distribution for alpha = 0.05 and 24 degrees of freedom in a t-distribution table. The critical value for our scenario is 1.711. Our t-statistic is greater than the critical value, so we can reject the null hypothesis and conclude that the mean height of women in India is greater than 158.5cm!

While it is better to calculate the p-value in hypothesis testing to reject or not reject the null hypothesis, there is no simple closed-form formula for the p-value of a t-statistic. When testing manually, you can either work with t-distribution table values or simply compare against the critical value. Otherwise, a calculator or python functions will get you the p-value. Let's see how!
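In fact, the table lookup itself can also be done in python; here's a quick sketch using scipy's t-distribution with this example's numbers:

from scipy import stats

alpha, df = 0.05, 24
stats.t.ppf(1 - alpha, df)  # one-tailed critical value, approximately 1.711
stats.t.sf(7.29, df)        # one-tailed p-value for our t-statistic, vanishingly small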

We’ll start off by reading our csv into a dataframe:

import pandas as pd
data = pd.read_csv("one-sample-t.csv")
data.head()
(output: the first five rows of the dataframe, showing the age and height columns)

We have two columns — age and height. For this one-sample t-test, we only need height since we are comparing the mean height of this sample with the population mean — 158.5cm.

Let’s check the mean and standard deviation of the height column:

data.height.mean()
>> 162.053526834564
data.height.std()
>> 2.4128517286126048

The assumptions of a t-test state that the sample data must come from a normal distribution. We can check whether the height column is normally distributed by using a probability plot (also known as a QQ plot, or quantile-quantile plot). In brief, a probability plot is a graphical method to check if a data set follows a particular distribution: it plots the quantiles of your data against the theoretical quantiles of that distribution. In our case, the data is the height column and the distribution is the normal distribution.

from scipy import stats
import pylab

stats.probplot(data.height, dist="norm", plot=pylab)
pylab.show()
(output: QQ plot of the height column against the normal distribution)

The red line represents the normal distribution and the blue dots represent our height data. The graph confirms that the height column follows a normal distribution, since the data points stick closely to the normal distribution line.
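If you'd like a numeric check to go with the visual one, scipy's Shapiro-Wilk test is a common companion here (an extra check, not part of the original walkthrough):

stats.shapiro(data.height)  # a p-value above 0.05 is consistent with normality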

Now we will perform the one-sample t-test using scipy.stats' ttest_1samp function. We need to pass it our data and the population mean:

stats.ttest_1samp(data.height,popmean=158.5)
>> Ttest_1sampResult(statistic=7.363748862859639, pvalue=1.32483697812078e-07)

The p-value is ridiculously small! So we can reject the null hypothesis. (One caveat: ttest_1samp returns a two-sided p-value by default. Since our test is one-tailed and the t-statistic is positive, the one-tailed p-value is half the reported one, smaller still. Recent versions of scipy also accept an alternative='greater' parameter to get it directly.)

Two-sample t-test.

Is there a difference in the average height of women in India across age groups?

To test this hypothesis I measured the heights of 50 women — 25 women between 27 and 30 years of age (group A), and 25 women between 37 and 40 years of age (group B).

  • My hypotheses are — H₀: μ_A = μ_B (the mean heights of the two groups are equal); H₁: μ_A ≠ μ_B (the mean heights differ).
  • The significance level is 0.05.
  • The sample mean and standard deviation for group A are 162cm and 2.4cm respectively.
  • The sample mean and standard deviation for group B are 158.6cm and 3.4cm respectively.
  • Since I’m comparing the means of two samples, this will be a two-sample test.
  • Since my hypothesis is nondirectional, this will be a two-tailed test.

It was mentioned earlier that parametric tests assume homogeneity of variance, i.e. the variance of both samples should be the same. In the example here, the variance is definitely not the same — the standard deviation of group A is 2.4cm whereas it's 3.4cm for group B. Does this mean we can't perform a two-sample t-test? No, it doesn't! Thankfully, there's a variation of the t-test that allows for unequal variances, called Welch's t-test.

When the variance of both samples is equal, the denominator used in calculating the t-statistic is based on the pooled variance. If the sample sizes of the two groups are different, the formula is:

s_p² = ((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2)

If the sample sizes of the two groups are equal, the formula is simply:

s_p² = (s₁² + s₂²) / 2

It finds the common variance of the two groups to be used in the t-statistic formula. The formula for the t-statistic is:

t = (x̄₁ − x̄₂) / (s_p · √(1/n₁ + 1/n₂))

However, when the variance of the two samples is not equal, each sample's variance enters the denominator separately and the formula to calculate the t-statistic is:

t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂)

Furthermore, the calculation of the degrees of freedom also differs between the two tests. If the variances of both groups in the current example were equal, the degrees of freedom would be 48 (25 + 25 − 2; we subtract 2 because we are estimating two parameters — the mean of each sample).

In the case of Welch’s t-test, the degrees of freedom are fractional, always smaller than the degrees of freedom of Student’s t-test, and frankly a bit complicated to calculate.

Since our variances are not equal, we will be performing Welch’s t-test.

So the t-statistic in our case will be:

t = (162 − 158.6) / √(2.4²/25 + 3.4²/25) = 3.4 / √0.6928 ≈ 4.08

Let’s do this in python too.

import pandas as pd
df_a = pd.read_csv("one-sample-t.csv") # group A
df_b = pd.read_csv("two-sample-t.csv") # group B

Group A is the same csv we used for the one-sample t-test, so we already know its mean and standard deviation. Let’s check the same for Group B.

df_b.height.mean()
>> 158.60704061997612
df_b.height.std()
>> 3.42443022417948

Now we perform the t-test! To perform Welch's t-test we simply need to pass the equal_var parameter as False. By default it is True, so if we were performing Student's t-test we needn't pass it at all.

stats.ttest_ind(df_a.height, df_b.height, equal_var=False)
>> Ttest_indResult(statistic=4.113633648976651, pvalue=0.00017195968508873518)

The p-value is much smaller than 0.05, hence we can reject our null hypothesis.

Paired t-test.

Does nutritional drink xyz increase the height of women?

To test this hypothesis I measured the height of 25 women before they began the course of the nutritional drink and then after they completed the course.

  • My hypotheses are — H₀: μ_after = μ_before (the drink has no effect on height); H₁: μ_after > μ_before (height after the course of the drink is greater).
  • The significance level is 0.05.
  • The sample mean and standard deviation for the women before the drink are 162cm and 2.4cm respectively.
  • The sample mean and standard deviation for the women after the drink are 167cm and 3.4cm respectively.
  • Since the sample size is 25, the degrees of freedom will be 24 (25 − 1).
  • Since I’m comparing the means of the same sample but with an intervention in between, this will be a paired t-test.
  • Since my hypothesis is directional, this will be a one-tailed test.

Fun fact — a paired t-test calculates the differences between the paired observations in the two sets of data (same sample, before and after) and then performs a one-sample t-test on those differences, using the mean and standard deviation of the differences to test whether the mean difference is zero.
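You can verify this equivalence yourself; here's a quick sketch with synthetic data (the arrays below are made up, not the article's csv):

from scipy import stats
import numpy as np

rng = np.random.default_rng(1)
before = rng.normal(162, 2.4, size=25)
after = before + rng.normal(0.5, 1.0, size=25)  # same women, after an intervention

stats.ttest_rel(after, before, alternative='greater')
stats.ttest_1samp(after - before, popmean=0, alternative='greater')
# both calls return the same t-statistic and p-value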

Let’s implement this directly in python:

Read the csv into a dataframe.

import pandas as pd
data = pd.read_csv("paired-t.csv")
data.head()

Use the describe() method to check the mean and standard deviations of both the before and after columns.

data.describe()
(output: summary statistics of the height_before and height_after columns — means of 162cm and 167cm, standard deviations of 2.4cm and 3.4cm)

Perform the paired t-test using scipy.stats' ttest_rel method! We pass 'greater' in the alternative parameter since our alternate hypothesis is that the mean height after the nutritional drink will be greater than the height before the drink. Note that ttest_rel tests whether the mean of the first argument exceeds that of the second, so the after column has to be passed first:

stats.ttest_rel(data.height_after, data.height_before, alternative='greater')
>> Ttest_relResult(statistic=1.9094338173992416, pvalue=0.0341155471994887)

The p-value is smaller than 0.05, hence we can reject our null hypothesis and conclude that the drink does increase height.

Now that we've seen all the types of t-tests and their formulae for calculating the t-statistic, we can understand why the t-distribution becomes similar to the normal distribution as the sample size increases. Every t-test uses the sample's standard deviation/variance, which is only an estimate of the population's variance, since that is unknown. This extra uncertainty in the estimate is exactly what gives the t-distribution its heavier tails. As the sample size (and hence the degrees of freedom) increases, the variance estimate becomes more precise and the extra uncertainty shrinks. Additionally, the larger the sample, the closer it is to being the population itself. Since the population is assumed to be normally distributed, it makes sense that the t-distribution for a larger sample size, with a higher number of degrees of freedom, comes to resemble a normal distribution.
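You can watch this convergence happen by tracking the two-tailed 5% critical value of the t-distribution as the degrees of freedom grow; the normal distribution's corresponding value is 1.960:

from scipy import stats

for df in (5, 10, 30, 100):
    print(df, round(stats.t.ppf(0.975, df), 3))
# 5 2.571
# 10 2.228
# 30 2.042
# 100 1.984, creeping toward the normal distribution's 1.960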

