Fundamental Concepts
Introduction
1. Describe the purpose of inferential statistics.
2. Explain the key difference between descriptive statistics and inferential statistics.
3. Explain random sampling.
4. Why is it important to apply random sampling when collecting the samples from the population?
5. Explain sampling error in inference statistics
6. Explain the terms: parameter and statistic. Provide an example of each.
- A parameter is a characteristic of the target population (e.g. the population mean).
- A statistic is a measure that describes a feature of a sample (e.g. the sample mean); it is used to estimate the corresponding population parameter.
7. Explain the two types of estimates (point and interval estimates) and provide examples.
- A point estimate is a single value that is used to estimate an unknown parameter of a population. For example, the sample mean is the point estimate of the population mean.
- An interval estimate is a range of values that is likely to contain the true value of a population parameter at a given level of confidence. For example, a confidence interval is an interval estimate.
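A minimal sketch of both kinds of estimate, using made-up sample values. It computes the sample mean as a point estimate of the population mean, then a 95% interval around it; the normal critical value 1.96 is used to keep the sketch simple (a t critical value would be more appropriate for such a small sample).

```python
import math
import statistics

# Hypothetical sample, for illustration only
sample = [12.1, 11.8, 12.5, 12.0, 12.3, 11.9, 12.4, 12.2]
n = len(sample)

point_estimate = statistics.mean(sample)  # point estimate of the population mean
s = statistics.stdev(sample)              # sample standard deviation (n-1 denominator)
se = s / math.sqrt(n)                     # standard error of the mean

# 95% interval estimate using the normal critical value 1.96
lower, upper = point_estimate - 1.96 * se, point_estimate + 1.96 * se
print(point_estimate)
print((lower, upper))
```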
8. What does it mean to have an unbiased estimator of the population mean?
Sampling Distribution
1. Explain sampling distribution and the use-case.
- Definition: A sampling distribution is a probability distribution that describes the likelihood of different values a statistic can take based on different samples drawn from the same population.
- Use-case: Provides insight into how a statistic behaves, which helps us make estimates and inferences about the larger population of interest.
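The definition above can be made concrete by simulation: draw many samples from the same population, compute the statistic (here, the sample mean) for each, and look at the distribution of those values. The population below is simulated, purely for illustration.

```python
import random
import statistics

random.seed(0)
# A simulated population: 100,000 values from a Normal(50, 10) distribution
population = [random.gauss(50, 10) for _ in range(100_000)]

# One statistic (the sample mean) per sample of size 30, repeated 2,000 times:
# these values form an empirical sampling distribution of the sample mean
sample_means = [
    statistics.mean(random.sample(population, 30))
    for _ in range(2_000)
]

# The sampling distribution centres near the population mean, with much
# smaller spread than the population itself
print(statistics.mean(sample_means))
print(statistics.stdev(sample_means))
```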
2. Explain sample means and provide the formula to calculate it.
Definition: A sample mean is the average of the observations in a single sample drawn from the population.
Formula: $\bar{X} = \frac{\sum_{i=1}^nX_i}{n}$
- $\bar{X}$: The sample mean.
- $X_i$: The individual observations in the sample.
- $n$: The sample size.
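The formula translates directly into code: sum the observations, then divide by the sample size.

```python
def sample_mean(values):
    """X-bar = (sum of x_i) / n"""
    return sum(values) / len(values)

print(sample_mean([4, 8, 6, 2]))  # → 5.0
```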
3. Can the sample means value vary?
4. Explain sampling distribution of the sample means.
5. Is the sample mean an unbiased estimator of the population mean and what does this observation imply?
6. Under what circumstances does the sampling distribution resemble the population distribution regardless of the sample size?
7. Why do we use the sampling distribution instead of the population distribution?
- Cost & Time constraints: Collecting data from an entire population can be time-consuming, expensive and extremely difficult. Sampling provides a more efficient and cost-effective way to make inferences about the population.
- Non-normality: In many cases, the population distribution is not normal, which makes direct inference on it difficult.
8. Describe the concept of sample variance $(s^2)$, including the formula.
Definition: A measure of the spread or dispersion of a set of sample data.
Formula: $s^2= \frac{\sum_{i=1}^n(x_i-\bar{x})^2}{n-1}$
- $s^2$: The sample variance.
- $n$: The sample size.
- $\bar{x}$: The sample mean.
- $x_i$: Represents each individual value in the sample.
- $\sum$ denotes the sum of the squared differences between each value and the mean.
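The formula written out explicitly, so each term is visible; note the $n-1$ denominator. The data values are arbitrary, chosen for illustration.

```python
def sample_variance(values):
    n = len(values)
    mean = sum(values) / n                               # x-bar
    return sum((x - mean) ** 2 for x in values) / (n - 1)  # squared deviations / (n-1)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data))
```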
9. Describe the concept of sample standard deviation $(s)$, including the formula.
Definition: A measure of how spread out the values in a dataset are from the mean, expressed in the same units as the original data.
Formula: $s = \sqrt{s^2}$
- $s$: The sample standard deviation
- $s^2$: The sample variance
10. In the formulas for $s^2$ and $s$, the denominator is $n-1$. Explain the intuition for the adjustment from subtracting 1 from the sample size $(n)$.
- The purpose is to provide an unbiased estimate of the population variance.
- The sample variance measures squared deviations from the sample mean, but the sample mean is itself only an estimate of the true population mean. Because the sample mean is, by construction, the value that minimises the sum of squared deviations within the sample, deviations measured from it are on average smaller than deviations measured from the true population mean. The sample variance therefore tends to underestimate the population variance, and dividing by $n-1$ instead of $n$ corrects this bias by making the estimate slightly larger.
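A quick simulation makes the bias visible: drawing many small samples from a population with known variance, the $n$-denominator version systematically underestimates the true variance, while the $n-1$ version averages close to it.

```python
import random

random.seed(1)
# Population is Normal(0, sd=2), so the true variance is 4
TRUE_VAR = 4.0
trials, n = 20_000, 5

biased_total, unbiased_total = 0.0, 0.0
for _ in range(trials):
    sample = [random.gauss(0, 2) for _ in range(n)]
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
    biased_total += ss / n                      # divide by n: biased
    unbiased_total += ss / (n - 1)              # divide by n-1: unbiased

print(biased_total / trials)    # noticeably below 4
print(unbiased_total / trials)  # close to 4
```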
Standard Error
1. Describe the concept of standard error $(SE)$, including the formula.
Definition: The sample standard error measures the variability of the sample mean as an estimate of the population mean. It represents the standard deviation of the sampling distribution of the sample mean.
Formula: $SE = \frac{s}{\sqrt{n}}$
- $SE$: Standard error.
- $s$: The sample standard deviation.
- $n$: The sample size.
2. What happens to the standard error when the sample standard deviation increases?
Formula: $SE = \frac{s}{\sqrt{n}}$
The sample standard error increases when the sample standard deviation increases.
3. What happens to the standard error when the sample size increases?
Formula: $SE = \frac{s}{\sqrt{n}}$
The standard error decreases when the sample size increases.
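A numeric check of both behaviours above: SE grows in proportion to $s$ and shrinks as $\sqrt{n}$ grows.

```python
import math

def standard_error(s, n):
    """SE = s / sqrt(n)"""
    return s / math.sqrt(n)

print(standard_error(10, 25))   # 2.0
print(standard_error(20, 25))   # 4.0  (doubling s doubles SE)
print(standard_error(10, 100))  # 1.0  (quadrupling n halves SE)
```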
4. How does the law of large numbers (LLN) apply in terms of reducing the standard error?
5. Explain the difference between the Standard Error $(SE)$ and the Standard Deviation $(s)$.
- The standard deviation describes variability within a single sample.
- The standard error describes the variability of a statistic (e.g. the sample mean) across repeated samples from the population.
6. How should we report the standard error?
Degree of Freedom
1. Explain degrees of freedom in the context of population variance and sample variance.
2. Explain why the sample variance $(s^2)$ calculation uses $n-1$ degrees of freedom.
Formula: $s^2= \frac{\sum_{i=1}^n(x_i-\bar{x})^2}{n-1}$
The sample variance has $n-1$ degrees of freedom because the deviations are measured from the sample mean, which is itself computed from the data. Given the sample mean, the deviations must sum to zero, so once $n-1$ of the observations are known, the last one is fully determined. Only $n-1$ values are free to vary.
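The constraint is easy to check numerically: deviations from the sample mean always sum to zero, so the last deviation is forced by the others. The values below are arbitrary.

```python
values = [3.0, 7.0, 5.0, 9.0]
mean = sum(values) / len(values)          # 6.0
deviations = [x - mean for x in values]   # [-3.0, 1.0, -1.0, 3.0]

print(sum(deviations))                    # 0.0
# Given the first n-1 deviations, the last is fully determined:
print(-sum(deviations[:-1]) == deviations[-1])  # True
```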
3. Provide the formula to calculate degrees of freedom.
$DF = n-p$
where:
- $DF$: Degrees of freedom.
- $n$: The sample size.
- $p$: The number of parameters being estimated.
Central Limit Theorem and Law of Large Numbers
1. Explain Central Limit Theorem (CLT).
2. Why is the CLT crucial for statistical inference?
3. The commonly used rule is that CLT starts to apply when the sample size is around 30 or greater. Provide the intuition behind the sentence.
A population consists of observations that can take on a wide range of values. When the sample size $(n)$ is small, a single extreme value has a large effect on the sample mean, so means computed from different small samples can differ substantially.
As the sample size increases, the effect of any single extreme value shrinks because it is averaged with more observations, and the sample means cluster more tightly around a common value. This reduces the standard error and makes the sampling distribution of the mean increasingly normal; around $n \approx 30$ the normal approximation is usually adequate.
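A simulation sketch of the rule of thumb: draw samples of size 30 from a heavily skewed exponential population (chosen here for illustration, with mean 1) and look at the resulting sample means. Even though the population is skewed, the means centre near the population mean with a much tighter spread.

```python
import random
import statistics

random.seed(42)

def mean_of_sample(n):
    # An Exponential(rate=1) population has mean 1 and is strongly right-skewed
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

means_n30 = [mean_of_sample(30) for _ in range(5_000)]

print(statistics.mean(means_n30))   # near the population mean of 1
print(statistics.stdev(means_n30))  # near 1/sqrt(30), roughly 0.18
```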
4. Explain Law of Large Numbers (LLN).
5. Explain the key difference and similarities between LLN and CLT.
- LLN and CLT are similar in that both describe the behaviour of the sample mean as the sample size increases.
- The key difference is that the CLT describes the approximate shape of the sampling distribution of the sample means, which is normal. The LLN, by contrast, describes the value of the sample mean, which gets closer and closer to the population mean as the sample size becomes large.
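A sketch of the LLN side of the contrast: the running mean of simulated fair-die rolls (expected value 3.5) drifts toward the population mean as the number of rolls grows. The CLT, by comparison, would describe the distribution of such means across many repetitions.

```python
import random

random.seed(7)

# Track the running sample mean of fair six-sided die rolls at a few checkpoints
running_sum, checkpoints = 0, {}
for i in range(1, 100_001):
    running_sum += random.randint(1, 6)
    if i in (10, 1_000, 100_000):
        checkpoints[i] = running_sum / i

print(checkpoints)  # the running mean approaches 3.5 as n increases
```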