Essential topics in statistics
The mean, or arithmetic mean, of a data set is the sum of
all values divided by the total number of values. It’s the most commonly used
measure of central tendency and is often referred to as the “average.”
Measures of central tendency help you find the middle, or the average, of a data set.
The 3 most common measures of central tendency are the mode, median, and mean.
- Mode: the most frequent value.
- Median: the middle number in an ordered data set.
- Mean: the sum of all values divided by the total number of values.
- Variance measures how widely a collection of data is spread out: it is the average squared deviation from the mean.
- Standard deviation is the square root of the variance and shows how much variation (spread, or dispersion) from the mean exists. It indicates a “typical” deviation from the mean.
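- Percentiles: how do you find the 25th and 75th percentiles? One common approach is to split the ordered data at the median and take the median of each half. For 1, 3, 3, 4, 5, 6, 6, 7, 8, 8:
- The 25th percentile = 3.
- The 50th percentile = 5.5.
- The 75th percentile = 7.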

What does it mean to be in the 25th percentile for weight?

If your child is in the 25th percentile for weight, this means he's heavier than 25 percent of boys his age and lighter than 75 percent of boys his age.
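
The measures listed above can be checked with a short sketch that uses Python's standard `statistics` module on the example data set from the text. The helper name `quartiles_by_halves` is made up for illustration, and it assumes the median-of-halves convention for quartiles. Other conventions interpolate between values and can give slightly different percentiles.

```python
import statistics

data = [1, 3, 3, 4, 5, 6, 6, 7, 8, 8]  # the example data set from the text


def quartiles_by_halves(values):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]        # values below the median position
    upper = ordered[(n + 1) // 2:]   # values above the median position
    return (
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
    )


mean = statistics.mean(data)
median = statistics.median(data)
modes = statistics.multimode(data)     # every value tied for most frequent
variance = statistics.pvariance(data)  # population variance (divides by n)
std_dev = statistics.pstdev(data)      # square root of the population variance
q1, q2, q3 = quartiles_by_halves(data)

print("mean:", mean, "median:", median, "modes:", modes)
print("variance:", variance, "standard deviation:", std_dev)
print("25th, 50th, 75th percentiles:", q1, q2, q3)
```

With this convention the quartiles come out as 3, 5.5 and 7, matching the values above. Note that this data set has three modes (3, 6 and 8), since each appears twice.
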
- 95% confidence interval of the mean: a 95% confidence interval (CI) of the mean is a range with an upper and lower bound calculated from a sample. Because the true population mean is unknown, this range describes plausible values for it. If multiple samples were drawn from the same population and a 95% CI calculated for each, about 95% of those intervals would be expected to contain the true population mean.
- A scatter analysis is used when you need to compare two data sets against each other to see if there is a relationship.
- A null hypothesis is a type of conjecture used in statistics that proposes that there is no difference between certain characteristics of a population or data-generating process. The alternative hypothesis proposes that there is a difference.
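
As a rough illustration of the 95% confidence interval of the mean from the list above, here is a minimal sketch. It assumes the normal approximation (critical value of about 1.96), which is only reasonable for fairly large samples. For small samples a critical value from the t distribution would give a wider interval. The function name and the sample values are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the mean of `sample`."""
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error of the mean, using the sample standard deviation (n - 1).
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


# Hypothetical sample, for illustration only.
sample = [1, 3, 3, 4, 5, 6, 6, 7, 8, 8]
low, high = mean_confidence_interval(sample)
print(f"95% CI of the mean: ({low:.2f}, {high:.2f})")
```
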

Student's t test is one of the most commonly used techniques for testing a hypothesis on the basis of a difference between sample means. Explained in layman's terms, the t test yields a probability (the p-value) of seeing a difference at least as large as the one observed if the two populations were in fact the same with respect to the variable tested.

For example, suppose you collected data on the heights of male basketball and football players, and compared the sample means using the t test. A probability of 0.4 would mean that, if the two groups really had the same mean height, a difference this large would turn up about 40% of the time by chance, so you cannot distinguish a group of basketball players from a group of football players by height alone. That's about as far as the t test or any statistical test, for that matter, can take you. If you calculate a probability of 0.05 or less, then you can reject the null hypothesis (that is, you can conclude that the two groups of athletes can be distinguished by height).

To the extent that there is a small probability that you are wrong, you haven't proven a difference, though. There are differences among popular, mathematical, philosophical, legal, and scientific definitions of proof. I will argue that there is no such thing as scientific proof. Please see my essay on that subject. Don't make the error of reporting your results as proof (or disproof) of a hypothesis. No experiment is perfect, and proof in the strictest sense requires perfection.

We use a chi-square test for independence when we want to formally test whether or not there is a statistically significant association between two categorical variables.

The hypotheses of the test are as follows:
- Null hypothesis (H0): There is no association between the two variables.
- Alternative hypothesis (Ha): There is an association between the two variables.

Examples

Here is an example of when we might use a chi-square test for independence:

Example 1: We want to know if there is a statistically significant association between gender (male, female) and political party preference (Republican, Democrat, Independent). To test this, we might survey 100 random people and record their gender and political party preference. Then, we can conduct a chi-square test for independence to determine if there is a statistically significant association between gender and political party preference.
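
To make Example 1 concrete, the sketch below computes the chi-square statistic for a 2 x 3 table of counts. The survey counts are invented for illustration and are not real data. The p-value line relies on the fact that, for exactly 2 degrees of freedom, the chi-square tail probability is exp(-x / 2). Tables of other sizes would need a chi-square distribution function from a statistics library.

```python
import math

# Hypothetical survey counts for 100 people (not real data).
# Rows: male, female. Columns: Republican, Democrat, Independent.
observed = [
    [20, 20, 10],
    [15, 25, 10],
]


def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (count - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


statistic, dof = chi_square_independence(observed)
# Closed form valid only when dof == 2; otherwise no p-value is computed here.
p_value = math.exp(-statistic / 2) if dof == 2 else None
print(f"chi-square = {statistic:.3f}, dof = {dof}, p = {p_value}")
```

For these made-up counts the statistic is about 1.27 with 2 degrees of freedom, giving a p-value of about 0.53. That is well above 0.05, so the null hypothesis of no association would not be rejected.
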

We use a t-test for a difference in means when we want to formally test whether or not there is a statistically significant difference between two population means.

The hypotheses of the test are as follows:
- Null hypothesis (H0): The two population means are equal.
- Alternative hypothesis (Ha): The two population means are not equal.

Note: It’s possible to test whether one population mean is greater or less than the other, but the most common null hypothesis is that both means are equal.

Examples

Here is an example of when we might use a t-test for a difference in means:

Example 1: We want to know if diet A or diet B leads to greater weight loss. We randomly assign 100 people to follow diet A for two months and another 100 people to follow diet B for two months. We can conduct a t-test for a difference in means to determine if there is a statistically significant difference in average weight loss between the two groups.
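
The test statistic behind Example 1 can be sketched as follows. This uses Welch's form of the t-test, which does not assume equal variances in the two groups. That is a choice made for this sketch, not something the text specifies. The weight-loss figures and the function name `welch_t` are hypothetical. Turning the statistic into a p-value requires the t distribution, for instance from a t table, or from SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` if SciPy is available.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for two samples."""
    # Squared standard error of each sample mean.
    se2_a = statistics.variance(sample_a) / len(sample_a)
    se2_b = statistics.variance(sample_b) / len(sample_b)
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    t = diff / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, dof


# Hypothetical weight loss in kg for a few people on each diet.
diet_a = [2.0, 4.0, 6.0, 8.0]
diet_b = [1.0, 2.0, 3.0, 4.0]
t, dof = welch_t(diet_a, diet_b)
print(f"t = {t:.3f}, degrees of freedom = {dof:.2f}")
```
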
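- Difference between Z-test and t-test: a Z-test is used when the sample size is large (n > 50 as a rule of thumb here, though many textbooks use n > 30) or the population variance is known. A t-test is used when the sample size is small (n < 50) and the population variance is unknown. There is no universal constant at which the sample size is generally considered large enough to justify use of the plug-in test, that is, a Z-test that substitutes the sample variance for the unknown population variance.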
- One-way analysis of variance (ANOVA) tells you if there are any statistically significant differences between the means of three or more independent groups. One-way means the analysis of variance has one independent variable; two-way means the test has two independent variables. For example, the independent variable might be the brand of a drink (one-way), or the independent variables might be the brand of a drink and either how many calories it has or whether it’s original or diet (two-way).
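
The one-way ANOVA described above compares variation between group means with variation inside the groups. A minimal sketch of the F statistic follows. The taste scores for three drink brands are invented, and the function name `one_way_anova_f` is an assumption of this example. A p-value would additionally need the F distribution, which is not part of the standard library.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2
        for group in groups
    )
    # Variation of each value around its own group mean.
    ss_within = sum(
        (x - statistics.mean(group)) ** 2 for group in groups for x in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical taste scores for three drink brands.
brands = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = one_way_anova_f(brands)
print(f"F = {f_stat:.2f} with ({df_between}, {df_within}) degrees of freedom")
```

A large F means the group means differ by more than the scatter within groups would suggest.
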

Sampling is a technique of selecting individual members or a subset of the population to make statistical inferences from them and estimate characteristics of the whole population. Different sampling methods are widely used by researchers in market research so that they do not need to research the entire population to collect actionable insights.
- Probability sampling: a sampling technique where the researcher sets a few selection criteria and chooses members of the population randomly. With this selection procedure, every member has a known chance (in the simplest designs, an equal chance) of being part of the sample.
- Non-probability sampling: in this sampling, the researcher chooses members based on judgment or convenience rather than at random. There is no fixed or predefined selection process, so not all elements of the population have an equal opportunity to be included in the sample.
- Simple random sampling: one of the best probability sampling techniques for saving time and resources is simple random sampling. It is a reliable method of obtaining information in which every member of the sample is chosen randomly, merely by chance. Each individual has the same probability of being chosen to be a part of the sample.
For example, in an organization of 500 employees, if the HR team decides to conduct team-building activities, they might well pick chits out of a bowl. In this case, each of the 500 employees has an equal opportunity of being selected.
- Cluster sampling: a method where the researchers divide the entire population into sections, or clusters, that represent the population. Clusters are identified and included in a sample based on demographic parameters like age, sex, location, etc. This makes it very simple for a survey creator to derive effective inferences from the feedback.
For example, if the United States government wishes to evaluate the number of immigrants living in the country, it can divide the country into clusters based on states such as California, Texas, Florida, Massachusetts, Colorado, Hawaii, etc. This way of conducting a survey will be more effective as the results will be organized by state and provide insightful immigration data.
- Systematic sampling: researchers use this method to choose the sample members of a population at regular intervals. It requires selecting a starting point and a sampling interval, which is then repeated through the population. Because the interval is predefined, this is among the least time-consuming sampling techniques.
For example, a researcher intends to collect a systematic sample of 500 people from a population of 5000. He or she numbers each element of the population from 1 to 5000 and chooses every 10th individual to be a part of the sample (total population / sample size = 5000 / 500 = 10).
- Stratified random sampling: a method in which the researcher divides the population into smaller groups (strata) that don’t overlap but together represent the entire population. The researcher then draws a sample from each group separately.
For example, a researcher looking to analyze the characteristics of people belonging to different annual income divisions will create strata (groups) according to annual family income, for example less than $20,000, $21,000 to $30,000, $31,000 to $40,000, $41,000 to $50,000, etc. By doing this, the researcher can draw conclusions about the characteristics of people belonging to different income groups. Marketers can analyze which income groups to target and which ones to eliminate to create a roadmap that would bear fruitful results.
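
The four probability sampling methods above can be sketched with Python's standard `random` module. The population of 5000 numbered members and the sample size of 500 follow the systematic sampling example in the text. The strata boundaries, the cluster size of 100, and the choice of 5 clusters are assumptions made purely for illustration. The cluster step shows the common one-stage form, where whole clusters are picked at random.

```python
import random

rng = random.Random(42)            # fixed seed so the sketch is repeatable
population = list(range(1, 5001))  # members numbered 1..5000, as in the text
sample_size = 500

# Simple random sampling: every member has the same chance of selection.
simple_sample = rng.sample(population, sample_size)

# Systematic sampling: random start, then every k-th member (k = 5000 / 500).
k = len(population) // sample_size
start = rng.randrange(k)
systematic_sample = population[start::k]

# Stratified sampling: sample each non-overlapping stratum separately,
# in proportion to its size. These strata are hypothetical stand-ins
# for income bands.
strata = {
    "low": population[:2000],
    "middle": population[2000:4000],
    "high": population[4000:],
}
stratified_sample = []
for members in strata.values():
    share = round(sample_size * len(members) / len(population))
    stratified_sample.extend(rng.sample(members, share))

# Cluster sampling (one-stage): split the population into clusters,
# then keep every member of a few randomly chosen clusters.
clusters = [population[i:i + 100] for i in range(0, len(population), 100)]
chosen_clusters = rng.sample(clusters, 5)
cluster_sample = [member for cluster in chosen_clusters for member in cluster]

print(len(simple_sample), len(systematic_sample),
      len(stratified_sample), len(cluster_sample))
```
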
Uses of probability sampling
There are multiple uses of probability sampling:
- Reduce sample bias: with the probability sampling method, the bias in the sample derived from a population is minimized. The selection of the sample mainly reflects the understanding and the inference of the researcher. Probability sampling leads to higher-quality data collection as the sample appropriately represents the population.
- Diverse population: when the population is vast and diverse, it is essential to have adequate representation so that the data is not skewed towards one demographic. For example, if Square would like to understand the people who could use their point-of-sale devices, a survey of a sample of people across the US from different industries and socio-economic backgrounds helps.
- Create an accurate sample: probability sampling helps researchers plan and create an accurate sample. This helps to obtain well-defined data.
Types of non-probability sampling with examples
The non-probability method is a sampling method that collects feedback based on a researcher’s or statistician’s sample selection capabilities and not on a fixed selection process. In most situations, the output of a survey conducted with a non-probability sample is skewed and may not represent the desired target population. But there are situations, such as the preliminary stages of research or cost constraints on conducting research, where non-probability sampling will be much more useful than the other type.
Four types of non-probability sampling explain the purpose of this sampling method in a better manner:
- Convenience sampling: this method depends on the ease of access to subjects, such as surveying customers at a mall or passers-by on a busy street. It is termed convenience sampling because of the ease with which the researcher can carry it out and get in touch with the subjects. Researchers have almost no control over which elements end up in the sample; selection is based purely on proximity, not representativeness. This non-probability sampling method is used when there are time and cost limitations in collecting feedback, for instance in the initial stages of research.
For example, startups and NGOs usually conduct convenience sampling at a mall to distribute leaflets for upcoming events or to promote a cause. They do that by standing at the mall entrance and giving out pamphlets to passers-by.
- Judgmental or purposive sampling: judgmental or purposive samples are formed at the discretion of the researcher, who considers purely the purpose of the study along with an understanding of the target audience. For instance, suppose researchers want to understand the thought process of people interested in studying for their master’s degree. The selection criterion will be “Are you interested in doing your masters in …?” and those who respond with a “No” are excluded from the sample.
- Snowball sampling: a sampling method that researchers apply when the subjects are difficult to trace. For example, it will be extremely challenging to survey shelterless people or illegal immigrants. In such cases, using the snowball theory, researchers can track down a few initial subjects to interview and ask them to refer others, so the sample grows as it goes. Researchers also use this method in situations where the topic is highly sensitive and not openly discussed, for example surveys to gather information about HIV/AIDS. Not many of those affected will readily respond to the questions. Still, researchers can contact people they might know, or volunteers associated with the cause, to get in touch with them and collect information.
- Quota sampling: in quota sampling, members are selected based on a pre-set standard. Because the sample is formed around specific attributes, it is built to show the same attributes, in the same proportions, as the total population. It is a rapid method of collecting samples.
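
Of the four methods above, quota sampling is the easiest to express as a procedure: respondents are accepted in the order they arrive until each group's quota is full. The sketch below is one possible reading of that idea. The function `quota_sample`, the age bands, and the quota sizes are all hypothetical.

```python
def quota_sample(respondents, quotas, key):
    """Accept respondents in arrival order until every quota is filled."""
    remaining = dict(quotas)
    accepted = []
    for person in respondents:
        group = key(person)
        if remaining.get(group, 0) > 0:
            accepted.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas are full
    return accepted


# Hypothetical stream of respondents: (id, age band).
respondents = [(i, "under_30" if i % 3 else "30_plus") for i in range(1, 31)]
quotas = {"under_30": 4, "30_plus": 4}  # the pre-set standard for the sample
sample = quota_sample(respondents, quotas, key=lambda person: person[1])
print(sample)
```

Nothing here is random: who gets in depends on who shows up first, which is why this counts as a non-probability method.
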

Difference between probability sampling and non-probability sampling methods
We have looked at the different types of sampling methods above and their subtypes. To encapsulate the whole discussion, though, the significant differences between probability sampling methods and non-probability sampling methods are as below:
| | Probability Sampling Methods | Non-Probability Sampling Methods |
|---|---|---|
| Definition | Probability sampling is a sampling technique in which samples from a larger population are chosen using a method based on the theory of probability. | Non-probability sampling is a sampling technique in which the researcher selects samples based on the researcher’s subjective judgment rather than random selection. |
| Alternatively known as | Random sampling method. | Non-random sampling method. |
| Population selection | The sample is selected from the population randomly. | The sample is selected from the population arbitrarily. |
| Nature | The research is conclusive. | The research is exploratory. |
| Sample | Since there is a method for deciding the sample, the population demographics are conclusively represented. | Since the sampling method is arbitrary, the population demographics representation is almost always skewed. |
| Time taken | Takes longer to conduct since the research design defines the selection parameters before the market research study begins. | This type of sampling method is quick since neither the sample nor its selection criteria are defined in advance. |
| Results | Selection is unbiased by design, so the results are less biased and more conclusive. | Selection is prone to bias, so the results are biased too, rendering the research speculative. |
| Hypothesis | In probability sampling, there is an underlying hypothesis before the study begins and the objective of this method is to test the hypothesis. | In non-probability sampling, the hypothesis is derived after conducting the research study. |