Essential topics of statistics
The mean, or arithmetic mean, of a data set is the sum of all values divided by the total number of values. It’s the most commonly used measure of central tendency and is often referred to as the “average.”
Measures of central tendency help you find the middle, or the average, of a data set. The 3 most common measures of central tendency are the mode, median, and mean.
- Mode: the most frequent value.
- Median: the middle number in an ordered data set.
- Mean: the sum of all values divided by the total number of values.
- Variance measures how widely a collection of data is spread out around its mean. It is the average of the squared deviations from the mean.
- Standard deviation is a measure that shows how much variation (spread, or dispersion) from the mean exists. It is the square root of the variance and indicates a “typical” deviation from the mean.
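As a minimal sketch of the measures above, Python's standard `statistics` module can compute each one. The data set is hypothetical and chosen only so the results come out to round numbers. `pvariance` and `pstdev` treat the data as a whole population; `variance` and `stdev` would be the sample versions.

```python
from statistics import mean, median, mode, pstdev, pvariance

# Hypothetical data set, not taken from the text above.
data = [2, 4, 4, 4, 5, 5, 7, 9]

center_mean = mean(data)           # sum of all values / number of values
center_median = median(data)       # middle of the ordered data (average of the two middle values here)
center_mode = mode(data)           # most frequent value
spread_variance = pvariance(data)  # average squared deviation from the mean
spread_std = pstdev(data)          # square root of the variance

print(center_mean, center_median, center_mode, spread_variance, spread_std)
```

For this data the mean is 5, the median 4.5, the mode 4, the population variance 4, and the population standard deviation 2.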
- How do you find the 25th and 75th percentiles? For the ordered data set 1, 3, 3, 4, 5, 6, 6, 7, 8, 8, split the data at the median and take the median of each half:
- The 25th percentile = 3.
- The 50th percentile = 5.5.
- The 75th percentile = 7.
· What does it mean to be in the 25th percentile for weight?
· If your child is in the 25th percentile for weight, this means he's heavier than 25 percent of boys his age, and less heavy than 75 percent of boys his age.
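The percentile values quoted for 1, 3, 3, 4, 5, 6, 6, 7, 8, 8 can be reproduced with a median-of-halves rule. This is a sketch under that assumption; the helper name `quartiles_by_halves` is made up for illustration, and other percentile conventions (for example linear interpolation) can give slightly different answers on the same data.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return the (25th, 50th, 75th) percentiles using the median-of-halves rule."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values")
    half = len(ordered) // 2  # size of each half; the middle value is left out when the count is odd
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return median(lower_half), median(ordered), median(upper_half)

q1, q2, q3 = quartiles_by_halves([1, 3, 3, 4, 5, 6, 6, 7, 8, 8])
print(q1, q2, q3)  # 3 5.5 7
```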
- 95% confidence interval of the mean
- A 95% confidence interval (CI) of the mean is a range with an upper and lower number calculated from a sample. Because the true population mean is unknown, this range describes possible values that the mean could be. If multiple samples were drawn from the same population and a 95% CI calculated for each sample, about 95% of those intervals would be expected to contain the true population mean.
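One way to compute such an interval is sketched below, using only the standard library. It relies on a normal approximation (a z critical value of about 1.96 for 95%), which is an assumption that suits larger samples; for small samples a t critical value would give a wider interval. The function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the mean of a sample."""
    n = len(sample)
    center = mean(sample)
    standard_error = stdev(sample) / sqrt(n)          # sample standard deviation / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)    # about 1.96 when confidence is 0.95
    margin = z * standard_error
    return center - margin, center + margin

# Hypothetical sample values.
lower, upper = mean_confidence_interval([2, 4, 4, 4, 5, 5, 7, 9])
print(round(lower, 2), round(upper, 2))
```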
- A Scatter Analysis is used when you need to compare two data sets against each other to see if there is a relationship.
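The text does not name a statistic for judging the relationship, so as an illustrative choice, the Pearson correlation coefficient is a common way to put a number on what a scatter plot shows. The sketch below computes it by hand; the function name and the paired values are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson_correlation(xs, ys):
    """Pearson correlation: +1 is a perfect rising line, -1 a perfect falling line, 0 no linear relationship."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance_sum / sqrt(spread_x * spread_y)

# Hypothetical paired measurements.
r = pearson_correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 3))
```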
- A null hypothesis is a type of conjecture used in statistics that proposes that there is no difference between certain characteristics of a population or data-generating process. The alternative hypothesis proposes that there is a difference.
· Student's t test is one of the most commonly used techniques for testing a hypothesis on the basis of a difference between sample means. Explained in layman's terms, the t test gives the probability of seeing a difference at least as large as the one observed if the two populations were actually the same with respect to the variable tested.
· For example, suppose you collected data on the heights of male basketball and football players, and compared the sample means using the t test. A probability of 0.4 would mean that, if the two groups really had the same mean height, a difference at least as large as the one you observed would turn up about 40% of the time by chance alone, so you have no good grounds for distinguishing a group of basketball players from a group of football players by height alone. That's about as far as the t test, or any statistical test for that matter, can take you. If you calculate a probability of 0.05 or less, then by the usual convention you can reject the null hypothesis (that is, you can conclude that the two groups of athletes can be distinguished by height).
· To the extent that there is a small probability that you are wrong, you haven't proven a difference, though. There are differences among popular, mathematical, philosophical, legal, and scientific definitions of proof. I will argue that there is no such thing as scientific proof. Please see my essay on that subject. Don't make the error of reporting your results as proof (or disproof) of a hypothesis. No experiment is perfect, and proof in the strictest sense requires perfection.
· We use a chi-square test for independence when we want to formally test whether or not there is a statistically significant association between two categorical variables.
· The hypotheses of the test are as follows:
· Null hypothesis (H0): There is no significant association between the two variables.
· Alternative hypothesis (Ha): There is a significant association between the two variables.
· Examples
· Here are some examples of when we might use a chi-square test for independence:
· Example 1: We want to know if there is a statistically significant association between gender (male, female) and political party preference (Republican, Democrat, Independent). To test this, we might survey 100 random people and record their gender and political party preference. Then, we can conduct a chi-square test for independence to determine if there is a statistically significant association between gender and political party preference.
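A minimal sketch of the calculation behind that example follows. The survey counts are invented for illustration, and the function name is hypothetical. It returns the chi-square statistic and the degrees of freedom; turning those into a p-value needs a chi-square distribution table or a statistics library, which is left out here.

```python
def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(column_totals) - 1)
    return statistic, degrees_of_freedom

# Hypothetical counts: rows are male/female, columns are Republican/Democrat/Independent.
observed_counts = [[20, 20, 10],
                   [15, 25, 10]]
statistic, degrees_of_freedom = chi_square_independence(observed_counts)
print(round(statistic, 3), degrees_of_freedom)
```

With 2 degrees of freedom the critical value at the 0.05 level is about 5.991, so a statistic below that would not lead us to reject the null hypothesis of no association.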
· We use a t-test for a difference in means when we want to formally test whether or not there is a statistically significant difference between two population means.
· The hypotheses of the test are as follows:
· Null hypothesis (H0): The two population means are equal.
· Alternative hypothesis (Ha): The two population means are not equal.
· Note: It’s possible to test whether one population mean is greater or less than the other, but the most common null hypothesis is that both means are equal.
· Examples
· Here are some examples of when we might use a t-test for a difference in means:
· Example 1: We want to know if diet A or diet B leads to greater weight loss. We randomly assign 100 people to follow diet A for two months and another 100 people to follow diet B for two months. We can conduct a t-test for a difference in means to determine if there is a statistically significant difference in average weight loss between the two groups.
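The test statistic for a comparison like this can be sketched as below. This version is the pooled-variance (Student's) form, which assumes both groups have roughly equal variances; that assumption, the function name, and the tiny weight-loss figures are all illustrative and not from the text. Converting the statistic to a p-value requires the t distribution, which the standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(sample_a, sample_b):
    """Two-sample t statistic with pooled variance, plus its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    degrees_of_freedom = n_a + n_b - 2
    pooled_variance = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / degrees_of_freedom
    standard_error = sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / standard_error
    return t, degrees_of_freedom

# Hypothetical weight loss in kg for a few people on each diet.
diet_a = [2, 4, 6]
diet_b = [1, 2, 3]
t, degrees_of_freedom = pooled_t_statistic(diet_a, diet_b)
print(round(t, 3), degrees_of_freedom)
```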
- Difference between Z-test and t-test: A Z-test is used when the sample size is large (n > 50 is one rule of thumb) or the population variance is known. A t-test is used when the sample size is small (n < 50) and the population variance is unknown. There is no universal cutoff at which the sample size is considered large enough to justify the plug-in test, that is, a Z-test that substitutes the sample variance for the unknown population variance.
- One-way analysis of variance (ANOVA) tells you if there are any statistically significant differences between the means of three or more independent groups. One-way means the analysis of variance has one independent variable. Two-way means the test has two independent variables. For example, the independent variable might be the brand of a drink (one-way), or the independent variables might be the brand of a drink together with how many calories it has or whether it’s original or diet (two-way).
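The one-way case boils down to an F statistic: the variation between group means divided by the variation within groups. The sketch below computes it by hand; the function name and the three groups of ratings are hypothetical. As with the other tests, a p-value would come from the F distribution, which is not computed here.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic for one-way ANOVA, with between-group and within-group degrees of freedom."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(group) * (mean(group) - grand_mean) ** 2 for group in groups)
    # Variation of each value around its own group mean.
    ss_within = sum((value - mean(group)) ** 2 for group in groups for value in group)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within

# Hypothetical taste ratings for three brands of drink.
brand_ratings = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_statistic, df_between, df_within = one_way_anova_f(brand_ratings)
print(round(f_statistic, 2), df_between, df_within)
```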
· Sampling is a technique of selecting individual members or a subset of the population to make statistical inferences from them and estimate characteristics of the whole population. Different sampling methods are widely used by researchers in market research so that they do not need to research the entire population to collect actionable insights.
- Probability sampling: It is a sampling technique where a researcher sets a few selection criteria and chooses members of a population randomly. All the members have an equal opportunity to be a part of the sample with this selection parameter.
- Non-probability sampling: In this sampling, the researcher chooses members for research based on judgment or convenience rather than at random. This sampling method is not a fixed or predefined selection process. This makes it difficult for all elements of a population to have equal opportunities to be included in a sample.
- Simple random sampling: One of the best probability sampling techniques for saving time and resources is the simple random sampling method. It is a reliable method of obtaining information where every single member of a population is chosen randomly, merely by chance. Each individual has the same probability of being chosen to be a part of a sample.
For example, in an organization of 500 employees, if the HR team decides on conducting team building activities, it is highly likely that they would prefer picking chits out of a bowl. In this case, each of the 500 employees has an equal opportunity of being selected.
- Cluster sampling: It is a method where the researchers divide the entire population into sections or clusters that represent a population. Clusters are identified and included in a sample based on demographic parameters like age, sex, location, etc. This makes it very simple for a survey creator to derive effective inference from the feedback.
For example, if the United States government wishes to evaluate the number of immigrants living in the Mainland US, they can divide it into clusters based on states such as California, Texas, Florida, Massachusetts, Colorado, Hawaii, etc. This way of conducting a survey will be more effective as the results will be organized into states and provide insightful immigration data.
- Systematic sampling: Researchers use this method to choose the sample members of a population at regular intervals. It requires selecting a starting point and a sampling interval, determined by the sample size, that is repeated across the population. Because the selection rule is predefined, this sampling technique is the least time-consuming.
For example, a researcher intends to collect a systematic sample of 500 people in a population of 5000. He/she numbers each element of the population from 1 to 5000 and chooses every 10th individual to be a part of the sample (total population / sample size = 5000/500 = 10).
- Stratified random sampling: Stratified random sampling is a method in which the researcher divides the population into smaller groups that don’t overlap but represent the entire population. While sampling, the researcher organizes these groups and then draws a sample from each group separately.
For example, a researcher looking to analyze the characteristics of people belonging to different annual income divisions will create strata (groups) according to the annual family income, e.g. less than $20,000, $21,000 to $30,000, $31,000 to $40,000, $41,000 to $50,000, etc. By doing this, the researcher can draw conclusions about the characteristics of people belonging to different income groups. Marketers can analyze which income groups to target and which ones to eliminate to create a roadmap that would bear fruitful results.
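The four probability sampling methods above can be sketched in a few lines each. All function names and data here are hypothetical. The cluster version shown is the common one-stage variant (pick whole clusters at random and keep everyone in them), which is an assumption and only one of several ways cluster sampling is done in practice.

```python
import random

def simple_random_sample(population, size, rng):
    """Every member has the same chance of being picked."""
    return rng.sample(population, size)

def systematic_sample(population, size, rng):
    """Pick every k-th member after a random start, with k = population size // sample size."""
    interval = len(population) // size
    start = rng.randrange(interval)
    return population[start::interval][:size]

def stratified_sample(strata, size_per_stratum, rng):
    """Draw a separate simple random sample from each non-overlapping group."""
    return {name: rng.sample(members, size_per_stratum) for name, members in strata.items()}

def cluster_sample(clusters, number_of_clusters, rng):
    """Randomly pick whole clusters and keep every member of the chosen ones."""
    chosen = rng.sample(sorted(clusters), number_of_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(42)           # fixed seed so the sketch is repeatable
employees = list(range(1, 501))   # 500 employee numbers, as in the HR example
people = list(range(1, 5001))     # population of 5000, as in the systematic example

print(simple_random_sample(employees, 5, rng))
print(systematic_sample(people, 500, rng)[:3])  # members are 10 apart
```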
Uses of probability sampling
There are multiple uses of probability sampling:
- Reduce Sample Bias: Using the probability sampling method, the bias in the sample derived from a population is negligible to non-existent. The selection of the sample mainly depicts the understanding and the inference of the researcher. Probability sampling leads to higher quality data collection as the sample appropriately represents the population.
- Diverse Population: When the population is vast and diverse, it is essential to have adequate representation so that the data is not skewed towards one demographic. For example, if Square would like to understand the people that could make use of their point-of-sale devices, a survey conducted from a sample of people across the US from different industries and socio-economic backgrounds helps.
- Create an Accurate Sample: Probability sampling helps the researchers plan and create an accurate sample. This helps to obtain well-defined data.
Types of non-probability sampling with examples
The non-probability method is a sampling method that involves a collection of feedback based on a researcher or statistician’s sample selection capabilities and not on a fixed selection process. In most situations, the output of a survey conducted with a non-probability sample leads to skewed results, which may not represent the desired target population. However, there are situations, such as the preliminary stages of research or cost constraints for conducting research, where non-probability sampling will be much more useful than probability sampling.
Four types of non-probability sampling explain the purpose of this sampling method in a better manner:
- Convenience sampling: This method is dependent on the ease of access to subjects, such as surveying customers at a mall or passers-by on a busy street. It is usually termed convenience sampling because of the researcher’s ease of carrying it out and getting in touch with the subjects. Researchers have nearly no authority to select the sample elements, and it’s purely done based on proximity and not representativeness. This non-probability sampling method is used when there are time and cost limitations in collecting feedback. In situations where there are resource limitations, such as the initial stages of research, convenience sampling is used.
For example, startups and NGOs usually conduct convenience sampling at a mall to distribute leaflets of upcoming events or promotion of a cause. They do that by standing at the mall entrance and giving out pamphlets randomly.
- Judgmental or purposive sampling: Judgmental or purposive samples are formed at the discretion of the researcher. Researchers purely consider the purpose of the study, along with the understanding of the target audience. For instance, suppose researchers want to understand the thought process of people interested in studying for their master’s degree. The selection criteria will be: “Are you interested in doing your masters in …?” and those who respond with a “No” are excluded from the sample.
- Snowball sampling: Snowball sampling is a sampling method that researchers apply when the subjects are difficult to trace. For example, it will be extremely challenging to survey shelterless people or illegal immigrants. In such cases, using the snowball theory, researchers can track a few categories to interview and derive results. Researchers also implement this sampling method in situations where the topic is highly sensitive and not openly discussed, for example, surveys to gather information about HIV/AIDS. Not many victims will readily respond to the questions. Still, researchers can contact people they might know or volunteers associated with the cause to get in touch with the victims and collect information.
- Quota sampling: In quota sampling, the selection of members happens based on a pre-set standard. Because the sample is formed based on specific attributes, the created sample will have the same qualities found in the total population. It is a rapid method of collecting samples.
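Quota sampling in particular lends itself to a short sketch: accept respondents in the order they arrive until each group's pre-set quota is full. The function name, the respondent records, and the quota numbers below are all hypothetical, and real studies may fill quotas in other ways.

```python
def quota_sample(respondents, quotas, group_of):
    """Accept respondents in arrival order until each group's quota is filled."""
    remaining = dict(quotas)  # copy so the caller's quotas are not modified
    selected = []
    for person in respondents:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            selected.append(person)
            remaining[group] -= 1
    return selected

# Hypothetical respondents as (name, group) pairs, in the order they were approached.
respondents = [("a", "f"), ("b", "m"), ("c", "f"), ("d", "f"), ("e", "m")]
sample = quota_sample(respondents, {"f": 2, "m": 1}, group_of=lambda person: person[1])
print(sample)  # [('a', 'f'), ('b', 'm'), ('c', 'f')]
```

Note that nothing here is random: who ends up in the sample depends on who is approached first, which is why quota sampling counts as a non-probability method.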
Difference between probability sampling and non-probability sampling methods
We have looked at the different types of sampling methods above and their subtypes. To encapsulate the whole discussion, though, the significant differences between probability sampling methods and non-probability sampling methods are as below:
| | Probability Sampling Methods | Non-Probability Sampling Methods |
| --- | --- | --- |
| Definition | Probability sampling is a sampling technique in which samples from a larger population are chosen using a method based on the theory of probability. | Non-probability sampling is a sampling technique in which the researcher selects samples based on the researcher’s subjective judgment rather than random selection. |
| Alternatively known as | Random sampling method. | Non-random sampling method. |
| Population selection | The population is selected randomly. | The population is selected arbitrarily. |
| Nature | The research is conclusive. | The research is exploratory. |
| Sample | Since there is a method for deciding the sample, the population demographics are conclusively represented. | Since the sampling method is arbitrary, the population demographics representation is almost always skewed. |
| Time taken | Takes longer to conduct since the research design defines the selection parameters before the market research study begins. | This type of sampling method is quick since neither the sample nor the selection criteria of the sample are defined in advance. |
| Results | This type of sampling is entirely unbiased and hence the results are unbiased too and conclusive. | This type of sampling is entirely biased and hence the results are biased too, rendering the research speculative. |
| Hypothesis | In probability sampling, there is an underlying hypothesis before the study begins and the objective of this method is to test the hypothesis. | In non-probability sampling, the hypothesis is derived after conducting the research study. |