Let’s say we want to figure out some unknown characteristic of the “yellow apple”. We have never seen this kind of apple, so we are very curious about how it tastes. The simplest way to find out is to collect every single yellow apple in the world and taste all of them ourselves.
But is that possible? No, because we cannot collect and eat that many apples. Then how can we learn the taste of the yellow apple? Statistical inference starts from this curiosity: how can we infer unknown characteristics of a population rigorously enough to convince others? The answer is that we can “neatly” pick a sample from the population and use several “scientific” methods to draw a conclusion from that sample. If everyone else agrees that no mistake was made during this procedure, then we can infer that the conclusion also applies to the population.
Now I want to walk through how statistical inference is built up, starting from the simplest pieces.
How to “neatly” pick up the sample
Before getting into the actual inference, we should start with the groundwork. If we cannot take an adequate sample from the population, our whole effort to make statistical inferences about the population will be useless.
In statistics, we use “random sampling” to draw unbiased data from the population. If the sample is biased, the conclusion drawn from it will also be biased, and we cannot make an accurate inference about the population. Only when every member of the population has the same chance of being picked can we reach a representative conclusion.
So if you want a meaningful conclusion, you should pick your data randomly, and pick enough of them, so that the sample satisfies the “unbiased” condition.
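To make the difference between a random and a biased sample concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the population of 10,000 “sweetness scores”, the sample size of 200, and the variable names. It only shows the idea that a simple random sample tends to land close to the population mean, while a hand-picked sample does not.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: sweetness scores of 10,000 yellow apples (made-up data).
population = [random.gauss(6.0, 1.5) for _ in range(10_000)]

# Simple random sample: every apple has the same chance of being picked.
random_sample = random.sample(population, 200)

# A biased "sample": we only pick the 200 sweetest apples.
biased_sample = sorted(population)[-200:]

population_mean = statistics.mean(population)
random_mean = statistics.mean(random_sample)
biased_mean = statistics.mean(biased_sample)

print(f"population mean:    {population_mean:.2f}")
print(f"random sample mean: {random_mean:.2f}")
print(f"biased sample mean: {biased_mean:.2f}")
```

In a real study we would not have the whole population available to check against; it is included here only so the two samples can be compared with the truth.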
“Scientific” methods to draw conclusion
Now that we have prepared a random sample, we can use the data to infer some parameters of the population. The methods of statistical inference can be divided into two branches: estimation and hypothesis testing.
In some cases, we want to know population parameters such as the population mean, the population variance, the difference between two population means, or the ratio of two population variances. From the randomly picked data, we can calculate the sample mean, the sample variance, the difference between two sample means, and the ratio of two sample variances. We call these calculated statistics “estimates” of the corresponding population parameters.
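The four statistics named above can be computed in a few lines. The sketch below uses two simulated samples (the numbers, sample sizes, and variable names are assumptions for illustration only) and Python’s standard `statistics` module, whose `variance` function uses the usual n − 1 denominator for a sample.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Two hypothetical random samples (simulated, not real data).
sample_a = [random.gauss(6.0, 1.5) for _ in range(50)]
sample_b = [random.gauss(5.0, 1.0) for _ in range(50)]

# Point estimates of each population's mean and variance.
mean_a = statistics.mean(sample_a)
mean_b = statistics.mean(sample_b)
var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
var_b = statistics.variance(sample_b)

# Estimates of the difference between means and the ratio of variances.
mean_difference = mean_a - mean_b
variance_ratio = var_a / var_b

print(f"mean of A: {mean_a:.3f}, variance of A: {var_a:.3f}")
print(f"mean of B: {mean_b:.3f}, variance of B: {var_b:.3f}")
print(f"difference of means: {mean_difference:.3f}")
print(f"ratio of variances:  {variance_ratio:.3f}")
```

Each printed number is a single-value (point) estimate; how far it may be from the true population parameter is a separate question that this sketch does not address.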
In other cases, we face problems that require yes-or-no answers. For example, do Korean teenagers watch TV for more than 3 hours per day? The population parameter here is the mean number of hours watched per day, and we can randomly draw 100 Korean teenagers and average their daily TV hours. The process of answering this question using a sample statistic is called “hypothesis testing”. Although I used only the population mean as an example, hypothesis testing can also be applied to the population variance, or even to compare two populations using two samples.
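As a rough sketch of the TV example, the code below tests the null hypothesis “the mean is 3 hours” against the alternative “the mean is more than 3 hours”. Several things here are assumptions and not part of the example above: the data are simulated, the 5% significance level is a common convention, and the test uses a large-sample normal (z) approximation, which is only one of several reasonable choices for a sample of 100.

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical data: daily TV hours of 100 randomly drawn teenagers (simulated).
hours = [max(0.0, random.gauss(3.4, 1.2)) for _ in range(100)]

mu0 = 3.0     # value of the mean under the null hypothesis
alpha = 0.05  # significance level (assumed convention)

n = len(hours)
sample_mean = statistics.mean(hours)
sample_sd = statistics.stdev(hours)  # sample standard deviation

# Test statistic: how many standard errors the sample mean lies above mu0.
z = (sample_mean - mu0) / (sample_sd / math.sqrt(n))

# One-sided p-value under the standard normal approximation.
p_value = 1.0 - NormalDist().cdf(z)

reject_null = p_value < alpha
print(f"sample mean = {sample_mean:.2f}, z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject the null hypothesis" if reject_null else "Fail to reject the null hypothesis")
```

The yes-or-no answer comes from comparing the p-value with the chosen significance level: a small p-value means a sample mean this far above 3 hours would be unlikely if the true mean were 3 hours.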
If you want to see more details on these scientific methods to analyze data, visit our publications!😊