In statistics, a sample is a portion of a population chosen for a study. Samples are usually taken because the population to be researched is too large to examine in full. When a study is planned on a given population, it is far easier to draw a sample and work with it to reach the intended goal, which in most cases is a statistical estimate, for example of opinion on a subject such as voting. The process of collecting data from a sample in order to infer and extrapolate an outcome for the whole population is known as sampling. Effort is always made to avoid bias when taking samples, because the data can be relied upon only to the extent that the sample is unbiased. [1]

Suppose someone wants to conduct a study that pertains to women, and the sample is taken only from women who work in an office environment. The study will miss its target and will not be representative, because much of the wider population of women does not work in an office. There are stay-at-home mothers, women who work in fields that do not involve an office, and women who are unemployed, and none of them can be included in a sample taken from offices. The solution is to make the sample a random or probability sample, which better represents the diverse population of women because those who do not work in an office can also be selected.

When dealing with samples, the first key aspect to pay attention to is “external validity”, which concerns the approximate truth of the conclusion: how well it applies to other subjects in other places, and how well it would hold at a different time. There are two general approaches to generalizing from a research finding. The first is the sampling model, in which, as mentioned above, the researcher identifies a target population, draws a sample from it, and conducts the research on that sample. Most of the time the finding can be relied upon and used to arrive at a conclusion about the population. Even so, the sample may still be labeled biased, because no matter how careful the process has been, some possibility of bias remains. [2]

The other approach, called the Proximal Similarity Model and suggested by Donald Campbell, takes varied samples while keeping in mind the different components of the general subject: other populations, other locations, and other times in the future. People who are not in the group from which the sample is taken are kept in mind so that the finding will have relevance to them too. The same consideration is given to people in locations other than where the research is conducted, and to future use of the finding, so that it still holds some reliability later on.

External validity carries an inherent threat that the final conclusion will be wrong, and it comes from the three components involved: people, place, and time. The validity of a finding can always be disputed for a reason as simple as not having chosen the right kind of people, the right kind of place, or the right time. A bias can be pointed out even after the conclusion is final, which means that although generalizing is possible, the degree of acceptance may come down. The risk can be reduced by taking extra care when drawing the sample and by drawing it randomly. Another way of defending the validity of a finding is to work on proximal similarity, going the extra mile to take samples from various groups and various locations. Samples can also be taken at different times so that the findings are not influenced by a particular incident or event.

Another important aspect of sampling is knowing exactly who the target is, that is, to whom the finding will matter. Some findings are meant to have universal acceptance: if we are talking about people in general, the finding should apply to all humans. If we are talking about a group living in a given area, the scope is narrower, but in most cases a finding that applies in one location should apply elsewhere with a degree of bias we can live with. Still, no matter how careful we are, some members of the group will always be inaccessible for various reasons, which forces the finding to lean on theory. In practice the “sampling frame” from which the sample is drawn is incomplete in most cases because part of the population cannot be reached, yet every effort should be made to minimize the theorizing. Even if things go right for the most part, it might not be possible to eliminate bias completely, simply because something can always go wrong.

Another interesting aspect of sampling is that many different but similar samples can be taken from the same population, so there is no reason in principle to stop at one. Each sample yields its own outcome, and chances are that most of those outcomes will be similar. If the outcomes of many samples are plotted, the graph tends to have a bell shape; an extreme outcome cannot be ruled out, but it is rare. The “parameter” is the value that would be obtained if the whole group or population were surveyed, and the larger the sample, the better the chance of arriving close to it. If surveying everyone were always possible there would be little need for statistics, but it is usually very expensive and time consuming, which is the main reason samples are taken in the first place. A further advantage of a larger sample concerns the “standard deviation” and the “standard error”. The standard deviation describes the spread of values within a sample and does not shrink as the sample grows, whereas the standard error describes the spread of estimates across samples and is roughly the standard deviation divided by the square root of the sample size, so a larger sample leads to a lower standard error. [3]
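
The relationship between sample size and standard error can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population is synthetic (10,000 values drawn from a normal distribution with mean 50 and standard deviation 10), and the function names `sample_means` and `standard_error` are hypothetical, not taken from any source cited here.

```python
import random
import statistics

def sample_means(population, sample_size, num_samples, seed=0):
    """Draw `num_samples` simple random samples and return the mean of each."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]

def standard_error(sample):
    """Estimate the standard error of the mean from a single sample."""
    return statistics.stdev(sample) / len(sample) ** 0.5

# Synthetic population (an assumption made purely for illustration).
pop_rng = random.Random(42)
population = [pop_rng.gauss(50, 10) for _ in range(10_000)]

# The "parameter": the value obtained by surveying the whole population.
parameter = statistics.mean(population)
print(f"population mean (parameter): {parameter:.2f}")

for n in (25, 100, 400):
    means = sample_means(population, n, num_samples=500)
    one_sample = random.Random(n).sample(population, n)
    print(
        f"n={n:4d}  "
        f"spread of 500 sample means={statistics.stdev(means):.3f}  "
        f"SE estimated from one sample={standard_error(one_sample):.3f}  "
        f"SD within that sample={statistics.stdev(one_sample):.2f}"
    )
```

Running it should show the spread of the sample means falling as n grows while the standard deviation within a single sample stays near 10, which is the distinction drawn in the paragraph above.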

In the world of sampling there is “probability sampling”, which uses a random selection method. Some mechanism is needed to carry out a random selection from a given group, and that mechanism has to ensure that every unit or participant has an equal probability of being chosen. “Simple random sampling” can be carried out in a few ways, and its aim is to choose a set number of units from a given group. The fastest way of doing that today is with a computer program, for example an Excel spreadsheet, where the RAND() function generates the random numbers automatically. However, although simple random sampling is fair and correct for the most part, there remains the possibility that the units chosen will not be good representatives of the population, which affects its efficiency and reliability.
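
As a minimal sketch of simple random sampling, the following uses Python's standard `random` module rather than a spreadsheet. The sampling frame of 1,000 labeled people and the function name `simple_random_sample` are assumptions made for illustration.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Choose n distinct units so that every unit is equally likely to be picked."""
    rng = random.Random(seed)
    return rng.sample(units, n)

# Hypothetical sampling frame of 1,000 people.
sampling_frame = [f"person_{i}" for i in range(1, 1001)]

chosen = simple_random_sample(sampling_frame, 50, seed=1)
print(len(chosen), "units chosen, for example:", chosen[:5])
```

Passing a seed makes the draw repeatable, which is convenient for checking a procedure; in a real study the seed would normally be left unset or recorded.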

The other sampling method, which is acknowledged to be more accurate, is “stratified random sampling”. It works by dividing the subjects into identifiable, non-overlapping subgroups and then performing random sampling within each subgroup. The advantage, of course, is that no subgroup will be under-represented, since every subgroup is sampled directly. In addition, the statistical precision is generally higher than that of simple random sampling, and it improves further when the subjects within each subgroup are homogeneous.
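
Stratified random sampling can be sketched as follows. This is one possible arrangement, assuming proportional allocation (the same sampling fraction in every subgroup); the population of women labeled by work situation, the function name `stratified_sample`, and the 10% fraction are all hypothetical choices made for illustration.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Draw the same fraction at random from every subgroup (stratum)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for label in sorted(strata):
        members = strata[label]
        # Take at least one unit so that no subgroup is left out entirely.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: (identifier, work situation).
women = (
    [(f"w{i}", "office") for i in range(600)]
    + [(f"w{i}", "home") for i in range(600, 900)]
    + [(f"w{i}", "unemployed") for i in range(900, 1000)]
)

sample = stratified_sample(women, stratum_of=lambda w: w[1], fraction=0.1, seed=3)
print(Counter(label for _, label in sample))
```

Because each subgroup is sampled separately, the smallest subgroup is guaranteed a place in the sample, which a single simple random draw cannot promise.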

There is also “systematic random sampling”, which is often handier. For it to work, the units of the population are numbered from 1 to N, and the sample size n is decided. The interval size is then k = N/n. An integer between 1 and k is selected at random as the starting point, and every kth unit is taken from there on. The list of units must be in random order with respect to what is being studied. The advantage of this method is that it is easy to carry out, since all that is required is picking a single random number. Furthermore, it can be as precise as simple random sampling, and when things get complicated it might be the only practical way out.
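
The steps of systematic random sampling can be written out as a short sketch. It assumes that N is a whole multiple of n, or that rounding the interval down is acceptable; the function name `systematic_sample` and the numbered list of 100 units are hypothetical.

```python
import random

def systematic_sample(units, n, seed=None):
    """Pick a random start between 1 and k, then take every kth unit."""
    N = len(units)
    k = N // n  # interval size, k = N/n rounded down
    if k < 1:
        raise ValueError("sample size n cannot exceed the number of units N")
    rng = random.Random(seed)
    start = rng.randint(1, k)  # 1-based position of the first unit taken
    positions = range(start, start + k * n, k)
    return [units[pos - 1] for pos in positions]

# Units numbered 1 to N = 100; with n = 20 the interval is k = 5.
units = list(range(1, 101))
sample = systematic_sample(units, 20, seed=11)
print(sample)
```

With N = 100 and n = 20 the interval is 5, so a random start of 4, say, would select units 4, 9, 14 and so on up to 99.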

Another useful method is “cluster random sampling”, which is handy mostly when a vast geographical area is involved and a great deal of distance would otherwise have to be covered. The population is divided into clusters, for example along geographic boundaries, a random sample of those clusters is drawn, and the units within the chosen clusters are measured to arrive at a conclusion. When the methods mentioned above are combined, the process is called “multi-stage sampling”. For example, after cluster sampling has been introduced there may be a need to apply stratified sampling within the clusters in order to do an effective job: first comes the cluster sampling, and stratified sampling is then applied to complete it.
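
Cluster sampling and a simple two-stage variant can be sketched as below. The ten regions of twenty households each are invented for illustration, the function names are hypothetical, and the second stage here uses a plain random draw within each chosen cluster as a simplification, where a real design might use stratified sampling instead.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Choose clusters at random and keep every unit inside the chosen ones."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: list(clusters[name]) for name in chosen}

def two_stage_sample(clusters, num_clusters, per_cluster, seed=None):
    """Stage 1: choose clusters at random. Stage 2: sample units within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {
        name: rng.sample(clusters[name], min(per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical population: 10 regions with 20 households each.
regions = {
    f"region_{r}": [f"r{r}_household_{h}" for h in range(1, 21)]
    for r in range(1, 11)
}

one_stage = cluster_sample(regions, 3, seed=7)
two_stage = two_stage_sample(regions, 3, 5, seed=7)
print(sorted(one_stage), {name: len(v) for name, v in two_stage.items()})
```

Only the chosen regions need to be visited, which is where the saving in travel comes from.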

Sampling that does not involve random selection is known as “non-probability sampling”. It tends to be less reliable because “probability theory” cannot be applied to it, so the result could go either way, and it can be difficult to tell whether the sample represents the population well or not. For that reason researchers for the most part prefer to rely on probabilistic sampling, but there are cases where using non-probability sampling is unavoidable.

Non-probability sampling has two main types, “accidental” and “purposive”. Accidental or haphazard sampling covers cases where, for example, the news media go out on the street and approach anyone to get an opinion on a certain matter, and it is not possible to say that the individual is a fair representative of the population at large. In other cases researchers prefer to use high school or college students for research on certain subjects, simply because they are a concentration of individuals who may be well informed on matters that pertain to them, but in no way can they represent the population at large. In clinical research it has been customary to use clients who are available but who might not be representative of the overall population, or at times volunteers, who in some circumstances are paid. Even if it is possible to get an opinion this way, it may still fall short of representing the overall population.

“Purposive sampling”, on the other hand, is done, as its name states, with a given purpose in mind, and it can be quick. A good example is a big company that wants to know what its customers think about the goods and services it offers. It could hire interviewers to approach customers as they enter or leave the store and ask them certain questions. Here it is possible to make mistakes, because those who visit that store might represent a particular subgroup that happens to be there because of the location or because of the other businesses around it. Likewise, the visitors on a regular weekday could belong to one subgroup and the visitors on a weekend to another. The point is that the subjects might not be the proper targets, and the outcome might not be as reliable as that of a method that uses proper random sampling.

Purposive sampling has different categories. One is “Modal Instance Sampling”, where the target is the “most frequent case” or the “typical” case. When individuals are interviewed, certain characteristics such as age, education, and income level can influence the answers they give. Without knowing those characteristics it is difficult to rely on the “typical” or “modal” findings as representative of the public at large. Religion and ethnicity can also play a role when individuals give quick answers, and the interviewer has only so much leeway to ask about such matters without straying into what is normally considered unacceptable subject matter. Marital status, family size, and occupation could also make a difference. Therefore, any finding made through modal instance sampling could end up being an informal finding.

“Expert sampling” falls in the same category. It involves bringing experts together in the form of a panel and asking them about a modal survey that was made, so that when the result is released the findings have the backing of the experts. That boosts the reliability of the findings, and a problem arises only if the experts themselves are wrong, which is uncommon but possible. It follows that if modal instance sampling has been conducted, it is advisable, to be on the safe side, to back it up with expert sampling.

Another part of “purposive sampling” is “Quota Sampling”, where a predetermined number of subjects from each group has to be approached and that quota is not to be changed for any reason. If the idea is to approach 50 female students and 50 male students in a college, then even if there is a shortage in one group that could be made up from the other, doing so is not allowed. The shortcoming of this method is that the characteristics on which the quota is based have to be determined in advance, and if a quota cannot be filled, the only adjustment available is to lower the numbers, at which point the reliability of the data might suffer. This is proportional quota sampling, in which the quotas are set to mirror the makeup of the population. Non-proportional quota sampling is not concerned with exact numbers as long as there are enough participants from each group for the data to be collected and used to arrive at the desired result. This method gets the credit for making sure that even small groups have a part to play, and in that respect it is similar to stratified sampling.
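
The fixed-quota rule can be sketched in a few lines. The stream of arriving students and the function name `quota_sample` are hypothetical. Note that no random selection is involved: subjects are taken in the order in which they are encountered, which is what makes this a non-probability method.

```python
from collections import Counter

def quota_sample(arrivals, group_of, quotas):
    """Accept subjects in arrival order until each group's quota is filled.

    A shortfall in one group is never made up from another group.
    Returns the accepted subjects and the unfilled part of each quota.
    """
    remaining = dict(quotas)
    selected = []
    for person in arrivals:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            selected.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # every quota is filled
    return selected, remaining

# Hypothetical stream of 300 students: every third one is male.
arrivals = [
    (f"student_{i}", "male" if i % 3 == 0 else "female") for i in range(1, 301)
]

selected, remaining = quota_sample(
    arrivals, group_of=lambda s: s[1], quotas={"female": 50, "male": 50}
)
print(Counter(group for _, group in selected), remaining)
```

If only 30 male students were available, the function would return 30 of them and report an unfilled quota of 20 rather than admitting extra female students.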

Another part of purposive sampling is “heterogeneity” sampling, where the drive is to come up with a wide range of workable or popular ideas. In most instances it involves brainstorming by a group of people to see whether they come up with the desired new ideas. Therefore, there is no concern about average instances, and the number of people does not matter as long as enough of them participate. What matters is that the participants are as heterogeneous as possible, regardless of their number.

One other sampling method that falls in the same category is “Snowball Sampling”, where participants have to meet certain criteria, and once they qualify they are asked to recommend others who also qualify, which works most of the time. This method can be a good way to gain access to groups that are otherwise hard to reach. [4]
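
The referral process behind snowball sampling can be sketched as a walk over a network of contacts. Everything in this example is hypothetical: the names, the `referrals` mapping, the qualification rule, and the function name `snowball_sample`.

```python
from collections import deque

def snowball_sample(seeds, referrals, qualifies, max_size):
    """Start from seed participants and follow the referrals of those who qualify."""
    selected = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(selected) < max_size:
        person = queue.popleft()
        if not qualifies(person):
            continue  # people who do not qualify are not asked for referrals
        selected.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return selected

# Hypothetical referral network: who each person would recommend.
referrals = {
    "ana": ["ben", "cy"],
    "ben": ["dee"],
    "cy": ["ana", "eli"],
    "dee": [],
    "eli": [],
}

participants = snowball_sample(
    seeds=["ana"],
    referrals=referrals,
    qualifies=lambda person: person != "cy",  # assume cy does not meet the criteria
    max_size=10,
)
print(participants)
```

In this small network "eli" is never reached, because the only person who would have recommended them did not qualify, which illustrates how the resulting sample depends on where the referrals start.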

In conclusion, it is difficult to say that one method is the best and meets all requirements. Each method has its own advantages, which researchers have to evaluate in their drive to arrive at a conclusion that is as close to reality as possible. In opinion surveys, however, generalizing is accepted, so that those who use the information will know what the commonly held outlook on a certain matter is and can form their own opinion based on it. There is no scientific guarantee of exactly how precise and close to the truth the findings are. Such information is nevertheless very important to those who make decisions based on it, because the finding might be the only tool available to them, and they may have no choice other than to take a chance.

Surveys make a considerable difference in business decision making because businesses are heavily dependent on feedback, which is why they spend a lot of money on surveys. Other entities such as governments use surveys too, since surveys are often the only means that gives them a good glimpse of what beliefs, opinions, or stands the public holds when they try to implement certain strategies. Hence, sampling is one of the tools statistics relies upon to gather information about certain matters, and the reason a representative sample is used is that it is cost effective.