Statistical Learning Process for the Reduction of Sample Collection Assuring a Desired Level of Confidence

2020 
Machine learning applications today offer a wide range of techniques, such as clustering and data mining, for characterizing a population from collected samples. These techniques are adapted to each industrial or scientific field in order to obtain useful results. Focusing on a fundamental element of statistical learning, this chapter explains in simple terms the use of Student's t-distribution, clarifying the concepts of sampling error and of a convergence criterion within an iterative process for calculating the optimal number of samples. By applying inference based on Student's t-distribution, the chapter seeks a procedure that can be used to decide whether or not to discard a sampling protocol, serving as a starting point until more reliable data become available. In other words, for problem-solving and planning purposes, and starting from a preliminary situation in which simplifications are made, the aim is to estimate the distortions introduced by the measurements so that a reasonable number of samples can be obtained for different values of the sampling error. The convergence criterion of the algorithm for calculating the number of samples is to determine the minimum number of characterizations, reducing cost and effort while meeting the desired confidence level given the measurement error. To this end, the chapter first introduces concepts from probabilistic models and methods and then proposes a mathematical sampling protocol. The new protocol is then validated by simulation in several case studies. Finally, the chapter discusses possible future research in this field and presents some conclusions.
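The abstract does not state the protocol itself, so the following is only a minimal sketch of the textbook t-based sample-size rule, not the chapter's method. It assumes a pilot standard deviation `pilot_std` and a maximum admissible sampling error `max_error` (both hypothetical names) and searches for the smallest n satisfying n >= (t * pilot_std / max_error)^2, where t is the two-sided Student's t quantile with n - 1 degrees of freedom. The quantile is computed numerically with the standard library only.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=1000):
    """CDF via composite Simpson integration of the density over [0, |x|]."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = total * h / 3
    return 0.5 + half if x >= 0 else 0.5 - half


def t_quantile(p, df):
    """Quantile by bisection; assumes 0.5 < p < 1."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # expand the bracket until it contains the quantile
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def required_samples(pilot_std, max_error, confidence=0.95):
    """Smallest n with n >= (t_{n-1} * pilot_std / max_error)^2.

    Illustrative assumption: the pilot standard deviation is treated as fixed.
    """
    p = 1 - (1 - confidence) / 2
    # The normal quantile is below the t quantile, so it gives a lower bound on n.
    z = NormalDist().inv_cdf(p)
    n = max(2, math.ceil((z * pilot_std / max_error) ** 2))
    while n < (t_quantile(p, n - 1) * pilot_std / max_error) ** 2:
        n += 1  # the t quantile shrinks as n grows, so the search terminates
    return n


if __name__ == "__main__":
    print(required_samples(pilot_std=10.0, max_error=5.0, confidence=0.95))
```

Starting the search from the normal-approximation bound keeps the loop short, and stepping n upward avoids the oscillation that a plain fixed-point update of n can show near the solution.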