)% confidence level. To conduct a runs test on a sample, perform the following steps:
Step 1: compute the mean of the sample.
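Step 2: going through the sample sequence, replace each observation with + or - depending on whether it is above or below the mean. Discard any ties.
Step 3: compute R, the number of runs (maximal blocks of consecutive identical signs), and $n_1$ and $n_2$, the numbers of + and - signs.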
Step 4: compute the expected mean and variance of R, as follows:
$\mu = 1 + \frac{2 n_1 n_2}{n_1 + n_2}$.
$\sigma^2 = \frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}$.
Step 5: compute $z = (R - \mu)/\sigma$.
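Step 6: Conclusion:
If $z > Z_\alpha$, there are more runs than expected, so there might be cyclic or seasonal behavior (over-mixing).
If $z < -Z_\alpha$, there are fewer runs than expected, so there might be a trend (under-mixing).
If $z < -Z_{\alpha/2}$ or $z > Z_{\alpha/2}$, reject randomness.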
Note: This test is valid when both $n_1$ and $n_2$ are large, say greater than 10. For small sample sizes, special tables must be used.
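Steps 1 to 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the JavaScript tool referred to on this page: the function names `runs_z` and `runs_test` are hypothetical, n1 is taken to be the number of + signs (the formulas are symmetric in n1 and n2), and the normal approximation is assumed to apply.

```python
import math

def runs_z(runs, n1, n2):
    """Steps 4 and 5: expected mean, standard deviation, and z-score of R."""
    if n1 == 0 or n2 == 0:
        raise ValueError("both signs must occur at least once")
    n = n1 + n2
    mu = 1 + 2 * n1 * n2 / n
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / (n ** 2 * (n - 1))
    sigma = math.sqrt(variance)
    return mu, sigma, (runs - mu) / sigma

def runs_test(sample):
    """Steps 1 to 5 for a raw sample; returns (R, n1, n2, mu, sigma, z)."""
    mean = sum(sample) / len(sample)                  # Step 1
    # Step 2: True stands for '+', False for '-'; ties with the mean are dropped
    signs = [x > mean for x in sample if x != mean]
    n1 = sum(signs)                                   # number of '+' signs
    n2 = len(signs) - n1                              # number of '-' signs
    # Step 3: one run to start with, plus one for every change of sign
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    mu, sigma, z = runs_z(runs, n1, n2)
    return runs, n1, n2, mu, sigma, z

# Hypothetical data: a steadily increasing series yields very few runs,
# so z comes out strongly negative.
print(runs_test(list(range(1, 13))))
```

The z value returned here still has to be compared with the critical values in Step 6; the sketch does not look them up.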
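For example, suppose that for a given sample of size 50 we have $R = 24$, $n_1 = 14$, and $n_2 = 36$. Test for randomness at $\alpha = 0.05$.
Plugging these into the above formulas, we have $\mu = 1 + 2(14)(36)/50 = 21.16$, $\sigma = \sqrt{7.883} = 2.808$, and $z = (24 - 21.16)/2.808 = 1.01$. From the Z-table, $Z_{0.05} = 1.645$ and $Z_{0.025} = 1.96$. Since $z$ lies between $-1.645$ and $1.645$, there is no evidence of a trend or of cyclic behavior, and randomness is not rejected.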
You may use the following JavaScript to Test for Randomness.
Lilliefors' Test for Normality: This test is a special case of the Kolmogorov-Smirnov goodness-of-fit test, developed for testing the normality of a population's distribution. When applying the Lilliefors test, a comparison is made between the standard normal cumulative distribution function and the sample cumulative distribution function of the standardized observations. If there is close agreement between the two cumulative distributions, the hypothesis that the sample was drawn from a population with a normal distribution is supported. If, however, the discrepancy between the two cumulative distribution functions is too great to be attributed to chance alone, then the hypothesis is rejected.
The difference between the two cumulative distribution functions is measured by the statistic D, which is the greatest vertical distance between the two functions.
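The statistic D can be computed along the following lines. This is a sketch under assumptions: the function names are hypothetical, the observations are standardized with the sample mean and the sample standard deviation (divisor n - 1), and only D is computed. Deciding whether D is "too great" requires Lilliefors critical values, which are not reproduced here.

```python
import math
import statistics

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lilliefors_d(sample):
    """Greatest vertical distance between the sample CDF of the
    standardized observations and the standard normal CDF."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)          # sample standard deviation (n - 1)
    z = sorted((x - mean) / sd for x in sample)
    d = 0.0
    for i, zi in enumerate(z, start=1):
        f = normal_cdf(zi)
        # The sample CDF jumps from (i - 1)/n to i/n at zi; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical data, for illustration only.
print(lilliefors_d([4.1, 5.3, 4.8, 6.0, 5.1, 4.4, 5.7, 5.0]))
```

Because the observations are standardized first, D is unchanged if the data are shifted or rescaled.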
You might like to use the well-known Lilliefors' Test for Normality to assess the goodness-of-fit.
Further Readings
Thode T., Testing for Normality, Marcel Dekker, Inc., 2001. Contains the major tests for normality.
Results of estimation can be expressed as a single value, known as a point estimate, or as a range of values, referred to as a confidence interval. Whenever we use point estimation, we calculate the margin of error associated with that point estimate.
Estimators of population parameters are sometimes distinguished from the true value by using the symbol 'hat'. For example, the true population standard deviation $\sigma$ is estimated by the sample standard deviation, written $\hat{\sigma}$.
Again, the usual estimator of the population mean is $\bar{x} = \sum x_i / n$, where n is the size of the sample and $x_1, x_2, x_3, \ldots, x_n$ are the values of the sample. If the value of the estimator in a particular sample is found to be 5, then 5 is the estimate of the population mean µ.
Qualities of a Good Estimator
A "good" estimator is one that provides an estimate with the following qualities:
Unbiasedness: An estimator is said to be unbiased for a given parameter when its expected value can be shown to be equal to the parameter being estimated. For example, the mean of a sample is an unbiased estimate of the mean of the population from which the sample was drawn. Unbiasedness is a good quality for an estimate since, in such a case, a weighted average of several estimates provides a better estimate than any one of them. Therefore, unbiasedness allows us to upgrade our estimates. For example, if your estimates of the population mean µ are, say, 10 and 11.2 from two independent samples of sizes 20 and 30 respectively, then a better estimate of the population mean µ based on both samples is [20(10) + 30(11.2)] / (20 + 30) = 10.72.
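The weighted average used to combine the two estimates can be written as a short Python function. The name `pooled_mean` is hypothetical, and the sketch assumes that each estimate is a sample mean weighted by its own sample size.

```python
def pooled_mean(estimates, sizes):
    """Combine sample means from independent samples, weighting each
    estimate by the size of the sample it came from."""
    if len(estimates) != len(sizes):
        raise ValueError("need exactly one sample size per estimate")
    total = sum(sizes)
    return sum(n * est for est, n in zip(estimates, sizes)) / total

# The two estimates and sample sizes from the example in the text.
print(round(pooled_mean([10, 11.2], [20, 30]), 2))
```

The larger sample gets the larger weight, so the combined estimate lies closer to 11.2 than to 10.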
Consistency:
The standard deviation of an estimate is called the standard error of that estimate. The larger the standard error the more error in your estimate. The standard deviation of an estimate is a commonly used index of the error entailed in estimating a population parameter based on the information in a random sample of size n from the entire population.
An estimator is said to be "consistent" if increasing the sample size produces an estimate with smaller standard error. Therefore, your estimate is "consistent" with the sample size. That is, spending more money to obtain a larger sample produces a better estimate.
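A small simulation can illustrate how the standard error of the sample mean shrinks as the sample size grows. Everything in this sketch is an assumption made for illustration: the population is standard normal, the function name `simulated_standard_error` is hypothetical, and the standard error is approximated by the standard deviation of 500 simulated sample means.

```python
import random
import statistics

def simulated_standard_error(n, reps=500, seed=0):
    """Approximate the standard error of the sample mean for samples of
    size n by drawing `reps` samples from a standard normal population."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    # The standard deviation of the estimates is their standard error.
    return statistics.stdev(means)

for n in (10, 100, 400):
    print(n, simulated_standard_error(n))
```

For the sample mean, theory gives a standard error of the population standard deviation divided by the square root of n, so the simulated values should fall roughly in proportion to 1/sqrt(n).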
Efficiency: An efficient estimator is one that has the smallest standard error among all unbiased estimators.
The "best" estimator is the one which is the closest to the population parameter being estimated:
The above figure illustrates the concept of closeness as aiming at the center of a dart board, the goal being an unbiased estimator with minimum variance. Each dart board shows several samples:
The first one has all its shots clustered tightly together, but none of them hit the center. The second one has a large spread, but around the center. The third one is worse than the first two. Only the last one has a tight cluster around the center, therefore has good efficiency.
If an estimator is unbiased, then its variability will determine its reliability. If an unbiased estimator is extremely variable, then the estimates it produces may not, on average, be as close to the population parameter as those of a biased estimator with small variance.
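One common way to make "closeness" concrete is the mean squared error, which equals the squared bias plus the variance of the estimates. The sketch below uses that measure as an assumption (the text does not name it), and both the function names and the two sets of estimates are hypothetical.

```python
import statistics

def mean_squared_error(estimates, true_value):
    """Average squared distance between the estimates and the true value."""
    return statistics.fmean([(e - true_value) ** 2 for e in estimates])

def bias_and_variance(estimates, true_value):
    """Bias (average estimate minus true value) and spread of the estimates."""
    bias = statistics.fmean(estimates) - true_value
    variance = statistics.pvariance(estimates)
    return bias, variance

true_value = 10
unbiased_but_variable = [4, 16, 7, 13]     # centered on 10, widely scattered
biased_but_tight = [11, 12, 11, 12]        # centered on 11.5, tightly clustered

for name, est in [("unbiased", unbiased_but_variable), ("biased", biased_but_tight)]:
    bias, variance = bias_and_variance(est, true_value)
    print(name, bias, variance, mean_squared_error(est, true_value))
```

In this made-up case the biased estimator lands closer to the true value on average, because its small variance outweighs its bias.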
The following chart depicts the quality of a few popular estimators for the population mean µ:
The widely used estimator of the population mean µ is $\bar{x} = \sum x_i / n$, where n is the size of the sample and $x_1, x_2, x_3, \ldots, x_n$ are the values of the sample. It has all of the above good properties and is therefore a "good" estimator.
If you want an estimate of central tendency as a parameter for a test or for comparison, then small sample sizes are unlikely to yield a stable estimate. The mean is a sensible measure of central tendency in a symmetrical distribution; but with, e.g., ten cases, you will not be able to judge whether the distribution is symmetrical. However, the mean is useful if you are trying to estimate the population sum, or some other function of the expected value of the distribution. Would the median be a better measure? In some distributions (e.g., shirt size) the mode may be better. A box plot will indicate outliers in the data set. If there are outliers, the median is better than the mean as a measure of central tendency.
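The effect of an outlier on the mean and the median can be checked with a few lines of Python. The data are hypothetical, the function name `boxplot_outliers` is an illustrative choice, and the outlier rule assumed here is the usual box-plot convention of flagging points more than 1.5 times the interquartile range beyond the quartiles.

```python
import statistics

def boxplot_outliers(data):
    """Values lying beyond the box-plot fences (1.5 x IQR past the quartiles)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

clean = [12, 13, 14, 15, 16]
with_outlier = clean + [100]           # one extreme observation added

print(statistics.mean(clean), statistics.median(clean))
print(statistics.mean(with_outlier), statistics.median(with_outlier))
print(boxplot_outliers(with_outlier))
```

The single extreme value pulls the mean far from the bulk of the data, while the median barely moves.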
You might like to use Descriptive Statistics JavaScript for obtaining "good" estimates.
Further Readings
Casella G., and R. Berger, Statistical Inference, Wadsworth Pub. Co., 2001.
Lehmann E., and G. Casella, Theory of Point Estimation, Springer Verlag, New York, 1998.