JavaScript at the design stage of your statistical investigation in decision making with specific subjective requirements.

Statistical Power Analysis, L. Erlbaum Associates, 1998.


Parametric vs. Non-Parametric vs. Distribution-free Tests

A statistical technique is called non-parametric if it satisfies at least one of the following five criteria:

  1. The data entering the analysis are enumerative; that is, counted data represent the number of observations in each category or cross-category.

  2. The data are measured and/or analyzed using a nominal scale of measurement.

  3. The data are measured and/or analyzed using an ordinal scale of measurement.

  4. The inference does not concern a parameter in the population distribution; for example, the hypothesis that a time-ordered set of observations exhibits a random pattern.

  5. The probability distribution of the statistic upon which the analysis is based is not dependent upon specific information or conditions (i.e., assumptions) about the population(s) from which the sample(s) are drawn, but only upon general assumptions, such as a continuous and/or symmetric population distribution.

According to these criteria, the distinction of non-parametric is accorded either because of the level of measurement used or required for the analysis, as in types 1 through 3; the type of inference, as in type 4; or the generality of the assumptions made about the population distribution, as in type 5.

For example, one may use the Mann-Whitney Rank Test as a non-parametric alternative to Student's t-test when one does not have normally distributed data.

Mann-Whitney: To be used with two independent groups (analogous to the independent groups t-test)
Wilcoxon: To be used with two related (i.e., matched or repeated) groups (analogous to the related samples t-test)
Kruskal-Wallis: To be used with two or more independent groups (analogous to the single-factor between-subjects ANOVA)
Friedman: To be used with two or more related groups (analogous to the single-factor within-subjects ANOVA)

Non-parametric vs. Distribution-free Tests:

Non-parametric tests are those used when some specific conditions for the ordinary tests are violated.

Distribution-free tests are those for which the procedure is valid for any shape of the population distribution.

For example, the Chi-square test concerning the variance of a given population is parametric, since this test requires that the population distribution be normal. The Chi-square test of independence does not assume normality, or even that the data are numerical. The Kolmogorov-Smirnov test is a distribution-free test, which is applicable to comparing two populations of a continuous random variable with any distribution.
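As an illustration of why the two-sample Kolmogorov-Smirnov procedure needs no distributional shape, the sketch below computes its test statistic, the largest vertical gap between the two empirical distribution functions. The function name and the data are hypothetical, and the sketch stops at the statistic; turning it into a p-value requires a table or a library routine.

```python
from bisect import bisect_right


def ks_two_sample_statistic(xs, ys):
    """Largest absolute gap between the empirical CDFs of xs and ys."""
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    n, m = len(xs_sorted), len(ys_sorted)
    gap = 0.0
    # The empirical CDFs only jump at observed values, so it is enough
    # to compare them at every observation from either sample.
    for value in xs_sorted + ys_sorted:
        fx = bisect_right(xs_sorted, value) / n  # share of xs <= value
        fy = bisect_right(ys_sorted, value) / m  # share of ys <= value
        gap = max(gap, abs(fx - fy))
    return gap


# Hypothetical data, for illustration only.
sample_a = [1.1, 2.3, 2.9, 3.8, 4.4]
sample_b = [3.1, 4.0, 4.9, 5.6, 6.2]
d_stat = ks_two_sample_statistic(sample_a, sample_b)
print(d_stat)
```

Only the ordering of the observations enters the computation, which is why the statistic does not depend on the shape of the underlying continuous distribution.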

The following is an interesting non-parametric procedure with various useful applications.

Comparison of Two Random Variables: Consider two independent sets of observations X = (x1, x2,…, xr) and Y = (y1, y2,…, ys) for two random variables X and Y respectively. To estimate the reliability function:

R = Pr (X > Y)

One may use:

The estimator RS = U/(r × s),

where U is the number of pairs (xi, yj) such that xi > yj, for all i = 1, 2, ..., r and j = 1, 2, ..., s.
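The counting rule for U can be sketched in a few lines. The function name and the sample values below are hypothetical; the computation is just the ratio U/(r × s) described above.

```python
def reliability_estimate(xs, ys):
    """Estimate R = Pr(X > Y) as U / (r * s)."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    # U counts every pair (x, y) in which the X observation exceeds the Y one.
    u = sum(1 for x in xs for y in ys if x > y)
    return u / (len(xs) * len(ys))


# Hypothetical observations, for illustration only.
x_obs = [12.0, 15.5, 9.8, 14.1]
y_obs = [10.2, 8.7, 13.0]
r_hat = reliability_estimate(x_obs, y_obs)
print(r_hat)
```

The double loop costs r × s comparisons, which is fine for small samples; for large samples one would sort one sample and count by binary search instead.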

This estimator is the minimum-variance unbiased estimator of R. It is important to know that the estimate has an upper limit, non-negative delta value for its accuracy:

Pr{} ≥ max {}.

Application areas include the insurance ruin problem. Let random variable Y denote the claims per unit of time and let random variable X denote the return on investment (ROI) for the insurance company. Finally, let z denote the constant premium amount collected; then the probability that the insurance company will survive is:

R = Pr [X + z > Y].
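A minimal sketch of how this survival probability could be estimated from data, using the same pair-counting idea with the premium z added to each ROI observation. The function name and all figures are hypothetical.

```python
def survival_probability(roi, claims, premium):
    """Estimate Pr(X + z > Y) by counting favorable (roi, claim) pairs."""
    if not roi or not claims:
        raise ValueError("both samples must be non-empty")
    favorable = sum(1 for x in roi for y in claims if x + premium > y)
    return favorable / (len(roi) * len(claims))


# Hypothetical per-period figures, in the same monetary unit.
roi_obs = [2.0, -1.0, 3.5, 0.5]
claim_obs = [4.0, 6.5, 3.0, 8.0, 5.0]

p_without_premium = survival_probability(roi_obs, claim_obs, 0.0)
p_with_premium = survival_probability(roi_obs, claim_obs, 5.0)
print(p_without_premium, p_with_premium)
```

Running the function over a range of premium values gives a quick view of how the estimated survival probability responds to z.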

You might like to use the Kolmogorov-Smirnov Test for Two Populations and Comparing Two Random Variables in checking your computations and performing some numerical experiments for a deeper understanding of these concepts.

Further Readings:
Arsham H., A generalized confidence region for stress-strength reliability, IEEE Transactions on Reliability, 35(4), 586-589, 1986.
Conover W., Practical Nonparametric Statistics, Wiley, 1998.
Hollander M., and D. Wolfe, Nonparametric Statistical Methods, Wiley, 1999.
Kotz S., Y. Lumelskii, and M. Pensky, The Stress-Strength Model and Its Generalizations: Theory and Applications, Imperial College Press, London, UK, 2003, distributed by World Scientific Publishing.


Hypotheses Testing

Let us consider a simple problem of inference about population mean. We have a large population with known mean. We take a sample and wish to know whether the sample mean is significantly different from the population mean. Our null hypothesis is that it is not.

The theory of probability is only capable of dealing with random variables which generate a frequency distribution "in the long run". We have one fixed population and one fixed sample. There is nothing random about this problem and the experiment is conducted once, so there is no "long run".

We pretend that the experiment was not conducted once, but an infinite number of times, that is, we consider all possible samples of the same size. We assume that each sample mean includes an "error", which is independently and normally distributed about zero. The sample mean now becomes our random variable, which we call our "statistic". We can now apply the t-test or z-test interpretation of probability.

We are now able to determine the probability of a randomly chosen sample mean having a value at least as extreme as our original sample mean. Note that we are implicitly assuming that the null hypothesis is true. This probability is our p-value which we apply to the original problem.
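The p-value computation can be sketched for the z-test case. This assumes that the population standard deviation is known as well as the mean, which the text above does not state; the function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def two_sided_z_p_value(sample, pop_mean, pop_sd):
    """Return (z, p) for H0: the sample comes from a population with pop_mean.

    Assumes pop_sd is known and that the sample mean is normally distributed.
    """
    standard_error = pop_sd / sqrt(len(sample))
    z = (mean(sample) - pop_mean) / standard_error
    # Probability, under H0, of a sample mean at least this far from pop_mean.
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical sample, for illustration only.
sample = [52, 48, 55, 51, 49, 53, 54, 50, 56]
z_value, p_value = two_sided_z_p_value(sample, pop_mean=50.0, pop_sd=3.0)
print(z_value, p_value)
```

The p-value is computed entirely under the null hypothesis, which is the implicit assumption noted above.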

Remember that, in the t-tests for differences in means, there is a condition of equal population variances that must be examined. One way to test for possible differences in variances is to do an F test. However, the F test is very sensitive to violations of the normality condition; i.e., if the populations appear not to be normal, then the F test will tend to reject the null hypothesis of no difference in population variances too often.

You might like to use the following JavaScript to check your computations and to perform some statistical experiments for deeper understanding of these concepts:


Single Population t-Test

The purpose is to compare the sample mean with the given population mean. The aim is to judge the claimed mean value, based on a set of random observations of size n. A necessary condition for validity of the result is that the population distribution is normal, if the sample size n is small (say less than 30).

The task is to decide whether to accept a null hypothesis:

H0: $\mu = \mu_0$

or to reject the null hypothesis in favor of the alternative hypothesis:

Ha: $\mu$ is significantly different from $\mu_0$

The testing framework consists of computing the t-statistic:

$$T = \frac{(\bar{x} - \mu_0)\sqrt{n}}{S}$$

where $\bar{x}$ is the estimated mean and $S^2$ is the estimated variance based on n random observations.

The above statistic follows a t-distribution with parameter d.f. = $\nu$ = (n - 1). If the absolute value of the computed T-statistic is "too large" compared with the critical value of the t-table, then one rejects the claimed value for the population's mean.
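A minimal sketch of this test follows. The function name and the observations are hypothetical; the critical value 2.262 is the two-sided 5% value for 9 d.f. taken from a standard t-table, so it applies only to this sample size and significance level.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, claimed_mean):
    """Return (T, d.f.) for testing H0: population mean equals claimed_mean."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    # stdev() uses the n - 1 divisor, i.e. the unbiased variance estimate.
    t_stat = (mean(sample) - claimed_mean) * sqrt(n) / stdev(sample)
    return t_stat, n - 1


# Hypothetical observations, for illustration only.
observations = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.5, 5.2, 5.9]
t_stat, df = one_sample_t(observations, claimed_mean=5.0)

critical_value = 2.262  # t-table, two-sided 5% level, 9 d.f.
reject_claim = abs(t_stat) > critical_value
print(t_stat, df, reject_claim)
```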

This test could also be used for testing similar claims for other unimodal populations including those with discrete random variables, such as proportion, provided there are sufficient observations (say, over 30).

You might like to use the Testing the Mean JavaScript in checking your computations, and the Sample Size Determination JavaScript at the design stage of your statistical investigation in decision making with specific subjective requirements.

You might also like to use the Testing Two Populations JavaScript.


Two Independent Populations

If an estimate is unbiased, such as the sample mean, then it is a good idea to pool the estimates from several relatively small samples to get a single estimate. The pooled estimate is a “good” estimate when compared with each individual estimate.

Pooled Mean: Suppose we have m estimates $\bar{x}_i$, each based on a sample of size $n_i$, of the population expected value $\mu$. The pooled estimate is:

$$\bar{x}_{\text{pooled}} = \frac{\sum_i n_i \bar{x}_i}{\sum_i n_i},$$

where both sums are over all values of i = 1, 2, ..., m.
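The pooled mean is a sample-size-weighted average of the individual means, which can be sketched as follows. The function name and the figures are hypothetical.

```python
def pooled_mean(means, sizes):
    """Weight each sample mean by its sample size and average."""
    if len(means) != len(sizes) or not sizes:
        raise ValueError("means and sizes must be non-empty and of equal length")
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


# Hypothetical sample means and sample sizes, for illustration only.
sample_means = [10.0, 12.0, 11.0]
sample_sizes = [5, 15, 10]
pooled = pooled_mean(sample_means, sample_sizes)
print(pooled)
```

The result equals the mean of all observations combined, so larger samples pull the pooled value toward their own mean.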

Pooled Variance: Since the sample variance is also an unbiased estimate of the population variance $\sigma^2$, it is a good idea to pool the m estimates $S_i^2$, each based on a sample of size $n_i$, into a single estimate. The pooled estimate is:

$$S_{\text{pooled}}^2 = \frac{\sum_i (n_i - 1) S_i^2}{\sum_i (n_i - 1)},$$

where both sums are over all values of i = 1, 2, ..., m.
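A minimal sketch of pooling variances, assuming the usual degrees-of-freedom weighting in which each sample variance is weighted by its sample size minus one. The function name and the figures are hypothetical.

```python
from math import sqrt


def pooled_variance(variances, sizes):
    """Pool unbiased sample variances, weighting each by (n_i - 1).

    Assumption: the usual degrees-of-freedom weighting is intended.
    """
    if len(variances) != len(sizes) or not sizes:
        raise ValueError("variances and sizes must be non-empty and of equal length")
    if any(n < 2 for n in sizes):
        raise ValueError("each sample needs at least two observations")
    numerator = sum((n - 1) * v for n, v in zip(sizes, variances))
    denominator = sum(n - 1 for n in sizes)
    return numerator / denominator


# Hypothetical sample variances and sample sizes, for illustration only.
pooled_var = pooled_variance([4.0, 9.0], [11, 21])
pooled_sd = sqrt(pooled_var)  # take the root only after pooling
print(pooled_var, pooled_sd)
```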

We pool variance estimates for other good reasons as well. Depending on the particular reason, the conclusion might have to be made explicitly conditional on, e.g., the validity of the equal-variance model. There are several different good reasons for pooling.

You might like to use the Pooling the Means and Variances JavaScript.

Pooled Standard Deviation: Both the sample mean and the sample variance are unbiased estimates of the population parameters $\mu$ and $\sigma^2$, respectively; however, the sample standard deviation is not an unbiased estimate of the population standard deviation $\sigma$. This follows from Jensen's inequality applied to a concave function, namely the square root of the unbiased variance estimate. Therefore, pooling standard deviations directly is meaningless; the best one can do is to take the square root of the pooled variance.
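The bias of the sample standard deviation can be checked exactly on a tiny case. The sketch below uses a hypothetical two-point population and enumerates every equally likely sample of size 2 drawn with replacement, then averages the sample variances and the sample standard deviations over all of them.

```python
from itertools import product
from math import sqrt
from statistics import mean, pvariance, variance

# Hypothetical population: two equally likely values.
population = [0.0, 1.0]
sigma_squared = pvariance(population)  # population variance
sigma = sqrt(sigma_squared)            # population standard deviation

# Every equally likely sample of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

# Averaging over all samples gives the exact expected values.
expected_variance = mean(variance(s) for s in samples)
expected_sd = mean(sqrt(variance(s)) for s in samples)

print(sigma_squared, expected_variance)  # these agree: variance is unbiased
print(sigma, expected_sd)                # expected_sd falls below sigma
```

Because the square root is concave, the average of the roots is below the root of the average, which is exactly the shortfall seen in the second comparison.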

Notice that when sample sizes are large and nearly equal, there is essentially no difference between the pooled and unpooled estimates of the standard errors of paired-data samples, and the degrees of freedom are nearly asymptotic. This rationale can fall apart in any other case. One must pool the variances rather than merely take a shortcut in the computation of standard errors.

If you calculate the test without the assumption, you have to determine the degrees of freedom (d.f.). The formula works in such a way that d.f. will be less if the larger sample variance is in the group with the smaller number of observations. This is the case in which the two tests will differ considerably. A study of the formula for the d.f. is most enlightening, and one must understand the correspondence between the unfortunate design, having the most observations in the group with little variance, and the low d.f. and accompanying large t-value.
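The effect described here can be seen numerically. The sketch assumes the standard Welch-Satterthwaite approximation for the d.f. of the unequal-variance test; the function name and the figures are hypothetical.

```python
def welch_degrees_of_freedom(var1, n1, var2, n2):
    """Approximate d.f. for the two-sample t-test without equal variances.

    Assumption: the standard Welch-Satterthwaite approximation.
    """
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    term1 = var1 / n1  # squared standard error contributed by sample 1
    term2 = var2 / n2  # squared standard error contributed by sample 2
    numerator = (term1 + term2) ** 2
    denominator = term1 ** 2 / (n1 - 1) + term2 ** 2 / (n2 - 1)
    return numerator / denominator


# Unfortunate design: the large variance sits in the small group.
df_unfortunate = welch_degrees_of_freedom(100.0, 5, 1.0, 50)

# Same variances, but the large variance sits in the large group.
df_favorable = welch_degrees_of_freedom(100.0, 50, 1.0, 5)

print(df_unfortunate, df_favorable)
```

In the first design the d.f. stays close to that of the small group alone, even though 55 observations were collected in total.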

Applications: When doing t-tests for differences in means of populations, for the independent-samples case:

  1. For differences in means that do not make any assumption about equality of population variances, use the standard error formula:

    $$\sqrt{S_1^2/n_1 + S_2^2/n_2},$$

    with d.f. = $\nu$ = $n_1 - 1$ or $n_2 - 1$, whichever is smaller.

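  2. With equal variances, use the statistic:

    $$t = \frac{\bar{x}_1 - \bar{x}_2}{S_{\text{pooled}} \sqrt{1/n_1 + 1/n_2}},$$

    with parameter d.f. = $\nu$ = $(n_1 + n_2 - 2)$, for $n_1$ and $n_2$ greater than 1, where the pooled variance is:

    $$S_{\text{pooled}}^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}.$$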

  3. If total N is less than 50 and one sample is 1/2 the size of the other (or less), and if the smaller sample has a standard deviation at least twice as large as the other sample, then apply the procedure given in item no. 1, but adjust the d.f. parameter of the t-test to the largest integer less than or equal to:

    d.f. = $\nu$ = A/(B + C),

    where:

    $A = [S_1^2/n_1 + S_2^2/n_2]^2$,

    $B = [S_1^2/n_1]^2 / (n_1 - 1)$,

    C = [S22/n2]2/ (n