This link has been bookmarked by 493 people. It was first bookmarked on 02 Mar 2006, by Jinan.
-
01 Aug 16
-
08 Nov 14
-
10 Feb 14
-
27 May 13
-
21 Jul 11
-
03 Jul 11
-
06 May 11
-
29 Apr 11
-
26 Jan 11
-
23 Jan 11
-
08 Jan 11
-
01 Nov 10
Deb Morton: statistics - can get quite complex but there are some concepts that may be useful for applied maths
-
30 Oct 10
Deb Bridge: statistics - can get quite complex but there are some concepts that may be useful for applied maths
-
23 Jun 10
-
14 Jun 10
-
24 May 10
-
15 Feb 10
-
01 Feb 10
-
30 Nov 09
Michael Levin: Definitions and short discussions about statistical techniques.
-
29 Nov 09
-
26 Nov 09
-
13 Oct 09
-
28 Sep 09
-
24 Aug 09
-
11 Aug 09
-
29 Jul 09
-
24 Jul 09
-
The mean is a particularly informative measure of the "central tendency" of the variable if it is reported along with its confidence intervals
-
The confidence intervals for the mean give us a range of values around the mean where we expect the "true" (population) mean is located
-
If you set the p-level to a smaller value, then the interval would become wider thereby increasing the "certainty" of the estimate, and vice versa;
-
Unlike the analyses of random samples of observations that are discussed in the context of most other statistics, the analysis of time series is based on the assumption that successive values in the data file represent consecutive measurements taken at equally spaced time intervals
-
Note that the width of the confidence interval depends on the sample size and on the variation of data values. The larger the sample size, the more reliable its mean. The larger the variation, the less reliable the mean.
-
The calculation of confidence intervals is based on the assumption that the variable is normally distributed in the population. The estimate may not be valid if this assumption is not met, unless the sample size is large, say n=100 or more.
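
The highlights above on confidence intervals can be made concrete with a minimal sketch. It assumes a normal approximation for the critical value (for small samples a t-based critical value would give a somewhat wider interval), and the function name and sample data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, confidence=0.95):
    """Return (low, high) for the mean, using a normal approximation."""
    n = len(values)
    m = mean(values)
    se = stdev(values) / sqrt(n)  # standard error: shrinks as n grows, grows with variation
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return m - z * se, m + z * se

# Hypothetical sample, for illustration only
sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 5.2, 4.9, 5.1]
low95, high95 = mean_confidence_interval(sample, 0.95)
low99, high99 = mean_confidence_interval(sample, 0.99)
print(f"95% CI: ({low95:.2f}, {high95:.2f})")
print(f"99% CI: ({low99:.2f}, {high99:.2f})")  # wider than the 95% interval
```

Asking for more certainty (99% rather than 95%, that is, a smaller p-level) widens the interval, as the excerpt describes.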
-
There are two main goals of time series analysis: (a) identifying the nature of the phenomenon represented by the sequence of observations, and (b) forecasting (predicting future values of the time series variable).
-
Most time series analysis techniques involve some form of filtering out noise in order to make the pattern more salient.
-
measurement scales used should be at least interval scales,
-
Correlation coefficients can range from -1.00 to +1.00. The value of -1.00 represents a perfect negative correlation while a value of +1.00 represents a perfect positive correlation. A value of 0.00 represents a lack of correlation.
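
As a rough illustration of the range described above, the following sketch computes the Pearson correlation coefficient by hand. The function name and the data are hypothetical; the zero-correlation example also shows that r = 0 rules out only a linear relationship.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2 * v + 1 for v in x])  # points on a rising line
r_neg = pearson_r(x, [-3 * v for v in x])     # points on a falling line

# y = x^2 on a symmetric range: strongly related, but not linearly
r_zero = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
print(r_pos, r_neg, r_zero)
```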
-
Most time series patterns can be described in terms of two basic classes of components: trend and seasonality.
-
The statistical significance of a result is the probability that the observed relationship (e.g., between variables) or a difference (e.g., between means) in a sample occurred by pure chance ("luck of the draw"), and that in the population from which the sample was drawn, no such relationship or differences exist.
-
Specifically, the p-value represents the probability of error that is involved in accepting our observed result as valid, that is, as "representative of the population."
-
it determines the extent to which values of the two variables are "proportional" to each other. The value of correlation (i.e., correlation coefficient) does not depend on the specific measurement units used;
-
Proportional means linearly related; that is, the correlation is high if it can be "summarized" by a straight line (sloped upwards or downwards).
-
This line is called the regression line or least squares line, because it is determined such that the sum of the squared distances of all the data points from the line is the lowest possible.
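
A minimal sketch of the least squares idea, assuming that "distance" means the vertical distance (residual) between each point and the line. The function names and data are hypothetical.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) minimizing the sum of squared vertical residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return slope, intercept

def sum_squared_residuals(x, y, slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = least_squares_line(x, y)
best = sum_squared_residuals(x, y, slope, intercept)

# Nudging the fitted line in any direction should only increase the sum
tilted = sum_squared_residuals(x, y, slope + 0.1, intercept)
shifted = sum_squared_residuals(x, y, slope, intercept + 0.2)
print(slope, intercept, best, tilted, shifted)
```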
-
The main advantage of median as compared to moving average smoothing is that its results are less biased by outliers
-
The main disadvantage of median smoothing is that in the absence of clear outliers it may produce more "jagged" curves than moving average and it does not allow for weighting.
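
The trade-off between the two smoothers can be seen in a short sketch. It assumes an odd, centered window and simply drops the edge positions where the window does not fit; the helper name and the series are hypothetical.

```python
from statistics import mean, median

def smooth(series, window, reducer):
    """Apply `reducer` over a centered sliding window (odd `window` assumed)."""
    half = window // 2
    return [reducer(series[i - half:i + half + 1])
            for i in range(half, len(series) - half)]

# Hypothetical series with one obvious outlier (50)
series = [10, 11, 10, 12, 50, 11, 10, 12, 11]
moving_avg = smooth(series, 3, mean)    # the outlier leaks into three windows
moving_med = smooth(series, 3, median)  # the outlier is ignored
print(moving_avg)
print(moving_med)
```

The median version cannot apply weights to the points in the window, which is the limitation the excerpt mentions.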
-
If the correlation coefficient is squared, then the resulting value (r², the coefficient of determination) will represent the proportion of common variation in the two variables (i.e., the "strength" or "magnitude" of the relationship)
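
For a simple linear regression with an intercept, the coefficient of determination can be written as the share of the variation in y accounted for by the fitted line. The notation below is an assumption for illustration: the hatted y values are the fitted values and the barred y is the sample mean.

```latex
r^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
\qquad \text{e.g. } r = 0.7 \;\Rightarrow\; r^2 = 0.49
```

So a correlation of 0.7 corresponds to about 49% shared variation, not 70%.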
-
The significance level calculated for each correlation is a primary source of information about the reliability of the correlation.
-
the significance of a correlation coefficient of a particular magnitude will change depending on the size of the sample from which it was computed. The test of significance is based on the assumption that the distribution of the residual values (i.e., the deviations from the regression line) for the dependent variable y follows the normal distribution, and that the variability of the residual values is the same for all values of the independent variable x.
-
the data first need to be transformed to remove the nonlinearity. Usually a logarithmic, exponential, or (less often) polynomial function can be used.
-
It is formally defined as correlational dependency of order k between each i'th element of the series and the (i-k)'th element (Kendall, 1976) and measured by autocorrelation (i.e., a correlation between the two terms); k is usually called the lag. If the measurement error is not too large, seasonality can be visually identified in the series as a pattern that repeats every k elements.
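
A small sketch of autocorrelation at lag k, using one common sample estimator (other normalizations exist). The function name and the synthetic series, which repeats every 4 elements, are hypothetical.

```python
def autocorrelation(series, k):
    """Sample autocorrelation between each i'th element and the (i-k)'th element."""
    n = len(series)
    m = sum(series) / n
    denom = sum((v - m) ** 2 for v in series)
    num = sum((series[i] - m) * (series[i - k] - m) for i in range(k, n))
    return num / denom

# Synthetic seasonal series: the pattern 1, 3, 5, 3 repeated six times
series = [1, 3, 5, 3] * 6
acf = {k: autocorrelation(series, k) for k in range(1, 9)}
for k, value in acf.items():
    print(k, round(value, 3))
```

The largest value appears at lag 4, matching the length of the repeating pattern.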
-
outliers have a profound influence on the slope of the regression line and consequently on the value of the correlation coefficient. A single outlier is capable of considerably changing the slope of the regression line and, consequently, the value of the correlation
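
To illustrate how much a single outlier can matter, the sketch below computes the least squares slope and the correlation with and without one extreme point. The helper name and all data values are hypothetical.

```python
from math import sqrt

def slope_and_r(x, y):
    """Return the least squares slope and Pearson r for paired data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]  # clearly increasing overall

slope_clean, r_clean = slope_and_r(x, y)
# Add one extreme observation at (9, -20)
slope_outlier, r_outlier = slope_and_r(x + [9], y + [-20])
print(slope_clean, r_clean)
print(slope_outlier, r_outlier)  # both change sign
```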
-
autocorrelations for consecutive lags are formally dependent.
-
we believe that outliers represent a random error that we would like to be able to control. Unfortunately, there is no widely accepted method to remove outliers automatically (however, see the next paragraph), thus what we are left with is to identify any outliers by examining a scatterplot of each important correlation.
-
In a sense, the partial autocorrelation provides a "cleaner" picture of serial dependencies for individual lags (not confounded by other serial dependencies).
-
In some areas of research, such "cleaning" of the data is absolutely necessary. For example, in cognitive psychology research on reaction times, even if almost all scores in an experiment are in the range of 300-700 milliseconds, just a few "distracted reactions" of 10-15 seconds will completely change the overall picture
-
defining an outlier is subjective
-
we still need not only to uncover the hidden patterns in the data but also generate forecasts.
-
results depend on the researcher's level of expertise
-
The exact shape of the normal distribution (the characteristic "bell curve") is defined by a function which has only two parameters: mean and standard deviation.
-
A characteristic property of the Normal distribution is that 68% of all of its observations fall within a range of ±1 standard deviation from the mean, and a range of ±2 standard deviations includes 95% of the scores. In other words, in a Normal distribution, observations that have a standardized value of less than -2 or more than +2 have a relative frequency of 5% or less
-
Standardized value means that a value is expressed in terms of its difference from the mean, divided by the standard deviation
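
The 68% and 95% figures and the standardized value can be checked with Python's standard library. This is a sketch; the `standardize` helper and the example numbers (mean 100, standard deviation 15) are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Share of observations within 1 and 2 standard deviations of the mean
within_1sd = std_normal.cdf(1) - std_normal.cdf(-1)
within_2sd = std_normal.cdf(2) - std_normal.cdf(-2)

def standardize(value, mean, sd):
    """Express a value as its difference from the mean, divided by the standard deviation."""
    return (value - mean) / sd

z = standardize(130, 100, 15)
print(round(within_1sd, 4), round(within_2sd, 4), z)
```

The quoted 95% is a rounded figure: the ±2 range covers slightly more than 95%, and exactly 95% corresponds to about ±1.96 standard deviations.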
-
If you do not know how to identify the hypothetical subsets, try to examine the data with some exploratory multivariate techniques (e.g., Cluster Analysis).
-
the series would not be stationary.
-
we can precisely calculate the probability of obtaining "by chance" outcomes representing various levels of deviation from the hypothetical population mean of 0.
-
knowing the shape of the normal curve,
-
If such a calculated probability is so low that it meets the previously accepted criterion of statistical significance, then we have only one choice: conclude that our result gives a better approximation of what is going on in the population than the "null hypothesis" (remember that the null hypothesis was considered only for "technical reasons" as a benchmark against which our empirical result was evaluated). Note that this entire reasoning is based on the assumption that the shape of the distribution of those "replications" (technically, the "sampling distribution") is normal. This assumption is discussed in the next paragraph.
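
The reasoning above can be sketched as a two-sided z-test. This assumes a normal sampling distribution and a known population standard deviation, which is a simplification; the function name and the numbers are hypothetical. The result is the probability, computed under the assumption that the population mean really is 0, of a sample mean at least this far from 0.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(sample_mean, hypothesized_mean, sd, n):
    """Two-sided p-value for a sample mean under a normal sampling distribution."""
    se = sd / sqrt(n)                        # standard error of the mean
    z = (sample_mean - hypothesized_mean) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical result: mean 0.5 from n = 64 observations, population SD 2.0
p = two_sided_p_value(0.5, 0.0, 2.0, 64)
print(round(p, 4))
```

Here the sample mean lies 2 standard errors from 0, so the p-value falls just under the conventional 0.05 criterion.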
-
Specifically, the three types of parameters in the model are: the autoregressive parameters (p), the number of differencing passes (d), and moving average parameters (q). In the notation introduced by Box and Jenkins, models are summarized as ARIMA (p, d, q)
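
Of the three parameters, the differencing count d is the easiest to show in isolation. The sketch below illustrates only that step (it does not estimate the p or q parameters); the helper name and the series are hypothetical.

```python
def difference(series, d=1):
    """Apply d passes of first differencing: each pass replaces x[t] with x[t] - x[t-1]."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

linear_trend = [3 * t + 2 for t in range(8)]
quadratic_trend = [t * t for t in range(8)]

print(difference(linear_trend, d=1))     # one pass removes a linear trend
print(difference(quadratic_trend, d=2))  # two passes remove a quadratic trend
```

Each pass shortens the series by one observation.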
-
a typical transformation used in such cases is the logarithmic function which will "squeeze" together the values at one end of the range.
-
-
21 Jul 09
-
18 Jul 09
-
10 Jul 09
-
04 Jul 09
-
28 Jun 09
-
27 Jun 09
-
19 Jun 09
-
18 Jun 09
monniewolf: Great online statistics textbook presented by StatSoft.
-
17 Jun 09
-
13 Jun 09
-
09 Jun 09
-
03 Jun 09
-
02 Jun 09
-
01 Jun 09
-
29 May 09
-
25 May 09
-
23 May 09
-
18 May 09
-
15 May 09
-
30 Apr 09
-
Overview of Elementary Concepts in Statistics. In this introduction, we will briefly discuss those elementary statistical concepts that provide the necessary foundations for more specialized expertise in any area of statistical data analysis. The selected topics illustrate the basic assumptions of most statistical methods and/or have been demonstrated in research to be necessary components of one's general understanding of the "quantitative nature" of reality (Nisbett, et al., 1987). Because of space limitations, we will focus mostly on the functional aspects of the concepts discussed and the presentation will be very short. Further information on each of those concepts can be found in statistical textbooks. Recommended introductory textbooks are: Kachigan (1986), and Runyon and Haber (1976); for a more advanced discussion of elementary theory and assumptions of statistics, see the classic books by Hays (1988), and Kendall and Stuart (1979).
-
The means for the two groups are quite different (2 and 6, respectively).
-
-
15 Apr 09
-
14 Apr 09
-
21 Mar 09
-
12 Mar 09
-
09 Mar 09
-
27 Feb 09
Darren Draper: This Electronic Statistics Textbook offers training in the understanding and application of statistics. The material was developed at the StatSoft R&D department based on many years of teaching undergraduate and graduate statistics courses and covers a wide variety of applications, including laboratory research (biomedical, agricultural, etc.), business statistics and forecasting, social science statistics and survey research, data mining, engineering and quality control applications, and many others.
The Electronic Textbook begins with an overview of the relevant elementary (pivotal) concepts and continues with a more in depth exploration of specific areas of statistics, organized by "modules," accessible by buttons, representing classes of analytic techniques. A glossary of statistical terms and a list of references for further study are included.
Proper citation: (Electronic Version): StatSoft, Inc. (2007). Electronic Statistics Textbook. Tulsa, OK: StatSoft. WEB: http://www.statsoft.com/textbook/stathome.html.
Proper citation: (Printed Version): Hill, T. & Lewicki, P. (2007). STATISTICS Methods and Applications. StatSoft, Tulsa, OK.
-
20 Feb 09
-
13 Feb 09
-
11 Feb 09
wind ch: This Electronic Statistics Textbook offers training in the understanding and application of statistics. The material was developed at the StatSoft R&D department based on many years of teaching undergraduate and graduate statistics courses and covers a wi
-
26 Jan 09
-
15 Jan 09
-
12 Jan 09
Troy Johsnon: Electronic Textbook for Statistica
-
29 Dec 08
-
09 Dec 08
S. Dalb: Electronic Statistics Textbook offers training in the understanding and application of statistics.
statistiques(sujet) info-statistique(nature_info) portail(type_of_object) delicious
-
07 Dec 08
-
24 Nov 08
-
20 Nov 08
-
18 Nov 08
-
11 Nov 08
-
26 Oct 08
-
19 Oct 08
-
18 Oct 08
-
24 Sep 08
-
17 Sep 08
-
09 Sep 08
-
18 Aug 08
-
04 Aug 08
-
02 Aug 08
-
21 Jul 08
-
03 Jul 08
-
01 Jul 08
-
30 Jun 08
murrayw: This Electronic Statistics Textbook offers training in the understanding and application of statistics. The material was developed at the StatSoft R&D department.
-
29 Jun 08
-
19 Jun 08
-
13 Jun 08
-
10 Jun 08
-
06 Jun 08
-
28 May 08
-
17 May 08
-
02 May 08
-
01 May 08
-
15 Apr 08
-
10 Apr 08
-
05 Apr 08
M G: Statistics: Methods and Applications
-
31 Mar 08
-
n variables, specifically, those manipulated and those affected by the manipulation. However, experimental data may potentially provide qualitatively better information: Only experimental data can conclusively demonstrate causal relations between variables. For example, if we found
-
-
27 Mar 08