Vlad V's List: Probability, Randomness and Statistics (PRS)

Concepts
Almost surely - Wikipedia, the free encyclopedia

Apr 27, 10

en.wikipedia.org/...Almost_surely probability
Bayesian probability - Wikipedia, the free encyclopedia

Apr 27, 10

en.wikipedia.org/...Bayesian_probability statistics probability
- Bayesian probability is one of the most popular interpretations of the concept of probability. The Bayesian interpretation of probability can be seen as an extension of logic that enables reasoning with uncertain statements. To evaluate the probability of a hypothesis, the Bayesian probabilist specifies some prior probability, which is then updated in the light of new relevant data. The Bayesian interpretation provides a standard set of procedures and formulae to perform this calculation.
  
  Bayesian probability interprets the concept of probability as "a measure of a state of knowledge",^[1] in contrast to interpreting it as a frequency or a physical property of a system.
- Given some data and some hypothesis, the posterior probability that the hypothesis is true is proportional to the likelihood multiplied by the prior probability. For simplicity, the "prior probability" is often abbreviated as the "prior" and the "posterior probability" as the "posterior". The likelihood brings in the effect of the data, while the prior specifies the belief in the hypothesis before the data were observed.
  
  More formally, Bayesian inference uses Bayes' formula for conditional probability:
  
  $P(H|D) = \frac{P(D|H)\;P(H)}{P(D)}$
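A minimal Python sketch of this update rule, under assumptions that are not in the source: two hypothetical hypotheses about a coin ("fair", with P(heads) = 0.5, and "biased", with P(heads) = 0.8), a made-up prior of 0.9/0.1, and three observed heads. P(D) is simply the normalizing sum over the hypotheses, and each flip's posterior becomes the prior for the next flip. The function name `bayes_update` is our own.

```python
def bayes_update(prior, likelihood):
    """Return P(H|D) for every hypothesis H, given P(H) and P(D|H)."""
    unnormalized = {h: likelihood[h] * prior[h] for h in prior}  # P(D|H) P(H)
    evidence = sum(unnormalized.values())  # P(D), summed over all hypotheses
    return {h: value / evidence for h, value in unnormalized.items()}


# Hypothetical example: is a coin fair or biased towards heads?
p_heads = {"fair": 0.5, "biased": 0.8}
belief = {"fair": 0.9, "biased": 0.1}  # assumed prior

for flip in "HHH":
    likelihood = {h: p if flip == "H" else 1 - p for h, p in p_heads.items()}
    belief = bayes_update(belief, likelihood)  # posterior becomes the new prior
    print(flip, {h: round(b, 3) for h, b in belief.items()})
```

Three heads move the biased hypothesis from 0.1 to about 0.31; updating one flip at a time gives the same posterior as applying the formula once to all three flips.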
Confounding - Wikipedia, the free encyclopedia

Apr 28, 10

en.wikipedia.org/...Confounding_variable statistics
- In statistics, a confounding variable (also confounding factor, lurking variable, a confound, or confounder) is an extraneous variable in a statistical model that correlates (positively or negatively) with both the dependent variable and the independent variable.
- For example, assume that a child's weight and a country's gross domestic product (GDP) rise with time. A person carrying out an experiment could measure weight and GDP, and conclude that a higher GDP causes children to gain weight, or that children's weight gain boosts the GDP. However, the confounding variable, time, was not accounted for, and is the real cause of both rises.
  
  A further example is the statistical relationship between ice cream sales and drowning deaths. When these variables are entered into a statistical analysis, they will show a positive and potentially statistically significant correlation. However, it would be a mistake to infer a causal relationship (i.e., that ice cream causes drowning), because of the presence of an important confounding variable which causes both ice cream sales and an increase in drowning deaths: summertime.
Correlation and dependence - Wikipedia, the free encyclopedia

Apr 28, 10

en.wikipedia.org/Correlate correlation dependence statistics
- In statistics, correlation and dependence are any of a broad class of statistical relationships between two or more random variables or observed data values.
- Formally, dependence refers to any situation in which random variables do not satisfy a mathematical condition of probabilistic independence. In general statistical usage, correlation or co-relation can refer to any departure of two or more random variables from independence, but most commonly refers to a more specialized type of relationship between mean values. There are several correlation coefficients, often denoted ρ or r, measuring the degree of correlation.
Correlation does not imply causation - Wikipedia, the free encyclopedia

Apr 28, 10

en.wikipedia.org/...ation_does_not_imply_causation causation correlation dependence fallacy
- "Correlation does not imply causation" is a phrase used in science and statistics to emphasize that a correlation between two variables does not automatically imply that one causes the other (although correlation can still be a hint, strong or weak, of a causal relationship).
- Empirically observed covariation is a necessary but not sufficient condition for causality.
- Intuitively, causation seems to require not just a correlation, but a counterfactual dependence. Suppose that a student performed poorly on a test and guessed that the cause was not studying. To prove this, one thinks of the counterfactual - the same student writing the same test under the same circumstances but having studied the night before. If one could rewind history and change only one small thing (making the student study for the exam), then causation could be observed (by comparing version 1 to version 2). Because one cannot rewind history and replay events after making small controlled changes, causation can only be inferred, never exactly known. This is referred to as the Fundamental Problem of Causal Inference: it is impossible to directly observe causal effects.
Frequency probability - Wikipedia, the free encyclopedia

Apr 27, 10

en.wikipedia.org/...Frequency_probability frequency probability
- Frequency probability is the interpretation of probability that defines an event's probability as the limit of its relative frequency in a large number of trials. The development of the frequentist account was motivated by the problems and paradoxes of the previously dominant viewpoint, the classical interpretation.
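A small Python simulation of the frequentist definition. The event (two fair dice summing to 7, whose exact probability is 6/36 = 1/6), the seed and the trial counts are arbitrary choices for illustration; the relative frequency is the estimate, and it settles near 1/6 as the number of trials grows. The function name is our own.

```python
import random


def relative_frequency_of_seven(n_trials, seed=1):
    """Fraction of n_trials rolls of two fair dice whose faces sum to 7."""
    rng = random.Random(seed)  # private generator, so results are reproducible
    hits = sum(1 for _ in range(n_trials) if rng.randint(1, 6) + rng.randint(1, 6) == 7)
    return hits / n_trials


for n in (60, 600, 6_000, 60_000):
    print(f"{n:>6} trials: relative frequency = {relative_frequency_of_seven(n):.4f}")
print(f"exact probability     = {1 / 6:.4f}")
```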
Pearson product-moment correlation coefficient - Wikipedia, the free encyclopedia

Apr 28, 10

en.wikipedia.org/...moment_correlation_coefficient co correlation dependence statistics
- In statistics, the Pearson product-moment correlation coefficient (sometimes referred to as the PMCC, and typically denoted by r) is a measure of the correlation (linear dependence) between two variables X and Y, giving a value between +1 and −1 inclusive. It is widely used in the sciences as a measure of the strength of linear dependence between two variables.
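The coefficient is the covariance of X and Y divided by the product of their standard deviations, $r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}$. Below is a from-scratch Python sketch of that definition (the name `pearson_r` is our own); the 1/n factors cancel, so plain sums of deviations suffice.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))        # perfect positive line: 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))        # perfect negative line: -1.0
print(pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 0.8
```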
P-value - Wikipedia, the free encyclopedia

Apr 27, 10

en.wikipedia.org/P-value statistics probability
- In statistical hypothesis testing, the p-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.
  
  The lower the p-value, the less likely the result is if the null hypothesis is true, and consequently the more "significant" the result is, in the sense of statistical significance. One often rejects a null hypothesis if the p-value is less than 0.05 or 0.01, corresponding to a 5% or 1% chance respectively of an outcome at least that extreme, given the null hypothesis. The p-value is sometimes called the observed significance level of a test, but it is not the probability of making a Type I error: the Type I error rate is controlled by the significance level α, which is chosen before the data are seen.
- For example, an experiment is performed to determine whether a coin flip is fair (50% chance of landing heads or tails) or biased (more than 50% chance of landing on one particular side).
  
  Suppose that the experimental results show the coin turning up heads 14 times out of 20 total flips. The p-value of this result would be the chance of a fair coin landing on heads at least 14 times out of 20 flips. The probability that 20 flips of a fair coin would result in 14 or more heads is 0.058, which is also called the p-value.
  
  Because there is no way to know what percentage of coins in the world are unfair, the p-value does not tell us whether the coin is unfair. It measures the chance that a fair coin gives such a result.
- Comparing the p-value to a significance level yields one of two results: either the null hypothesis is rejected, or the null hypothesis cannot be rejected at that significance level (which does not, however, imply that the null hypothesis is true). A small p-value that indicates statistical significance does not mean that an alternative hypothesis is ipso facto correct; additional tests, such as some "goodness of fit" tests, can be performed to make a more definitive statement about the validity of the null hypothesis.
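The coin example's p-value can be checked exactly. The Python sketch below sums binomial probabilities for 14 to 20 heads under the null hypothesis of a fair coin; it computes the one-sided value used in the example, and the function name is our own.

```python
from math import comb


def upper_tail_p_value(k, n, p=0.5):
    """One-sided p-value P(X >= k) for X ~ Binomial(n, p), with p given by the null."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p_value = upper_tail_p_value(14, 20)
print(f"p-value = {p_value:.3f}")  # p-value = 0.058
print("reject at 0.05?", p_value < 0.05)  # False: 0.058 is above the 5% level
```

Since 0.058 exceeds 0.05, the null hypothesis of a fair coin would not be rejected at the 5% level. A two-sided test (bias towards either side) would also count 6 or fewer heads; because the null distribution is symmetric, that doubles the value to about 0.115.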
Statistical significance - Wikipedia, the free encyclopedia

Apr 27, 10

en.wikipedia.org/...Statistical_significance statistics probability
- In statistics, a result is called statistically significant if it is unlikely to have occurred by chance.
- The amount of evidence required to accept that an event is unlikely to have arisen by chance is known as the significance level or critical p-value: in traditional Fisherian statistical hypothesis testing, the p-value is the probability, conditional on the null hypothesis, of the observed data or more extreme data. If the obtained p-value is small, then it can be said that either the null hypothesis is false or an unusual event has occurred. It is worth stressing that p-values do not have any repeat sampling interpretation.
Laws and Theorems
Infinite monkey theorem - Wikipedia, the free encyclopedia

Apr 27, 10

en.wikipedia.org/...Infinite_monkey_theorem infinity monkey randomness statistics theorem thought experiment probability
- The infinite monkey theorem states that a monkey hitting keys at random on a typewriter keyboard for an infinite amount of time will almost surely type a given text, such as the complete works of William Shakespeare.
- The probability of a monkey exactly typing a complete work such as Shakespeare's Hamlet is so tiny that the chance of it occurring during a period of time of the order of the age of the universe is minuscule, but not zero.
- In a 1939 essay entitled "The Total Library", Argentine writer Jorge Luis Borges traced the infinite-monkey concept back to Aristotle's Metaphysics.
- In 2003, lecturers and students from the University of Plymouth MediaLab Arts course used a £2,000 grant from the Arts Council to study the literary output of real monkeys. They left a computer keyboard in the enclosure of six Celebes Crested Macaques in Paignton Zoo in Devon in England for a month, with a radio link to broadcast the results on a website. One researcher, Mike Phillips, defended the expenditure as being cheaper than reality TV and still "very stimulating and fascinating viewing".^[1]
  
  The monkeys produced nothing but five pages^[24] consisting largely of the letter S; moreover, the lead male began by bashing the keyboard with a stone, and the monkeys continued by urinating and defecating on it. The zoo's scientific officer remarked that the experiment had "little scientific value, except to show that the 'infinite monkey' theory is flawed". Phillips said that the artist-funded project was primarily performance art, and they had learned "an awful lot" from it. He concluded that monkeys "are not random generators. They're more complex than that. … They were quite interested in the screen, and they saw that when they typed a letter, something happened. There was a level of intention there."^[1]^[25]
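The arithmetic behind the theorem can be sketched in Python under simplifying assumptions that are not in the source: a 26-key keyboard, uniformly random and independent keystrokes, and the text attempted in separate, non-overlapping tries. One try of an n-letter text succeeds with probability (1/26)^n, and the chance of at least one success in m tries, 1 - (1 - p)^m, climbs towards 1 as m grows, which is the sense of "almost surely". The function names and the 100,000-letter length are hypothetical.

```python
import math


def p_one_attempt(text_length, keys=26):
    """Probability that text_length random keystrokes reproduce one fixed text."""
    return keys ** -text_length


def p_at_least_once(p, attempts):
    """1 - (1 - p)**attempts, computed stably when p is tiny."""
    return -math.expm1(attempts * math.log1p(-p))


p = p_one_attempt(6)  # a 6-letter word such as "banana"
for attempts in (10**6, 10**9, 10**10):
    print(f"{attempts:.0e} attempts: P(success) = {p_at_least_once(p, attempts):.4f}")

# For a book-length text the probability underflows a float, so work in log10.
text_length = 100_000  # hypothetical length, not the actual length of Hamlet
print(f"log10 P(one attempt) = {-text_length * math.log10(26):.0f}")
```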
Law of large numbers - Wikipedia, the free encyclopedia

Apr 27, 10

en.wikipedia.org/...Law_of_Large_Numbers statistics probability
- In probability theory, the law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.
- For example, when a fair coin is flipped once, the expected value of the number of heads is equal to one half. Therefore, according to the law of large numbers, the proportion of heads in a large number of coin flips should be roughly one half. In particular, the proportion of heads after n flips will almost surely converge to one half as n approaches infinity.
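A seeded Python simulation of the coin example; the seed and the checkpoints are arbitrary. The running proportion of heads wanders for small n and settles near one half as n grows. A simulation can only illustrate the convergence, not prove it.

```python
import random


def heads_proportions(n_flips, checkpoints, seed=42):
    """Proportion of heads after each checkpoint number of fair-coin flips."""
    rng = random.Random(seed)
    heads = 0
    proportions = {}
    for i in range(1, n_flips + 1):
        if rng.random() < 0.5:  # heads with probability 1/2
            heads += 1
        if i in checkpoints:
            proportions[i] = heads / i
    return proportions


props = heads_proportions(100_000, {10, 100, 1_000, 10_000, 100_000})
for n, prop in sorted(props.items()):
    print(f"{n:>7} flips: proportion of heads = {prop:.4f}")
```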
Fallacies
Gambler's fallacy - Wikipedia, the free encyclopedia

Apr 27, 10

en.wikipedia.org/...Gambler%27s_fallacy fallacy statistics probability
- The gambler's fallacy, also known as the Monte Carlo fallacy (due to its significance in a Monte Carlo casino in 1913)^[1] or the fallacy of the maturity of chances, is the belief that if deviations from expected behaviour are observed in repeated independent trials of some random process then these deviations are likely to be evened out by opposite deviations in the future. For example, if a fair coin is tossed repeatedly and tails comes up a larger number of times than is expected, a gambler may incorrectly believe that this means that heads is more likely in future tosses.^[2] Such an expectation could be mistakenly referred to as being due. This is an informal fallacy. It is also known colloquially as the law of averages.
- The reverse is also a fallacy, the reverse gambler's fallacy, in which a gambler instead decides that tails is more likely, out of some mystical preconception that fate has so far favored tails; the false conclusion being: why would it change if the odds favor tails? Again, the fallacy is the belief that the "universe" somehow carries a memory of past results that tends to favor or disfavor future outcomes.
- The probability of getting 20 heads then 1 tail and the probability of getting 20 heads then another head are both 1 in 2,097,152. Therefore, when flipping a fair coin 21 times, 21 heads is exactly as likely as 20 heads followed by 1 tail. Furthermore, both are as likely as any other 21-flip sequence (there are 2,097,152 in total): every 21-flip sequence has probability 0.5²¹, or 1 in 2,097,152. From these observations, there is no reason at any point to expect a change of luck based on prior trials (flips), because every observed outcome was exactly as likely as each of the outcomes not observed for that trial, given a fair coin. Therefore, as the independence of the flips implies, the result of each trial comes down to the base probability of the fair coin: $\frac{1}{2}$.
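Both claims can be checked in Python: exact arithmetic with `fractions.Fraction` for the 21-flip probabilities, and a seeded simulation (seed, flip count and streak length are arbitrary choices) that looks at the flip immediately following every run of five heads. If the fallacy were right, tails would come up noticeably more than half the time after such runs. The function name is our own.

```python
import random
from fractions import Fraction

# Exact probabilities of two specific 21-flip sequences of a fair coin.
p_21_heads = Fraction(1, 2) ** 21
p_20_heads_then_tail = Fraction(1, 2) ** 20 * Fraction(1, 2)
print(p_21_heads, p_20_heads_then_tail, p_21_heads == p_20_heads_then_tail)


def heads_rate_after_streak(streak, n_flips=200_000, seed=7):
    """Share of heads on the flip right after `streak` consecutive heads."""
    rng = random.Random(seed)
    flips = [rng.random() < 0.5 for _ in range(n_flips)]  # True means heads
    following = [flips[i] for i in range(streak, n_flips) if all(flips[i - streak:i])]
    return sum(following) / len(following), len(following)


rate, cases = heads_rate_after_streak(5)
print(f"heads after five heads in a row: {rate:.3f} ({cases} cases)")
```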
Prosecutor's fallacy - Wikipedia, the free encyclopedia

Apr 26, 10

en.wikipedia.org/...Prosecutor%27s_fallacy conditional probability fallacy statistics probability
- One form of the fallacy results from misunderstanding conditional probability, or neglecting the prior odds of a defendant being guilty; i.e., the chance an individual might be guilty even though there's no evidence directly implicating him/her. When a prosecutor has collected some evidence (for instance a DNA match) and has an expert testify that the probability of finding this evidence if the accused were innocent is tiny, the fallacy occurs if it is concluded that the probability of the accused being innocent must be comparably tiny. The probability of innocence would only be the same small value if the prior odds of guilt were exactly 1:1. In reality the probability of guilt would depend on other circumstances. If the person is already suspected for other reasons, then the probability of guilt would be very high, whereas if he is otherwise totally unconnected to the case, then we should consider a much lower prior probability of guilt, such as the overall rate of offenders in the populace for the crime in question, and the probability of guilt would be much lower.
- Another form of the fallacy results from misunderstanding the idea of multiple testing, such as when evidence is compared against a large database. The larger the database, the more likely it is that a match will be found by pure chance alone. DNA evidence is therefore soundest when a match is found after a single directed comparison: when a test sample of poor quality (common for recovered evidence) is compared against a large database, matches arising by mere chance are very likely.
- Suppose there is a one-in-a-million chance of a match given that the accused is innocent. The prosecutor says this means there is only a one-in-a-million chance of innocence. But if everyone in a community of 10 million people is tested, one expects 10 matches even if everyone tested is innocent.
  
  The defendant's fallacy would be to say, "We would expect 10 matches in this city of 10 million people, so this particular piece of evidence suggests there is a 90% chance that the accused is innocent. So this evidence cannot be used to point to a conclusion of guilt, and should be excluded."
  
  The problem with the defendant's argument is that there may be other available evidence which is also inconclusive on its own but which, combined with the DNA match, could be strong.
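Bayes' theorem makes the gap between the two readings explicit. In the Python sketch below, the one-in-a-million match probability and the population of 10 million come from the example; giving the accused a prior of 1 in 10 million (no other evidence) and assuming that a guilty person always matches are simplifying assumptions of ours, and the function name is our own.

```python
def posterior_guilt(prior_guilt, p_match_if_guilty, p_match_if_innocent):
    """P(guilty | match) by Bayes' theorem."""
    numerator = p_match_if_guilty * prior_guilt
    return numerator / (numerator + p_match_if_innocent * (1 - prior_guilt))


population = 10_000_000
random_match = 1e-6  # P(match | innocent), from the example

# Accused otherwise unconnected: prior is one in the whole population.
unconnected = posterior_guilt(1 / population, 1.0, random_match)
# Prior odds of guilt of exactly 1:1.
even_odds = posterior_guilt(0.5, 1.0, random_match)

print(f"unconnected suspect: P(innocent | match) = {1 - unconnected:.3f}")
print(f"1:1 prior odds:      P(innocent | match) = {1 - even_odds:.2e}")
```

With no other evidence, the match leaves roughly a 91% chance of innocence, close to the defendant's figure; only with 1:1 prior odds does the probability of innocence shrink to about one in a million, which is the prosecutor's reading.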
Representativeness heuristic - Wikipedia, the free encyclopedia

Apr 27, 10

en.wikipedia.org/...Representativeness_heuristic cognitive bias fallacy frequency heuristic probability
- The representativeness heuristic is a rule of thumb wherein people judge the probability or frequency of a hypothesis by considering how much the hypothesis resembles available data as opposed to using a Bayesian calculation. While often very useful in everyday life, it can also result in neglect of relevant base rates and other cognitive biases. The representativeness heuristic was first proposed by Amos Tversky and Daniel Kahneman^[1]. In causal reasoning, the representativeness heuristic leads to a bias toward the belief that causes and effects will resemble one another (examples include both the belief that "emotionally relevant events ought to have emotionally relevant causes", and magical associative thinking)^[2].
Puzzling
Nontransitive dice - Wikipedia, the free encyclopedia

Apr 27, 10

en.wikipedia.org/...Nontransitive_dice dice probability
- Consider a set of three dice, A, B and C such that
  
  die A has sides {2,2,4,4,9,9},
  
  die B has sides {1,1,6,6,8,8}, and
  
  die C has sides {3,3,5,5,7,7}.
  
  Then:
  
  the probability that A rolls a higher number than B is 5/9 (≈ 55.56 %),
  
  the probability that B rolls a higher number than C is 5/9, and
  
  the probability that C rolls a higher number than A is 5/9.
- A set of nontransitive dice is a set of dice for which the relation "is more likely to roll a higher number" is not transitive.
- Wow! Really counter-intuitive!
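The three probabilities can be verified by enumerating all 36 equally likely face pairs for each match-up. Below is a short Python sketch with exact fractions; no ties can occur, since no face value appears on two different dice. The names `DICE` and `p_beats` are our own.

```python
from fractions import Fraction
from itertools import product

DICE = {
    "A": (2, 2, 4, 4, 9, 9),
    "B": (1, 1, 6, 6, 8, 8),
    "C": (3, 3, 5, 5, 7, 7),
}


def p_beats(x, y):
    """Probability that die x rolls higher than die y (all face pairs equally likely)."""
    wins = sum(1 for a, b in product(x, y) if a > b)
    return Fraction(wins, len(x) * len(y))


for first, second in [("A", "B"), ("B", "C"), ("C", "A")]:
    print(first, "beats", second, "with probability", p_beats(DICE[first], DICE[second]))
```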
Simpson's paradox - Wikipedia, the free encyclopedia

Apr 28, 10

en.wikipedia.org/...Simpson_paradox paradox probability ratio
- In probability and statistics, Simpson's paradox (or the Yule-Simpson effect) is an apparent paradox in which the successes in different groups seem to be reversed when the groups are combined.
- Suppose two people, Lisa and Bart, each edit Wikipedia articles for two weeks. In the first week, Lisa improves 60 percent of the 100 articles she edited, and Bart improves 90 percent of 10 articles he edited. In the second week, Lisa improves just 10 percent of 10 articles she edited, while Bart improves 30 percent of 100 articles he edited.
- Both times Bart improved a higher percentage of articles than Lisa, but the number of articles each edited (the denominator of each ratio, also known as the sample size) was not the same for the two of them in either week. When the totals for the two weeks are added together, Bart's and Lisa's work can be judged from an equal sample size, i.e. the same number of articles edited by each (110). Looked at in this more accurate manner, Lisa's ratio is higher and, therefore, so is her percentage. Equivalently, when the two weeks are combined using a weighted average (each week weighted by the number of articles edited), Lisa's overall percentage is much higher than Bart's, because most of her edits fell in the week with her high success rate.
- This apparent paradox arises when the percentage is given but not the underlying ratio. In this example, if only Bart's 90% for the first week were given but not the ratio (9 of 10), the information would be distorted, producing the apparent paradox. Even though Bart's percentage is higher in both the first and the second week, when the two weeks of articles are combined, Lisa has improved the greater proportion overall: 61 of the 110 total articles (about 55%), against Bart's 39 of 110 (about 35%).
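The reversal can be reproduced with exact fractions in Python. The (improved, edited) counts are derived from the percentages in the example (60% of 100 = 60, 90% of 10 = 9, and so on); the helper names are our own.

```python
from fractions import Fraction

# (articles improved, articles edited) per week, derived from the example
lisa = [(60, 100), (1, 10)]
bart = [(9, 10), (30, 100)]


def weekly_rates(weeks):
    """Success rate for each week separately."""
    return [Fraction(improved, edited) for improved, edited in weeks]


def combined_rate(weeks):
    """Pooled rate: total improved over total edited (an edit-weighted average)."""
    return Fraction(sum(i for i, _ in weeks), sum(e for _, e in weeks))


for week, (lisa_rate, bart_rate) in enumerate(zip(weekly_rates(lisa), weekly_rates(bart)), start=1):
    print(f"week {week}: Lisa {float(lisa_rate):.0%}, Bart {float(bart_rate):.0%}")
print(f"combined: Lisa {float(combined_rate(lisa)):.1%}, Bart {float(combined_rate(bart)):.1%}")
```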

List Info

Vlad V

20 items | 23 visits

Updated on Apr 28, 10
Created on Apr 27, 10

Category: Science