While there are a few existing online explanations of Bayes' Theorem, my experience with trying to introduce people to Bayesian reasoning is that the existing online explanations are too abstract. Bayesian reasoning is very counterintuitive. People do not employ Bayesian reasoning intuitively, find it very difficult to learn Bayesian reasoning when tutored, and rapidly forget Bayesian methods once the tutoring is over. This holds equally true for novice students and highly trained professionals in a field. Bayesian reasoning is apparently one of those things which, like quantum mechanics or the Wason Selection Test, is inherently difficult for humans to grasp with our built-in mental faculties.
Or so they claim. Here you will find an attempt to offer an intuitive explanation of Bayesian reasoning - an excruciatingly gentle introduction that invokes all the human ways of grasping numbers, from natural frequencies to spatial visualization. The intent is to convey, not abstract rules for manipulating numbers, but what the numbers mean, and why the rules are what they are (and cannot possibly be anything else). When you are finished reading this page, you will see Bayesian problems in your dreams.
One of the easiest ways to understand probabilities is to think of them in terms of Venn Diagrams. You basically have a Universe with all the possible outcomes (of an experiment for instance), and you are interested in some subset of them, namely some event. Say we are studying cancer, so we observe people and see whether they have cancer or not. If we take as our Universe all people participating in our study, then there are two possible outcomes for any particular individual, either he has cancer or not. We can then split our universe in two events: the event "people with cancer" (designated as A), and "people with no cancer" (or ~A). We could build a diagram like this:
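As a text stand-in for such a diagram, here is a minimal Python sketch of the same idea. The participant IDs and the membership of A are made up purely for illustration; the only point is that A and ~A split the universe, so their probabilities add up to 1.

```python
from fractions import Fraction

# Hypothetical study: participant IDs 1..20 form the universe.
universe = set(range(1, 21))

# Event A: participants with cancer (made-up IDs, for illustration only).
A = {3, 7, 12, 18}

# Event ~A: everyone in the universe who is not in A.
not_A = universe - A


def probability(event, universe):
    """Return the share of the universe occupied by the event."""
    return Fraction(len(event), len(universe))


p_A = probability(A, universe)
p_not_A = probability(not_A, universe)

print(p_A)                  # 1/5
print(p_not_A)              # 4/5
print(p_A + p_not_A == 1)   # True: A and ~A cover the whole universe
```

Using `Fraction` rather than floats keeps the arithmetic exact, which makes the "parts of a whole" reading of probability easier to see.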
I'm a programmer with a decent background in math and computer science. I've studied computability, graph theory, linear algebra, abstract algebra, algorithms, and a little probability and statistics (through a few CS classes) at an undergraduate level.
I feel, however, that I don't know enough about statistics. Statistics are increasingly useful in computing, with statistical natural language processing helping fuel some of Google's algorithms for search and machine translation, with performance analysis of hardware, software, and networks needing proper statistical grounding to be at all believable, and with fields like bioinformatics becoming more prevalent every day.
As a software engineer, I'm interested in topics such as statistical algorithms, data mining, machine learning, Bayesian networks, classification algorithms, neural networks, Markov chains, Monte Carlo methods, and random number generation.
I personally haven't had the pleasure of working hands-on with any of these techniques, but I have had to work with software that, under the hood, employed them and would like to know more about them, at a high level. I'm looking for books that cover a great breadth - great depth is not necessary at this point. I think that I can learn a lot about software development if I can understand the mathematical foundations behind the algorithms and techniques that are employed.
Can the Statistical Analysis community recommend books that I can use to learn more about implementing various statistical elements in software?
R is an elegant and comprehensive statistical and graphical programming language. Unfortunately, it can also have a steep learning curve. I created this website for both current R users, and experienced users of other statistical packages (e.g., SAS, SPSS, Stata) who would like to transition to R. My goal is to help you quickly access this language in your work.
I assume that you are already familiar with the statistical methods covered and instead provide you with a roadmap and the code necessary to get started quickly, and orient yourself for future learning. I designed this web site to be an easily accessible reference.
The purpose of a measure of similarity is to compare two lists of numbers (i.e. vectors) and compute a single number that evaluates their similarity. Most measures were developed in the context of comparing pairs of variables (such as income or attitude toward abortion) across cases (such as respondents in a survey). In other words, the objective is to determine to what extent two variables co-vary, which is to say, have the same values for the same cases.
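One common measure of this kind is the Pearson correlation coefficient. The sketch below is only an illustration of "two vectors in, one number out"; the function name and the survey values are hypothetical and not taken from the linked material.

```python
import math


def pearson_correlation(x, y):
    """Return the Pearson correlation between two equal-length lists."""
    if len(x) != len(y):
        raise ValueError("both vectors must have the same number of cases")
    n = len(x)
    if n < 2:
        raise ValueError("at least two cases are required")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sum of products of deviations: how strongly the variables co-vary.
    co_deviation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations, used to rescale the result to [-1, 1].
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)

    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return co_deviation / math.sqrt(ss_x * ss_y)


# Made-up responses from five hypothetical survey respondents.
income = [20, 35, 50, 65, 80]
attitude = [1, 2, 3, 4, 5]

similarity = pearson_correlation(income, attitude)
print(similarity)  # close to 1: the two variables rise together
```

A value near 1 means the variables move together across cases, a value near -1 means they move in opposite directions, and a value near 0 means little linear relationship.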
The following lists of further reading are provided for each of the Core technical subjects. The exams for each subject will be based on the relevant syllabus and core reading, and the ActEd course material will be the main source of tuition for students.
This page is intended to assist students and professionals pursuing a career in the actuarial profession in preparing for the actuarial exams. The long-term objective is to provide textbooks for most of the exams offered by both the Society of Actuaries and the Casualty Actuarial Society. All these books are free of charge and are available to the public.
This is a General Statistics Curriculum E-Book, which includes Advanced-Placement (AP) materials. This is an Internet-based probability and statistics E-Book. The materials, tools and demonstrations presented in this E-Book would be very useful for an advanced-placement (AP) statistics educational curriculum. The E-Book was initially developed by the UCLA Statistics Online Computational Resource (SOCR). However, all statistics instructors, researchers and educators are encouraged to contribute to this project and improve the content of these learning materials.
There are four novel features of this specific Statistics E-Book. It is community-built, completely open-access (in terms of use and contributions), blends information technology, scientific techniques and modern pedagogical concepts, and is multilingual.
"Introduction to statistics. Will eventually cover all of the major topics in a first-year statistics course (not there yet!)"
This is a set of video lectures by Prof. Yaser S. Abu-Mostafa of Caltech on Statistical Learning Theory that accompany his book "Learning from Data". The topics covered in brief are:
1. Bayesian Learning
2. Bin Model
3. Data Snooping
4. Ensemble Learning
5. Gradient Descent
6. Learning Curves (Regression)
7. Neural Networks
8. Overfitting problem
9. Radial basis functions and Regularization
10. Support Vector Machines
11. VC Dimension
"The articles on the left provide an introduction to R for people who are already familiar with other programming languages."
This is a draft textbook on data analysis methods, intended for a one-semester course for advanced undergraduate students who have already taken classes in probability, mathematical statistics, and linear regression. Contents:
I. Regression and Its Generalizations
1. Regression Basics
2. The Truth about Linear Regression
3. Model Evaluation
4. Smoothing in Regression
5. Simulation
6. The Bootstrap
7. Weighting and Variance
8. Splines
9. Additive Models
10. Testing Regression Specifications
11. More about Hypothesis Testing
12. Logistic Regression
13. Generalized Linear Models and Generalized Additive Models
II. Multivariate Data, Distribution Estimates, and Latent Structure
14. Multivariate Distributions
15. Density Estimation
16. Relative Distributions and Smooth Tests
17. Principal Components Analysis
18. Factor Analysis
19. Mixture Models
20. Graphical Models
III. Causal Inference
21. Graphical Causal Models
22. Identifying Causal Effects
23. Estimating Causal Effects
24. Discovering Causal Structure
IV. Dependent Data
25. Time Series
26. Time Series with Latent Variables
27. Longitudinal, Spatial and Network Data
CIML is a set of introductory materials that covers most major aspects of modern machine learning (supervised learning, unsupervised learning, large margin methods, probabilistic modeling, learning theory, etc.). Its focus is on broad applications with a rigorous backbone. A subset can be used for an undergraduate course; a graduate course could probably cover the entire material and then some.
Welcome to our new online textbook on forecasting. This book is a replacement for Makridakis, Wheelwright and Hyndman (Wiley 1998).
This textbook is intended to provide a comprehensive introduction to forecasting methods and to present enough information about each method for readers to be able to use them sensibly. We don’t attempt to give a thorough discussion of the theoretical details behind each method, although the references at the end of each chapter will fill in many of those details. The book is written for three audiences: (1) people finding themselves doing forecasting in business when they may not have had any formal training in the area; (2) undergraduate students studying business; (3) MBA students doing a forecasting elective. We use it ourselves for a second-year subject for students undertaking a Bachelor of Commerce degree at Monash University, Australia.
"One of the first things a scientist hears about statistics is that there are two different approaches: frequentism and Bayesianism. Despite their importance, many scientific researchers never have the opportunity to learn the distinctions between them and the different practical approaches that result. The purpose of this post is to synthesize the philosophical and pragmatic aspects of the frequentist and Bayesian approaches, so that scientists like myself might be better prepared to understand the types of data analysis people do.
I'll start by addressing the philosophical distinctions between the views, and from there move to discussion of how these ideas are applied in practice, with some Python code snippets demonstrating the difference between the approaches."
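To give a rough flavor of the kind of contrast the post describes, here is a minimal Python sketch. It is not taken from the post: the coin-flip setting, the function names, and the Beta prior are all assumptions chosen for illustration. The frequentist answer is the maximum likelihood estimate of the heads probability; the Bayesian answer is the posterior mean under a Beta(prior_a, prior_b) prior.

```python
def mle_estimate(heads, flips):
    """Frequentist point estimate: the observed proportion of heads."""
    if flips <= 0:
        raise ValueError("at least one flip is required")
    return heads / flips


def posterior_mean(heads, flips, prior_a=1.0, prior_b=1.0):
    """Bayesian point estimate: mean of the Beta posterior.

    A Beta(prior_a, prior_b) prior combined with binomial data gives a
    Beta(prior_a + heads, prior_b + tails) posterior. The default
    Beta(1, 1) prior is uniform on [0, 1] (an assumption of this sketch).
    """
    return (heads + prior_a) / (flips + prior_a + prior_b)


# Hypothetical data: 7 heads in 10 flips.
heads, flips = 7, 10

print(mle_estimate(heads, flips))    # 0.7
print(posterior_mean(heads, flips))  # about 0.667, pulled toward the prior mean of 0.5
```

With more data the two estimates converge, because the prior's contribution to the posterior shrinks as the number of flips grows.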