As part of my plan for edging over to the field of predictive analytics and data mining, I’ve been brushing up on some statistics via Coursera and Udacity classes. In one of the Udacity classes (Intro to Statistics), the professor (Sebastian Thrun) presented the relationship between statistics and probability as follows:
The idea behind the diagram was that statistics uses data to infer causes, while probability predicts data (or outcomes) from possible causes.
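To make the two directions concrete, here is a minimal sketch using a coin flip. The coin, the flip counts, and the function names are my own illustrative assumptions, not something taken from the class. Going from cause to data (probability), we assume a fair coin and ask how likely a particular result is. Going from data to cause (statistics), we look at observed flips and estimate the coin's bias.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n flips of a coin with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def estimate_bias(flips):
    """Maximum-likelihood estimate of P(heads) from observed flips (1 = heads)."""
    return sum(flips) / len(flips)

# Probability: known cause (an assumed fair coin) -> predicted data.
p_seven_heads = binomial_pmf(7, 10, 0.5)

# Statistics: observed data (made-up flips) -> inferred cause.
observed = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
p_hat = estimate_bias(observed)

print(f"P(7 heads in 10 flips | fair coin) = {p_seven_heads:.4f}")
print(f"Estimated P(heads) from the data   = {p_hat:.2f}")
```

The same ten flips appear on both sides: in one case they are the thing being predicted, in the other they are the evidence.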
From the perspective of predictive analytics, this really resonated with me. Statistics are applied to historical data to determine which factors/attributes are highly correlated with an outcome of interest. From this, a predictive model can be built that takes those factors/attributes as input and estimates the probability of each outcome before it is actually realized.
For example, a famous bike retailer might apply statistics to historical data, including customer demographics and past purchasing behavior, to determine which attributes or factors (e.g., gender, annual income, education level, geography) have the highest correlation with bicycle purchases. A predictive model can then be created that takes these highly correlated factors/attributes as input and spits out the probability of a bike purchase for new customers.
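Here is a rough sketch of what those two steps could look like in code. Everything in it is an assumption on my part: the records are made up, the two attributes (annual income and commute distance) are just stand-ins, and logistic regression is only one of many models a retailer might pick. The first step computes the correlation of each attribute with past purchases; the second fits a small model and scores new customers.

```python
import math

# Made-up historical records, purely for illustration:
# (annual income in $10k, commute distance in miles, bought a bike? 1/0)
history = [
    (3.0, 12.0, 0), (4.0, 9.0, 0), (5.0, 10.0, 0), (6.0, 4.0, 1),
    (7.0, 6.0, 1), (8.0, 3.0, 1), (4.5, 2.0, 1), (3.5, 11.0, 0),
    (9.0, 8.0, 1), (5.5, 7.0, 0),
]
incomes = [r[0] for r in history]
commutes = [r[1] for r in history]
bought = [r[2] for r in history]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(rows, labels, lr=0.01, epochs=5000):
    """Fit logistic regression with plain batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            err = sigmoid(z) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def purchase_probability(customer, weights, bias):
    """Predicted probability that a customer (income, commute) buys a bike."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, customer)))

# Step 1 (statistics): which attributes move together with purchases?
income_corr = pearson(incomes, bought)
commute_corr = pearson(commutes, bought)

# Step 2 (probability): fit a model, then score hypothetical new customers.
weights, bias = fit_logistic([(r[0], r[1]) for r in history], bought)
likely = purchase_probability((8.0, 3.0), weights, bias)
unlikely = purchase_probability((3.0, 12.0), weights, bias)

print(f"corr(income, purchase)  = {income_corr:+.2f}")
print(f"corr(commute, purchase) = {commute_corr:+.2f}")
print(f"P(purchase | income 8, commute 3)  = {likely:.2f}")
print(f"P(purchase | income 3, commute 12) = {unlikely:.2f}")
```

With only ten invented rows the numbers mean nothing on their own; the point is the shape of the workflow, where correlation on historical data guides which inputs the model gets.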
Pretty cool – I think 🙂