
Is it going to rain today?

July 28, 2018 | by ojhadiwesh@gmail.com


Every once in a while I try to guess whether it is going to rain today or not… and almost every time I am wrong. My grandmother has that superpower, and she doesn’t even know about Bayes’ Theorem. In data science, Bayes’ Theorem is one of the most useful rules from statistics when we are predicting something. So how does conditional probability help us make better predictions?

Bayes’ theorem gives us a way to update our predictions as and when we find new information related to the event that concerns us. Pivoting our decision as new information arrives is essential for making effective decisions, whether that is closing a major business deal, identifying cancer cells in a patient, evaluating an investment opportunity, or selecting an advertising campaign. So let’s try to understand how exactly conditional probability works.

The final formula is Bayes’ Theorem, P(A|B) = P(B|A) * P(A) / P(B), but before going into it let’s first understand joint and conditional probability. Joint probability is the probability of two or more events happening at the same time; it is represented by the intersection of two events in a Venn diagram.

The probability of picking a card that is both red and a 6 from a deck of 52 cards is P(6 ∩ Red) = 2/52 = 1/26. If there are two events A and B, the joint probability is written P(A ∩ B). Conditional probability is the probability that event A will occur given that event B has occurred, written P(A|B). For example, the probability of getting a 6 given that I have picked a red card is P(6|Red) = 2/26 = 1/13, since there are two 6’s among the 26 red cards.

We can use conditional probability to calculate joint probability: P(6 ∩ Red) = P(6|Red) * P(Red) = 1/13 * 26/52 = 1/26, which matches our earlier answer. Now, the probability of getting a 6 given that I have picked a red card is not the same as the probability of getting a red card given that I have picked a 6 (I leave it to you to verify that), but the joint probability of getting a 6 and a red card is the same as that of getting a red card and a 6: P(6 ∩ Red) = P(6|Red) * P(Red) = P(Red ∩ 6) = P(Red|6) * P(6). Rearranging this leads us to Bayes’ theorem: P(6|Red) = [P(Red|6) * P(6)] / P(Red).
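To make the card example concrete, here is a minimal Python sketch (the deck representation and helper names are my own, purely for illustration) that counts outcomes in a 52-card deck and verifies the joint and conditional probabilities above, plus the Bayes relation between them.

```python
from fractions import Fraction
from itertools import product

# Build a 52-card deck as (rank, suit) pairs.
ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['hearts', 'diamonds', 'clubs', 'spades']  # hearts and diamonds are red
deck = list(product(ranks, suits))

def is_six(card):
    return card[0] == '6'

def is_red(card):
    return card[1] in ('hearts', 'diamonds')

def prob(event, space):
    """P(event) as an exact fraction over the given sample space."""
    return Fraction(sum(1 for c in space if event(c)), len(space))

p_red           = prob(is_red, deck)                                  # 26/52 = 1/2
p_six           = prob(is_six, deck)                                  # 4/52  = 1/13
p_six_and_red   = prob(lambda c: is_six(c) and is_red(c), deck)       # 2/52  = 1/26
p_six_given_red = prob(is_six, [c for c in deck if is_red(c)])        # 2/26  = 1/13
p_red_given_six = prob(is_red, [c for c in deck if is_six(c)])        # 2/4   = 1/2

# The joint probability via the chain rule matches the direct count ...
assert p_six_and_red == p_six_given_red * p_red == p_red_given_six * p_six
# ... and Bayes' theorem recovers P(6|Red) from P(Red|6).
assert p_six_given_red == p_red_given_six * p_six / p_red

print(p_six_and_red, p_six_given_red, p_red_given_six)  # 1/26 1/13 1/2
```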

Let’s call event A “rain today” and event B “cloudy today”. We are interested in the probability of rain given that it is cloudy today, P(A|B). P(A) is the prior probability: the probability of rain when we have no other information, a kind of default value. P(B|A) is the likelihood, which is a way to weigh the evidence given the hypothesis: the probability that it is cloudy when we know it has rained (this value comes mostly from historical data). P(A|B) is called the posterior probability, the one we are interested in. P(B) is the marginal likelihood, which normalizes the result.
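The mapping from prior, likelihood, and marginal likelihood to the posterior can be written as a tiny helper; the function name here is my own and is only meant to mirror the terms above.

```python
def posterior(prior, likelihood, marginal):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).

    prior      -- P(A),   e.g. the probability of rain on any day
    likelihood -- P(B|A), e.g. the probability of clouds given that it rained
    marginal   -- P(B),   e.g. the probability of a cloudy day
    """
    return likelihood * prior / marginal
```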

Let us assume that we know the P(A), probability of rain to be 0.05 (5%), P(B)- the probability of the day being cloudy is 0.15 (15%). And we find out that 50% of the time rain has happened it was a cloudy day, so P(B|A) = 0.5. So using Bayes theorem, we can find out the P(A|B) = P(B| A) * P(A) / P(B) = 0.5 * 0.05/ 0.15 = 0.167. So now we know that there is 16.66% chance of rain if it is cloudy, so our predictive ability has improved almost 3 times. As we add more evidence to our mix our predictive power keeps on increasing. For example we if add the information of the humidity we would be able to increase the probability of predicting rain. However my grandmother just has a sense about it, I wonder if sixth sense could also be modeled ever!
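Plugging the numbers above into the formula (the values are taken straight from this paragraph, not from real weather data) gives the same posterior:

```python
p_rain   = 0.05  # P(A): prior probability of rain
p_cloudy = 0.15  # P(B): probability of a cloudy day
p_cloudy_given_rain = 0.5  # P(B|A): it was cloudy on half the days it rained

# Bayes' theorem: P(rain | cloudy) = P(cloudy | rain) * P(rain) / P(cloudy)
p_rain_given_cloudy = p_cloudy_given_rain * p_rain / p_cloudy
print(round(p_rain_given_cloudy, 4))  # 0.1667, i.e. about a 16.7% chance of rain
```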

There are many real-world applications and a lot of ongoing research. Wikipedia notes: “In Bayesian statistics, the recent development of Markov chain Monte Carlo methods has been a key step in making it possible to compute large hierarchical models that require integrations over hundreds or even thousands of unknown parameters.”
