Excel Applications: Naive Bayes AI (Text Classification) – part 1 of 2
Artificial intelligence is pervasive in our world today. From the cars we drive to the food we eat, our world is increasingly defined by algorithms and the formulas of which they are composed. Because of the diversity of possible applications of artificial intelligence, cottage industries have cropped up attempting to apply AI models to virtually every facet of modern life. Although these applications are quite modern, the underlying techniques themselves predate them by generations.
One workhorse of artificial intelligence algorithms derives, interestingly enough, from the work of an 18th-century minister and statistician, Thomas Bayes. His notes were published after his death, laying out the foundations of Bayes’ Theorem, which would later find new purpose in the world of machine learning. Separately, Bayes’ Theorem enjoyed a renaissance after the 2008 financial crisis, when “fat tails” and “black swans” became common parlance in mainstream academia. The simple premise underlying the statistical jargon is that prior knowledge informed by previous observation (the “Bayesian prior”) should be accounted for when making predictions about the future.
Think about recent flooding. Climatologists often proclaim, to much fanfare, that once-in-a-hundred-year floods, hurricanes, earthquakes, and other natural phenomena are occurring far more often than their estimated likelihood would predict, and thus that their policy recommendations should take precedence over those of their more muted peers. Bayesian thinking, including the incorporation of “priors,” might instead indicate that when two massive wildfires occur in consecutive years, it is more likely that the model is inaccurate than that a near-impossible event has simply happened twice. This kind of thinking pervades many spheres. The financial meltdown of 2008 rested in part (how many parts there actually were is the subject of a whole subset of literature) on the erroneous notion that risk in the sub-prime housing market was not systemic. Yet observation of prior banking crises indicates that systemic risk is in fact an endemic part of the banking sector. In any case, we will see that Bayesian thinking has become a central feature of the world of statistics.
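The Bayesian argument about model inaccuracy can be made concrete with a worked update. The Python sketch below is only an illustration of the mechanics: every number in it (a 5% prior that the model underestimates risk, a hypothetical "really once-in-ten-years" alternative, and the assumption that years are independent) is invented for the example and not drawn from any climate data. The function name `posterior` is likewise hypothetical.

```python
from fractions import Fraction


def posterior(prior_h, likelihood_h, likelihood_alt):
    """Bayes' theorem for a hypothesis H against its complement:
    P(H | data) = P(data | H) P(H) / [P(data | H) P(H) + P(data | not H) P(not H)]
    """
    numerator = likelihood_h * prior_h
    evidence = numerator + likelihood_alt * (1 - prior_h)
    return numerator / evidence


# Hypothetical numbers, chosen only for illustration.
prior_model_wrong = Fraction(1, 20)         # 5% prior belief that the model underestimates risk
annual_p_if_model_right = Fraction(1, 100)  # the "once-in-a-hundred-year" event
annual_p_if_model_wrong = Fraction(1, 10)   # hypothetical: the event is really once-in-ten-years

# Two such events in consecutive years, assuming the years are independent.
p_data_if_wrong = annual_p_if_model_wrong ** 2  # 1/100
p_data_if_right = annual_p_if_model_right ** 2  # 1/10000

p_wrong = posterior(prior_model_wrong, p_data_if_wrong, p_data_if_right)
print(f"P(model wrong | two events) = {p_wrong} ≈ {float(p_wrong):.3f}")
# prints: P(model wrong | two events) = 100/119 ≈ 0.840
```

Even starting from a small prior, two rare events in a row shift most of the posterior weight toward "the model is wrong," because under these assumed numbers the observed data are a hundred times more likely under that hypothesis.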
Probability theory undergirds Bayes’ theorem and the naive Bayes AI applications used in Excel. John Foreman’s Data Smart gives an excellent primer on probability theory and is worth a look for anyone wanting a quick refresher on the probability behind AI models. For our purposes, it’s worth remembering that p(A|B) is the probability of A given B. This is known as a conditional probability, because it is the probability that A occurs on the condition that B occurs. When events are independent, knowing that B occurred tells us nothing about A, so p(A|B) = p(A), and the probability that both occur is simply the product of their individual probabilities: if p(A) = .5 and p(B) = .5, the chance that both A and B occur is .5 x .5 = .25. Understanding probability theory will help in working through Naive Bayes models, particularly with regard to sentiment analysis applications in Excel.
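The independence rule can be checked by brute-force enumeration. The Python sketch below uses two fair coin flips as an assumed example sample space; the helper names `prob` and `cond_prob` are hypothetical and not taken from Data Smart. It defines p(A|B) = p(A and B) / p(B) and confirms that for two independent events with probability .5 each, the joint probability is .25 and conditioning on B leaves p(A) unchanged.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair coin flips, each of the four outcomes equally likely.
outcomes = list(product("HT", repeat=2))


def prob(event):
    """Probability of an event (a predicate on outcomes) in a uniform sample space."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def cond_prob(event_a, event_b):
    """Conditional probability p(A|B) = p(A and B) / p(B)."""
    return prob(lambda o: event_a(o) and event_b(o)) / prob(event_b)


def first_heads(o):   # event A: the first flip is heads
    return o[0] == "H"


def second_heads(o):  # event B: the second flip is heads
    return o[1] == "H"


p_a, p_b = prob(first_heads), prob(second_heads)
p_a_and_b = prob(lambda o: first_heads(o) and second_heads(o))

print(p_a, p_b, p_a_and_b)                   # 1/2 1/2 1/4
print(cond_prob(first_heads, second_heads))  # 1/2, the same as p(A)
```

Using `Fraction` keeps the arithmetic exact, so the product rule p(A and B) = p(A) x p(B) can be compared with `==` rather than with a floating-point tolerance.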
The model used for our purposes will allow us to perform text classification. Text classification in the modern sense allows us to rapidly and effectively judge which class a piece of text belongs to. Sentiment analysis, a subset of text classification, is often used by advertisers to gauge Twitter reactions and customer reviews. Text classification spans a broad range of applications, however, from military intelligence to targeted advertising to data mining.
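To make the idea concrete, here is a minimal multinomial naive Bayes text classifier in Python. It is a sketch under stated assumptions, not necessarily the spreadsheet model used in this series: it assumes whitespace tokenization, additive (Laplace) smoothing with `alpha=1`, and log probabilities to avoid underflow. The function names `train` and `classify` and the four-line training set are hypothetical. The "naive" part is the independence assumption from the previous section: words are treated as conditionally independent given the class, so their probabilities multiply (or add, in log space).

```python
import math
from collections import Counter


def train(docs):
    """docs: list of (text, label) pairs. Returns class priors, per-class word counts, vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in docs:
        word_counts[label].update(text.lower().split())
    vocab = set().union(*word_counts.values())
    priors = {label: n / len(docs) for label, n in class_counts.items()}
    return priors, word_counts, vocab


def classify(text, priors, word_counts, vocab, alpha=1.0):
    """Return the label with the highest log p(class) + sum of log p(word | class).
    alpha is additive smoothing so an unseen word does not zero out a class."""
    scores = {}
    for label, prior in priors.items():
        total = sum(word_counts[label].values())
        score = math.log(prior)
        for word in text.lower().split():
            count = word_counts[label][word]  # Counter returns 0 for unseen words
            score += math.log((count + alpha) / (total + alpha * len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy training set, invented for illustration.
training = [
    ("love this app great service", "positive"),
    ("great update love it", "positive"),
    ("terrible app crashes constantly", "negative"),
    ("awful service hate the update", "negative"),
]

model = train(training)
print(classify("love the service", *model))   # positive
print(classify("app crashes awful", *model))  # negative
```

Working in log space turns the long product of small per-word probabilities into a sum, which avoids numerical underflow on longer texts while selecting the same winning class.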