Excel Applications: Naive Bayes AI (Text Classification) – part 1 of 2
Artificial intelligence is pervasive in our world today. From the cars we drive to the food we eat, our world is increasingly defined by algorithms and the formulas of which they are composed. Because AI can be applied in so many ways, cottage industries have cropped up attempting to apply AI models to virtually every facet of modern life. Although the applications of these techniques are quite modern, the mathematics underlying them is centuries old.
One of the workhorses of artificial intelligence is derived, interestingly enough, from the work of an 18th-century clergyman and statistician, Thomas Bayes. His notes were published after his death, laying out the foundations of Bayes’ Theorem, which would find new purpose in the world of machine learning. Separately, Bayesian thinking experienced a renaissance after the 2008 financial crisis, when “fat tails” and “black swans” entered common parlance in mainstream academia. The simple premise underlying the statistical jargon is that prior knowledge (a “Bayesian prior”), informed by previous observation, should be accounted for when making predictions about the future.
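For reference, here is the theorem in its standard form, using the p(A|B) notation for conditional probability. p(A) is the prior, p(B|A) is the likelihood of observing evidence B if A is true, p(B) is the overall probability of the evidence, and p(A|B) is the updated (posterior) belief in A after seeing B:

```latex
p(A \mid B) = \frac{p(B \mid A)\, p(A)}{p(B)}
```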
Think about recent natural disasters. Climatologists often proclaim, to much fanfare, that once-in-a-hundred-year hurricanes, earthquakes, and other natural phenomena are occurring far more often than their modeled likelihood suggests, and thus that their policy recommendations should take precedence over those of their more muted peers. Bayesian thinking, by incorporating “priors,” might instead suggest that when two massive wildfires occur in consecutive years, the more likely explanation is that the model underestimated their frequency, not that a near-impossible event happened twice. This kind of thinking applies in many spheres. The financial meltdown of 2008 was caused in part (how many parts there actually were is the subject of a whole subset of literature) by the erroneous notion that risk in the sub-prime housing market was not systemic. Observation of prior banking crises, however, indicates that systemic risk is an endemic part of the banking sector. As we will see, Bayesian thinking has become central to the world of statistics.
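A toy calculation makes the “model is probably wrong” argument concrete. Every number below is hypothetical and chosen only for illustration: hypothesis H1 says the model is right and the event has a 1% chance per year; hypothesis H2 says the model underestimates and the true chance is 10% per year. We start 90% confident in the model, observe the event in two consecutive years (treated as independent), and apply Bayes’ Theorem. The `posterior` helper is a hypothetical name for this sketch.

```python
# All probabilities here are hypothetical, for illustration only.

def posterior(prior_h1, like_h1, like_h2):
    """Posterior probability of H2 given the evidence.

    Assumes H1 and H2 are the only two hypotheses, so p(H2) = 1 - p(H1).
    """
    prior_h2 = 1.0 - prior_h1
    # p(evidence) by the law of total probability
    evidence = prior_h1 * like_h1 + prior_h2 * like_h2
    return prior_h2 * like_h2 / evidence

p_event_h1 = 0.01  # H1: model is right, 1% chance per year
p_event_h2 = 0.10  # H2: model underestimates, 10% chance per year

# Evidence: the event occurs in two consecutive, independent years.
like_h1 = p_event_h1 ** 2
like_h2 = p_event_h2 ** 2

p_model_wrong = posterior(0.9, like_h1, like_h2)
print(f"p(model underestimates | two events) = {p_model_wrong:.3f}")  # 0.917
```

Even starting from 90% confidence in the model, two “rare” events in a row push the posterior probability that the model is wrong above 90%.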
Probability theory undergirds Bayes’ theorem and the naive Bayes AI applications built in Excel. John Foreman’s Data Smart gives an excellent primer on probability theory and is worth a look for anyone wanting a quick refresher before tackling AI models. For our purposes, it’s worth remembering that p(A|B) is the probability of A given B. This is known as a conditional probability, because it is the probability that A occurs on condition that B occurs. When A and B are independent, knowing B tells us nothing about A, so p(A|B) = p(A), and the probability that both occur is simply the product of their individual probabilities: if p(A) = .5 and p(B) = .5, the chance that both A and B occur is .5 x .5 = .25. Understanding probability theory will help in working through naive Bayes models, particularly sentiment analysis applications in Excel.
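The independence example can be checked by brute force. This minimal sketch assumes two fair coin flips (event A: the first flip is heads; event B: the second flip is heads), enumerates the sample space, and computes p(A|B) directly from its definition, p(A and B) / p(B). Exact fractions avoid floating-point noise.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair coin flips, all four outcomes equally likely.
OUTCOMES = list(product("HT", repeat=2))

def prob(event):
    """Probability of an event (a predicate on outcomes) in a uniform sample space."""
    hits = sum(1 for outcome in OUTCOMES if event(outcome))
    return Fraction(hits, len(OUTCOMES))

def a(outcome):  # event A: first flip is heads
    return outcome[0] == "H"

def b(outcome):  # event B: second flip is heads
    return outcome[1] == "H"

def a_and_b(outcome):
    return a(outcome) and b(outcome)

p_a, p_b, p_ab = prob(a), prob(b), prob(a_and_b)
# Conditional probability by definition: p(A|B) = p(A and B) / p(B)
p_a_given_b = p_ab / p_b

print(p_a, p_b, p_ab, p_a_given_b)  # 1/2 1/2 1/4 1/2
```

Because the flips are independent, p(A|B) equals p(A), and the joint probability is the product .5 x .5.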
The model used for our purposes will allow us to perform text classification. Text classification assigns a piece of text to one of a set of predefined classes, allowing us to make judgments about large volumes of text rapidly and consistently. Sentiment analysis, a subset of text classification, is often used by advertisers to gauge Twitter reactions and customer reviews. Text classification spans a broad range of applications, however, from military intelligence to targeted advertising to data mining.
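To preview where this is heading, here is a minimal sketch of a multinomial naive Bayes text classifier in Python. It is not the Excel workflow from Data Smart. The training sentences, labels, and helper names (`tokenize`, `train`, `classify`) are hypothetical. The “naive” assumption is that words are conditionally independent given the class, which lets us multiply per-word probabilities. The sketch sums their logarithms instead to avoid numerical underflow, and it uses add-one (Laplace) smoothing so that a word never seen in a class does not zero out the whole product.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()

def train(docs):
    """docs: list of (text, label). Returns class priors, per-class word counts, vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in docs:
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    priors = {c: n / len(docs) for c, n in class_counts.items()}
    return priors, word_counts, vocab

def classify(text, priors, word_counts, vocab):
    """Return the label with the highest log posterior (up to a shared constant)."""
    scores = {}
    for label, prior in priors.items():
        total = sum(word_counts[label].values())
        score = math.log(prior)
        for word in tokenize(text):
            # Laplace (add-one) smoothing for words unseen in this class
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy training data, for illustration only.
training = [
    ("love this product great value", "positive"),
    ("great service would buy again", "positive"),
    ("terrible quality broke in a day", "negative"),
    ("awful support would not recommend", "negative"),
]
model = train(training)
print(classify("great product love it", model[0], model[1], model[2]))  # positive
print(classify("awful quality terrible", *model))                      # negative
```

With real data, the same counting and smoothing steps apply; only the tokenizer and the volume of training text change.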
The foundations of the euro have their roots in war. Although sovereignty, war, and currency are intertwined in ways beyond the scope of this series, the chain of causality in our story nonetheless begins after the Second World War. Following the reunification of Germany, the drumbeat of a unified Europe became a refrain that required a response. Germans, beginning with the Third Reich, developed a plan for a unified Europe that transcended the war-torn map they themselves had drawn. By the 1990s, though, this plan had materialized as a way to prevent the future possibility of war. If countries were intertwined in both economic and societal terms, so the thinking went, war would be an impossibility. The first, and most significant, step was unification through trade liberalization and a common currency. Rather than remaining separate countries, the eurozone would create a new sovereignty on the basis of federalism. As in the United States, the states within the eurozone would ultimately be superseded by a decision-making body that held sway over the decisions of its constituent members. Although most observers will admit this goal has not been achieved, whether it is even possible depends largely on whom you ask.
Optimal Currency Area (OCA), or: The Theory That Started It All
The idea underpinning the unification of the European area into a single currency union was born out of the work of Robert Mundell, who developed the theory of an Optimal Currency Area (OCA). The OCA, roughly likened to an area like the United States (although this is a point of contention that we will return to later), is one in which certain criteria are met that allow for the frictionless adoption of a single currency. Sovereign nations of course issue their own currencies, and this is roughly a layman’s approximation of an OCA. More formally, areas that experience symmetric shocks in their economies should peg (fix) their currencies to one another.
The theory of shocks and the reality of asymmetric shocks in times of crisis (like, for instance, a multi-trillion-dollar asset bubble in an ill-regulated market) is one we will return to, but these ideas have since helped to explain the problems that arise from a system of pegged currencies (see also: the Euro, Argentine peso to USD, the Tequila Crisis, Bretton Woods). In order to qualify as an optimal currency area, four primary conditions must be satisfied:
Perfect labor mobility: The first criterion for an OCA assumes that workers are interchangeable across the extent of the area. For this assumption to hold, workers’ preferences must not vary across regions. Language barriers must not exist, workers must possess the required skills, and prejudice has no place in the market. Under this scheme, no barriers to the freedom of movement have been erected and the costs of moving are negligible (think of transferring 401(k)s and switching driver’s licenses, or passports).
Capital mobility and wage/price flexibility: The second criterion assumes that supply and demand dictate the flow of capital throughout the region. It also assumes that wages and prices can adjust freely to the flow of workers (for example, in the absence of a minimum wage). Implicit in this assumption is a labor supply that is not artificially restricted, tying back to the notion of labor mobility.
Risk sharing that mitigates the adverse impacts of conditions (1) and (2): The third criterion requires a system of fiscal transfers that reduces the economic burden brought on by shifts in the flow of wages and workers across an OCA. In the United States, for instance, wealth is redistributed at both the federal and state levels, and certain states are net beneficiaries of federal dollars while others are net payers.
Similarity of business cycles: The fourth criterion assumes that areas within the OCA have similar business cycles. When the cycle trends upward in one area, that effect is similarly felt in all other parts of the optimal currency area. The reverse also holds: an adverse economic shock affects all areas equally. This is closely related to the notion of asymmetric shocks, which, as we will see later, disqualify certain areas from being truly optimal currency areas.
The list above constitutes the four main conditions that must be satisfied for an area to be considered an OCA, as originally set forth by Mundell. Other potential criteria include product diversification, homogeneous preferences, and solidarity. If the list above sounds to you more like textbook theory than the eurozone, you aren’t alone. However, the reasoning was sufficient to give proper context to the Maastricht Treaty, which started the domino effect of unification.
The Maastricht Treaty, formally the Treaty on European Union, provided the backbone of the EU. In order to consolidate the various forces pushing for union, the Maastricht Treaty (along with the Delors Commission) created the three pillars of the new European Union:
The European Communities (such as the ECSC)
The Common Foreign and Security Policy
The Police and Judicial Co-operation in Criminal Matters
The legacy of the three pillars can still be seen today, in some ways in greater detail, even though the Lisbon Treaty formally abolished the pillar structure in 2009. The ECSC in many ways acted as a precursor to the eurozone, while the third pillar foreshadowed Europol, the EU’s law-enforcement agency. Despite these beginnings, Europe remains somewhat distant from the ideals laid out by the three pillars. For instance, many scholars have lamented Europe’s failure to enact a comprehensive foreign policy in the face of the global winds of change (despite the second pillar).
The Ghost of Maastricht
In order for states to enter the eurozone, however, they first had to show that their economies had converged, with price stability chief among the goals. According to the Maastricht Treaty, this required meeting five criteria. The convergence criteria, or Maastricht criteria, were thought to constitute the necessary conditions for an optimal currency area’s creation. The five criteria were:
HICP inflation: Inflation, measured by the Harmonised Index of Consumer Prices (HICP), was not to exceed a reference value calculated as the unweighted arithmetic average of the HICP inflation rates of the three EU member states with the lowest HICP inflation, plus 1.5 percentage points. If, however, a state’s inflation rate was significantly below the average, it could be excluded from the lowest three. Whether it was excluded was determined by a loosely defined test of whether it had suffered from “exceptional factors.” What seems more exceptional is that this rule was implemented at all.
Government budget deficit: The government deficit was not to exceed 3% of GDP at the end of the preceding fiscal year. Although used as a convergence criterion, this rule was also supposed to remain strict and binding after entry; as we will see, however, it has been violated by nearly every eurozone country at one point or another.
Government debt-to-GDP ratio: Gross government debt was not to exceed 60% of GDP. While the budget deficit is a flow, debt is a stock, and this limit, too, was supposed to be inviolable. As time has progressed, however, most eurozone countries have gone well beyond the 60% mark, calling into question the ECB’s commitment to honoring the ground rules and the extent to which it is willing to stamp out “moral hazard.”
Exchange rate stability: The applicant country was not to have devalued its currency in the two years preceding its entry. Unlike the previous two rules, this rule (for obvious reasons) does not carry forward to the present day. However, the inability of countries like Cyprus and Greece to devalue their currencies while Germany amasses a large trade surplus features centrally in the eurozone’s present problems.
Long-term interest rates: Long-term interest rates were not to exceed by more than 2 percentage points the average of the three best-performing member states in terms of price stability, mirroring the wording of the first criterion. The foundational notion behind both the first and fifth criteria was that, post-convergence, inflation and interest rates for all eurozone countries would be exactly equal, as the market would view these countries’ bonds as perfect substitutes. As we will see, this erroneous line of thinking nearly caused a Grexit and, potentially, the collapse of the EU. Although that problem was staved off temporarily, the chief lesson from the 2008 financial crisis was that without fiscal controls available to the member nations, the assumptions behind the first and fifth criteria are fallacious enough to potentially end the eurozone experiment.
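The numeric criteria above reduce to simple threshold checks, which this sketch makes explicit. All country labels and figures are hypothetical, and `meets_criteria` is a hypothetical helper. Following the Treaty’s wording, both reference values are built from the same three best performers on price stability: inflation gets +1.5 percentage points and long-term rates get +2 percentage points. The sketch ignores the “exceptional factors” outlier exclusion and the qualifications the Treaty attaches to the fiscal criteria, so it is a simplification rather than the official assessment procedure.

```python
from statistics import mean

# Hypothetical figures for illustration only (percent); not real country data.
hicp_inflation = {"A": 1.0, "B": 1.2, "C": 1.5, "D": 2.4, "E": 3.9}
long_term_rate = {"A": 4.0, "B": 4.4, "C": 4.8, "D": 6.1, "E": 7.5}

# The three best performers in terms of price stability (lowest HICP inflation).
best_three = sorted(hicp_inflation, key=hicp_inflation.get)[:3]
inflation_ref = mean(hicp_inflation[c] for c in best_three) + 1.5
rate_ref = mean(long_term_rate[c] for c in best_three) + 2.0

def meets_criteria(country, deficit_pct_gdp, debt_pct_gdp, devalued_last_two_years):
    """Check the five convergence criteria for one hypothetical applicant.

    Simplified: no outlier exclusion and no discretionary judgment.
    """
    checks = {
        "inflation": hicp_inflation[country] <= inflation_ref,
        "deficit": deficit_pct_gdp <= 3.0,
        "debt": debt_pct_gdp <= 60.0,
        "exchange_rate": not devalued_last_two_years,
        "long_term_rate": long_term_rate[country] <= rate_ref,
    }
    return all(checks.values()), checks

ok, detail = meets_criteria("D", deficit_pct_gdp=2.5, debt_pct_gdp=55.0,
                            devalued_last_two_years=False)
print(round(inflation_ref, 2), round(rate_ref, 2), ok)  # 2.73 6.4 True
```

Note how the debt and deficit checks are absolute, while the inflation and interest-rate checks move with whichever countries happen to have the lowest inflation.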
The EU, founded upon the notion that unification would prevent war, has struggled since its inception to build a stable economic policy framework. Between the four assumptions made in equating the eurozone with an optimal currency area and the five assumptions implicit in the convergence criteria, the great unification experiment was founded upon a set of notions that have been in want of a solution in the decades since. In the coming parts of the series we will examine these assumptions, the tensions involved in resolving them, and the outcomes if they go unresolved. The experiment to end war through economics may ultimately rest on these very outcomes.
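Principal Component Analysis (PCA) is a statistical technique that allows data science practitioners to pare down numerous variables in a dataset to a predefined number of ‘principal components’: new, uncorrelated variables, each a linear combination of the originals, ordered by how much of the data’s variance they capture. Essentially, this method allows statisticians to visualize and manipulate otherwise unwieldy data.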
For a moment, take a look at the graph below, which comes from Jose Portilla’s Udemy course on machine learning. The upper left graph shows what would be considered ordinary data with two features, or ‘components.’ This graph is the eventual output of a PCA transformation. The bottom left graph plots all of the data points on a single axis: the y value (‘Feature 2’) is dropped, leaving only the values along the x axis (‘Feature 1’). The bottom right graph works in a similar way, but uses the other ‘principal component’ (‘Feature 2’) as the axis.
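Collapsing the data onto a single axis, as in the bottom panels, amounts to taking the dot product of each point with a unit vector. Dropping Feature 2 projects onto (1, 0). PCA instead chooses the direction that preserves the most variance. The sketch below uses NumPy with synthetic, hypothetical data (not the data behind the figure) and compares the variance kept by each projection. The first principal direction is taken as the eigenvector of the covariance matrix with the largest eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical correlated 2-D data: Feature 2 is roughly half of Feature 1 plus noise.
f1 = rng.normal(0.0, 3.0, size=500)
f2 = 0.5 * f1 + rng.normal(0.0, 1.0, size=500)
X = np.column_stack([f1, f2])
Xc = X - X.mean(axis=0)  # PCA works on mean-centered data

# Dropping Feature 2 is the same as projecting onto the unit vector (1, 0).
keep_f1 = Xc @ np.array([1.0, 0.0])

# First principal direction: eigenvector of the covariance matrix with the largest eigenvalue.
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
pc1 = eigvecs[:, -1]
on_pc1 = Xc @ pc1

print(f"variance kept by Feature 1 alone: {keep_f1.var(ddof=1):.2f}")
print(f"variance kept by PC1:             {on_pc1.var(ddof=1):.2f}")
print(f"total variance:                   {np.trace(cov):.2f}")
```

No single axis can keep more variance than the first principal component, which is why PCA’s one-dimensional view loses less information than simply dropping a feature.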
At its root, PCA requires understanding the theory behind the X and Y axes, which normally goes unnoticed when looking at plotted data. Below, you see a traditional number line like the one presented to primary school students across the country. On a number line, each value corresponds to exactly one point. Take stock returns: if returns for the S&P 500 were 6% in year one, that number would be plotted as a dot at 6. If returns were -2% the following year, that number would be a dot at -2. Put simply, most graphs we see in everyday life represent two variables in two-dimensional space, with each point sitting at the intersection of its two values.
In finance, statistics, epidemiology, and elsewhere, each such number line is referred to as an ‘axis.’ Plotting any two of them at a 90-degree angle yields the scatterplots, bar charts, and trendlines common in the professional and academic worlds. PCA is in principle no different, except that we start with n dimensions and reduce them to a handful of new axes, the principal components, which can then be visualized on a two-dimensional graph.
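Here is the general n-dimensional case as a minimal NumPy sketch. It centers the data, takes the singular value decomposition, and keeps the top k right-singular vectors as the new axes. The `pca` function name and the synthetic data (five features driven by two hidden factors) are hypothetical. In practice, scikit-learn’s `sklearn.decomposition.PCA` class provides the same result ready-made through its `n_components` parameter and `explained_variance_ratio_` attribute.

```python
import numpy as np

def pca(X, n_components):
    """Project rows of X (samples x features) onto the top n_components principal components.

    Returns (scores, components, explained_variance_ratio).
    """
    Xc = X - X.mean(axis=0)  # center each feature
    # SVD of the centered data: rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T  # coordinates of each sample on the new axes
    explained = S**2 / np.sum(S**2)  # share of total variance per component
    return scores, components, explained[:n_components]

rng = np.random.default_rng(42)
# Hypothetical data: 200 samples, 5 features driven by 2 hidden factors plus a little noise.
factors = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = factors @ mixing + 0.05 * rng.normal(size=(200, 5))

scores, components, ratio = pca(X, n_components=2)
print(scores.shape)  # (200, 2)
print(ratio.round(3), ratio.sum().round(3))
```

Because the five features are really driven by two factors, two components should capture nearly all of the variance. The `scores` array is what you would scatter-plot on a two-dimensional graph.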
When datasets grow complex and more than two variables are needed to capture the essence of the data, PCA can be used as a tool to visualize the data and capture information about its structure. The full post, Principal Component Analysis with Python (Intro), walks through PCA using Python on a built-in dataset.
Everyone is searching for alpha. Some are better dowsers than others.