# Principal Component Analysis with Python (Intro)

Principal Component Analysis (PCA) is a statistical technique that lets data science practitioners reduce the many variables in a dataset to a predefined number of ‘principal components.’ In practice, this makes otherwise unwieldy data easier to visualize and manipulate.
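To make the idea of reducing many variables to a chosen number of components concrete, here is a minimal sketch using NumPy. The helper name `pca_reduce`, the synthetic data, and the choice of two components are all assumptions made for illustration; this is one common way to compute PCA (centering the data and taking a singular value decomposition), not necessarily the approach used later in this article.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top `n_components` principal components.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    # PCA works on centered data, so subtract each column's mean.
    X_centered = X - X.mean(axis=0)
    # The rows of Vt are the principal directions, ordered from the
    # direction of greatest variance to the direction of least variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Express every observation in terms of the kept components.
    return X_centered @ components.T

# Synthetic data (an assumption): 100 observations of 5 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

X_reduced = pca_reduce(X, n_components=2)
print(X_reduced.shape)  # (100, 2)
```

In everyday work, a library class such as scikit-learn's `sklearn.decomposition.PCA`, which takes an `n_components` argument, would typically be used instead of a hand-written helper.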

For a moment, take a look at the graph below, which comes from Jose Portilla’s Udemy course on machine learning. The upper left graph shows what would be considered ordinary data with two features, or ‘components.’ This graph is the eventual output of a PCA transformation. The bottom left graph shows all of the data points plotted on a single axis, with the y value (‘Feature 2’) dropped so that only the values along the x axis are displayed. The bottom right graph works the same way, but uses the other ‘principal component’ (‘Feature 2’) as the axis.

At its root, PCA requires understanding the theory behind the X and Y axes, which normally goes unnoticed when looking at plotted data. Below, you see a traditional number line like the one presented to primary school students across the country. Normally, each value corresponds one-to-one to a point on the line. Take stock returns, for instance. If returns for the S&P 500 were 6% in year one, that value would be plotted as a dot at 6. If returns the following year were -2%, that value would be plotted as a dot at -2. Put simply, most graphs we see in everyday life are two-dimensional representations in which each point marks the intersection of two variables.
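The single-axis views described earlier can be sketched in a few lines of plain Python. The data points and the helper name `project_onto_axis` are made up for illustration; the sketch only shows what it means to drop one feature and keep the values along the other axis, which is simpler than a full PCA.

```python
# Hypothetical (feature_1, feature_2) observations, invented for this example.
points = [(1.0, 2.5), (2.0, 1.0), (3.5, 4.0), (-1.0, 0.5)]

def project_onto_axis(points, axis):
    """Keep only the coordinate at index `axis` (0 = Feature 1, 1 = Feature 2)."""
    return [point[axis] for point in points]

# Dropping Feature 2 leaves each point's position on the Feature 1 axis.
feature_1_only = project_onto_axis(points, 0)
# Dropping Feature 1 leaves each point's position on the Feature 2 axis.
feature_2_only = project_onto_axis(points, 1)

print(feature_1_only)  # [1.0, 2.0, 3.5, -1.0]
print(feature_2_only)  # [2.5, 1.0, 4.0, 0.5]
```

Each resulting list is a set of values on a single number line, just like the yearly stock returns in the example above.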