Whether it’s interpreting percentile shifts, binomial distributions, estimators or the statistical significance of outliers using analysis of variance, statistics is a broad field that is becoming ever more popular. Here’s a brief history of how statistics got started and where to find the best resources for troubleshooting your statistics questions!
Learning the computations behind statistical software is essential to accurate analysis
You may have heard about data analysis in the news recently. From the data breaches that regularly hit major banks across the world to the small, GDPR-induced tick box you have to check every time you visit a website, statistical analysis is shaping up to be this century’s Big Brother.
If all this sounds merely like statisticians’ jargon and Big Brother only calls forth images of the infamous reality TV show, let’s unpack the importance of understanding statistical methods by taking a closer look at Big Brother’s namesake: George Orwell’s 1984.
Without spoiling the plot or drawing up too convoluted a definition for inferential and descriptive statistics, the narrative of 1984 follows the story of two protagonists struggling against the ideals of a dystopian, authoritarian government. One of these protagonists works for the “Ministry of Truth,” where he, ironically, edits historical records to conform with the political party’s agenda. In other words: redacting and revising historical data.
The importance of statistical data and its history isn’t simply that it has improved our quality of life. Yes, Bayesian statistics and machine learning have given rise to statistical software that can track endangered species. Yes, the field of biostatistics has enabled statisticians to perform the tests behind the pharmaceutical drugs that save our lives. There is no question that statistical techniques are essential to our daily lives. However, mathematical statistics can also be manipulated by corporations and government bodies to push political agendas or oppress certain segments of society, as exemplified by Orwell’s seminal novel.
With concepts like categorical data, sample size, standard deviation and a probability distribution, the field of statistics can too often be perceived and taught in ways that are not only inaccessible, but perhaps lock out part of the population from a discipline that could empower them the most. While statistical theory and applied statistics may seem like a hyper-modern field filled with ultra-complex ideas to match, taking a look at the history of statistics belies this sentiment.
Humans have been using statistics to solve society’s urgent problems since the dawn of societies themselves. From collecting raw data on agricultural phenomena to improve farming techniques to recording the movements of planetary systems in order to unlock the mysteries of the universe – scientists have been using statistical data analysis for centuries. If you’re rolling your eyes and qualifying this statement in your head by adding that men have historically dominated this field – you wouldn’t be wrong.
However, not only have women like Florence Nightingale revolutionized the way we use and visualize probability and statistics, but there are women the world over today using statistical analyses to expose the discrimination women face even in industries that have always been considered to be bastions of egalitarianism.
So, this is great and all, but what exactly does applied statistics look like mathematically and in the present day? The discipline’s name is self-explanatory but worth clarifying: applied statistics involves using data collection, probability theory and data visualization to either solve a problem or test a hypothesis in areas like business, insurance, government, education, and more.
Mathematically, statistics refers to applying probability and central tendency theories to test null and alternative hypotheses through a number of different models: linear regression, multivariate regression, ANOVA, etc. In the past, statistics was a discipline locked behind complex mathematics, involving everything from a null hypothesis on the normality of a distribution to probability density functions. In the present day, however, statistics is widely available to anyone with internet access. Open-source programs like R and online tutorials, combined with tools that don’t require any previous statistics experience, such as Datawrapper, have led to a new, more democratized era of statistics and data analysis.
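Of the models named above, simple linear regression is probably the easiest to try for yourself. Here is a minimal sketch in plain Python, assuming one explanatory variable and ordinary least squares as the fitting method. The function name `fit_line` and the study-hours data are invented purely for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least two points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied and exam score for five students
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)
print(f"score = {intercept:.1f} + {slope:.1f} * hours")
```

The slope estimates how many points the score changes for each extra hour of study. Deciding whether that slope is statistically different from zero is where the null and alternative hypotheses come in.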
Programming is becoming an increasingly important component of data analysis
Now that you have a grasp of the evolution of statistics and data analysis, it can be helpful to know the composition of the discipline itself. Generally, the statistician or mathematician will divide the field into two main branches: descriptive and inferential statistics. Starting with the first, descriptive statistics concerns itself less with the intricacies of drawing an estimator or predictor from sample data and crafting a confidence interval based on various regression models.
Instead, descriptive statistics concerns itself with understanding what the data looks like. While this may sound rudimentary, it is in fact what the majority of the population not only understands best but also consumes the most. In the UK, for example, people and governments are less interested in predicting the average income for a family with certain characteristics and more interested in, let’s say, the average income of their city.
Descriptive statistics describes either qualitative or quantitative data and aims to capture both its location and its variability. In other words, using things like a histogram or normal distribution, descriptive statistics can tell you what the average data looks like and how different the data is from that average.
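To see what "average" and "how different from the average" look like in practice, here is a small sketch using Python's built-in `statistics` module. The income figures are made up for illustration.

```python
import statistics

# Hypothetical weekly incomes (in pounds) for ten households, invented for illustration
incomes = [310, 420, 420, 455, 480, 510, 530, 600, 650, 1200]

# Location: where the "middle" of the data sits
mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)
mode_income = statistics.mode(incomes)

# Variability: how spread out the data is around the mean (sample versions)
variance_income = statistics.variance(incomes)
stdev_income = statistics.stdev(incomes)

print("mean:", mean_income)
print("median:", median_income)
print("mode:", mode_income)
print("standard deviation:", round(stdev_income, 1))
```

Notice that the single very high income pulls the mean above the median, which is one reason why reports on income often quote the median instead.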
Measures of location, or central tendency, include the sample mean, median and mode. Measures of variability, on the other hand, are things like the variance, covariance or standard deviation of dependent and independent variables. Some other measurement tools you can use in descriptive statistics include:
The second branch of statistics involves metrics you’re less likely to see in a newspaper. For example, while you’re probably used to seeing and understanding figures like rankings for the happiest countries, you’re probably not combing journals for the latest quarterly GDP estimates. While inferential statistics can be an extremely powerful statistical tool that shapes our daily lives, it can be a little more complex to perform, interpret and understand.
Inferential statistics is split between Bayesian statisticians and frequentists. While more detailed descriptions of how this matters in inferential statistics exist, most methods that are dealt with on a daily basis, such as null hypothesis tests and confidence intervals, come from the frequentist side. Using probability theory, data scientists and statisticians are able to go beyond exploratory analysis to create a study design that tries to make predictions outside a given data set.
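One simple way to reach beyond a given data set is a confidence interval: a range that is likely to contain the true average of the whole population, not just the sample. The sketch below is a rough frequentist example that assumes a normal approximation, which is reasonable for larger samples (for small samples, a t-distribution is usually preferred). The function name `mean_confidence_interval` and the commute times are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean (normal approximation)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")

    mean = statistics.mean(sample)
    # Standard error: sample standard deviation divided by the square root of n
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Critical value of the standard normal distribution (about 1.96 for 95%)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)

    return mean - z * std_err, mean + z * std_err


# Hypothetical commute times in minutes for twelve people
commute_minutes = [22, 25, 31, 28, 35, 40, 27, 30, 33, 29, 26, 34]

low, high = mean_confidence_interval(commute_minutes)
print(f"95% interval for the average commute: {low:.1f} to {high:.1f} minutes")
```

Asking for more confidence, say 99%, gives a wider interval: the price of being more certain is being less precise.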
While inferential statistics only took shape in the 19th century, its methods and uses have skyrocketed with the invention of computers and computer software geared towards statistics such as SPSS, R, Stata, and more. The most common methods and models you’ll apply when conducting inferential statistics are:
Non-parametric tests can be very powerful in certain situations because they don’t require the data to follow a specific distribution. If you’re interested in learning more about inferential statistics, start by getting familiar with the many assumptions, such as those under the Gauss-Markov theorem, that statisticians will place on their data sets!
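To get a feel for how a test can work without assuming any particular distribution, here is a minimal sketch of a permutation test, one common non-parametric approach, for comparing the averages of two groups. The function name `permutation_test`, the default number of resamples and the example scores are all assumptions made for illustration.

```python
import random


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Approximate two-sided p-value for a difference in means between two groups."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_resamples):
        # Reshuffle the group labels: if the groups really don't differ,
        # any split of the pooled data is as plausible as the observed one
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add one to numerator and denominator so the estimate is never exactly zero
    return (extreme + 1) / (n_resamples + 1)


# Hypothetical test scores for two classes
class_a = [62, 70, 68, 75, 66, 71]
class_b = [74, 79, 72, 81, 77, 80]

p_value = permutation_test(class_a, class_b)
print("approximate p-value:", round(p_value, 4))
```

The only assumption is that, if there were no real difference, the group labels would be interchangeable. No normal distribution is required.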
Comparing indicators within your data set can be fun!
From ordinary least squares to professional statistical methodology, statistics as a field is as broad as topics such as economics or literature. This can make the job of students and professionals within statistics even harder when it comes to learning new skills or perfecting old ones. If you’re looking for advice or help with projects involving statistical models or are simply stuck on a bit of code, the best place to turn is the internet. While not a complete guide to statistical resources, here are some sites you definitely shouldn’t miss.
Whether you’re interested in creating a classification system for the numerical data you’ve gathered or want to understand more about how certain indicators are measured, one of the best sites you can turn to for help is Eurostat’s Statistics Explained.
Visualization, whether you’re doing it for your categorical data or for an ANOVA or regression analysis, can be tricky. Sometimes, you just might not know things like the technicalities of graphing confidence intervals or how to best present your dependent variable. If you’re interested in fast, low-maintenance visualizations, make sure to check out Datawrapper.
If you’re looking for help with anything related to code, start by checking out Stack Overflow, one of the many online forums dedicated to asking, answering and browsing questions posed by real people about real problems related to code.
If you’re interested in finding a statistics tutor, you can start by looking through Superprof’s community of almost 150,000 maths professors. With tutors teaching at all different levels and subjects, you’ll be able to find a tutor for statistics for an average price of 10 pounds an hour.