Pearson's Correlation

From bibbleWiki

Introduction

This is hopefully going to be about Pearson's correlation, but there were some prerequisites to cover first.

Histograms

So, a quick reminder: you create a set of bins and put the data into them to try to understand its distribution.
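As a rough illustration of binning, here is a minimal Python sketch. The function name `histogram`, the fixed `bin_width`, and the assumption that every value is at least `start` are choices made for this example only.

```python
import math

def histogram(data, bin_width, start):
    """Count values per bin [start + i*bin_width, start + (i+1)*bin_width).

    Hypothetical helper for illustration; assumes every value in data is >= start.
    """
    counts = {}
    for x in data:
        index = math.floor((x - start) / bin_width)  # which bin x falls into
        counts[index] = counts.get(index, 0) + 1
    last = max(counts)
    # Include empty bins so gaps in the distribution stay visible
    return [(start + i * bin_width, counts.get(i, 0)) for i in range(last + 1)]

data = [1, 2, 2, 3, 3, 3, 4, 4, 5]
for lower, count in histogram(data, bin_width=2, start=0):
    print(f"{lower}-{lower + 2}: {'#' * count}")
```

Printing one `#` per value gives a crude text histogram; the shape of the bars is what hints at the distribution.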

Distributions

Introduction

Next we look at some types of distribution.

Normal

First up is the normal distribution, also called the bell curve or Gaussian distribution. Its parameters have different symbols for a sample and for a population:

  • The mean is at the centre of the curve. It is written x̄, pronounced x-bar, for a sample, and with the Greek letter μ, pronounced mu, for a population.
  • The standard deviation measures the spread of the curve. Its symbol is s for a sample and the Greek letter σ, pronounced sigma, for a population.
  • A large standard deviation means a wide bell curve.
  • A small standard deviation means a narrow bell curve.
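To see how the standard deviation changes the shape, here is a small Python sketch of the normal probability density function. The function name `normal_pdf` is hypothetical; the formula is the standard Gaussian density, and it is cross-checked against `statistics.NormalDist` from the Python standard library.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

for sigma in (0.5, 1.0, 2.0):
    peak = normal_pdf(0.0, 0.0, sigma)     # height at the mean
    at_two = normal_pdf(2.0, 0.0, sigma)   # height two units from the mean
    check = NormalDist(0.0, sigma).pdf(0.0)
    print(f"sigma={sigma}: peak={peak:.4f} (stdlib {check:.4f}), at x=2: {at_two:.4f}")
```

A smaller σ gives a taller, narrower peak; a larger σ gives a lower curve that stays higher further from the mean, since the total area is always 1.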


The formula we use for standard deviation depends on whether the data is being considered a population of its own, or the data is a sample representing a larger population.

  • If the data is being considered a population on its own, we divide by the number of data points, N.
  • If the data is a sample from a larger population, we divide by one fewer than the number of data points in the sample, n-1.
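The two formulas differ only in the divisor. A minimal Python sketch, where `std_dev` is a hypothetical helper, computes both by hand and compares them with `statistics.pstdev` (divides by N) and `statistics.stdev` (divides by n-1) from the standard library:

```python
import math
import statistics

def std_dev(data, sample):
    """Standard deviation; divide by n-1 for a sample, N for a whole population."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    divisor = n - 1 if sample else n
    return math.sqrt(squared_deviations / divisor)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5, squared deviations sum to 32
print(std_dev(data, sample=False), statistics.pstdev(data))  # sqrt(32 / 8) = 2.0
print(std_dev(data, sample=True), statistics.stdev(data))    # sqrt(32 / 7), about 2.138
```

Dividing by n-1 makes the sample value slightly larger, which compensates for the sample mean sitting closer to its own data than the true population mean does.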


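There is a rule of thumb, often called the 68-95-99.7 rule or empirical rule, for the percentage of data lying within a given number of standard deviations (steps) of the mean:

  • Within 1 standard deviation: about 68% of the data
  • Within 2 standard deviations: about 95% of the data
  • Within 3 standard deviations: about 99.7% of the data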
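These percentages come from the normal distribution itself: the fraction within k standard deviations of the mean is erf(k / √2). A small Python sketch using `math.erf` (the helper name `fraction_within` is hypothetical):

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sd: {fraction_within(k):.1%}")
```

The exact values are roughly 68.3%, 95.4% and 99.7%, so the rule's figures are rounded.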

Area Under a Curve

You can calculate the area under a curve with this. I think the maths will be easier now.

So here is what we are looking at
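For a normal curve, the area between two points is the probability of landing between them. As a sketch under that assumption, the Python below approximates the area under the standard normal curve from -1 to 1 with the trapezoidal rule (a hypothetical `trapezoid_area` helper) and compares it with the exact value from the cumulative distribution function in `statistics.NormalDist`:

```python
from statistics import NormalDist

def trapezoid_area(f, a, b, steps=1000):
    """Approximate the area under f between a and b using the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

dist = NormalDist(mu=0, sigma=1)
numeric = trapezoid_area(dist.pdf, -1, 1)  # area under the curve within 1 sd
exact = dist.cdf(1) - dist.cdf(-1)         # same area from the CDF
print(f"numeric={numeric:.6f} exact={exact:.6f}")
```

Both numbers come out at about 0.6827, which is the 68% from the rule above.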

Estimating Mean