Python Data Analysis

From bibbleWiki

Introduction

There are two sets of statistical tools

  • Descriptive Statistics

Descriptive statistics describes and summarises the data you have (for example, with a chart, a graph or a summary value such as the mean)

  • Inferential Statistics

Inferential statistics allows you to make predictions (“inferences”) from that data. With inferential statistics, you take data from samples and make generalizations about a population
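As a rough sketch of the difference, the snippet below uses only Python's standard library. The sample values are made up for illustration, and the 1.96 multiplier assumes a normal approximation for a 95% confidence interval; for a sample this small a t-distribution would usually be the better choice.

```python
import math
import statistics

# Hypothetical sample of measurements, made up for illustration
sample = [1.4, 1.3, 1.5, 1.7, 1.4, 1.6, 1.5, 1.4]

# Descriptive statistics: summarise the sample itself
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (n - 1)

# Inferential statistics: estimate the population mean from the sample
standard_error = sample_sd / math.sqrt(len(sample))
ci_low = sample_mean - 1.96 * standard_error   # assumed normal approximation
ci_high = sample_mean + 1.96 * standard_error

print(f"mean={sample_mean:.3f} sd={sample_sd:.3f}")
print(f"approx. 95% CI for the population mean: ({ci_low:.3f}, {ci_high:.3f})")
```

The first two values only describe the eight numbers in hand; the interval is a statement about the wider population those numbers were drawn from.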

Package Managers

Pip

  • Installs from PyPI
  • Installs Python packages only
  • Installs "wheels" (prebuilt packages) or builds from source distributions
  • To install non-Python software you need another package manager or installer

Conda

  • Installs from the Anaconda repository
  • Installs Python packages as well as other tools
  • Installs binaries
  • Can also install non-Python software

Python Refresh

We can install a package from the shell

pip install numpy

and then check the installed version from the Python interpreter

import numpy
numpy.__version__
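If it is not certain that a package is installed, the version can also be looked up without importing it. This is a minimal sketch using the standard library's importlib.metadata; the helper name installed_version is made up for illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if numpy is installed, otherwise None
print(installed_version("numpy"))
```

This avoids an ImportError when the package is absent, which can be handy at the top of a notebook.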

Jupyter Notebooks (IPython)

Jupyter is open source and originated from the IPython project

Running a Jupyter Server

In Anaconda you can create environments with the create command

conda create -n py27 python=2.7
conda activate py27
conda deactivate

With Anaconda installed we can run Jupyter notebooks using

jupyter notebook

Follow the prompt and open the notebook in the browser

Jupyter Shortcuts

We can run shell commands in the notebook by prefixing them with !

!pip install xxxx
!pwd
!ls
!pip install ipywidgets

We import the usual libraries and read a csv with this

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import ipywidgets as widgets
from ipywidgets import interact, interact_manual

iris = pd.read_csv('IRIS.csv')

iris.head()
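Beyond head(), a few other calls are useful for a first look at a freshly loaded DataFrame. The sketch below reads a handful of made-up rows from a string so that it runs without the IRIS.csv file; the column names are an assumption and may not match the real file.

```python
import io

import pandas as pd

# A few made-up rows standing in for IRIS.csv; the column names are assumed
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.3,3.3,6.0,2.5,virginica
"""

iris_sample = pd.read_csv(io.StringIO(csv_text))

print(iris_sample.shape)       # (rows, columns)
print(iris_sample.dtypes)      # column types inferred by pandas
print(iris_sample.describe())  # summary statistics for the numeric columns
print(iris_sample["species"].value_counts())  # rows per category
```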

Pandas

I really like the simplicity of using Python and pandas to look at data. The examples here are very simple, but good enough to get me going

Concatenation

Concatenation joins DataFrames end to end. Here is a DataFrame to start from

df1_dummy = {
    "serial_id" : ["1", "2", "3", "4", "5"],
    "sale_month" : ["Jan", "Feb", "Mar", "Apr", "May"],
    "sales" : ["12300", "25100", "17800", "20100", "21000"]
}

df1 = pd.DataFrame(df1_dummy, columns = ["serial_id", "sale_month", "sales"])

df1
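The code above only builds a single DataFrame, so here is a minimal sketch of the concatenation step itself using pd.concat. The split into first_half and second_half is made up for illustration, reusing the same sales figures.

```python
import pandas as pd

first_half = pd.DataFrame({
    "serial_id": ["1", "2", "3"],
    "sale_month": ["Jan", "Feb", "Mar"],
    "sales": ["12300", "25100", "17800"],
})

# Hypothetical second frame with the same columns
second_half = pd.DataFrame({
    "serial_id": ["4", "5"],
    "sale_month": ["Apr", "May"],
    "sales": ["20100", "21000"],
})

# Stack the rows; ignore_index=True renumbers the index from 0
combined = pd.concat([first_half, second_half], ignore_index=True)

# The sales values are strings here, so convert before doing arithmetic
combined["sales"] = combined["sales"].astype(int)

print(combined)
```

Without ignore_index=True the result keeps each frame's original index, so the labels 0 and 1 would appear twice.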

Merging

df1 = pd.DataFrame({"product" : ["Prod_1", "Prod_2", "Prod_3", "Prod_4"],
                   "division": ["Div_A", "Div_B", "Div_C", "Div_B"]})

df6 = pd.DataFrame({"division" : ["Div_A", "Div_A", "Div_B", "Div_C", "Div_C", "Div_C"],
                   "emp_grade" : ["13", "14+", "12", "11", "10", "9-"]})

df7 = pd.merge(df1, df6)
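With no other arguments, pd.merge joins on the columns the two frames share (here division) and keeps only matching rows. The sketch below spells those defaults out and contrasts them with a left merge. The names products, grades and products_extra, and the extra Prod_5 / Div_D row, are made up for illustration.

```python
import pandas as pd

products = pd.DataFrame({
    "product": ["Prod_1", "Prod_2", "Prod_3", "Prod_4"],
    "division": ["Div_A", "Div_B", "Div_C", "Div_B"],
})
grades = pd.DataFrame({
    "division": ["Div_A", "Div_A", "Div_B", "Div_C", "Div_C", "Div_C"],
    "emp_grade": ["13", "14+", "12", "11", "10", "9-"],
})

# Same result as pd.merge(products, grades), with the defaults written out
inner = pd.merge(products, grades, on="division", how="inner")
print(len(inner))  # one row per matching product/grade pair

# Hypothetical extra product in a division that has no grade rows
products_extra = pd.concat(
    [products, pd.DataFrame({"product": ["Prod_5"], "division": ["Div_D"]})],
    ignore_index=True,
)

# A left merge keeps Prod_5 with a missing emp_grade;
# indicator=True adds a _merge column showing where each row came from
left = pd.merge(products_extra, grades, on="division", how="left", indicator=True)
print(left)
```

Because a division can appear several times in each frame, this is a many-to-many merge, so the result can have more rows than either input.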