Introduction

Statistical tools fall into two broad groups: descriptive and inferential.

Descriptive Statistics

Descriptive statistics summarize the data you already have (for example, with a chart or graph).

Inferential Statistics

Inferential statistics allows you to make predictions (“inferences”) from that data. With inferential statistics, you take data from samples and make generalizations about a population.
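To make the distinction concrete, here is a minimal sketch using only the Python standard library. The sample values are made up for illustration. The interval uses a normal approximation (z = 1.96), which is a simplifying assumption; for a sample this small a t-based interval would be more appropriate.

```python
import math
import statistics

# Hypothetical sample: eight measurements drawn from a larger population
sample = [4.9, 5.1, 5.4, 5.0, 5.6, 4.7, 5.2, 5.3]

# Descriptive statistics: summarize the sample itself
sample_mean = statistics.mean(sample)
sample_stdev = statistics.stdev(sample)  # sample standard deviation (n - 1)

# Inferential statistics: estimate a range for the population mean
z = 1.96  # roughly 95% under a normal approximation (assumption)
margin = z * sample_stdev / math.sqrt(len(sample))
interval = (sample_mean - margin, sample_mean + margin)

print(f"mean={sample_mean:.2f}, stdev={sample_stdev:.2f}")
print(f"approx. 95% interval: {interval[0]:.2f} to {interval[1]:.2f}")
```

The first two numbers only describe the eight values in hand. The interval is a statement about the wider population those values were sampled from.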

Package Managers

Pip

Installs from PyPI
Installs Python packages only
Installs "wheels" (pre-built packages), or builds from a source distribution when no wheel is available
To install non-Python software you need another package manager or installer

Conda

Installs from the Anaconda repository
Installs Python packages as well as other tools
Installs pre-built binaries
Can also install non-Python software

Python Refresh

We can install a package from the shell

pip install numpy

Then, in the Python interpreter, import it and check the installed version

import numpy
numpy.__version__
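If you are not sure whether a package is installed, the version can also be looked up without importing the package. This is a small sketch using importlib.metadata from the standard library (Python 3.8 or later); the helper name installed_version is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints a version string, or None if numpy is not installed
print(installed_version("numpy"))
```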

Jupyter Notebooks (IPython)

Jupyter is open source and grew out of the IPython project.

Running a Jupyter Server

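In Anaconda you can create an isolated environment with the create command, switch into it with activate, and leave it again with deactivate. The example creates an environment named py27 that uses Python 2.7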

conda create -n py27 python=2.7
conda activate py27
conda deactivate
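After activating an environment it can be useful to confirm from inside Python which one is in use. The sketch below relies on the CONDA_DEFAULT_ENV environment variable, which conda sets on activation; outside a conda environment it is simply absent, so a placeholder string is used.

```python
import os
import sys

# CONDA_DEFAULT_ENV is set by "conda activate"; it is absent outside conda
env_name = os.environ.get("CONDA_DEFAULT_ENV", "(no conda environment active)")

print("environment:", env_name)
print("interpreter:", sys.executable)
print("python version:", sys.version.split()[0])
```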

With Anaconda installed we can run Jupyter notebooks using

jupyter notebook

Jupyter prints a local URL in the terminal. Follow the prompt and open it in the browser.

Jupyter Shortcuts

We can run shell commands in a notebook cell by prefixing them with !

!pip install xxxx
!pwd
!ls
!pip install ipywidgets

We can import the usual libraries and read a CSV file into a DataFrame like this

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import ipywidgets as widgets
from ipywidgets import interact, interact_manual

iris = pd.read_csv('IRIS.csv')

iris.head()
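Because IRIS.csv is not included here, the following sketch reads a few made-up rows from an in-memory string instead of a file. The column names and values are assumptions for illustration and may not match the real file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for IRIS.csv
csv_text = """sepal_length,sepal_width,species
5.1,3.5,setosa
4.9,3.0,setosa
6.3,3.3,virginica
5.8,2.7,virginica
"""

# read_csv accepts a file path or any file-like object
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.head(2))                     # first two rows
print(sample_df.shape)                       # (rows, columns)
print(sample_df["species"].value_counts())   # rows per species
```

head() shows the first five rows when called without an argument, which is a quick way to check that the file was parsed as expected.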

Pandas

I really like the simplicity of using Python and pandas to look at data. The examples here are very simple, but good enough to get going.

Concatenation

Concatenation joins DataFrames end to end with pd.concat. First, build a DataFrame from a dictionary

df1_dummy = {
    "serial_id" : ["1", "2", "3", "4", "5"],
    "sale_month" : ["Jan", "Feb", "Mar", "Apr", "May"],
    "sales" : ["12300", "25100", "17800", "20100", "21000"]
}

df1 = pd.DataFrame(df1_dummy, columns = ["serial_id", "sale_month", "sales"])

df1
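The notes above only build a single DataFrame, so here is a sketch of the concatenation step itself. The first frame repeats the data above so the example runs on its own; the second frame (later_sales) and its values are invented for illustration.

```python
import pandas as pd

# Same data as df1 above
first_sales = pd.DataFrame({
    "serial_id": ["1", "2", "3", "4", "5"],
    "sale_month": ["Jan", "Feb", "Mar", "Apr", "May"],
    "sales": ["12300", "25100", "17800", "20100", "21000"],
})

# Hypothetical second batch with the same columns
later_sales = pd.DataFrame({
    "serial_id": ["6", "7"],
    "sale_month": ["Jun", "Jul"],
    "sales": ["19500", "22400"],
})

# Stack the rows; ignore_index=True renumbers the index from 0
combined = pd.concat([first_sales, later_sales], ignore_index=True)
print(combined)
```

Without ignore_index=True each frame keeps its own index, so the result would contain the labels 0 and 1 twice.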

Merging

df1 = pd.DataFrame({"product" : ["Prod_1", "Prod_2", "Prod_3", "Prod_4"], 
                   "division": ["Div_A", "Div_B", "Div_C", "Div_B"],
                   })

df6 = pd.DataFrame({"division" : ["Div_A", "Div_A", "Div_B", "Div_C", "Div_C", "Div_C"],
                   "emp_grade" : ["13", "14+", "12", "11", "10", "9-"]})

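# With no "on" argument, merge joins on the columns both frames share ("division" here)
# and keeps only rows with a match in both frames (inner join)
df7 = pd.merge(df1, df6)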
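The same merge can be written with the join column and join type spelled out, which makes the behaviour easier to follow. This sketch uses the data from above under different variable names (products, grades) so that it runs on its own.

```python
import pandas as pd

products = pd.DataFrame({
    "product": ["Prod_1", "Prod_2", "Prod_3", "Prod_4"],
    "division": ["Div_A", "Div_B", "Div_C", "Div_B"],
})
grades = pd.DataFrame({
    "division": ["Div_A", "Div_A", "Div_B", "Div_C", "Div_C", "Div_C"],
    "emp_grade": ["13", "14+", "12", "11", "10", "9-"],
})

# Explicit form of pd.merge(products, grades): join on "division", inner join
merged = pd.merge(products, grades, on="division", how="inner")
print(merged)
print(len(merged))
```

Div_A and Div_C appear several times in the second frame, so the products in those divisions are repeated once for every matching row.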

Python Data Analysis
