Program

2018 course program

8:30 am to 5:15 pm each day

Module #1

Introduction to Data Science

  • Origins of Data Science and a brief history of the Big Data revolution.
  • The Big Data landscape.
  • How much data is there really, and does it matter?
  • Un-siloing data: use paradigms for organisational data and public data.
  • Descriptive, predictive and prescriptive analysis.
  • From recommendations to insights: black-box and white-box analytics.

Module #2

Data as an Asset

  • The V's of Big Data: Volume, Velocity, Variability, Veracity.
  • Data business strategies.
  • Data sources, synergies and differentiators.

Module #3

Data Life Cycle

  • The analytics value chain.
  • Overview of the data analysis cycle, connecting Data Science to the business problems.
  • Work cycle of a data scientist: wrangling, modelling and validation.
  • Managing research.

Module #4

Privacy and Ethics in Big Data

  • Defining Big Data, Data Ethics and Privacy
  • Reframing the conversation - 5 provocations
  • Data Ethics and the Challenges of Big Data
  • A Brief History of Privacy
  • Privacy and Dignity

Module #5

Data Wrangling and Exploratory Analysis

  • Determining data quality. Data cleansing.
  • Entity matching.
  • Imputation.
  • Background modelling.
  • Exploratory analysis.

Module #6

Fundamentals of Statistics

  • Types of data: numerical, categorical, ordinal.
  • Statistical summaries: mean, standard deviation, quantiles, correlation.
  • Simple data visualisation: histograms, boxplots, time plots and scatterplots.
  • Cross-tabulations.
  • Causality vs. association, independence.
  • Randomisation and random sampling.
  • Statistical inference using bootstrapping.

Module #7

Model Creation and Validation

  • Prediction: linear regression, nonparametric regression, k-NN.
  • Forecasting: auto.arima and Error-Trend-Seasonal exponential smoothing algorithms.
  • Hold-out sets, cross-validation, AIC.
  • Classification: logistic regression, classification trees, SVM.
  • Clustering: k-means, hierarchical clustering.
  • Supervised vs. unsupervised vs. semi-supervised learning.
  • Dimension reduction: principal components.
  • Languages and environments (e.g. R, Python, MATLAB or even Excel) and standards (PMML).

Module #8

Visualisation

  • Practical and effective visualisation: beyond bar charts.
  • Finding the unexpected: the role of visualisation in exploratory analysis.
  • Communicating findings: the role of visualisation in communicating Data Science outputs.
  • Standard tools: R, Tableau, D3.

Module #9

Data Engineering for Analysis

  • Data Science engineering and its drivers for change.
  • Data volumes, data structures, and how they vary.
  • Data Science architectures: the common stages.
  • The usual suspects: distributed file systems, MapReduce, Spark.

Module #10

Operationalisation and the Model Life Cycle

  • Determining the needs: on how much data must decisions be taken, how often and how quickly must they be made, how often must models be refreshed?
  • Plugging into existing data paths and choosing appropriate technologies.
  • Stale models and model refreshing.
  • Operationalisation from a business perspective: determining value and making Data Science outputs part of standard business and decision-making processes.

Module #11

Panel: Building a Data-Driven Enterprise

  • Data Science as a process, rather than as a point event.
  • The role of high-level management in enabling data-driven decisions.
  • The role of direct management: on the un-Gantt-ability of research.

Module #12

Case Study

  • Operational efficiency by predictive analytics
  • Architectural choices for integration and efficacy

Day 1

8:30 am - 9:00 am Tea & coffee on arrival
9:00 am - 10:45 am Module 1: "Introduction to Data Science" - Michael Brand
10:45 am - 11:00 am Morning tea
11:00 am - 12:45 pm Module 2: "Data as an Asset" - Michael Brand
12:45 pm - 1:30 pm Lunch
1:30 pm - 3:15 pm Module 3: "Data Life Cycle" - Michael Brand
3:15 pm - 3:30 pm Afternoon tea
3:30 pm - 5:15 pm Module 4: "Privacy and Ethics in Big Data" - James Horton

Day 2

8:30 am - 9:00 am Tea & coffee on arrival
9:00 am - 10:45 am Module 5: "Data Wrangling and Exploratory Analysis" - Di Cook
10:45 am - 11:00 am Morning tea
11:00 am - 12:45 pm Module 6: "Fundamentals of Statistics" - Rob Hyndman
12:45 pm - 1:30 pm Lunch
1:30 pm - 3:15 pm Module 7: "Model Creation and Validation" - Rob Hyndman
3:15 pm - 3:30 pm Afternoon tea
3:30 pm - 5:15 pm Module 8: "Visualisation" - Kim Marriott

Day 3

8:30 am - 9:00 am Tea & coffee on arrival
9:00 am - 10:45 am Module 9: "Data Engineering for Analysis" - Mark Stammers
10:45 am - 11:00 am Morning tea
11:00 am - 12:45 pm Module 10: "Operationalisation and the Model Life Cycle" - Dickson Lukose
12:45 pm - 1:30 pm Lunch
1:30 pm - 3:15 pm Module 11: "Building a Data-Driven Enterprise" Panel
Facilitator: Geoff Webb
Panel: Dennis Claridge, James Horton, Stuart Growse, Salim Naim and/or Jin Yu
3:15 pm - 3:30 pm Afternoon tea
3:30 pm - 5:15 pm Module 12: "Case Study" - Salim Naim and/or Jin Yu

Note: Program order and speakers may change slightly.