6 points, SCA Band 2, 0.125 EFTSL
Postgraduate - Unit
Refer to the specific census and withdrawal dates for the semester(s) in which this unit is offered.
(or ) and
A working knowledge of Java and Python.
Monash Online offerings are only available to students enrolled in the Graduate Diploma in Data Science (http://online.monash.edu/course/graduate-diploma-data-science/?Access_Code=MON-GDDS-SEO2&utm_source=seo2&utm_medium=referral&utm_campaign=MON-GDDS-SEO2) via Monash Online.
This unit covers working with different kinds of data, including documents, graphs and spatial data. Distributed processing is introduced using Hadoop and Spark technologies, including streaming, graph processing and the use of NoSQL. Programming assignments are generally done in Spark, Linux and similar shell-like environments.
Upon successful completion of this unit, students should be able to:
- compare the use of data streaming methods such as sampling, sketching and hashing;
- apply spatial data methods such as nearest neighbour and search on trees;
- apply large scale graph, vector and document processing methods;
- apply distributed processing using the Hadoop and Spark technologies;
- evaluate the suitability of different distributed technologies for big data processing;
- explain the workings of a typical cloud computing environment.
In-semester assessment: 100%
Minimum total expected workload equals 144 hours per semester comprising:
- Contact hours for on-campus students:
  - Two hours/week lectures
  - Two hours/week laboratories
- Contact hours for Monash Online students:
  - Two hours/week online group sessions.
  Online students generally do not attend lecture, tutorial and laboratory sessions; however, they should plan to spend equivalent time working through resources and participating in discussions.
- Additional requirements (all students):
  - A minimum of 8 hours per week of personal study (22 hours per week for Monash Online students) for completing lab/tutorial activities, assignments, private study and revision and, for online students, participating in discussions.