# FIT honours projects listing

Digital repositories for Art Archives (18 or 24 pts)
Supervisors: Tom Denison, Gillian Oliver

How can a living archive be created that not only documents the Monash University Prato Centre (MUPC) Visual Residency Program but also engages artists-in-residence in a direct and active way as part of their residency experience in Prato? In collaboration with Monash University Museum of Art and Monash University Prato Centre, this project aims to help realise the creative possibilities inherent in cross-fertilizing artistic processes with archival theory. It will establish a prototype archival platform that complements contemporary art practices and supports the ability to record, relate and re-vision ephemera and events associated with transformative, creative experience as artefacts themselves.

Aim and Outline:
To develop a prototype digital repository suitable for use by artists participating in the Monash University Prato Artists in Residence Program.

Pre- and Co-requisite Knowledge:
Familiarity with digital repositories and/or open source software and/or archival principles.

Approximation algorithms in the Branch-and-Bound algorithm (18 or 24 pts)
Supervisors: Pierre Le Bodic

The Branch-and-Bound algorithm solves optimisation problems by recursively dividing and exploring the space of solutions, using bounds computed on each of them to prune entire subspaces of solutions. The bounds are often computed using a linear program, and the efficiency of the whole procedure depends on the quality of these bounds and the efficiency with which they can be computed.

Aim and Outline:
In this project you will analyse theoretically and experimentally the benefits of using approximation algorithms (i.e. heuristics with a performance guarantee) to compute the bound, compared to a bound computed using linear programming.
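
As a rough illustration of where the bound enters the procedure, the following minimal sketch applies Branch-and-Bound to a 0/1 knapsack problem. The knapsack setting, the function names and the example data are assumptions made for illustration only. The bound used here is the greedy fractional fill, which for knapsack coincides with the linear programming relaxation; `fractional_bound` is the place where an alternative bound would be swapped in for comparison.

```python
def fractional_bound(items, capacity, value):
    """Upper bound on the best completion: fill the remaining capacity
    greedily and allow one fractional item (the LP relaxation for knapsack).
    Assumes items are (value, weight) pairs sorted by value/weight, descending."""
    for v, w in items:
        if w <= capacity:
            capacity -= w
            value += v
        else:
            return value + v * capacity / w
    return value


def branch_and_bound(items, capacity):
    """Return the best total value of a 0/1 knapsack (weights must be positive)."""
    items = sorted(items, key=lambda it: it[0] / it[1], reverse=True)
    best = 0
    stack = [(0, capacity, 0)]  # (next item index, remaining capacity, value so far)
    while stack:
        i, cap, val = stack.pop()
        best = max(best, val)
        if i == len(items):
            continue
        if fractional_bound(items[i:], cap, val) <= best:
            continue  # prune: this subspace cannot beat the incumbent
        v, w = items[i]
        stack.append((i + 1, cap, val))              # branch: exclude item i
        if w <= cap:
            stack.append((i + 1, cap - w, val + v))  # branch: include item i
    return best
```

A tighter bound prunes more subspaces but usually costs more to compute, which is the trade-off the project sets out to study.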

Pre- and Co-requisite Knowledge:
A strong background in computational complexity is necessary. Knowledge of integer programming is a plus.

Automated Warehouse Optimisation (18 or 24 pts)
Supervisors: Daniel Harabor and Pierre Le Bodic

Warehouses are becoming increasingly automated and optimised. A great example is Amazon fulfilment centres (see https://www.youtube.com/watch?v=tMpsMt7ETi8 ). Many computer science problems, ranging from pathfinding to scheduling and facility layout, need to be solved to design efficient warehouses and their systems. These individual problems are not all well formalised and solved yet, and contributions in these directions are bound to have a high scientific and societal impact.

Aim and Outline:
The aim of this project is to formalise one of the problems related to warehouse automation, design methods to solve the problem, and run experiments to assess their performance.

Pre- and Co-requisite Knowledge:
Strong general background in CS, both in theory and practice, and interest in pathfinding and/or optimisation.

A Child Protection Recordkeeping App for Parents and Family Members (24 pts)
Supervisors: Associate Professor Joanne Evans and Dr Greg Rolan

Within the faculty’s Centre for Organisational and Community Informatics, the Archives and the Rights of the Child Research Program is investigating ways to re-imagine recordkeeping systems in support of responsive and accountable child-centred and family focused out-of-home care. Progressive child protection practice recognises the need, where possible, to support and strengthen parental engagement in the system in order to ensure the best interests of the child.

‘No single strategy is of itself effective in protecting children. However, the most important factor contributing to success is the quality of the relationship between the child’s family and the responsible professional’ (Dartington, 1995 quoted in Qld Department of Communities, Child Safety and Disability Services 2013).

Child protection and court processes generate a mountain of documentation that can be overwhelming and confusing to navigate, hard to manage and keep track of, especially if parents are also dealing with health and behavioural issues. Being on top of the paperwork handed out by workers, providing the documentation the system demands in a timely fashion and ensuring that records are created to document interactions, etc. could be one way in which child protection outcomes could be improved.

Aim and Outline:
In this exploratory project, we would like to investigate how digital and networked information technologies could be used to support the recordkeeping needs of parents in child protection cases. It will involve the use of a design science approach to develop a model of the information architecture of a recordkeeping system for parents. This may entail the creation of a prototype utilising existing and/or new open source components as a demonstrator for further research and development.

Challenges include investigating and dealing with the digital, recordkeeping, and other literacies of families involved in child protection. Another challenge is that there will not be time to form the deep, trusted relationships that are required to do this in a truly participatory manner. The project will therefore rely on secondary sources, such as the literature and subject matter experts, rather than interacting with parents and families directly.

URLs and References:

• Assistant Director Child Protection. (2017). Child Protection Manual. Retrieved February 8, 2018, from http://www.cpmanual.vic.gov.au/
• Burstein, F. (2002). System development in information systems research. In K. Williamson (Ed.), Research Methods for Students and Professionals: Information Management and Systems (pp. 147–158). Wagga Wagga, N.S.W.: Centre for Information Studies, Charles Sturt University.
• Gurstein, M. (2003). Effective use: A community informatics strategy beyond the Digital Divide. First Monday, 8(12). Retrieved from http://firstmonday.org/ojs/index.php/fm/article/view/1107
• Hinton, T. (2013). Parents in the child protection system. Social Action and Research Centre, Anglicare Tasmania. Retrieved from https://www.socialactionresearchcentre.org.au/wp-content/uploads/Parents-in-the-child-protection-system.pdf
• Hersberger, J. A. (2013). Are the economically poor information poor? Does the digital divide affect the homeless and access to information? Presented at the Proceedings of the Annual Conference of CAIS/Actes du congrès annuel de l’ACSI.
• Western Suburbs Legal Service. (2008). Child protection : a guide for parents and family members. Newport, Vic.: Western Suburbs Legal Service.

Pre- and Co-requisite Knowledge:
The ideal candidate will have a background in one or more of software development, data analytics, and recordkeeping metadata modelling, with a keen desire to expand their knowledge and skills into the other areas encompassed by this research project.

This is not so much a technical project as one that engages with the societal and community needs of the target audience. It would suit someone from an MBIS background with an interest in community informatics, recordkeeping metadata modelling and/or value sensitive research and design, coupled with a keen desire to expand their existing knowledge and skills into the other areas encompassed by this research project.

Text Analysis of the Royal Commission Report (24 pts)
Supervisors: Dr Gregory Rolan and TBA

Within the faculty’s Centre for Organisational and Community Informatics, the ‘Archives and the Rights of the Child’ Research Program is investigating ways to re-imagine recordkeeping systems in support of responsive and accountable child-centred and family focused out-of-home care.
In December 2017, an Australian Royal Commission presented a final report to the Governor-General, detailing the culmination of a five-year inquiry into institutional responses to child sexual abuse and related matters.  This report runs to tens of thousands of pages in 17 volumes and includes 409 recommendations. While volume 8, Recordkeeping and information sharing, examines the records, recordkeeping and information sharing of institutions that care for or provide services to children, there are many mentions of records and recordkeeping (and related concepts such as: files, registers, correspondence, evidence, memory, oversight, accountability etc.) throughout the whole report.

Aim and Outline:
This project concerns the use of machine learning techniques to analyse a complex corpus of text and identify connections between textual elements.
We are interested in identifying these references and linking them to specific issues and recommendations located in volume 8. This is, essentially, a classification project.  It will involve determining an appropriate representation of the text and selection of a learner to analyse the text. We will provide marked-up sections of the report as training data.  The output should be a cross reference and, ideally, some sort of simple visualisation of the result.
This is an early foray into using machine-learning techniques for recordkeeping analytics and the results of this analysis will provide strategic input into our recordkeeping system design.
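
To make the classification framing concrete, here is a minimal sketch of one possible baseline: a bag-of-words representation with a multinomial naive Bayes learner, written in plain Python. The choice of learner, the label names and the idea of labelling individual sentences are all assumptions for illustration; the project itself leaves the representation and learner open.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and strip simple punctuation from each word."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w]


def train(labelled):
    """labelled: iterable of (text, label). Returns word counts per label and label counts."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labelled:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the most probable label under naive Bayes with add-one smoothing."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for w in tokenize(text):
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)
```

With a handful of made-up training sentences (not taken from the report), `train` followed by `classify` assigns a new sentence to whichever label shares more of its vocabulary. A real cross-reference would need the marked-up training data and a richer representation.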

URLs and References:
Leavy, S., Pine, E., & Keane, M. (2017). Mining the Cultural Memory of Irish Industrial Schools Using Word Embedding and Text Classification. In DH2017. Montreal, Canada. Retrieved from https://dh2017.adho.org/abstracts/098/098.pdf
Royal Commission into Institutional Responses to Child Sexual Abuse. (2017, May 30). Final report [Text]. Retrieved February 9, 2018, from https://www.childabuseroyalcommission.gov.au/final-report

Pre- and Co-requisite Knowledge:
This project would suit someone with a data science or NLP background.

Immersive visualisation of thunderstorms (18 or 24 pts)
Supervisors: Bernie Jenny (FIT) and Christian Jakob (Atmospheric Sciences)

Thunderstorms in the tropics are the heat engine of Earth’s climate. Weather radars are a major observing system for thunderstorms. Apart from being used to predict whether we will get rained on at our barbecue in a few hours, radars are great research tools. However, they produce large volumes of data that are often visualised in only very simple ways.

Aims and Outline:
In this project we will explore advanced 3-d visualisations of radar data in the Darwin region. Using 3-d volume measurements of rainfall-related quantities every 10 minutes over a 100 km x 100 km area we will explore visualisation options that will help weather and climate scientists to better understand how thunderstorms evolve over their lifetime. This is a collaborative project of the Immersive Analytics group of the Faculty of Information Technology and the Atmospheric Science group of the Monash School of Earth, Atmosphere and Environment.

Pre- and Co-requisite Knowledge:
Interest in computer graphics/OpenGL.

Augmented reality table-top GIS for the HoloLens (18 or 24 pts)
Supervisor: Bernie Jenny

Aims and Outline:
You will contribute to building an AR geographic information system for any table top (or other horizontal surface). The system uses augmented reality headsets, such as the Microsoft HoloLens. You can make a contribution in the following areas: invent and evaluate user interactions for creating maps, including ways to adjust their size, scale, and orientation; design gestures and voice commands for adjusting map content and styling; combine 3D maps with diagrams placed on the maps or on the table; enable collaborative features for multiple simultaneous users.

Pre- and Co-requisite Knowledge:
Interest in Unity game engine.

Branching flow maps (18 or 24 pts)
Supervisor: Bernie Jenny

Recent research has resulted in the automated creation of origin-destination flow maps with flow lines that do not merge or branch. While there exist experimental algorithms for merging and branching flow maps, the visual appearance and the automated selection of merged flows do not yet result in automated maps that meet the standards of manually produced flow maps.

Aims and Outline:
The goal of this project is to (1) identify design principles applied in manual cartography for creating merging and branching lines on flow maps, (2) review existing algorithms for bundling and clustering origin-destination flows, and (3) propose and implement a method for the automated creation of merging and branching flows.

URLs and References:
http://doi.org/10.1080/17445647.2017.1313788
http://usmigrationflowmapper.com/
https://www.researchgate.net/publication/314281925_Force-directed_Layout_of_Origin-destination_Flow_Maps
https://www.researchgate.net/publication/311588861_Design_principles_for_origin-destination_flow_maps

Pre- and Co-requisite Knowledge:
Interest in flow maps and map design; experience with a programming language.

Turning process mining results into recommendation: An optimization approach based on user goals and preferences (24 pts)
Supervisors: Professor Dragan Gasevic and TBA

Data science offers a wide range of methods that can be used to detect patterns in data. Process mining is a data science discipline that aims to identify temporal patterns in data. Process mining has been extensively used in many application domains such as business process management, health care, and education. The insights it provides include an understanding of common execution pathways, conformity to reference processes, and process optimization. However, current research on process mining has focused much less on optimizing the discovered processes to turn them into actionable recommendations that can optimally meet users’ preferences, goals, and expectations.

Aim and Outline:
This project will address the above gap in the literature by aiming to support the optimization of the results of process mining to turn them into actionable recommendations. The project will specifically aim to:
- select an existing process mining algorithm that is best suited for recommendations of process pathways based on users’ goals and preferences;
- suggest a representation for the discovered processes to enable their optimization based on users’ goals and preferences;
- propose an optimization algorithm (e.g., constraint programming, integer programming, genetic algorithms, or automated planning); and
- evaluate empirically the proposed algorithm on a dataset that will be provided from the educational domain and that will include several years of university-wide data on course and unit enrolments.
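
As a small illustration of what a discovered process can look like before any optimization, the sketch below builds a directly-follows graph, a basic structure used by many process discovery techniques, from a list of traces. Treating each student's sequence of unit enrolments as one trace is an assumption about how the educational dataset might be shaped, and the unit codes in the usage note are invented.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b.

    traces: iterable of sequences, one ordered sequence of activities per case.
    Returns a Counter keyed by (a, b) pairs.
    """
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def most_common_successor(counts, activity):
    """Return the activity that most often directly follows `activity`, or None."""
    successors = {b: n for (a, b), n in counts.items() if a == activity}
    return max(successors, key=successors.get) if successors else None
```

For example, with the hypothetical traces `[["UNIT_A", "UNIT_B", "UNIT_C"], ["UNIT_A", "UNIT_B"], ["UNIT_A", "UNIT_C"]]`, the pair `("UNIT_A", "UNIT_B")` is counted twice. A recommendation method would then weigh such pathways against a user's goals and preferences rather than simply following the most frequent edge.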

URLs and References:
van der Aalst, W. M. (2016). Process mining: data science in action. Springer.
Trčka, N., Pechenizkiy, M., & van der Aalst, W. (2010). Process mining from educational data. In Romero, C., Ventura, S., Pechenizkiy, M., Baker, R. S.J.d. (Eds.) Handbook of educational data mining, 123-142.
Beek, P., & Walsh, T. (2006). Handbook of Constraint Programming: Foundations of Artificial Intelligence. New York, NY, USA: Elsevier Science Inc.
Wolsey, L. A. (1998). Integer programming. New York, NY, USA: Wiley-Interscience.

Pre- and Co-requisite Knowledge:
Strong experience in data mining and programming (Python and R ideally) is expected. Experience in optimization and/or automated planning is desirable.

Robustness of Australian gas and electricity networks (18 or 24 pts)
Supervisors: Kaveh Rajab Khalilpour, David Green

Cascading failure of infrastructure networks incurs substantial economic and social consequences. There is intensive research activity on improving the reliability of infrastructure networks. The robustness of the Australian national energy network has also been questioned on several occasions, such as the Basslink failure in 2015 and the South Australia blackout in 2016.

Aim and Outline:
The aim of this project is to develop a methodology for assessing the reliability of networks with consideration of dependent failures. The next step would be to study the Australian electricity and gas networks topology and identify the critical nodes which make these networks susceptible to failure. The outcome will be a practical proposal to improve these networks’ resilience to disturbance (physical failure or cyber attack).
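
One simple way to get a first feel for "critical nodes" is to remove each node in turn and measure how much of the network stays connected. The sketch below does this on an undirected graph stored as an adjacency dictionary. It is only a topological baseline under the assumption of independent single-node failures; it does not model the dependent or cascading failures the project targets, and the function names are illustrative.

```python
from collections import deque


def largest_component(adj, removed=frozenset()):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over the surviving nodes
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best


def rank_critical_nodes(adj):
    """Order nodes so that those whose removal fragments the network most come first."""
    return sorted(adj, key=lambda node: largest_component(adj, {node}))
```

On a toy network where node C joins a chain A-B to a triangle C-D-E, removing C leaves components of size two, so C ranks first.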

Pre- and Co-requisite Knowledge:
Network theory

Forecast of energy demand with micro- and macro-economic parameters (18 or 24 pts)
Supervisors: Kaveh Rajab Khalilpour, Manos Varvarigos, Lachlan Andrew

Often demand forecasting techniques are developed for short-term scheduling purposes with consideration of micro-economic inputs. However, in medium- and long-term planning macroeconomic parameters also play important roles.

Aim and Outline:
The goal of this project is to develop efficient probabilistic forecasting algorithms for medium- and short-term energy demand projection with consideration of macroeconomic parameters and mixed-frequency data. This project suits students with prior knowledge of statistical analysis, or the motivation to master it.

Pre- and Co-requisite Knowledge:
Data analysis

Modelling of energy systems with PV, stationary battery, and electric vehicle (18 or 24 pts)
Supervisors: Kaveh Rajab Khalilpour, Pierre Le Bodic

We have just passed a tipping point in PV uptake in Australia and some other places around the world. The next energy revolution is expected to be battery energy storage. Germany has just passed a law that, from 2030, all new cars are mandated to be electric.

Aim and Outline:
Imagine your family in 2030, when your house has a rooftop PV system and a stationary battery to store surplus PV generation. You also have an electric car that you could charge at home, at work, or at e-stations (the petrol stations of tomorrow). At night, you might connect your stationary or car battery to supply your house’s energy demand! The objective of this project is to develop an optimization scheduling program for energy management (minimum bill) of your future house.

URLs and References:
http://www.springer.com/gp/book/9789812876515

Pre- and Co-requisite Knowledge:
Mixed-integer programming and Simulink (both can be learned during the thesis)

Multi-attribute decision making approaches for evaluation of energy storage technologies (18 or 24 pts)
Supervisors: Kaveh Rajab Khalilpour, Aldeida Aleti

We have just passed a tipping point in PV uptake in Australia and some other places around the world. The next energy revolution is expected to be energy storage. There are several energy storage technologies with various features. This makes the technology selection process complex.

Aim and Outline:
You are thinking of buying an energy storage system to store your surplus PV generation to use later (rather than selling to the grid at a low price). There are several energy storage products in the market, with various features (lifetime, cost, charge time, depth of discharge, energy throughput, weight, volume, etc.) which makes the decision-making complex. The objective of this research is to utilize Multi-Attribute Decision Making approaches for evaluation of energy storage systems.
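
One of the simplest Multi-Attribute Decision Making approaches is the weighted sum model: normalise each attribute to a common scale, then combine the attributes using weights that express the decision maker's priorities. The sketch below is a minimal version of that idea. The attribute names, weights and product figures in the usage note are invented for illustration and do not describe real products.

```python
def weighted_scores(options, weights, lower_is_better=()):
    """Score each option with a weighted sum of min-max normalised attributes.

    options: {name: {attribute: value}}
    weights: {attribute: weight}; weights are assumed to sum to 1
    lower_is_better: attributes where a smaller value is preferred (e.g. cost)
    """
    scores = {name: 0.0 for name in options}
    for attr, weight in weights.items():
        values = [attrs[attr] for attrs in options.values()]
        lo, hi = min(values), max(values)
        for name, attrs in options.items():
            # Map the attribute to [0, 1]; ties across all options score 0.5.
            norm = 0.5 if hi == lo else (attrs[attr] - lo) / (hi - lo)
            if attr in lower_is_better:
                norm = 1.0 - norm
            scores[name] += weight * norm
    return scores
```

For instance, with hypothetical options A (cost 8000, lifetime 10 years), B (10000, 15) and C (12000, 12), and weights of 0.6 on cost and 0.4 on lifetime, option B scores highest. Other approaches in this family differ mainly in how they normalise and aggregate, and the ranking can change with the method chosen.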

Pre- and Co-requisite Knowledge:
Background or interest in learning (behavioural) decision analysis

Mathematical modelling of stress management mechanism by human neural systems (18 or 24 pts)
Supervisors: Kaveh Rajab Khalilpour, David Green

The hypothalamic-pituitary-adrenal axis (HPA axis) plays a major regulatory role in our physiological as well as psychological processes. The hypothalamus secretes corticotropin-releasing hormone (CRH). This is transferred to the pituitary and stimulates the release of adrenocorticotropic hormone (ACTH). Then, ACTH travels through the bloodstream and reaches the adrenal gland, where it stimulates the secretion of cortisol. When we are stressed, the concentrations of the HPA axis hormones are elevated. The high cortisol levels direct the distribution of energy to the different organs that underlie the stress response. There are also feedback loops that sustain life during chronic stress.

Aim and Outline:
It is known today that the HPA axis regulates many body processes, including digestion, the immune system, mood and emotions, sexuality, and energy storage and expenditure. The objective of this project is to develop a mathematical model of the HPA axis in MATLAB and simulate a feedback and feed-forward control mechanism.
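
To give a sense of the kind of model involved, the sketch below simulates a deliberately simplified three-variable system (CRH, ACTH, cortisol) in which cortisol inhibits the production of both upstream hormones. The equations, the unit rate constants and the use of Python rather than MATLAB are all assumptions made for illustration; they are not a validated physiological model.

```python
def simulate_cortisol(stress, t_end=50.0, dt=0.01):
    """Integrate a toy HPA-axis model with forward Euler; return final cortisol.

    Assumed dimensionless equations (all rate constants set to 1):
        d(crh)/dt  = stress / (1 + cort) - crh    # cortisol inhibits CRH release
        d(acth)/dt = crh / (1 + cort) - acth      # cortisol inhibits ACTH release
        d(cort)/dt = acth - cort                  # ACTH drives cortisol secretion
    """
    crh = acth = cort = 0.0
    for _ in range(int(t_end / dt)):
        d_crh = stress / (1.0 + cort) - crh
        d_acth = crh / (1.0 + cort) - acth
        d_cort = acth - cort
        crh += dt * d_crh
        acth += dt * d_acth
        cort += dt * d_cort
    return cort
```

In this toy system a larger stress input settles at a higher cortisol level, but the negative feedback makes the rise less than proportional. At steady state the equations reduce to cort * (1 + cort)^2 = stress, which gives a quick check on the simulation.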

Pre- and Co-requisite Knowledge:
Experience or interest in learning mathematical modeling (differential equations, and optimization) is required. Background in Biology is not necessary.

A macroeconomic life cycle model for a sustainable Australia (18 or 24 pts)
Supervisors: Kaveh Rajab Khalilpour, Andrew Hoadley

To mitigate climate change and global warming, smart macroeconomic policies are required, followed by microeconomic rules and regulations. Development of effective environmental policies requires a detailed knowledge of international trade.

Aim and Outline:
The goal of this study is to use the world input–output tables and life cycle assessment to identify the key Australian industries that should be regulated first to reach a given environmental target while still satisfying demand.
Interest or background in macroeconomics, life-cycle analysis, and mixed-integer optimization is required.

Pre- and Co-requisite Knowledge:
linear optimisation

Equation-based modeling of natural gas liquefaction system (18 or 24 pts)
Supervisors: Kaveh Rajab Khalilpour, Andrew Hoadley

Natural gas, despite being a cleaner fossil fuel, suffers from low volumetric energy density, which makes its long-distance transportation costly. Liquefaction has proven to be a feasible option for the international trade of this fuel. Over the last few decades, there have been extensive studies to develop efficient processes for natural gas liquefaction. Some commercial technologies include PRICO single mixed refrigerant (SMR), ConocoPhillips optimized cascade, Air Products propane precooled mixed refrigerant (C3MR), Shell dual mixed refrigerant (DMR), Statoil-Linde mixed fluid cascade (MFC), and AP-X hybrid processes. However, the dominant technology in the market is C3MR. There are several software packages for modelling and simulation of these processes. However, such tools reduce the flexibility of users to integrate the simulation with rigorous optimisation programs.

Aim and Outline:
The goal of this study is to develop an equation-based modelling program for natural gas liquefaction using the C3MR process. It will include modelling compressors, turbines, and various types of heat exchangers.

Pre- and Co-requisite Knowledge:
Linear optimisation; thermodynamics; refrigeration systems.

Simulation and operation of PV-battery systems integrated with a desalination unit (18 or 24 pts)
Supervisors: Kaveh Rajab Khalilpour, John Betts, Manos Varvarigos

We have just passed a tipping point in PV uptake in Australia and some other places around the world. The next energy revolution is expected to be battery energy storage. The operation of integrated PV-battery systems is an interesting optimization problem. Now, think of a rural community that decides to install a PV-battery system not only for local power supply, but also for fresh water production from a reverse osmosis desalination system.

Aim and Outline:
The goal of this study is to develop a mathematical program for optimal design and operation of a desalination system integrated with a PV and battery to supply both electricity and water demands.

Pre- and Co-requisite Knowledge:
Mathematical programming; Linear and nonlinear optimisation
Interest or background in Matlab Simulink; Python; Optimisation programming; Raspberry Pi

Virtualised cooperative energy market (18 or 24 pts)
Supervisors: Kaveh Rajab Khalilpour, Manos Varvarigos

With the widespread emergence of microgrids, virtualized smart energy networks are being developed and new Energy Services Company (ESCO) business models are evolving with the role of power aggregator. These businesses could operate in a centralised form or allow for the creation of a virtual smart energy network in which very small, medium and major energy prosumers could sell/buy energy amongst themselves as well as with the aid of centralized aggregators.

Aim and Outline:
The goal of this study is to design advanced cooperative market models with a focus on scheduling policies (optimal sell/buy/store decisions) for cooperation of multiple microgrids in larger coalitions. Also, part of the study will be to investigate and identify the role of various generation (PV, wind, genset, etc.) and storage (battery, hydro, hydrogen, etc.) systems in virtual markets.

URLs and References:
http://www.springer.com/gp/book/9789812876515

Pre- and Co-requisite Knowledge:
Mathematical programming; Linear and nonlinear optimisation

Woodside - Equipment Allocation in a Chemical Process Plant (24 pts)
Supervisors: Prof Maria Garcia de la Banda (FIT), Dr Gleb Belov (FIT), Dr Ilankaikone Senthooran (FIT).

We have an exciting project with Woodside Energy Ltd that involves optimising the layout of a Liquefied Natural Gas (LNG) production plant. The design of the geometric layout of a chemical plant involves assigning the location of processing equipment and connecting elements, such as pipes and support structures. This involves many constraints and objectives and is currently done manually in practice [1,2]. The objectives are connected, e.g., to costs, while some of the constraints are:
- Relative positions due to process & safety requirements
- Maintenance access

Aim and Outline:
The task is to investigate algorithmic variants of robust accelerated solving, starting with the ideas of [3]. In particular, the project will explore approaches to object aggregation and disaggregation, local search, and feedback between equipment allocation and pipe/path routing. This will require the application of various optimization methods, from heuristics to mathematical programming. The software implementation involves the extension and collaborative modification of existing C++ code.

References:
[1] G. Belov et al. An optimization model of 3D pipe routing with flexibility constraints. 2017
[2] Guirardello & Swaney. Optimization of process plant layout with pipe routing. 2005
[3] G. Xu and L. G. Papageorgiou. Process plant layout using an improvement-type algorithm. Chemical Engineering Research and Design, 87(6):780–788, 2009

Pre- and co-requisite knowledge:
Good programming skills in C++, including classes and STL. Experience in mathematical optimization is helpful.

Plate Image Analysis to Promote Healthy Eating (24 pts)
Supervisors: David Squire, David Cordover (external partner)

Foost is a health promotion business. Their mission is to get Australians to eat more fruits and vegetables. This is perhaps the biggest and easiest way to improve health outcomes for all Australians.

By sharing simple strategies, delicious recipes and fun activities, they aim to create a healthier and more relaxed food culture in the home, school, workplace or community.

Their simple message is "Eat Colourful".

Aim and Outline:
The ultimate aim of this project is to build an app that allows users to track and gamify their food consumption and encourages positive behaviour change. The basic idea is to support the following functionality:

1. User takes a photo of their plate
2. App counts how many "colours" are on the plate
3. App tracks progress and gamifies experiences to encourage desired behaviour (eating more colours)
4. User can share results, photos etc. on social media
5. Some form of collaborative 'challenge' or comparisons

The initial research questions addressed in this project are related to step 2. This app will first require a means of recognising the "plate" in the image taken by the phone. Then image segmentation of the plate into regions of sufficiently uniform colour and texture will be required. Multiple questions will need to be answered: How small can the regions be? How uniform? Are there existing image segmentation techniques that can be applied or adapted to solve this problem?
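
As a very rough baseline for the colour-counting step, the sketch below bins pixel hues and counts how many hue bins cover a meaningful share of the image. It is a hue histogram rather than a spatial segmentation, it assumes the plate region has already been found and cropped, and the bin count and thresholds are arbitrary illustrative values.

```python
import colorsys
from collections import Counter


def count_colours(pixels, hue_bins=6, min_fraction=0.05, min_saturation=0.3):
    """Count distinct hue bins that cover at least `min_fraction` of the pixels.

    pixels: iterable of (r, g, b) tuples with components in 0..255.
    Low-saturation pixels (e.g. a white plate) and very dark pixels are ignored.
    """
    pixels = list(pixels)
    bins = Counter()
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if s < min_saturation or v < 0.2:
            continue
        bins[int(h * hue_bins) % hue_bins] += 1
    threshold = min_fraction * len(pixels)
    return sum(1 for n in bins.values() if n >= threshold)
```

A plate that is mostly white with sizeable red and green regions and a few stray blue pixels would count as two colours here. Texture, lighting and foods of similar hue are exactly the cases where this baseline breaks down and proper segmentation is needed.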

Further research could investigate which gamification strategies were most effective in promoting the desired behavioural change.

URLs and References:
https://en.wikipedia.org/wiki/Image_segmentation

Pre- and Co-requisite Knowledge:
It would be an advantage if the student has previously studied units in image/video processing. Knowledge of machine learning/AI could also be useful.

An extensible framework for automated solving of cryptic crosswords using machine learning, natural language, and statistical techniques (24 pts)
Supervisors: David Squire, Robyn McNamara

Cryptic crosswords are commonly found in newspapers all around the globe, from the British Guardian to our very own The Age and the Herald Sun. However, for a human, one of the challenges in taking up such crosswords is the steep learning curve involved, as well as the inside knowledge required in deciphering (or parsing) a clue.
Currently, there is a scarcity of computer-based approaches to parse cryptic crossword clues, let alone solve entire puzzles! A few such papers were written decades ago, such as Williams & Woodhead (1979) and Smith & du Boulay (1986). Commercial solvers such as William Tunstall-Pedoe's Crossword Maestro do exist; however the algorithms used in such solvers are proprietary.

Aim and Outline:
Recognising this gap, this project aims to create an extensible framework for the automated solving of cryptic crossword clues (and by extension, an entire cryptic crossword grid). This framework should ideally be plugin-based, to allow for extensibility in e.g. handling new clue types. The proposed solution could use existing sources of semantic relations between words (e.g. the Natural Language ToolKit (NLTK), or WordNet), as well as letter or n-gram-based statistics.

A start on this was made last year, when an honours student implemented a system that used Gibbs Sampling and Markov Random Fields to provide probabilistic soft constraints that enabled crosswords constructed of clues of the anagram type to be solved much more efficiently than by greedy search.

This project would look to build on that work by fitting it into a properly engineered extensible framework that would allow other clue types to be plugged-in as and when solvers are created. The candidate would also research and develop at least one more solver type for use in this framework – perhaps one for the “embedded” clue type.
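
To illustrate what a solver plugin for the “embedded” clue type might do at its core, the sketch below scans the letters of a clue fragment for dictionary words of the required length that are hidden across word boundaries. The function name and the tiny word list are hypothetical; a real plugin would also need to decide which part of the clue hides the answer and check candidates against the clue's definition.

```python
def hidden_candidates(fodder, length, dictionary):
    """Return dictionary words of `length` letters hidden in consecutive letters of `fodder`.

    Spaces and punctuation are ignored, so answers may span word boundaries.
    Candidates are returned in order of first appearance, without duplicates.
    """
    letters = "".join(ch for ch in fodder.lower() if ch.isalpha())
    found = []
    for i in range(len(letters) - length + 1):
        word = letters[i:i + length]
        if word in dictionary and word not in found:
            found.append(word)
    return found
```

For example, the fragment "orchestra inspires" hides the six-letter word "strain" across the gap between its two words.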

Pre- and Co-requisite Knowledge:
It would be an advantage if the student has previously studied statistics, natural language processing, and machine learning/AI. Knowledge of (or willingness to learn) how cryptic crosswords work is a must.

Chess Video Analysis for Move Recognition (18 or 24 pts)
Supervisors: David Squire, David Cordover (external partner)

Chess has been a popular game around the world for many hundreds of years. Chess enthusiasts have taken a very scientific approach to the game with the creation of "theory" about opening moves, endgames and strategy. This strategic knowledge has been possible to accumulate only because the games have been recorded then analysed.

Today, every competition chess game must be recorded by the players. However, the recording is done manually with pen and paper and much of the data collected is lost, inaccurate or incomplete. The time required to digitise the manually recorded games is significant, if and when this happens.

Non-competitive players may be unfamiliar with the standard syntax for recording a chess game.

Replaying those manually recorded games is difficult. Storing, sorting and searching is obviously unrealistic.

Aim and Outline:
The ultimate aim of this project is to create a SmartPhone App (Apple first priority, Android second priority) that can use video to "watch" a game of chess and record the moves in standard PGN (Portable Game Notation) format. This could then be fed directly into an online database storage, editing, sharing and display facility (the partner for this project, Chess World, has created one called Chess Microbase), or can be exported or emailed.

This task will require the application of computer vision and artificial intelligence techniques to recognise the state of the board, detect and recognise moves (and check for their consistency and legality).
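As a rough illustration of the "detect and recognise moves" step, the sketch below infers a move by comparing two recognised board states. It assumes the vision stage has already produced each state as a mapping from square names to piece letters; the `Board` type and the `infer_move` function are hypothetical names introduced here, and the sketch handles only plain moves and captures. It does not check legality, and castling, en passant and promotion would need extra logic.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation: square name ("e2") -> piece letter
# (upper case for white, lower case for black). Empty squares are absent.
Board = Dict[str, str]


def infer_move(before: Board, after: Board) -> Optional[Tuple[str, str]]:
    """Return (from_square, to_square) for a plain move or capture, else None."""
    # Squares that held a piece before and are empty now.
    vacated = [sq for sq in before if sq not in after]
    # Squares whose occupant is new or different in the later state.
    changed = [sq for sq in after if before.get(sq) != after[sq]]
    # A plain move or capture empties exactly one square and changes one other.
    if len(vacated) != 1 or len(changed) != 1:
        return None  # castling, en passant, or a recognition error
    src, dst = vacated[0], changed[0]
    if before[src] != after[dst]:
        return None  # the piece changed identity, e.g. a promotion
    return src, dst
```

A result of `None` signals that the pair of states cannot be explained as a single simple move, which is one place where a consistency check could ask for the frame to be re-examined.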

URLs and References:

Pre- and Co-requisite Knowledge:
It would be an advantage if the student has previously studied units in image/video processing, and machine learning/AI.

Applying Lean to Distribution (24 pts)
Supervisors:
Yen Cheung, Vincent Lee, Rabi Gunaratnam (Timstock Ltd)

Pioneered by Toyota, lean manufacturing has been applied in many companies since the 1990s with the aim of improving business performance by reducing waste. Although the idea of lean processes started in Toyota in the 1940s, this concept only became popular in the 1990s in response to the worldwide recession at the time. Today, lean processing or lean ‘thinking’ is applied not only in the automotive industry but in other industries as well. Since the recent global financial crisis, companies are seeking to be leaner organisations to remain competitive.

Aim and Outline
This research project involves applying lean ‘thinking’ to a distribution centre that plans and delivers goods to customers located mainly in Victoria. Currently the company relies on a combination of manual and software-enabled processes for planning and delivery of its products, and these processes are prone to errors and waste.

The expected outcomes of this project are:

* A proposal to the company for improving the current business processes;
* Implementation of the business improvement plan;
* Evaluation of the business improvement plan.

URLs and References
"The benefits of Lean Manufacturing: what lean thinking has to offer process industries", Melton T, Chemical Engineering Research and Design, 83(A6), 662-673, June 2005.

http://mimesolutions.com/PDFs/WEB%20Trish%20Melton%20Lean%20Manufacturing%20July%202005.pdf

Pre- and Co-requisite Knowledge
Students who have achieved an overall grade of D or higher in their prior degree and who are interested in applying business process improvements are encouraged to apply. There is a scholarship attached to this project provided by Timstock Ltd.

ANZ Project: Customer Experience of the future - Unassisted Channel (for two students) (24 pts)
Supervisors:
David Taniar, Vincent Lee, Colin Dinn (ANZ) and Tim Liddelow (ANZ/SAP Team Partner)

As ANZ expands across the Asia Pacific region, we seek to provide a seamless and consistent customer experience, and minimise duplication of effort, by leveraging common capabilities.

In seeking to build enterprise capability for our digital channels, we must balance conflicting needs:

• Culture: Whilst maintaining a consistent brand and experience across markets, we need to meet local language, cultural and customer behaviour expectations.
• Regulations: We must comply with global and local regulations in each market in which we operate.
• Maturity: In developed markets, customers expect a high degree of functionality and differentiation (and are prepared to pay for a premium product and experience). In developing markets, customers are looking for more basic products and services at reasonable cost.
• Scale: In markets where we have a significant presence, we achieve economies of scale. In smaller markets, we need to scale our operations down to achieve profitability at low revenue levels.

ANZ therefore seeks to build a capability that can be leveraged across the region in an economically and technically sustainable way.

Aim and Outline
Define the digital Customer Experience of the future, and build a working prototype on SAP Sybase software.
We want to consider innovative ways of (i) creating controls within a digital development environment, (ii) defining the regional asset as a level of business standardisation, (iii) defining the controls for the regional asset, (iv) defining how we would manage the localisation to ensure integrity in the regional asset. Benefits include:

• Significant exposure to the Asia context of technology, banking, and commerce.
• Potentially part of the engagement will be in Singapore at ANZ cost.

Expected outcome: Working Prototype on SAP Sybase software

Pre- and Co-requisite Knowledge
Digital exposure and customer experience techniques, possibly customer centred design. Development skills, especially hybrid.

The managerial use of mobile business intelligence (18 pts)
Supervisors:

Both mobile technology and business intelligence (BI) have been rated by the Gartner Global CIO Survey as top technology priorities for organisations in the last few years. Although many managers have incorporated mobile devices into their work routine for decision-making tasks, little research has been conducted to explore how managers use BI on mobile devices.

Aim and Outline
This project aims to investigate how managers use mobile BI for their decision-making tasks. It will be exploratory in nature and can be investigated through different theoretical lenses (including but not limited to task-technology fit, unified theory of acceptance and use of technology).

The project can be undertaken by multiple students with each student using a different method, for example, one student could use a survey method, another a lab-based approach, and another a case study.

Students working on this project will be required to be part of the DSS Lab. This includes attendance and participation in weekly seminars.

Pre- and Co-requisite Knowledge
To tackle this project students need an undergraduate degree in IT (preferably in business information systems) or be a student in the Master of Business Information Systems.

Can managers effectively use business analytics? (18 pts)
Supervisors:

Business analytics (BA) is currently the boom area of business intelligence (BI) - the use of IT to support management decision-making. BI is rated by industry analyst Gartner as the top technology for chief information officers worldwide. Most BI vendors are aggressively marketing BA software, especially predictive analytics. These vendors assume that managers will understand the statistical techniques used by their software. Conversations with CIOs and other senior IT executives have indicated that managers in large organizations do not always have this knowledge.

Aim and Outline
The aim of the project is to investigate whether or not business managers have the requisite background knowledge to effectively use BA systems.

The project can use either a survey or a laboratory experiment. It may be possible for two students to tackle the project at the same time, each using a different research method.

Students working on this project will be required to be part of the DSS Lab. This includes attendance and participation in weekly seminars.

Pre- and Co-requisite Knowledge
To tackle this project students need an undergraduate degree in IT (preferably in business information systems) or be a student in the Master of Business Information Systems.

Multitouch Interactive Parameter Search for a Visual Agent-based Model of Bee Foraging (24 pts)
Supervisors: Alan Dorin, Michael Wybrow, Nic Geard, Adrian Dyer

Agent-based models (ABMs) are simulations that operate from the “bottom up”. That is, they explicitly represent individual components (termed agents) that interact with one another and their environment to generate “emergent” phenomena at the group level. For instance, a population of individual birds (the agents) may be simulated and visualised to generate a flock and its associated behaviour (the emergent phenomenon). An agent’s behaviour at any moment during ABM simulation is dictated by its current state. State is often unique to each agent and may be influenced by that agent’s particular life history and perception of its local environmental conditions. Thus, ABMs maintain the basic principles that each individual in a population has unique behavioural and physiological qualities resulting from genetic and environmental influences; and that interactions between organisms are local – each individual is affected primarily by its local environment including by other organisms in its proximity [Huston et al 1988].
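To make the bottom-up idea concrete, here is a minimal Python sketch of an agent-based model. It is not the model to be developed in the project: the wrapping grid world, the `Bee` class, the `step` and `run` functions and the one-unit-of-nectar rule are all assumptions chosen for brevity. Each agent carries its own state and acts only on the cell it currently occupies; the group-level outcome (how nectar ends up distributed) is not coded anywhere explicitly.

```python
import random
from dataclasses import dataclass


@dataclass
class Bee:
    x: int
    y: int
    nectar: int = 0  # per-agent state, shaped by that agent's own history


def step(bees, flowers, size, rng):
    """Advance the model one tick; each bee acts only on its local cell."""
    for bee in bees:
        # Move to a random neighbouring cell (the world wraps at the edges).
        dx, dy = rng.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])
        bee.x = (bee.x + dx) % size
        bee.y = (bee.y + dy) % size
        # Local interaction: take one unit of nectar if the cell has any left.
        if flowers.get((bee.x, bee.y), 0) > 0:
            flowers[(bee.x, bee.y)] -= 1
            bee.nectar += 1


def run(num_bees=10, size=8, nectar_per_cell=2, ticks=50, seed=0):
    """Build a small world, run it for `ticks` steps, return (bees, flowers)."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    flowers = {(x, y): nectar_per_cell for x in range(size) for y in range(size)}
    bees = [Bee(rng.randrange(size), rng.randrange(size)) for _ in range(num_bees)]
    for _ in range(ticks):
        step(bees, flowers, size, rng)
    return bees, flowers
```

Arguments such as `num_bees` and `nectar_per_cell` are the kind of parameters whose combinations form the space an exploration interface would need to cover.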

In this project you will design novel multitouch-based interactive tools for exploring complex parameter spaces typical of agent-based models. The focus will be on simple simulation models of bee foraging behaviour that you will need to develop during the course of the project. You will build the software to run on a new PQ-Labs 32-point, 42" multi-touch surface (http://multi-touch-screen.com/) and versions for use on a conventional desktop computer with keyboard and mouse. A number of designs will be trialled in order to determine the most effective means of representing a multi-parameter space visually for interaction. The aim is to allow users to explore the space of emergent outcomes generated by the simulation and outline the parameter regions that give rise to the most interesting phenomena. Simple user-testing will be conducted by the student and supervisors. (The focus of this project is not on this aspect of interface design.) The purpose of the interface tool for this project is to map out regions in which various bee foraging strategies outperform competing strategies.

This is an untested idea that will require considerable creativity on the part of the student. A love of interactive computer graphics and graphic design is an advantage! Experience coding in C++ is a necessity. It would be of considerable benefit if students engaged on this project enrolled in the honours unit FIT4012 Advanced Topics in Computational Science.

[1] Huston, M., D. DeAngelis, and W. Post, New Computer Models Unify Ecological Theory. BioScience, 1988. 38(10): p. 682-691.

[2] Dorin A., Korb K.B. & Grimm V., Artificial-Life Ecosystems: What are they and what could they become?, In Proceedings of the Eleventh International Conference on Artificial Life, S. Bullock, J. Noble, R. A. Watson, and M. A. Bedau (Eds.), MIT Press, Cambridge, MA. 2008, pp.173-180

[3] Grimm, V. and S. F. Railsback, Individual-based Modeling and Ecology, Princeton University Press, 2005

[4] Vries, H.de and J.C. Biesmeijer, Modelling collective foraging by means of individual behaviour rules in honey-bees. Behavioural Ecology and Sociobiology, 1998. 44: p. 109-124.

Simulation of Bee Foraging (18 or 24 pts)
Supervisors: Alan Dorin, Zoe Bukovac, Adrian Dyer (RMIT), Mani Shrestha

Bees forage for nectar and pollen from flowers to support their hives. In doing this, they pollinate our crops and support reproduction of plants in natural ecosystems. Globally, this resource is worth over $200 billion AUD to crop production every year. As our climate warms, it seems that pollinator and flower interactions may be changing. It is vital for us to understand how, so that we can manage the world's food supply and natural ecosystems. Together with an international team of ecologists, botanists and computer modellers, the participant in this project will contribute to an important global effort to understand the changing behaviour of bees under climate change. The student participant will write computer models to pick apart the factors that have the potential to contribute to the changing dynamics of insect/plant relationships. The technique to be applied is "agent-based modelling".

Agent-based models (ABMs) are simulations that operate from the “bottom up”. That is, they explicitly represent individual components (termed agents) that interact with one another and their environment to generate “emergent” phenomena at the group level. For instance, a population of individual birds (the agents) may be simulated and visualised to generate a flock and its associated behaviour (the emergent phenomenon). An agent’s behaviour at any moment during ABM simulation is dictated by its current state. State is often unique to each agent and may be influenced by that agent’s particular life history and perception of its local environmental conditions. Thus, ABMs maintain the basic principles that each individual in a population has unique behavioural and physiological qualities resulting from genetic and environmental influences; and that interactions between organisms are local – each individual is affected primarily by its local environment including by other organisms in its proximity [1].
What will be done:
In this project you will design novel agent-based models for understanding bee/flower interactions. The focus will be on simple simulation models of bee foraging behaviour that you will need to develop during the project. This project is of global significance and will require considerable creativity and dedication on the part of the student. The benefit is that the project involves conducting real science of massive potential benefit.

Requirements:
Experience coding in Java or C++ is a necessity. It would be of considerable benefit if students engaged on this project enrolled in the honours unit FIT4012 Advanced Topics in Computational Science and took the FIT4008 reading unit with Alan Dorin in semester 2, 2013.

Reading:
[1] Huston, M., D. DeAngelis, and W. Post, New Computer Models Unify Ecological Theory. BioScience, 1988. 38(10): p. 682-691.

[2] Dorin A., Korb K.B. & Grimm V., Artificial-Life Ecosystems: What are they and what could they become?, In Proceedings of the Eleventh International Conference on Artificial Life, S. Bullock, J. Noble, R. A. Watson, and M. A. Bedau (Eds.), MIT Press, Cambridge, MA. 2008, pp.173-180

[3] Grimm, V. and S. F. Railsback, Individual-based Modeling and Ecology, Princeton University Press, 2005

[4] Vries, H.de and J.C. Biesmeijer, Modelling collective foraging by means of individual behaviour rules in honey-bees. Behavioural Ecology and Sociobiology, 1998. 44: p. 109-124

[5] Dyer, A.G., Dorin, A., Reinhardt, V., Rosa, M., "Colour reverse learning and animal personalities: the advantage of behavioural diversity assessed with agent-based simulation", Nature Precedings pre-print, http://hdl.handle.net/10101/npre.2012.7037.1 (March 2012)

Non-standard models of computation and universality (24 pts)
Supervisor: David Dowe

Zvonkin and Levin (1970) (and possibly earlier, Martin-Löf (1966)) consider the probability that a Universal Turing Machine (UTM), U, will halt given infinitely long random input (where each bit from the input string has a probability of 0.5 of being a 0 or a 1). Chaitin (1975) would later call this the halting probability, Omega, or Omega_U. Following an idea of C. S. Wallace's in private communication (Dowe 2008a, Dowe 2011a), Barmpalias & Dowe (to appear) consider the universality probability - namely, the probability that a UTM, U, will retain its universality. If some input x to U has a suffix y such that Uxy simulates a UTM, then U has not lost its universality after input x. Barmpalias, Levin (private communication) and Dowe (in a later simpler proof) have shown that the universality probability, P_U, satisfies 0 < P_U < 1 for all UTMs U and that the set of universality probabilities is dense in the interval (0, 1).

We examine properties of the universality probability for non-standard models of computation (e.g., DNA computing).

Reference:
G. Barmpalias and D. L. Dowe, "Universality probability of a prefix-free machine", accepted, Philosophical Transactions of the Royal Society A

MML inference of systems of differential equations (24 pts)
Supervisor: David Dowe

Many simple and complicated systems in the real world can be described using systems of differential equations (Bernoulli, Navier-Stokes, etc). Despite the fact that we can accurately describe and solve those equations, they often fail to produce accurate predictions.
In this project, our goal is to create a way of inferring the system of (possibly probabilistic or stochastic) partial or ordinary differential equations (with a quantified noise term accounting for any inexactness) that describes a real-world system based on a set of given data. Initially we can begin by working on a single equation with one unknown. (The noise could be due to a number of effects such as measurement inaccuracies or oversimplified models used.) From there, we can progressively move to gradually more complicated equations. Minimum Message Length (MML) will be one of the tools used for modelling as it can provide ways of producing simpler models that actually fit closer than their more complicated counterparts produced by other methods. The project will become increasingly CPU-intensive but will ultimately have many real-world applications in a wide range of areas.

References:
Wallace (2005)
Dowe (2011a)

Econometric, statistical and financial time series modelling using MML (24 pts)
Supervisors: David Dowe, Farshid Vahid

Time series are sequences of values of one or more variables. They are much studied in finance, econometrics, statistics and various branches of science (e.g., meteorology, etc.). Minimum Message Length (MML) inference (Wallace and Boulton, 1968) (Wallace and Freeman, 1987) (Wallace and Dowe, 1999a) (Wallace, posthumous, 2005) (Comley and Dowe, 2005) has previously been applied to autoregressive (AR) time series (Fitzgibbon et al., 2004), other time series (Schmidt et al., 2005) and (at least in preliminary manner) both AR and Moving Average (MA) time series (Sak et al., 2005).

In this project, we apply MML to the Autoregressive Conditional Heteroskedasticity (ARCH) model, in which the (standard deviations and) variances also vary with time. Depending upon progress, we can move on to the GARCH (Generalised ARCH) model or Peiris's Generalised Autoregressive (GAR) models, or to inference of systems of differential equations.
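For readers unfamiliar with ARCH, the sketch below simulates the standard ARCH(1) process, in which each value is the conditional standard deviation times a standard normal shock and the conditional variance is alpha0 + alpha1 times the square of the previous value. It only generates data whose variance changes over time; it does not perform MML inference. The function name `simulate_arch1` and the default parameter values are assumptions chosen for illustration.

```python
import math
import random


def simulate_arch1(n, alpha0=0.2, alpha1=0.5, seed=0):
    """Simulate n values of an ARCH(1) series; return (values, variances)."""
    if alpha0 <= 0 or not 0 <= alpha1 < 1:
        raise ValueError("need alpha0 > 0 and 0 <= alpha1 < 1")
    rng = random.Random(seed)  # seeded so that runs are repeatable
    values, variances = [], []
    prev = 0.0  # assumed starting value before the first observation
    for _ in range(n):
        # The conditional variance depends on the previous value of the series.
        var = alpha0 + alpha1 * prev ** 2
        # The new value is a standard normal shock scaled by the current std dev.
        prev = math.sqrt(var) * rng.gauss(0.0, 1.0)
        variances.append(var)
        values.append(prev)
    return values, variances
```

An inference method applied to `values` alone would aim to recover `alpha0` and `alpha1`; synthetic series like this are one way to build the artificially generated test data on which estimators can be compared.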
This project will require strong mathematics - calculus (partial derivatives, second-order partial derivatives, integration by parts, determinants of matrices, etc.), etc.

References:
CoDo2005 Comley, Joshua W. and D.L. Dowe (2005).
FiDV2004 Fitzgibbon, L.J., D. L. Dowe and F. Vahid (2004).
SaDR2005
ScPL2005
Wall2005
WaBo1968
WaDo1999a Wallace, C.S. and D.L. Dowe (1999a). Minimum Message Length and Kolmogorov Complexity, Computer Journal, Vol. 42, No. 4, pp. 270-283.
WaFr1987

Film production: how far can constraints go? (18 or 24 pts)
Supervisors: Maria Garcia de la Banda, Chris Mears, Guido Tack, Mark Wallace

The "film production" (or "talent scheduling") problem, defined by [1] in 1993, is a very simplistic version of the real-life optimisation problem, which involves determining when and where scenes in a movie are filmed in order to minimise a certain objective function. While the simplified version only takes into account the cost incurred by actors who have to wait in between scenes, the real-life version needs to take into account many other factors, from location and light requirements to limits on the number of hours the crew can work.

There has been a significant amount of research on the idealised version of this problem using many different techniques such as evolutionary algorithms, local search and constraint programming. However, it is not clear whether the results of this research apply to the more realistic version of the problem. We have enlisted the help of an assistant director on many Australian movies to provide us with data and expertise regarding this problem. The aim of the project is to investigate how well the real-life problem can be modelled and solved using constraint programming.

This project is most suited for students with good mathematical, modelling and programming skills.

[1] Cheng, T. C. E., J. E. Diamond, B. M. T. Lin. 1993. Optimal scheduling in film production to minimize talent hold cost. Journal of Optimization Theory and Applications 79 479–482.

[2] Garcia de la Banda, M., Stuckey, P., Chu, G. Solving Talent Scheduling with Dynamic Programming. INFORMS Journal on Computing. 23(1): 120-137, 2011.

Exploring data management platforms for “big data” (24 pts)
Supervisor: Maria Indrawan-Santiago

The amount of data produced and consumed by applications has grown recently. Several organisations have seen an explosion in the amount of data that they have to collect, store and retrieve for their daily operations or decision-making support. This explosion has posed challenges for current relational DBMSs such as Oracle, SQL Server and MySQL in terms of query performance. To overcome the performance limitations of relational databases in handling large amounts of data, several alternative database models have been introduced in the last few years. This group of alternative databases is known as “NOSQL”, which can be interpreted as “Not Only SQL” or “No SQL”. Examples of these new approaches are Big Table, Array and HDFS. The models derive mainly from in-house research and development teams at major companies such as Google with the HDFS and Facebook with Haystack. Academic research contribution is very limited in this area.

There will be several possible projects in this area. For example:

• Finding the best performance database given different classes of queries.
• Exploring the advancement in graph databases.

Final shape and scope of the project will be determined after discussion between supervisor(s) and individual student.

What would you learn?

• New database technology (both theoretical and practical)
• Making a critical analysis of new technology

What types of skill do you need?
• Critical thinking
• Java programming
• Relational database
• Database modeling

Investigating into Continuous Opinion Dynamics for Innovation Support System (24 pts)
Supervisor: Vincent Lee

Self-categorization theory (SCT) is a relatively new paradigm in social psychology that aims to explain the “psychological emergence of the group-level properties of social behaviour”. A formal model of self-categorization, with the aid of an adaptive intelligent tool, could be used to build a new opinion dynamics model, which would be social-psychologically founded. In an open innovation domain, individual behaviour, if positively motivated, can lead to the generation of creative ideas for process, product, market, and organisational innovations.

As the scope of SCT is integrated group processes, SCT deals fundamentally with situations where a great number of individuals interact. These individual interactions are the driving inputs for developing an agile innovation support system. Blogging and microblogging messages typically generate complex collective phenomena, which are difficult to anticipate from the behaviour of individuals (with regard to their interactions) when supporting new idea generation. Simulation is a reliable way of exploring the collective dynamics resulting from the hypotheses made at the individual level.

The broad scope of this project is to investigate the behaviour of a continuous opinion dynamics model inspired by social psychology. The project will also study the behaviour of the model for several network interactions and show that, in particular, consensus, polarization or extremism are possible outcomes, even without explicit introduction of extremist agents. The expected outcomes are to compare the results of the simulation to what is expected according to the theory, and to other opinion dynamics models.

Keywords: Opinion Dynamics; Self-Categorization Theory; Consensus; Polarization; Extremism; Open Innovation.
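To give a feel for what a continuous opinion dynamics simulation looks like, the sketch below implements a simple bounded-confidence model in the style of Deffuant and colleagues. This is a deliberately simpler, swapped-in technique: it is not the SCT-based or meta-contrast model the project would study, and it assumes a fully mixed population rather than a network. The function name `deffuant` and the parameters `threshold`, `mu`, `steps` and `seed` are illustrative assumptions.

```python
import random


def deffuant(opinions, threshold=0.2, mu=0.5, steps=20000, seed=0):
    """Run pairwise bounded-confidence updates and return the final opinions."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    ops = list(opinions)       # work on a copy; the input is left untouched
    for _ in range(steps):
        # Pick two distinct agents at random (fully mixed population).
        i, j = rng.sample(range(len(ops)), 2)
        diff = ops[j] - ops[i]
        # Agents influence each other only if their opinions are close enough.
        if abs(diff) < threshold:
            ops[i] += mu * diff
            ops[j] -= mu * diff
    return ops
```

Running this with different values of `threshold` and inspecting how many distinct opinion clusters remain is the kind of experiment in which consensus or polarization can be observed as an outcome of the update rule rather than being built in.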
Relevant Readings

Rao, Balkrishna, C. (2010), On the methodology for quantifying innovations, International Journal of Innovation Management, vol. 14, No. 5, pp.823-839.

Shin, Juneseuk, Park, Yongtae (2010), Evolutionary optimization of a technological knowledge network, Technovation, vol. 30, pp.612-626.

Laurent Salzarulo (2006). A Continuous Opinion Dynamics Model Based on the Principle of Meta-Contrast, Journal of Artificial Societies and Social Simulation vol. 9, no. 1

Amblard, F. and Deffuant, G. (2004), The role of network topology on extremism propagation with the relative agreement opinion dynamics, Physica A, vol. 343, pp. 725-738.

Salzarulo, L. (2004), Formalizing self-categorization theory to simulate the formation of social groups, presented at the 2nd European Social Simulation Association Conference, Valladolid, Spain, 16th-19th September 2004.

Intelligent Real-time Activities Recognition (24 pts)
Supervisors: Vincent Lee, Clifton Phua (I2R)

Real time activities recognition is an emerging research area, especially in smart future city living. This project focuses on the automated recognition of activities and behaviors in smart homes and providing assistance / intervention accordingly. We will carry out the automated monitoring of basic Activities of Daily Living (bADL) and instrumental Activities of Daily Living (iADL) among single and multiple residents in smart homes. Technically, these objectives translate to significant advances in sensitivity and specificity in activity and plan recognition of finer grained bADLs / iADLs for a single subject; and improved location tracking, object / human dissociation and activity recognition among multiple subjects.

Readings:

Norbert Győrbíró, Ákos Fábián, Gergely Hományi (2009), An Activity Recognition System For Mobile Phones, Mobile Network Applications, vol. 14, pp 82–91 DOI 10.1007/s11036-008-0112-y.
Flora Dilys Salim, Jane Burry, David Taniar, Vincent Cheng Lee, Andrew Burrow (2010), The Digital Emerging and Converging Bits of Urbanism: Crowd designing a Live Knowledge Network for Sustainable Urban Living, in proceedings of 28th Conference on Future Cities eCAADe (Education and Research in Computer Aided Architectural Design in Europe) 2010

Generation-Y: What do they think about Mobile Payment? (24 pts)
Supervisor: Mahbubur Rahim

Payment methods have undergone drastic change in line with developments in science and technology. Mobile payments are payments for goods, services, and bills/invoices made with a mobile device such as a mobile phone or personal digital assistant, leveraging wireless and other communication technologies. However, the acceptance and usage of mobile payment is relatively low despite the high penetration of mobile phone services in Australia. This project will investigate Generation Y's perceptions of, and intentions towards, mobile payment using a modified Technology Acceptance Model (TAM). It will involve a survey among students for data collection.

Advanced Visualisation for Constraint Propagation and Search (24 pts)
Supervisors: Guido Tack, Chris Mears

The project is suitable for both Honours and Minor Theses.

Optimisation problems arise almost everywhere. Advances in optimisation technology have made it possible to solve problems in diverse areas such as transport networks, production scheduling, energy grids, nurse rostering, university timetables, or protein structure prediction. One challenge with these approaches is that in order to improve their efficiency, it is vital to understand their behaviour at different levels. Current tools offer little to no support for this.

In this project, you will develop visualisation strategies for a particular class of optimisation solvers, those based on constraint propagation and search. A number of tools have been proposed over the years (see e.g. [1], [2], [3]), but they usually do not scale well to the massive search trees encountered in real-life applications. Your goal will therefore be to explore what kind of data can be extracted from these search trees, and to develop techniques for aggregating and visualising that data. The visualisation will be based on extensions of the tree map technique [4].

The project is suited to students with good mathematical and programming skills. Some experience with programming graphical user interfaces will be helpful.

[1] Helmut Simonis, Paul Davern, Jacob Feldman, Deepak Mehta, Luis Quesada, Mats Carlsson: A Generic Visualization Platform for CP. In David Cohen (ed): CP 2010. LNCS 6308, Springer, 2010.

[2] Christian Schulte: Oz Explorer: A Visual Constraint Programming Tool. In Lee Naish (ed): ICLP 1997. MIT Press, 1997.

[3] Pierre Deransart, Manuel V. Hermenegildo, Jan Maluszynski (Eds.): Analysis and Visualization Tools for Constraint Programming, Constraint Debugging (DiSCiPl project). LNCS 1870, Springer, 2000.

Learning from Very Large Data (24 pts)
Supervisor: Geoff Webb

Background
Machine learning is a fundamental technology that underlies the core features of many modern businesses such as social networking, internet search and online commerce. Demand for graduates with advanced machine learning skills is extremely high. Many advanced applications of machine learning have access to extraordinarily large quantities of data. However, there is a paradox: most modern machine learning techniques with the theoretical capability to produce the most accurate classifiers from large data (those with very low asymptotic error) have computational complexity that makes them infeasible to apply to large data (they are super-linear in the data quantity).

Project aim and basic outline of approach
This project will explore approaches to developing feasible classifiers with low asymptotic error that have linear or sub-linear computational complexity.
As a starting point we will look at modifications to the Averaged N-Dependence Estimators algorithm that reduce its complexity with respect to its parameter N.

Pre- and co-requisite knowledge and units studied as appropriate
Either advanced Java programming skills or a strong mathematical background is required.

MML time series and Bayesian nets with discrete and continuous attributes
Supervisors: A./Prof. David Dowe

Background
The first application of MML to Bayesian nets including both discrete and continuous-valued attributes was in Comley & Dowe (2003), refined in Comley & Dowe (2005) [whose final camera-ready version was submitted in Oct 2003], based on an idea in Dowe & Wallace (1998). The Minimum Message Length (MML) principle from Bayesian information theory (Wallace (2005), Dowe (2011a)) enables us (given sufficient data) to infer any computable or expressible model from data (e.g., Wallace & Dowe (1999a) and chapter 2 of Wallace (2005)).

One of the particular specialties of MML is when the amount of data per parameter is sparse, such as the Neyman-Scott (1948) problem. In such cases, we see the classical approach of Maximum Likelihood and many other approaches converge to the wrong answer (even for arbitrarily much data), but Dowe & Wallace (1997), chap. 4 of Wallace (2005) and sec. 6 of Dowe (2011a) all show MML doing well.

Aims and outline
We seek to enhance this original work to Bayesian nets which can change with time, using the mathematics of MML time series in Fitzgibbon, Dowe & Vahid (2004).

The student will be required to:

• Understand the relevant underlying mathematics,
• Develop the necessary mathematics,
• Develop software for the relevant mathematics,
• Test and apply the software on real-world data.

URLs and references

Pre- and Co-requisite knowledge
The ability to program is essential. The work will use Minimum Message Length (MML) and will become quite mathematical.

MML inference of SVMs, DEs, time series, etc.
Supervisors: A./Prof. David Dowe

Background
The Minimum Message Length (MML) principle from Bayesian information theory (Wallace (2005), Dowe (2011a)) enables us (given sufficient data) to infer any computable or expressible model from data (e.g., Wallace & Dowe (1999a) and chapter 2 of Wallace (2005)). When the amount of data per parameter is scarce, such as in the Neyman-Scott (1948) problem, we see the classical approach of Maximum Likelihood and many other approaches converge to the wrong answer (even for arbitrarily much data), but Dowe & Wallace (1997), chap. 4 of Wallace (2005) and sec. 6 of Dowe (2011a) all show MML doing well.

The information-theoretic log-loss (logarithm of probability) scoring system is unique in being invariant to the parameterisation of questions (Dowe, 2008a, footnote 175; Dowe, 2011a, sec. 3). This gives even further justification to the MML approach.

Given this generality, indeed this universality, MML can be reliably applied to any inference problem. We list here three of several (or infinitely many?) possible examples - namely, inference of support vector machines (SVMs), inference of differential equations (DEs) and/or inference of econometric time series, among many other examples. Examples of many applications include modelling of dynamical systems and modelling financial markets.

Aims and outline
We focus on one specific project - be it inference of support vector machines (SVMs), inference of differential equations (DEs), inference of econometric time series and/or whatever. We then use Minimum Message Length (MML) to infer the model from the data. We compare with alternative methods on both artificially generated data and real-world data.

The student will be required to:

• Understand the relevant underlying mathematics,
• Develop the necessary mathematics,
• Develop software for the relevant mathematics,
• Test and apply the software on real-world data.

URLs and references

Pre- and Co-requisite knowledge
The ability to program is essential.
The work will use Minimum Message Length (MML) and will become quite mathematical no matter what direction the project takes.

(Google) Maps Databases
Supervisors: Assoc. Prof. David Taniar

Background:
Are you interested in developing algorithms to solve queries such as: "Given a map containing some places of interest, find the three closest places of interest to a given location"; or "Given a map containing 100 objects of interest, draw a graph that represents the nearest neighbour of each object"?

Aims and outline:
This project aims to develop efficient algorithms to process spatial database queries, by incorporating some of the properties of computational geometry.

URLs and references:
http://en.wikipedia.org/wiki/Nearest_neighbor_graph

Pre- and co-requisite knowledge:
A strong interest in mathematics, including geometry, and a passion for solving puzzles (e.g. what does the picture shown in http://en.wikipedia.org/wiki/Nearest_neighbor_graph mean?).

Security Analysis of NTLM Authentication Protocol
Supervisors: Ron Steinfeld

Background:
NTLM is a widely used authentication protocol designed to secure remote login in several Microsoft network applications. Despite its popular use, its security properties are not well understood. The protocol has some known security weaknesses. Identifying further weaknesses is critical to understanding the risks associated with its use.

Aims and outline:
The aim of this project is to investigate and improve current understanding of the NTLM protocol's security and its vulnerabilities. In particular, the project will explore the feasibility of adapting known attacks on other similar protocols, such as WEP, to break one or more security goals of NTLM under suitable conditions. Part of the project will involve implementing and testing known and new attacks using the open-source implementation of NTLM. Other topics in cryptography-related areas are available for interested students.
URLs and references:
[1] The NTLM Authentication Protocol and Security Support Provider -- http://davenport.sourceforge.net/ntlm.html (This page describes the NTLM protocol).
[2] "Intercepting Mobile Communications: The Insecurity of 802.11" -- http://www.isaac.cs.berkeley.edu/isaac/wep-draft.pdf (This paper describes attacks on the WEP protocol).
[3] "The Java CIFS Client Library" -- http://jcifs.samba.org/ (This site contains an open-source implementation of the NTLM protocol).
[4] "Understanding the Windows SMB NTLM Authentication Weak Nonce Vulnerability" -- http://www.ampliasecurity.com/research/NTLMWeakNonce-bh2010-usa-ampliasecurity.pdf (This presentation explains some known vulnerabilities of NTLM).

Pre- and co-requisite knowledge:
Familiarity with the basics of cryptography would be an advantage. The student should have good mathematical and programming skills.

Fluid visual interfaces for exploring graph databases
Supervisors: Tim Dwyer, Michael Wybrow

Background:
Graph databases are a type of "NoSQL" database that is quickly gaining popularity. For certain types of data, modelling with a graph (or network) is a much better fit than the traditional tabular relational database model. Various textual languages exist to query graph databases (e.g. SPARQL, Gremlin); however, a much more natural way to allow people to explore this type of data is through direct manipulation of interactive visuals. Yet current techniques for doing this are rudimentary and not very friendly.

Aims and outline:
This project is about creating fast, fluid ways for people to interact with graph data using direct interaction, exploiting HTML5 and multitouch.

Finding useful associations in data
Supervisors: Geoff Webb

Background:
Association discovery is an important area of data mining that identifies associations between factors in data. Many statistically significant associations are of little interest in practice because they are trivially implied by other associations.
For example, if it is known that having prostate cancer implies being male and that prostate cancer is associated with abdominal pain, then the fact that being male and having prostate cancer is associated with abdominal pain is not informative. Non-redundant associations suppress a limited class of such associations (including the example above). Non-derivable associations suppress a much larger class of such associations. It has been argued that the inference rules on which non-derivable associations are based are stronger than those naturally understood by an average user.

Aims and outline:
This project will investigate whether this is true, and will develop and explore intermediate constraints between non-redundant and non-derivable associations to assess their relative value for discarding associations that will not be useful to normal users.

URLs and references:
http://arxiv.org/pdf/cs.DB/0206004

Pre- and co-requisite knowledge:
Students will require strong programming skills, preferably in C++ or Java.

Predicting defects in software components
Supervisors: Yuan-Fang Li, Reza Haffari

Background:
Software systems are becoming increasingly large and complex. Software quality assurance (QA) usually faces resource constraints in budget, personnel and time. Hence, the efficient allocation of QA resources is of high importance to maintain high quality standards. As a result, the ability to predict the defectiveness of software components (modules, classes, files, methods, etc.) through software metrics is an area of great practical value.

Aim and Outline:
The aim of this project is to develop novel software defect prediction (SDP) frameworks that make use of advanced machine learning techniques. Specifically, we will be exploring the following problems: (1) novel process-oriented metrics, (2) new learning algorithms that are able to give evidence for prediction outcomes, and (3) new ranking algorithms that are able to rank components by predicted defect density/severity, etc.
URLs and References:
[1] Lessmann, Stefan, et al. "Benchmarking classification models for software defect prediction: A proposed framework and novel findings." Software Engineering, IEEE Transactions on 34.4 (2008): 485-496. (http://ieeexplore.ieee.org/document/4527256/)
[2] Menzies, Tim, Jeremy Greenwald, and Art Frank. "Data mining static code attributes to learn defect predictors." Software Engineering, IEEE Transactions on 33.1 (2007): 2-13. (http://ieeexplore.ieee.org/document/4027145/)
[3] Menzies, Tim, et al. "Defect prediction from static code features: current results, limitations, new approaches." Automated Software Engineering 17.4 (2010): 375-407. (http://link.springer.com/article/10.1007/s10515-010-0069-5)
[4] Lewis, Chris, et al. "Does bug prediction support human developers? Findings from a Google case study." Proceedings of the 2013 International Conference on Software Engineering. IEEE Press, 2013. (http://dl.acm.org/citation.cfm?id=2486838)
[5] Moser, Raimund, Witold Pedrycz, and Giancarlo Succi. "A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction." Software Engineering, 2008. ICSE'08. ACM/IEEE 30th International Conference on. IEEE, 2008. (http://ieeexplore.ieee.org/document/4814129/)

Pre- and Co-requisite Knowledge:
Strong programming skills
Basic knowledge of software engineering, machine learning or data mining
One of: FIT3080/FIT4004/FIT4009/FIT5171/FIT5047/FIT5142

Environmental Sensing with Swarms of Flying Robots
Supervisors: Jan Carlo Barca and Karan Pedramrazi

Background:
Swarms of robots that are capable of carrying out environmental sensing tasks offer an edge over traditional static sensor networks, as the sensor carriers can move about autonomously in order to fulfil the most recent task requirements.
Aim and Outline:
This project aims to devise mechanisms that enable swarms of quadcopters to harvest environmental data and synthesise the captured information into three-dimensional maps. The student will work within the Monash Swarm Robotics Laboratory and attend weekly meetings with researchers in the lab. This is a great opportunity for the selected student to learn about swarm robotics and work within a multidisciplinary team consisting of software, mechanical and electrical engineers.

URLs and References:
Brambilla, M., Ferrante, E., Birattari, M. and Dorigo, M. (2012) "Swarm robotics: A review from the swarm engineering perspective", Swarm Intelligence, vol. 7, issue 1, pp. 1-41. Available: http://iridia.ulb.ac.be/IridiaTrSeries/rev/IridiaTr2012-014r002.pdf
Kumar, V. and Michael, N. (2012) "Opportunities and challenges with autonomous micro aerial vehicles", International Journal of Robotics Research, vol. 31, issue 11, pp. 1279-1291. Available: http://www.isrr-2011.org/ISRR-2011/Program_files/Papers/kumar-ISRR-2011.pdf

Pre- and Co-requisite Knowledge:
Advanced C++ programming experience and a strong desire to work with sensor systems and flying robots are essential.

Statistical Topic Models for Text Segmentation
Supervisors: Reza Haffari and Ingrid Zukerman

Background:
Text segmentation is the problem of taking a contiguous piece of text, for example closed-caption text from news video, and dividing it up into coherent sections. The segmentation problem has been researched in Natural Language Processing mainly in the context of discourse segmentation. Various segmentation models have been proposed based on (i) lexical cohesion to capture topical aspects of utterances, (ii) entity chains to capture interaction between the form of linguistic expression and local discourse coherence, and (iii) cue phrases.

Aim and Outline:
Statistical topic models are the current state-of-the-art for text segmentation.
In this project we augment topic models with additional sources of information, e.g., those coming from a domain expert, to enhance text segmentation.

Pre- and Co-requisite Knowledge:
FIT3080 Intelligent systems or equivalent is a mandatory prerequisite, and solid knowledge of probability is desirable.

Two Approaches to Optimising Chemical Engineering Design
Supervisors: Maria Garcia de la Banda, Mark Wallace

Background:
Motivating Simulation
Typically the design of a complex system is formulated as a set of parameter settings. In such systems it is hard to predict how the parameter settings impact the performance of the system (for example how the length, width and curvature of an aeroplane wing affect its flying performance). Consequently the specified system must either be built and tested or, more usually, simulated. Simulation, using well-established tools such as Aspen, is used for evaluating the design of chemical processing plants.

Addressing the Computational Cost of Simulation
Given just 20 parameters with 5 alternative settings each, the number of alternative resulting designs is 5^20, which is around 10^14. If a single simulation takes a minute of computer resource, it would take close to two hundred million years to evaluate all the alternatives. Instead the simulation-optimisation community uses heuristic techniques such as simulated annealing or genetic algorithms. These techniques seek a high quality solution by modifying previous solutions and using simulation to determine whether the new solution is better or worse.

Multiple-Objective Optimisation (MOO)
If there are multiple objective criteria – such as cost, throughput, and CO2 emissions – it no longer suffices to find a single good solution. Instead the evaluation procedure needs to explore many solutions to reveal the trade-offs between the different criteria.
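To make the idea of a trade-off concrete, the following Python sketch filters a handful of made-up plant designs down to those that are not dominated on cost, throughput and CO2 emissions. The design names and numbers are invented for illustration, and the dominance test assumes that cost and emissions are minimised while throughput is maximised.

```python
from typing import NamedTuple


class Design(NamedTuple):
    name: str
    cost: float        # to be minimised
    throughput: float  # to be maximised
    co2: float         # to be minimised


def dominates(a: Design, b: Design) -> bool:
    """True if a is no worse than b on every criterion and strictly better on one."""
    no_worse = a.cost <= b.cost and a.throughput >= b.throughput and a.co2 <= b.co2
    better = a.cost < b.cost or a.throughput > b.throughput or a.co2 < b.co2
    return no_worse and better


def nondominated(designs: list[Design]) -> list[Design]:
    """Keep the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


# Hypothetical designs; the values are illustrative only.
designs = [
    Design("A", cost=10.0, throughput=100.0, co2=5.0),
    Design("B", cost=12.0, throughput=120.0, co2=6.0),
    Design("C", cost=12.0, throughput=100.0, co2=6.0),  # same throughput as A, worse cost and CO2
    Design("D", cost=9.0, throughput=80.0, co2=7.0),
]
front = nondominated(designs)
print([d.name for d in front])
```

Here design C is discarded because A matches or beats it on every criterion, while A, B and D each win on at least one criterion and so represent genuine trade-offs.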
The number of solutions necessary to reveal the “efficient frontier”, where no criterion can be improved without degrading another one, may be in the hundreds or thousands. Even using the established techniques of simulation-optimisation, the multiple-objective optimisation problem (“MOO”!) is computationally prohibitive.

The Computational Cost of MOO
At Monash, chemical engineering researchers are investigating tractable approaches for solving the MOO problem. The idea (of the “Nondominated Sorting Genetic Algorithm – NSGA-II”) is to find additional solutions along the efficient frontier by modifying and combining previous solutions on the frontier. As with any heuristic method, there is no guarantee of the quality of the results. For simulation-optimisation, search can be focussed around the current best solution, but for solving the MOO problem, search must be “spread” along the whole efficient frontier. Consequently the chances of the heuristic algorithm failing to find solutions on the efficient frontier are very high. Moreover, with each new criterion added to the set of objectives, the size of the frontier grows dramatically. Consequently there is an urgent need to find novel scalable approaches for the MOO problem.

Aim and Outline:
Whatever approach is adopted for multi-objective optimisation in a chemical plant, it is inevitable that a large number of parameter setting combinations must be evaluated. Consequently it is essential that the computationally expensive simulation step be taken out of the evaluation loop. There are two possible ways of achieving this:
1. Plants can be modelled using non-linear equations and inequations. Interval reasoning is an option for optimising models of this kind – assuming the number of parameters is small enough.
2. Plants can be broken down into a network of processing units.
If the behaviour of each processing unit could be pre-computed (which could be done by simulation, for example, if only a few parameters were applicable to an individual unit), then the behaviour of the plant could be embedded in an optimisation process without using any further simulation.

Honours Project
Investigate a gas turbine combined cycle power station to understand the number of processing units, the parameters applicable to each unit, the model associated with each unit, and the model associated with the plant (simplified as necessary to meet the project timescale). Investigate approach 2, above, and compare the results achievable with this approach with those achieved by previous researchers using the NSGA-II algorithm on the same problem.

Pre- and Co-requisite Knowledge:
This project is most suited to students with good mathematical and modelling skills.

Measuring the performance of multi-objective optimisation algorithms
Supervisors: Aldeida Aleti

Background:
One of the main aspects of the performance of an optimisation algorithm is the current fitness of the solution(s), where the aim is to minimise/maximise its value. This is usually the way performance is measured in a single-objective algorithm, in which the solution with the best fitness value (smallest value in minimisation problems and largest value in maximisation problems) is reported for each run of the algorithm. In a multi-objective problem, and in the absence of an a priori preference ranking of the objectives, the optimisation process produces a set of nondominated solutions, which make a trade-off between the fitness functions. As a result, the improvement made by the algorithm is expressed in multiple solutions with multiple fitness values. Measuring the performance of a multi-objective optimisation algorithm is not as straightforward, since it requires the use of aggregate measures which capture multiple fitness values from multiple solutions.
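As one illustration of such an aggregate measure, the sketch below computes the hypervolume indicator for a two-objective minimisation problem: the area dominated by a set of solutions and bounded by a reference point. The hypervolume is a commonly used measure but is not named in this listing, and the fitness values and reference point below are made up for illustration.

```python
def hypervolume_2d(points, reference):
    """Area dominated by `points` and bounded by `reference` (both objectives minimised)."""
    ref_x, ref_y = reference
    # Ignore points that do not improve on the reference in both objectives.
    inside = sorted(p for p in points if p[0] < ref_x and p[1] < ref_y)
    area = 0.0
    best_y = ref_y  # lowest second-objective value seen so far
    for x, y in inside:  # sweep in order of increasing first objective
        if y < best_y:   # dominated points add no new area
            area += (ref_x - x) * (best_y - y)
            best_y = y
    return area


# Hypothetical fitness values from two runs of an algorithm.
run_a = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
run_b = [(2.0, 4.0), (3.0, 3.0), (4.0, 2.0)]
reference = (5.0, 5.0)

print(hypervolume_2d(run_a, reference))
print(hypervolume_2d(run_b, reference))
```

A larger value means the solution set covers more of the objective space, so under this measure run_a would be judged better than run_b. The result depends on the choice of reference point, which is one reason a single metric is rarely the whole story.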
Aim and Outline:
In this project, we will investigate different methods and develop new metrics for assessing the performance of multi-objective optimisation algorithms.

URLs and References:
http://users.monash.edu.au/~aldeidaa/

Pre- and Co-requisite Knowledge:
Prior knowledge of optimisation would be helpful.

Clustering and Association Analysis for Identifying Technology & Process Innovation Potentials
Supervisors: Associate Professor Vincent Lee and Dr Yen Cheung

Background:
In the quest for sustainable growth, industrial firms have to identify potentially disruptive processes or technologies during continuous innovation search. Patent data sets are semi-structured and embedded with rich, rare topics. Measuring the homogeneity and heterogeneity of patents can lead to the discovery of potential technology or process innovation opportunities.

Aim and Outline:
1. This project aims to use a data mining tool to develop patent clusters;
2. Analyse the associations between variants of patents to identify potential technology and/or process innovation for new market development.

URLs and References:
[1] Chiu, T.F., Hong, C.F., & Chiu, Y.T.: Exploring Technology Opportunities in an Industry via Clustering Method and Association Analysis. In C. Badica et al. (Eds.), Lecture Notes in Artificial Intelligence, 8083, pp. 593-602, (2013)
[2] Chiu, T.F.: A Proposed IPC-based Clustering Method for Exploiting Expert Knowledge and its Application to Strategic Planning, Journal of Information Science, pp. 1-17 (online 18 October 2013).
[3] Weka Data mining tool
[4] Runco, M. A. and Acar, S. (2013), Divergent Thinking as an Indicator of Creative Potential, Creativity Research Journal, http://www.tandfonline.com/loi/hcrj20

Pre- and Co-requisite Knowledge:
Some knowledge of the use of the WEKA data mining tool for clustering and for discovery of knowledge (text documents) from similarity measures.
Agile Smart Grid Architecture
Supervisors: Associate Professor Vincent Lee and Dr Ariel Liebman

Background:
Many multisite industrial firms have to respond to the call for a reduction in CO2 emissions in their business and production process operations. The incorporation of heterogeneous local renewable energy sources (wind, solar, etc.) and energy storage capacity in their electricity distribution grid brings a greater degree of uncertainty, which demands timely reconfiguration of the grid architecture to optimise overall energy consumption.

Aim and Outline:
The project aims to analyse and evaluate (using a simulation tool) the various feasible agile smart grid architectures, their communication protocols and control schemes.

URLs and References:
[1] Jason Bloomberg, The Agile Architecture Revolution, 2013, John Wiley and Sons Press, ISBN 978-1-118-41787-4 (ebook)
[2] IEEE Transactions on Smart Grid

Pre- and Co-requisite Knowledge:
Some knowledge of graph-theory-based algorithm development for sensor networks.

A Predictive Cyber Security Cost Model for Financial Services Sector
Supervisors: Associate Professor Vincent Lee

Background:
Intensive competition in the global digital economy has given rise to escalating cyber and physical systems crimes by malicious data miners who plan to gain personal and institutional competitive advantages. Malicious insiders exist in all industries. Amongst all reported cybercrimes, those committed by malicious insiders in the financial services sector are among the most significant threats to networked systems and data. This is reflected in the fact that, for many enterprises, more than 50% of the cybercrimes they have experienced originated from malicious insiders [1]. A malicious insider is a trusted insider (e.g. current employee, contractor, customer, or business partner) who abuses his/her trust to disrupt operations, corrupt data, exfiltrate sensitive information, or compromise an IT system, causing loss or damage [2].
The body of academic research and professional practice literature focuses mainly on how to detect and prevent cybercrimes using security mechanisms whose use is justified mainly by past crime patterns. However, quantitatively forecasting the cost impact of cybercrimes is still a challenging task.

Aim and Outline:
Two main aims of this proposal are:
1. To formulate a predictive cybercrime cost model which can be applied to malicious insider attacks within a financial institution; and
2. To verify the predictive power of the formulated model with empirical data.

Outline:
This research project attempts to use both qualitative and quantitative approaches to help enterprises estimate the cost impact of a malicious insider attack. Estimation of cost impact is central to the allocation of IT security budgets. Depending on the project outcome, funding to attend a conference is available.

URLs and References:
[1] Adam Cummings, Todd Lewellen, David McIntire, Andrew P. Moore and Randall Trzeciak. “Insider Threat Study: Illicit Cyber Activity Involving Fraud in the US Financial Services Sector - A Special Report”, July 2012, Software Engineering Institute, Carnegie Mellon University.
[2] Dawn Cappelli, Andrew Moore, Randall Trzeciak. “The CERT Guide to Insider Threats - How to Prevent, Detect, and Respond to Information Technology Crimes (Theft, Sabotage, Fraud)”, Chapter 3, 2012, Addison-Wesley.
[3] Vincent CS Lee and Yee-wei Law. “Cyber-Physical System risk detection and simulation”, oral presentation to The International Symposium on CyberSecurity (CyberSec2013), Nanyang Technological University, 28-29 January 2013.
[4] Kim-Kwang Raymond Choo. “Cyber threat landscape faced by financial and insurance industry”, Australia’s national research and knowledge centre on crime and justice, Trends & Issues in crime and criminal justice, No. 408, February 2011.
Pre- and Co-requisite Knowledge:
Basic knowledge of economic modelling and use of a simulation software package (MATLAB, ExtendSim).

Online Handwritten Signature Verification
Supervisors: Gopal Gupta

Background:
This project is in an area of my earlier research. A number of research papers were published.

Aim and Outline:
There is considerable interest in authentication based on handwritten signature verification (HSV) because HSV is superior to many other biometric authentication techniques, for example, fingerprints or retinal patterns, which are more reliable but much more intrusive. A number of HSV techniques have been explored over the last 30-40 years. I myself have looked into using dynamic parameters when a signature is signed online. Another approach was based on simulating the hand movements of the person signing online. Both these techniques work well and the results have been published. This project involves finding even more reliable techniques, perhaps by exploring yet another approach to online HSV based on identifying curves and straight lines as the signature is signed. You will need to study curve and line identification techniques in pattern recognition and then use those techniques, and perhaps develop new techniques, for use in online HSV. The project requires some mathematical knowledge and programming experience.

URLs and References:
(with R. C. Joyce) Using Position Extrema Points to Capture Shape in On-line Handwritten Signature Verification, Pattern Recognition, Vol. 40, pp. 2811-2817, 2007
The State of the Art in On-line Handwritten Signature Verification, approx. 38 pp., Faculty of Information Technology, Technical Report, 2006

Pre- and Co-requisite Knowledge:
The student must have some mathematical background and experience in programming.

Automatic Sound Generation and Recognition
Supervisors: Gopal Gupta

Background:
There is no research publication in this area, but it is related to an industry project.
Aim and Outline:
Some applications require techniques that automatically recognise sounds. Other applications (for example, in digital films) require techniques that can generate any given sound for use in the application. The first part of this project involves developing techniques for sound recognition that detect a wide variety of sounds. The sound recogniser should work even when there is a high level of ambient noise. The other part of the project is to develop techniques to generate sounds so that a number of given sounds may be generated from a given list of sounds. Once again it may be necessary to generate sounds for a noisy background environment. The project will involve searching for publications in this area and then perhaps using some ideas from them. The project requires some mathematical knowledge and programming experience.

The impact of using Business Intelligence in healthcare (18 pts)
Supervisors: Caddie Gao and Frada Burstein

Background:
Business Intelligence (BI) tools are widely used in many data-intensive organisations to support better decision-making and service delivery. BI tools have recently been adopted as one of the components of the information infrastructure within the healthcare context as well. However, the impact of using BI on healthcare outcomes still needs investigation.

Aim and Outline:
The project will examine BI tool developments in a large Australian hospital and their impact on the business, using a case study approach.

Pre- and Co-requisite Knowledge:
To tackle this project, students need an undergraduate degree in IT (preferably in business information systems) or to be a student in the Master of Business Information Systems. Some work experience will be regarded as a bonus.

Visualising Lazy Functional Program Execution
Supervisors: Tim Dwyer and Chris Mears

Background:
Pure lazy functional programming languages such as Haskell remain the most advanced programming paradigm in common use.
Laziness and functional purity allow the compiler to optimise code in much more sophisticated ways than standard imperative language compilers. Haskell syntax is also, arguably, a more natural and concise way to model problems and algorithms for solving them. However, the difficulty for programmers in these types of languages is understanding what is actually happening with such compiled, optimised code when it is executing. This is a serious blocker to wider adoption of the pure functional paradigm. While a lot of the research in functional languages over the years has been devoted to language and compiler design, it seems less effort has gone into developing really “user friendly” practical tools for developers. There is some recent work in this direction (see links to IHaskell and ghc-vis); however, much work remains in making such tools informative and interactive.

Aim and Outline:
To develop novel interactive visualisation tools to support programmers in understanding the efficiency and memory use of running Haskell programs.

Pre- and Co-requisite Knowledge:
Some knowledge of functional programming would be very useful (ideally Haskell, but Lisp/ML/etc. are also a good foundation). An interest in graphics, visualisation and software usability would also be advantageous.

Qualitative investigation of software development team behaviour
Supervisors: Robert Merkel, Robyn McNamara, Narelle Warren (TBC)

Background:
The overwhelming majority of academic software engineering research has been highly technical in nature, and focused on the development of tools and methodologies to assist in some particular aspect of the development process, such as formal specification methods, or test case generation tools. Even where attempts have been made to formally study the human aspects of software engineering, structured quantitative approaches have been most common.
To explore human behaviour, the social sciences typically use a combination of qualitative and quantitative approaches. Qualitative methods including observation and interviews offer the chance to generate new insights about a domain - in this case software development - which then, if desired, can be investigated further using quantitative approaches.

Aim and Outline:
The aim of this project is to gain insight into the work patterns, interactions, and key challenges facing a software development team in its task. It is hoped that this will give some insight into the similarities and differences between the team's nominal development process and what they actually do in practice. At this stage, it is planned to use a mixed-methods qualitative approach, with a period (of 1-2 weeks) of unstructured participant observation of an industrial software development group followed up with semi-structured interviews. Such an approach is similar to ethnographic methods used in anthropology.

URLs and References:
Kanij, T.; Merkel, R.; Grundy, John, "A Preliminary Study on Factors Affecting Software Testing Team Performance," Empirical Software Engineering and Measurement (ESEM), 2011 International Symposium on, pp. 359-362, 22-23 Sept. 2011, http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6092588&tag=1
http://www.umanitoba.ca/faculties/arts/anthropology/courses/122/module1/methods.html

Pre- and Co-requisite Knowledge:
Some background in software engineering units is required. Training and guidance in qualitative research methods will be provided.

Democratising Big Data: Public Interactive Displays of Sustainability Data
Supervisors: Lachlan Andrew, Tim Dwyer, Ariel Liebman, Geoff Webb

Background:
We have access to potentially finely-grained data on energy use around the University and particularly in some key new buildings. We would like to create interesting interactive visualizations that allow people to explore this data.
Aim and Outline:
One possibility is that we set up a public display that can be controlled by passers-by using a Microsoft Kinect interface. Another (complementary) possibility is that we design a mobile or web app that allows people to explore this data on their own device. The point is to raise people's awareness of energy usage and efforts to improve sustainability of buildings at Monash. The HCI (Human Computer Interaction) research goal is to explore how novel interactive visualization and effective UI design can engage casual observers.

URLs and References:
http://intranet.monash.edu.au/bpd/services/environmental-sustainability.html

Pre- and Co-requisite Knowledge:
This project should appeal to students with an interest in graphics and natural user interface design.

How close are they? Conflict of interest in Academia
Supervisors: Lachlan Andrew

Background:
Peer review is central to the health of scientific publishing. This requires that the reviewers of a scientific paper be sufficiently independent of the authors. For example, the reviewer should not be a current collaborator, a former student or supervisor, or be working at the same institution. However, this is not always clear-cut. What if they published together 10 years ago? What if they used to work at the same institution? What if they have a close collaborator in common?

Aim and Outline:
This project will develop software to determine whether an academic has a "conflict of interest" with any of the authors of a document. It will use public databases such as Google Scholar and the Mathematics Genealogy Project to determine how "close" a candidate reviewer is to the authors of a candidate paper.

URLs and References:
http://scholar.google.com
http://genealogy.math.ndsu.nodak.edu/

Pre- and Co-requisite Knowledge:
Independent problem solving skills

Where does my electricity go?
Supervisors: Reza Haffari, Lachlan Andrew, Ariel Liebman

Background:
Have you ever wondered why your electricity bill is high in a particular month? Smart meters have the potential to tell us which devices consume most of our electricity, but we must coax the information out of them. Smart meters report half-hourly energy use to your retailer, but can also distribute finer time-scale data over a wireless LAN. We would like to "disaggregate" this data and determine how much energy is used by individual devices, e.g. air conditioning, fridges, heating, cooking etc. This raises awareness of usage patterns, and hence can potentially reduce electricity consumption significantly.

Aim and Outline:
In this project, we design and develop machine learning techniques suitable for analysing and mining electricity usage data. The ideal model will be able to accommodate other sources of valuable information as well, e.g. time of the day, season, and temperature records. In particular, we explore a powerful statistical model, the Factorial Hidden Markov Model (FHMM), and augment it with additional components to capture domain knowledge. We will make use of publicly available data in this project (REDD data set from MIT: http://redd.csail.mit.edu).

URLs and References:
http://redd.csail.mit.edu

Pre- and Co-requisite Knowledge:
Basic probability

Finding Monash's heating and cooling costs
Supervisors: Lachlan Andrew, Geoff Webb, Tim Dwyer, Ariel Liebman

Background:
We have access to potentially finely-grained data on energy use around the University and particularly in some key new buildings. We would like to "disaggregate" this data further, to identify how much energy is used by different systems, specifically HVAC (heating, ventilation and air conditioning), but also lighting or office equipment.

Aim and Outline
In this project, we will use a combination of manual sleuthing and data mining techniques to determine what component of Monash's electricity consumption is due to heating and cooling. This will combine the above data set with hourly temperature measurements to try to detect the times at which a building's air conditioning or heating turns on or off, and the power consumption while it is on. The resulting data will be useful for raising awareness about which energy-saving strategies are likely to produce substantial savings. This data will ideally also form the input to a data visualisation project to convey this data to the wider campus community.

URLs and References
http://intranet.monash.edu.au/bpd/services/environmental-sustainability.html

Pre- and Co-requisite Knowledge
Basic probability. Understanding Fourier transforms would be an advantage.

Planning for an uncertain energy future
Supervisors: Aldeida Aleti, Ariel Liebman

Background
Electricity grids around the world and in Australia are in the midst of a profound transformation. New technologies such as rooftop solar panels, wind farms, and smart meters are challenging current paradigms in system planning and even threatening existing electricity utility business models.

Aim and Outline
Electricity utilities, system planners, and governments are facing many future trends that are extremely uncertain. For example, there is a great deal of uncertainty about electricity demand growth (or decline), compounded by uncertainty in the rate at which renewable technology costs decline. This project aims to develop optimisation techniques to model the impacts of uncertainty in demand growth, technology costs, and electricity generation feedstocks on optimal investment strategies in renewable technologies in an electricity system. The project will take some of its inspiration from the work done by the CSIRO Future Grid Forum.

Pre- and Co-requisite Knowledge
This project will appeal to students with an interest in simulation and modelling and some programming experience. No prior knowledge of optimisation or energy systems is required.

Complex and Clever Metadata Architectures for Research Data Management
Supervisors: Joanne Evans, Yuan-Fang Li, Tom Denison, Henry Linger

Background
Metadata is one of the conundrums facing those designing and developing advanced scholarly information infrastructures to facilitate distributed, data intensive, collaborative research. Any infrastructure for heterogeneous data sharing must be able to cope with a plethora of metadata ontologies, schemas, standards, representations and encodings, as one-size-fits-all approaches suffer in terms of metadata quality and usefulness. They do not provide the degree of specificity needed for efficient and effective discovery, access, interpretation and use.

Aim and Outline
This research project will investigate the design of a metadata management architecture for a research hub that can cope with complexity and diversity, mapping and managing commensurability, as well as allowing for necessary specialisation and extension, in order to facilitate sharing and re-use of research data.

Simulating batteries in smart grid
Supervisors: Vincent Lee, Ariel Liebman, John Betts

Background
Electricity grids around the world and in Australia are in the midst of a profound transformation. New technologies such as rooftop solar panels, wind farms, and smart meters are challenging current paradigms in system planning and even threatening existing electricity utility business models.

Aim and Outline
This project aims to model the integration of batteries into the smart grid using cloud based high performance computing. The model incorporates an industry standard power system simulation tool called Plexos configured to find the optimal investment in renewable generation technologies in a complex electricity network.
The project will entail incorporating models of a range of new battery technologies to determine whether batteries can significantly reduce the cost of investing in renewable and other low carbon energy technologies.

Pre- and Co-requisite Knowledge
This project will appeal to students with an interest in simulation, business decision making and modelling. No prior knowledge of optimisation or energy systems is required.

Beating the World Record on Freight Transport Problems
Supervisors: Mark Wallace, Richard Kelly

Background
The vehicle routing problem with time windows has been tackled by groups all over the world using all kinds of optimisation approaches. These approaches are validated and compared against a set of problem instances published at SINTEF. The major optimisation company Quintiq publishes their best results on their website.

Aim and Outline
The recent method of guided ejection chains has been successful on several transport applications, and another method based on large neighbourhood search was used in Richard Kelly's recent PhD to obtain world-class results, with a couple of world records. This project will experiment with a combination of guided ejection search and large neighbourhood search to obtain new world records.

The expected outcomes of this project are:
* New world-best results on vehicle routing with time window benchmarks;
* A publication comparing our results with Quintiq's;
* An efficient implementation of guided ejection search.

URLs and References
Quintiq World Records: http://www.quintiq.com/optimization/vrptw-world-records.html
SINTEF VRPTW Benchmarks: http://www.sintef.no/Projectweb/TOP/VRPTW/
Nagata, Y., & Bräysy, O. (2009). A powerful route minimization heuristic for the vehicle routing problem with time windows. Operations Research Letters, 37(5), 333-338.
Pisinger, D., & Ropke, S. (2007). A general heuristic for vehicle routing problems. Computers & Operations Research, 34(8), 2403-2435.

Pre- and Co-requisite Knowledge
Experience in programming in C++, an interest in optimisation technology, and the desire to win!

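
To make the time-window constraint in the freight transport project concrete, here is a minimal sketch of a route feasibility check. It is only an illustration of the problem, not the guided ejection search or large neighbourhood search methods named in the project. The function name `route_is_feasible`, the travel-time matrix and the window values are all hypothetical, and Python is used for brevity even though the project itself asks for C++.

```python
def route_is_feasible(route, travel_time, windows, service_time=0):
    """Check that one vehicle can serve `route` within every time window.

    route       -- location indices, starting and ending at the depot (0)
    travel_time -- square matrix of travel times between locations
    windows     -- (earliest, latest) permitted arrival time per location
    """
    clock = 0
    for prev, cur in zip(route, route[1:]):
        clock += travel_time[prev][cur]
        earliest, latest = windows[cur]
        if clock > latest:
            return False  # arrived after the window closed
        # Wait if early, then spend the service time at this stop.
        clock = max(clock, earliest) + service_time
    return True


# Hypothetical toy instance: depot 0 and two customers.
travel_time = [[0, 2, 4],
               [2, 0, 3],
               [4, 3, 0]]
windows = [(0, 20), (0, 5), (6, 10)]

good_route = route_is_feasible([0, 1, 2, 0], travel_time, windows)
bad_route = route_is_feasible([0, 2, 1, 0], travel_time, windows)
```

Visiting customer 1 first meets both windows, while the reverse order reaches customer 1 too late. A search method would call a check of this kind many times, so in practice it would need to be incremental.
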
ERP in the Cloud by SMEs Supervisors: Sue Foster Background Little research has been conducted to assess the effectiveness or otherwise of adopting ERP in the cloud in SMEs Aim and Outline To identify the critical issues that affect organisations that conduct their ERP systems in the cloud URLs and References Business Process Management Journal, 11(2), 158-170. Klause, H. & Rosemann, M. (2000). What is enterprise resource planning? Information Systems Frontiers (special issue of The Future of Enterprise Resource Planning Systems), 2 (2), 141-162. Lewis, P. J. (1993). Linking Soft Systems Methodology with Data-focused Information Systems Development, Journal of Information Systems, Vol. 3, 169-186. Markus, M.L., Axline, S., Petrie, D., & Tanis, C. (2000) Learning from adopters' experiences with ERP: problems encountered and success achieved. Journal of Information Technology , 15, 245-265. Nolan, & Norton Institute. (2000). SAP Benchmarking Report 2000, KPMG Melbourne. Queensland Health Corporate Publications: Change management Documents: Located at http://www.health.qld.gov.au/publications/change_management/ Parr., A. & Shanks, G. (2000). A model of ERP project implementation. Journal of Information Technology, 15, 289-303. Ross, J. W. (1999). "The ERP Revolution: Surviving Versus Thriving, Centre for Information System Research, Sloan School of Management, MA, August 1999. Scott, J. E., & Vessey, I. (2002). Managing risks in enterprise systems implementations. Communications of the ACM, April, Vol. 45, No 4. Retrieved on 19 March 2010, Located at: http://delivery.acm.org/10.1145/510000/505249/p74-scott.pdf?key1=505249&key2=8269509621&coll=GUIDE&dl=GUIDE&CFID=80880926&CFTOKEN=57269991 Sedera, D., Gable, G., & Chan., T. (2003). Measuring Enterprise Systems Success: A Preliminary Model. Ninth Americas Conference on Information Systems, 476-485. Shang, S., & Seddon, P. B. (2002). 
Assessing and managing the benefits of enterprise systems: the business manager's perspective. Information Systems Journal. 12, pp 271-299. Shang, S. & Seddon, P. B. (2000). "A comprehensive framework for classifying the benefits of ERP systems" in the proceedings of the twenty third Americas Conference on Information Systems. 1229-1698. Skok, W., & Legge, M. (2001). Evaluating Enterprise Resource Planning (ERP) Systems using an Interpretive Approach. ACM., SIGCPR, San Diego. 189-197. (Benefit realisation Sumner, M. (2000). "Risk factors in enterprise-wide/ERP projects." Journal of Information Technology 15(4): 317 - 327. Titulair, H. B., Oktamis, S., and Pinsonneault, A. (2005). Dimensions of ERP implementations and their impact on ERP Project outcomes. Journal of Information Technology Management. XVI, 1. Located at http://jitm.ubalt.edu/XVI-1/article1.pdf Pre- and Co-requisite Knowledge Enterprise information systems knowledge would be an advantage SOA implementation benefits, barriers and costs Supervisors: Sue Foster Background Little research has been conducted to assess the implementation barriers, benefits or costs of using Service Oriented Architecture Aim and Outline To identify the issues that affect organisations adopting SOA URLs and References http://www.health.qld.gov.au/publications/change_management/ Parr., A. & Shanks, G. (2000). A model of ERP project implementation. Journal of Information Technology, 15, 289-303. Ross, J. W. (1999). "The ERP Revolution: Surviving Versus Thriving, Centre for Information System Research, Sloan School of Management, MA, August 1999. Scott, J. E., & Vessey, I. (2002). Managing risks in enterprise systems implementations. Communications of the ACM, April, Vol. 45, No 4. Retrieved on 19 March 2010, Located at: http://delivery.acm.org/10.1145/510000/505249/p74-scott.pdf?key1=505249&key2=8269509621&coll=GUIDE&dl=GUIDE&CFID=80880926&CFTOKEN=57269991 Sedera, D., Gable, G., & Chan., T. (2003). 
Measuring Enterprise Systems Success: A Preliminary Model. Ninth Americas Conference on Information Systems, 476-485. Shang, S., & Seddon, P. B. (2002). Assessing and managing the benefits of enterprise systems: the business manager's perspective. Information Systems Journal. 12, pp 271-299. Shang, S. & Seddon, P. B. (2000). "A comprehensive framework for classifying the benefits of ERP systems" in the proceedings of the twenty third Americas Conference on Information System Pre- and Co-requisite Knowledge Enterprise information systems knowledge would be an advantage Extending the ERP system beyond the organisational boundaries Supervisors: Sue Foster Background Most research is conducted within the ERP system however ERP now extends beyond the organisational boundaries to establish links with vendors, sellers and a variety of other stakeholders Aim and Outline To identify the critical issues impacting on organisations that extend their ERP systems beyond organisational boundaries URLs and References ACC (1984). ERP implementations and their issues. Proceedings of the Australian Computer Conference, Sydney, Australian Computer Society, November Edn. 1.Journal of Computer Information Systems, Spring, 81-90. Barati, D. Threads of success and failure in business process improvement. Located at http://www.isixsigma.com/library/content/c070129a.asp Managing Barriers to business Reengineering success located at: http://www.isixsigma.com/offsite.asp?A=Fr&Url=http://www.prosci.com/w_0.htm Roseman, M. (2001). Business process Optimisation: Making Process Re-engineering Actually work. Coolong Consulting (Australia) Pty Ltd Bingi, P. Sharma M.K. and Godla J.K. (1999). "Critical Issues Affecting an ERP Implementation", Information Systems Management, Vol. 16, 3, 7-14. Boyle., T. A., & Strong, S. E. (2006). "Skill requirements of ERP graduates." Journal of Information Systems Education 17(4): 403-412. Curran, T. A., & Ladd, A. (2000). 
SAP R/3: business Blueprint: Understanding Enterprise Supply Chain Management (2nd Edn). Sydney: Prentice Hall Australia Pty, Ltd. Davenport, T. H. (2000a). Mission critical: Realising the promise of enterprise systems. Boston: Harvard Business School Press. Davenport, T. H. (2000b). The future of enterprise system-enabled organisations. Information Systems Frontiers (special issue of The future of Enterprise Resource Planning Systems Frontiers), 2(2), 163-180. Davenport (1998). Putting the enterprise into the enterprise system. Harvard Business Review. July-August 1998. Davenport, T. H., (1990). The New Industrial Engineering: Information Technology and Business Process Redesign, Sloan Management Review, 31(4), Summer, 11. Francoise, O., Bourgault., M. & Pellerin, R. (2009). ERP implementation through critical success factors' management. Business Process Management Journal, 15(3), 371-394. Hammer, M. (2000). Reengineering work: Don't' Automate Obliterate. Harvard Business Review. July-August. Pre- and Co-requisite Knowledge Enterprise information systems knowledge would be an advantage Adaptive Genetic Algorithms in Search-Based Software Engineering Supervisors: Aldeida Aleti Background Software testing is a crucial part of software development. It enables quality assurance, such as correctness, completeness and high reliability of the software systems. Current state-of-the-art software testing techniques employ search-based optimisation methods, such as genetic algorithms to handle the difficult and laborious task of test data generation. Despite their general applicability, genetic algorithms have to be parameterised in order to produce results of high quality. Different parameter values may be optimal for different problems and even different problem instances. In this project, we will investigate a new approach for generating test data, based on adaptive optimisation. 
The adaptive optimisation framework will use feedback from the optimisation process to adjust parameter values of a genetic algorithm during the search.

Aim and Outline
The goal of this project is to evaluate (based on simulations and realistic examples) adaptive genetic algorithms in the context of specific software engineering problem(s). The specific tasks are:
- Understand the current approaches in search based software testing and adaptive genetic algorithms
- Perform an experimental evaluation of adaptive genetic algorithms for software testing

URLs and References
http://users.monash.edu.au/~aldeidaa

Pre- and Co-requisite Knowledge
Programming skills in Java or C++. Understanding of, or willingness to learn, the software engineering and statistical foundations needed for the project.

Detection of interesting patterns via the duality of user/message metadata on Twitter
Supervisors: Marc Cheong

Background
There is a large amount of hidden metadata (data-about-data) available on Twitter, generated by users in their day-to-day activity on the microblogging site. With the increasing awareness of online privacy, the question "What inferences or possible real-world patterns can we glean from a collection of metadata harvested on Twitter?" intrigues researchers. Several inference algorithms and case studies have already been developed in Cheong's PhD thesis (2013) and tested on a 10GB dataset of real-world Twitter metadata; however, many improvements can be made in light of Twitter's evolution over the past couple of years.

Aim and Outline
The aim of this project is to investigate new approaches to Twitter metadata analysis. This might be done by improving existing algorithms or creating new ones (based on theories in e.g. HCI, social science, etc.) and evaluating their effectiveness in modelling/studying a real-world phenomenon.

URLs and References
Inferring social behavior and interaction on Twitter by combining metadata about users & messages (PhD thesis).
<http://arrow.monash.edu.au/vital/access/manager/Repository/monash:120048>

Pre- and Co-requisite Knowledge
Knowledge of basic statistics, data mining techniques, and social media.

Creating an extensible framework for automated solving of cryptic crosswords using machine learning and natural language techniques
Supervisors: Robyn McNamara, David Squire

Background
Cryptic crosswords are commonly found in newspapers all around the globe, from the British Guardian to our very own The Age and the Herald Sun. However, for a human, the challenges in taking up such crosswords are the steep learning curve and the inside knowledge required to decipher (or parse) a clue. Currently, there is a scarcity of computer-based approaches to parse cryptic crossword clues, let alone solve entire puzzles! A few such papers were written decades ago, such as Williams & Woodhead (1979) and Smith & du Boulay (1986). Commercial solvers such as William Tunstall-Pedoe's Crossword Maestro do exist; however, the algorithms used in such solvers are proprietary.

Aim and Outline
Realising this niche, this project aims to create an extensible framework for the automated solving of cryptic crossword clues (and by extension, an entire cryptic crossword grid). This framework should ideally be plugin-based, to allow for extensibility in e.g. handling new clue types. The proposed solution could use existing sources of semantic relations between words, e.g. the Natural Language ToolKit (NLTK), or WordNet.

URLs and References
Related papers on Google Scholar <http://scholar.google.com.au/scholar?q=computer+cryptic+crosswords>
Williams, P. W. and Woodhead, D. (1979). Computer assisted analysis of cryptic crosswords. <http://comjnl.oxfordjournals.org/content/22/1/67.abstract>

Pre- and Co-requisite Knowledge
Knowledge of (or willingness to learn about) cryptic crosswords is a must.
From the technical standpoint: use of the NLTK (or similar) libraries, and a good knowledge of data structures and search algorithms.

Does virtualisation save energy?
Supervisors: Lachlan Andrew

Background
Energy related expenses account for around half the cost of a data centre, and so minimising energy use is an important aspect of data centre management. A popular tool for achieving this is virtualisation, in which one powerful computer emulates multiple smaller computers. This allows many lightly-utilised computers, all consuming power, to be replaced by a small number of highly-utilised computers. However, virtualisation is not free. Emulation has substantial overheads, which are often ignored.

Aim and Outline
This project will involve measuring the performance (speed per watt) of virtualised hosts running on different hardware, and comparing that with the performance of native execution. The final outcome will be a design guide telling operators when it is beneficial to virtualise and when it is not.

Pre- and Co-requisite Knowledge
Requires general programming skills. Will provide knowledge of:
• experimental design
• data centre energy management
• writing code to interface to measurement equipment

Clear text in JPEGs
Supervisors: Lachlan Andrew

Background
JPEG is an inexact ("lossy") compression standard, which introduces errors. For photographs, these errors are usually insignificant, but for line drawings and text, they appear as smudges around crisp edges. A naive way to remove these smudges is to convert all grey pixels in black-and-white line drawings to either pure black or pure white. However, it is common for the lines in the original image to have grey edges to make the lines look smooth ("anti-aliasing").

Aim and Outline
This project will develop software for removing speckles around text and lines in JPEG images. It will pose the decoding as an optimization problem.
(For a given compressed file, it will find the image with the maximum number of background-coloured pixels that could possibly have been compressed into that file.) This will result in a much clearer image than the traditional decoding technique. If time permits, the resulting algorithm will be implemented as a plug-in for Chrome and Firefox.

URLs and References
https://en.wikipedia.org/wiki/Jpeg

Pre- and Co-requisite Knowledge
Programming skills (C/C++ and/or Matlab preferred)
Basic mathematics (Knowledge of Fourier transforms helps)

Probabilistic Methods for Information Retrieval
Supervisors: Prof Wray Buntine

Background
In the world of Information Retrieval, BM25, a variant of TF-IDF, is king. "Language models for information retrieval" have been developed as an alternative but are an incremental improvement at best, primarily because the models are mostly unigram. Richer predictive models would look at word interactions and could offer improvements.

Aim and Outline
To explore richer predictive models of text in the language modelling style and evaluate their performance on some standard collections. We have some probabilistic methods in mind here. An initial study would set aside computational considerations and test different predictive models for retrieval performance, ignoring cost.

URLs and References
The BM25 model is in "The Probabilistic Relevance Framework: BM25 and Beyond" at http://dl.acm.org/citation.cfm?id=1704810

Pre- and Co-requisite Knowledge
FIT3080 Intelligent systems or equivalent is a prerequisite, as is knowledge of probability and/or experimental computer science. Good programming experience (the code is in C).
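
For readers unfamiliar with the BM25 baseline named in the Probabilistic Methods for Information Retrieval project, the following is a minimal sketch of the usual BM25 scoring formula. It is not the project's C code. The parameter defaults `k1=1.2` and `b=0.75` are common choices rather than values from the listing, the IDF uses the smoothed "+1" variant, and whitespace tokenisation and the toy documents are assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenised document in `docs` against a tokenised query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates with k1; b controls length normalisation.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy collection (hypothetical), tokenised on whitespace.
docs = [
    "smart meters report energy use".split(),
    "energy use in data centres".split(),
    "cryptic crosswords in newspapers".split(),
]
scores = bm25_scores("energy meters".split(), docs)
```

Each query term contributes independently to the score, which is the unigram limitation the Background refers to: no term in the sum looks at how words interact.
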

Visualisation Applications for "ContextuWall" in the Monash CAVE2
Supervisors: Dr Tim Dwyer and Prof Falk Schreiber

Background
Immersive Analytics is about creating computer software and hardware that support collaborative analysis, decision making, design and data understanding by providing an immersive multi-sensory user experience in which the users can directly interact with their data or design. It provides a powerful, natural interface to analytics software such as simulation, optimisation and data mining.

Aim and Outline
Immersive Analytics aims to create novel, natural ways for people to explore and interact with complex data. This project aims to repurpose Monash's $1.9 million CAVE2 facility to better support data analysis. It will develop gesture-based interaction and large display methods in the CAVE2.

URLs and References
http://monash.edu.au/cave2

Pre- and Co-requisite Knowledge
Programming skills (ideally one or more of: Python, C#, C++, Java)

Visualising Biological Pathways in Cola.js
Supervisors: Dr Tim Dwyer and Prof Falk Schreiber

Background
Visualisation of biological processes and networks is increasingly important, and graphical standards are available to support knowledge representation in the biosciences (Systems Biology Graphical Notation, SBGN).

Aim and Outline
The future of computing is the web, and HTML5 now offers a complete platform for building rich interactive applications. Cola.js (a.k.a. 'WebCoLa') is an open-source JavaScript library developed by researchers in our Faculty for arranging your HTML5 documents and diagrams. This project will extend Cola.js to visualise biological networks and cellular processes.

URLs and References
www.sbgn.org

Pre- and Co-requisite Knowledge
Programming skills (ideally JavaScript and HTML5), systems biology standards (SBGN, SBML)

Verification and validation of open APIs for banking
Supervisors: Yuan-Fang Li and Robert Merkel

Background:
Banks have a complex IT infrastructure with very high reliability, robustness, and security requirements. Many banks are currently developing open application programming interfaces (APIs) to make banking functionality available more flexibly, both within and across organisation boundaries. These open APIs interact in a variety of complex ways, and without proper quality assurance measures, such interactions may have undesirable and costly consequences.

Aim and Outline
In this project, we propose to combine formal methods and software testing techniques to model, verify and validate open banking APIs and their interactions. Modeling and checking the APIs will help to show the fundamental soundness of the APIs - or reveal potentially serious design flaws, if they exist. Then, the model can be used to efficiently test systems that implement the modeled APIs, thus giving confidence that the systems under test implement the APIs correctly. This project is supported by ANZ Bank and will have a focus on ANZ banking systems.
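
As a rough illustration of the model-then-test idea, the sketch below describes a toy account API as a state machine, enumerates call sequences from that model, and checks a stand-in implementation against the model's predictions. Everything here is hypothetical: the states, the operations `open`, `deposit` and `close`, and the `ToyAccount` class are invented for the example and do not describe any real banking API or the formal methods the project will actually use.

```python
from itertools import product

# Hypothetical model: (state, operation) -> next state. Missing pairs are forbidden.
MODEL = {
    ("closed", "open"): "active",
    ("active", "deposit"): "active",
    ("active", "close"): "closed",
}


def model_run(calls, state="closed"):
    """Return the model's final state, or None if any call is forbidden."""
    for call in calls:
        state = MODEL.get((state, call))
        if state is None:
            return None
    return state


def generate_tests(max_len):
    """Yield every call sequence up to max_len with the model's expected outcome."""
    ops = sorted({op for _, op in MODEL})
    for n in range(1, max_len + 1):
        for calls in product(ops, repeat=n):
            yield calls, model_run(calls)


class ToyAccount:
    """Stand-in implementation under test, written independently of MODEL."""

    def __init__(self):
        self.is_open = False
        self.balance = 0

    def open(self):
        if self.is_open:
            raise ValueError("already open")
        self.is_open = True

    def deposit(self):
        if not self.is_open:
            raise ValueError("account is closed")
        self.balance += 1

    def close(self):
        if not self.is_open:
            raise ValueError("account is closed")
        self.is_open = False


def conforms(calls, expected):
    """Run the calls on a fresh account and compare with the model's verdict."""
    account = ToyAccount()
    try:
        for call in calls:
            getattr(account, call)()
    except ValueError:
        return expected is None
    return ("active" if account.is_open else "closed") == expected


failures = [calls for calls, expected in generate_tests(3)
            if not conforms(calls, expected)]
```

An empty `failures` list means the implementation agreed with the model on every generated sequence. Exhaustive enumeration only scales to short sequences, which is one reason a real project would need smarter test selection.
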

URLs and References
http://mit.bme.hu/~micskeiz/pages/modelbased_testing.html
http://openbankproject.com/en/

Pre- and Co-requisite Knowledge
Having studied FIT3013 or equivalent would be an advantage.

Supervisors: David Arnott and Caddie Gao

Background

Shadow IT is a reality in modern organisations and comprises IT applications and infrastructure that exist outside the boundaries and control of an organisation's formal IT structure, irrespective of whether that structure is centralised or decentralised, or whether it is insourced or outsourced. Shadow IT has been estimated as comprising up to half of an organisation's IT capability. It has been enabled by the collapse in the cost of hardware, software, and networks, as well as increased IT education across all business disciplines. No research has been conducted into shadow business intelligence (BI), and no industry source has claimed it even exists.

Aim and Outline
The aim of the project is to investigate the existence and nature of shadow BI.

This project could be conducted using a case study or a survey.

Students working on this project will be required to be part of the DSS Lab. This includes attendance and participation in weekly seminars.

Pre- and Co-requisite Knowledge
To tackle this project students must hold an undergraduate degree in IT (preferably in business information systems) or be a student in the Master of Business Information Systems.

Galactic archaeology using Minimum Message Length
Supervisors: David Dowe and Prof. John Lattanzio

Background

We consider astronomical data of stars from GALAH/HERMES as given by the stars' relative chemical concentrations. The ratios of concentrations of different chemical elements give some idea about the generation of stars.

Aim and Outline
We seek patterns in the data of chemical element concentrations in terms of clustering the stars into groups and also in terms of finding different ratios of concentrations in the various groups. We do this using the Bayesian information-theoretic Minimum Message Length (MML) principle of machine learning. The work will involve the statistical technique of latent factor analysis and the technique from statistics and machine learning of mixture modelling and clustering.
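
As a brief reminder of the criterion involved, MML compares hypotheses by the length of a two-part message: first the hypothesis, then the data encoded assuming that hypothesis. The notation below is ours rather than the listing's, with $H$ standing for a hypothesis (here, a grouping of the stars together with each group's parameters) and $D$ for the observed concentrations.

```latex
I(H, D) \;=\; I(H) + I(D \mid H) \;=\; -\log \Pr(H) \;-\; \log \Pr(D \mid H)
```

The preferred hypothesis is the one with the shortest total length, so adding another group is only worthwhile if it shortens the encoding of the data by more than it costs to describe the extra group.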

URLs and References
C. S. Wallace (2005), "Statistical and Inductive Inference by Minimum Message Length", Springer

Pre- and Co-requisite Knowledge
Good marks in university mathematics-related or statistics-related subjects, at least to first-year level, and an ability at or interest in mathematics. Likewise, knowledge of or interest in astronomy.

Extended Database Normalisation Using MML
Supervisors: David Dowe and Nayyar Zaidi

Background
Database Normalisation is typically done in order to avoid update, insertion and deletion anomalies - essentially making sure that any stored data won't be lost and that any update, insert or delete operations only have to be done in one place. The normalisation depends upon the (so-called) business rules. But, in certain situations, the attributes might not have such helpful names as StudentId and SubjectId, and the business rules won't all be known. In such situations, it might be necessary to infer both the business rules and the normalisation. Using Minimum Message Length (MML), this has already been done as far as third normal form (3NF).

Aim and Outline
We take this to higher normal forms, and then apply it to larger data sets.

We also explore what sort of other machine learning and statistical techniques (other than MML) might be able to infer these normal forms.
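
To illustrate what inferring a business rule from data can mean, the sketch below tests whether a functional dependency holds in a small table whose attribute names carry no meaning. This is a plain exhaustive check, not MML inference: it shows only the kind of rule being sought. The function name `fd_holds`, the attribute names and the rows are hypothetical.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the attributes in `lhs` functionally determine `rhs`.

    rows -- list of dicts mapping attribute name to value
    lhs  -- attribute names on the left of the dependency
    rhs  -- attribute names on the right of the dependency
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The dependency fails if one left-hand value maps to two right-hand values.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical table with uninformative attribute names.
rows = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 1, "B": "x", "C": 20},
    {"A": 2, "B": "y", "C": 10},
]

a_determines_b = fd_holds(rows, ["A"], ["B"])
a_determines_c = fd_holds(rows, ["A"], ["C"])
```

A dependency that holds in a sample may still be a coincidence, which is where a principled trade-off between model complexity and fit becomes relevant.
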

URLs and References
David L. Dowe and Nayyar A. Zaidi (2010), "Database Normalization as a By-product of Minimum Message Length Inference", Proc. 23rd Australian Joint Conference on Artificial Intelligence (AI'2010) [Springer Lecture Notes in Artificial Intelligence (LNAI), vol. 6464], Adelaide, Australia, 7-10 December 2010, Springer, pp82-91.

C. S. Wallace (2005), "Statistical and Inductive Inference by Minimum Message Length", Springer.

Pre- and Co-requisite Knowledge
Good marks in university mathematics-related or statistics-related subjects, at least to first-year level, and an ability at or interest in mathematics. Also, satisfactory completion of at least one database-related subject.

Effects of automation on employment and society
Supervisors: David Dowe

Background
Going back perhaps as far as the printing press, automation has changed employment and society. Advances in technology in more recent decades see computers not only outperforming humans at tasks once thought to be the preserve of humans, but also increasingly performing jobs which many thought only humans could ever do. What will be the impact on employment and employment levels? Which careers are safer? What are likely impacts on society? While many have been anticipating the technological singularity (when machines are purported to become smarter than humans) at least as far back as Solomonoff (1967) and Good (1965), whenever that does or doesn't come, the effects of job displacement seem to be arriving with increasing rapidity.

Aim and Outline
To build upon the given references and other studies to address these questions of employment and society, initially in an Australian context. This work will possibly be partly supported financially by a branch of the Australian Government.

URLs and References
David H. Autor (3/Sept/2014), "Polanyi's Paradox and the Shape of Employment Growth" (47 pages)

Frey and Osborne (2013), "The future of employment: how susceptible are jobs to computerisation?"

"Humans Need Not Apply" [https://www.youtube.com/watch?v=7Pq-S557XQU, published on 13 Aug 2014]

Tali Kristal (2013), "The Capitalist Machine: Computerization, Workers' Power and the Decline in Labor's Share Within U.S. Industries", American Sociological Review, 78 (3), pp361-389.

R. J. Solomonoff (1967), "Inductive Inference Research Status Spring 1967".

Pre- and Co-requisite Knowledge
A knowledge of I.T. and automation, and the ability to read and understand the given references. This includes having a sufficient background in mathematics and statistics to understand the relevant analyses. A knowledge of economics or sociology would be a bonus.

Big Data Processing: An Industry Case Study with the Railway Institute
Supervisors: Assoc Prof David Taniar

Background
Are you interested in learning Big Data? Big Data is a multi-million-dollar industry. This project focuses on processing large volumes of data, in collaboration with the Railway Institute at Monash, which provides large datasets about railways. You will work with A/Prof David Taniar as well as a team from the Railway Institute, Monash.

Aim and Outline
This project aims to solve big data problems by developing programs to answer queries in a timely manner. You will program using MapReduce/Hadoop and the latest technologies, such as Spark.

Pre- and Co-requisite Knowledge
A strong background in programming and databases.

Music Database Processing
Supervisors: Assoc Prof David Taniar

Background

Do you play any classical music instruments? Do you want to combine your advanced musical skills with computer science? This project analyses classical music using computer science techniques.

Aim and Outline
This project aims to process and analyse classical music recordings, including sonata form analysis, chord progression, concerto identification, etc. You will need to learn the basics of signal processing and Matlab.

Pre- and Co-requisite Knowledge
You must be an intermediate music instrument player (e.g. minimum level 5 or 6 piano, violin/cello, brass, woodwind).

Learning from non-stationary distributions
Supervisors: Geoff Webb

Background
The sheer volume and ubiquity of data in the information age demand increasingly effective technologies for data analysis. Most online data sources are non-stationary: factors bearing on their composition change over time, as do relations among those factors. But nearly all machine-learning algorithms assume invariance, which greatly reduces their usefulness.

Aim and Outline
This project will be the first comprehensive investigation of emerging technologies for learning from non-stationary distributions, guided by the insight that subgroups change in different ways, at different times, and at different speeds. Outcomes will include robust, tested, and reliable data analytics for non-stationary data - enabling far more efficient use of big data, with countless real-world applications.

URLs and References
G.I. Webb (2014). Contrary to Popular Belief Incremental Discretization can be Sound, Computationally Efficient and Extremely Useful for Streaming Data. In Proceedings of the 14th IEEE International Conference on Data Mining. http://www.csse.monash.edu/~webb/Files/Webb14.pdf

Pre- and Co-requisite Knowledge
FIT2004 Algorithms and data structures or equivalent

Profiling Public Transport Users on Social Media
Supervisors: Sue Bedingfield, Marc Cheong and Kerri Morgan

Background
Social media is the lens of society. If we want to know what people think about issues ranging from politics to current affairs, social media provides us with some insight into the views of individuals.

Aim and Outline

The aim of this project is to identify types of Public Transport (PT) users who are active on social media and their views on the PT system, and to measure their engagement in this system. In particular, it aims to identify groups of people who are concerned about specific areas of the PT system. Information about public transport users will be obtained from Twitter data. A list of seed users will be gathered in two ways: firstly, by identifying all official Victorian Public Transport Twitter accounts and then the users who communicate with these accounts; secondly, by identifying anyone who has tweeted about public transport in Victoria.

Two types of information will be gathered: the metadata of the users (available from Twitter) and text data from the tweets themselves obtained by text mining using RapidMiner or other software. The types of users commenting on public transport will then be profiled using clustering techniques on each of these domains. Cross-clustering techniques may be used to integrate these results.

This project will be of benefit to stakeholders of the PT system and can be generalised to other areas of public interest.

Pre- and Co-requisite Knowledge
Potential students should have skills in data mining and statistics, a basic understanding of the Victorian PT system, and an interest in applying these skills to social media analysis. Interested students, please talk to Kerri, Marc and Sue.

Promoting Healthy Dietary Habits by Lifelogging with Google Glass
Supervisors: Tim Dwyer; Marc Cheong

Background
Google Glass is a wearable eyeglass computer which provides capabilities such as augmented reality, positional sensors (e.g. location/movement), image capture, and gesture-based interaction. Research on Glass has mainly dealt with applications in mission-critical areas such as assisted surgery, telemedicine, and measuring vital signs.

One potential area of research is the use of Glass in actively lifelogging food, alcohol, and nicotine consumption to promote a healthy lifestyle. There are existing mobile apps (e.g. alcohol calculators, and diet planners) and fitness wearables (e.g. the Fitbit); but none have the ubiquity, and versatility, of the Glass platform. An added bonus that is present with the Glass platform lies in its non-intrusiveness, which allows for user discretion (e.g. measuring and regulating alcohol consumption) at a social event. Proposals on using Glass for fitness and wellbeing have been alluded to / hypothesised on tech blogs, but no concrete implementation has been found thus far.

Aim and Outline
We propose the use of Glass to develop a lifelogging system to promote health and wellbeing by tracking consumption patterns. Features of this system may include, but are not limited to:
- Alcohol consumption patterns: helps the user pace/regulate drinking and promotes safe driving habits. The Glass can capture the alcohol labels (standard drinks, % a.b.v., or even serving size), and remind the user to tap/issue voice commands every time a drink is consumed. This logs alcohol consumption per session (helping the user measure alcohol intake for safe driving) and also monitors long-term consumption patterns.
- Food and Calorie tracking: by capturing food/nutritional labels, barcodes, (or time permitting - image recognition), this helps the user calculate and track nutritional intake on a daily basis (e.g. avg. caloric intake, % of RDA etc).
- Nicotine consumption patterns: helps the user track the amount of nicotine consumed (e.g. via patches, or cigarettes) on a daily basis, useful for a nicotine user who is thinking of quitting.

URLs and References
Papers on augmented reality: e.g. Azuma et al (2001), http://ieeexplore.ieee.org/document/963459/

Pre- and Co-requisite Knowledge
- Human-Computer Interaction (specifically augmented reality)
- Mobile programming using the Android API
- Basic image processing (specifically OCR) techniques

Shareholder Value of IT Investments
Supervisors: Vincent Lee and Yen Cheung, and ANZ supervisor (to be advised)

Background
Preferably to commence from semester 2, 2015 through to end of semester 1, 2016.

Aim and Outline
Define ways to measure contribution of IT spend to shareholder value, a methodology or model that can be applied, and a theorem and heuristics that executives can leverage in making IT investment decisions.

Context

• Each year $1B+ of proposed IT investments are planned for the Bank
• Alignment to business strategy and technology strategy is measured
• Business benefits are put forward either by estimating the CTI value or based on an imperative kind of assertion
• Usually there is an element of 'gut feel' in which some initiatives make the cut and some do not in an executive round-table.
• This then becomes the official list of IT initiatives which then are tasked with delivering value

Approach

Address the research questions:

• Technology benefits of IT investments (such as reuse, simplification, agility, layering, disruption, transformative potential) are not easily measured. There is an absence of literature on the subject, especially quantitative/economic based.
• Each $ of IT investment should not only provide business benefits but improve the digital assets of the Bank which in turn would improve shareholder value/economic returns
• Is it possible to measure IT spend contribution not only in terms of business benefit (CTI) but also in terms of improvement to the technology assets themselves as a contributor to shareholder value?
• That is we may get business benefits but if that initiative detracts from the enterprise value of technology as an asset then we should price that in the negative, deducting from the net business benefits.
• Conversely if we improve the value of technology as an enterprise asset then that should be priced positively.

How the deliverable will be used?

• Making the quantitative connection between IT investment and shareholder value will help the Bank prioritise annual IT spend in addition to strategic alignment and business benefits (the GAP)
• Adding a delta price of initiatives which have a positive or deleterious effect on the enterprise value of technology will help in prioritisation
• Understanding what the next step might be in making the IT investment process more robust and scientific.

URLs and References
www.anz.com.au; semester 2, 2014 reading lists for FIT3051 and FIT5159 https://moodle.vle.monash.edu.au

Pre- and Co-requisite Knowledge
Student should have studied FIT3051 Decision support system for finance; and/or FIT5159 IT for financial decisions

Clustering of DNA sequences to identify DNA-binding sites
Supervisors: David Dowe, Mirana Ramialison (Australian Regenerative Medical Institute, or ARMI)

Background
DNA-binding proteins (transcription factors, a.k.a. master gene regulators) are genes that control the formation of our organs. They instruct each cell to become a specific organ by binding to specific DNA sequences.

Aim and Outline
We use clustering (and, more generally, mixture modelling) to cluster and identify the binding sites of these genes in order to decrypt the DNA code that determines the identity of a cell (e.g., heart cell, liver cell, etc.). Ultimately, this will give insight into how the regeneration process occurs (e.g., how animals such as the zebrafish can regrow organs but humans can't).

The data consists of segments of DNA bases (A, C, G, T) that contain the binding site of transcription factors in an unknown location from different tissues and species (plant, mouse, human, etc.). There is an abundance (100s of Megabytes) of relevant public data and in-house ARMI data.

We will initially analyse this with k-means clustering and then introduce Minimum Message Length (MML) and Snob.

Snob uses MML to infer a model which specifies the number of clusters (or components), the statistical parameters (e.g., mean and standard deviation) of each cluster, the size (more specifically, the relative abundance or mixing proportion) of each class, and the assignment of parts of the data into respective classes.

DNA bases might be analysed 1 base at a time or in pairs of bases, etc.
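
As a rough illustration of the initial k-means step, the sketch below turns each sequence into a vector of normalised counts of base pairs and clusters those vectors with a plain Lloyd's-algorithm k-means. The function names (`kmer_profile`, `kmeans`), the choice of seeding centroids with the first k points, and the toy sequences are all assumptions made for illustration; this is not the project's actual pipeline and it does not use MML or Snob.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=2):
    """Return normalised counts of every overlapping k-mer over A, C, G, T."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(sum(counts[m] for m in kmers), 1)
    return [counts[m] / total for m in kmers]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy sequences, not real binding-site data.
sequences = ["ATATATATAT", "GCGCGCGCGC", "TATATATATA", "CGCGCGCGCG"]
profiles = [kmer_profile(s, k=2) for s in sequences]
labels, _ = kmeans(profiles, k=2)
```

Setting `k=1` in `kmer_profile` corresponds to analysing one base at a time. Unlike Snob, this sketch needs the number of clusters to be fixed in advance.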

URLs and References
C. S. Wallace (2005), "Statistical and inductive inference by minimum message length", Springer.

Wallace, C.S. and D.L. Dowe (2000). MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions, Statistics and Computing, Vol. 10, No. 1, Jan. 2000, pp73-83.

Haudry Y.*, Ramialison M.*, Paten B., Wittbrodt J., Ettwiller L. (2010) Using Trawler_standalone to discover overrepresented motifs in DNA and RNA sequences derived from various experiments including chromatin immunoprecipitation. (*=Co-first authors). Nature Protocols. 2010;5(2):323-34.

Pre- and Co-requisite Knowledge

Essential: Mathematics to at least 1st year university level, interest in bioinformatics.

Desired: Knowledge of bioinformatics, interest in regenerative medicine.

Big Data in Education: Learning Analytics using Data Mining
Supervisors: Chris Messom

Background
Big Data and the underlying technologies (MapReduce, Hadoop, Spark, etc.) are revolutionising business analytics in both commercial and government sectors. Learning analytics are the data mining techniques used to support large-scale learning through learning management systems such as Moodle.

Aim and Outline
To review the current state of the art in Learning Analytic systems and identify an area that would benefit from supervised and unsupervised classification data mining techniques.

Identify relevant research questions to be answered by the study.

Implement a prototype Big Data learning analytic system that interfaces to a learning management system (such as Moodle), to address the research questions.

URLs and References
Data Mining in Education: http://onlinelibrary.wiley.com.ezproxy.lib.monash.edu.au/doi/10.1002/widm.1075/full
Data Mining Tools: http://onlinelibrary.wiley.com.ezproxy.lib.monash.edu.au/doi/10.1002/widm.24/full
Data Mining: Practical Machine Learning Tools and Techniques (Third Edition): http://www.sciencedirect.com.ezproxy.lib.monash.edu.au/science/book/9780123748560
https://spark.apache.org/
https://en.wikipedia.org/wiki/MapReduce
https://cloud.sagemath.com/ and https://cloud.sagemath.com/help

Pre- and Co-requisite Knowledge
Some or all of: Java, Linux/Unix, WEKA data mining tools, Hadoop, MapReduce, Spark, Moodle and/or Sage Maths tools.

Inference of ecological species distribution models
Supervisors: David Dowe and Prof. Lewi Stone (RMIT)

Background
Identifying how species are distributed over the landscape, interact and self-organize into foodwebs are central goals in Ecology. Species Distribution Models, or SDMs, have become one of the fastest moving and top ranked research fields in the ecological and environmental sciences.

Aim and Outline
This project will attempt to derive innovative statistical tools to improve our understanding of species distributions. These models predict the spatial distribution of all individuals of a particular species within its potential geographic range. The models are generally fitted to observed spatial survey data of a single species together with local measurements of environmental or geographical conditions that might potentially influence species’ occurrence or location (e.g., temperature, rainfall or elevation). Predictions of a species’ spatial distribution may then be computed under different environmental scenarios, such as modifying the SDM's environmental parameters to reflect hypothetical climate or land use changes. Having the ability to predict the likely locations of a species under different environmental scenarios is important for a wide range of conservation management and policy contexts, including the management of threatened species, assessing the impact of development scenarios, determining biodiversity “hotspots,” and predicting the likely ranges of invasive species. Very few models proposed for the analysis of these data-sets account for the effects of errors in detection of individuals, even though nearly all surveys of natural populations are prone to detection errors, which can be significant. Failure to account for imperfect detectability in models can induce bias in the parameters and predictions. This is an exciting challenge for which solutions are sought.

In this project we will be investigating and developing new statistical techniques for dealing with these problems. Minimum Message Length (MML) has the potential to revolutionise current techniques.

URLs and References
C. S. Wallace (2005), "Statistical and inductive inference by minimum message length", Springer.

D. L. Dowe (2011), "MML, hybrid Bayesian network graphical models, statistical consistency, invariance and uniqueness", Handbook of the Philosophy of Science - (HPS Volume 7) Philosophy of Statistics, P.S. Bandyopadhyay and M.R. Forster (eds.), Elsevier, [ISBN: 978-0-444-51862-0 {ISBN 10: 0-444-51542-9 / ISBN 13: 978-0-444-51862-0}], pp901-982, 1/June/2011.

Pre- and Co-requisite Knowledge
Essential: Mathematics to at least 1st year university level.
Desired: Knowledge of or interest in ecology.

Swarming and Robustness
Supervisors: Jan Carlo Barca

Background
Robustness is one of the key characteristics of swarms, but what are the factors that underpin this highly attractive quality?

Aim and Outline
This project aims to address this question by formulating mechanisms that can be used to evaluate the robustness of swarms under varying communication and control topologies. The student will work within Monash Swarm Robotics Laboratory and attend weekly meetings with researchers & students in the lab. This is a great opportunity for the selected student to learn about swarm robotics and work within a multi-disciplinary team consisting of software, mechanical and electrical engineers.

URLs and References
M. Brambilla, E. Ferrante, M. Birattari, and M. Dorigo. (2012) "Swarm robotics: A review from the swarm engineering perspective", Swarm Intelligence, vol. 7, issue 1, pp 1-41. Available: http://iridia.ulb.ac.be/IridiaTrSeries/rev/IridiaTr2012-014r002.pdf

C. Ramirez-Atencia, G. Bello-Orgaz, M.D. R-Moreno, D. Camacho, (2014) "A simple CSP-based model for Unmanned Air Vehicle Mission Planning," Proceedings of 2014 IEEE International Symposium on Innovations in Intelligent Systems and Applications (INISTA), pp.146,153. Available: https://dl.dropboxusercontent.com/u/27121329/Mission%20Planning.pdf

Pre- and Co-requisite Knowledge

Security vulnerabilities of Bitcoin and the mitigation
Supervisors: Joseph Liu and Ron Steinfeld

Background
Bitcoin is a payment system invented in 2008. The system is peer-to-peer such that users can transact directly without needing an intermediary. Despite its advantages, security is a big concern for such a digital currency.

Aim and Outline
Although researchers have identified some potential attacks, real or theoretical, on the Bitcoin network and its use as a payment system, there are likely to be more vulnerabilities yet to be discovered. The aim of this project is to identify some potential security vulnerabilities of Bitcoin and propose corresponding mitigations.

URLs and References
[1] S. Nakamoto. Bitcoin: A peer-to-peer electronic cash system, 2008. https://bitcoin.org/bitcoin.pdf

[2] Marcin Andrychowicz, Stefan Dziembowski, Daniel Malinowski, and Lukasz Mazurek. Secure multiparty computations on Bitcoin. In Security and Privacy (SP), 2014 IEEE Symposium on Security and Privacy, May 2014.

Pre- and Co-requisite Knowledge
Familiarity with the basics of digital currency would be an advantage.

Eco Campus App Database
Supervisors: David Taniar, David Albrecht, Nancy Van Nieuwenhove (Biology), Gerry Rayner (Biology)

Background
An Eco Campus App is being developed that will enable Biology students to identify and locate species on the Clayton Campus. These students will be required to report on their sightings of the species. These reports could include taking pictures, recording video, and making notes. The collection of these reports will then form the basis of an e-portfolio which will be used as part of the student's assessment.

Aim and Outline
In this project we will investigate the various issues associated with the Eco Campus App database and the e-portfolios.

For example, students may not be able to locate a particular species. This may be due to a number of factors, e.g., incorrect information in the database or not being able to correctly identify the species. This raises questions about how to maintain the integrity of the database. There are also questions regarding the most appropriate approach for storing the e-portfolios and how to use the information in the e-portfolios to update the Eco Campus App database.

Efficiency and effectiveness of Incremental Mutation Analysis
Supervisors: Robert Merkel

Background
Mutation testing is a well-established technique for assessing the quality of the test suite of a program. It works by automatically creating mutant versions of the software with changes such as changing operators, variable names, and the like, in such a way that the program will still compile. The test suite is then applied to the mutant versions, and the proportion of mutants which trigger a failure in the test suite is recorded. The higher this proportion (the mutation score), the better the test suite is presumed to be. Experiments have found that mutation score is a very good predictor of the ability of a test suite to detect real faults. However, the process of creating, compiling, and running the entire test suite on very large sets of mutant versions of software requires a lot of time.
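
To make the mutation score concrete, here is a minimal sketch in Python. Real tools such as PiT or MutPy mutate source code or bytecode automatically; this toy instead simulates mutants by swapping the comparison operator passed to a hypothetical function `is_adult`. The function, the mutants and the two test suites are all invented for illustration.

```python
import operator


# Hypothetical function under test: written with a pluggable comparison
# so that a "mutant" can be produced by swapping the operator.
def make_is_adult(compare):
    def is_adult(age):
        return compare(age, 18)
    return is_adult


original = make_is_adult(operator.ge)

# Each mutant replaces >= with another relational operator.
mutants = {
    ">": make_is_adult(operator.gt),
    "<=": make_is_adult(operator.le),
    "==": make_is_adult(operator.eq),
}


def mutation_score(test_suite, mutants):
    """Fraction of mutants on which at least one test fails (is 'killed')."""
    killed = sum(
        1 for mutant in mutants.values()
        if not all(test(mutant) for test in test_suite)
    )
    return killed / len(mutants)


# Two test suites of differing thoroughness.
weak_suite = [lambda f: f(30) is True]
strong_suite = [
    lambda f: f(30) is True,
    lambda f: f(18) is True,   # boundary case
    lambda f: f(5) is False,
]

weak = mutation_score(weak_suite, mutants)
strong = mutation_score(strong_suite, mutants)
```

In this toy, the weak suite lets the `>` mutant survive because it never exercises the boundary value, while the strong suite kills all three mutants. Every mutant requires a full run of the test suite, which is the cost that incremental analysis tries to reduce.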

In modern software development, test suite quality is often measured on an ongoing basis, as part of a process known as continuous integration. An automated test suite is used many times per day, as developers use it to check the correctness of their changes as they are added to the project's version control repository. In this circumstance, any test suite quality assessment must provide feedback to developers quickly if it is to be useful. Mutation analysis is currently too slow for use on even moderately-large projects in the context of continuous integration.

A previous honours project showed that mutation analysis could be effectively parallelized on distributed computing clusters, but the financial costs of conducting mutation analysis on a cluster can be high if conducted regularly.

PiT, a mutation analysis tool for Java, has some support for incremental analysis, in which the results of a previous execution of mutation analysis on one version of a software system are used to minimize the amount of new work required to analyze a subsequent version. The creator of PiT has proposed several additional optimizations which are not yet implemented. However, no assessment of the effect of these optimizations on the speed and accuracy of mutation analysis has been reported.

Aim and Outline
In this project, we will collect empirical evidence about the performance improvements, and effects on mutation score accuracy, of incremental mutation analysis.

The research will involve modifying an existing open source mutation analysis tool to collect additional data about internal operations, implementing (only to proof-of-concept standard) some or all of the proposed optimizations, and using the version control repository of existing open source software projects to collect data to measure their effectiveness.

URLs and References
Yue Jia; Harman, M., "An Analysis and Survey of the Development of Mutation Testing," Software Engineering, IEEE Transactions on , vol.37, no.5, pp.649,678, Sept.-Oct. 2011

MutPy Python mutation analysis tool: https://pypi.python.org/pypi/MutPy/0.4.0

PiT mutation analysis Tool for Java: http://pitest.org

Coles, H. Incremental analysis (PiT). http://pitest.org/quickstart/incremental_analysis/

Pre- and Co-requisite Knowledge
Useful skills for this project include:

• Working knowledge of Unix programming
• Experience with cluster computing
• Familiarity with version control systems
• Familiarity with unit testing frameworks such as JUnit or PyUnit
• Understanding of basic descriptive or inferential statistics

None of these are essential, but the more familiarity you have with these topics, the easier initial progress is likely to be.

Machine learning algorithms in “face icon maker” system for semantic and sentiment analysis of social network data
Supervisors: Associate Professor Vincent Lee & Dr Yen Cheung

Background
MIT Computer Science or Bachelor of Software Engineering

Aim and Outline
Social networks such as Facebook and Twitter are known for convenient, large-scale propagation of short texts. When users interact with each other (post, comment, or @someone), they attempt to make the text more attractive in order to express their feelings. A face icon system is a platform for conveying such feelings. However, traditional face icon systems use non-adaptive algorithms: to configure their preferences, users have to search for, post or install icons in the system, which is inconvenient and tedious and lacks the desired accuracy. This project aims to develop machine learning algorithms that explore and exploit structured and semi-structured text data from social networks for semantic and sentiment analyses, which can be used by enterprise decision makers to improve product design and service quality.

URLs and References
https://css-tricks.com/icon-fonts-vs

Zhepeng Yan, Nan Zheng, Zachary G. Ives, Partha Pratim Talukdar, Cong Yu (2015), "Active learning in keyword search-based data integration", The VLDB Journal, 24:611–631. DOI 10.1007/s00778-014-0374-x

Pre- and Co-requisite Knowledge
Text analytic mining, simulation software tool

Is cyber-crime operation cost predictable?
Supervisors: Associate Professor Vincent Lee & Dr Jianneng Cao (I2R, Singapore)

Background

MBIS or MIT Computer Science or Bachelor of Software Engineering (Hons)

Aim and Outline:
A body of practice-based literature has proposed estimates of cyber-crime cost, generally based on annual loss equivalent, to justify cyber-crime prevention budgets for hardware and software purchase/development. Recent advances in big data analytic tools provide new insights into the predictive capability of real-time cyber-crime operational cost. This project explores how to predict enterprise-specific cyber-crime cost via big data analytic tools that exploit the descriptive, predictive and prescriptive analytics of cyber-crime data.

URLs and References
Brian Cashell, William D. Jackson, Mark Jickling, and Baird Webel (2014), “The Economic Impact of Cyber-Attacks”, CRS Report for Congress, received through the CRS Web.

Howard E. Glavin (2003), “A Risk Modelling Methodology,” Computer Security Journal, vol. 19, no. 3 (Summer), pp. 1-2; and Soo Hoo, How Much Is Enough? pp. 4-12.

Ben Fischer (2014), “How to Calculate the Financial Impact of an Attack on your Network,” Arbor Networks Cyber-Security Summits (Oct 2014).

Arbor Networks white paper (2014), “The Risk vs. Cost of Enterprise DDoS Protection -How to Calculate the ROI from a DDoS Defense Solution,” 12 pages

Ponemon Institute Research Report (2014), “2014 Cost of Data Breach Study-Global Analysis”, May, Benchmark research sponsored by IBM

Pre- and Co-requisite Knowledge
Simulation software tool (e.g. JavaScript, JADE or MATLAB)

Security for the Internet of Things (IoT)
Supervisors: Ron Steinfeld and Joseph Liu and Carsten Rudolph

Background:
The rapidly increasing number of devices connected to the Internet, especially small devices such as cameras, sensors, and actuators, making up the so-called Internet of Things (IoT), appears to be one of the big trends in computing for the near future. As such devices are increasingly used to collect potentially private data, as well as control critical infrastructure, the privacy and integrity of the IoT is becoming a highly important security concern. Yet the massive scale of the emerging IoT, its highly distributed nature, and the low computational abilities of many IoT devices pose new challenges in attempting to devise practical solutions for IoT security problems.

Aim and Outline
The goal of this project is to explore, implement and evaluate the practicality of protocols for securing the privacy and/or integrity of large scale, highly distributed IoT networks of low-power devices.

Examples of project topics include:

• Authentication protocols to enforce access control to IoT devices only to authorized users.
• Encryption protocols to provide privacy for IoT sensor data (e.g. for sending over the Internet to a cloud-based encrypted database).
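
As one possible shape for the first topic, the sketch below shows a challenge-response check built on HMAC-SHA256 from the Python standard library. It assumes a secret key has already been provisioned on both the device and the authorised user; the function names and the key set-up are illustrative assumptions, not a protocol proposed by the supervisors, and a real design would also need key distribution, session handling and protection against device compromise.

```python
import hashlib
import hmac
import secrets


def new_challenge():
    """Device side: a fresh random nonce, so old responses cannot be replayed."""
    return secrets.token_bytes(16)


def respond(shared_key, challenge):
    """User side: prove knowledge of the key without sending it."""
    return hmac.new(shared_key, challenge, hashlib.sha256).digest()


def verify(shared_key, challenge, response):
    """Device side: recompute the tag and compare in constant time."""
    expected = hmac.new(shared_key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


# Hypothetical pre-shared key provisioned on both the device and the user.
key = secrets.token_bytes(32)
challenge = new_challenge()
authorised = verify(key, challenge, respond(key, challenge))
rejected = verify(key, challenge, respond(secrets.token_bytes(32), challenge))
```

A keyed hash is cheap enough for many low-power devices, which is one reason symmetric schemes of this kind are a common starting point; evaluating such trade-offs on real embedded hardware is part of what the project would investigate.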

Practical implementation/evaluation-oriented projects will likely involve evaluating the secure protocol implementations on sample embedded hardware devices incorporating sensors, in collaboration with the Monash Dept. of Electrical and Computer Systems Engineering.

URLs and References:
[1] http://spectrum.ieee.org/telecom/security/how-to-build-a-safer-internet-of-things

Pre- and Co-requisite Knowledge
Depending on the nature of the project topic selected, the student should have either (1) Good programming skills and/or (2) good mathematical skills, and preferably both. Familiarity with the basics of cryptography would be an advantage.

Investigation of Cryptographic Code Obfuscation
Supervisors: Ron Steinfeld

Background
Code obfuscation is the process of 'hiding' the implementation of a program while preserving its functionality, which has potential applications in software IP protection, as well as a myriad of other cryptographic applications, such as efficient Broadcast Encryption. Traditionally done by heuristic methods, it was only recently [1] that plausible cryptographic methods of program obfuscation were proposed, but their secure cryptographic construction is still a major research problem.

Aim and Outline
The aim of this project is to evaluate the practicality of and explore improvements to some of the new theoretical code obfuscation methods related to sound security foundations, in particular the construction in [2]. Depending on student interest and capabilities, the specific goals of this project would be to evaluate the concrete memory and time requirements of these mechanisms, namely:

1. Memory Efficiency: Estimate concrete parameter sizes for the systems in [2] required to achieve a desired security/correctness level, based on the best known attacks on these systems, and known models for behaviour of those attacks.
2. Computational Efficiency: Evaluate the practical computational cost of the mechanism by implementing a prototype of the mechanism using efficient algorithms for the underlying mathematical computations (based on existing specialised arithmetic libraries), and evaluating its performance.

This is an opportunity for talented students to investigate state of the art cryptographic algorithms.

URLs and References
[1] S. Garg et al. Candidate Indistinguishability Obfuscation and Functional Encryption for all circuits. Available at https://eprint.iacr.org/2013/451.pdf

[2] Z. Brakerski et al. Obfuscating Conjunctions under Entropic Ring LWE. In Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science (ITCS '16), pp. 147-156, ACM, New York, NY, USA, 2016.

Pre- and Co-requisite Knowledge

Familiarity with the basics of cryptography would be an advantage. The student should have good mathematical and programming skills.

Encrypted Database System
Supervisors: Joseph Liu, Ron Steinfeld and David Taniar

Background
The convenience of outsourcing has led to a massive boom in cloud computing. However, this has been accompanied by a rise in hacking incidents exposing massive amounts of private information. Encrypted databases are a potential solution to this problem, in which the database is stored on the cloud server in encrypted form, using a secret encryption key known only to the client (database owner), but not to the cloud server. However, existing encrypted database systems either are not secure enough, or suffer from various functionality and efficiency limitations when compared to unencrypted databases, which can limit their practicality in various applications.

Aim and Outline
The goal of this project is to explore, develop and evaluate improvements to a selected functionality and/or efficiency aspect of existing encrypted database systems, with the aim of improving their practicality. Examples include:
* Efficient Implementation: implementing an encrypted database using standard distributed computing frameworks like Apache Hive and/or NoSQL systems.
* Ranking Search Results: Current searchable encrypted database schemes do not support ranking of search results at the server. The goal is to investigate the feasibility of adding such functionality, while preserving a good level of privacy against the server.
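
For readers new to the area, the following toy sketch illustrates the basic idea behind keyword search over an outsourced index: the client replaces each keyword with a keyed HMAC token, so the server can match tokens without seeing the keywords. This is a deliberately simplified illustration and not the scheme of Cash et al.; the names (`keyword_token`, `build_index`, `server_search`), the fixed demo key and the sample records are assumptions. It leaks which records match each query, does not encrypt the records themselves, and offers no ranking.

```python
import hashlib
import hmac
from collections import defaultdict


def keyword_token(key, keyword):
    """Client side: deterministic keyed token that hides the keyword itself."""
    return hmac.new(key, keyword.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def build_index(key, documents):
    """Client side: map each keyword token to the ids of matching records."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in set(text.split()):
            index[keyword_token(key, word)].add(doc_id)
    return dict(index)


def server_search(index, token):
    """Server side: look up a token without learning the underlying keyword."""
    return sorted(index.get(token, set()))


# Hypothetical key and records; record contents would be encrypted separately.
key = b"demo-key-known-only-to-the-client"
documents = {1: "cardiac patient melbourne", 2: "patient sydney", 3: "invoice melbourne"}
index = build_index(key, documents)
hits = server_search(index, keyword_token(key, "melbourne"))
```

Because the server only ever sees opaque tokens and record ids, it has no term frequencies to rank by, which hints at why server-side ranking is hard to add without leaking more information.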

URLs and References
- Cash, D., Jarecki, S., Jutla, C., Krawczyk, H., Roşu, M.-C. and Steiner, M. Highly-scalable searchable symmetric encryption with support for boolean queries, Advances in Cryptology–CRYPTO 2013, Springer, pp. 353–373. Available online at https://eprint.iacr.org/2013/169.pdf

Pre- and Co-requisite Knowledge
The student should have (1) good programming skills and/or (2) familiarity with the basics of cryptography and distributed computing environments (such as Hadoop, Hive, HBase).

Previously Offered: Yes (previously offered title: Searchable Encrypted Databases) We have updated the title and the content.

Interactive visualisation of aerospace data using Virtual Reality displays
Supervisors: Maxime Cordeil, Tobias Czauderna (FIT), Callum Atkison (Eng)

Background
Aerospace data visualisation of 3D vector fields.

Aim and Outline
The project is focused on identifying and developing tools and an efficient workflow that will allow for the visualisation of the 3D velocity fields of large (8193 x 1000 x 1362 grid points and 340 Gb per time step) direct numerical simulations of the complex turbulent flows in the study of aerodynamics and aerospace engineering. The aim of the project is to establish a system by which we can efficiently render various aspects of this dataset and explore the relationships between different features and structures and their interaction in this complex flow. Typically, isosurface visualisations (see attached images) and animations of the 3D velocity fields are used to understand this data. This project will involve bringing this kind of visualisation to Virtual Reality displays such as the Oculus Rift and/or the CAVE2, a large virtual reality room with 80 high-definition screens and a tracking system.

URLs and References
https://ltrac.eng.monash.edu.au/

Pre- and Co-requisite Knowledge
3D programming

A Community Centred Data Aggregation Model
Supervisors: Vincent Lee, Yen Cheung, Chan Cheah

Background
Ratepayers Victoria (RV) is an incorporated association whose purpose is to advocate for Victorian ratepayers in matters of local government. Its mission is to ensure that good governance and compliance prevail in all council affairs. It also plays a role in developing and implementing state-wide systematic reforms to ensure councils:

1. Are financially responsible and accountable to their ratepayers.
2. Demonstrate open government and good governance in both local government and state government.
3. Are socially and environmentally responsible in municipal service delivery and management.

To date, RV has been part of the Fair Go Rates committee and is now part of the Local Government Performance Reporting Framework (LGPRF) committee, developing better KPI metrics for Local Government.

RV also helps ratepayer advocates to better leverage technology to support their reform-contributing activities. It therefore also plays an important role in forming strategic partnerships and in developing ICT-enabled governance tools, including future Local Government big data analytics, data capture and reporting capabilities.

Aim and Outline
As a means of managing council complaints and compliments, this project’s goal is to design and develop a data aggregation model that provides a community centred and traceable approach, from registering to resolving council complaints, across different escalating authorities. This model should also incorporate useful analytics reporting to the different stakeholders.

URLs and References:

1. Council & complaints - A report on Current Practices and Issues (Feb 2015)
2. Complaint and compliment management resources
3. The LG Act - to track which sections of the law a complaint or compliment may breach or comply with.

Pre- and Co-requisite Knowledge
BIS majors preferred; otherwise none.

A Community Centred Governance Model
Supervisors: Vincent Lee, Yen Cheung, Chan Cheah

Background
Ratepayers Victoria (RV) is an incorporated association whose purpose is to advocate for Victorian ratepayers in matters of local government. Its mission is to ensure that good governance and compliance prevail in all council affairs. It also plays a role in developing and implementing state-wide systematic reforms to ensure councils:

1. Are financially responsible and accountable to their ratepayers.
2. Demonstrate open government and good governance in both local government (LG) and state government (SG) contexts.
3. Are socially and environmentally responsible in municipal service delivery and management.

To date, RV has been part of the Fair Go Rates committee and is now part of the Local Government Performance Reporting Framework (LGPRF) committee, developing better KPI metrics for Local Government. RV also helps ratepayer advocates to better leverage technology to support their reform-contributing activities. It therefore also plays an important role in forming strategic partnerships and in developing ICT-enabled governance tools, including future Local Government big data analytics, data capture and reporting capabilities.

Aim and Outline
This project involves the development of a governance model that incorporates Gov 2.0 concepts and the KPIs of the Local Government.

URLs and References

1. The LG Act
2. The LGPRF Workbook

Pre- and Co-requisite Knowledge
BIS majors preferred; otherwise none.

Change-point Detection in Hand Movement Data
Supervisors: Ingrid Zukerman, Jason Friedman (Tel Aviv University), Andisheh Partovi

Background
As part of an experiment on human perception and decision making (Physiology Department, Tel Aviv University), we have a set of hand movement data of subjects pointing in different directions, and possibly changing their direction in mid-action. We need to analyse these data in order to determine the time when the subjects have decided to change their pointing direction. This helps physiology researchers better understand the timeline of the decision making process in the brain. In order to identify the changes in the hand movement profile, the student can utilise statistical approaches such as Hidden Markov Models, which are often used in time series analysis and anomaly detection.

Aim and Outline
Developing a change-point detection algorithm to identify changes in the trajectory of hand movements as soon as they occur.
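
As a point of reference only, the sketch below shows one very simple way to flag a change in movement direction online: a one-sided CUSUM detector over heading angles. It is not the Hidden Markov Model approach mentioned in the Background. The function name, the `baseline_n`, `drift` and `threshold` parameters, and the idea of using the first few steps as the reference heading are all illustrative assumptions, not part of the experiment.

```python
import math

def detect_direction_change(points, baseline_n=5, drift=0.05, threshold=0.6):
    """Return the index of the first sample at which a heading change is flagged, or None.

    points: list of (x, y) hand positions sampled at a fixed rate (hypothetical format).
    The first `baseline_n` movement steps define the reference heading.
    """
    # Heading (radians) of each step between consecutive samples.
    headings = [math.atan2(y1 - y0, x1 - x0)
                for (x0, y0), (x1, y1) in zip(points, points[1:])]
    if len(headings) <= baseline_n:
        return None
    # Reference heading: circular mean of the baseline steps.
    ref = math.atan2(sum(math.sin(h) for h in headings[:baseline_n]),
                     sum(math.cos(h) for h in headings[:baseline_n]))
    cusum = 0.0
    for i in range(baseline_n, len(headings)):
        # Absolute angular deviation from the reference, wrapped into [0, pi].
        dev = abs(math.atan2(math.sin(headings[i] - ref),
                             math.cos(headings[i] - ref)))
        # Accumulate deviation beyond the allowed drift; reset at zero.
        cusum = max(0.0, cusum + dev - drift)
        if cusum > threshold:
            return i + 1  # index of the sample that ends step i
    return None
```

A lower `threshold` flags changes sooner at the cost of more false alarms on noisy trajectories, which is the trade-off any method chosen for the project would also need to evaluate.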

Pre- and Co-requisite Knowledge
FIT3080/FIT5047 Intelligent systems or equivalent is a mandatory prerequisite, and knowledge of time-series analysis is highly desirable.

Improving autonomous guidance within BB-8™ rolling droid by adding higher processing capabilities

Supervisors: Asad Khan, Richard Spindler (LateralBlast), David Hellewell (Intel Australia)

Background
The project will establish two-way communications between a low energy (LE) Bluetooth device, such as Sphero’s BB-8™ rover droid, and a computer for obstacle avoidance. The computer will host a parallel genetic algorithm (GA) and a Fast Artificial Neural Net (FANN) for rapidly calculating responses to detected obstacles. A parallel implementation of the GA in C will be provided. The following items will also be provided on loan: (1) a BB-8™ droid, courtesy of LateralBlast; (2) an Intel Edison IoT computer, courtesy of Intel.

Aim and Outline
LE Bluetooth connectivity will be established using Sphero’s Orbotix JavaScript SDK [1] to pass the droid’s sensory data to a higher-speed processor. This processor will analyse the sensory data in real time, using the parallel GA and FANN, to compute a suitable path for obstacle avoidance. The limited range of the Bluetooth connection requires an intermediate computing device that can be placed close to the droid for practical use. For this purpose, an Intel Edison IoT board [2] will be made available for final testing of the software.

URLs and References
[1] Orbotix JavaScript SDK https://www.npmjs.com/package/sphero
[2] Intel Edison IoT https://software.intel.com/en-us/iot/hardware/edison

Pre- and Co-requisite Knowledge
C/C++ and Java/JavaScript. Knowledge of MPI (message passing interface) will be highly regarded.

A fast deep learning framework for multiple scene analyses
Supervisors: Asad Khan, Y. Ahmet Sekercioglu (Heudiasyc France)

Background
The framework is expected to facilitate a number of applications requiring real-time image classification among multiple video streams. One such area is localisation of swarm robots.

Aim and Outline
This project will implement a parallel-distributed framework for rapid processing of multiple video streams for image classification. The code will be developed using the deep learning module within OpenCV [1]. This code will be networked using message passing interface (MPI) [2] or a similar library. The code will thus be able to analyse an increasing number of video streams with relatively small increases in processing time.

URLs and References
[2] Open MPI Library, https://www.open-mpi.org/

Pre- and Co-requisite Knowledge

C/C++ and Python. Knowledge of MPI (message passing interface) and Java/JavaScript will be highly regarded.

Mobile App for injury surveillance in Cricket

Supervisors: Asad Khan and Naj Soomro (Faculty of Medicine, Nursing and Health Sciences & CricDoc Pvt. Ltd.)

Background
Cricket is the most popular summer sport in Australia. At junior levels of cricket, injury incidence ranges between 15% and 49% [1, 2]. Traditionally, injury surveillance has relied upon the use of paper-based forms or complex computer software [3, 4]. This makes injury reporting laborious for the staff involved. A mobile application that can be used on the field may be a solution to better injury surveillance in cricket. CricDoc Pvt Ltd made an Android-based mobile App (CricPredict) in 2015 as a prototype.

Aim and Outline
Re-design the existing CricPredict (injury surveillance App) so that it can run across platforms and provides a better UI. The App will be field tested with Mildura West Cricket Club. The resulting protocol for the App, along with validation of injury data, will be published as a protocol paper in Sports Technology Journal.
The student may be offered a Monash Summer Research Scholarship of $1500 if their application is successful in the SRS summer round.

URLs and References

1. Orchard J, James T, Kountouris A, Blanch P, Sims K, Orchard J. Injury report 2011: Cricket Australia. Sport Health. 2011;29(4):16.
2. Das NS, Usman J, Choudhury D, Abu Osman NA (2014) Nature and Pattern of Cricket Injuries: The Asian Cricket Council Under-19, Elite Cup, 2013. PLoS ONE 9(6): e100028. doi:10.1371/journal.pone.0100028
3. Ranson C, Hurley R, Rugless L, Mansingh A, Cole J (2011) International cricket injury surveillance: a report of five teams competing in the ICC Cricket World Cup 2011. Br J Sports Med 47(10): 637–43.
4. Sports Medicine Australia, Cricket Injury Reporting form.
5. Soomro, N., R. Sanders, and M. Soomro. "Cricket injury prediction and surveillance by mobile application technology on smartphones." Journal of Science and Medicine in Sport 19 (2015): e6.
6. www.cricdoc.com

Pre- and Co-requisite Knowledge
Knowledge of mobile programming, app development and SQL servers.
Working knowledge of cross-platform development tools like Meteor or PhoneGap will be useful.

Quality of data framework for supporting healthcare information management
Supervisors: Frada Burstein and Rob Meredith

Background
Data quality issues come up very high on the agenda when dealing with organisational decision-making. Prior research demonstrated that there are sets of criteria which should be taken into consideration as a framework to evaluate the quality of data, and such a framework has to be tailored depending on the context of the organization and the purpose of the evaluation.

Aim and Outline
The project will take a generic framework for data quality evaluation as a starting point to demonstrate its applicability to a large Australian healthcare institution. It will follow design science research to refine the framework by applying it to suit the needs of the information management team.

Pre- and Co-requisite Knowledge
Knowledge of systems analysis and design and of decision support principles, and completion of information management and knowledge management units, will be useful.

Pathfinding for Games
Supervisors: Daniel Harabor

Background
Pathfinding is a fundamental operation in video game AI: virtual characters need to move from location A to location B in order to explore their environment, gather resources or otherwise coordinate themselves in the course of play. Though simple in principle, such problems are surprisingly challenging for game developers: paths should be short and appear realistic but they must be computed very quickly, usually with limited CPU resources and using only small amounts of memory.

Aim and Outline
In this project you will develop new and efficient pathfinding techniques for game characters operating in a 2D grid environment. There are many possibilities for you to explore. For example, you might choose to investigate a class of "symmetry breaking" pathfinding techniques which speed up search by eliminating equivalent (and thus redundant) alternative paths. Another possibility involves dynamic settings where the grid world changes (e.g. an open door becomes closed) and characters must re-plan their routes. A third possibility is multi-agent pathfinding, such as cooperative settings where groups of characters move at the same time or where one character tries to evade another.
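
For orientation, the following is a minimal sketch of a baseline that such techniques are typically compared against: plain A* search on a 4-connected grid with unit move costs. It is not a symmetry-breaking method. The grid encoding (0 for free, 1 for blocked) and the function name are assumptions made for illustration.

```python
import heapq

def astar(grid, start, goal):
    """Length of a shortest 4-connected path on a grid of 0 (free) / 1 (blocked) cells.

    start and goal are (row, col) tuples. Returns the number of moves,
    or None if the goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance: admissible for 4-connected, unit-cost moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best_g = {start: 0}
    frontier = [(h(start), 0, start)]  # entries are (f, g, cell)
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            return g
        if g > best_g.get(cell, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(frontier, (ng + h((nr, nc)), ng, (nr, nc)))
    return None
```

In a dynamic setting, the simplest (and slowest) response to a door closing is to mark the cell as blocked and call `astar` again from the character's current position; avoiding that full re-plan is one of the directions the project could explore.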

Successful projects may lead to publication and/or entry to the annual Grid-based Path Planning Competition.

URLs and References
http://www.harabor.net/daniel/index.php/pathfinding/

Pre- and Co-requisite Knowledge
Students interested in this project should be enthusiastic about programming. They should also have some understanding of AI Search and exposure to the C and/or C++ programming language.

Algorithm analysis techniques to improve Integer Programming solvers
Supervisors: Pierre Le Bodic

Background
Industry problems can often be modelled using Integer Programming (IP) [1], a mathematical abstraction. IP solvers (e.g. IBM Cplex [2]) provide an optimal solution to any problem described in that mathematical setting, but this process can take a long time. To be efficient, state-of-the-art IP solvers combine multiple solving algorithms. Each algorithm is usually well understood theoretically, but the combination used in solvers is not.

Aim and Outline
We will use algorithm analysis techniques (as in e.g. [3]) to theoretically investigate how algorithms are combined in state-of-the-art IP solving, and try to come up with better algorithms.

Pre- and Co-requisite Knowledge
A taste for algorithm analysis and computational complexity is desirable.

Game theoretical models of reputation dynamics
Supervisors: Julian Garcia

Background
This project uses game theory and computational models to study the dynamics of reputation and cooperation.

Aim and Outline
Reputation is important in enabling reliable interactions between strangers across many domains, including a host of applications online [1]. Simple mathematical models demonstrate that strangers can learn to cooperate with each other sustainably, if doing so will enhance their reputation [2, 3]. These models use game theory to understand how agents learn to coordinate their independent actions [4]. While the resulting models of reputation dynamics are insightful, they are often based on games that are too simple and detached from reality [5]. This project addresses that gap by combining agent-based models and game theory.

URLs and References
[1] Resnick, Paul, et al. "Reputation systems." Communications of the ACM 43.12 (2000): 45-48.

[2] M. A. Nowak and K. Sigmund. Evolution of indirect reciprocity by image scoring. Nature, 393:573–577, 1998.

[3] M. A. Nowak and K. Sigmund. Evolution of indirect reciprocity. Nature, 437:1291–1298, 2005.

[4] M. A. Nowak. Five rules for the evolution of cooperation. Science, 314:1560–1563, 2006.

[5] FP Santos, FC Santos, et al. Social norms of cooperation in small-scale societies. PLoS Comput Biol, 12(1):e1004709, 2016. **

** Key reference.

Pre- and Co-requisite Knowledge
Problem-solving skills, an interest in applied mathematics and simulation, and skills in Python, C or both.

Deep Learning for Playing Games
Supervisors: Reza Haffari

Background
Deep Learning has revolutionised many subfields of artificial intelligence, including 'game playing'. The success of Google DeepMind's intelligent computer programs in playing Go and Atari games has been a breakthrough in the field.

Aim and Outline
In this project, we look into the deep learning technology behind DeepMind's game players and try to understand and improve it. In particular, we look into employing better neural reinforcement learning algorithms for learning intelligent agents.

URLs and References
https://en.wikipedia.org/wiki/AlphaGo

Pre- and Co-requisite Knowledge
Data Structures and Algorithms (FIT2004), Intelligent Systems (FIT3080)

Deep Learning for Text Understanding
Supervisors: Reza Haffari

Background
Deep Learning has revolutionised many subfields of artificial intelligence, including automatic text understanding by machines. Recent successes based on this approach include high-performing machine translation models, text summarisation models, etc.

Aim and Outline
In this project we aim to build neural models for better analysis of text. Potential applications include machine translation, text summarisation, textual reasoning, etc.

URLs and References
http://www.kdnuggets.com/2015/03/deep-learning-text-understanding-from-scratch.html
http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/#more-548

Pre- and Co-requisite Knowledge
Data Structures and Algorithms (FIT2004), Intelligent Systems (FIT3080)

Clustering for hierarchical time series forecasting with big time series data
Supervisors: Christoph Bergmeir

Background
Time series forecasting with large amounts of data is becoming increasingly important in many fields. In this project, we will work with data from a large optical retail company that sells up to 70,000 different products in 44 different countries in over 6000 stores worldwide. The goal is to produce accurate sales forecasts, which the company can use for store replenishment and -- more importantly -- supply chain management. The products are mainly produced in China, and have several weeks of lead time from production until they can be sold in a store.

Aim and Outline
The main challenge of this dataset is that many of the products are similar but have a short history, as the assortment changes relatively quickly with fashion trends, so univariate time series forecasting alone may often not be possible. In this project, we aim to apply different clustering techniques (k-means, DBSCAN, MML-based clustering) to features extracted from the time series and to features that are known independently (master data). In this way, we can determine the similarity between series and then use these similarities in subsequent forecasting steps to achieve more accurate forecasts.
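
To make the idea of clustering on extracted features concrete, here is a minimal sketch using plain k-means (Lloyd's algorithm) in pure Python. The two features (mean level and average step change), the function names and the choice of the first `k` points as initial centroids are illustrative assumptions; the actual project would work in R, with a richer feature set, and would most likely scale the features before clustering.

```python
def series_features(series):
    """Two illustrative features of a short sales history: mean level and average step change."""
    mean = sum(series) / len(series)
    trend = (series[-1] - series[0]) / (len(series) - 1)
    return (mean, trend)

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm over tuples of numbers.

    Initial centroids are simply the first k points, an assumption made
    here so that the example is reproducible.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids
```

Series that end up in the same cluster could then share information in the forecasting step, for example by pooling their histories when fitting a model for a product with very few observations.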

Pre- and Co-requisite Knowledge
R programming, Data Science, Machine Learning, Clustering techniques

Is Digital Health a Problem or a Solution: Health Informatics cases analysis and design
Supervisors: Prof Frada Burstein, A/Prof Henry Linger (FIT), Prof Marilyn Baird

Background
The Digital Health agenda is currently highly relevant to the delivery of efficient health care in Australia and internationally. The success of digital health is essentially related to the technology infrastructure and information systems underlying its implementation. There are many known successful cases of digital health implementation, but maybe even more examples where information systems and information technology resulted in failure.

Aim and Outline
The aim of this project is to collect and analyse a range of successful and problematic cases of healthcare delivery where the information technology went right or wrong. The result of the analysis will be described and presented as a library of cases suitable for teaching health informatics to medical students.

Pre- and Co-requisite Knowledge
Interest in health informatics is a bonus.

Efficient algorithms for scheduling visits to student placements
Supervisors: Sue Bedingfield, Kerri Morgan and Dhananjay Thiruvady

Background, Aim and Outline

This project aims to develop an efficient algorithm for scheduling visits to students on placement. Each student must be visited by one of a group of visitors subject to a number of constraints, including the availability of students, their supervisors and visitors, the time required to travel between locations, the workload of each visitor, and a preference to have a different visitor visit a given student on each visit. Ideally, we want to minimise the distances travelled between locations, ensure that multiple visits at a single location occur sequentially and are allocated to the same visitor, and minimise the amount of time required by a visitor to complete their workload of visits.

Scheduling is itself hard, and this problem is potentially harder due to the many requirements. In this project, the student will explore how heuristics and mixed integer programming can be applied to obtain an efficient algorithm for this problem. An effective solution to this problem has wide-ranging applications, particularly with increasing numbers of student placement programs.

Pre- and Co-requisite Knowledge
Required: Strong programming skills. Able to write scripts for tasks such as re-formatting data and collating results.
Preferred: An interest in learning about optimisation techniques applied to real world problems.

Deep learning to identify pool pump power consumption

Supervisors: Lachlan Andrew and Zahraa Abdallah

Background
With an increase in the use of intermittent renewable energy, like wind in South Australia, there is an increasing need for users to adjust their power consumption to match availability ("demand response"). Pool filtration pumps consume large amounts of power, and people usually do not mind too much when they are operated, so ceding control of pool pumps to the electricity company -- in exchange for a reduced bill -- is a promising form of demand response. To evaluate the potential of this, it is necessary to find out when people currently run their pool pumps.

Aim and Outline
The aim of this project is to implement a convolutional neural network to identify the times of use of pool pumps that are on timers. This problem is equivalent to looking for rectangles in a very noisy image, with non-stationary, highly correlated and highly anisotropic noise (i.e., the noise is very different in different parts of the image, the noise at one pixel is very similar to the noise at nearby pixels, and the correlation is different in different directions).
The convolutional neural network will be trained on several images where the rectangles have been identified manually. If time permits, the performance of the neural network will be compared with that of an existing heuristic algorithm.

Pre- and Co-requisite Knowledge
Basic maths is needed.
Familiarity with the basics of neural networks is an advantage, but not necessary.
Familiarity with either Matlab or R is an advantage.

Data analysis and visualisation for electrification of remote underdeveloped locations
Supervisors: Rajab Khalilpour, Ariel Liebman, Lachlan Andrew, Tim Dwyer

Background
A developing country has over 80,000 villages with about 10,000 being unelectrified. The government has allocated an insufficient budget for electrification of these villages. How would you prioritize the villages and select the right ones to be electrified first?

Aim and Outline
The aim of this project is to utilize machine learning clustering techniques to assess a database with 80,000 rows and develop decision support tools for helping decision makers find the optimal (least cost and fair) set of villages for electrification.

Pre- and Co-requisite Knowledge
Data analysis, machine learning, decision analysis

Pre-Visit Wayfinding for the Vision Impaired using a Prototype Interactive Controller and a Virtual 3D Environment Deployed on a Mobile Phone
Supervisors: Michael Morgan

Background:
Wayfinding and navigation in unfamiliar places is a challenging task for those with a vision impairment, as it is difficult to convey spatial information to them before they visit a site. While solutions in the form of tactile diagrams are available they are costly to produce, do not convey some spatial information well, are limited in the contextual information that they can provide and have issues in terms of relating the scale of the diagram to the scale of the actual environment. What is needed is a more interactive and 'embodied' way to explore a location before they visit it in order to create a mental model for navigating in the space and finding their way to significant target locations.

Interactive objects can be developed as controllers to link the physical 'embodied' world to 3D simulations of environments in order to eliminate the need for vision-based interfaces. Using rapid prototyping with a combination of 3D printing and low-cost computing devices (such as an Arduino board and an Adafruit Absolute Orientation Sensor), it is possible to create a controller object with a button-based interface. This device can then be connected wirelessly to a 3D simulation of a physical environment developed in Unity and deployed on a mobile phone. Feedback to the user can be achieved through audio and haptic cues in order to avoid the need for a vision-based interface.

Aim and Outline
The aim of the project is to create a prototype of an interactive controller object and to connect this to a 3D simulation of a physical location deployed on a mobile phone. The project will explore:
* The features required for the controller object,
* The features of the 3D environment that need to be modelled (in this case the proof of concept study will be of the sensiLab area),
* Creating a simulation that runs on a mobile phone platform and that will receive motion data from the controller object,
* The interface requirements for the controller object and the 3D simulation needed to cater for vision impaired users with respect to audio and tactile feedback,
* User testing of the proposed system.
This will enable vision impaired people to explore the layout and important features of the location before visiting it in person. Ideally, a person will be able to download the 3D simulation of any physical location they intend to visit to their mobile phone and to explore or refresh their understanding of the layout of the space. Possible applications include modelling public locations, such as transport hubs, government buildings that provide services and work environments.

This project is based in the sensiLab research workshop at the Caulfield campus. If you are accepted for this project you will need to work regularly within the lab to access the equipment and facilities needed to develop the project.

URLs and References
Talking d20 20-Sided Gaming Die, https://learn.adafruit.com/talking-d20-20-sided-gaming-die/overview
Roll-a-ball tutorial, https://unity3d.com/learn/tutorials/projects/roll-ball-tutorial

Pre- and Co-requisite Knowledge
Explicit knowledge of any specific technologies is not required, however the student must be prepared to investigate and use any new technologies that may be suitable for the project. Technologies will most likely include:
* 3D Printing, including basic modelling,
* Low cost computing and components, such as Arduino boards and gyroscope sensors,
* Unity interactive environment development and deployment on a mobile platform.

Interactive and exploratory visualisation of time series data
Supervisors: Zahraa Abdallah, Minyi Li

Background
A time series is a sequence of observations taken sequentially in time. Many sets of data appear as time series in almost every application domain, e.g., daily fluctuations of the stock market, traces of dynamic processes and scientific experiments, medical and biological experimental observations, various readings obtained from sensor networks, position updates of moving objects in location-based services, etc. As a consequence, in the last decade there has been a dramatic increase in interest in techniques for time series mining and forecasting. The very first step in understanding time series data is to visualize it. Exploratory time series visualization enables essential tasks in analysis. With interactive and exploratory visualization, we will be able to answer questions such as:
- How similar are different sets of time series?
- Are there any spikes in the data?
- Is there any pattern that we can extract?
- Can we notice possible shifts between similar sets of time series?

Aim and Outline
Our aim in this project is to build a web-based interactive tool to visualise time series data. The tool will be able to produce basic exploratory data analysis and statistics of time series at different time granularities using various features. We will also investigate methods to find similarities and differences between time series using a set of metrics such as Euclidean distance. Many publicly available time series datasets can be used in this project, such as traffic data, weather data and stock market data.
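
As a small illustration of the Euclidean distance metric mentioned above, the sketch below compares two equal-length series point by point. The z-normalisation helper is an added assumption, not something the project description specifies; it is included because raw Euclidean distance is dominated by differences in scale, and the function names are hypothetical.

```python
import math

def euclidean_distance(a, b):
    """Point-by-point Euclidean distance between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def z_normalise(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    if std == 0:
        return [0.0] * len(series)  # a constant series carries no shape information
    return [(x - mean) / std for x in series]
```

With this helper, two series with the same shape but different magnitudes (say, one store selling ten times as much as another) have a distance close to zero after normalisation, even though their raw distance is large.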

URLs and References
Time series data: https://datamarket.com/data/list/?q=provider:tsdl

Pre- and Co-requisite Knowledge
The student must have experience in programming.

Supervisors: Minyi Li, Zahraa Abdallah

Background
Whether you realize it or not, preferences are everywhere in our daily lives. They occur as soon as we are faced with a choice problem.
* It could be as simple as a pairwise comparison involving only a single decision variable, e.g., “I prefer to have red wine rather than white wine for dinner tonight”;
* or it could involve multiple decision criteria, e.g., “which mobile and internet bundle deal would you prefer?”
* Most often, preferences are conditional, i.e., the attributes for making a decision/choice could depend on each other. Extending the wine example, your preference over wine could depend on your choice of main meal, e.g., you may prefer white wine to red wine if you are going to have fish, or the reverse if you are having beef as the main course.
Understanding and predicting users' preferences plays a key role in various fields of application, e.g., recommender systems, adaptive user interface design, general product design and brand building, etc. However, in real-world decision problems, users' preferences are usually very complex, i.e., they generally have multiple decision criteria and have to deal with an exponential number of choices. This makes direct investigation through preference relations/rankings over the entire choice space ineffective and infeasible. Therefore, more efficient ways of understanding a user's preferences through their structure and the interactions between decision variables are essential.
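
The wine example can be written down as a tiny conditional preference table, which also shows why enumerating the whole choice space does not scale. This is only an illustrative sketch: the table contents, the function names and the representation as a dictionary of ordered lists are assumptions, not a model prescribed by the project.

```python
# Hypothetical conditional preference table: the preferred order of wines
# depends on the value of another decision variable, the main course.
WINE_PREFERENCE = {
    "fish": ["white", "red"],   # with fish, white is preferred to red
    "beef": ["red", "white"],   # with beef, the order is reversed
}

def prefers(main_course, wine_a, wine_b, table=WINE_PREFERENCE):
    """True if wine_a is preferred to wine_b given the chosen main course."""
    order = table[main_course]
    return order.index(wine_a) < order.index(wine_b)

def choice_space_size(domain_sizes):
    """Number of complete outcomes when each decision variable has the given number of values."""
    size = 1
    for n in domain_sizes:
        size *= n
    return size
```

Ten binary decision variables already give `choice_space_size([2] * 10)` = 1024 complete outcomes to rank, whereas a structured representation only needs one small table per variable, conditioned on the variables it depends on.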

Aim and Outline
In this project, we will investigate methods to construct the structure of user preferences and understand the interactions between decision variables from data. This involves learning from observations that reveal information about the preference structure of an individual or a class of individuals, and building models that generalize beyond such training data. Research might involve learning preference structure from real-world data sets including Netflix movie rating data, car preference data, etc.

URLs and References
https://en.wikipedia.org/wiki/Preference_learning

Pre- and Co-requisite Knowledge
The student must have experience in programming.

Sentiment Analysis in Education and e-Government
Supervisors: Dr. Chris Messom, Dr. Yen Cheung

Background
Evidence of the influence of people's opinions on the types of products and services that will be offered is emerging from the fast-growing research in affective computing and sentiment analysis. In particular, mining sentiments over the Web for commercial, higher education and government intelligence applications is gaining research momentum. Current approaches to affective computing and sentiment analysis fall into three main categories: knowledge-based techniques, statistical methods and hybrid methods. Whilst the knowledge-based approach is popular with unambiguous text, it does not handle the semantics of natural language or human behaviour very well. Similarly, statistical methods are also semantically weak and usually require a large text input to effectively classify text. The hybrid approach combines both techniques to infer meaning from text.

Aim and Outline
This project aims to develop a sentiment harvest model/system to evaluate Educational Systems and e-Government Systems using the hybrid approach.

URLs and References
Adinolfi, Paola, Ernesto D'Avanzo, Miltiadis D. Lytras, Isabel Novo-Corti and Jose Picatoste. "Sentiment Analysis to Evaluate Teaching Performance." IJKSR 7.4 (2016): 86-107. Web. 8 Feb. 2017. doi:10.4018/IJKSR.2016100108

E. Cambria, "Affective Computing and Sentiment Analysis," in IEEE Intelligent Systems, vol. 31, no. 2, pp. 102-107, Mar.-Apr. 2016.

Cambria, E., Grassi, M., Hussain, A., & Havasi, C. (2012). Sentic computing for social media marketing. Multimedia Tools and Applications, 59(2), 557-577. doi:10.1007/s11042-011-0815-0

M. Araujo et al., “iFeel: A system that compares and combines sentiment analysis methods”, Proc. 23rd Int'l Conf. World Wide Web, 2014, pp. 75-78.

Pre- and Co-requisite Knowledge
Some basic knowledge of AI is preferred (such as completion of an AI unit at undergraduate level). Otherwise, a highly enthusiastic and keen novice AI researcher may also be suitable for this project.

Cloud ERP: An organisational motivation and learning perspective
Supervisors: Dr Mahbubur Rahim, Dr Sue Foster and Dr Taiwo Oseni (External)

Background
The scepticism and uncertainty that finance executives initially felt about moving their mission-critical enterprise systems to the cloud are gradually fading and are now being replaced by a growing enthusiasm for the financial flexibility and freedom that come from using the cloud's modular, pay-as-you-go approach to accessing the latest technology innovations (Miranda, 2013). In the last 4 to 5 years, several surveys have detailed why CFOs are increasingly open to moving their enterprise applications into the cloud. For example, a 2012 survey by Gartner, a Financial Executives Research Foundation (ERF) and technology advisory firm, found that 53 percent of CFOs expect up to a 12% rise in the number of enterprise transactions delivered through software-as-a-service over the next four years (Gartner, 2012).
Now, 5 years later, it would be worthwhile to investigate the success of the move of enterprise transactions to the cloud. Has this 12% rise, or even more, been achieved as expected? Or has a hybrid solution, where the most critical and resource-demanding modules are kept on premise while less critical ones are deployed on a public cloud, been a more appropriate solution?

Aim and Outline
Recent studies such as Mezghani (2014) now report organisational intentions to switch from on-premise to cloud ERP, identifying the antecedents and determinants of the decision. This study will explore, within a large organisation setting, the organisational motivation and organisational learning associated with moving ERP business processes to the cloud. Organisational motivation and organisational learning attributes can be helpful for understanding why and how organisations migrate towards cloud-based ERP. The study will aim to investigate:
* How does organisational motivation influence how ERP-based business processes are migrated to a cloud ERP model?

URLs and References
Miranda, S. (2013). ERP in the cloud: CFOs see the value of running enterprise applications as a service. Financial Executive, 29(1), 65-67.
Mezghani, K. (2014). Switching toward cloud ERP: A research model to explain intentions. International Journal of Enterprise Information Systems, 10, 46+.
Gartner. (2012). CEO survey 2012: The year of living hesitantly. Retrieved from https://www.gartner.com/doc/1957515/ceo-survey--year-living

Pre-requisite
Either completion or nearing completion of a bachelor degree in IS/IT

Cloud ERP in SMEs: The role of consultants
Supervisors: Dr Mahbubur Rahim, Dr Sue Foster and Dr Taiwo Oseni (External)

Background
While cloud computing is receiving increased research focus in recent times, cloud ERP is arguably one of the most valuable and influential SaaS applications available in the market, and as such should be the next wave of ERP and cloud research (Chao, Peng, & Gala, 2014).
Given the importance of ERP systems, the inherent complexity of the supported business processes, the functionality of the software, the amount of data processed, and the financial flexibility and freedom that come from using the cloud's modular, pay-as-you-go approach to accessing the latest technology innovations, SMEs are well suited to exploit the best practices that cloud ERPs support (Johansson, Alajbegovic, Alexopoulos, & Desalermos, 2014; Miranda, 2013).
Best practices of cloud ERPs and immediate access to infrastructure and software, factors that result in the fast deployment of cloud-based ERP, permit SMEs, which typically have fewer and simpler activities, to deploy and utilise a constantly maintained and updated cloud ERP solution. The vendor, who maintains the system, can also guarantee its optimal use, ensuring the SMEs' business continuity. However, many cloud providers offer such high security levels for their products that SMEs cannot manage the implementation themselves (Johansson et al., 2014). As such, many SMEs require an ERP consultant's help in migrating to cloud ERP.

Aim and Outline
The aim of the study is to document the experience of an SME and capture relevant lessons learned. Such lessons can be useful for enhancing cloud ERP research in addition to guiding other organisations who are considering a similar move. The study will investigate:
* What role do ERP consultants play in assisting SMEs in deploying cloud ERP?

URLs and References

Chao, G., Peng, A., & Gala, C. (2014). Cloud ERP: A new dilemma to modern organisations? Journal of Computer Information Systems, 54(4), 22-30.
Johansson, B., Alajbegovic, A., Alexopoulos, V., & Desalermos, A. (2014). Cloud ERP adoption opportunities and concerns: A comparison between SMEs and large companies. Paper presented at the Pre-ECIS 2014 Workshop "IT Operations Management" (ITOM2014).
Miranda, S. (2013). ERP in the cloud: CFOs see the value of running enterprise applications as a service. Financial Executive, 29(1), 65-67.

Pre-requisite

Either completion or nearing completion of a bachelor degree in IS/IT

Semi-supervised learning for activity recognition from mobile sensors
Supervisors: Zahraa S. Abdallah

Background
The availability of real-time sensory information through mobile sensors has led to the emergence of research into “Activity Recognition” (AR). Activity recognition aims to provide accurate and opportune information based on people's activities and behaviours. Many applications have demonstrated the usefulness of activity recognition. These include health and wellness, activity-based crowdsourcing and surveillance, and targeted advertising. Researchers have addressed many challenges in activity recognition, while many other challenges are yet to be resolved. State-of-the-art activity recognition research has focused on traditional supervised learning techniques. However, there is typically a small set of labelled training data available in addition to a substantial amount of unlabelled training data. The process of annotating such data or finding ground truth is tedious, time-consuming, error-prone, and may even be impossible in some cases. Thus, semi-supervised, active and incremental learning are increasingly being investigated for activity recognition to overcome the scarcity of annotated data.

Aims and Outline
We aim first to survey semi-supervised and unsupervised learning approaches for activity recognition. Then, we will develop a new technique that incorporates active and incremental learning for more accurate activity recognition with limited availability of labelled data.

URLs and References
https://en.wikipedia.org/wiki/Semi-supervised_learning
https://en.wikipedia.org/wiki/Activity_recognition
https://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones

Pre- and Co-requisite Knowledge
The student must have experience in programming.

Detecting insider threats in streaming data
Supervisors: Zahraa Abdallah, Geoff Webb

Background
Insider threat detection is an emerging concern around the globe as a result of the increasing number of insider attacks in recent years. In the Cyber Security Watch Survey, the statistics revealed that 21% of attacks are insider attacks. For instance, the insider attack of Edward Snowden was reported as the biggest intelligence leakage in the US. This attack maps to the IP theft case scenario, where Snowden disclosed 1.7 million classified documents from the National Security Agency to mass media. We address the insider threat problem as a stream mining problem over data streams generated from security logs, network data, and email headers. The challenge here is to distinguish between a normal change in an insider's behaviour and the evolution of a new concept that may be an indication of a malicious insider threat.

Aims and Outline
In this project, we aim to apply different stream mining techniques to detect insider threats using Internet logs. The main challenge is to discover the evolution of new concepts and distinguish between normal behaviour and threats in data streams. Streaming data typically arrive at high speed and require real-time analysis. Thus, the efficiency of the applied techniques is a crucial factor to consider.
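
To give a flavour of constant-time change detection on a stream, the sketch below implements the Page-Hinkley test, a standard detector for an upward shift in the mean. It is chosen here purely as an example and is not stated to be the technique the project will use. The stream values (imagine hourly counts of file accesses by one user), the parameters `delta` and `threshold`, and the helper `first_alarm` are all assumptions.

```python
class PageHinkley:
    """Page-Hinkley test for detecting an increase in the mean of a stream."""

    def __init__(self, delta=0.05, threshold=5.0):
        self.delta = delta            # tolerated drift per observation
        self.threshold = threshold    # alarm level for the cumulative deviation
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0                # cumulative deviation from the running mean
        self.cum_min = 0.0            # smallest cumulative deviation seen so far

    def update(self, x):
        """Consume one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


def first_alarm(stream):
    """Index of the first observation that triggers an alarm, or None."""
    detector = PageHinkley()
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


# Hypothetical stream: stable behaviour, then an abrupt rise in activity.
stream = [1.0] * 50 + [5.0] * 50
print(first_alarm(stream))
```

Each update costs O(1) time and memory, which matters for high-speed streams. On its own, however, such a detector cannot tell a benign change of behaviour from a malicious one, which is the distinction the project targets.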

URLs and References
http://www.cert.org/insider-threat/research/cybersecurity-watch-survey.cfm?
https://en.wikipedia.org/wiki/Data_stream_mining

Pre- and Co-requisite Knowledge
The student must have experience in Java programming.

Inferring Concurrent Specifications for a Sequential Program
Supervisors: Chris Ling, Yuan-Fang Li

Background
It is non-trivial to automatically exploit the potential parallelism present in source code written in mainstream programming languages such as Java and C#. One of the main reasons is the implicit dependencies between the shared mutable states of data. In these languages, compilers follow the (sequential) execution order in which the program is written in order to avoid side effects. Therefore, programmers need to write parallel programs in order to exploit the computing power offered by the now prevalent multi-core architecture.

It is generally acknowledged that writing parallel programs using multithreading is a difficult and time-consuming task due to errors such as race conditions and deadlocks. Therefore, there is a substantial need for methods and tools for the automated exploitation of parallelism.

Aims and Outline
In order to help programmers to reason about concurrency, researchers have developed a number of abstractions called 'Access Permissions'. Access Permissions characterise the way multiple threads can potentially access a shared state. Our goal is to develop techniques that can automatically infer implicit dependencies (read/write behaviours) from a sequential Java program. Such dependency information can eventually be used to automatically parallelise the execution of these programs instead of requiring programmers to write concurrent programs using multi-threading.

We have already developed a high-level algorithm to infer dependencies. In this project, we aim to refine the algorithm and develop an Eclipse plugin that implements our proposed technique.
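
The kind of dependency information involved can be illustrated with a deliberately simplified sketch. It is not the project's inference algorithm and does not model access permissions; it only checks read/write conflicts between statements whose read and write sets are assumed to be known already. The statements, field names and the `dependencies` helper are hypothetical.

```python
from itertools import combinations

# Each statement is summarised by the shared fields it reads and writes.
# These four statements and their field names are illustrative assumptions.
statements = [
    {"name": "s1", "reads": set(),      "writes": {"a"}},
    {"name": "s2", "reads": set(),      "writes": {"b"}},
    {"name": "s3", "reads": {"a", "b"}, "writes": {"c"}},
    {"name": "s4", "reads": {"c"},      "writes": {"a"}},
]

def dependencies(stmts):
    """Return ordered pairs (earlier, later) that must not be reordered."""
    deps = []
    for s, t in combinations(stmts, 2):      # s precedes t in program order
        raw = s["writes"] & t["reads"]       # read-after-write
        war = s["reads"] & t["writes"]       # write-after-read
        waw = s["writes"] & t["writes"]      # write-after-write
        if raw or war or waw:
            deps.append((s["name"], t["name"]))
    return deps

print(dependencies(statements))
```

In this toy program `s1` and `s2` share no conflicting accesses, so they could in principle run in parallel, while `s3` has to wait for both.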

URLs and References

[1] Kevin Bierhoff, Nels E. Beckman, and Jonathan Aldrich. Practical API protocol checking with access permissions. In ECOOP, pages 195-219, 2009.
[2] John Boyland. Checking interference with fractional permissions. In Static Analysis, volume 2694 of Lecture Notes in Computer Science, pages 55-72. Springer Berlin Heidelberg, 2003.
[3] Stork, S., Naden, K., Sunshine, J., Mohr, M., Fonseca, A., Marques, P., & Aldrich, J. (2014). AEminium: A permission-based concurrent-by-default programming language approach. ACM Transactions on Programming Languages and Systems (TOPLAS), 36(1), 2.

Pre- and Co-requisite Knowledge
* Good programming skills in Java or a similar object-oriented language
* Knowledge of basic object oriented constructs
* Knowledge of graph as a data structure

Optimising of multi-telescope observations of gravitational wave events
Supervisors: Assoc. Prof. David Dowe, Dr. Evert Rol, Dr. Duncan Galloway (School of Physics & Astronomy, Faculty of Science)

Background
With the first detections of gravitational waves established, the challenge now for astrophysicists is to find their electromagnetic counterparts. Detection of these elusive counterparts will confirm and constrain the models for the progenitors of gravitational waves. Details such as distance confirmation, the (electromagnetic) energy emitted, and the environment of the gravitational wave (GW) event provide the necessary information to model the scenario that led to such a catastrophic event.

Finding these counterparts is challenging, as the current localisation by GW detectors often yields a search area of hundreds of square degrees, often in disparate areas of the sky. Coordinated follow-up observations are essential, especially for small field-of-view telescopes. In particular at optical wavelengths, where a multitude of small field-of-view telescopes exist, uncoordinated observations may result in many duplicated efforts, while missing out large portions of the larger localisation area.

Aims and Outline
For this project, we have developed an approach involving genetic algorithms to optimise the search for counterparts in such cases. This way, we can easily incorporate constraints such as the area visibility per telescope, differences in field-of-views, or the expected brightness evolution of the counterpart.

A potential disadvantage of using a genetic algorithm is that such an algorithm is generally slower than other optimisation algorithms. Searches for counterparts, however, are generally required to start as soon as possible after the GW event, and even a 15 minute delay may lose valuable information.

The goal of the project is then to find the best set of algorithm parameters for a wide set of scenarios, so that we can create a fast and flexible scheduling tool.

In short:
- improve the algorithm, in particular its speed. This can be done both by improving the actual code, and tuning the algorithm parameters
- make the algorithm (fitness function) more flexible, to easily incorporate a wide variety of (observing) constraints
- compare a variety of observing scenarios, to determine where the largest improvements can be made in scheduling
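
To make the idea concrete, here is a toy genetic algorithm for assigning telescope pointings to sky tiles. It is a minimal sketch under stated assumptions and not the supervisors' existing code: the tile probabilities `TILE_PROB`, the per-telescope visibility lists `VISIBLE`, the number of `SLOTS`, and all GA parameters are invented for illustration.

```python
import random

# Hypothetical probability that the counterpart lies in each sky tile.
TILE_PROB = [0.30, 0.25, 0.15, 0.10, 0.08, 0.05, 0.04, 0.03]
# Hypothetical tiles visible to each of three telescopes.
VISIBLE = [[0, 1, 2, 3], [0, 1, 4, 5], [2, 3, 6, 7]]
SLOTS = 2  # observations per telescope; gene g belongs to telescope g // SLOTS

def random_schedule(rng):
    return [rng.choice(VISIBLE[t]) for t in range(len(VISIBLE)) for _ in range(SLOTS)]

def fitness(schedule):
    # Each tile counts once, so duplicated pointings earn nothing extra.
    return sum(TILE_PROB[tile] for tile in set(schedule))

def mutate(schedule, rng, rate=0.2):
    child = list(schedule)
    for g in range(len(child)):
        if rng.random() < rate:
            child[g] = rng.choice(VISIBLE[g // SLOTS])  # stay within visibility
    return child

def crossover(a, b, rng):
    cut = rng.randrange(1, len(a))   # one-point crossover keeps gene positions
    return a[:cut] + b[cut:]

def evolve(generations=50, pop_size=30, seed=0):
    rng = random.Random(seed)
    pop = [random_schedule(rng) for _ in range(pop_size)]
    history = []                     # best fitness per generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        elite = pop[: pop_size // 5]  # elitism: the best schedules survive
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=fitness), history

best, history = evolve()
print(best, fitness(best))
```

Observing constraints enter in two places: hard constraints such as visibility are enforced by construction in `random_schedule` and `mutate`, while soft preferences would be added as terms in `fitness`. Population size, mutation rate and elite fraction are the kind of algorithm parameters the tuning goal above refers to.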

URLs and References
- Example (simulated) GW location maps: http://www.ligo.org/scientists/first2years/#2016
- Current follow-up observations of the first GW event: https://dcc.ligo.org/public/0122/P1500227/012/GW150914_localization_and_followup.pdf (in particular Figure 3)
- Earlier work done on telescope scheduling with genetic algorithms (but only for point sources): http://rts2.org/scheduling.pdf

Pre- and Co-requisite Knowledge
- (Heuristic) optimisation
- programming languages: Python, C
- Affinity with astronomy (there are no particular astrophysical requirements)

Immersive geovisualisation
Supervisor: Bernie Jenny

Background
Little is known about how quantitative geodata are most effectively displayed in a 3D visualisation. Better visualisation methods are required for diverse data, such as air pollution, noise levels, bushfire propagation, or toxic gas emissions at waste disposal sites.

Aims and Outline
The goal of this project is to develop new immersive 3D visualisation methods that are accurate, effective, and unambiguous to read. The visualisations should be applicable to AR (Augmented Reality), VR (Virtual Reality) and interactive 3D maps. You will draw inspiration from light painting art projects for placing bars and profiles in a 3D scene to visualise quantitative geodata. User acceptance, effectiveness, and efficiency can be evaluated through expert feedback and/or a user study.

Pre- and Co-requisite Knowledge
Computer graphics/OpenGL

Three-dimensional geographic flow maps
Supervisor: Bernie Jenny

Background
Flow maps show the direction and volume of moving goods, ideas, money, etc. between places [1, 2]. Flow maps are rarely rendered as three-dimensional objects, or used in immersive 3D visualisation.

Aims and Outline
The goal of this project is to develop methods for the visualisation of geographic flows in three dimensions. Various options can be explored: (1) develop an algorithm for 2D maps that arranges flows on the z-axis and renders them with a ray-tracer; (2) conduct a user study to compare 2D and 3D flow maps; (3) develop a program with Unity to visualise 3D flows with immersive visualisation (head-mounted displays or the Monash CAVE2).

URLs and References
[1] Jenny, B. et al. (2017). Design principles for origin-destination flow maps. Cartography and Geographic Information Science.
[2] Jenny, B. et al. (2017). Force-directed layout of origin-destination flow maps. International Journal of Geographical Information Science.
http://monash.edu.au/cave2

Pre- and Co-requisite Knowledge
C#, C++, or Java for option (3).

Georectification of old maps
Supervisor: Bernie Jenny

Background
Many map libraries are currently scanning their old maps to better protect these often fragile and precious items. Historians use the scanned maps in geographic information systems and combine them with other geospatial data. To make this possible, old maps are georectified, that is, the old map is deformed to align with the coordinate system of a modern map.

Aims and Outline
Various methods exist for georectifying raster images, but they all deform text labels and map symbols. This project aims at developing a georectification method that preserves text labels, point symbols and lines. You will adapt the moving least-squares method, which is used in computer graphics for reconstructing a surface from a set of points. MapAnalyst, a tool for the analysis of old maps, can be extended with this new method.
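
For orientation, the sketch below shows the affine moving least-squares variant that is commonly used for deforming 2D images from pairs of control points. Whether this variant is the right starting point for the project is an assumption, and the sketch does nothing to preserve labels or symbols. The function name `mls_affine` and the control points are hypothetical.

```python
def mls_affine(v, src, dst, alpha=1.0):
    """Map point v with affine moving least squares from control points src to dst.

    Requires at least three non-collinear control points.
    """
    vx, vy = v
    weights = []
    for (px, py), q in zip(src, dst):
        d2 = (px - vx) ** 2 + (py - vy) ** 2
        if d2 == 0.0:
            return q                      # v is a control point: interpolate exactly
        weights.append(1.0 / d2 ** alpha)  # weight 1 / distance^(2 * alpha)
    wsum = sum(weights)
    # Weighted centroids of the source and target control points.
    pcx = sum(w * p[0] for w, p in zip(weights, src)) / wsum
    pcy = sum(w * p[1] for w, p in zip(weights, src)) / wsum
    qcx = sum(w * q[0] for w, q in zip(weights, dst)) / wsum
    qcy = sum(w * q[1] for w, q in zip(weights, dst)) / wsum
    # A = sum w p^T p and B = sum w p^T q, with centred row vectors p and q.
    a11 = a12 = a22 = 0.0
    b11 = b12 = b21 = b22 = 0.0
    for w, p, q in zip(weights, src, dst):
        px, py = p[0] - pcx, p[1] - pcy
        qx, qy = q[0] - qcx, q[1] - qcy
        a11 += w * px * px
        a12 += w * px * py
        a22 += w * py * py
        b11 += w * px * qx
        b12 += w * px * qy
        b21 += w * py * qx
        b22 += w * py * qy
    det = a11 * a22 - a12 * a12
    # M = A^{-1} B is the locally best affine matrix for this v.
    m11 = (a22 * b11 - a12 * b21) / det
    m12 = (a22 * b12 - a12 * b22) / det
    m21 = (-a12 * b11 + a11 * b21) / det
    m22 = (-a12 * b12 + a11 * b22) / det
    dx, dy = vx - pcx, vy - pcy
    return (dx * m11 + dy * m21 + qcx, dx * m12 + dy * m22 + qcy)


# Hypothetical control points: old-map coordinates and their modern positions.
old_pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_pts = [(1.0, -2.0), (3.0, -2.0), (1.0, 1.0), (3.0, 1.0)]
print(mls_affine((0.3, 0.7), old_pts, new_pts))
```

Because the weights depend on `v`, every pixel gets its own affine transform. That is what bends straight lines and text in a plain raster warp, and hence what a label-preserving method would have to treat differently.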

URLs and References
http://mapanalyst.org
Jenny, B. and Hurni, L. (2011). Studying cartographic heritage: analysis and visualization of geometric distortions. Computers & Graphics, 35-2, p. 402–411.

Pre- and Co-requisite Knowledge
Java programming

Supervisor: Bernie Jenny

Background
Dot maps show quantities on maps, for example, each dot can represent 1000 people. Extracting quantities from dot maps is difficult, because dots often overlap in dense areas. Graduated dot maps are a recent improvement that use dots of variable size, for example, a small dot for 200 people, a medium dot for 1000 people, and a large dot for 10,000 people. It has been shown that with graduated dot maps quantities are easier to estimate. A recently suggested method (in review) uses (a) the capacity-constrained Voronoi tessellation (CCVT) algorithm to place dots in a “nice” blue-noise pattern without overlaps, and (b) the DBSCAN clustering algorithm to identify dense clusters of dots. The dots identified by the DBSCAN algorithm are replaced with larger dots.
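
Step (b) can be pictured with a compact, unoptimised DBSCAN written from the published algorithm. The sketch is illustrative only: the dot coordinates, `eps` and `min_pts` are assumptions, and the in-review method may configure or modify the clustering differently.

```python
def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    def neighbours(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)            # includes point i itself
        if len(seeds) < min_pts:
            labels[i] = -1               # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # former noise becomes a border point
            if labels[j] is not None:
                continue                 # already assigned to a cluster
            labels[j] = cluster
            nj = neighbours(j)
            if len(nj) >= min_pts:       # j is a core point: keep expanding
                queue.extend(nj)
    return labels


# Hypothetical dot positions: two dense groups and one isolated dot.
dots = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(dots, eps=1.5, min_pts=3)
print(labels)
```

Dots that share a non-negative label form a dense cluster and are the candidates for replacement by a larger dot; dots labelled -1 stay as they are.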

Aims and Outline
This project has four components: (1) The method outlined above can result in dots being placed in areas that make no sense (e.g. dots for people placed on oceans). The CCVT algorithm needs to be modified to prevent this. (2) The DBSCAN clustering algorithm can be simplified. (3) A plug-in for a geographic information system, such as ArcGIS or QGIS, can be created to make this method available to map makers. (4) Sample maps should be created to evaluate the method and the plug-in.

URLs and References
Balzer, M., Schlömer, T. and Deussen, O., 2009. Capacity-constrained point distributions: A variant of Lloyd’s method. ACM Transactions on Graphics (Proceedings of SIGGRAPH 2009), ACM, 28 (3), article 86, 8 pages.
Ester, M., Kriegel, H., Sander, J. and Xu, X., 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96). AAAI Press. 226–231.

Pre- and Co-requisite Knowledge
Preferably Python

Implementing Trust Decisions in Distributed Computing
Supervisor: Carsten Rudolph

Background
Distributed computing models are omnipresent in modern service architectures, with an abundance of protocols that endeavour to attribute non-functional properties such as reliability, trust, or security to them. The concept of trust is fundamental to our usage of and reliance on computers in our daily life. However, trust is often a matter of fact one has to accept rather than an informed decision of an individual to trust their computer to correctly calculate a desired result. In a distributed system, especially when the computers involved do not belong to a single individual or an individual can choose which computers should do their calculations, trust adds meaning to a result and becomes mission critical. The current body of research on secure distributed systems is focused on designing verifiably secure protocols to guarantee the aforementioned properties for communication. Properties like trust and reliability cannot be based on any particular protocol; a protocol can merely strive to render any meddling with communication between cooperating platforms ineffective. This project will work with definitions of trust in distributed systems and apply a novel formalism for reasoning about trust based on platforms and computations, with an application to popular scenarios for distributed systems.

Aims and Outline
The objective is building a tool demonstrating the feasibility of automating trust decision processes in future distributed computing scenarios.
You will be able to dive into the latest concepts of trusted computing, understand them, and learn their applications in current and future systems, extend and apply your knowledge in distributed systems, and work on novel formalisms and techniques for reasoning with them.

You will be using programming tools like Scala (Java's younger, smarter sibling) in combination with Akka (a recent framework for powerful reactive, concurrent, and distributed applications) for implementing distributed algorithms, together with languages like Python with interfaces to logic programming for static reasoning. This project will allow you to broaden your knowledge in systems security and use your expertise as a software developer and computer scientist by bringing scientific reasoning and efficient use of security mechanisms to real-world scenarios.

URLs and References
Position paper on Trust: https://pdfs.semanticscholar.org/e9e5/42fe723f74f8b8db4f8d9a400ee178dcdc9b.pdf
Formalisms for Trust Management: http://homes.cs.washington.edu/~pedrod/papers/iswc03.pdf

Pre- and Co-requisite Knowledge
This project requires basic knowledge of distributed systems/computer networks and IT security, and good programming skills.

A Cyber Security Requirements Model for the Monash Micro-Grid
Supervisors: Carsten Rudolph and Ariel Liebman

Background
A microgrid, in the formal definition of the U.S. Department of Energy, is a group of interconnected loads and distributed energy resources (DERs) within defined electrical boundaries that acts as a single controllable entity with respect to the grid. A microgrid can connect to and disconnect from other, bigger grids, enabling it to operate in both stand-alone and grid-connected modes. During disturbances, the generation and corresponding loads can separate from a distribution system to isolate a microgrid's load from any disturbances without harming the grid's integrity. The ability to operate stand-alone, or in island mode, has the potential to provide higher local reliability than that provided by a power system as a whole.

These microgrids need extensive attention from the computer security community so as to make sure that, not only during their design but also during their operation, cyber threats do not jeopardize requirements such as safety and reliability. In a broader context, bigger power networks in which microgrids are embedded need the same attention to make sure that the decoupling and integration of individual microgrids does not harm other connected grids. Monash is part of the research initiative towards smart microgrids and new energy technologies in collaboration with the Clean Energy Finance Corporation (CEFC).

"Monash University is intent on developing innovative solutions to the challenges in energy and climate change facing our world," stated Monash University President and Vice-Chancellor Professor Margaret Gardner.

Aim and Outline
The goal of this project is to improve the understanding of the security requirements of the future Monash electricity network. In order to develop this understanding, you will create a model of the network showing the main components and the processes within the network. Then, you will work with micro grid specialists to identify security requirements in terms of processes and data. The first result will be a formal or semi-formal model that provides a precise expression of security requirements on different levels. You will also be able to explore the suitability of approaches like business process modelling, formal modelling frameworks or more technical trust relation models for expressing the security requirements of such an infrastructure. As part of the ongoing research initiative towards smart micro grids, you will be providing a unique insight into cyber security related challenges with respect to trust and security in smart grids. Finally, the model will be used to evaluate the impact of possible security solutions.

URLs and References
Community Energy Networks With Storage

S. Gürgens, P. Ochsenschläger, and C. Rudolph.
On a formal framework for security properties
International Computer Standards & Interface Journal (CSI), Special issue on formal methods, techniques and tools for secure and reliable applications, 2004.

N. Kuntze, C. Rudolph, M. Cupelli, J. Liu, and A. Monti.
Trust infrastructures for future energy networks.
In Power and Energy Society General Meeting - Power Systems Engineering in Challenging Times, 2010.

Pre- and Co-requisite Knowledge
The project is suitable for a student with cyber security knowledge and a sound knowledge of computer networks.

Independent component analysis for identifying patterns of household energy consumption
Supervisors: Lachlan Andrew and Asef Nazari

Background
As we move towards greater reliance on renewable energy, there is a greater need to shift electricity load to times when the sun is shining and the wind is blowing. A step towards achieving this is to understand current energy usage patterns. One way to do this, without invasively monitoring hundreds of houses, is to apply statistical pattern recognition techniques to large collections of houses.

Aim and Outline
This project will apply Independent Component Analysis (ICA), or alternatively non-negative matrix factorisation, to half-hourly electricity consumption data of thousands of houses to seek to identify components corresponding to tasks such as heating, cooling, and getting ready for work. The first steps will be to preprocess the data to reduce the computational burden, and to perform ICA. The next step will be to try to interpret the resulting components to infer the underlying causes of the energy consumption.
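
As a rough picture of the non-negative matrix factorisation alternative, the sketch below factorises a tiny houses-by-time-slots matrix with the standard multiplicative update rules. It is a dependency-free illustration, not the project's pipeline: the data are synthetic, the two "patterns" are invented, and in practice a numerical package such as those listed below would be used on the real half-hourly data.

```python
import random

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frob_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factorise non-negative V (n x m) as W (n x k) times H (k x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    errors = [frob_error(v, w, h)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
        errors.append(frob_error(v, w, h))
    return w, h, errors


# Synthetic example: two invented daily patterns over six time slots,
# mixed in different amounts by four hypothetical houses.
patterns = [[0, 3, 1, 0, 0, 0], [0, 0, 0, 1, 3, 2]]
mixing = [[1, 0], [0, 1], [1, 1], [2, 1]]
V = matmul(mixing, patterns)
W, H, errors = nmf(V, k=2)
print(errors[0], errors[-1])
```

The rows of `H` play the role of usage components and the rows of `W` say how strongly each house expresses them. Interpreting such components as heating, cooling and so on is the harder step described above.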

Pre- and Co-requisite Knowledge
The student should be comfortable with basic mathematics, including probability and linear algebra. The student will need to learn a language such as Matlab or R, or a package such as NumPy.

Deep Learning Methods in Wireless Communications

Background
The problem of channel decoding of linear codes over Additive White Gaussian Noise (AWGN) channels using deep learning methods/techniques has been studied recently. This problem has also been considered for low density parity check codes (LDPC codes) and medium to high density parity check codes (HDPC codes) [1,2,3,4]. Deep learning approaches have proved useful in multiple-input multiple-output (MIMO) channel decoding too [5].

Aim and Outline
This project aims at finding a low-complexity, close to optimal channel decoding of lattices and lattice/codes in wireless communications. We consider short length well-known lattices such as Barnes-Wall lattices and higher dimension ones including LDPC lattices, LDLC, LDA, and Turbo lattices (see [6] and references therein). We may try different approaches to tackle these problems. This includes applying the above mentioned techniques to the underlying label code of a lattice or applying the deep learning methods to the corresponding trellis representation of the lattice (see [7] and references therein).

URLs and References
[1] E. Nachmani, Y. Be’ery, and D. Burshtein, "Learning to decode linear codes using deep learning," in 54’th Annual Allerton Conf. On Communication, Control and Computing, September 2016, arXiv preprint arXiv:1607.04793.
[2] L. Lugosch and W. J. Gross, "Neural offset min-sum decoding," in 2017 IEEE International Symposium on Information Theory, June 2017, arXiv preprint arXiv:1701.05931.
[3] N. Farsad and A. Goldsmith, "Detection algorithms for communication systems using deep learning," arXiv preprint arXiv:1705.08044, 2017.
[4] L. Lugosch and W. J. Gross, "Neural offset min-sum decoding," in 2017 IEEE International Symposium on Information Theory, June 2017, arXiv preprint arXiv:1701.05931.
[5] N. Samuel, T. Diskin, and A. Wiesel, "Deep mimo detection," arXiv preprint arXiv:1706.01151, 2017.
[6] H. Khodaiemehr, M.-R. Sadeghi, and A. Sakzad, "Practical Encoder and Decoder for Power Constraint 1-level QC-LDPC Lattices," To appear in IEEE Trans. on Communications, DOI: 10.1109/TCOMM.2016.2633343.
[7] A.H. Banihashemi and F.R. Kschischang, "Tanner graphs for block codes and lattices: construction and complexity," IEEE Trans. Inform. Theory, vol. 47, pp. 822–834, 2001.

Pre- and Co-requisite Knowledge
Ability to write computer programs in Matlab, basic knowledge of deep learning and/or coding theory and/or lattices.

Integer-Forcing Linear Receivers for Multiple-Input Multiple-Output (MIMO) Channels
Supervisors: Amin Sakzad and Pierre Le Bodic

Background
A new architecture called the integer-forcing (IF) linear receiver has recently been proposed for multiple-input multiple-output (MIMO) fading channels, wherein an appropriate integer linear combination of the received symbols has to be computed as part of the decoding process [1]. Methods based on lattice basis reduction algorithms have been proposed to obtain the integer coefficients for the IF receiver [2]. Connections between the proposed IF linear receivers and lattice reduction-aided MIMO detectors (with equivalent complexity) have also been studied [2]. The concept of unitary precoded integer-forcing (UPIF) is introduced and investigated in [3].

Aim and Outline
This project is twofold: (1) The problem of finding a suitable integer linear combination of the received symbols has only been addressed with respect to the \ell_2 norm. This project aims at solving this problem with respect to the \ell_1 norm. (2) The problem of finding the best unitary precoder for integer-forcing is a min-max optimization problem that needs to be addressed too. Both problems should be studied analytically and numerically using computer simulations.
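
A toy computation shows why the choice of norm matters. The sketch below brute-forces the nonzero integer vector a that minimises ||G a||_p for a small matrix G. Both the matrix and the search box are assumptions: in the IF receiver the relevant matrix is derived from the channel, and the cited work uses lattice reduction rather than exhaustive search, which is only feasible in tiny dimensions.

```python
from itertools import product

def shortest_integer_combination(g, p, bound=3):
    """Brute-force the nonzero integer a in [-bound, bound]^n minimising ||G a||_p."""
    n = len(g[0])
    best_a, best_norm = None, float("inf")
    for a in product(range(-bound, bound + 1), repeat=n):
        if not any(a):
            continue                       # skip the all-zero vector
        ga = [sum(row[j] * a[j] for j in range(n)) for row in g]
        norm = sum(abs(x) ** p for x in ga) ** (1.0 / p)
        if norm < best_norm:               # ties keep the first vector found
            best_a, best_norm = a, norm
    return best_a, best_norm


# Hypothetical 2 x 2 matrix, chosen so that the two norms disagree.
G = [[1.0, 0.5],
     [0.0, 0.8]]
a2, n2 = shortest_integer_combination(G, p=2)
a1, n1 = shortest_integer_combination(G, p=1)
print(a2, n2)
print(a1, n1)
```

For this G the two norms select different coefficient vectors, so results obtained for the \ell_2 norm cannot simply be reused for the \ell_1 norm.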

URLs and References
[1] J. Zhan, B. Nazer, U. Erez, and M. Gastpar, "Integer-forcing linear receivers," IEEE Trans. Inf. Theory, vol. 60, no. 12, pp. 7661–7685, Dec. 2014.
[2] A. Sakzad, J. Harshan, and E. Viterbo, "Integer-forcing MIMO linear receivers based on lattice reduction," IEEE Trans. Wireless Commun., vol. 12, no. 10, pp. 4905–4915, Nov. 2013.
[3] A. Sakzad and E. Viterbo, "Full Diversity Unitary Precoded Integer-Forcing," IEEE Trans. Wireless Commun., vol. 14, no. 8, pp. 4316–4327, Aug. 2015.

Pre- and Co-requisite Knowledge
Digital Communication, Integer Programming, Matlab

Learning when to "sacrifice" in ultimate tic-tac-toe
Supervisors: Aldeida Aleti and Pierre Le Bodic

Background
Ultimate tic-tac-toe is a more sophisticated version of the well-known and slightly boring tic-tac-toe. Each square of the ultimate tic-tac-toe board contains a similar but smaller board. In order to win a square in the main board, you have to win the small board inside it. But the most important rule is that you don't pick which of the nine boards to play on; it is determined by your opponent's previous move. The square she picks determines the board you have to play in next.

This makes the game harder, but more exciting. You cannot just focus on the immediate reward; you must also think ahead and consider future moves. It requires deductive reasoning, conditional thinking, and an understanding of the geometric concept of similarity.
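
The board-forcing rule described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name, the 0-8 indexing of boards and cells, and the "free choice when the target board is finished" convention are assumptions made for the example, not part of the project description.

```python
from typing import List, Optional


def next_board(last_cell: int, finished: List[bool]) -> Optional[int]:
    """Return the index (0-8) of the small board the next player must play in.

    last_cell is the cell (0-8) the opponent just played inside a small board.
    finished[i] is True when small board i is already won or full.
    Returns None when the player may choose any unfinished board
    (an assumed convention; the description above does not cover this case).
    """
    if not 0 <= last_cell <= 8:
        raise ValueError("cell index must be in 0..8")
    return None if finished[last_cell] else last_cell


finished = [False] * 9
print(next_board(4, finished))  # opponent played a centre cell, so board 4 is forced
finished[4] = True
print(next_board(4, finished))  # board 4 is finished, so the choice is free (None)
```

A "sacrifice" in this setting is a move that gives up a locally good cell because of the board it would send the opponent to, which is why any search or learning method has to evaluate the pair (cell played, board forced) rather than the cell alone.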

Aim and Outline
In this project, we will investigate efficient algorithms that solve the ultimate tic-tac-toe, with the main focus on learning moves that require "sacrificing" immediate reward in order to win the game.

URLs and References
http://ultimatetictactoe.creativitygames.net/

Pre- and Co-requisite Knowledge
Knowledge of algorithms and problem solving.

Web traffic analysis for understanding patient perceptions of pharmacotherapy for Rheumatoid Arthritis
Supervisors: Pari Delir Haghighi, Frada Burstein and Helen Keen (University of Western Australia)

Background
Rheumatoid Arthritis (RA) is a common, incurable, disabling disease. Conventional synthetic disease modifying therapies (csDMARDs) have been the standard of care, but in recent years biologic therapies have been used increasingly. Costs are rapidly escalating due to reductions in csDMARD use: reasons for this are unclear, and may be patient driven. Understanding patient perceptions of pharmacotherapy may aid optimisation of csDMARD use.

Aim and Outline
The project will focus on web traffic analytics by exploring, examining and reviewing Google search results for a set of keywords related to Rheumatoid Arthritis and its treatment (i.e. conventional DMARDs (cDMARDs), biologic DMARDs (bDMARDs) and biosimilars). It will assess the content available on these websites for accuracy, credibility and suitability.

Web traffic analytics will be used to generate a list of the most visited websites with regard to the above keywords. These will be stratified into two broad categories: government/organisation-affiliated sites and user-generated sites. The former will be subject to sentiment analysis to review the tone of the information provided to the patient; the latter will be reviewed by physicians with experience in the relevant fields for accuracy of the information provided.

URLs and References
Marinez et al (2016) Patient Understanding of the Risks and Benefits of Biologic Therapies in Inflammatory Bowel Disease: Insights from a Large-scale Analysis of Social Media Platforms, Inflamm Bowel Dis. 2017 Jul;23(7):1057-1064

Pre- and Co-requisite Knowledge
Information management and data analysis skills are preferred.

Modelling the effect of autonomous vehicles on other road users
Supervisors: Dr John Betts (FIT) and Prof. Hai L. Vu (Faculty of Engineering)

Background
Underpinned by emerging technologies, connected and autonomous vehicles (CAVs) are expected to introduce significant changes to driver behaviour as well as traffic flow dynamics, and traffic management systems.

Aim and Outline
This project aims to investigate the impact of connected and autonomous vehicles on traffic flows and evaluate new possibilities for efficiently managing traffic on future urban road networks.

In this project, students will explore and evaluate the impact of this disruptive technology by implementing and integrating new car-following models into an existing traffic simulation to study the behaviour of CAVs and their interaction with other vehicles and their drivers.

Prerequisite Knowledge
Good programming skills in any modern programming language. Some modelling and simulation experience would be advantageous.

Decision models for managing large crowds
Supervisors: Dr John Betts (FIT) and Prof. Hai L. Vu (Faculty of Engineering)

Background
As populations grow and urbanization increases, large crowds of people or pedestrians are becoming the norm in major cities. Large crowds also often form at sporting, entertainment, cultural or religious events. It is important to plan and develop strategies for such large crowds in order to manage people efficiently and maintain a safe situation.

Aim and Outline
This project aims to develop a simulation tool that can assist with timely decisions and resource allocation in the emergency management of large crowds within an urban setting.

In this project, students will explore models and their implementation using agent-based simulation to simulate pedestrian behaviour and to develop crowd management strategies for large crowds.

Prerequisite Knowledge
Good programming skills in any modern programming language. Some modelling and simulation experience would be advantageous.

The Future City: modelling the impact of disruptive technologies
Supervisors: Dr John Betts (FIT) and Prof. Hai L. Vu (Faculty of Engineering)

Background, Aim and Outline
Future cities are smart but full of surprises. In this project, students will explore an open source game engine (simcity.com) to build and model a future city. The focus will be on linking this open source software with another open source agent-based software (matsim.org) to evaluate the changes in society due to the emergence of disruptive technologies. For example, how are people’s transportation habits affected by the emergence of driverless cars or shared mobility?

Prerequisite Knowledge
Good programming skills in any modern programming language. Some modelling and simulation experience would be advantageous.

Real-time multisensory feedback in an augmented reality environment based on IoT data for mission critical scenarios
Supervisors: Dr Tim Dwyer

Background:
This project is sponsored by SAP (in the form of a scholarship stipend for the student).  The particular dataset of interest to SAP will be related to the energy distribution industry.  In SAP's words "In the world of enterprise systems, seconds are critical in providing power to everyday consumers. In large geographical countries such as Australia where assets are remotely distributed, being able to monitor assets is challenging.  Our approach will allow technicians to be notified using wearable haptic feedback including audio for notifications and an AR visual model to visualise an asset as a digital twin and make the right decision and use the right resources to assist with an issue".

Aims and outline
The project involves the creation of an augmented reality data visualisation environment for an IoT data set that will be provided by SAP. The environment will be constructed using either the Unity game engine or the Apple ARKit API and should support collaborative (multiuser) exploration of the data set in the space around the users.

Pre- and co-requisite knowledge
This project would suit a student with a strong programming background, ideally using the C# or Swift languages, and with 3D graphics programming experience, for example using the Unity game engine or ARKit.

Surface-Integrated Layouts in Augmented Reality

Supervisors: Barrett Ens and Kim Marriott

Background

Augmented reality (AR) allows virtual constructs to be overlaid onto the real-world environment. One potential application for this technology is to allow multiple application windows to be distributed around a user’s local environment. Such ‘surface-integrated’ window layouts may be useful for many applications, including mobile industrial work and data visualisation.

Aim and outline

The goal of this project is to develop new algorithms to manage surface-integrated layouts for AR content. The layouts will balance multiple factors such as spatial configuration of available surfaces, background surface colour and texture, and the context of users and other occupants in the environment. The developed algorithms will be implemented in prototype AR systems and evaluated in user studies.

Pre- and Co-requisites

Programming knowledge of Java or C#; knowledge of the Unity programming environment is an asset but not required.

Natural Interaction Methods for Augmented Reality

Supervisors: Barrett Ens, Maxime Cordeil and Tim Dwyer

Background
Recently available Augmented Reality (AR) headsets allow users to see virtual information overlaid on the real world. Making AR applications usable requires moving beyond traditional input devices such as mice, keyboards and trackpads. An appealing direction for interaction with these new devices is to use ‘natural’ methods such as voice, eye movements and hand gestures. Miniature sensors allow such modalities to be explored.

Aim and outline
This project aims to develop new input techniques for wearable AR. New techniques will use natural input methods, using data from wearable technology such as PupilLabs eye-tracking hardware and Leap motion sensors. The developed techniques will be evaluated in user studies.

URLs and References
https://pupil-labs.com/ https://www.leapmotion.com/en/

Pre- and Co-requisites
Programming knowledge of Java or C#; knowledge of the Unity programming environment is an asset but not required. Knowledge of Human-Computer Interaction is desired but not essential.

Immersive Visualisation for In-Situ Maintenance
Supervisors: Barrett Ens, Arnaud Prouzeau and Tim Dwyer

Background
Mobile workers conducting equipment maintenance often require maintenance records or other information related to the job site, which are typically accessed by laptop computers. Wearable Augmented Reality (AR) displays now allow visualization of such information directly in the user’s environment when and where it’s needed. However, the design and use of such immersive visualisations is not well investigated.

Aim and outline
This project will investigate the development of immersive visualisations using AR (and/or simulated using Virtual Reality). Visualisations will be created in mock-up environments or using 3D building models, and implemented in prototype systems. The developed visualisations will be evaluated in user studies.

Pre- and Co-requisites
Programming knowledge of Java or C#; knowledge of the Unity programming environment is an asset but not required. Knowledge of Human-Computer Interaction is desired but not essential.

Prediction and simulation for building management using Machine Learning
Supervisors: Arnaud Prouzeau, Christoph Bergmeir and Tim Dwyer

Background
In large facilities, like university campuses, building managers are in charge of monitoring and maintaining the different pieces of mechanical equipment. This includes the Heating, Ventilation and Air Conditioning (HVAC) system, which is in charge of conditioning rooms. This system is composed of different pieces of equipment, ranging from small AC units to large boiler plants, that continuously produce a significant amount of data. However, such data is currently barely used.

Aim and outline
This project will investigate the use of Machine Learning algorithms to, first, understand the real behaviour of the HVAC system and, second, simulate its future performance. Such simulations could then be used to optimise its operation and predict failures. Data from current Monash buildings will be used in this project.

Pre- and Co-requisites
R and Python. Knowledge of Machine learning is desired but not essential.

Supervisors: Patrick Hutchings and Jon McCormack

Background
SensiLab has a small, intelligent robot on wheels that utilises a camera, multiple microphones and location and movement sensors to interact with people and its local environment.  The robot can conduct simple conversations, detect and locate different types of objects and move around the lab space.  It is powered by a Jetson TX1 computer, which allows for sophisticated, but tightly compacted artificial intelligence models to drive various aspects of its behaviour.  The robot can navigate the lab space, but currently lacks accuracy and consistency to operate among groups of people and other moving objects.

Aim and Outline
The aim of this project is to research viable means of navigating a wheeled robot around an indoor space using monocular computer vision, IMU sensors and location tracking beacons. While there are a number of simultaneous localisation and mapping systems for indoor navigation, such as Google Cartographer and CNN-SLAM, specific requirements of this project have produced interesting challenges in this area. The indoor space is known and modelled in high detail, and location beacons help localise the robot to 20cm accuracy. However, the robot needs to operate around groups of moving people, and the computational power of the robot is less than that typically used for machine-vision-only approaches. The project will involve finding suitable heuristics for integrating these different sensors and data streams to reliably establish the location and orientation of the robot in real time.

URLs and References
CNN SLAM: http://campar.in.tum.de/pub/tateno2017cvpr/tateno2017cvpr.pdf

Pre- and Co-requisite Knowledge
Extensive programming experience with Python and some familiarity with OpenCV and machine vision are required for this project.  It is expected that machine learning, especially deep learning, will form a significant part of this project and the student will be expected to engage with this topic, with guidance from supervisors.

Teaching Machines to Draw
Supervisors: Jon McCormack

Background
Drawing is fundamental to modern human creative development. It encompasses a diverse range of visual art: from simple mark-making to complex and beautiful drawings created by leading visual artists. Drawing also has applications beyond visual art: it can be a source of record keeping, documentation, recording ideas, or even distraction. In humans, the development of drawing skill typically takes place between 2 and 16 years of age. Over time children go from basic scribbling and mark-making, to the pre-schematic and then the schematic stage (understanding ways of portraying an object based on active knowledge). Around 8-10 years of age, people are able to draw with a relative degree of realism, including the use of depth cues, shading and texturing. Natural development tends to reach a peak around 14-16 years, and drawing skill will not develop further without active practice and reflection.

Aim and Outline
The aim of this project is to get a computer or robot to learn how to draw. Using neural networks trained on human drawing, the system should be able to generate its own sketches following basic sketching gestures from the human user. Networks trained on different drawing styles should produce drawings that further that style. One important research aspect of the project will be incorporating evaluation into the drawing system, so it can evaluate and improve on the quality of its drawing as it learns. This component will involve feature detection and clustering to see where each drawing falls in relation to the training set and human drawing more generally.

URLs and References
Human drawing development: http://www.learningdesign.com/Portfolio/DrawDev/kiddrawing.html
Harold Cohen's Aaron, one of the first AI visual art systems: http://www.aaronshome.com/aaron/index.html

Pre- and Co-requisite Knowledge
Some experience with neural networks and programming skills in Python. The ability to sketch or draw competently would be an asset, but is not essential.

Evolutionary Dynamics on Graphs
Supervisor: Jon McCormack

Background
Recent research in evolutionary dynamics has shown that certain population structures improve the flow of beneficial mutations through the population. These structures can amplify natural selection over fully connected populations. The relation between members of the population can be expressed as a graph, and different graph topologies change the rate at which a beneficial or harmful mutation spreads through a population.
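
To make the role of graph topology concrete, here is a minimal sketch of a birth-death Moran process on a graph, a standard model in evolutionary dynamics. It estimates how often a single mutant with relative fitness r takes over the whole population. The function names, the adjacency-list representation and the choice of example graphs are assumptions made for illustration; the project may well use a different update rule.

```python
import random
from typing import Dict, List


def fixation(graph: Dict[int, List[int]], r: float, rng: random.Random) -> bool:
    """Run one birth-death Moran process from a single random mutant.

    graph maps each node to its list of neighbours. Residents have fitness 1,
    mutants have fitness r. Returns True if the mutant type takes over.
    """
    nodes = list(graph)
    mutant = {n: False for n in nodes}
    mutant[rng.choice(nodes)] = True
    count = 1
    while 0 < count < len(nodes):
        # Pick a parent with probability proportional to fitness.
        weights = [r if mutant[n] else 1.0 for n in nodes]
        parent = rng.choices(nodes, weights=weights)[0]
        # The offspring replaces a uniformly chosen neighbour of the parent.
        child = rng.choice(graph[parent])
        if mutant[child] != mutant[parent]:
            count += 1 if mutant[parent] else -1
            mutant[child] = mutant[parent]
    return count == len(nodes)


def fixation_probability(graph, r, trials=2000, seed=0):
    """Monte Carlo estimate of the fixation probability."""
    rng = random.Random(seed)
    return sum(fixation(graph, r, rng) for _ in range(trials)) / trials


def complete_graph(n):
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def star_graph(n):
    graph = {0: list(range(1, n))}
    graph.update({i: [0] for i in range(1, n)})
    return graph


if __name__ == "__main__":
    for name, g in (("complete", complete_graph(6)), ("star", star_graph(6))):
        print(name, fixation_probability(g, r=1.5))
```

Comparing the estimates for the two graphs at the same r gives a quick feel for how a population structure can change the fate of a beneficial mutation relative to a fully connected population.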

Evolutionary algorithms are a broad collection of algorithmic techniques used in search, optimization and learning. Based on biological evolution, they have demonstrated competitive results for a number of important problems in computer science.

Aim and Outline
The aim of this project is to look at how different graph topologies can be used in evolutionary algorithms to avoid extinction or fixation of mutations within a population. Using different graph structures may help multivariate solutions emerge for certain problems. The algorithms developed will be applied to a series of known problems to gauge their performance against the current state of the art.

URLs and References
http://www.nature.com/articles/s42003-018-0078-7
https://www.quantamagazine.org/mathematics-shows-how-to-ensure-evolution-20180626/
Nowak MA (2006). Evolutionary Dynamics: Exploring the Equations of Life. Harvard University Press. ISBN: 9780674023383.

Pre- and Co-requisite Knowledge
Experience with graph representations and a basic understanding of evolutionary algorithms would be of benefit.

Constructing Knowledge Graph for Android: Analyses and Applications
Supervisors: Li Li

Background
The success of Android, on the one hand, brings several benefits to app developers and users, while on the other hand, it makes Android the target of choice for attackers and opportunistic developers. To cope with this, researchers have introduced various approaches (e.g., privacy leak identification [1], ad fraud detection, etc.) to secure Android apps so as to keep users from being infected [2]. All of the above approaches require a reliable benchmark dataset to evaluate their performance. Unfortunately, it is not easy to build such a targeted dataset from scratch. As a result, researchers often apply their approaches to randomly selected apps, including both relevant and irrelevant samples. Much effort is hence wasted on analysing the irrelevant ones. As an example, many researchers now rely on AndroZoo [3], which provides over 5.8 million Android apps for the community, to obtain experimental samples. If a researcher is only interested in apps that are obfuscated and have used reflection in their code, she still has to download all the apps and then filter for the ones of interest. Therefore, to supplement this work, we plan to represent the app metadata via a knowledge graph and share it with the community so that our fellow researchers can quickly search for apps of interest.

Aim and Outline
The aim of this work is to provide a knowledge graph of Android apps for our fellow researchers working in the field of mobile app analysis to quickly search for relevant artefacts so as to facilitate their research in various means, e.g., to search for app samples exactly suitable for their experiments.

Outline:
This project is expected to be done in three aspects:
Aspect 1: To design and implement a prototype tool called KnowledgeZooClient, aiming to extract metadata from Android apps and integrate them into a graph database (i.e., knowledge graph). We have already implemented a prototype version that is made available on Github [4]. The students are expected to continuously contribute to this open project.
Aspect 2: Thanks to Aspect 1, we are able to build a knowledge graph of Android apps. In this project, we will release the constructed knowledge graph as an online service, namely KnowledgeZoo. The students working on this aspect are expected to maintain the online service and perform various empirical studies on top of the graph, so as to obtain empirical knowledge that cannot easily be obtained otherwise.
Aspect 3: This aspect aims to implement various applications on top of KnowledgeZoo. These applications need to go one step further to make the knowledge graph smarter. For example, the potential applications can introduce new relationships derived from existing ones through graph mining. Based on the Java packages and the signing certificate, we can introduce "similar" or "repackaging" relationships to APK nodes.
One thesis can cover one or more of the aforementioned aspects.
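
As a rough illustration of the derived relationships mentioned in Aspect 3, the sketch below adds "similar" or "repackaging" edges between APK nodes from their package sets and signing certificates. The rule used here (high Jaccard overlap of package names, then same signer means "similar" and different signer means "repackaging"), the 0.8 threshold and the field names are assumptions for the example only; they are not taken from KnowledgeZoo or KnowledgeZooClient.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def derive_edges(apks: Dict[str, dict], threshold: float = 0.8) -> List[Tuple[str, str, str]]:
    """Return (apk_a, relation, apk_b) triples derived from existing metadata.

    Each APK record is assumed to hold a set of Java package names under
    "packages" and a signing-certificate fingerprint under "cert".
    """
    edges = []
    for id_a, id_b in combinations(sorted(apks), 2):
        overlap = jaccard(apks[id_a]["packages"], apks[id_b]["packages"])
        if overlap >= threshold:
            same_signer = apks[id_a]["cert"] == apks[id_b]["cert"]
            relation = "similar" if same_signer else "repackaging"
            edges.append((id_a, relation, id_b))
    return edges


# Hypothetical metadata for three APK nodes.
apks = {
    "app1": {"packages": {"com.a.ui", "com.a.net"}, "cert": "AAA"},
    "app2": {"packages": {"com.a.ui", "com.a.net"}, "cert": "AAA"},
    "app3": {"packages": {"com.a.ui", "com.a.net"}, "cert": "BBB"},
}
for edge in derive_edges(apks):
    print(edge)
```

In a graph database the returned triples would become new relationships between existing APK nodes, which is the kind of graph mining step the aspect describes.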

URLs and References
[1] Li Li, Alexandre Bartel, Tegawendé Bissyandé, Jacques Klein, Yves Le Traon, Steven Arzt, Siegfried Rasthofer, Eric Bodden, Damien Octeau and Patrick McDaniel, IccTA: Detecting Inter-Component Privacy Leaks in Android Apps, The 37th International Conference on Software Engineering (ICSE 2015)
[2] Li Li, Tegawendé F. Bissyandé, Mike Papadakis, Siegfried Rasthofer, Alexandre Bartel, Damien Octeau, Jacques Klein, Yves Le Traon, Static Analysis of Android Apps: A Systematic Literature Review, Information and Software Technology, 2017
[3] Li Li, Jun Gao, Médéric Hurier, Pingfan Kong, Tegawendé F Bissyandé, Alexandre Bartel, Jacques Klein, Yves Le Traon, AndroZoo++: Collecting Millions of Android Apps and Their Metadata for the Research Community, arXiv preprint arXiv:1709.05281, 2017
[4] https://github.com/lilicoding/KnowledgeZooClient

Pre- and Co-requisite Knowledge
Shell
Python or Java
Android (a plus, not necessary)

Creation of an automated pipeline to monitor and visualise the performance of LC-MS/MS systems
Supervisors: Tobias Czauderna, Michael Wybrow, Ralf Schittenhelm (Monash Biomedical Proteomics Facility)

Background
The Monash Biomedical Proteomics Facility (MBPF) provides mass spectrometry services for identification and quantification of proteins within and beyond Monash University. An essential need within the mass spectrometric community is the availability of robust workflows that monitor the performance of mass spectrometers and their liquid chromatography (LC-MS/MS) systems on a routine basis. For this purpose, commercially available and standardised quality control (QC) samples of known complexity and/or composition can be purchased for routine analysis.

Whilst the mass spectrometric acquisition of such QC samples is quite straightforward and can be easily automated, the subsequent data analysis and examination is often cumbersome and inconvenient, mostly because of the frequency with which such QC samples have to be acquired. In addition, the analysed data and extracted information must be collated, archived, visualised and inspected, which has so far been very time-consuming.

Aim and Outline
We therefore propose to create an automated pipeline, which (i) analyses mass spectrometric QC data, (ii) extracts essential, performance-related information and (iii) visualises this information on a website for further inspection and archiving purposes. Initial versions of scripts are already in place at the Monash Biomedical Proteomics Facility to analyse mass spectrometric QC data and to extract relevant information from these data. The focus of this honours project will be to (i) extend/adapt the scripts for information extraction as necessary, (ii) create a web-based visualisation tool which converts this extracted information into appealing and meaningful graphics, and (iii) evaluate the developed tool with users from the MBPF.

URLs and References
https://www.monash.edu/researchinfrastructure/proteomics

Pre- and Co-requisite Knowledge
JavaScript programming knowledge, interest in visualisation and potentially knowledge in D3 or a similar visualisation framework

Graph layout for SBGN Entity Relationship maps
Supervisors: Tobias Czauderna, Michael Wybrow

Background
The Systems Biology Graphical Notation (SBGN) is an emerging standard for graphical representations of biochemical and cellular networks studied in systems biology. Three different views cover several aspects of the represented processes in different levels of detail: 1) Process Description maps describe temporal aspects of biochemical interactions in a network, 2) Entity Relationship maps show the relationships between entities in a network and how entities influence relationships, and 3) Activity Flow maps depict the flow of information between entities in a network. SBGN helps to communicate biological knowledge more efficiently and accurately between different research communities in the life sciences.

Aim and Outline
The aim of the project is the development of a graph layout algorithm for SBGN Entity Relationship maps. These maps can be considered as hypergraphs with edges joining more than two nodes. Different approaches to this problem are possible: the layout algorithm could be developed directly on the hypergraphs, or the hypergraphs could be transformed into graphs with edges joining only two nodes. For either approach, constraint-based layout techniques, developed by members of the Immersive Analytics Lab at the FIT and available in the Adaptagrams library, could be used.
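
One common way to carry out the second approach is a star expansion: each hyperedge with more than two members is replaced by an extra hub node connected to every member, so that an ordinary graph layout algorithm can be applied. The sketch below shows this under assumed names (star_expansion, the "hub:" prefix); it is an illustration of the idea, not the method prescribed for the project.

```python
from typing import Dict, List, Tuple


def star_expansion(hyperedges: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Turn a hypergraph into a graph whose edges join exactly two nodes.

    hyperedges maps a hyperedge name to the list of nodes it joins.
    Hyperedges with two members are kept as plain edges; larger ones get
    a dummy hub node (named "hub:<hyperedge name>") linked to each member.
    """
    edges: List[Tuple[str, str]] = []
    for name, members in hyperedges.items():
        if len(members) == 2:
            edges.append((members[0], members[1]))
        else:
            hub = f"hub:{name}"
            edges.extend((hub, member) for member in members)
    return edges


# Hypothetical example: one binary relationship and one three-way relationship.
example = {"e1": ["A", "B"], "e2": ["A", "C", "D"]}
for edge in star_expansion(example):
    print(edge)
```

After layout, the hub nodes can be drawn as small junction points or hidden, which is one of the design decisions a layout algorithm for such maps would need to make.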

URLs and References
http://www.sbgn.org
http://ialab.it.monash.edu/

Pre- and Co-requisite Knowledge
Programming knowledge, interest in graph visualisation and graph layout

Estimating high dimensional linear regression models using Bayesian inference
Supervisors: Daniel Schmidt

Background
High dimensional regression models are increasingly important in the current age of “big data”, both as analysis tools for problems with many predictors, as well as building blocks within other models such as deep neural networks. When there are many more predictors than samples, it is crucial to use regularisation, or penalization methods, that induce “sparsity” -- that is, force many model coefficients to be zero [1]. The Bayesian framework allows us to define penalization methods with excellent theoretical properties.

Aims and Outline
This project will look at using the expectation-maximisation (EM) algorithm [2] to build fast, novel algorithms to estimate sparse linear models - potentially with hierarchical groupings of the predictors - using state-of-the-art Bayesian methods. The resulting methods are likely to be amongst the best techniques for the problem that currently exist.
This project would be most suited to students with good mathematical and programming skills.

URLS and References
[1] “Shrink globally, act locally”, N. G. Polson and J. G. Scott, http://faculty.chicagobooth.edu/nicholas.polson/research/papers/Bayes1.pdf
[2] “Maximum Likelihood from Incomplete Data via the EM Algorithm”, A. P. Dempster, N. M. Laird and D. B. Rubin, http://web.mit.edu/6.435/www/Dempster77.pdf

Pre-requisite knowledge
Ability to program (MATLAB/R/Python); linear regression; reasonable understanding of Bayesian statistics

Interpretable non-linear modelling using Bayesian additive regression
Supervisors: Daniel Schmidt

Background
Linear regression remains an important modelling tool due to the fact that it produces models that are very easy to interpret. The drawback is, of course, that they only model linear relationships. Additive models [1] are a natural extension that relax this assumption by building the model as a sum of independent non-linear functions of the inputs. This retains a large degree of interpretability while increasing flexibility. The usual statistical questions that appear in linear models remain: how to estimate the relationships between predictors and the target, and how to decide if predictor variables are important or not?

Aims and Outline
This project will utilise recent developments in Bayesian regression to build new tools for flexible additive regression. The work will explore different smoothing techniques and will build on an existing, highly efficient toolbox for Bayesian regression [2,3]. Ideally the student will add a layer of code that allows simple specification of additive models and utilises the statistical advantages of the Bayesian framework. This work has the potential to be picked up and utilised by others.

This project would be most suited to students with good mathematical and programming skills.

URLS and References
[1] “Generalized Additive Models”, T. Hastie and R. Tibshirani, https://projecteuclid.org/euclid.ss/1177013604
[2] “Bayesian Grouped Horseshoe Regression with Application to Additive Models”, Z Xu, DF Schmidt, E Makalic, G Qian, JL Hopper, http://dschmidt.org/wp-content/uploads/2016/12/Grouped-Horseshoe-Regression-Xu-et-al-2016.pdf
[3] “High-Dimensional Bayesian Regularised Regression with the BayesReg Package”, E. Makalic and D. F. Schmidt, https://arxiv.org/abs/1611.06649

Pre-requisite knowledge
Ability to program (MATLAB/R/Python); linear regression; reasonable understanding of Bayesian statistics

Bayesian regression for binary data: probit versus logistic
Supervisors: Daniel Schmidt

Background
There are two standard methods used to analyse binary data: logistic regression and probit regression. Both are commonly used in practice, and within the usual maximum likelihood framework, both are straightforward to implement. In the Bayesian framework, work on alternative representations [1] has meant that logistic regression has been substantially easier to implement and is the state-of-the-art technique. Very recent work [2] has identified a seemingly fast and efficient representation for probit regression that means it should be, in theory, competitive with, or superior to logistic techniques in terms of implementation speed.
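
The two models differ only in the function that maps the linear predictor to a probability: logistic regression uses the logistic function and probit regression uses the standard normal cumulative distribution function. A minimal sketch comparing the two on a few values follows; the function names are chosen for the example.

```python
import math


def logistic(x: float) -> float:
    """Logistic function: P(y = 1) under logistic regression for linear predictor x."""
    return 1.0 / (1.0 + math.exp(-x))


def probit(x: float) -> float:
    """Standard normal CDF: P(y = 1) under probit regression for linear predictor x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# Compare the two response curves on a small grid of linear-predictor values.
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"x={x:+.1f}  logistic={logistic(x):.4f}  probit={probit(x):.4f}")
```

Both curves pass through 0.5 at zero, but the normal CDF approaches 0 and 1 faster than the logistic function, so coefficients fitted under the two models are on different scales and are not directly comparable.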

Aims and Outline
This project will look to implement this new probit technique within an already established Bayesian regression toolbox [3], and then compare the performance and behaviour of probit regression with logistic regression in this setting.
This project would be most suited to students with good mathematical and programming skills.

URLS and References
[1] “Bayesian inference for logistic models using Polya-Gamma latent variables”, N. G. Polson, J. G. Scott, J. Windle, https://arxiv.org/abs/1205.0310
[2] “Conjugate Bayes for probit regression via unified skew-normals”, D.Durante, https://arxiv.org/pdf/1802.09565.pdf
[3] “High-Dimensional Bayesian Regularised Regression with the BayesReg Package”, E. Makalic and D. F. Schmidt, https://arxiv.org/abs/1611.06649

Pre-requisite knowledge
Ability to program (MATLAB/R/Python); linear regression; reasonable understanding of Bayesian statistics

Learning which predictors are important in a linear regression using minimum message length
Supervisors: Daniel Schmidt

Background
Linear regression models remain (and will likely continue to remain) one of the most important building blocks in statistical modelling. They benefit from a high degree of interpretability, and good performance in high dimensions even with relatively little data. Selecting which predictors are important and which ones are not remains an important aspect of linear modelling. The minimum message length (MML) principle [1], developed here at Monash, is a powerful tool for quantifying the fit of a model to data. Recent work on linear regression within the MML framework [2] has demonstrated excellent performance while remaining computationally straightforward.

Aims and Outline
This project would aim to develop/extend/test methodology for selecting a plausible set of predictors for regression models using the minimum message length idea. The ideas from [2] will serve as a suitable building block. There are several directions that the student could take; for example, they could:

1. explore or examine the performance of MML based model selection methods within the context of search procedures such as the lasso;
2. implement a novel MML version of the lasso method;
3. use the MML idea to quantify the plausibility of predictors;
4. implement the MML model selection criteria within new subset searching technology [3]
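
The general shape of criterion-based predictor selection can be sketched as follows. This is a minimal sketch only: it scores every subset of predictors by exhaustive enumeration and uses BIC purely as a placeholder score. It is not the MML criterion of [2], and the synthetic data, function names and noise level are assumptions made for illustration.

```python
import itertools
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def rss(X, y, subset):
    """Residual sum of squares of least squares on an intercept plus the chosen columns."""
    rows = [[1.0] + [row[j] for j in subset] for row in X]
    k = len(subset) + 1
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def bic(X, y, subset):
    """Placeholder score (BIC, not MML); lower is better.
    Counts the intercept, the selected coefficients and the noise variance."""
    n = len(y)
    return n * math.log(rss(X, y, subset) / n) + (len(subset) + 2) * math.log(n)

# Synthetic data (assumption): only predictors 0 and 2 influence y.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(60)]
y = [2.0 * row[0] - 3.0 * row[2] + rng.gauss(0, 0.5) for row in X]

subsets = [s for r in range(5) for s in itertools.combinations(range(4), r)]
best = min(subsets, key=lambda s: bic(X, y, s))
print("selected predictors:", best)
```

Swapping `bic` for an MML-based score would leave the search loop unchanged; exhaustive enumeration is only feasible for a handful of predictors, which is where search procedures such as the lasso or the subset search of [3] come in.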

This project would be most suited to students with good mathematical and programming skills.

URLS and References
[1] “Statistical and Inductive Inference by Minimum Message Length”, C.S.Wallace
[2] “A Minimum Message Length Criterion for Robust Linear Regression”, C.K.Wong, E.Makalic, D.F.Schmidt, https://arxiv.org/pdf/1802.03141.pdf
[3] “Best Subset Selection via a Modern Optimization Lens”, D.Bertsimas, A. King and R. Mazumder, https://arxiv.org/pdf/1507.03133.pdf

Pre-requisite knowledge
Ability to program (MATLAB/R/Python); linear regression; reasonable understanding of Bayesian statistics

Experiments with Topic Models using Side Information
Supervisors: Wray Buntine

Background
Topic models perform clustering of document collections.
https://en.wikipedia.org/wiki/Topic_model
Recent techniques dramatically improve performance by introducing so-called side information including things like year of publication, author, and synonym sets of words.  This better enables the models to work on tweets, and to deal with structured documents like papers with sub-sections.  We have one such improved topic modelling algorithm, called MetaLDA, developed at Monash, that is state of the art in terms of performance.

Aim and Outline
There are a number of interesting scenarios we would like to test out using MetaLDA.  How well can we model structured documents (for instance, medical articles that have different sections) and tweets or document collections whose nature should vary from year to year?  Moreover, can we model cross-lingual collections using cross-lingual side information about words?  So the aim is to develop a number of scenarios to test the system, prepare the text collections, perform the analysis, and interpret results.

URLs and References
MetaLDA software, an extension of Mallet, is available at
https://github.com/ethanhezhao/MetaLDA
The theory paper explaining the method is
https://arxiv.org/abs/1709.06365

Pre- and Co-requisite Knowledge
Good Python skills to do basic text munging and text database manipulation.
Basic understanding of probabilistic models to gain an understanding of what the algorithm does.  General practical machine learning knowledge. Flair for scripting and experimenting.

Multi-class classification with Bayesian Networks
Supervisors: Wray Buntine and Francois Petitjean

Background
Given a news article, people like to classify it into a hierarchy with higher level nodes like "Sports" and "International" and lower level nodes like "Rugby League" or "German Politics". Sometimes the hierarchy is not given, just the leaf nodes. Two primitive techniques for doing this are to use a general classifier (e.g., naive Bayes), or to use a one-against-all classifier that tests each potential class individually.
See https://en.wikipedia.org/wiki/Multiclass_classification
A related task is multi-label classification https://en.wikipedia.org/wiki/Multi-label_classification
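
The one-against-all decomposition mentioned above can be sketched as follows. This is a toy illustration, not the Monash classifier: the binary scorer is a simple centroid comparison standing in for any binary classifier, and the two-dimensional features and the third class label are made up for the example.

```python
def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_one_vs_all(X, labels):
    """For each class, fit a binary 'this class vs. everything else' scorer."""
    models = {}
    for cls in sorted(set(labels)):
        pos = [x for x, lab in zip(X, labels) if lab == cls]
        neg = [x for x, lab in zip(X, labels) if lab != cls]
        models[cls] = (centroid(pos), centroid(neg))
    return models

def predict(models, x):
    """Pick the class whose binary scorer is most confident."""
    def score(cls):
        pos_c, neg_c = models[cls]
        # Positive when x is closer to the class than to the rest.
        return sq_dist(x, neg_c) - sq_dist(x, pos_c)
    return max(sorted(models), key=score)

# Toy 2-D features (assumption); "Film" is a made-up third leaf class.
X = [(0, 0), (1, 0), (0, 1), (10, 0), (9, 0), (10, 1), (0, 10), (1, 10), (0, 9)]
labels = ["Rugby League"] * 3 + ["German Politics"] * 3 + ["Film"] * 3

models = train_one_vs_all(X, labels)
print(predict(models, (9, 1)))
print(predict(models, (0.5, 0.5)))
```

Each class gets its own binary problem, so any binary classifier could be substituted for the centroid scorer without changing the surrounding decomposition.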

Aim and Outline
At Monash we have developed a state of the art classifier that extends the naive Bayes classifier by building a Bayesian network.
How could we apply this to multi-class and/or multi-label classification? The idea is to try a number of task decompositions and test them out. So we would manipulate datasets in different ways to build different structures using the software, in order to achieve different variants of a classifier.  The project would involve conceiving different approaches, massaging the inputs and outputs of the software to achieve this, running experiments and analysis.

URLs and References
Examples of multi-class datasets:
The Chordalysis software for building a Bayesian network is at:
https://github.com/fpetitjean/Chordalysis
Recent papers using the software are at:
https://topicmodels.org/2018/05/15/some-research-papers-on-hierarchical-models/

Pre- and Co-requisite Knowledge
Good Python skills to do basic text munging and text database manipulation. Basic understanding of probabilistic models to gain an understanding of what the algorithm does.  General practical machine learning knowledge. Flair for scripting and experimenting.   Good experience with Java would be beneficial but not essential.

Measuring inpatient acuity in General Medicine using electronic medical records (EMRs)
Supervisors: Yen Cheung, Evan Newnham (Eastern Health)

Background
Accurate assessment of inpatient acuity has a number of important implications for our healthcare system. From the systems perspective, measurement of acuity facilitates an understanding of demand and complexity that will inform resource utilisation, future workforce and systems planning. The development of real-time acuity assessment tools that are presented in a visual format has the potential to be translated to all areas of healthcare system analytics. For instance, robust real-time visual system analytics have enormous potential to inform the understanding of patient flow fluctuations and inefficiencies, as well as opportunities for improvement, and will ultimately permit the development of predictive modelling to better prepare the healthcare system for unavoidable but often predictable variabilities.

Aim and Outline
This project aims to develop an interface for extraction of data to measure General Medicine acuity at a local hospital to support patient-centred care. This acuity measurement tool will lay the foundations for broader use across the patient journey in Emergency, Surgery, Specialty Medicine and Subacute.

URLs and References
Brennan C, Daly B, Patient acuity: a concept analysis, Journal of Advanced Nursing, 2009, pp 1114-1126
Brennan C, The Oncology Acuity Tool: A reliable method for measuring patient acuity for nurse assignment decisions, Journal of Nursing Measurement, Vol 20, No. 3, 2012, pp 155-185

Pre- and Co-Requisite Knowledge
Interest and some knowledge of data analysis and data visualisation

Defending against Adversarial Example Attacks on Neural Networks
Supervisors: Carsten Rudolph, Shuo Wang

Background
The motivation for using differential privacy to defend against adversarial examples is as follows. The privacy semantics enforced by differential privacy are that small changes in the database (e.g., deleting or changing one row or a small set of rows) will result in small and bounded changes in the output distribution. Similarly, robustness against adversarial examples can be described as ensuring that small changes in an input (such as changing a few pixels in an image) will not result in drastic changes to a deep neural network’s predictions (such as changing its label from a panda to a gibbon, or from stop sign to pass sign).
Therefore, we can treat the inputs (e.g., images) of a deep neural network as databases in differential privacy, and individual features (e.g., a row of pixels) as rows. It is then possible to randomize the outputs of a deep neural network’s predictions so that they satisfy differential privacy with respect to a small number of pixels in an image. One could then guarantee robustness of predictions against adversarial examples for changes of up to the same number of pixels.
Specifically, we will add a noisy layer that randomizes the computation of the network in order to bound the sensitivity to changes in the input. We then adapt the neural network training and prediction process to account for this randomization and use the differential privacy bounds to reason about the robustness of individual predictions. Namely, given inputs on which the neural network needs to give a robust prediction, these DP-Nets can guarantee that there are no other inputs nearby (up to a particular distance) that would cause a different prediction.
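
The idea of a noisy layer can be illustrated with the following minimal sketch. It assumes Gaussian noise added directly to the input and a Monte Carlo vote over many noisy forward passes; `base_scores` is a hypothetical stand-in for a trained network, and the sketch does not compute any differential privacy bound or certified distance.

```python
import random

def base_scores(x):
    """Hypothetical stand-in for a trained network: two linear class scores."""
    return [x[0] - x[1], x[1] - x[0]]

def noisy_predict(x, sigma=0.5, draws=500, seed=0):
    """Estimate class vote fractions under Gaussian input noise (assumed noise layer)."""
    rng = random.Random(seed)
    totals = [0, 0]
    for _ in range(draws):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        scores = base_scores(noisy)
        totals[scores.index(max(scores))] += 1
    return [t / draws for t in totals]

original = noisy_predict([2.0, 0.0])
perturbed = noisy_predict([1.8, 0.2])  # small change to the input
print("original :", original)
print("perturbed:", perturbed)
```

Because the prediction is an average over randomized computations, a small change in the input shifts the vote fractions only slightly; the project would replace this informal observation with bounds derived from differential privacy.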

Aim and Outline
The main focus is to use the theory of differential privacy to learn robust deep learning or machine learning models, with provable theoretical guarantees for a subset of predictions, in order to guarantee robustness against adversarial perturbations of bounded size.

Pre- and Co-requisite Knowledge
- Good programming knowledge (Python)
- Basic understanding of differential privacy and neural networks

Supervisors: Carsten Rudolph and Shuo Wang

Background
Although it is commonly acknowledged that data sharing considerably benefits research progress, there are still two major challenges for the data mining research community:
(1) data availability is limited in many fields, e.g., medical or health
(2) the privacy concern derived from participant sensitive information hampers the data sharing, even from generated synthetic data.
Existing privacy-preserving sharing approaches always apply excessive sanitization to ensure privacy, resulting in a significant loss of data utility.
We will explore a more practical differentially private model for releasing synthetic data, based on a generative adversarial network (GAN) trained on the original data, so that the data curator can provide a freely accessible public version instead of perturbing and publishing the original data.

Aim and Outline
The main focus is
(1) to enable generation of an unlimited amount of synthetic data;
(2) to provide robust privacy guarantees satisfying differential privacy while retaining considerable utility;
(3) to realize practical training and perturbing stability by applying optimization strategies.
Furthermore, we will explore a privacy-preserving GAN that can handle high-dimensional and discrete data, incorporating Adversarial Autoencoders (AAE).

URLs and References
An introduction to GANs and further links can be found here:

Pre- and Co-requisite Knowledge
- Good programming knowledge (Python)
- Basic understanding of deep learning, neural networks.
- Basic understanding of security and privacy.

Virtual Trusted Platform Module meets Intel Software Guard Extensions
Supervisors: Carsten Rudolph

Background
The Trusted Platform Module (TPM) is being used extensively for tasks like gathering and attesting a system state, storing and generating cryptographic data, and providing a platform identity. Unfortunately, a computer might be just an illusion created using system virtualization, and the TPM might be used by the hypervisor exclusively. A hardware TPM does not support multiple users or systems, and thus virtualizing it so as to make similar functionality available has gained interest in the virtualization software and security communities. The biggest challenge in virtualizing a security device is providing adequate protection: such a device must be secure for and secure against a client. Intel’s Software Guard Extensions (SGX) offer a practical way to implement critical programs using secure execution environments. By using SGX we may be able to implement virtual TPM (vTPM) functionality while providing adequate protection for secrets and cryptographic operations of virtual machines and cloud services.

Aim and Outline
(1) Designing the integration of vTPM functionality (this might be a reduced set) into Intel SGX enclaves.
(2) Implementing a prototype vTPM and test environment for system processes using the provided security functionality.
(3) Evaluation of your design and prototype on a cloud scale by analyzing time and resource consumption as well as options for integration with open source hypervisors.

https://software.intel.com/en-us/sgx

Pre- and Co-requisite Knowledge
(1) Interested in working with a group of security researchers and communicating your ideas.
(2) Programming in C/C++ and Python.
(3) System oriented programming and open source operating systems.
(4) You should have basic knowledge of asymmetric cryptography and related program libraries for an easy start.

A computational model for the evolution of social preferences
Supervisors: Julian Garcia

Background
This project uses game theory and computational models to study the evolution of social preferences. A social preference represents how an individual evaluates the outcomes of a social interaction. Consider the following situation: someone offers you 100 dollars, and asks you to share some of it with your friend. If your friend accepts your offer, both you and she keep and share the money according to your proposal. However, if your friend does not accept the offer, you both get 0 dollars. How much money would you offer to your friend? What is the minimum offer you would accept as "the friend"? Traditional game-theoretical models dictate that you should offer nothing, and your friend should be fine with that. This is not what happens in the real world. If you are offered too little, a sense of envy may lead you to reject low offers. Social preferences are the mathematical formalism used to introduce things like guilt and envy into game theory models.
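
One well-known way to formalise envy and guilt is the Fehr-Schmidt inequity-aversion utility; the project description does not say which formalism will be used, so the sketch below is only an illustration. The parameter names `envy` and `guilt` and the rule "accept when utility is at least the zero payoff from rejecting" are assumptions of the example.

```python
def fehr_schmidt_utility(own, other, envy, guilt):
    """Material payoff minus penalties for being behind (envy) or ahead (guilt)."""
    return own - envy * max(other - own, 0) - guilt * max(own - other, 0)

def minimum_acceptable_offer(pie=100, envy=1.0, guilt=0.0):
    """Smallest whole-dollar offer the responder prefers to rejecting.

    Rejecting leaves both players with 0, which has utility 0.
    """
    for offer in range(pie + 1):
        if fehr_schmidt_utility(offer, pie - offer, envy, guilt) >= 0:
            return offer
    return None

print(minimum_acceptable_offer(envy=0.0))  # purely self-interested responder
print(minimum_acceptable_offer(envy=0.5))
print(minimum_acceptable_offer(envy=1.0))
```

With no envy the responder accepts an offer of nothing, matching the traditional prediction; as the envy parameter grows, low offers are rejected, which is the behaviour observed in practice.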

Aim and Outline
The aim of this project is to produce a computational model for the evolution of social preferences, using genetic programming. This will be useful in providing an evolutionary explanation for social preferences in humans, and also in artificially evolving social preferences for artificial agents.

URLs and References
- Alger, Ingela, and Jörgen W. Weibull. "Homo moralis—preference evolution under incomplete information and assortative matching." Econometrica 81.6 (2013): 2269-2302.

- Poli, Riccardo, et al. A field guide to genetic programming. Lulu.com, 2008.

Pre- and Co-requisite Knowledge
Problem solving skills, an interest in game theory, applied mathematics and simulation. Skills in Python, Java or both.

Network-based algorithm for drug discovery for malaria
Supervisors: Hieu Nim, Graham Farr

Background
Biological networks play an important role in malaria infection in humans, where effective drug combinations are yet to be discovered. In the past, biological network topology and associated data have been sparse due to their high cost and complexity. Rapid advances have been made in recent years in network medicine, creating an ideal opportunity for algorithm-based network analysis in search of drug targets.

Aim and Outline
The student is expected to integrate malaria data from collaborators at the Biomedicine Discovery Institute with biological networks obtained from the NCBI PubMed literature. An example of such biological networks is at http://msb.embopress.org/content/6/1/453. Other network databases, including KEGG and SIGNOR, can be integrated as appropriate. The network will need to be analysed, using various graph theory techniques, to study the topology and potentially identify “interesting nodes and edges” of the network. These could be potential drug targets, which can be validated in the wet-lab by our collaborators.
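
As one simple example of a graph theory technique that could flag candidate nodes, the sketch below ranks nodes by degree centrality. The edge list and node names are placeholders rather than real biological data, and degree centrality is only one of many measures the project might consider.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Fraction of the other nodes that each node is directly connected to."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    n = len(neighbours)
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}

# Placeholder undirected network (assumption): nodes stand in for proteins or genes.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]

centrality = degree_centrality(edges)
# Sort by descending centrality, breaking ties alphabetically.
ranked = sorted(centrality.items(), key=lambda kv: (-kv[1], kv[0]))
for node, score in ranked:
    print(node, round(score, 2))
```

Highly connected nodes are a natural first set of candidates, though measures such as betweenness or shortest-path analysis may highlight different parts of the network.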

Pre- and Co-requisite Knowledge
A strong background in programming skills is necessary. Interest in science and medicine is a plus.

Algorithms to predict outcome from clinical data in lupus
Supervisors: Hieu Nim, Alberta Hoi, Eric Morand (Monash Health)

Background
Systemic lupus erythematosus (SLE, lupus) is a severe auto-immune disease without effective treatments. This is partially caused by data complexity, where many parameters can contribute to the disease outcome in a time-dependent manner. This presents an opportunity for machine-based extraction of useful knowledge from complex data.

Aim and Outline
The student is expected to integrate and analyse data from collaborators at Monash Health. Data are highly complex (demographic, laboratory test, physician’s assessment) and time-dependent. The student will implement predictive algorithms to analyse the data, and perform cross validation to evaluate the effectiveness of the prediction.
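
The cross validation step can be sketched as follows, using a trivial majority-class predictor in place of a real predictive algorithm. The labels are made up, and the contiguous fold split is an assumption; for time-dependent patient data, folds would likely need to be formed per patient or in time order to avoid leaking information between training and test sets.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n samples."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def majority_label(labels):
    """Most frequent label; ties resolve to the smallest label."""
    return max(sorted(set(labels)), key=labels.count)

def cross_validate(labels, k=5):
    """Mean test accuracy of the majority-class baseline across k folds."""
    accuracies = []
    for train, test in k_fold_indices(len(labels), k):
        guess = majority_label([labels[i] for i in train])
        correct = sum(labels[i] == guess for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)

# Made-up binary outcomes (assumption), e.g. 1 = flare, 0 = no flare.
outcomes = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1]
print(cross_validate(outcomes, k=5))
```

A baseline like this gives a reference accuracy that any real predictive algorithm should exceed under the same fold structure.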

Pre- and Co-requisite Knowledge
A strong background in programming skills is necessary. Interest in science and medicine is a plus.